Sourced from http://objectmix.com/awk/26995-retrieve-string-between-parentheses.html
$ cat some_file | gawk '{if (match($0,/\((.*)\)/,f)) print f[1]}' | cut -d ':' -f1 | sort | uniq
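The command above is helpful when reviewing stack trace output on the command line. It pulls out the text between the parentheses on each line, drops the line number that follows the colon, and prints a sorted, de-duplicated list of the referenced .java files. Note that the three-argument form of match() is a gawk extension, so the command requires gawk rather than a plain POSIX awk.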
For example:
$ cat stacktrace_1.txt | gawk '{if (match($0,/\((.*)\)/,f)) print f[1]}' | cut -d ':' -f1 | sort | uniq
FilterFileSystem.java
Job.java
JobSubmissionFiles.java
JobSubmitter.java
Native Method
ProcessBuilder.java
RawLocalFileSystem.java
Shell.java
Subject.java
ToolRunner.java
UserGroupInformation.java
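On machines where gawk is not installed, roughly the same result can be sketched with POSIX sed. This is an alternative technique, not the command from the source above; the function name list_trace_sources and the sample stack frames are made up for illustration, and the sed expression keeps the text inside the last pair of parentheses on each line.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original command): list the unique
# source names referenced in a Java stack trace read from stdin.
list_trace_sources() {
    # Keep only the text inside the last (...) pair on each line,
    # drop the ":<line number>" suffix, then sort and de-duplicate.
    sed -n 's/.*(\(.*\)).*/\1/p' | cut -d ':' -f1 | sort -u
}

# Made-up sample frames, for illustration only.
sample_trace='java.io.IOException: error
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
    at java.lang.UNIXProcess.forkAndExec(Native Method)'

printf '%s\n' "$sample_trace" | list_trace_sources
```

Lines without parentheses, such as the exception message itself, are skipped because sed only prints lines where the substitution succeeded.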
Tuesday, August 12, 2014
Friday, August 8, 2014
Extract a list of tarballs from the pwd into their own separate folders
- Extract all .tar.gz files in the current directory into their own folders, named after each tarball's filename.
- Recommended naming convention for the .tar.gz files: hostname-otherdetails.tar.gz. NOTE: the only required detail is the first hyphen, which separates the fields; the first field is used as the name of the target directory into which the corresponding tarball is extracted.
- To change the separator character, change the hyphen in the cut -d portion of the statement to another character, such as an underscore. For example: cut -d '_' -f1
$ for i in *.tar.gz; do d=$(echo "$i" | cut -d '-' -f1); mkdir -p "$d"; tar xvfz "$i" -C "$d"; done
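For repeated use, the same idea can be wrapped in a small function. The following is a minimal sketch, assuming a POSIX shell; the name extract_by_prefix and its optional separator argument are hypothetical, and it uses shell parameter expansion in place of echo and cut.

```shell
#!/bin/sh
# Hypothetical helper: extract every .tar.gz in the current directory into a
# folder named after the part of the filename before the first separator.
extract_by_prefix() {
    sep=${1:--}                          # separator character, defaults to a hyphen
    for archive in *.tar.gz; do
        [ -e "$archive" ] || continue    # no matches: the glob stays literal
        target=${archive%%"$sep"*}       # text before the first separator
        # Skip files that do not contain the separator at all; otherwise the
        # target directory would have the same name as the tarball itself.
        [ "$target" != "$archive" ] || continue
        mkdir -p "$target"
        tar xzf "$archive" -C "$target"
    done
}
```

Calling extract_by_prefix with no argument splits on the first hyphen; calling extract_by_prefix _ splits on the first underscore instead.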
Collecting MR1 task logs from TTs in a single command
$ tar cvfzh /tmp/`hostname`-job_201407311402-tasklogs.tar.gz /var/log/hadoop-0.20-mapreduce/userlogs/job_201407311402_0001/*
Reason why we include the -h option in tar:
The following location typically contains symlinks to the actual task logs, which typically reside in the local working space of the TaskTracker (TT). Without -h, tar would archive the symlinks themselves rather than the log files they point to:
/var/log/hadoop-0.20-mapreduce/userlogs/job_201407311402_0001/
$ ls -l
lrwxrwxrwx 1 mapred mapred 81 Aug 8 07:54 attempt_201407311402_0001_m_000000_0 -> /mapred/local/userlogs/job_201407311402_0001/attempt_201407311402_0001_m_000000_0
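The effect of -h can be checked safely in a throwaway directory before running it against real task logs. This is only a sketch: every path and file name below is made up, and it assumes a tar that supports -z and -h (GNU tar and bsdtar both do).

```shell
#!/bin/sh
# Demo in a temporary directory; all names here are invented for illustration.
demo=$(mktemp -d)
mkdir -p "$demo/real/attempt_0" "$demo/userlogs"
echo "task output" > "$demo/real/attempt_0/syslog"
ln -s "$demo/real/attempt_0" "$demo/userlogs/attempt_0"

# Without -h: only the symlink itself is stored in the archive.
tar czf "$demo/without_h.tar.gz" -C "$demo" userlogs
# With -h: tar follows the symlink and stores the files behind it.
tar czhf "$demo/with_h.tar.gz" -C "$demo" userlogs

# List the contents of both archives for comparison.
without_h=$(tar tzf "$demo/without_h.tar.gz")
with_h=$(tar tzf "$demo/with_h.tar.gz")
printf 'without -h:\n%s\n\nwith -h:\n%s\n' "$without_h" "$with_h"

rm -rf "$demo"
```

Only the listing of the archive created with -h should include the syslog file; the other archive holds just the attempt_0 link.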