Tuesday, August 12, 2014

Read a file, extract contents encapsulated in parentheses

Sourced from http://objectmix.com/awk/26995-retrieve-string-between-parentheses.html

$ cat some_file | gawk '{if (match($0,/\((.*)\)/,f)) print f[1]}' | cut -d ':' -f1 | sort | uniq

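The above command is useful when reviewing stack trace output on the command line: it extracts the text inside the parentheses on each line, strips the line number after the colon, and prints a sorted, de-duplicated list of the referenced .java files.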

For example:

$ cat stacktrace_1.txt | gawk '{if (match($0,/\((.*)\)/,f)) print f[1]}' | cut -d ':' -f1 | sort | uniq
FilterFileSystem.java
Job.java
JobSubmissionFiles.java
JobSubmitter.java
Native Method
ProcessBuilder.java
RawLocalFileSystem.java
Shell.java
Subject.java
ToolRunner.java
UserGroupInformation.java
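
The three-argument form of match() is a gawk extension, so the one-liner above will not run under other awk implementations. A minimal sketch of the same idea using sed is shown below; the function name list_trace_sources is hypothetical, and unlike the gawk version it captures only the last parenthesized group on each line, which is an assumption that fits typical Java stack frames.

```shell
# Hypothetical helper: list the unique file names referenced in a stack trace.
# Assumes each frame ends with "(File.java:123)" or "(Native Method)".
list_trace_sources() {
    # $1: path to a file containing the stack trace
    # sed prints only lines with a parenthesized group and keeps its contents;
    # cut drops the ":line" suffix; sort -u sorts and removes duplicates.
    sed -n 's/.*(\([^()]*\)).*/\1/p' "$1" | cut -d ':' -f1 | sort -u
}
```

Usage would look like: list_trace_sources stacktrace_1.txt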

Friday, August 8, 2014

Extract a list of tarballs from the pwd into their own separate folders


  • Extract all .tar.gz files in the current directory into their own folders, named after each tarball's filename.
  • Recommended naming convention for the .tar.gz files: hostname-otherdetails.tar.gz (NOTE: the only required detail is the first hyphen, which separates the fields; the first field is used as the target directory into which the corresponding tarball is extracted).
  • To change the separator character, replace the hyphen in the cut -d portion of the statement with another character, such as an underscore. For example: cut -d '_' -f1



$ for i in *.tar.gz; do d=$(echo "$i" | cut -d '-' -f1); mkdir -p "$d"; tar xvfz "$i" -C "$d"; done
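
As an alternative to piping each filename through echo and cut, the target directory can be derived with shell parameter expansion. The sketch below is only an illustration: the function name target_dir and its optional separator argument are hypothetical, not part of the original one-liner.

```shell
# Hypothetical helper: print the part of a tarball name before the first separator.
target_dir() {
    name=$1
    sep=${2:--}              # separator character; defaults to a hyphen
    dir=${name%%"$sep"*}     # strip the longest suffix starting at the first separator
    printf '%s\n' "$dir"
}

# Extract every tarball in the current directory into its own folder.
extract_all() {
    for i in *.tar.gz; do
        [ -e "$i" ] || continue          # skip if the glob matched nothing
        d=$(target_dir "$i")
        mkdir -p "$d" && tar xvfz "$i" -C "$d"
    done
}
```

Note that a filename containing no separator at all yields the full filename as the directory name, so mkdir will fail for that tarball; this matches the behavior of cut in the loop above.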


Collecting MR1 task logs from TTs in a single command


$ tar cvfzh /tmp/`hostname`-job_201407311402-tasklogs.tar.gz /var/log/hadoop-0.20-mapreduce/userlogs/job_201407311402_0001/*

Reason why we include the -h option in tar:

Without -h, tar archives the symlinks themselves rather than the log files they point to. The following location typically contains symlinks to the actual task logs, which typically reside in the working space of the TaskTracker (TT):
/var/log/hadoop-0.20-mapreduce/userlogs/job_201407311402_0001/ 

$ ls -l
lrwxrwxrwx 1 mapred mapred  81 Aug  8 07:54 attempt_201407311402_0001_m_000000_0 -> /mapred/local/userlogs/job_201407311402_0001/attempt_201407311402_0001_m_000000_0
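
The effect of -h can be checked on a throwaway directory before running it against real task logs. The following is a small sketch under assumed names: every path below is created under a temporary directory and is purely hypothetical.

```shell
# Build a tiny layout that mimics a userlogs directory containing a symlink.
work=$(mktemp -d)
mkdir -p "$work/real/attempt_0" "$work/userlogs"
echo "task output" > "$work/real/attempt_0/syslog"
ln -s "$work/real/attempt_0" "$work/userlogs/attempt_0"

# Without -h: the archive stores only the symlink itself.
tar cfz "$work/nofollow.tar.gz" -C "$work" userlogs
# With -h: tar follows the symlink and stores the files it points to.
tar cfzh "$work/follow.tar.gz" -C "$work" userlogs

# List both archives to compare their contents.
tar tfz "$work/nofollow.tar.gz"
tar tfz "$work/follow.tar.gz"
```

Only the archive created with -h should list the syslog file. Remove the temporary directory in $work once done.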