Tuesday, July 8, 2014

Multiply / Sum column data in awk/sed

Sourced from here:
http://unix.stackexchange.com/questions/115998/how-do-i-multiply-and-sum-column-data-using-awk-and-or-sed

I have several columns of data. I always have the same number of rows (say 5). In the 2nd column, I want to multipy the first value by 5, then the second value by 4, the third value by 3, etc. I then want to sum these values, and divide by the sum of the values in the second column. How would I do this in sed and/or awk?
Example:
4 5 7 1 2 3
5 1 2 3 1 2
4 2 3 6 1 2
3 4 1 6 3 3
2 3 1 2 1 6
Answer: (5*5 + 4*1 + 3*2 + 2*4 + 1*3)/(5 + 1 + 2 + 4 + 3) = 3.067


Solution 1:
Replace 6 with total (number of lines + 1) if needed:
awk '{mult+=$2*(6-NR); sum+=$2;} END {print mult/sum;}' yourfile.txt 
Displays: 3.06667


Solution 2:
If in case total number of lines are not known
$ cat file
4 5 7 1 2 3
5 1 2 3 1 2
4 2 3 6 1 2
3 4 1 6 3 3
2 3 1 2 1 6

$ awk 'FNR==NR{t = NR+1;next}{mult+=$2*(t-FNR);sum+=$2} 
       END{print mult/sum}' file{,}
3.06667
Explanation
awk 'FNR == NR{t=NR+1;next}
FNR variable --> Current line number in the current file
NR variable-->The total number of lines seen so far
The condition FNR == NR can only be true while awk is reading the first file of argument
t=NR+1 --> variable t holds line count +1 extra since it's required in our current context
next--> Stops the processing of the current input record and proceeds with the next input record
mult+=$2*(t-FNR) --> variable mult holds some of (5*(t-FNR) + ..... + 4*1 + 3*2 + 2*4 + 1*3)
sum+=$2 --> variable sum holds sum of column2
END--> AnEND rule is executed, once, after all the input has been read
print mult/sum --> Finally print mult/sum
file{,} --> you can also file file file but i used shortcut this bash trick you can say, try echo file{,} and echo file{,,} on your terminal.

No comments:

Post a Comment