So I wanted to parse the output of lsof to see which ports a machine was accepting connections on. Normally one would write something like:
# lsof -Pn -i | grep LISTEN | awk '{print $9}' | cut -d: -f2 | sort -n | uniq
22
111
6066
7011
7015
7077
8080
10050
35735
37480
39118
44262
44444
52539
You get a sorted list of the listening ports and are done with it. But why invoke five different programs (grep, awk, cut, sort and uniq) for extraction and sorting, when gawk is a complete programming language? Yes, it is possible to do it with gawk in one go (and learn something in the process):
# lsof -Pn -i | gawk '/LISTEN/ { split($9, a, ":"); b[a[2]] = 1; } END { n = asorti(b, c, "@ind_num_asc"); for (i = 1; i <= n; i++) { print c[i]; } }'
22
111
6066
7011
7015
7077
8080
10050
35735
37480
39118
44262
44444
52539
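The /LISTEN/ pattern effectively greps the lsof output for lines containing LISTEN and runs the code in the curly braces to its right on each of them. That code splits the 9th column (the address, for example *:22) into the array a, using : as the delimiter, and then uses the port in a[2] as an index of the array b. Since an index can exist only once, duplicate ports collapse on their own, which is the job uniq did in the pipeline. Arrays filled by split() are indexed from 1, and in awk all array indices are strings (make a note of that).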
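To see what split() actually does with that column, here is a small sketch that runs in any POSIX awk on a few made-up lsof-style lines (the function name port_of_listen_lines is hypothetical). It prints n, the number of elements split() returns, and the last element a[n]. Taking a[n] instead of a[2] is my own tweak, not part of the one-liner: for *:22 the two are the same, but lsof typically prints IPv6 addresses in brackets, and splitting [::1]:631 on : gives "[", "", "1]" and "631", so a[2] would be empty.

```shell
#!/bin/sh
# Hypothetical helper: for each LISTEN line, print how many pieces split()
# produced from the 9th column and the last piece (the port).
port_of_listen_lines() {
    awk '/LISTEN/ {
        n = split($9, a, ":")   # fills a[1]..a[n] and returns n
        print n, a[n]           # a[n] is the port even for "[::1]:631"
    }'
}

# Made-up sample lines in the shape of `lsof -Pn -i` output.
sample='sshd     812 root 3u IPv4 20811 0t0 TCP *:22 (LISTEN)
cupsd    901 root 7u IPv6 21377 0t0 TCP [::1]:631 (LISTEN)
chrome  2048 user 45u IPv4 98765 0t0 TCP 10.0.0.5:51234->1.2.3.4:443 (ESTABLISHED)'

printf '%s\n' "$sample" | port_of_listen_lines
```

This prints "2 22" and "4 631"; the ESTABLISHED line is skipped because it does not match /LISTEN/.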
END is a special pattern whose action, the code in the curly braces to its right, runs after all the input has been read. So this is where the printing is done. The asorti() function copies the indices of b into a new array c, sorted, so that c[1] through c[n] hold them in order, and it returns n. We pass @ind_num_asc to get the order 1, 5, 10, 15 rather than 1, 10, 15, 5, which is what we would get if the indices were compared as strings. Finally, we print the elements of the new array in order.
This would not be as easy with plain awk / nawk, because, as the gawk manual says:
In most awk implementations, sorting an array requires writing a sort() function. This can be educational for exploring different sorting algorithms, but usually that’s not the point of the program. gawk provides the built-in asort() and asorti() functions.
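For comparison, here is what writing that sort() by hand might look like. It is a sketch that assumes only POSIX awk features, so it should also run under nawk or mawk: each port is stored once (the seen array plays the role of uniq) and END sorts the collected ports numerically with an insertion sort. Insertion sort is just my choice for brevity, fine for a few dozen ports; the function name listening_ports_portable is hypothetical, and taking the last piece of the address instead of the second is a tweak of mine.

```shell
#!/bin/sh
# Hypothetical helper: unique listening ports, sorted, without gawk extensions.
listening_ports_portable() {
    awk '
    /LISTEN/ {
        n = split($9, a, ":")
        port = a[n]                    # last piece of the address: the port
        if (!(port in seen)) {         # first time we meet this port
            seen[port] = 1
            ports[++count] = port + 0  # store it as a number
        }
    }
    END {
        # Insertion sort on ports[1..count], compared numerically.
        for (i = 2; i <= count; i++) {
            v = ports[i]
            j = i - 1
            while (j > 0 && ports[j] > v) {
                ports[j + 1] = ports[j]
                j--
            }
            ports[j + 1] = v
        }
        for (i = 1; i <= count; i++)
            print ports[i]
    }'
}

# Made-up lines in the shape of `lsof -Pn -i` output.
sample='sshd     812 root 3u IPv4 20811 0t0 TCP *:22 (LISTEN)
sshd     812 root 4u IPv6 20813 0t0 TCP *:22 (LISTEN)
java    1500 app  50u IPv6 33001 0t0 TCP *:8080 (LISTEN)
rpcbind  700 root 8u IPv4 19001 0t0 TCP *:111 (LISTEN)'

printf '%s\n' "$sample" | listening_ports_portable
```

It works, but it is noticeably longer than a single asorti() call, which is the manual's point.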
Somehow this reminds me of Knuth vs. McIlroy, but of course I am neither.