Wasting time with gawk while parsing lsof output

So I wanted to parse lsof, to see on what ports was a machine accepting connections. Normally one would write something like:

# lsof -Pn -i | grep LISTEN | awk '{print $9}' | cut -d: -f2 | sort -n | uniq
22
111
6066
7011
7015
7077
8080
10050
35735
37480
39118
44262
44444
52539

You get a sorted list of the open ports and are done with it. But why invoke four different programs to do extraction and sorting, when gawk is a complete programming language? Yes it is possible to do it with gawk in one go (and learn something in the process):

# lsof -Pn -i | awk '/LISTEN/ { split($9, a, ":"); b[a[2]] = 1; } END { n = asorti(b, c, "@ind_num_asc"); for (i = 1; i <= n; i++) { print c[i]; } }'
22
111
6066
7011
7015
7077
8080
10050
35735
37480
39118
44262
44444
52539

The /LISTEN/ effectively greps the lsof output for lines containing LISTEN and executes on them the code in curly braces to its right. Which splits the 9th column into an array using : as a delimiter. In awk arrays are indexed from 1 and the indices are strings (make a note of that).

END is a special match that executes the code in curly braces to its right after we’ve finished reading the input data. So, here is where the printing is done. Using the asorti() function we obtain a new array, indexed based on the values of the indices. We use @ind_num_asc to ensure that the order is 1, 5, 10, 15 and not 1, 10, 15, 5 as it would, should the indices be treated as strings. Finally, we can print the elements from the new array.

This would not be easily possible with awk / nawk, because as the gawk manual says:

In most awk implementations, sorting an array requires writing a sort() function. This can be educational for exploring different sorting algorithms, but usually that’s not the point of the program. gawk provides the built-in asort() and asorti() functions.

Somehow this reminds me of Knuth vs McIlory but of course I am neither.

cursive

Figlet, a program to make large letters out of ordinary text was featured in Lobsters and it reminded me of cursive, a similar program that we used back when ASCII art was the only decoration one could add to their email signature. And since it seems that cursive is not carried by any Linux distribution, I set out to find the sources. Which I did thanks to FreeBSD.

I downloaded the source and tried to compile it, and it required xstr. Yet another old program! I think the last time one can see it, is OpenBSD-5.5 manual. I mean even the 5.6 changelog writes:

“mkstr was intended for the limited architecture of the PDP 11 family.” Time moves on, memory gets cheaper. There’s no need for mkstr or xstr.

Hacks like xstr and mkstr might be of interest to embedded, or otherwise deprived, systems people as cool hacks and peculiarities, but really no actual need to maintain them.

So I set out to make the cursive source compilable with xstr again (it compiles without it just fine, just type make lcursive instead). And the minimally changed source code, to eradicate compiler warnings, of a program that can happily sign your email since 1985 is now on Github:

   __
  /  `
 /--   ____  o ____  ,
(___, / / (_/_(_) (_/___
           /       /
         -'       '

Shoutouts to my good friend Panagiotis C. who was the person who showed me cursive and figlet some 25 years ago or so.

In sed matching \d might not be what you would expect

A friend asked me the other day whether a certain “search and replace” operation over a credit card number could be done with sed: Given a number like 5105 1051 0510 5100, replace the first three components with something and leave the last one intact.

So my first take on this was:

# echo 5105 1051 0510 5100 | sed -e 's/^\([0-9]\{4\} \)\{3\}/lala /'
lala 5100

which works, but is not very legible. So here is taking advantage of the -r flag, if your modern sed supports it:

# echo 5105 1051 0510 5100 | sed -re 's/^([[:digit:]]{4} ){3}/lala /' 
lala 5100

So my friend asked, why not use \d instead of [[:digit:]] (or even [0-9])?

# echo 5105 1051 0510 5100 | sed -re 's/^(\d{4} ){3}/lala /' 
5105 1051 0510 5100

Why does this not work? Because as it is pointed in the manual:

In addition, this version of sed supports several escape characters (some of which are multi-character) to insert non-printable characters in scripts (\a, \c, \d, \o, \r, \t, \v, \x). These can cause similar problems with scripts written for other seds.

There. I guess that is why I still do not make much use of the -r flag and prefer to escape parentheses when doing matches in sed.

a newbie does list comprehensions

Formatting this post in WordPress.com was a great pain. It does not render correctly on some browser / device combinations, despite my rewrite efforts. So a Markdown copy of this post can be found as a gist here.


The year is 1998 and @mtheofy then at Glasgow tells me about a relatively new (then) language called Haskell. I’m intrigued but do not do much. A few years later I buy The Haskell School of Expression since The Craft of Functional Programming did not seem enough to motivate me. Time passes and around 2007 I try yet another start. Nothing. I promised my self yet another restart for a 2017 new year’s resolution. Still nothing. So when the current employer offered Haskell classes I could not say no. Armed with the weekly classes and a Safari Learning Path I am trying to correct this. And I am having some fun with list comprehensions. Because as a friend says, if it makes you feel good, go.

So how do you write an infinite list? Let’s say you want list x to include all numbers from 0 to infinity. stack ghci is my friend. Others might try repl.it:

x = [ n | n <- [0..]]

Now you can have the first 20 items of x:

Prelude> x = [ n | n <- [0..]]
Prelude> take 20 x
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
Prelude>

So next I wanted to make an infinite list of the same character. Enter the underscore variable:

Prelude> x = [ 'a' | _ <- [0..]]
Prelude> take 20 x
"aaaaaaaaaaaaaaaaaaaa"
Prelude>

OK, so now let’s try to cycle infinitely characters from a string. I end up with:

Prelude> x = [ c | i  take 20 x
"abcdabcdabcdabcdabcd"
Prelude>

I am kind of unsure why the let statements are needed since I am ~10 days into typing stuff and posted my creation to twitter. What my expression says is that x is comprised of characters from string “abcd”, where given a sequence of numbers, each time a character is chosen based on the sequence number modulo 4. Strings are lists of characters in Haskell and list indexing starts from zero.  Helpful comments come my way. Like the obvious cycle (there is a cycle function? Yes ):

Prelude> take 20 (cycle "abcd")
"abcdabcdabcdabcdabcd"
Prelude> take 20 $ cycle "abcd"
"abcdabcdabcdabcdabcd"
Prelude>

Is not the dollar operator nice to get rid of parentheses? Here is another suggestion about cycling a string:

Prelude> x = [ "abcd" !! (i `mod` 4) | i  take 20 x
"abcdabcdabcdabcdabcd"
Prelude>

This one is more concise and does the same thing, always picking a character from "abcd". If the infix notation for mod confuses you, you can:

Prelude> x = [ "abcd" !! (mod i 4) | i  take 20 x
"abcdabcdabcdabcdabcd"
Prelude>

But the Internet does not stop there. It comes back with more helpful suggestions:

Welcome! A little feedback then if I may: the !! operator should be used VERY cautiously it is not typesafe and lists are not random access anyway. Opt for a function returning Maybe x and for a random access datastructure (strings are by default lists).

Which made me think: How about an infinite string randomly chosen from “abcd”?

$ stack install random
$ stack ghci
:
Prelude> import System.Random
Prelude System.Random> g <- newStdGen 
Prelude System.Random> x = [ "abcd" !! i | i <- randomRs (0,3) g ]
Prelude System.Random> take 10 x
"bcbbddcdab"
Prelude System.Random>

If you want a sequence with a different order, you need to reinitialise both g and x. I will figure out a better way some other time when …I have time.

Adventures with Maybe maybe in another post.

Formatting this post in WordPress.com was a great pain.

resolutions

Last year I promised myself that I would revisit Haskell. Well I did not, so I did not escape the new year’s resolutions cliche. It was an interesting year though, considering that I left my country, worked for Intel, resigned and returned back to Greece and to my previous work.

So for this year I will promise myself something simpler, as a continuation of things I still do in 2017: simply improve my Go-fu. And yes, I also tried to learn Go and miserably failed. Let’s see about that too.

Parsing Techniques – A Practical Guide

Memory gets triggered in the most unexpected ways. I maintain a fairly large library of printed and electronic books (most of them DRMed -the light cases socially, kindle and Adobe locked the rest unfortunately) on subjects that interest me. It is fairly evident that I will not read them all, but I always have a book (and sometimes a paper) to recommend to a friend that has a problem. It seems that I am not the only one that thinks that personal libraries are supposed to be full of unread books.

Anyway, I was listening to Podcast.__init__ Episode 95 and one of the guests mentioned Parsing Techniques – A Practical Guide by Grune, I think it was when they touched Earley parsers and how most books about parsing do not really touch on how the actual parser is built. Wait a minute I’ve got that PDF! And you can go to the author’s site and download it. And you know what? There is a second edition out. For > 100 euros for a DRMed PDF I may not buy it since parsing is definitely not my thing, but somebody else out there might need the second edition. Judging from my skimming of the first edition, this is close to the encyclopaedia of parsing. I will go through some pages tonight.

Just for a refresher.