In sed matching \d might not be what you would expect

A friend asked me the other day whether a certain “search and replace” operation over a credit card number could be done with sed: Given a number like 5105 1051 0510 5100, replace the first three components with something and leave the last one intact.

So my first take on this was:

# echo 5105 1051 0510 5100 | sed -e 's/^\([0-9]\{4\} \)\{3\}/lala /'
lala 5100

which works, but is not very legible. Here is a version that takes advantage of the -r flag (extended regular expressions), if your modern sed supports it:

# echo 5105 1051 0510 5100 | sed -re 's/^([[:digit:]]{4} ){3}/lala /' 
lala 5100

So my friend asked, why not use \d instead of [[:digit:]] (or even [0-9])?

# echo 5105 1051 0510 5100 | sed -re 's/^(\d{4} ){3}/lala /' 
5105 1051 0510 5100

Why does this not work? Because GNU sed does not treat \d as a digit class: \dNNN stands for the character with decimal code NNN. As pointed out in the manual:

In addition, this version of sed supports several escape characters (some of which are multi-character) to insert non-printable characters in scripts (\a, \c, \d, \o, \r, \t, \v, \x). These can cause similar problems with scripts written for other seds.

There. I guess that is why I still do not make much use of the -r flag and prefer to escape parentheses when doing matches in sed.


a newbie does list comprehensions

Formatting this post in WordPress.com was a great pain. It does not render correctly on some browser / device combinations, despite my rewrite efforts. So a Markdown copy of this post can be found as a gist here.


The year is 1998 and @mtheofy, then at Glasgow, tells me about a relatively new (then) language called Haskell. I’m intrigued but do not do much. A few years later I buy The Haskell School of Expression, since The Craft of Functional Programming did not seem enough to motivate me. Time passes and around 2007 I try yet another start. Nothing. I promised myself yet another restart as a 2017 new year’s resolution. Still nothing. So when my current employer offered Haskell classes I could not say no. Armed with the weekly classes and a Safari Learning Path I am trying to correct this. And I am having some fun with list comprehensions. Because as a friend says, if it makes you feel good, go.

So how do you write an infinite list? Let’s say you want list x to include all numbers from 0 to infinity. stack ghci is my friend. Others might try repl.it:

x = [ n | n <- [0..]]

Now you can have the first 20 items of x:

Prelude> x = [ n | n <- [0..]]
Prelude> take 20 x
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]
Prelude>

So next I wanted to make an infinite list of the same character. Enter the underscore variable:

Prelude> x = [ 'a' | _ <- [0..]]
Prelude> take 20 x
"aaaaaaaaaaaaaaaaaaaa"
Prelude>

OK, so now let’s try to cycle infinitely characters from a string. I end up with:

Prelude> x = [ c | i <- [0..], let s = "abcd", let c = s !! (i `mod` 4) ]
Prelude> take 20 x
"abcdabcdabcdabcdabcd"
Prelude>

I am kind of unsure why the let statements are needed, since I am only ~10 days into typing stuff, so I posted my creation to Twitter. What my expression says is that x consists of characters from the string “abcd”: given a sequence of numbers, each character is chosen by the sequence number modulo 4. Strings are lists of characters in Haskell and list indexing starts from zero. Helpful comments come my way. Like the obvious cycle (there is a cycle function? Yes):

Prelude> take 20 (cycle "abcd")
"abcdabcdabcdabcdabcd"
Prelude> take 20 $ cycle "abcd"
"abcdabcdabcdabcdabcd"
Prelude>

Isn’t the dollar operator a nice way to get rid of parentheses? Here is another suggestion about cycling a string:

Prelude> x = [ "abcd" !! (i `mod` 4) | i <- [0..]]
Prelude> take 20 x
"abcdabcdabcdabcdabcd"
Prelude>

This one is more concise and does the same thing, always picking a character from "abcd". If the infix notation for mod confuses you, you can:

Prelude> x = [ "abcd" !! (mod i 4) | i <- [0..]]
Prelude> take 20 x
"abcdabcdabcdabcdabcd"
Prelude>

But the Internet does not stop there. It comes back with more helpful suggestions:

Welcome! A little feedback then if I may: the !! operator should be used VERY cautiously it is not typesafe and lists are not random access anyway. Opt for a function returning Maybe x and for a random access datastructure (strings are by default lists).

Which made me think: How about an infinite string randomly chosen from “abcd”?

$ stack install random
$ stack ghci
:
Prelude> import System.Random
Prelude System.Random> g <- newStdGen 
Prelude System.Random> x = [ "abcd" !! i | i <- randomRs (0,3) g ]
Prelude System.Random> take 10 x
"bcbbddcdab"
Prelude System.Random>

If you want a sequence with a different order, you need to reinitialise both g and x. I will figure out a better way some other time when …I have time.
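One possible way around the reinitialising, as a minimal sketch: make the generator an argument, so a different sequence only needs a different generator. The name randomAbcd is made up for this illustration, and the seed 42 is arbitrary; the rest comes from the same random package used above.

```haskell
import System.Random (RandomGen, mkStdGen, newStdGen, randomRs)

-- Hypothetical helper (not from the original session): an infinite string
-- drawn from "abcd", parameterised by the generator it consumes.
randomAbcd :: RandomGen g => g -> String
randomAbcd g = [ "abcd" !! i | i <- randomRs (0, 3) g ]

main :: IO ()
main = do
  -- Each call to newStdGen splits the global generator, so g1 and g2
  -- produce independent sequences without redefining anything.
  g1 <- newStdGen
  g2 <- newStdGen
  putStrLn (take 10 (randomAbcd g1))
  putStrLn (take 10 (randomAbcd g2))
  -- A fixed seed gives the same sequence on every run.
  putStrLn (take 10 (randomAbcd (mkStdGen 42)))
```

The indexing with !! is safe here only because randomRs (0, 3) never leaves the bounds of the four-character string.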

Adventures with Maybe maybe in another post.
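In the meantime, here is a minimal sketch of what the feedback quoted earlier seems to suggest: a lookup that returns Maybe instead of crashing like !! does on a bad index. The names safeIndex and cycled are made up for this illustration, and the list is still not random access, so this only addresses the safety half of the comment.

```haskell
import Data.Maybe (mapMaybe)

-- Hypothetical helper: total indexing. Returns Nothing for a negative or
-- out-of-range index instead of throwing an exception.
safeIndex :: [a] -> Int -> Maybe a
safeIndex xs i
  | i < 0     = Nothing
  | otherwise = case drop i xs of
      (y:_) -> Just y
      []    -> Nothing

-- The cycling string from earlier, rebuilt on top of safeIndex.
-- mapMaybe keeps the Just values and is lazy, so the list stays infinite.
cycled :: String
cycled = mapMaybe (\i -> safeIndex "abcd" (i `mod` 4)) [0 ..]

main :: IO ()
main = do
  print (safeIndex "abcd" 2)   -- Just 'c'
  print (safeIndex "abcd" 7)   -- Nothing
  putStrLn (take 20 cycled)    -- abcdabcdabcdabcdabcd
```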


resolutions

Last year I promised myself that I would revisit Haskell. Well, I did not, so I did not escape the new year’s resolutions cliché. It was an interesting year though, considering that I left my country, worked for Intel, resigned, and returned to Greece and to my previous work.

So for this year I will promise myself something simpler, as a continuation of things I still do in 2017: simply improve my Go-fu. And yes, I also tried to learn Go and miserably failed. Let’s see about that too.

Parsing Techniques – A Practical Guide

Memory gets triggered in the most unexpected ways. I maintain a fairly large library of printed and electronic books on subjects that interest me (most of the electronic ones DRMed: the light cases with social DRM, the rest unfortunately locked by Kindle and Adobe). It is fairly evident that I will not read them all, but I always have a book (and sometimes a paper) to recommend to a friend who has a problem. It seems that I am not the only one who thinks that personal libraries are supposed to be full of unread books.

Anyway, I was listening to Podcast.__init__ Episode 95 and one of the guests mentioned Parsing Techniques – A Practical Guide by Grune. I think it was when they touched on Earley parsers and how most books about parsing do not really cover how the actual parser is built. Wait a minute, I’ve got that PDF! And you can go to the author’s site and download it. And you know what? There is a second edition out. At more than 100 euros for a DRMed PDF I may not buy it, since parsing is definitely not my thing, but somebody else out there might need the second edition. Judging from my skimming of the first edition, this is close to being the encyclopaedia of parsing. I will go through some pages tonight.

Just for a refresher.

APL

Today was my fourth (I believe) encounter with APL:

  • First, too many years ago when skimming through Tim Budd’s “The Kamin Interpreters in C++” (and the Kamin book afterwards). For the hardcore fans, Tim Budd has a book on implementing an APL compiler.
  • Next was a Dr Dobbs issue about the J programming language.
  • Some years ago, a comment on this blog about Dyalog.

Today it was Functional Geekery’s Episode 65 where I found out about the most interesting (to me) implementation of Conway’s Game of Life.