Cool thing I learned from Twitter today:
For files larger than diff(1) can handle, there is diffh.c from Bell Labs, which has been around since the PDP-11 days. But even diffh is not always good enough. So why not use idiffh, which works on any system with a C compiler and knows no limits on file sizes or strings?
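To give a feel for why a line-at-a-time comparison is not bothered by file size, here is a minimal sketch in Python. It is not the algorithm used by diffh or idiffh; the function name `first_difference` is made up for illustration, and it only reports where two inputs first diverge instead of producing a real diff.

```python
from itertools import zip_longest


def first_difference(lines_a, lines_b):
    """Return the 1-based number of the first differing line, or None.

    Both arguments are iterables of lines (open file objects work), so
    only one line from each input is held in memory at a time.
    """
    pairs = zip_longest(lines_a, lines_b)  # pads the shorter input with None
    for number, (a, b) in enumerate(pairs, start=1):
        if a != b:
            return number
    return None
```

With two open files, `first_difference(file_a, file_b)` walks both in lockstep, so memory use does not grow with the size of the files. A real diff also has to resynchronise after a change, which is the hard part this sketch leaves out.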
My review of “Algorithms on strings” (which I’ve blogged about before) for the ACM SIGACT News is out. There’s a typographical error, though: I did not review “Algorithms on strings” by Dan Gusfield, but “Algorithms on strings” by Crochemore, Hancart and Lecroq.
PS: You can download the review PDF from Bill Gasarch’s site.
Update: The review entry has been corrected on the ACM site. As Bill Gasarch wrote to me: “There is no such thing as a final version of anything anymore!”
- Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, …)
- Regular Expression Matching: the Virtual Machine Approach
- Regular Expression Matching in the Wild
I knew about Russ Cox and his interest in regular expressions because of this link to a PDF copy of “Programming Techniques: Regular expression search algorithm” that I had found on his site. Somehow I had missed the articles. In Ozan’s words: “russ cox, like other top-notch cs people, takes a topic and nails it shut. these three papers are more valuable to me than any RE book”.
Yes, the articles are that good. However, the good news does not stop here. Russ Cox implemented RE2, a fast, safe, thread-friendly alternative to backtracking regular expression engines (like those used in PCRE, Perl, and Python), written in C++. It even comes with a POSIX (egrep) mode.
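To make the difference from backtracking concrete, here is a toy matcher in Python that tracks the set of all pattern positions the input could be at, in the spirit of the automaton simulation the first article describes. It is only a sketch and has nothing to do with RE2's actual code: the names `parse`, `closure` and `full_match` are my own, and it supports only literal characters, `.`, and the `*`, `?` and `+` operators applied to a single character.

```python
def parse(pattern):
    """Turn e.g. 'a?b*c' into [('a', '?'), ('b', '*'), ('c', '')]."""
    items = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        quant = ''
        if i + 1 < len(pattern) and pattern[i + 1] in '*?+':
            quant = pattern[i + 1]
            i += 1
        if quant == '+':
            # x+ is the same as x followed by x*
            items.append((ch, ''))
            items.append((ch, '*'))
        else:
            items.append((ch, quant))
        i += 1
    return items


def closure(items, states):
    """Add every position reachable by skipping optional items."""
    result = set()
    stack = list(states)
    while stack:
        s = stack.pop()
        if s in result:
            continue
        result.add(s)
        if s < len(items) and items[s][1] in ('*', '?'):
            stack.append(s + 1)
    return result


def full_match(pattern, text):
    """Return True if the whole of text matches pattern."""
    items = parse(pattern)
    states = closure(items, {0})
    for c in text:
        nxt = set()
        for s in states:
            if s == len(items):
                continue  # already past the last item, cannot consume more
            ch, quant = items[s]
            if ch == '.' or ch == c:
                # a starred item may match again; anything else moves on
                nxt.add(s if quant == '*' else s + 1)
        states = closure(items, nxt)
        if not states:
            return False
    return len(items) in states
```

For example, `full_match('a?' * 25 + 'a' * 25, 'a' * 25)` is answered after a single pass over the input, because each character is processed once against a set of at most 51 positions. This is the kind of pattern the first article uses to show how badly a backtracking matcher can behave.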
The postmaster in me quickly thought of the possibility of implementing a milter that makes use of RE2, just like milter-regex uses traditional regex(3), but my time is so limited by other, more pressing projects that I can only hope someone else undertakes such a task.
Time passed, I became a system administrator, and most of my exposure to string matching came through scripts and sysadmin automation. Automata are nice, but Perl and shell put food on the table.
These memories surfaced because I got to read “Algorithms on Strings” in January thanks to Bill Gasarch. Complete, self-contained and written in plain, easily understood English, the book covers the subject while simultaneously meeting the needs of those who just want to read the theory, those who want to see the proofs and those who just want to write code.
The pseudocode in the book can be understood by anyone who has ever written a single program in C or Java. Each piece either introduces new functions or makes use of ones defined previously. This may make things a little difficult at first for people who need to write something described in, for example, chapter six and find themselves reading from chapter one up to six. In the process the book manages to teach even the programmer who does not care about theory not only how to do certain things, but why they are done the way they are. As a plus, references to relevant Unix shell tools (e.g. diff) are given where appropriate.
A really impressive book, definitely worth your time! You can use it both to learn the subject and as a reference.