Ravi Mohan's Blog

Saturday, July 15, 2006

Algorithmics In The Dark Compiler Alley

My upcoming "write a compiler" project is about writing a industrial strength compiler. The good news is that it is wonderful to be in a position where you get paid to do something like this (vs say, writing yet another J2EE/dotNet/framework du jour [something that starts with "R"? :-P] "enterprise app"). The bad news is that I have to deliver a 2 person year project in about 6 person months ( 1 person - Yours Truly - and about 6 months starting in August).

One way to "shrink" such a project is to use powerful languages. A compiler, while quite complex, has very few library dependencies. What this means is that you can write a compiler in any language you want and not sacrifice any advantages while doing so. The converse of this is that one can abandon brain dead languages like say, Java, and use something truly powerful, like say, Haskell.

I've spent the last few weeks writing 'compiler like' bits and pieces in various languages, and have short listed 3 - Lisp, Scheme, and Haskell. Besides their power, these languages have another feature in common - none of them have an open source, batteries included IDE. (And "no IDE" implies "use emacs". And the sheer customizability of emacs is exhilarating but more on that in another post).

An interesting (and unexpected) development is that evaluating alternative designs calls for a degree of algorithmic sophistication. For example, in the code generation phase of the compiler the decision to use an RTL (gcc's intermediate language)clone vs (say) a BURS (bottom up rewriting system) based graph traversal needs the ability to (mathematically) analyse the competing code gen algorithms and the trade offs of each. The key to getting a grip on BURS, for e.g., is to understand Dynamic Programming. More interestingly, there is often a need to augment an existing algorithm or data structure to accomodate application specific constraints. The paradigm embodied by the language under consideration determines the "shape" of the runtime. The demands of the runtime and the level of optimization required determines the appropriateness of a particular code gen algorithm, which in turn influences the transformations required on the abstract syntax tree, and so on.

There aren't too many books which teach you to design algorithms and data structures (vs memorizing a few standard ones). Of course if you are an "enterprise" programmer, you don't need any of this and can get by with simple lists and hashtables provided by your language's libraries. But if you (intend to) work in more challenging (and, if I may say so, interesting) domains, learn to analyze and design algorithms (and data structures). It might even help you get an interesting and satisfying job. Companies like Google and Amazon have very algorithmics heavy interviews. As Cormen et al say in Introduction To Algorithms (the best book ever written on Algorithms and Data Structures),

"Having a solid base of algorithmic knowledge and technique is one characteristic that separates the truly skilled programmers from the novices. With modern computing technology, you can acccomplish some tasks without knowing much about algorithms, but with a good background in algorithms, you can do much, much more)".

Another interesting, though less comprehensive and more lightweight textbook is The Algorithm Design Manual by Steven Skiena. It is more of a complement to Cormen and Rivest than a substitute and has a distinctive writing style. Every chapter has an accompanying "war story", drawn from the author's extensive consulting experience, where he reports how he applied the technique taught in the chapter to solve a real world problem or optimize (sometimes in a very dramatic fashion) an inefficient solution. Imagine - algorithmics consulting - I guess they won't be outsourcing those jobs anytime soon :-D.

The one "problem" with the Cormen and Rivest book is that it calls for a certain level of mathematical sophistication on the part of the student. This is unavoidable in a book that avoids a "cookbook of algorithms" approach and tries to teach how to design new algorithms and adapt existing ones to teh problem space. So, before you buy this book ( and it IS a book that should be on every programmer's bookshelf) be sure to learn basic discrete mathematics, probability, linear algebra and set theory. Working through Concrete Mathematics really helps. The effort will repay itself a thousand fold.

But be warned. Once you taste the power of algorithms you'll never be able to go back to your "build (or even worse, maintain) yet another Visual Blub dotNet banking app in Bangalore" day job. On the other hand, you may find that programming is challenging, meaningful, and above all, fun. Red Pill or Blue? :-)

11 comments:

Anonymous said...

Interesting Blog !

This compiler project your working on, Is it opensource ? or for your eyes ony...

S said...

Haven't gotten around to reading it, but David Hanson's A Retargetable C Compiler : Design and Implementation is supposed to good also. (The compiler itself is presented in the book as a literate program!)

http://www.amazon.com/gp/product/0805316701/sr=1-1/qid=1152991316/ref=sr_1_1/103-1087370-5784606?ie=UTF8&s=books

BTW, how do you shorten those Amazon.com URL's?

Ravi said...

@anonymous,

I hope to open source the compiler when I am done, but there are a few stakeholders who are nervous about the idea, so we'll have to see. I am fairly optimistic though.

Also, I am working on more than one compiler simultaneously, so *a* compiler will be open sourced. :-)

@S,
Thanks. I have this book on my desk as I type this :-).It is falling apart from heavy use - I've read it cover to cover many times over the last couple of years

There are significant differences between the version of lcc described in the book and the latest version. Also ,the lburg subsystem is not described in as much detail as I would have liked. Reading Dick Grune's Modern Compiler Design (the chapter on code generation) helped me grok the lburg code.

These are minor nitpicks though. Hanson's book is the best I've seen on practical compiler implementation.

Amazon links, I use my mouse cut off the extraneos material from the url :-)

S said...

I met Ronald Graham (world-class mathematician, Concrete Mathematics co-author) once. He gave a lecture at my Univ in 2001 or so.

My math professor (for my graph theory class) was his PhD student long, long time ago -- so I met with Ron Graham at my professor's office. Thankfully I had Concrete Mathematics then -- so I had him sign it for me!! He looked at me and said: "This is a hard book, young man; hope you can handle it." He then turned to my professor and said, "You know they are using Concrete in Israel as a training ground for the Math Olympaid".

Interesting things I noticed: he (Graham) seemed so eagar to learn about the ongoings in Modern Physics. He was visiting my Univ only for a few days and even during that time wanted to access the latest edition of some Physics journal.

And while I was in the room, he said (to my prof) -- "I talked to Don the other day, and he was a little bit overloaded with Vol 4." I was like a kid in a candy store, hearing him talk about Knuth like that!!!

Anyway, the next the he gave the best lecture/presentation that I have ever seen.

-S

Ravi said...

@S

I am so jealous!!! Aaargh I want to study in a University and meet cool people too !!!
:-D

Seriously, the major mistake of my life was to hang around in the "enterprise" software world too long.

To make up for all that lost time (and life :-( ) I plan to apply for a PhD this year.

Now to persuade a good university to let me in :-D

Which university do you study (teach?) in

Anonymous said...

Are these 3 Books available in any bookstore in bangalore?

S said...

Arizona State Univ.

So where (which univ) are you planning to do your PhD and in what area?

You should try to work with someone like http://www.cs.vu.nl/~ast/ !!

So, one day, feeling bored, I followed a like to one of his student's home page. One Fran Kaashoek. OK? Seems like a smart guy -- a MIT prof after all! He talks about "co-founding" SightPath. And I thought, oh cool, he was in the industry and that experience probably helped him get into MIT as a prof. OK, cool. Not bad.

So I googled "Sightpath Inc Cisco" -- apprantly Cisco bought SightPath in 2000 -- for around $800 million. Yeah, not bad.

-S

Ravi said...

Anonymous said...

Are these 3 Books available in any bookstore in bangalore?


Two of them ("Introduction..." and "concrete Mathematics" are available in Bangalore. I am not sure about Skiena's book. I bought it through Rajesh's Learning Journey some time ago. If you are located in India dn wish to buy foreign editions, I reccomend LJ highly.

Ravi said...

@S,
ASU? Cool! I visted the campus when I was in Arizona!

"So where (which univ) are you planning to do your PhD and in what area?"

I haven't decided the university yet but I hope to focus on either Artificial Intelligence or Compilers.

You are right, working with the best in CS (like Tanenbaum) would be awesome! As would selling a company for 800 million!

I've heard of Franz Kaashoek before (his paper on the Orca programming language is *brilliant*) but I didn't know he had co founded SightPath! Thanks for that nugget!

Anonymous said...

Stumbled across your blog today - very interesting! You might want to check out OCaml as well - it's quite similar to Haskell, but with more OO/imperative features (you never know when you need that loop fix!). I'm using F#(a Microsoft derivative of Caml) which is interesting b/c it makes the .NET framework accessible.

OCaml also has a version of Lex & Yacc.

Mahesh said...

Hi Ravi.

I wanted to request you to give me a permission to read your learningandresearch blog.

I am tech nerd since from child hood [1994]

AweSome post. I fell in love with Algo [cormen] and Concrete Mathematics.