Saturday 25 October 2008

Erlang and Map-Reduce

I was a fan of Google's Map-Reduce for a quite long time, as I was first doing my PhD research in distributed systems and then was working in some high-availability projects, so such interest emerged somehow naturally.

Their (i.e. Google's) achievements quite impressed me: first the whole infrastructure (lot of C++ coding - they simple wrote their own filesystem and database!!!), but equally the level of abstraction used when working with distributed data. I wrote about the superiority of that model over the SOA model before, although the SOA model is probably the best you can achieve in a heterogenous environment (and I was probably wrong there...).

So every time I see a map-reduce implementation I can't help reading about it: Hadoop is the most known open source implementation, but there's the QtConcurrent::mappedReduced algorithm in the new Qt 4.5* as well. You see, the idea seems to be catching on.

Now to the news: there is a Map-Reduce implementation in Erlang (!!!)** which runs Python scripts (!!!) and it's called Disco***! And if you don't have a massive parallel cluster at home, you can run it in the Amazon's Elastic Computing Cloud! I don't like Nokia very much, but I must admit that this one is rather cool: you simply write scripts to manipulate your data, much in the vein of UNIX shell programming, only infinitely scalable! And we know that scripting languages are much better for data manipulations than Java or C++. According to its homepage, Disco is quite a success too:

This far Disco has been succesfully used, for instance, in parsing and reformatting data, data clustering, probabilistic modelling, data mining, full-text indexing, and log analysis with hundreds of gigabytes of real-world data. ***
Wow! I like te idea of Erlang and Python working unisono!

---
* Qt 4.5 docs: http://doc.trolltech.com/main-snapshot/threads.html#qtconcurrent
** a small itroduction to Erlang: http://ib-krajewski.blogspot.com/2007/08/erlangs-change-of-fortunes.html
*** Disco's homepage: http://discoproject.org/

3 comments:

Anonymous said...

Ever heard of hyperlinks? Footnotes are outdated. Thanks

Marek Krj said...

I certainly wouldn't like to disagree with anone, but honestly: I couldn't care less.

Anonymous said...

Stop trolling, dear "Anonymous" :) It's not cool since a long-long time. Ever heard of universities, technical papers, etc.? Footnotes are used there. It's much more organized than in-text links.