Tuesday, November 6, 2007
Javascript prototyping for Hadoop map/reduce jobs
Java 6 brought us JVM-level scripting support, as specified in JSR 223. (Sun documentation about the subject)
Wordcount Demo
I chose the wordcount example (as seen in Hadoop) to demonstrate a complete JavaScript map/reduce job: a mapper, a reducer and a "driver".
importPackage(java.util);
importPackage(org.apache.hadoop.io);

// Mapper: emit (word, 1) for every token in the input line.
function map(key, value, collector) {
    var tokenizer = new StringTokenizer(value);
    while (tokenizer.hasMoreTokens()) {
        collector.collect(tokenizer.nextToken(), 1);
    }
}

// Reducer: sum all the counts collected for one word.
function reduce(key, values, collector) {
    var counter = 0;
    while (values.hasNext()) {
        counter += values.next().get();
    }
    collector.collect(key, counter);
}

// Driver: configure the job and submit it.
function driver(args, jobConf, jobClient) {
    jobConf.outputKeyClass = Text;
    jobConf.outputValueClass = IntWritable;
    jobConf.setInput(args[0]);   // input path
    jobConf.setOutput(args[1]);  // output path
    jobConf.map = 'map';
    jobConf.combiner = 'reduce'; // the reducer doubles as the combiner
    jobConf.reduce = 'reduce';
    jobClient.runJob(jobConf);
}
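To make the flow between the three functions a bit more tangible, here is a rough sketch that runs the same wordcount logic entirely in memory, in plain JavaScript, with no Hadoop and no Java classes involved. Everything around map and reduce is made up for illustration: runLocal, iteratorOver and the two collector objects are hypothetical stand-ins, not part of Hadoop or scriptmr.jar. The tokenizing is done with a regular expression instead of StringTokenizer, and the values are plain numbers rather than IntWritable, so there is no .get() call.

```javascript
// Mapper: emit (word, 1) for every whitespace-separated token.
// A regex split stands in for java.util.StringTokenizer here.
function map(key, value, collector) {
  var tokens = String(value).split(/\s+/);
  for (var i = 0; i < tokens.length; i++) {
    if (tokens[i].length > 0) {
      collector.collect(tokens[i], 1);
    }
  }
}

// Reducer: sum the counts for one word. Values are plain numbers in this
// sketch, so no .get() is needed.
function reduce(key, values, collector) {
  var counter = 0;
  while (values.hasNext()) {
    counter += values.next();
  }
  collector.collect(key, counter);
}

// Hypothetical helper: gives an array the hasNext()/next() shape that
// reduce() expects.
function iteratorOver(items) {
  var index = 0;
  return {
    hasNext: function () { return index < items.length; },
    next: function () { return items[index++]; }
  };
}

// Hypothetical in-memory driver: map every line, group the emitted values
// by key, then reduce each group.
function runLocal(lines) {
  var grouped = Object.create(null);
  var mapCollector = {
    collect: function (key, value) {
      if (!(key in grouped)) {
        grouped[key] = [];
      }
      grouped[key].push(value);
    }
  };
  lines.forEach(function (line, offset) {
    map(offset, line, mapCollector);
  });

  var result = {};
  var reduceCollector = {
    collect: function (key, value) { result[key] = value; }
  };
  Object.keys(grouped).sort().forEach(function (key) {
    reduce(key, iteratorOver(grouped[key]), reduceCollector);
  });
  return result;
}

console.log(runLocal(["the quick brown fox", "the lazy dog the end"]));
```

With the two sample lines above, the word "the" ends up with a count of 3 and every other word with a count of 1. The real job does the grouping and sorting across machines, of course; this only shows how the mapper output feeds the reducer.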
System requirements
- Java 6 from Sun
- Hadoop from Apache
- scriptmr.jar from foofactory [source]
Running JavaScript map/reduce
- Install Java as documented by Sun
- Extract the Hadoop release tarball into your favourite location
- Run your JavaScript map/reduce job with the following command:
bin/hadoop jar </path/to/scriptmr.jar> <script-file-name> [<script-arg-1>...]
Performance?
In short: there isn't any. The wordcount demo as presented here takes roughly 6-7 times longer than the Java wordcount example from Hadoop. It's quite obvious that this technique is not really practical beyond prototyping. For prototyping... hmmm, I am not totally sure it is usable for that either. Anyway, have fun with it, I know I had some while making it ;)
Labels: hadoop, java, javascript, mapreduce
