Tuesday, November 6, 2007

 

JavaScript prototyping for Hadoop map/reduce jobs

Java 6 brought us JVM-level scripting support, as specified in JSR-223 (see Sun's documentation on the subject).
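
In short, the javax.script API lets a Java host evaluate a script at runtime and then call the functions that script defines. A minimal, self-contained sketch of just that mechanism (nothing Hadoop-specific yet) might look like this:

import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class ScriptingDemo {
    public static void main(String[] args) throws Exception {
        // The JavaScript engine (Rhino) ships with Java 6, so no extra jars are needed.
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("javascript");

        // Evaluating a script makes its function definitions available to the host...
        engine.eval("function add(a, b) { return a + b; }");

        // ...which can then call them through the Invocable interface.
        Object sum = ((Invocable) engine).invokeFunction("add", 1, 2);
        System.out.println(sum); // prints 3.0 (Rhino hands back a Double)
    }
}

The map/reduce wrapper below relies on exactly this mechanism to call the script's map, reduce and driver functions.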

Wordcount Demo



I have chosen the wordcount example (as seen in the Hadoop examples) to demonstrate a complete JavaScript map/reduce job, consisting of a mapper, a reducer and a "driver":


importPackage(java.util);
importPackage(org.apache.hadoop.io);


// Mapper: split each input line into tokens and emit a (word, 1) pair per token.
function map(key, value, collector) {
    var tokenizer = new StringTokenizer(value);
    while (tokenizer.hasMoreTokens()) {
        collector.collect(tokenizer.nextToken(), 1);
    }
}

// Reducer (also used as the combiner): sum up the counts collected for each word.
function reduce(key, values, collector) {
    var counter = 0;
    while (values.hasNext()) {
        counter += values.next().get();
    }
    collector.collect(key, counter);
}

// Driver: configure the job and submit it.
function driver(args, jobConf, jobClient) {
    jobConf.outputKeyClass = Text;
    jobConf.outputValueClass = IntWritable;
    jobConf.setInput(args[0]);
    jobConf.setOutput(args[1]);
    jobConf.map = 'map';
    jobConf.combiner = 'reduce';
    jobConf.reduce = 'reduce';
    jobClient.runJob(jobConf);
}
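
What actually runs this on Hadoop is a small Java wrapper (the scriptmr.jar invoked below) that loads the script via JSR-223 and forwards Hadoop's callbacks to the script functions. Its source isn't shown in this post, so the following is only a rough sketch of what the mapper side of such a bridge could look like against the old org.apache.hadoop.mapred API; the class name, the configuration keys and the Writable conversion are my own assumptions, and a real implementation would wire up the reducer and the driver in the same way.

import java.io.FileReader;
import java.io.IOException;

import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical bridge mapper (not the real scriptmr code): loads the user's
// script once per task, then hands every input record to the script's map function.
public class ScriptMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private Invocable script;
    private String mapFunction;

    public void configure(JobConf conf) {
        try {
            ScriptEngine engine = new ScriptEngineManager().getEngineByName("javascript");
            // "scriptmr.script" and "scriptmr.map" are made-up configuration keys.
            engine.eval(new FileReader(conf.get("scriptmr.script")));
            mapFunction = conf.get("scriptmr.map", "map");
            script = (Invocable) engine;
        } catch (Exception e) {
            throw new RuntimeException("could not load the script", e);
        }
    }

    public void map(LongWritable key, Text value,
                    final OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // The script emits plain strings and numbers, so give it a collector
        // that converts them back into Hadoop Writables.
        OutputCollector<Object, Object> bridge = new OutputCollector<Object, Object>() {
            public void collect(Object k, Object v) throws IOException {
                output.collect(new Text(k.toString()),
                               new IntWritable(((Number) v).intValue()));
            }
        };
        try {
            script.invokeFunction(mapFunction, key.toString(), value.toString(), bridge);
        } catch (Exception e) {
            throw new RuntimeException("the script's map function failed", e);
        }
    }
}

Loading the engine once in configure() keeps the per-record overhead down to a single invokeFunction call.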



System requirements


Java 6 (which bundles the JSR-223 scripting support used here) and a Hadoop release.

Running JavaScript map/reduce



  • Install Java 6 as documented by Sun

  • Extract the Hadoop release tarball into your favourite location

  • Run your JavaScript map/reduce job with the following command (a concrete example follows below): bin/hadoop jar </path/to/scriptmr.jar> <script-file-name> [<script-arg-1>...]
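
For example, with the wordcount script above saved as wordcount.js (the script name and the input/output paths here are only placeholders), the invocation would be:

bin/hadoop jar scriptmr.jar wordcount.js input output

The two trailing arguments arrive as args[0] and args[1] in the script's driver function.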



Performance?


In short: there isn't any. The wordcount demo as presented here takes roughly 6-7 times as long as the wordcount Java example that ships with Hadoop. It's quite obvious this technique is not really practical beyond prototyping. For prototyping... hmm, I am not totally sure it is usable for that either. Anyway, have fun with it; I know I had some while making it ;)
