Sunday, March 29, 2015

Compiling and running Hadoop's WordCount.java

After installing and running Hadoop for the first time, the moment came to compile and run one of Hadoop's example programs, starting with WordCount.java.

I first tried making the adjustments in Hadoop 2.6.0, but the differences were enough to keep me from following the book, so I downloaded and unpacked Hadoop 1.2.1. I also decided to keep just one Hadoop directory, /usr/local/src/hadoop, plain and simple.

If you run the compiler with the commands exactly as they appear in Chuck Lam's Hadoop in Action, compilation fails with several errors.

My first reaction was: "Well, I'll have to learn Java for real one day...". But after a while I decided to search the Internet for that specific error. To my surprise, the book's website had several suggestions on how to overcome the compilation errors in WordCount.java.

To make a long story short, the command that works is: javac -classpath /usr/local/src/hadoop/hadoop-core-1.2.1.jar:/usr/local/src/hadoop/lib/commons-cli-1.2.jar -d playground/classes playground/src/WordCount.java. Here javac is the Java compiler; -classpath takes a colon-separated list of the jar files (hadoop-core and any other Java library the compiler can't find by default) that WordCount.java depends on; and -d sets the directory where the compiled .class files are written.
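Since -classpath is just a list of paths joined by a separator (":" on Linux), a mistyped jar path is easy to miss. The Java program below is a minimal sketch, not part of Hadoop or the book: the class name ClasspathCheck is hypothetical. It splits a classpath string on File.pathSeparator and reports every entry that does not exist on disk.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ClasspathCheck {

    // Returns the classpath entries that do not exist on disk.
    // Entries are split on File.pathSeparator (":" on Linux, ";" on Windows).
    static List<String> missingEntries(String classpath) {
        List<String> missing = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            if (entry.isEmpty()) {
                continue; // skip empty segments, e.g. from a trailing separator
            }
            // Relative entries are checked against the current working directory.
            if (!new File(entry).exists()) {
                missing.add(entry);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Paths used in this post; adjust them to your own installation.
        String cp = "/usr/local/src/hadoop/hadoop-core-1.2.1.jar"
                + File.pathSeparator
                + "/usr/local/src/hadoop/lib/commons-cli-1.2.jar";
        List<String> missing = missingEntries(cp);
        if (missing.isEmpty()) {
            System.out.println("All classpath entries exist.");
        } else {
            System.out.println("Missing entries: " + missing);
        }
    }
}
```

Compile and run it with javac ClasspathCheck.java && java ClasspathCheck. An empty result only means every path is spelled correctly; it does not prove the jars contain the classes WordCount.java needs.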
 
If you issue just the command as written in the book, "javac -classpath hadoop-core-1.2.1.jar -d playground/classes playground/src/WordCount.java", it fails with several errors.
 
This can be partially fixed by giving the full path to hadoop-core-1.2.1.jar: "javac -classpath /usr/local/src/hadoop/hadoop-core-1.2.1.jar -d playground/classes playground/src/WordCount.java". That reduces the errors to a single one: "class file for org.apache.commons.cli.Options not found".
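That message means javac found the Hadoop classes but not a Commons CLI class they refer to. As a hedged sketch (the class name ClassVisibility is hypothetical), the following Java program checks whether a class can be loaded by name from the classpath it was started with, using Class.forName without running static initializers.

```java
public class ClassVisibility {

    // Returns true if the named class can be loaded from the current classpath.
    // initialize=false loads the class without running its static initializers.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClassVisibility.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] classes = {
            "org.apache.hadoop.conf.Configuration", // expected in hadoop-core-1.2.1.jar
            "org.apache.commons.cli.Options"        // expected in lib/commons-cli-1.2.jar
        };
        for (String name : classes) {
            System.out.println(name + (isOnClasspath(name) ? " found" : " NOT found"));
        }
    }
}
```

After compiling it with javac ClassVisibility.java, run it with the same jars plus the current directory, for example java -cp /usr/local/src/hadoop/hadoop-core-1.2.1.jar:/usr/local/src/hadoop/lib/commons-cli-1.2.jar:. ClassVisibility. With only hadoop-core on the classpath, the Options line should report NOT found.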
 
Searching the Hadoop directory tree for the commons-cli library shows that it lives in /usr/local/src/hadoop/lib and its full name is commons-cli-1.2.jar. Now issue the command with both Java libraries fully specified in the -classpath option (as shown at the beginning of this explanation), and it should compile smoothly.
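Instead of browsing the tree by hand, a short Java program can walk the Hadoop directory and list every jar whose file name starts with a given prefix. This is a sketch under stated assumptions: the class name JarFinder is hypothetical, and the default root is simply the directory used in this post.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class JarFinder {

    // Recursively lists .jar files under root whose file name starts with prefix.
    static List<Path> findJars(Path root, String prefix) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return name.startsWith(prefix) && name.endsWith(".jar");
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Optional first argument: the directory to search.
        Path root = Paths.get(args.length > 0 ? args[0] : "/usr/local/src/hadoop");
        for (Path jar : findJars(root, "commons-cli")) {
            System.out.println(jar);
        }
    }
}
```

Running java JarFinder /usr/local/src/hadoop prints the full path of each matching jar, which can be pasted straight into the -classpath option.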
 
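This command can be shortened by giving hadoop-core-1.2.1.jar as a relative path: "javac -classpath hadoop-core-1.2.1.jar:/usr/local/src/hadoop/lib/commons-cli-1.2.jar -d playground/classes playground/src/WordCount.java". Note that javac resolves relative -classpath entries against the current working directory, not against PATH, so this short form works when you run it from /usr/local/src/hadoop; adding that directory to PATH (export PATH=$PATH:/usr/local/src/hadoop) does not change how javac finds the jar.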
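To check which file a relative classpath entry such as hadoop-core-1.2.1.jar actually points to, the sketch below resolves it against the current working directory (the user.dir property), which is how Java treats relative file paths. The class name RelativeEntry is hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class RelativeEntry {

    // Resolves a (possibly relative) classpath entry against the current
    // working directory; absolute entries are returned unchanged (normalized).
    static Path resolveEntry(String entry) {
        return Paths.get(entry).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        Path jar = resolveEntry("hadoop-core-1.2.1.jar");
        System.out.println("Working directory: " + System.getProperty("user.dir"));
        System.out.println(jar + (jar.toFile().exists() ? " exists" : " does not exist"));
    }
}
```

Running it from different directories shows that the same relative entry points to a different file each time.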
 
Remember that it is also good advice to add Hadoop's command-line utility directory to the PATH environment variable (export PATH=$PATH:/usr/local/src/hadoop/bin), because that lets you run it just by typing "hadoop". Also, PATH has no effect on the jars in the "lib" directory (as far as I have noticed).
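The shell finds a typed command by searching each directory listed in PATH, in order, for an executable file with that name. The Java program below is a minimal sketch of that lookup, similar in spirit to the which command; the class name PathLookup is hypothetical, and it ignores details a real shell handles (aliases, functions, and empty PATH segments).

```java
import java.io.File;
import java.util.Optional;
import java.util.regex.Pattern;

public class PathLookup {

    // Searches each directory listed in pathValue, in order, for an
    // executable regular file called command; returns the first match.
    static Optional<File> which(String command, String pathValue) {
        if (pathValue == null) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue; // a POSIX shell treats this as ".", this sketch skips it
            }
            File candidate = new File(dir, command);
            if (candidate.isFile() && candidate.canExecute()) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Reads the same PATH variable the shell uses.
        Optional<File> hadoop = which("hadoop", System.getenv("PATH"));
        System.out.println(hadoop.map(File::getPath).orElse("hadoop not found on PATH"));
    }
}
```

After export PATH=$PATH:/usr/local/src/hadoop/bin, running java PathLookup from the same shell should print /usr/local/src/hadoop/bin/hadoop, unless another hadoop executable appears earlier in PATH.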

Long live Hadoop!

Gustavo
