Compiling, Packaging, and Running a MapReduce Program from the Command Line: WordCount

How to compile WordCount.java on 0.20 and other older Hadoop versions is widely documented. It is typically done as follows:

javac -classpath /usr/local/hadoop/hadoop-1.0.1/hadoop-core-1.0.1.jar WordCount.java

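In the newer 2.x releases, however, the hadoop-core*.jar file no longer exists, so compiling and packaging your own MapReduce program works differently from the older versions.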

This article uses the WordCount example on Hadoop 2.6 to show how to compile your own MapReduce program under 2.x.

Dependency jars in Hadoop 2.x

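In Hadoop 2.x the classes are no longer bundled into a single hadoop-core*.jar but are split across several jars. Running the WordCount example, for instance, needs the following three jars: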

$HADOOP_HOME/share/hadoop/common/hadoop-common-2.6.0.jar

$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.0.jar

$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar

Compiling and packaging a Hadoop MapReduce program

Add the jars above to the classpath (HADOOP_HOME is assumed here to be set to /home/hadoop/opt/hadoop-2.6.0, the install path that appears in the compiler warning further down):

hadoop@ubuntu:~$ export CLASSPATH="$HADOOP_HOME/share/hadoop/common/hadoop-common-2.6.0.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.0.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar:$CLASSPATH"
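The classpath value is just the three jar paths joined with colons. As a minimal sketch of that assembly, assuming a 2.6.0 install, the following plain Java builds the same string from an install directory. The class name ClasspathBuilder and its build method are hypothetical helpers for illustration, not part of Hadoop.

```java
import java.util.Arrays;
import java.util.List;

class ClasspathBuilder {
    // Jar paths relative to the Hadoop install directory, as listed in the article.
    static final List<String> RELATIVE_JARS = Arrays.asList(
        "share/hadoop/common/hadoop-common-2.6.0.jar",
        "share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.0.jar",
        "share/hadoop/common/lib/commons-cli-1.2.jar");

    // Joins the three jars into a ':'-separated classpath rooted at hadoopHome.
    static String build(String hadoopHome) {
        StringBuilder sb = new StringBuilder();
        for (String jar : RELATIVE_JARS) {
            if (sb.length() > 0) {
                sb.append(':');
            }
            sb.append(hadoopHome).append('/').append(jar);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        if (home == null) {
            home = "/home/hadoop/opt/hadoop-2.6.0"; // assumed install path
        }
        System.out.println(build(home));
    }
}
```

Printing the result and comparing it with the exported CLASSPATH is a quick way to spot a doubled or missing path prefix.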

Now WordCount.java can be compiled (this uses the WordCount.java from the 2.6.0 source tree).

The file is located in /hadoop-2.6.0-src/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples:

javac WordCount.java

The compiler prints a warning, which can be ignored. After compilation, several .class files have been generated:

/home/hadoop/opt/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar(org/apache/hadoop/fs/Path.class): warning: Cannot find annotation method 'value()' in type 'LimitedPrivate': class file for org.apache.hadoop.classification.InterfaceAudience not found
1 warning
hadoop@ubuntu:~/opt/code$ ls
WordCount.class WordCount.java WordCount$MapClass.class WordCount$Reduce.class

Next, package the .class files into a jar so that they can be run in Hadoop:

hadoop@ubuntu:~/opt/code$ jar -cvf WordCount.jar ./WordCount*.class
added manifest
adding: WordCount.class(in = 3363) (out= 1687)(deflated 49%)
adding: WordCount$MapClass.class(in = 1978) (out= 800)(deflated 59%)
adding: WordCount$Reduce.class(in = 1641) (out= 645)(deflated 60%)
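To make the packaging step less opaque, here is a hedged sketch of roughly what jar -cvf does, written against the standard java.util.jar API: it writes a manifest plus each file as an entry stored under its bare file name. The class JarSketch, its methods, and the placeholder bytes are illustrative assumptions; the real jar tool does more (for example, printing the compression statistics shown above).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

class JarSketch {
    // Writes each file into a new jar at jarPath, stored under its bare file name.
    static void pack(Path jarPath, List<Path> files) {
        Manifest manifest = new Manifest();
        // Declare the manifest version, as the jar tool does.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        try (OutputStream os = Files.newOutputStream(jarPath);
             JarOutputStream jar = new JarOutputStream(os, manifest)) {
            for (Path file : files) {
                jar.putNextEntry(new JarEntry(file.getFileName().toString()));
                Files.copy(file, jar);
                jar.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the names of all entries in the jar.
    static List<String> list(Path jarPath) {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            jar.stream().forEach(entry -> names.add(entry.getName()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Packs one placeholder file into a jar in a temp directory and lists it.
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("jar-sketch");
            Path classFile = dir.resolve("WordCount.class");
            Files.write(classFile, new byte[] {1, 2, 3}); // placeholder bytes, not real bytecode
            Path jarPath = dir.resolve("WordCount.jar");
            pack(jarPath, Arrays.asList(classFile));
            return list(jarPath);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Listing the entries shows where each class ends up inside the archive, which matters because the JVM looks a class up by a path derived from its package name.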

Create the input directory and files for the job:

hadoop@ubuntu:~/opt/code$ mkdir input
hadoop@ubuntu:~/opt/code$ echo "Hello Hadoop Goodbye Hadoop" > ./input/file1
hadoop@ubuntu:~/opt/code$ echo "Hello World Bye World" > ./input/file2
hadoop@ubuntu:~/opt/code$ ls ./input
file1 file2

Run our wordcount program:

hadoop@ubuntu:~$ cd ~/opt/code

hadoop@ubuntu:~/opt/code$ ~/opt/hadoop-2.6.0/bin/hadoop jar WordCount.jar org.apache.hadoop.examples.WordCount input output

Once the program has finished, check the output:

hadoop@ubuntu:~/opt/code$ ls ./output

part-r-00000 _SUCCESS

hadoop@ubuntu:~/opt/code$ cat ./output/part-r-00000

Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2
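The result above can be cross-checked without Hadoop. The following is a minimal local sketch, not the MapReduce job itself: it tokenizes the two input lines with StringTokenizer (the same tokenizer the mapper uses) and sums counts in a TreeMap, which stands in for the sorted keys the reducer receives. The class name LocalWordCount is a hypothetical helper.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class LocalWordCount {
    // Counts whitespace-separated tokens across all lines.
    // TreeMap keeps the words in sorted order.
    static Map<String, Integer> count(String... lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                // "Map" step emits (word, 1); merge plays the role of the summing reducer.
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(
            "Hello Hadoop Goodbye Hadoop",  // contents of input/file1
            "Hello World Bye World");       // contents of input/file2
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Run on the two sample lines, it yields the same five words and counts as part-r-00000.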

PS: the WordCount.java source code is as follows:

package org.apache.hadoop.mapred;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * This is an example Hadoop Map/Reduce application.
 * It reads the text input files, breaks each line into words
 * and counts them. The output is a locally sorted list of words and the
 * count of how often they occurred.
 *
 * To run: bin/hadoop jar build/hadoop-examples.jar wordcount
 *            [-m maps] [-r reduces] in-dir out-dir
 */
public class WordCount extends Configured implements Tool {

  /**
   * Counts the words in each line.
   * For each line of input, break the line into words and emit them as
   * (word, 1).
   */
  public static class MapClass extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
      String line = value.toString();
      StringTokenizer itr = new StringTokenizer(line);
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        output.collect(word, one);
      }
    }
  }

  /**
   * A reducer class that just emits the sum of the input values.
   */
  public static class Reduce extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  static int printUsage() {
    System.out.println("wordcount [-m <maps>] [-r <reduces>] <input> <output>");
    ToolRunner.printGenericCommandUsage(System.out);
    return -1;
  }

  /**
   * The main driver for word count map/reduce program.
   * Invoke this method to submit the map/reduce job.
   * @throws IOException When there is communication problems with the
   *                     job tracker.
   */
  public int run(String[] args) throws Exception {
    JobConf conf = new JobConf(getConf(), WordCount.class);
    conf.setJobName("wordcount");

    // the keys are words (strings)
    conf.setOutputKeyClass(Text.class);
    // the values are counts (ints)
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(MapClass.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    List<String> other_args = new ArrayList<String>();
    for (int i = 0; i < args.length; ++i) {
      try {
        if ("-m".equals(args[i])) {
          conf.setNumMapTasks(Integer.parseInt(args[++i]));
        } else if ("-r".equals(args[i])) {
          conf.setNumReduceTasks(Integer.parseInt(args[++i]));
        } else {
          other_args.add(args[i]);
        }
      } catch (NumberFormatException except) {
        System.out.println("ERROR: Integer expected instead of " + args[i]);
        return printUsage();
      } catch (ArrayIndexOutOfBoundsException except) {
        System.out.println("ERROR: Required parameter missing from " +
                           args[i-1]);
        return printUsage();
      }
    }
    // Make sure there are exactly 2 parameters left.
    if (other_args.size() != 2) {
      System.out.println("ERROR: Wrong number of parameters: " +
                         other_args.size() + " instead of 2.");
      return printUsage();
    }
    FileInputFormat.setInputPaths(conf, other_args.get(0));
    FileOutputFormat.setOutputPath(conf, new Path(other_args.get(1)));

    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new Configuration(), new WordCount(), args);
    System.exit(res);
  }
}
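The option handling in run() can be tried in isolation. Below is a hedged, Hadoop-free sketch of the same idea: consume -m and -r along with their integer values, collect everything else as paths, and insist on exactly two paths. The class WordCountArgs and its fields are hypothetical, and unlike the listing it reports problems by throwing IllegalArgumentException rather than printing usage and returning -1.

```java
import java.util.ArrayList;
import java.util.List;

class WordCountArgs {
    int maps = -1;     // -1 means the flag was not given
    int reduces = -1;
    final List<String> paths = new ArrayList<>();

    // Parses "[-m maps] [-r reduces] in-dir out-dir".
    // Throws IllegalArgumentException on malformed input.
    static WordCountArgs parse(String[] args) {
        WordCountArgs parsed = new WordCountArgs();
        for (int i = 0; i < args.length; ++i) {
            boolean isMaps = "-m".equals(args[i]);
            boolean isReduces = "-r".equals(args[i]);
            if (!isMaps && !isReduces) {
                parsed.paths.add(args[i]);
                continue;
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException(
                    "Required parameter missing from " + args[i]);
            }
            int n;
            try {
                n = Integer.parseInt(args[++i]); // consume the flag's value
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException(
                    "Integer expected instead of " + args[i]);
            }
            if (isMaps) {
                parsed.maps = n;
            } else {
                parsed.reduces = n;
            }
        }
        // Exactly two positional arguments must remain: input and output.
        if (parsed.paths.size() != 2) {
            throw new IllegalArgumentException(
                "Wrong number of parameters: " + parsed.paths.size() + " instead of 2.");
        }
        return parsed;
    }

    public static void main(String[] args) {
        WordCountArgs a = parse(new String[] {"-m", "4", "-r", "2", "input", "output"});
        System.out.println(a.maps + " " + a.reduces + " " + a.paths);
    }
}
```

Checking the bounds before reading the flag's value avoids relying on ArrayIndexOutOfBoundsException for control flow, which is the one place this sketch deliberately departs from the listing.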


Source URL: http://www.cnblogs.com/myresearch/p/mapreduce-compile-jar-run.html