I have finished setting up my Hadoop cluster, but MapReduce jobs cannot complete.
- 13/07/18 01:59:42 INFO mapred.JobClient: Running job: job_201307180157_0001
- 13/07/18 01:59:43 INFO mapred.JobClient: map 0% reduce 0%
- 13/07/18 01:59:52 INFO mapred.JobClient: map 50% reduce 0%
- 13/07/18 01:59:55 INFO mapred.JobClient: map 100% reduce 0%
- 13/07/18 02:07:48 INFO mapred.JobClient: Task Id : attempt_201307180157_0001_m_000000_0, Status : FAILED
- Too many fetch-failures
- 13/07/18 02:07:48 WARN mapred.JobClient: Error reading task outputConnection refused
- 13/07/18 02:07:48 WARN mapred.JobClient: Error reading task outputConnection refused
The root cause is that my LAN has its own DNS, and something goes wrong when hostnames get resolved.
Visiting http://10.30.7.111:50030/jobtracker.jsp shows where the problem is:
- Task Trackers
- Name | Host | # running tasks | Max Map Tasks | Max Reduce Tasks | Failures | Seconds since heartbeat
- tracker_180.168.41.175:127.0.0.1/127.0.0.1:32769 | 180.168.41.175 | 1 | 2 | 2 | 7 | 2
- tracker_180.168.41.175:127.0.0.1/127.0.0.1:32874 | 180.168.41.175 | 0 | 2 | 2 | 0 | 2
That is, the datanode hostname is not resolved to the right address, the fetch times out, and the job ends with "Too many fetch-failures".
Here is my setup in detail:
My machine names (i.e. hostnames) and their roles:
hadoop.h1 masters
hadoop.h2 slaves
hadoop.h3 slaves
However, in the configuration files I use hadoop1:9000 and hadoop1:9001 throughout, and the masters and slaves files also use these short names (see the sketch after the hosts listing below).
/etc/hosts likewise maps only the short names:
10.30.7.111 hadoop1
10.30.7.120 hadoop2
10.30.7.121 hadoop3
…
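For reference, the entries I mean look roughly like this (a sketch, not copied verbatim from my cluster; fs.default.name and mapred.job.tracker are the standard 0.20 property names, and hadoop1 is the short name from the hosts file above):
- core-site.xml:   <property><name>fs.default.name</name><value>hdfs://hadoop1:9000</value></property>
- mapred-site.xml: <property><name>mapred.job.tracker</name><value>hadoop1:9001</value></property>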
Then I noticed the following when Hadoop starts up:
- STARTUP_MSG: Starting NameNode
- STARTUP_MSG: host = hadoop.h1/180.168.41.175
- STARTUP_MSG: args = [-format]
- STARTUP_MSG: version = 0.20.2
- STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
This incorrect IP left me completely puzzled.
After digging around in
- org.apache.hadoop.hdfs.server.namenode.NameNode
on my own, my conclusion is as follows:
The java.net.InetAddress class is maddening: when it determines the host's IP, it starts from the machine's hostname and then looks for a mapping for that name in /etc/hosts. My hostname is hadoop.h1, which is not in there at all, since the hosts file only maps hadoop1, hadoop2 and hadoop3. The lookup therefore falls through to the telecom's DNS, which returns that 180.x address whenever it cannot find a name. What an inscrutable rule for them to configure.
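A minimal JDK-only check (my own sketch, nothing Hadoop-specific) shows the same resolution path the startup code appears to go through: hostname first, then /etc/hosts, then DNS:
- // Prints the hostname the JVM sees and the IP it resolves to.
- // If hadoop.h1 is missing from /etc/hosts, this should come back with the
- // ISP DNS answer (the 180.x address), matching the startup log above.
- import java.net.InetAddress;
- public class WhoAmI {
-     public static void main(String[] args) throws Exception {
-         InetAddress me = InetAddress.getLocalHost();
-         System.out.println("hostname = " + me.getHostName());
-         System.out.println("address  = " + me.getHostAddress());
-     }
- }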
So a few more lines went into the hosts file:
10.30.7.111 hadoop.h1
10.30.7.120 hadoop.h2
10.30.7.121 hadoop.h3
With that, the startup log finally went back to normal:
- STARTUP_MSG: Starting NameNode
- STARTUP_MSG: host = hadoop.h1/10.30.7.111
- STARTUP_MSG: args = [-format]
- STARTUP_MSG: version = 0.20.2
- STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:17:01 UTC 2010
Even so, MapReduce still does not complete, because the datanode addresses are still being resolved to that 180.x address.
The Hadoop startup output has no log that sheds light on this, which makes it hard to reason about. Could anyone take a look and explain why, with this configuration, Hadoop still hands the datanodes the 180.x address? Is there some other hostname that also needs to go into /etc/hosts?
(SSH is configured correctly everywhere; all nodes can reach each other.)
Please help me figure out exactly which hostname the Hadoop code uses when it looks up the IP address!
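In case it helps, this is the quick check I would run on each node to see which IP a given name resolves to (plain JDK, no Hadoop classes; pass whatever names appear in masters, slaves and the web UI):
- // Usage: java ResolveCheck hadoop1 hadoop2 hadoop3 hadoop.h1 hadoop.h2 hadoop.h3
- import java.net.InetAddress;
- public class ResolveCheck {
-     public static void main(String[] args) throws Exception {
-         for (String name : args) {
-             System.out.println(name + " -> " + InetAddress.getByName(name).getHostAddress());
-         }
-     }
- }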