Step 1. Upload the Hadoop package to /usr on the master
Version: hadoop-1.2.1.tar.gz
Extract it:
- tar -zxvf hadoop-1.2.1.tar.gz
This produces a hadoop-1.2.1 directory in the current directory; cd into it and create a tmp directory for later use:
- [root@master hadoop-1.2.1]# mkdir tmp
Go back to /usr and give the hadoop user ownership (read/write) of hadoop-1.2.1:
- [root@master usr]# chown -R hadoop:hadoop hadoop-1.2.1/
Side note: in a later run I created the tmp directory (as root) only after the chown above, so tmp was still owned by root, and formatting the NameNode failed:
- [hadoop@master conf]$ hadoop namenode -format
- Warning: $HADOOP_HOME is deprecated.
- 13/09/08 00:33:06 INFO namenode.NameNode: STARTUP_MSG:
- /************************************************************
- STARTUP_MSG: Starting NameNode
- STARTUP_MSG: host = master.hadoop/192.168.70.101
- STARTUP_MSG: args = [-format]
- STARTUP_MSG: version = 1.2.1
- STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
- STARTUP_MSG: java = 1.6.0_45
- ************************************************************/
- 13/09/08 00:33:06 INFO util.GSet: Computing capacity for map BlocksMap
- 13/09/08 00:33:06 INFO util.GSet: VM type = 32-bit
- 13/09/08 00:33:06 INFO util.GSet: 2.0% max memory = 1013645312
- 13/09/08 00:33:06 INFO util.GSet: capacity = 2^22 = 4194304 entries
- 13/09/08 00:33:06 INFO util.GSet: recommended=4194304, actual=4194304
- 13/09/08 00:33:06 INFO namenode.FSNamesystem: fsOwner=hadoop
- 13/09/08 00:33:06 INFO namenode.FSNamesystem: supergroup=supergroup
- 13/09/08 00:33:06 INFO namenode.FSNamesystem: isPermissionEnabled=true
- 13/09/08 00:33:06 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
- 13/09/08 00:33:06 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
- 13/09/08 00:33:06 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
- 13/09/08 00:33:06 INFO namenode.NameNode: Caching file names occuring more than 10 times
- 13/09/08 00:33:07 ERROR namenode.NameNode: java.io.IOException: Cannot create directory /usr/hadoop-1.2.1/tmp/dfs/name/current
- at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:294)
- at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1337)
- at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1356)
- at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1261)
- at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1467)
- at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1488)
- 13/09/08 00:33:07 INFO namenode.NameNode: SHUTDOWN_MSG:
- /************************************************************
- SHUTDOWN_MSG: Shutting down NameNode at master.hadoop/192.168.70.101
- ************************************************************/
- [hadoop@master conf]$
Fixed after correcting the permissions.
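If you hit the same "Cannot create directory" error, re-running the chown over tmp (or the whole tree) as root clears it; a minimal sketch:
- [root@master usr]# chown -R hadoop:hadoop /usr/hadoop-1.2.1/tmp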
Step 2. Configure the Hadoop environment variables (as root, on both master and slaves):
- [root@master conf]# vi /etc/profile
- HADOOP_HOME=/usr/hadoop-1.2.1
- export HADOOP_HOME
- PATH=$PATH:$HADOOP_HOME/bin
- export PATH
Load the environment variables:
- [root@master conf]# source /etc/profile
Test them:
- [root@master conf]# hadoop
- Warning: $HADOOP_HOME is deprecated.
- Usage: hadoop [--config confdir] COMMAND
- where COMMAND is one of:
- namenode -format format the DFS filesystem
- ….
done
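Optional: the "Warning: $HADOOP_HOME is deprecated." line that appears in every command below is harmless; as far as I know, Hadoop 1.x suppresses it if you also export the following (e.g. appended to /etc/profile alongside the lines above):
- export HADOOP_HOME_WARN_SUPPRESS=1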
Step 3. Set Hadoop's JAVA_HOME path:
- [root@slave01 conf]# vi hadoop-env.sh
- # The java implementation to use. Required.
- export JAVA_HOME=/usr/jdk1.6.0_45
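A quick sanity check that this path really is a JDK (assuming the JDK was installed to /usr/jdk1.6.0_45, as configured above) is to run its java binary directly; it should report version 1.6.0_45:
- [root@slave01 conf]# /usr/jdk1.6.0_45/bin/java -version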
Step 4. Edit the core-site.xml configuration
- <configuration>
- <property>
- <name>hadoop.tmp.dir</name>
- <value>/usr/hadoop-1.2.1/tmp</value>
- </property>
- <property>
- <name>fs.default.name</name>
- <value>hdfs://master.hadoop:9000</value>
- </property>
- </configuration>
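fs.default.name makes hdfs://master.hadoop:9000 the default filesystem, so once the cluster is up (Step 10) the following two commands should list the same thing; a small illustrative check:
- [hadoop@master usr]$ hadoop fs -ls /
- [hadoop@master usr]$ hadoop fs -ls hdfs://master.hadoop:9000/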
Step 5. Edit hdfs-site.xml
- [hadoop@master conf]$ vi hdfs-site.xml
- <configuration>
- <property>
- <name>dfs.data.dir</name>
- <value>/usr/hadoop-1.2.1/data</value>
- </property>
- <property>
- <name>dfs.replication</name>
- <value>2</value>
- </property>
- </configuration>
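dfs.replication=2 matches the two DataNodes in this cluster. Once the cluster is running (Step 10), you can confirm that blocks actually carry two replicas with fsck; a sketch:
- [hadoop@master usr]$ hadoop fsck / -files -blocks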
Step 6. Edit mapred-site.xml
- [hadoop@master conf]$ vi mapred-site.xml
- <configuration>
- <property>
- <name>mapred.job.tracker</name>
- <value>master.hadoop:9001</value>
- </property>
- </configuration>
Step 7. Edit the masters and slaves files
- [hadoop@master conf]$ vi masters
Add the hostname or IP. (Note that in Hadoop 1.x the masters file actually lists the host that runs the SecondaryNameNode, not the NameNode.)
- master.hadoop
- [hadoop@master conf]$ vi slaves
- slave01.hadoop
- slave02.hadoop
Step 8. Distribute the configured Hadoop tree to the slaves. Since the hadoop user on the slave nodes does not yet have write permission under /usr, log in as root on the destination side (the source side doesn't matter):
- [root@master usr]#
- [root@master usr]# scp -r hadoop-1.2.1/ root@slave01.hadoop:/usr
- …
- [root@master usr]#
- [root@master usr]# scp -r hadoop-1.2.1/ root@slave02.hadoop:/usr
Then fix the ownership of the hadoop-1.2.1 directory on each slave, as in Step 1.
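A sketch of doing that chown from the master over ssh (assuming root ssh logins are allowed, which the scp above already relies on):
- [root@master usr]# ssh root@slave01.hadoop 'chown -R hadoop:hadoop /usr/hadoop-1.2.1'
- [root@master usr]# ssh root@slave02.hadoop 'chown -R hadoop:hadoop /usr/hadoop-1.2.1'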
Step 9. Format the HDFS filesystem
- [hadoop@master usr]$
- [hadoop@master usr]$ hadoop namenode -format
- Warning: $HADOOP_HOME is deprecated.
- 14/10/22 07:26:09 INFO namenode.NameNode: STARTUP_MSG:
- /************************************************************
- STARTUP_MSG: Starting NameNode
- STARTUP_MSG: host = master.hadoop/192.168.1.100
- STARTUP_MSG: args = [-format]
- STARTUP_MSG: version = 1.2.1
- STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
- STARTUP_MSG: java = 1.6.0_45
- ************************************************************/
- 14/10/22 07:26:09 INFO util.GSet: Computing capacity for map BlocksMap
- 14/10/22 07:26:09 INFO util.GSet: VM type = 32-bit
- 14/10/22 07:26:09 INFO util.GSet: 2.0% max memory = 1013645312
- 14/10/22 07:26:09 INFO util.GSet: capacity = 2^22 = 4194304 entries
- 14/10/22 07:26:09 INFO util.GSet: recommended=4194304, actual=4194304
- 14/10/22 07:26:09 INFO namenode.FSNamesystem: fsOwner=hadoop
- 14/10/22 07:26:09 INFO namenode.FSNamesystem: supergroup=supergroup
- 14/10/22 07:26:09 INFO namenode.FSNamesystem: isPermissionEnabled=true
- 14/10/22 07:26:09 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
- 14/10/22 07:26:09 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
- 14/10/22 07:26:09 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
- 14/10/22 07:26:09 INFO namenode.NameNode: Caching file names occuring more than 10 times
- 14/10/22 07:26:09 INFO common.Storage: Image file /usr/hadoop/tmp/dfs/name/current/fsimage of size 112 bytes saved in 0 seconds.
- 14/10/22 07:26:09 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/usr/hadoop/tmp/dfs/name/current/edits
- 14/10/22 07:26:09 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/usr/hadoop/tmp/dfs/name/current/edits
- 14/10/22 07:26:09 INFO common.Storage: Storage directory /usr/hadoop/tmp/dfs/name has been successfully formatted.
- 14/10/22 07:26:09 INFO namenode.NameNode: SHUTDOWN_MSG:
- /************************************************************
- SHUTDOWN_MSG: Shutting down NameNode at master.hadoop/192.168.1.100
- ************************************************************/
- [hadoop@master usr]$
Seeing "... has been successfully formatted" means it worked; contrast the side note in Step 1.
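One related, commonly reported Hadoop 1.x gotcha: if you re-format after the DataNodes have already stored blocks, they refuse to start with an "Incompatible namespaceIDs" error. Clearing the data directories on the slaves before restarting (which destroys all HDFS data) resolves it; a sketch, assuming the dfs.data.dir from Step 5:
- [hadoop@slave01 ~]$ rm -rf /usr/hadoop-1.2.1/data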
Step 10. Start Hadoop
Before starting, stop iptables (on both master and slaves), otherwise running jobs may fail:
- [root@master usr]# service iptables stop
- iptables: Flushing firewall rules: [ OK ]
- iptables: Setting chains to policy ACCEPT: filter [ OK ]
- iptables: Unloading modules: [ OK ]
- [root@master usr]#
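Note that service iptables stop only lasts until the next reboot; on a RHEL/CentOS-style system like this one (assuming chkconfig is present), you can also disable the firewall permanently, on master and slaves alike:
- [root@master usr]# chkconfig iptables off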
Side note (I forgot to stop the firewall on the slaves):
- [hadoop@master hadoop-1.2.1]$ hadoop jar hadoop-examples-1.2.1.jar pi 10 100
- Warning: $HADOOP_HOME is deprecated.
- Number of Maps = 10
- Samples per Map = 100
- 13/09/08 02:17:05 INFO hdfs.DFSClient: Exception in createBlockOutputStream 192.168.70.102:50010 java.net.NoRouteToHostException: No route to host
- 13/09/08 02:17:05 INFO hdfs.DFSClient: Abandoning blk_9160013073143341141_4460
- 13/09/08 02:17:05 INFO hdfs.DFSClient: Excluding datanode 192.168.70.102:50010
- 13/09/08 02:17:05 INFO hdfs.DFSClient: Exception in createBlockOutputStream 192.168.70.103:50010 java.net.NoRouteToHostException: No route to host
- 13/09/08 02:17:05 INFO hdfs.DFSClient: Abandoning blk_-1734085534405596274_4461
- 13/09/08 02:17:05 INFO hdfs.DFSClient: Excluding datanode 192.168.70.103:50010
- 13/09/08 02:17:05 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/PiEstimator_TMP_3_141592654/in/part0 could only be replicated to 0 nodes, instead of 1
- at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1920)
- at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:783)
Stopping the firewall fixed it. Also: prefer IP addresses over hostnames in the configuration where possible.
Now start the cluster:
- [root@master usr]# su hadoop
- [hadoop@master usr]$ start-all.sh
- Warning: $HADOOP_HOME is deprecated.
- starting namenode, logging to /usr/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-namenode-master.hadoop.out
- slave01.hadoop: starting datanode, logging to /usr/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-slave01.hadoop.out
- slave02.hadoop: starting datanode, logging to /usr/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-slave02.hadoop.out
- The authenticity of host 'master.hadoop (192.168.70.101)' can't be established.
- RSA key fingerprint is 6c:e0:d7:22:92:80:85:fb:a6:d6:a4:8f:75:b0:96:7e.
- Are you sure you want to continue connecting (yes/no)? yes
- master.hadoop: Warning: Permanently added 'master.hadoop,192.168.70.101' (RSA) to the list of known hosts.
- master.hadoop: starting secondarynamenode, logging to /usr/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-secondarynamenode-master.hadoop.out
- starting jobtracker, logging to /usr/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-jobtracker-master.hadoop.out
- slave02.hadoop: starting tasktracker, logging to /usr/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-slave02.hadoop.out
- slave01.hadoop: starting tasktracker, logging to /usr/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-slave01.hadoop.out
- [hadoop@master usr]$
From the log you can see the startup order: namenode (master) → datanode (slave01, slave02) → secondarynamenode (master) → jobtracker (master) → finally tasktracker (slave01, slave02).
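The matching shutdown script stops all the daemons again (roughly in reverse order), should you need it:
- [hadoop@master usr]$ stop-all.sh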
Step 11. Verify
Check the Hadoop processes: run jps on the master and on each slave.
master:
- [hadoop@master tmp]$ jps
- 6009 Jps
- 5560 SecondaryNameNode
- 5393 NameNode
- 5627 JobTracker
- [hadoop@master tmp]$
slave01:
- [hadoop@slave01 tmp]$ jps
- 3855 Jps
- 3698 TaskTracker
- 3636 DataNode
slave02:
- [root@slave02 tmp]# jps
- 3628 TaskTracker
- 3748 Jps
- 3567 DataNode
- [root@slave02 tmp]#
Check the cluster status with hadoop dfsadmin -report:
- [hadoop@master tmp]$ hadoop dfsadmin -report
- Warning: $HADOOP_HOME is deprecated.
- Configured Capacity: 14174945280 (13.2 GB)
- Present Capacity: 7577288704 (7.06 GB)
- DFS Remaining: 7577231360 (7.06 GB)
- DFS Used: 57344 (56 KB)
- DFS Used%: 0%
- Under replicated blocks: 0
- Blocks with corrupt replicas: 0
- Missing blocks: 0
- -------------------------------------------------
- Datanodes available: 2 (2 total, 0 dead)
- Name: 192.168.70.103:50010
- Decommission Status : Normal
- Configured Capacity: 7087472640 (6.6 GB)
- DFS Used: 28672 (28 KB)
- Non DFS Used: 3298820096 (3.07 GB)
- DFS Remaining: 3788623872(3.53 GB)
- DFS Used%: 0%
- DFS Remaining%: 53.46%
- Last contact: Sun Sep 08 01:19:18 PDT 2013
- Name: 192.168.70.102:50010
- Decommission Status : Normal
- Configured Capacity: 7087472640 (6.6 GB)
- DFS Used: 28672 (28 KB)
- Non DFS Used: 3298836480 (3.07 GB)
- DFS Remaining: 3788607488(3.53 GB)
- DFS Used%: 0%
- DFS Remaining%: 53.45%
- Last contact: Sun Sep 08 01:19:17 PDT 2013
- [hadoop@master tmp]$
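As a final read/write smoke test of HDFS itself (the /test path is just an illustrative choice):
- [hadoop@master tmp]$ hadoop fs -mkdir /test
- [hadoop@master tmp]$ hadoop fs -put /etc/hosts /test
- [hadoop@master tmp]$ hadoop fs -ls /test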
Cluster web UIs (at the master's IP; port 50030 is the JobTracker page, 50070 the NameNode/HDFS page):
http://192.168.70.101:50030
http://192.168.70.101:50070/
Step 12. Run a job: estimate pi
- [hadoop@master hadoop-1.2.1]$ hadoop jar hadoop-examples-1.2.1.jar pi 10 100
The first argument (10) is the number of map tasks to run; the second (100) is the number of samples per map, so the run below draws 10 × 100 = 1000 samples in total, which is why the resulting estimate (3.148) is still rough; increase either argument for a tighter estimate.
Normal output:
- [hadoop@master hadoop-1.2.1]$ hadoop jar hadoop-examples-1.2.1.jar pi 10 100
- Warning: $HADOOP_HOME is deprecated.
- Number of Maps = 10
- Samples per Map = 100
- Wrote input for Map #0
- Wrote input for Map #1
- Wrote input for Map #2
- Wrote input for Map #3
- Wrote input for Map #4
- Wrote input for Map #5
- Wrote input for Map #6
- Wrote input for Map #7
- Wrote input for Map #8
- Wrote input for Map #9
- Starting Job
- 13/09/08 02:21:50 INFO mapred.FileInputFormat: Total input paths to process : 10
- 13/09/08 02:21:52 INFO mapred.JobClient: Running job: job_201309080221_0001
- 13/09/08 02:21:53 INFO mapred.JobClient: map 0% reduce 0%
- 13/09/08 02:24:06 INFO mapred.JobClient: map 10% reduce 0%
- 13/09/08 02:24:07 INFO mapred.JobClient: map 20% reduce 0%
- 13/09/08 02:24:21 INFO mapred.JobClient: map 30% reduce 0%
- 13/09/08 02:24:28 INFO mapred.JobClient: map 40% reduce 0%
- 13/09/08 02:24:31 INFO mapred.JobClient: map 50% reduce 0%
- 13/09/08 02:24:32 INFO mapred.JobClient: map 60% reduce 0%
- 13/09/08 02:24:38 INFO mapred.JobClient: map 70% reduce 0%
- 13/09/08 02:24:41 INFO mapred.JobClient: map 80% reduce 13%
- 13/09/08 02:24:44 INFO mapred.JobClient: map 80% reduce 23%
- 13/09/08 02:24:45 INFO mapred.JobClient: map 100% reduce 23%
- 13/09/08 02:24:47 INFO mapred.JobClient: map 100% reduce 26%
- 13/09/08 02:24:53 INFO mapred.JobClient: map 100% reduce 100%
- 13/09/08 02:24:54 INFO mapred.JobClient: Job complete: job_201309080221_0001
- 13/09/08 02:24:54 INFO mapred.JobClient: Counters: 30
- 13/09/08 02:24:54 INFO mapred.JobClient: Job Counters
- 13/09/08 02:24:54 INFO mapred.JobClient: Launched reduce tasks=1
- 13/09/08 02:24:54 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=638017
- 13/09/08 02:24:54 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
- 13/09/08 02:24:54 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
- 13/09/08 02:24:54 INFO mapred.JobClient: Launched map tasks=10
- 13/09/08 02:24:54 INFO mapred.JobClient: Data-local map tasks=10
- 13/09/08 02:24:54 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=44458
- 13/09/08 02:24:54 INFO mapred.JobClient: File Input Format Counters
- 13/09/08 02:24:54 INFO mapred.JobClient: Bytes Read=1180
- 13/09/08 02:24:54 INFO mapred.JobClient: File Output Format Counters
- 13/09/08 02:24:54 INFO mapred.JobClient: Bytes Written=97
- 13/09/08 02:24:54 INFO mapred.JobClient: FileSystemCounters
- 13/09/08 02:24:54 INFO mapred.JobClient: FILE_BYTES_READ=226
- 13/09/08 02:24:54 INFO mapred.JobClient: HDFS_BYTES_READ=2460
- 13/09/08 02:24:54 INFO mapred.JobClient: FILE_BYTES_WRITTEN=623419
- 13/09/08 02:24:54 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=215
- 13/09/08 02:24:54 INFO mapred.JobClient: Map-Reduce Framework
- 13/09/08 02:24:54 INFO mapred.JobClient: Map output materialized bytes=280
- 13/09/08 02:24:54 INFO mapred.JobClient: Map input records=10
- 13/09/08 02:24:54 INFO mapred.JobClient: Reduce shuffle bytes=280
- 13/09/08 02:24:54 INFO mapred.JobClient: Spilled Records=40
- 13/09/08 02:24:54 INFO mapred.JobClient: Map output bytes=180
- 13/09/08 02:24:54 INFO mapred.JobClient: Total committed heap usage (bytes)=1414819840
- 13/09/08 02:24:54 INFO mapred.JobClient: CPU time spent (ms)=377130
- 13/09/08 02:24:54 INFO mapred.JobClient: Map input bytes=240
- 13/09/08 02:24:54 INFO mapred.JobClient: SPLIT_RAW_BYTES=1280
- 13/09/08 02:24:54 INFO mapred.JobClient: Combine input records=0
- 13/09/08 02:24:54 INFO mapred.JobClient: Reduce input records=20
- 13/09/08 02:24:54 INFO mapred.JobClient: Reduce input groups=20
- 13/09/08 02:24:54 INFO mapred.JobClient: Combine output records=0
- 13/09/08 02:24:54 INFO mapred.JobClient: Physical memory (bytes) snapshot=1473769472
- 13/09/08 02:24:54 INFO mapred.JobClient: Reduce output records=0
- 13/09/08 02:24:54 INFO mapred.JobClient: Virtual memory (bytes) snapshot=4130349056
- 13/09/08 02:24:54 INFO mapred.JobClient: Map output records=20
- Job Finished in 184.973 seconds
- Estimated value of Pi is 3.14800000000000000000
- [hadoop@master hadoop-1.2.1]$
(Had the slave firewalls not been stopped, you would instead hit the error shown in the Step 10 side note.)