CentOS 6.4 + hadoop-0.20.2 Installation Log
Download URL: (resource link)

Node plan:
10.10.1.131 hadoop1
10.10.1.132 hadoop2
10.10.1.133 hadoop3
10.10.1.134 dog
10.10.1.135 cat
10.10.1.136 gangster

1. Unpack and install
On the master node, verify that the JDK is installed:

[root@hadoop1 ~]# java -version
java version "1.7.0_09-icedtea"
OpenJDK Runtime Environment (rhel-2.3.4.1.el6_3-x86_64)
OpenJDK 64-Bit Server VM (build 23.2-b09, mixed mode)

Verify that SSH is installed. (Note: "-version" is not a valid flag; OpenSSH parses it as "-v -e rsion", which is why it complains about a bad escape character. "ssh -V" is the correct way to print the version.)

[root@hadoop1 ~]# ssh -version
OpenSSH_5.3p1, OpenSSL 1.0.0-fips 29 Mar 2010
Bad escape character 'rsion'.

On every node:
[root@hadoop1 ~]# vi /etc/hosts
10.10.1.131 hadoop1
10.10.1.132 hadoop2
10.10.1.133 hadoop3
10.10.1.134 dog
10.10.1.135 cat
10.10.1.136 gangster
[root@hadoop1 ~]# useradd hadoop
[root@hadoop1 ~]# passwd hadoop
[root@hadoop1 ~]# vi /etc/sysconfig/iptables
-A INPUT -s 10.10.1.131 -j ACCEPT
-A INPUT -s 10.10.1.132 -j ACCEPT
-A INPUT -s 10.10.1.133 -j ACCEPT
-A INPUT -s 10.10.1.170 -j ACCEPT
-A INPUT -s 10.10.1.171 -j ACCEPT
-A INPUT -s 10.10.1.172 -j ACCEPT

2. Install Hadoop
Upload the downloaded hadoop-0.20.2.tar.gz to /home/hadoop and unpack it:

[hadoop@hadoop1 ~]$ tar xzvf hadoop-0.20.2.tar.gz
-rw-r--r--. 1 hadoop hadoop 44575568 Feb 16 21:34 hadoop-0.20.2.tar.gz
[hadoop@hadoop1 ~]$ cd hadoop-0.20.2
[hadoop@hadoop1 hadoop-0.20.2]$ ll
total 4872
drwxr-xr-x.  2 hadoop hadoop    4096 Feb 16 21:37 bin
-rw-rw-r--.  1 hadoop hadoop   74035 Feb 19  2010 build.xml
drwxr-xr-x.  4 hadoop hadoop    4096 Feb 19  2010 c++
-rw-rw-r--.  1 hadoop hadoop  348624 Feb 19  2010 CHANGES.txt
drwxr-xr-x.  2 hadoop hadoop    4096 Feb 16 21:37 conf
drwxr-xr-x. 13 hadoop hadoop    4096 Feb 19  2010 contrib
drwxr-xr-x.  7 hadoop hadoop    4096 Feb 16 21:37 docs
-rw-rw-r--.  1 hadoop hadoop    6839 Feb 19  2010 hadoop-0.20.2-ant.jar
-rw-rw-r--.  1 hadoop hadoop 2689741 Feb 19  2010 hadoop-0.20.2-core.jar
-rw-rw-r--.  1 hadoop hadoop  142466 Feb 19  2010 hadoop-0.20.2-examples.jar
-rw-rw-r--.  1 hadoop hadoop 1563859 Feb 19  2010 hadoop-0.20.2-test.jar
-rw-rw-r--.  1 hadoop hadoop   69940 Feb 19  2010 hadoop-0.20.2-tools.jar
drwxr-xr-x.  2 hadoop hadoop    4096 Feb 16 21:37 ivy
-rw-rw-r--.  1 hadoop hadoop    8852 Feb 19  2010 ivy.xml
drwxr-xr-x.  5 hadoop hadoop    4096 Feb 16 21:37 lib
drwxr-xr-x.  2 hadoop hadoop    4096 Feb 16 21:37 librecordio
-rw-rw-r--.  1 hadoop hadoop   13366 Feb 19  2010 LICENSE.txt
-rw-rw-r--.  1 hadoop hadoop     101 Feb 19  2010 NOTICE.txt
-rw-rw-r--.  1 hadoop hadoop    1366 Feb 19  2010 README.txt
drwxr-xr-x. 15 hadoop hadoop    4096 Feb 16 21:37 src
drwxr-xr-x.  8 hadoop hadoop    4096 Feb 19  2010 webapps

3. Configure passwordless SSH from the master's hadoop user to the slaves
Generate the MASTER node's SSH key pair on the master:

[root@hadoop1 ~]# su - hadoop
[hadoop@hadoop1 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
b9:5c:48:74:25:33:ac:9f:11:c9:77:5e:02:43:3b:ba
The key's randomart image is:
(randomart omitted)

Copy the public key to each SLAVE node. (On systems that ship ssh-copy-id, that single command replaces the manual scp and the authorized_keys bookkeeping done below.)

[hadoop@hadoop1 .ssh]$ scp id_rsa.pub 
The authenticity of host 'hadoop2 (10.10.1.132)' can't be established.
RSA key fingerprint is f9:47:3e:59:39:10:cd:7d:a4:5c:0d:ab:df:1f:14:21.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop2,10.10.1.132' (RSA) to the list of known hosts.
password:
id_rsa.pub 100% 405 0.4KB/s 00:00

The same transfer was repeated for hadoop3 (10.10.1.133), cat (10.10.1.171), dog (10.10.1.170), and gangster (10.10.1.172), answering yes to each host-key prompt.

On each SLAVE node:
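The per-slave key installation that follows is the same four commands on every host. As a sketch, they can be wrapped in a small POSIX-shell function (the master-key filename matches the log; run it as the hadoop user on the slave):

```shell
# install_master_key KEYFILE
# Move the master's public key into ~/.ssh/authorized_keys with the strict
# permissions sshd requires before it will honor the key.
install_master_key() {
    key_file="$1"
    mkdir -p "$HOME/.ssh"
    chmod 700 "$HOME/.ssh"
    # Append rather than overwrite, so any existing authorized keys survive.
    cat "$key_file" >> "$HOME/.ssh/authorized_keys"
    chmod 600 "$HOME/.ssh/authorized_keys"
    rm -f "$key_file"
}

# Usage on a slave, matching the log: install_master_key ~/master-key
```

Appending (rather than the log's mv) is the one deliberate difference: it keeps any keys already authorized on the slave.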
[hadoop@hadoop2 ~]$ mkdir .ssh
[hadoop@hadoop2 ~]$ chmod 700 .ssh/
[hadoop@hadoop2 ~]$ mv master-key .ssh/authorized_keys
[hadoop@hadoop2 ~]$ cd .ssh/
[hadoop@hadoop2 .ssh]$ chmod 600 authorized_keys

The same four commands were run as the hadoop user on hadoop3, dog, cat, and gangster.

On the MASTER node, test passwordless access to each slave:
[hadoop@hadoop1 .ssh]$ ssh hadoop2
Last login: Sat Feb 15 19:35:21 2014 from hadoop1
[hadoop@hadoop2 ~]$ exit
logout
Connection to hadoop2 closed.
[hadoop@hadoop1 .ssh]$ ssh hadoop3
Last login: Sat Feb 15 21:57:38 2014 from hadoop1
[hadoop@hadoop3 ~]$ exit
logout
Connection to hadoop3 closed.
[hadoop@hadoop1 .ssh]$ ssh cat
Last login: Sat Feb 15 14:33:50 2014 from hadoop1
[hadoop@cat ~]$ exit
logout
Connection to cat closed.
[hadoop@hadoop1 .ssh]$ ssh dog
Last login: Sun Feb 16 20:41:19 2014 from hadoop1
[hadoop@dog ~]$ exit
logout
Connection to dog closed.
[hadoop@hadoop1 .ssh]$ ssh gangster
Last login: Sat Feb 15 18:03:45 2014 from hadoop1
[hadoop@gangster ~]$ exit
logout
Connection to gangster closed.

4. Configure runtime parameters
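One caveat before the individual files: the log below sets most of the listed parameters but never hadoop.tmp.dir, which therefore keeps its default of /tmp/hadoop-${user.name} (this is visible in the NameNode storage table at the end of this article). A hedged sketch of the extra core-site.xml property, assuming the tmp directory created under the install tree is meant to hold it:

```xml
<!-- Sketch only: goes inside core-site.xml's <configuration> element.
     The path assumes the /home/hadoop/hadoop-0.20.2/tmp directory
     created later in this walkthrough. -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/hadoop-0.20.2/tmp</value>
  <description>Base directory for HDFS/MapReduce scratch and default storage paths.</description>
</property>
```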
The main parameters to configure:
1. fs.default.name
2. hadoop.tmp.dir
3. mapred.job.tracker
4. dfs.name.dir
5. dfs.data.dir
6. dfs.http.address

[root@hadoop1 ~]# su - hadoop
[hadoop@hadoop1 ~]$ cd hadoop-0.20.2/conf/
[hadoop@hadoop1 conf]$ vi hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64
export HADOOP_LOG_DIR=/var/log/hadoop

On each node, make the log directory writable (create it as root first if it does not exist):
[hadoop@hadoop1 ~]$ chmod 777 /var/log/hadoop
[hadoop@hadoop2 ~]$ chmod 777 /var/log/hadoop
[hadoop@hadoop3 ~]$ chmod 777 /var/log/hadoop

Check that JAVA_HOME is set correctly:
[hadoop@hadoop1 bin]$ ./hadoop version
Hadoop 0.20.2
Subversion -r 911707
Compiled by chrisdo on Fri Feb 19 08:07:34 UTC 2010
[hadoop@hadoop1 conf]$ vi core-site.xml
<configuration>
  <property>
    <!-- NameNode address and port -->
    <name>fs.default.name</name>
    <value>hdfs://10.10.1.131:9000</value>
  </property>
</configuration>

In hdfs-site.xml:
dfs.http.address is the address and port of the HDFS web page; the default port is 50070, with the NameNode's IP.
dfs.data.dir is where a DataNode stores its block data; if unset, it falls back under hadoop.tmp.dir (from core-site.xml).
dfs.name.dir is where the NameNode stores its metadata; if unset, it also falls back under hadoop.tmp.dir.

[hadoop@hadoop1 hadoop-0.20.2]$ pwd
/home/hadoop/hadoop-0.20.2
[hadoop@hadoop1 hadoop-0.20.2]$ mkdir data
[hadoop@hadoop1 hadoop-0.20.2]$ mkdir name
[hadoop@hadoop1 hadoop-0.20.2]$ mkdir tmp
[hadoop@hadoop1 conf]$ vi hdfs-site.xml
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/hadoop-0.20.2/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/hadoop-0.20.2/data</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>10.10.1.131:50071</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>The actual number of replications can be specified when the file is created.</description>
  </property>
</configuration>
[hadoop@hadoop1 conf]$ vi mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>10.10.1.131:9001</value>
    <description>The host and port that the MapReduce job tracker runs at.</description>
  </property>
</configuration>
[hadoop@hadoop1 conf]$ vi masters
hadoop1
[hadoop@hadoop1 conf]$ vi slaves
hadoop2
hadoop3

5. Copy Hadoop to the datanode machines
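Since conf/slaves already names every datanode, the copy in this section can be driven from that file. A sketch (a hypothetical helper, not part of Hadoop; assumes the passwordless SSH set up earlier):

```shell
# push_to_slaves SLAVES_FILE SRC DEST
# Run "scp -r SRC host:DEST" for every non-empty host line in SLAVES_FILE.
push_to_slaves() {
    slaves_file="$1"; src="$2"; dest="$3"
    while IFS= read -r host; do
        [ -n "$host" ] || continue          # skip blank lines
        scp -r "$src" "$host:$dest"
    done < "$slaves_file"
}

# e.g.: push_to_slaves ~/hadoop-0.20.2/conf/slaves ~/hadoop-0.20.2 /home/hadoop/
```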
[hadoop@hadoop1 ~]$ pwd
/home/hadoop
[hadoop@hadoop1 ~]$ scp -r hadoop-0.20.2 hadoop2:/home/hadoop/.
[hadoop@hadoop1 ~]$ scp -r hadoop-0.20.2 hadoop3:/home/hadoop/.

6. Format the distributed filesystem
On the NameNode:

[hadoop@hadoop1 bin]$ pwd
/home/hadoop/hadoop-0.20.2/bin
[hadoop@hadoop1 bin]$ ./hadoop namenode -format
14/02/18 20:23:24 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoop1.cfzq.com/10.10.1.131
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
14/02/18 20:23:38 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
14/02/18 20:23:38 INFO namenode.FSNamesystem: supergroup=supergroup
14/02/18 20:23:38 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/02/18 20:23:39 INFO common.Storage: Image file of size 96 saved in 0 seconds.
14/02/18 20:23:39 INFO common.Storage: Storage directory /home/hadoop/hadoop-0.20.2/data has been successfully formatted.
14/02/18 20:23:39 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop1.cfzq.com/10.10.1.131
************************************************************/

7. Start the daemons
[hadoop@hadoop1 bin]$ pwd
/home/hadoop/hadoop-0.20.2/bin
[hadoop@hadoop1 bin]$ ./start-all.sh
starting namenode, logging to /var/log/hadoop/hadoop-hadoop-namenode-hadoop1.cfzq.com.out
hadoop3: starting datanode, logging to /var/log/hadoop/hadoop-hadoop-datanode-hadoop3.cfzq.com.out
hadoop2: starting datanode, logging to /var/log/hadoop/hadoop-hadoop-datanode-hadoop2.cfzq.com.out
hadoop1: starting secondarynamenode, logging to /var/log/hadoop/hadoop-hadoop-secondarynamenode-hadoop1.cfzq.com.out
starting jobtracker, logging to /var/log/hadoop/hadoop-hadoop-jobtracker-hadoop1.cfzq.com.out
hadoop3: starting tasktracker, logging to /var/log/hadoop/hadoop-hadoop-tasktracker-hadoop3.cfzq.com.out
hadoop2: starting tasktracker, logging to /var/log/hadoop/hadoop-hadoop-tasktracker-hadoop2.cfzq.com.out

Check which processes started on each node:

[hadoop@hadoop1 bin]$ /usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin/jps
30042 Jps
29736 NameNode
29885 SecondaryNameNode
29959 JobTracker
[hadoop@hadoop2 conf]$ jps
13437 TaskTracker
13327 DataNode
13481 Jps
[hadoop@hadoop3 ~]$ jps
12117 Jps
11962 DataNode
12065 TaskTracker

8. Test the Hadoop cluster
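This section uploads two small files and runs the bundled wordcount example. The expected result can be worked out first with plain shell tools on the same two lines of input:

```shell
# Recreate the two test files and count words locally, mirroring what the
# wordcount MapReduce job below will report.
workdir=$(mktemp -d)
echo "hello world"  > "$workdir/test1.txt"
echo "hello hadoop" > "$workdir/test2.txt"
# One word per line, sort, then count duplicates.
cat "$workdir"/test*.txt | tr ' ' '\n' | sort | uniq -c
# → 1 hadoop, 2 hello, 1 world (matching part-r-00000 later in this section)
```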
[hadoop@hadoop1 hadoop-0.20.2]$ pwd
/home/hadoop/hadoop-0.20.2
[hadoop@hadoop1 hadoop-0.20.2]$ mkdir input
[hadoop@hadoop1 hadoop-0.20.2]$ cd input
[hadoop@hadoop1 input]$ echo "hello world" >test1.txt
[hadoop@hadoop1 input]$ echo "hello hadoop" >test2.txt
[hadoop@hadoop1 input]$ ll
total 8
-rw-rw-r--. 1 hadoop hadoop 12 Feb 18 09:32 test1.txt
-rw-rw-r--. 1 hadoop hadoop 13 Feb 18 09:33 test2.txt

Copy the files under input into HDFS:

[hadoop@hadoop1 input]$ cd ../bin/
[hadoop@hadoop1 bin]$ pwd
/home/hadoop/hadoop-0.20.2/bin
[hadoop@hadoop1 bin]$ ./hadoop dfs -put ../input in

dfs: operate on the distributed filesystem
-put: copy local files into the distributed filesystem
../input: the local source directory
in: the destination directory in HDFS

[hadoop@hadoop1 bin]$ ./hadoop dfs -ls ./in/*
-rw-r--r-- 2 hadoop supergroup 12 2014-02-18 10:11 /user/hadoop/in/test1.txt
-rw-r--r-- 2 hadoop supergroup 13 2014-02-18 10:11 /user/hadoop/in/test2.txt

/user/hadoop/in is the in directory under user hadoop's HDFS home directory, not an operating-system path. -ls lists files in the distributed filesystem.

Run the wordcount example. (As transcribed, the jar path reads ../hadoop-0.20.2-examples.jar; the jar actually sits in the current directory, so hadoop-0.20.2-examples.jar would be the expected path.)

[hadoop@hadoop1 hadoop-0.20.2]$ pwd
/home/hadoop/hadoop-0.20.2
[hadoop@hadoop1 hadoop-0.20.2]$ bin/hadoop jar ../hadoop-0.20.2-examples.jar wordcount in out
14/02/18 11:15:58 INFO input.FileInputFormat: Total input paths to process : 2
14/02/18 11:15:59 INFO mapred.JobClient: Running job: job_201402180923_0001
14/02/18 11:16:00 INFO mapred.JobClient:  map 0% reduce 0%
14/02/18 11:16:07 INFO mapred.JobClient:  map 50% reduce 0%
14/02/18 11:16:10 INFO mapred.JobClient:  map 100% reduce 0%
14/02/18 11:16:19 INFO mapred.JobClient:  map 100% reduce 100%
14/02/18 11:16:21 INFO mapred.JobClient: Job complete: job_201402180923_0001
14/02/18 11:16:21 INFO mapred.JobClient: Counters: 17
14/02/18 11:16:21 INFO mapred.JobClient:   Job Counters 
14/02/18 11:16:21 INFO mapred.JobClient:     Launched reduce tasks=1
14/02/18 11:16:21 INFO mapred.JobClient:     Launched map tasks=2
14/02/18 11:16:21 INFO mapred.JobClient:     Data-local map tasks=2
14/02/18 11:16:21 INFO mapred.JobClient:   FileSystemCounters
14/02/18 11:16:21 INFO mapred.JobClient:     FILE_BYTES_READ=55
14/02/18 11:16:21 INFO mapred.JobClient:     HDFS_BYTES_READ=25
14/02/18 11:16:21 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=180
14/02/18 11:16:21 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=25
14/02/18 11:16:21 INFO mapred.JobClient:   Map-Reduce Framework
14/02/18 11:16:21 INFO mapred.JobClient:     Reduce input groups=3
14/02/18 11:16:21 INFO mapred.JobClient:     Combine output records=4
14/02/18 11:16:21 INFO mapred.JobClient:     Map input records=2
14/02/18 11:16:21 INFO mapred.JobClient:     Reduce shuffle bytes=61
14/02/18 11:16:21 INFO mapred.JobClient:     Reduce output records=3
14/02/18 11:16:21 INFO mapred.JobClient:     Spilled Records=8
14/02/18 11:16:21 INFO mapred.JobClient:     Map output bytes=41
14/02/18 11:16:21 INFO mapred.JobClient:     Combine input records=4
14/02/18 11:16:21 INFO mapred.JobClient:     Map output records=4
14/02/18 11:16:21 INFO mapred.JobClient:     Reduce input records=4

jar: submit a job packaged in the given jar file
wordcount: the example program to run (word counting)
in: the HDFS directory holding the input data
out: the HDFS directory for the output data

[hadoop@hadoop1 hadoop-0.20.2]$ bin/hadoop dfs -ls
Found 4 items
drwxr-xr-x - hadoop supergroup 0 2014-02-18 09:52 /user/hadoop/in
drwxr-xr-x - hadoop supergroup 0 2014-02-18 11:16 /user/hadoop/out
[hadoop@hadoop1 hadoop-0.20.2]$ bin/hadoop dfs -ls ./out
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2014-02-18 11:15 /user/hadoop/out/_logs
-rw-r--r-- 2 hadoop supergroup 25 2014-02-18 11:16 /user/hadoop/out/part-r-00000
[hadoop@hadoop1 hadoop-0.20.2]$ bin/hadoop dfs -cat ./out/part-r-00000
hadoop 1
hello 2
world 1

9. Watching Hadoop activity through the web interface
[hadoop@hadoop1 bin]$ netstat -all | grep :5
getnameinfo failed
tcp 0 0 *:57195     *:*              LISTEN
tcp 0 0 hadoop1:ssh 10.10.1.198:58000 ESTABLISHED
tcp 0 0 *:50030     *:*              LISTEN
tcp 0 0 *:50070     *:*              LISTEN
tcp 0 0 *:59420     *:*              LISTEN
tcp 0 0 *:50849     *:*              LISTEN
tcp 0 0 *:50090     *:*              LISTEN

--- Monitoring the JobTracker (port 50030)
The JobTracker page shows:

hadoop1 Hadoop Map/Reduce Administration
State: INITIALIZING
Started: Mon Feb 17 21:33:23 CST 2014
Version: 0.20.2, r911707
Compiled: Fri Feb 19 08:07:34 UTC 2010 by chrisdo
Identifier: 201402172133
Cluster Summary (Heap Size is 117.94 MB/888.94 MB): 0 maps, 0 reduces, 0 total submissions, 0 nodes, map/reduce task capacity 0/0, 0 blacklisted nodes
Scheduling Information: queue "default", N/A
Running / Completed / Failed jobs: none

--- Monitoring HDFS (port 50070)

NameNode 'hadoop1:9000'
Started: Mon Feb 17 12:35:22 CST 2014
Version: 0.20.2, r911707
Compiled: Fri Feb 19 08:07:34 UTC 2010 by chrisdo
Upgrades: There are no upgrades in progress.
Cluster Summary: 1 files and directories, 0 blocks = 1 total. Heap Size is 117.94 MB / 888.94 MB (13%)
Configured Capacity: 0 KB
DFS Used: 0 KB
Non DFS Used: 0 KB
DFS Remaining: 0 KB
DFS Used%: 100 %
DFS Remaining%: 0 %
Live Nodes: 0
Dead Nodes: 0
There are no datanodes in the cluster.
NameNode Storage: /tmp/hadoop-hadoop/dfs/name (IMAGE_AND_EDITS, Active)

(These snapshots are dated Feb 17, before the configuration above took effect: the storage directory is still the default under /tmp, the web port is still 50070 rather than the configured 50071, and no datanodes had joined yet.)