[HD] Building an Open-Source Hadoop Environment (Cluster Mode): 1. HDFS/YARN

Component versions in the commonly used Huawei FusionInsight C60U10, listed here as a compatibility reference:

HDFS:2.7.2
Hive:1.3.0
HBase:1.0.2
Spark:1.5.1
Solr:5.3.1
Flume:1.6.0
Kafka:2.10-0.10.0.0
Storm:0.10.0
Hue:3.9.0

Environment used in this article: RedHat 6.5, JDK 1.7.0_79, Hadoop 2.7.3

Three-node sizing and hostnames (master: 6 cores/18 GB, data nodes: 4 cores/16 GB):

192.168.111.140 HMASTER
192.168.111.141 HDATA01
192.168.111.142 HDATA02

The detailed steps are as follows.

I. Machine Configuration

1. Configure the hosts:

Step 1: Add hosts entries

$ vi /etc/hosts
######Hadoop Nodes
192.168.111.140 HMASTER
192.168.111.141 HDATA01
192.168.111.142 HDATA02
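The same entries must be present in /etc/hosts on all three nodes. A minimal sketch, assuming root SSH access to the two data-node IPs (adjust if you copy the file by other means):

# Push the same hosts entries to the data nodes
$ scp /etc/hosts root@192.168.111.141:/etc/hosts
$ scp /etc/hosts root@192.168.111.142:/etc/hosts
# Verify that the hostnames resolve and answer
$ ping -c 1 HDATA01
$ ping -c 1 HDATA02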

 

2. Configure the JDK

Install the JDK under /usr/local, then edit .bash_profile (vi .bash_profile) and add:

##############
### Env--JDK
##############
export JAVA_HOME=/usr/local/jdk1.7.0_79
export JRE_HOME=${JAVA_HOME}/jre  
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib  
export PATH=${JAVA_HOME}/bin:$PATH
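After saving .bash_profile, reload it and confirm the JDK is picked up; a quick check using the path configured above:

$ source ~/.bash_profile
$ echo $JAVA_HOME      # expected: /usr/local/jdk1.7.0_79
$ java -version        # expected: java version "1.7.0_79"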

 

3. Create a user (optional)

$ groupadd hadoop
$ useradd -m hadoop -d /home/hadoop -s /bin/sh -G hadoop,root

If the hadoop user already exists and you want to add it to the root group, note that the command is: usermod -a -G root hadoop
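After creating the account it is worth giving it a password and double-checking the group membership; a small sketch run as root:

$ passwd hadoop                 # set a login password (interactive)
$ id hadoop                     # expected groups: hadoop, root
$ grep '^hadoop:' /etc/passwd   # expected home /home/hadoop, shell /bin/sh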

 

 

II. Configure Passwordless SSH Login

4. Configure passwordless SSH

Step 4.1: Log in to the master node and create an ssh-key, generating the public/private key pair


$ ssh-keygen -t rsa -P ''

A randomart image is displayed. Then append the public key to the authorized keys:


$ cat .ssh/id_rsa.pub >> .ssh/authorized_keys

Then set the permissions:


$ chmod 600 .ssh/*

You can test with ssh now; if it fails with "Agent admitted failure to sign using the key.", the following command resolves it:


$ ssh-add ~/.ssh/id_rsa

 

Step 4.2: Log in to each data node, create an ssh-key in the same way, and copy the public key to the master

HData01:
$ ssh-keygen -t rsa -P ''
$ scp .ssh/id_rsa.pub hadoop@Hmaster:/home/hadoop/id_rsa_HData01.pub
HData02:
$ ssh-keygen -t rsa -P ''
$ scp .ssh/id_rsa.pub hadoop@Hmaster:/home/hadoop/id_rsa_HData02.pub

 

Step 4.3: Log in to the master node (where the .pub files from Step 4.2 were copied) and merge the keys

$ cat /home/hadoop/id_rsa_HData01.pub >> .ssh/authorized_keys
$ cat /home/hadoop/id_rsa_HData02.pub >> .ssh/authorized_keys
Then distribute the merged file back to the data nodes:
$ scp .ssh/authorized_keys hadoop@HData01:/home/hadoop/.ssh/authorized_keys   (and likewise for HData02)
and change the permissions of the key files on the data nodes to 600.
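Once authorized_keys has been distributed, passwordless login can be verified from the master node; each command below should print the remote hostname without asking for a password (run as hadoop on HMaster):

$ ssh HDATA01 hostname
$ ssh HDATA02 hostname
$ ssh HMASTER hostname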

 

 

III. Install Hadoop

5. Copy the Hadoop distribution tarball to /home/hadoop and unpack it. Operate as the hadoop user:

$ tar zxvf hadoop-2.7.3.tar.gz
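The steps below refer to $HADOOP_HOME, which the original does not show being set. A minimal sketch, assuming the tarball was unpacked to /home/hadoop/hadoop-2.7.3; append to the hadoop user's .bash_profile and re-source it:

##############
### Env--Hadoop
##############
export HADOOP_HOME=/home/hadoop/hadoop-2.7.3
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH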

 

6. Modify the configuration files; the following files all need to be changed.


Step 6.1: Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh 	(set export JAVA_HOME=/usr/local/jdk1.7.0_79)
Step 6.2: Edit $HADOOP_HOME/etc/hadoop/yarn-env.sh 	(set export JAVA_HOME=/usr/local/jdk1.7.0_79)

Step 6.3: Edit $HADOOP_HOME/etc/hadoop/core-site.xml
Reference configuration:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://HMaster:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <!-- Hadoop temp directory; you must create it yourself -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
    </property>
    <!-- Needed later when installing HiveServer2, to avoid: AuthorizationException: User: hadoop is not allowed to impersonate hive -->
    <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
    </property>
</configuration>
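As the comment above says, hadoop.tmp.dir is not created automatically; create it on every node before starting the cluster (the passwordless SSH from Step 4 is assumed):

$ mkdir -p /home/hadoop/tmp
$ ssh HDATA01 mkdir -p /home/hadoop/tmp
$ ssh HDATA02 mkdir -p /home/hadoop/tmp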


Step 6.4: Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Reference configuration:
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>HMaster:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/tmp/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/tmp/hdfs/data</value>
    </property>
</configuration>
Note: dfs.replication is the number of data block replicas (default 3); with fewer than 3 slave nodes, errors will be reported.
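Hadoop will normally create the name/data directories itself on format and first start, but creating them up front avoids permission surprises; a sketch matching the paths above:

# NameNode metadata directory (used on the master)
$ mkdir -p /home/hadoop/tmp/hdfs/name
# DataNode block directories (used on the data nodes)
$ ssh HDATA01 mkdir -p /home/hadoop/tmp/hdfs/data
$ ssh HDATA02 mkdir -p /home/hadoop/tmp/hdfs/data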


Step 6.5: Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml
Reference configuration:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
          <name>mapreduce.jobhistory.address</name>
          <value>HMaster:10020</value>
  </property>
  <property>
          <name>mapreduce.jobhistory.webapp.address</name>
          <value>HMaster:19888</value>
  </property>
</configuration>
Note: these set the JobHistory server address and ports (there is no JobTracker in MRv2 on YARN).
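The JobHistory server configured here is not started by start-dfs.sh or start-yarn.sh; to get the history web UI on port 19888, start it separately on the master once the cluster is up (standard Hadoop 2.x daemon script):

$ $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
# jps should then also show a JobHistoryServer process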

Step 6.6: Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml
Reference configuration:
<configuration>
<!-- Configurations for ResourceManager -->
     <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
     </property>
     <property>
           <name>yarn.resourcemanager.address</name>
           <value>HMaster:8032</value>
     </property>
     <property>
          <name>yarn.resourcemanager.scheduler.address</name>
          <value>HMaster:8030</value>
      </property>
     <property>
         <name>yarn.resourcemanager.resource-tracker.address</name>
         <value>HMaster:8031</value>
     </property>
     <property>
         <name>yarn.resourcemanager.admin.address</name>
         <value>HMaster:8033</value>
     </property>
     <property>
         <name>yarn.resourcemanager.webapp.address</name>
         <value>HMaster:8088</value>
     </property>
</configuration>


Step 6.7: Edit $HADOOP_HOME/etc/hadoop/slaves
Reference contents:
HDATA01
HDATA02

 

7. Format the HDFS file system and start the services

Step 7.1: Copy the Hadoop home directory from the master node to HData01 and HData02

$ scp -r hadoop-2.7.3 hadoop@HDATA01:/home/hadoop/
$ scp -r hadoop-2.7.3 hadoop@HDATA02:/home/hadoop/

 

Step 7.2: Format the file system (running it on the master node, HMaster, is sufficient)


$ ${HADOOP_HOME}/bin/hdfs namenode -format
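Format only once. If you ever re-format an existing cluster, the DataNodes will refuse to join because of a clusterID mismatch; a hedged recovery sketch (this destroys all HDFS data; paths as configured in hdfs-site.xml, directories are recreated on format/start):

$ ${HADOOP_HOME}/sbin/stop-dfs.sh
$ rm -rf /home/hadoop/tmp/hdfs/name
$ ssh HDATA01 rm -rf /home/hadoop/tmp/hdfs/data
$ ssh HDATA02 rm -rf /home/hadoop/tmp/hdfs/data
$ ${HADOOP_HOME}/bin/hdfs namenode -format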

 

Step 7.3: Start HDFS (NameNode and DataNodes)


$ ${HADOOP_HOME}/sbin/start-dfs.sh
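A quick check that HDFS came up with both DataNodes registered (run on the master):

$ ${HADOOP_HOME}/bin/hdfs dfsadmin -report
# expected: "Live datanodes (2)" with non-zero configured capacity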

 

Step 7.4: Start YARN


$ $HADOOP_HOME/sbin/start-yarn.sh
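Similarly, confirm that both NodeManagers registered with the ResourceManager:

$ ${HADOOP_HOME}/bin/yarn node -list
# expected: two nodes (HDATA01, HDATA02) in RUNNING state
# the ResourceManager web UI is also reachable at http://HMaster:8088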

 

To verify: 1) the master node should show the following processes (via jps):

8838 SecondaryNameNode
9266 Jps
8630 NameNode
2756 DataNode
3802 NodeManager
9001 ResourceManager

2) HData01 and HData02 should show the following processes:

4194 Jps
2741 DataNode
4059 NodeManager
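Beyond jps, a small end-to-end smoke test exercises both HDFS and YARN; a sketch using the examples jar bundled with the 2.7.3 distribution:

# HDFS read/write test
$ ${HADOOP_HOME}/bin/hdfs dfs -mkdir -p /user/hadoop
$ ${HADOOP_HOME}/bin/hdfs dfs -put ${HADOOP_HOME}/etc/hadoop/core-site.xml /user/hadoop/
$ ${HADOOP_HOME}/bin/hdfs dfs -ls /user/hadoop

# MapReduce-on-YARN test: estimate pi with 2 maps x 10 samples each
$ ${HADOOP_HOME}/bin/hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 10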

