
Hadoop High Availability Setup: A Super-Detailed Guide

猿友 2021-01-08 16:41:29


Lab Environment

master: 192.168.10.131
slave1: 192.168.10.129
slave2: 192.168.10.130
OS: ubuntu-16.04.3
hadoop-2.7.1
zookeeper-3.4.8

Installation Steps

1. Install the JDK

  • Extract the jdk into the /opt directory
tar -zxvf jdk-8u221-linux-x64.tar.gz -C /opt
  • Configure the environment variables
vim /etc/profile
#jdk
export JAVA_HOME=/opt/jdk1.8.0_221
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
source /etc/profile
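A quick check that the JDK is picked up from the new PATH:

java -version    # should report java version "1.8.0_221"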

2.修改hostname

分別將三臺虛擬機的修改為master、slave1、slave2

vim  /etc/hostname
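Alternatively, on a systemd-based release such as ubuntu-16.04 the hostname can be set in one step instead of editing the file (a sketch for the master node):

hostnamectl set-hostname master    # use slave1 / slave2 on the other two hosts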

3. Edit the hosts mapping and configure passwordless ssh login

  • Edit the hosts file; do this on every host
vim /etc/hosts
192.168.10.131 master
192.168.10.129 slave1
192.168.10.130 slave2
  • Configure passwordless ssh
    First, turn off the firewall (a ufw cheat sheet):
1. Check the port status
sudo ufw status
2. Open a specific port, e.g. 8381
sudo ufw allow 8381
3. Enable the firewall
sudo ufw enable
4. Disable the firewall
sudo ufw disable
5. Reload the firewall
sudo ufw reload
6. Remove the allow rule for an external port, e.g. 80
sudo ufw delete allow 80
7. Check listening ports
netstat -ltn

During startup the cluster needs to ssh into the other hosts. To avoid typing the other hosts' passwords every time, configure passwordless login (press Enter at every prompt):

ssh-keygen -t rsa

Copy each host's public key to itself and to the other hosts (run on every host):

ssh-copy-id -i ~/.ssh/id_rsa.pub root@master
ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave1
ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave2
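Passwordless login can now be verified from any host; each command should print the remote hostname without a password prompt:

ssh root@slave1 hostname    # prints "slave1"
ssh root@slave2 hostname    # prints "slave2"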

4. Set up time synchronization

  • Install the ntp and ntpdate packages (on ubuntu the ntp package provides the ntpd daemon and /etc/ntp.conf; ntpdate is the one-shot client)
apt-get install ntp ntpdate
  • Edit the ntp configuration file
vim /etc/ntp.conf

# /etc/ntp.conf, configuration for ntpd; see ntp.conf(5) for help

driftfile /var/lib/ntp/ntp.drift

# Enable this if you want statistics to be logged.
#statsdir /var/log/ntpstats/

statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable

# Specify one or more NTP servers.

# Use servers from the NTP Pool Project. Approved by Ubuntu Technical Board
# on 2011-02-08 (LP: #104525). See http://www.pool.ntp.org/join.html for
# more information.
#pool 0.ubuntu.pool.ntp.org iburst
#pool 1.ubuntu.pool.ntp.org iburst
#pool 2.ubuntu.pool.ntp.org iburst
#pool 3.ubuntu.pool.ntp.org iburst

# Use Ubuntu's ntp server as a fallback.
#pool ntp.ubuntu.com

# Access control configuration; see /usr/share/doc/ntp-doc/html/accopt.html for
# details.  The web page <http://support.ntp.org/bin/view/Support/AccessRestrictions>
# might also be helpful.
#
# Note that "restrict" applies to both servers and clients, so a configuration
# that might be intended to block requests from certain clients could also end
# up blocking replies from your own upstream servers.

# By default, exchange time with everybody, but don't allow configuration.
restrict -4 default kod notrap nomodify nopeer noquery limited
restrict -6 default kod notrap nomodify nopeer noquery limited

# Local users may interrogate the ntp server more closely.
restrict 127.0.0.1
restrict ::1

# Needed for adding pool entries
restrict source notrap nomodify noquery


# Clients from this (example!) subnet have unlimited access, but only if
# cryptographically authenticated.
# Allow machines on the LAN to synchronize time with this server, but do not let them modify the server's time
#restrict 192.168.10.131 mask 255.255.255.0 nomodify notrust
restrict 192.168.10.129 mask 255.255.255.0 nomodify notrust
restrict 192.168.10.130 mask 255.255.255.0 nomodify notrust

# Allow upstream time servers to modify the local clock
#restrict times.aliyun.com nomodify
#restrict ntp.aliyun.com  nomodify
#restrict cn.pool.ntp.org nomodify 

# Define the time servers to synchronize with
server 192.168.10.131 prefer
#server times.aliyun.com iburst prefer    # prefer marks this server as the preferred synchronization source
#server ntp.aliyun.com iburst
#server cn.pool.ntp.org iburst

#logfile /var/log/ntpstats/ntpd.log    # ntp log directory
#pidfile  /var/run/ntp.pid    # pid file path

# If you want to provide time to your local subnet, change the next line.
# (Again, the address is an example only.)
#broadcast 192.168.123.255

# If you want to listen to time broadcasts on your local subnet, de-comment the
# next lines.  Please do this only if you trust everybody on the network!
#disable auth
#broadcastclient

#Changes recquired to use pps synchonisation as explained in documentation:
#http://www.ntp.org/ntpfaq/NTP-s-config-adv.htm#AEN3918

#server 127.127.8.1 mode 135 prefer    # Meinberg GPS167 with PPS
#fudge 127.127.8.1 time1 0.0042        # relative to PPS for my hardware

#server 127.127.22.1                   # ATOM(PPS)
#fudge 127.127.22.1 flag3 1            # enable PPS API
server 127.127.1.0
fudge 127.127.1.0 stratum 10
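Note that the file above mixes the master's role (the restrict lines for the slave addresses, plus the local-clock fallback server 127.127.1.0) with the slave's (server 192.168.10.131 prefer). A minimal sketch of what each slave's /etc/ntp.conf actually needs, assuming the addresses above:

driftfile /var/lib/ntp/ntp.drift
server 192.168.10.131 prefer    # sync from the master node
restrict 127.0.0.1              # allow local queries only
restrict ::1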

  • Start the ntp service and check the synchronization status (on ubuntu the service is named ntp, not ntpd)
service ntp start   # start the ntp service
ntpq -p             # observe the synchronization status
ntpstat             # check the synchronization result
  • On the slave nodes, sync once with the master host, then restart the service (ntpdate cannot bind the NTP port while ntpd is running, so stop it first)
/etc/init.d/ntp stop
ntpdate 192.168.10.131
/etc/init.d/ntp start
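To keep the slaves from drifting between restarts, a cron entry can re-sync them periodically; a sketch, assuming root's crontab (the -u flag uses an unprivileged port, so it works even while ntpd is running):

# crontab -e on slave1 and slave2: re-sync with the master every 30 minutes
*/30 * * * * /usr/sbin/ntpdate -u 192.168.10.131 >/dev/null 2>&1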

5. Install hadoop under the /opt/Data directory

  • Create a Data directory under /opt
cd /opt
mkdir Data
  • Download hadoop and extract it into /opt/Data
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
tar -zxvf hadoop-2.7.1.tar.gz -C /opt/Data
  • Configure the environment variables (append to /etc/profile and source it, as with the jdk); see the check after this block
# HADOOP
export HADOOP_HOME=/opt/Data/hadoop-2.7.1
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
export HADOOP_YARN_HOME=$HADOOP_HOME
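After sourcing /etc/profile, a quick sanity check:

source /etc/profile
hadoop version    # should report Hadoop 2.7.1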

6. Edit the hadoop configuration files

The files live under hadoop-2.7.1/etc/hadoop.

  • Edit hadoop-env.sh
export JAVA_HOME=/opt/jdk1.8.0_221
  • Edit core-site.xml
<configuration>
<!-- Set the nameservice for hdfs to ns1 -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1/</value>
    </property>

<!-- Set the hadoop temporary directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/Data/hadoop-2.7.1/tmp</value>
    </property>

<!-- Set the zookeeper quorum address (all three nodes run zookeeper, per zoo.cfg below) -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>master:2181,slave1:2181,slave2:2181</value>
    </property>

<!-- Raise the ipc retry count so a ConnectException is not thrown while the journalnode service is still coming up -->
    <property>
        <name>ipc.client.connect.max.retries</name>
        <value>100</value>
    <description>Indicates the number of retries a client will make to establish a server connection.</description>
    </property>
</configuration>
  • Edit hdfs-site.xml
<configuration>
<!-- Set the nameservice for hdfs to ns1; this must match core-site.xml -->
   <property>
      <name>dfs.nameservices</name>
      <value>ns1</value>
   </property>

<!-- ns1 has two NameNodes, nn1 and nn2 -->
   <property>
      <name>dfs.ha.namenodes.ns1</name>
      <value>nn1,nn2</value>
   </property>

<!-- RPC address of nn1 -->
   <property>
      <name>dfs.namenode.rpc-address.ns1.nn1</name>
      <value>master:9820</value>
   </property>

<!-- HTTP address of nn1 -->
   <property>
      <name>dfs.namenode.http-address.ns1.nn1</name>
      <value>master:9870</value>
   </property>

<!-- RPC address of nn2 -->
   <property>
      <name>dfs.namenode.rpc-address.ns1.nn2</name>
      <value>slave1:9820</value>
   </property>

<!-- HTTP address of nn2 -->
   <property>
      <name>dfs.namenode.http-address.ns1.nn2</name>
      <value>slave1:9870</value>
   </property>

<!-- Where the NameNode edit log is stored on the JournalNodes -->
   <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://master:8485;slave1:8485;slave2:8485/ns1</value>
   </property>

<!-- Where each JournalNode stores data on its local disk -->
   <property>
      <name>dfs.journalnode.edits.dir</name>
      <value>/opt/Data/hadoop-2.7.1/journal</value>
   </property>

<!-- Enable automatic NameNode failover -->
   <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
   </property>

<!-- Failover proxy provider used by clients to locate the active NameNode -->
   <property>
      <name>dfs.client.failover.proxy.provider.ns1</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
   </property>

<!-- Fencing methods; separate multiple methods with newlines, one method per line -->
   <property>
      <name>dfs.ha.fencing.methods</name>
      <value>
      sshfence
      shell(/bin/true)
     </value>
   </property>

<!-- The sshfence method requires passwordless ssh -->
   <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/root/.ssh/id_rsa</value>
   </property>

<!-- Timeout for the sshfence method -->
   <property>
      <name>dfs.ha.fencing.ssh.connect-timeout</name>
      <value>30000</value>
   </property>

   <!-- Directory where the NameNode stores its metadata; optional, defaults to a path under hadoop.tmp.dir if unset -->
   <property>
      <name>dfs.namenode.name.dir</name>
      <value>/opt/Data/hadoop-2.7.1/data/name</value>
   </property>

   <!-- Directory where the DataNode stores its data blocks; optional, defaults to a path under hadoop.tmp.dir if unset -->
   <property>
      <name>dfs.datanode.data.dir</name>
      <value>/opt/Data/hadoop-2.7.1/data/data</value>
   </property>

   <!-- Replication factor -->
   <property>
      <name>dfs.replication</name>
      <value>2</value>
   </property>

   <!-- Enable WebHDFS (the REST API on the NameNode and DataNodes) -->
   <property>
      <name>dfs.webhdfs.enabled</name>
      <value>true</value>
   </property>

</configuration>
  • Edit mapred-site.xml
Create it from the template (the file ships as mapred-site.xml.template):
cp mapred-site.xml.template mapred-site.xml


<configuration>
    <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
    </property>

</configuration>
  • Edit yarn-site.xml
<configuration>
<!-- Have the nodemanager load the shuffle server as its auxiliary service -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

<!-- Enable yarn high availability -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>

<!-- Alias (cluster id) of the yarn cluster -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster1</value>
  </property>

<!-- Names of the two resourcemanagers -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>

<!-- Host of rm1 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>master</value>
  </property>

<!-- Host of rm2 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>slave1</value>
  </property>

<!-- Zookeeper quorum address used by the resourcemanagers -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>master:2181,slave1:2181,slave2:2181</value>
  </property>

<!-- Disable the virtual-memory check for containers -->
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
    <description>Whether virtual memory limits will be enforced for containers</description>
  </property>

<!-- Ratio of virtual to physical memory for containers -->
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>8</value>
    <description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
  </property>
</configuration>
  • Edit the slaves file
master
slave1
slave2

7. Install and configure the zookeeper cluster

  • Download and extract zookeeper-3.4.8.tar.gz
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.4.8/zookeeper-3.4.8.tar.gz
tar -zxvf zookeeper-3.4.8.tar.gz -C /opt/Data
  • Configure the environment variables (in /etc/profile, as before)
#zookeeper
export ZOOKEEPER_HOME=/opt/Data/zookeeper-3.4.8
export PATH=$PATH:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin

Go into the conf directory and copy zoo_sample.cfg to zoo.cfg (the file ships with an underscore, not a hyphen):

cp zoo_sample.cfg zoo.cfg
  • Edit zoo.cfg (create the tmp directory under zookeeper-3.4.8 first)
dataDir=/opt/Data/zookeeper-3.4.8/tmp

server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
  • Create a myid file in the tmp directory; it holds this server's id (1 on master, 2 and 3 on the other hosts). A shell sketch follows this list.
vim myid

1
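Equivalently, the tmp directory and myid files can be created from the shell; a sketch using the paths above:

mkdir -p /opt/Data/zookeeper-3.4.8/tmp
echo 1 > /opt/Data/zookeeper-3.4.8/tmp/myid    # echo 2 on slave1, echo 3 on slave2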

8. Start the cluster

  • Copy the Data directory to the other two hosts (then fix each host's zookeeper myid to 2 and 3)
scp -r /opt/Data root@slave1:/opt
scp -r /opt/Data root@slave2:/opt
  • Start zookeeper on all three nodes (with zkServer.sh; the zkfc failover controllers are started later by start-all.sh)
zkServer.sh start
  • Start a journalnode on all three nodes (all three appear in the qjournal URI); the namenode cannot be formatted until the journalnodes are up
hadoop-daemon.sh start journalnode
  • Format the namenode on the master host
hdfs namenode -format
  • Copy the formatted name directory to the standby namenode
scp -r /opt/Data/hadoop-2.7.1/data/name root@slave1:/opt/Data/hadoop-2.7.1/data/
  • Format the failover state in zookeeper; run this on one namenode only (e.g. master)
hdfs zkfc -formatZK
  • Start the cluster; see the state checks after this list
start-all.sh
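Once start-all.sh finishes, the HA state can be checked from the command line (note that in Hadoop 2.x start-yarn.sh only starts the local resourcemanager, so rm2 on slave1 may need to be started by hand with yarn-daemon.sh start resourcemanager):

zkServer.sh status                   # leader on one zookeeper node, follower on the others
hdfs haadmin -getServiceState nn1    # one of nn1/nn2 should be active, the other standby
hdfs haadmin -getServiceState nn2
yarn rmadmin -getServiceState rm1    # likewise for rm1/rm2
yarn rmadmin -getServiceState rm2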
  • Check the ports
netstat -ntlup   # shows the ports occupied by the server processes
  • Check the processes with jps on each host
    master
    slave1
    slave2
  • View the cluster status in the web UI
    (NameNode IP address:9870)
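A simple failover test: stop the active namenode and confirm the standby takes over (a sketch, assuming nn1 on master is currently active):

hadoop-daemon.sh stop namenode       # on master: kill the active namenode
hdfs haadmin -getServiceState nn2    # should now report "active"
hadoop-daemon.sh start namenode      # master's namenode rejoins as standby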
