Hadoop Series - HBase Database (II)

Author: 汤太咸啊 | Published 2022-04-01 18:01

    1. The Hadoop HBase Problem

    A few days ago I wrote up the HBase installation, but since then I have found that after entering the shell via hbase shell, the status command always fails:

    #enter the hbase shell
    hbase shell
    #check the hbase status
    status
    #it always fails with the following error
    ERROR: Can't get master address from ZooKeeper; znode data == null
    

    After three days of troubleshooting, it finally turned out that Hadoop 2.7.0 was too old, and I needed to install the newer Hadoop 2.9.2. Below we reinstall Hadoop and HBase from scratch; think of it as a review.

    2. Cleaning Up the Old Services and Installing Java

    #stop the old hadoop services and delete the old hadoop directory
    cd /usr/local/hadoop/sbin
    ./stop-all.sh
    cd ../..
    rm -rf ./hadoop
    
    #edit the yum repository configuration
    vi /etc/yum.repos.d/CentOS-Base.repo
    #change the contents to the following
    [base]
    name=CentOS-$releasever - Base
    #mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os&infra=$infra
    #baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
    baseurl=https://mirrors.tuna.tsinghua.edu.cn/centos-vault/6.8/os/x86_64/
    gpgcheck=1
    #gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
    gpgkey=https://mirrors.tuna.tsinghua.edu.cn/centos-vault/6.8/os/x86_64/RPM-GPG-KEY-CentOS-6
    
    #released updates
    [updates]
    name=CentOS-$releasever - Updates
    #mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates&infra=$infra
    #baseurl=http://mirror.centos.org/centos/$releasever/updates/$basearch/
    baseurl=https://mirrors.tuna.tsinghua.edu.cn/centos-vault/6.8/updates/x86_64/
    gpgcheck=1
    #gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
    gpgkey=https://mirrors.tuna.tsinghua.edu.cn/centos-vault/6.8/os/x86_64/RPM-GPG-KEY-CentOS-6
    
    #additional packages that may be useful
    [extras]
    name=CentOS-$releasever - Extras
    #mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=extras&infra=$infra
    #baseurl=http://mirror.centos.org/centos/$releasever/extras/$basearch/
    baseurl=https://mirrors.tuna.tsinghua.edu.cn/centos-vault/6.8/extras/x86_64/
    gpgcheck=1
    #gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
    gpgkey=https://mirrors.tuna.tsinghua.edu.cn/centos-vault/6.8/os/x86_64/RPM-GPG-KEY-CentOS-6
    
    #additional packages that extend functionality of existing packages
    [centosplus]
    name=CentOS-$releasever - Plus
    #mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=centosplus&infra=$infra
    #baseurl=http://mirror.centos.org/centos/$releasever/centosplus/$basearch/
    baseurl=https://mirrors.tuna.tsinghua.edu.cn/centos-vault/6.8/centosplus/x86_64/
    gpgcheck=1
    enabled=0
    #gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
    gpgkey=https://mirrors.tuna.tsinghua.edu.cn/centos-vault/6.8/os/x86_64/RPM-GPG-KEY-CentOS-6
    
    #contrib - packages by Centos Users
    [contrib]
    name=CentOS-$releasever - Contrib
    #mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=contrib&infra=$infra
    #baseurl=http://mirror.centos.org/centos/$releasever/contrib/$basearch/
    baseurl=https://mirrors.tuna.tsinghua.edu.cn/centos-vault/6.8/contrib/x86_64/
    gpgcheck=1
    enabled=0
    #gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
    gpgkey=https://mirrors.tuna.tsinghua.edu.cn/centos-vault/6.8/os/x86_64/RPM-GPG-KEY-CentOS-6
    
    #relax HTTPS verification by adding the following line, which skips SSL certificate checks
    vi /etc/yum.conf
    sslverify=false
    
    #install the JDK
    yum install java-1.8.0-openjdk.x86_64
    
    #edit the environment variables in .bash_profile
    vi ~/.bash_profile
    #add the following to .bash_profile
    export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk.x86_64
    PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
    export PATH
    
    #edit the environment variables in .bashrc
    vi ~/.bashrc
    #add the following to .bashrc
    export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk.x86_64
    PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
    export PATH
    
    #reload the environment variables
    source ~/.bashrc
    source ~/.bash_profile
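Before moving on, it may be worth verifying that the JAVA_HOME path set above really contains a Java runtime; a wrong path here makes the Hadoop daemons fail much later with a confusing error. The check_java_home helper below is my own sketch, not part of Hadoop:

```shell
# check_java_home: succeed only if the directory contains an executable bin/java
check_java_home() {
  [ -x "$1/bin/java" ]
}

# usage, with the OpenJDK path configured in .bash_profile above:
check_java_home /usr/lib/jvm/jre-1.8.0-openjdk.x86_64 \
  && echo "JAVA_HOME looks good" \
  || echo "no bin/java under that path"
```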
    

    3. Configuring Hadoop

    #first copy the hadoop and hbase tarballs into the docker container (these commands run on my Mac, the host)
    docker cp Downloads/hadoop-2.9.2.tar.gz master:/usr/local
    docker cp Downloads/hbase-1.7.1-bin.tar.gz master:/usr/local
    #unpack hadoop and rename the directory
    cd /usr/local/
    tar -xzvf hadoop-2.9.2.tar.gz
    mv hadoop-2.9.2 hadoop
    
    #enter the hadoop configuration directory
    cd /usr/local/hadoop/etc/hadoop/
    
    #edit core-site.xml
    vi core-site.xml
      <configuration>
          <property>
              <name>fs.defaultFS</name>
              <value>hdfs://master:9000</value>
          </property>
          <property>
            <name>hadoop.tmp.dir</name>
            <value>/hadoop/tmp</value>
          </property>
      </configuration>
      
    #edit hdfs-site.xml
    vi hdfs-site.xml
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/hadoop/data</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/hadoop/name</value>
      </property>
    </configuration>
    
    #edit mapred-site.xml
    vi mapred-site.xml
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
      </property>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>
    
    #back up and edit yarn-site.xml
    cp yarn-site.xml yarn-site.xml.bak
    rm yarn-site.xml
    vi yarn-site.xml
    <configuration>
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
      </property>
      <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
      </property>
    </configuration>
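One stray angle bracket in any of these files will make the daemons exit on startup with a parse error, so before formatting HDFS it can help to confirm that every edited file is still well-formed XML. The check_xml helper below is my own sketch and assumes python3 is on the PATH:

```shell
# check_xml: exit non-zero unless the file parses as well-formed XML
check_xml() {
  python3 -c 'import sys, xml.dom.minidom as m; m.parse(sys.argv[1])' "$1" 2>/dev/null
}

# usage, from /usr/local/hadoop/etc/hadoop/:
for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
  check_xml "$f" && echo "$f: ok" || echo "$f: broken or missing"
done
```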
    
    #create the directories referenced by the configuration above
    mkdir -p /hadoop/name
    mkdir -p /hadoop/data
    mkdir -p /hadoop/tmp
    
    #run from /usr/local/hadoop/bin/ to verify that initialization works
    cd /usr/local/hadoop/bin/
    ./hadoop namenode -format
    
    #start the cluster
    cd ../sbin
    ./start-all.sh
    
    #check with jps that all the services started successfully
    jps
    223 DataNode
    612 ResourceManager
    129 NameNode
    704 NodeManager
    1725 Jps
    388 SecondaryNameNode
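Instead of eyeballing the jps listing, the expected daemons can be checked mechanically. check_daemons is my own helper, not a Hadoop tool; it simply greps the jps output for each expected process name:

```shell
# check_daemons: report which expected Hadoop daemons are absent from jps output
check_daemons() {
  local out="$1" missing=""
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    echo "$out" | grep -qw "$d" || missing="$missing $d"
  done
  [ -z "$missing" ] && echo "all daemons running" || echo "missing:$missing"
}

check_daemons "$(jps 2>/dev/null)"
```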
    

    4. Configuring HBase

    #unpack hbase (the tarball was already copied into docker in the Hadoop section)
    tar -xvzf hbase-1.7.1-bin.tar.gz
    
    #edit the configuration file hbase-site.xml
    cd /usr/local/hbase-1.7.1/conf
    vi hbase-site.xml
    <configuration>
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://localhost:9000/hbase</value>
      </property>
      <property>
        <name>hbase.tmp.dir</name>
        <value>./tmp</value>
      </property>
      <property>
        <name>hbase.unsafe.stream.capability.enforce</name>
        <value>false</value>
      </property>
      <property>
        <name>hbase.master.info.port</name>
        <value>16010</value>
      </property>
      <property>
        <name>hbase.regionserver.port</name>
        <value>16201</value>
      </property>
      <property>
        <name>hbase.regionserver.info.port</name>
        <value>16301</value>
      </property>
    </configuration>
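With ports and paths spread over several files, it is easy to lose track of what is actually configured. The get_prop helper below is my own sketch (assuming python3) for pulling a single property value out of any Hadoop/HBase-style XML file:

```shell
# get_prop FILE NAME: print the <value> of the named <property>, if present
get_prop() {
  python3 - "$1" "$2" <<'EOF'
import sys
import xml.etree.ElementTree as ET

root = ET.parse(sys.argv[1]).getroot()
for p in root.findall('property'):
    if p.findtext('name') == sys.argv[2]:
        print(p.findtext('value'))
        break
EOF
}

# usage, from the conf directory:
#   get_prop hbase-site.xml hbase.rootdir   # → hdfs://localhost:9000/hbase
```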
    
    #edit the configuration file hbase-env.sh
    vi hbase-env.sh
    export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk.x86_64
    
    #configure the hbase environment variables in .bash_profile
    vi ~/.bash_profile
    export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk.x86_64
    export HADOOP_HOME=/usr/local/hadoop
    export HBASE_HOME=/usr/local/hbase-1.7.1
    PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin
    export PATH
    source ~/.bash_profile
    
    #edit the environment variables in .bashrc
    vi ~/.bashrc
    #add the following to .bashrc
    export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk.x86_64
    export HADOOP_HOME=/usr/local/hadoop
    export HBASE_HOME=/usr/local/hbase-1.7.1
    PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin
    export PATH
    source ~/.bashrc
    
    #create the hbase directory on HDFS, matching the hbase.rootdir path configured in hbase-site.xml above
    hadoop fs -mkdir /hbase
    
    #start hbase from its bin directory
    cd /usr/local/hbase-1.7.1/bin
    ./start-hbase.sh
    
    #check with jps: HMaster, HRegionServer, and HQuorumPeer now appear, so the start succeeded
    jps
    3505 DataNode
    7711 HMaster
    8276 Jps
    7824 HRegionServer
    3699 SecondaryNameNode
    3952 NodeManager
    7611 HQuorumPeer
    3857 ResourceManager
    3408 NameNode
    
    #now back to the original problem: run hbase shell to enter the shell
    hbase shell
    #at the shell prompt
    hbase(main):001:0> status
    1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load
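For scripting a health check, the live-server count can be extracted from that status line; extract_servers below is my own sed sketch, not an HBase command:

```shell
# extract_servers: print the region-server count from an `hbase shell` status line
extract_servers() {
  echo "$1" | sed -n 's/.*, \([0-9][0-9]*\) servers.*/\1/p'
}

extract_servers "1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load"
# → 1
```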
    

    5. Common HBase Commands

    #check the HBase status
    hbase(main):001:0> status
    1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load
    #show the user currently operating HBase
    hbase(main):002:0> whoami
    root (auth:SIMPLE)
        groups: root
    #list all tables
    hbase(main):003:0> list
    TABLE 
    0 row(s) in 0.0950 seconds
    
    => []
    #create a table
    hbase(main):004:0> create 'census','personal','professional'
    0 row(s) in 1.6170 seconds
    
    => Hbase::Table - census
    #list tables again; the newly created census table appears
    hbase(main):005:0> list
    TABLE 
    census
    1 row(s) in 0.0170 seconds
    
    => ["census"]
    #inspect the census table with describe
    hbase(main):006:0> describe 'census'
    Table census is ENABLED               
    census
    COLUMN FAMILIES DESCRIPTION           
    {NAME => 'personal', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE =>
     '0'} 
    {NAME => 'professional', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOP
    E => '0'}                             
    2 row(s) in 0.1050 seconds
    #count the rows in census
    hbase(main):007:0> count 'census'
    0 row(s) in 0.1060 seconds
    
    => 0
    #insert data; the arguments in order are: table name 'census', row key 1, column family personal, column qualifier name, and value Zhang San
    hbase(main):009:0> put 'census', 1,'personal:name','Zhang San'
    0 row(s) in 0.1590 seconds
    #insert the marital status into the personal column family
    hbase(main):010:0> put 'census', 1,'personal:marital_stauts','unmarried'
    0 row(s) in 0.0090 seconds
    #scan the table: both cells share row key 1, so they belong to the same row; note that every cell carries a timestamp recording when it was written, which HBase uses to distinguish versions
    hbase(main):011:0> scan 'census'
    ROW        COLUMN+CELL                
     1         column=personal:marital_stauts, timestamp=1648028740145, value=unmarried             
     1         column=personal:name, timestamp=1648028690316, value=Zhang San                       
    1 row(s) in 0.0260 seconds
    #insert more data
    hbase(main):012:0> put 'census', 1,'personal:gender','male'
    0 row(s) in 0.0120 seconds
    hbase(main):014:0> put 'census', 1,'professional:employed','yes'
    0 row(s) in 0.0130 seconds
    hbase(main):016:0> put 'census', 1,'professional:education_level','high scholl'
    0 row(s) in 0.0070 seconds
    hbase(main):017:0> put 'census', 1,'professional:filed','construction'
    0 row(s) in 0.0110 seconds
    #scan the table
    hbase(main):018:0> scan 'census'
    ROW        COLUMN+CELL                
     1         column=personal:gender, timestamp=1648030369641, value=male                          
     1         column=personal:marital_stauts, timestamp=1648028740145, value=unmarried             
     1         column=personal:name, timestamp=1648028690316, value=Zhang San                       
     1         column=professional:education_level, timestamp=1648030509236, value=high scholl      
     1         column=professional:employed, timestamp=1648030420105, value=yes                     
     1         column=professional:filed, timestamp=1648030551921, value=construction  
    #insert a row with a new key
    hbase(main):019:0> put 'census', 3,'personal:name','li si'
    0 row(s) in 0.0280 seconds
    hbase(main):020:0> put 'census', 3,'personal:marital_stauts','married'
    0 row(s) in 0.0150 seconds
    hbase(main):021:0> put 'census', 3,'personal:spouse','zhang san'
    0 row(s) in 0.0140 seconds
    hbase(main):016:0> put 'census', 3,'professional:education_level','middle scholl'
    0 row(s) in 0.5720 seconds
    #a scan now shows two rows
    hbase(main):002:0>  scan 'census'
    ROW        COLUMN+CELL                
     1         column=personal:gender, timestamp=1648030369641, value=male                          
     1         column=personal:marital_stauts, timestamp=1648028740145, value=unmarried             
     1         column=personal:name, timestamp=1648028690316, value=Zhang San                       
     1         column=professional:education_level, timestamp=1648030509236, value=high scholl      
     1         column=professional:employed, timestamp=1648030420105, value=yes                     
     1         column=professional:filed, timestamp=1648030551921, value=construction               
     3         column=personal:marital_stauts, timestamp=1648030685792, value=married               
     3         column=personal:name, timestamp=1648030680907, value=li si                           
     3         column=personal:spouse, timestamp=1648030690084, value=zhang san                     
     3         column=professional:education_level, timestamp=1648030880173, value=middle scholl    
    2 row(s) in 0.1200 seconds
    #update a value
    hbase(main):003:0>  put 'census', 3,'professional:education_level','high scholl'
    0 row(s) in 0.0260 seconds
    #the education level has changed to high school
    hbase(main):004:0>  scan 'census'
    ROW        COLUMN+CELL                
     1         column=personal:gender, timestamp=1648030369641, value=male                          
     1         column=personal:marital_stauts, timestamp=1648028740145, value=unmarried             
     1         column=personal:name, timestamp=1648028690316, value=Zhang San                       
     1         column=professional:education_level, timestamp=1648030509236, value=high scholl      
     1         column=professional:employed, timestamp=1648030420105, value=yes                     
     1         column=professional:filed, timestamp=1648030551921, value=construction               
     3         column=personal:marital_stauts, timestamp=1648030685792, value=married               
     3         column=personal:name, timestamp=1648030680907, value=li si                           
     3         column=personal:spouse, timestamp=1648030690084, value=zhang san                     
     3         column=professional:education_level, timestamp=1648031013249, value=high scholl      
    2 row(s) in 0.0810 seconds
    #finally, add a row with key 2
    hbase(main):005:0> put 'census', 2,'personal:name','wang wu'
    0 row(s) in 0.0200 seconds
    #the scan shows the ROW keys in ascending order
    hbase(main):006:0>  scan 'census'
    ROW        COLUMN+CELL                
     1         column=personal:gender, timestamp=1648030369641, value=male                          
     1         column=personal:marital_stauts, timestamp=1648028740145, value=unmarried             
     1         column=personal:name, timestamp=1648028690316, value=Zhang San                       
     1         column=professional:education_level, timestamp=1648030509236, value=high scholl      
     1         column=professional:employed, timestamp=1648030420105, value=yes                     
     1         column=professional:filed, timestamp=1648030551921, value=construction               
     2         column=personal:name, timestamp=1648031084938, value=wang wu                         
     3         column=personal:marital_stauts, timestamp=1648030685792, value=married               
     3         column=personal:name, timestamp=1648030680907, value=li si                           
     3         column=personal:spouse, timestamp=1648030690084, value=zhang san                     
     3         column=professional:education_level, timestamp=1648031013249, value=high scholl      
    3 row(s) in 0.0600 seconds
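One caveat about that ordering: HBase sorts row keys lexicographically as byte strings, not numerically, so a row key of 10 would come before 2. Plain sort with LC_ALL=C reproduces the same byte-wise ordering:

```shell
# HBase row keys sort as byte strings; byte-wise `sort` shows the same ordering
printf '1\n10\n2\n3\n' | LC_ALL=C sort
# → 1
# → 10
# → 2
# → 3
```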
    #a count confirms there are now three rows
    hbase(main):007:0> count 'census'
    3 row(s) in 0.0270 seconds
    
    => 3
    #get the row with key 1
    hbase(main):008:0> get 'census',1
    COLUMN     CELL                       
     personal:gender                     timestamp=1648030369641, value=male                  
     personal:marital_stauts             timestamp=1648028740145, value=unmarried             
     personal:name                       timestamp=1648028690316, value=Zhang San             
     professional:education_level        timestamp=1648030509236, value=high scholl           
     professional:employed               timestamp=1648030420105, value=yes                   
     professional:filed                  timestamp=1648030551921, value=construction          
    1 row(s) in 0.0700 seconds
    #get only the name column of row 1
    hbase(main):011:0> get 'census',1,'personal:name'
    COLUMN     CELL                       
     personal:name                       timestamp=1648028690316, value=Zhang San             
    1 row(s) in 0.0400 seconds
    #get the name and employed columns of row 1
    hbase(main):012:0> get 'census',1,'personal:name','professional:employed'
    COLUMN     CELL                       
     personal:name                       timestamp=1648028690316, value=Zhang San             
     professional:employed               timestamp=1648030420105, value=yes                   
    1 row(s) in 0.0150 seconds
    #scan all rows, showing only the name column
    hbase(main):014:0> scan 'census',{COLUMN=>['personal:name']}
    ROW        COLUMN+CELL                
     1         column=personal:name, timestamp=1648028690316, value=Zhang San                       
     2         column=personal:name, timestamp=1648031084938, value=wang wu                         
     3         column=personal:name, timestamp=1648030680907, value=li si                           
    3 row(s) in 0.0150 seconds
    #the same scan, limited to one row
    hbase(main):015:0> scan 'census',{COLUMN=>['personal:name'],LIMIT=>1}
    ROW        COLUMN+CELL                
     1         column=personal:name, timestamp=1648028690316, value=Zhang San                       
    1 row(s) in 0.0210 seconds
    #the same scan, returning one row starting from row key 2
    hbase(main):017:0> scan 'census',{COLUMN=>['personal:name'],LIMIT=>1,STARTROW=>"2"}
    ROW        COLUMN+CELL                
     2         column=personal:name, timestamp=1648031084938, value=wang wu                         
    1 row(s) in 0.0250 seconds
    #delete a cell
    hbase(main):019:0> delete 'census',1,'personal:marital_stauts'
    0 row(s) in 0.0540 seconds
    #row 1 no longer has a marital status
    hbase(main):021:0> get 'census',1
    COLUMN     CELL                       
     personal:gender                     timestamp=1648030369641, value=male                  
     personal:name                       timestamp=1648028690316, value=Zhang San             
     professional:education_level        timestamp=1648030509236, value=high scholl           
     professional:employed               timestamp=1648030420105, value=yes                   
     professional:filed                  timestamp=1648030551921, value=construction          
    1 row(s) in 0.0180 seconds
    #a table must be disabled before it can be dropped
    hbase(main):023:0> disable 'census'
    0 row(s) in 2.3830 seconds
    #if you change your mind, simply enable it again
    hbase(main):024:0> enable 'census'
    0 row(s) in 1.3250 seconds
    #dropping a table that is still enabled fails
    hbase(main):025:0> drop 'census'
    
    ERROR: Table census is enabled. Disable it first.
    
    Here is some help for this command:
    Drop the named table. Table must first be disabled:
      hbase> drop 't1'
      hbase> drop 'ns1:t1'
    #disable the table before dropping it
    hbase(main):026:0> disable 'census'
    0 row(s) in 2.2280 seconds
    #drop the table
    hbase(main):027:0> drop 'census'
    0 row(s) in 1.3040 seconds
    #list again; the table is gone
    hbase(main):028:0> list
    TABLE 
    0 row(s) in 0.0290 seconds
    
    => []
    

    This part is finally done. I ended up rebuilding Hadoop several times before the problem was solved; in the end it came down to a few configuration mistakes that only showed up after going over everything again and again. The posts I found online were all misleading and no help at all.

    Thanks for reading, and thanks for taking a moment to like this post. Below are the articles I have written before, in case you would like to keep reading.

    Previous Articles

    Hadoop Series - Getting Started: Installation
    Hadoop Series - HDFS Commands
    Hadoop Series - Hive Installation
    Hadoop Series - Common Hive SQL Commands
    Hadoop Series - HBase Database
    Hadoop Series - HBase Database (II)
    Hadoop Series - HBase Database: Java
    Hadoop Series - Spark Installation and HelloWorld
    Hadoop Series - A Small MapReduce Example
    Hadoop Series - A Small Spark Example
    Java Interview Digest (V) Databases (1)
    Java Interview Digest (V) Databases (2)
    Java Interview Digest (V) Databases (3)
    Java Interview Digest (IV) JVM (1)
    Java Interview Digest (IV) JVM (2)
    Java Interview Digest (IV) JVM (3)
    Java Interview Digest (III) Collections (1)
    Java Interview Digest (III) Collections (2)
    Java Interview Digest (III) Collections (3)
    Java Interview Digest (III) Collections (4)
    Java Interview Digest (II) Multithreading (1)
    Java Interview Digest (II) Multithreading (2)
    Java Interview Digest (II) Multithreading (3)
    Java Interview Digest (II) Multithreading (4)
    Java Interview Digest (II) Multithreading (5)
    Java Interview Digest (II) Multithreading (6)
    Java Interview Digest (II) Multithreading (7)
    Java Interview Digest (I) Java Fundamentals

    Original link: https://www.haomeiwen.com/subject/rrgajrtx.html