黑猴子的家: Kylin Quick Start, Build Cube Errors

Author: 黑猴子的家 | Published: 2020-02-07 14:08

    1. Kylin build fails with a connection-refused error on port 10020

    1) Logs (拒绝连接 is the localized "Connection refused" message)

    org.apache.kylin.engine.mr.exception.MapReduceException: Exception: java.net.ConnectException: Call From dxt102/192.168.1.102 to 0.0.0.0:10020 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    java.net.ConnectException: Call From dxt102/192.168.1.102 to 0.0.0.0:10020 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:173)
        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:164)
    

    2) Solution: mapred-site.xml

        <!-- MapReduce JobHistory Server address; the default port is 10020 -->
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>hadoop102:10020</value>
        </property>
        <!-- MapReduce JobHistory Server web UI address; the default port is 19888 -->
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>hadoop102:19888</value>
        </property>
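
The `0.0.0.0:10020` target in the log is the built-in default of `mapreduce.jobhistory.address`, which suggests the client never picked up an explicit address. As a rough sketch, the target can be pulled out of the exception text and classified. The helper names `target_of` and `diagnose` are made up for this illustration.

```shell
#!/bin/sh
# Pull the "to host:port" target out of a Hadoop ConnectException message.
target_of() {
  printf '%s\n' "$1" | sed -n 's/.*Call From [^ ]* to \([^ ]*:[0-9][0-9]*\) failed.*/\1/p'
}

# A 0.0.0.0 target suggests mapreduce.jobhistory.address was left at its default.
diagnose() {
  case "$(target_of "$1")" in
    0.0.0.0:*) echo "jobhistory address not configured" ;;
    "")        echo "no target found" ;;
    *)         echo "address configured; check that the JobHistory Server is running" ;;
  esac
}

diagnose 'Call From dxt102/192.168.1.102 to 0.0.0.0:10020 failed on connection exception'
# prints: jobhistory address not configured
```

Once the address is set, the JobHistory Server itself also has to be running. On Hadoop 2.x it is started with `sbin/mr-jobhistory-daemon.sh start historyserver`.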
    

    2. Aggregation is not enabled. Try the nodemanager at hadoop104:42370

    1) Logs

    Aggregation is not enabled. Try the nodemanager at hadoop104:42370
    

    2) Solution
    Enable log aggregation in yarn-site.xml:

    <property>
        <name>yarn.log-aggregation-enable</name>  
        <value>true</value>  
    </property>
    

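    Refresh the nodes. Note that -refreshNodes only re-reads the include/exclude host lists; for the yarn-site.xml change itself to take effect, YARN (ResourceManager and NodeManagers) normally has to be restarted.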

    [alex@hadoop102 hadoop-2.8.2]$ bin/hdfs dfsadmin -refreshNodes
    [alex@hadoop102 hadoop-2.8.2]$ bin/yarn rmadmin -refreshNodes
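
To confirm that the setting is really present in the file YARN reads, a small awk helper can look the property up. This is only a sketch: `get_prop` is a hypothetical helper, it assumes `<name>` and `<value>` each sit on their own line as in the fragment above, and the demo file is a throwaway temp file.

```shell
#!/bin/sh
# Print the <value> that follows a given <name> in a Hadoop-style XML file.
# Assumes <name> and <value> are each on their own line.
get_prop() {
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, ""); sub(/<\/value>.*/, ""); print; exit
    }
  ' "$1"
}

# Demo against a throwaway copy of the yarn-site.xml fragment.
demo=$(mktemp)
cat > "$demo" <<'EOF'
<configuration>
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
</configuration>
EOF
get_prop "$demo" yarn.log-aggregation-enable   # prints: true
rm -f "$demo"
```

With aggregation active, the logs of a finished application can be fetched with `yarn logs -applicationId <application id>`.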
    

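    3. org.apache.hadoop.hbase.ipc.CallTimeoutException

    Tight memory, unsynchronized clocks across nodes, and similar problems can bring the HBase cluster down. Raise the timeouts to add tolerance: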

    <property>
        <name>hbase.rpc.timeout</name>
        <value>600000</value>
    </property>
    <property>
        <name>hbase.client.operation.timeout</name>
        <value>600000</value>
    </property>
    <property>
        <name>hbase.client.scanner.timeout.period</name>
        <value>600000</value>
    </property>
    <property>
        <name>hbase.regionserver.lease.period</name>
        <value>600000</value>
    </property>
    <property>
        <name>phoenix.query.timeoutMs</name>
        <value>600000</value>
    </property>
    <property>
        <name>phoenix.query.keepAliveMs</name>
        <value>600000</value>
    </property>
    <property>
        <name>hbase.client.ipc.pool.type</name>
        <value>RoundRobinPool</value>
    </property>
    <property>
        <name>hbase.client.ipc.pool.size</name>
        <value>10</value>
    </property>
    

    4. java.io.FileNotFoundException

    1) Error (是一个目录 is the localized "Is a directory" message)
    java.io.FileNotFoundException: /opt/module/hadoop-2.8.2/logs/userlogs/application_1580972908133_0001/container_1580972908133_0001_01_000001 (是一个目录)
    2) Solution
    Restart the build.

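    5. org.apache.hadoop.hbase.client.RetriesExhaustedException

    The first error: failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=xxxxx, waitTime=xxxxx, operationTimeout=5000 expired
    If you are not familiar with HBase, this is hard to diagnose at a glance. Luckily a veteran on the team pointed out that HBase's RPC parameters were involved, and a Baidu search filled in the rest.
    In HBase, RPC is the communication link among the RegionServer, the Master, and clients (such as the HBase shell and the Java client API).
    From this we concluded that the call had waited too long and exceeded the configured threshold.
    The HBase parameter hbase.rpc.timeout defaults to 1 minute. After discussing it with the veteran we raised it to 10 minutes and restarted the cluster. The large historical-data runs then finished without the error, so the problem was solved.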

    java.lang.RuntimeException: 
    org.apache.kylin.job.exception.PersistentException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=1, exceptions:
    Thu Feb 06 16:28:55 CST 2020, RpcRetryingCaller{globalStartTime=1580977730735, pause=100, retries=1}, java.io.IOException: Call to hadoop104/192.168.2.104:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=6486, waitTime=5001, operationTimeout=5000 expired.
    
        at org.apache.kylin.job.execution.ExecutableManager.addJobInfo(ExecutableManager.java:528)
        at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:163)
    

    Solution: raise the timeout to 10 minutes in hbase-site.xml

        <property>
            <name>hbase.rpc.timeout</name>
            <value>600000</value>
        </property>
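
The property value is in milliseconds, so 10 minutes is 600000. A small sketch of that conversion, plus pulling the limit that was actually hit out of the log line. The function names are illustrative only.

```shell
#!/bin/sh
# Convert minutes to the millisecond value HBase timeout properties expect.
minutes_to_ms() {
  echo $(( $1 * 60 * 1000 ))
}

# Extract the operationTimeout (ms) reported in a CallTimeoutException message.
op_timeout_of() {
  printf '%s\n' "$1" | sed -n 's/.*operationTimeout=\([0-9][0-9]*\).*/\1/p'
}

minutes_to_ms 10    # prints: 600000
op_timeout_of 'Call id=6486, waitTime=5001, operationTimeout=5000 expired.'   # prints: 5000
```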
    

    6. Kylin build fails at step 17: 17 Step Name: Build Cube In-Mem

    The required MAP capability is more than the supported max container capability in the cluster. Killing the Job. mapResourceRequest: <memory:3072, vCores:1> maxContainerCapability:<memory:1024, vCores:2>
    Job received Kill while in RUNNING state.

    [alex@hadoop102 conf]$ pwd
    /opt/module/kylin/conf
    
    [alex@hadoop102 conf]$ vim kylin_job_conf_inmem.xml
    <!--Additional config for in-mem cubing, giving mapper more memory -->
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>1024</value>
        <description></description>
    </property>
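
The error says each mapper asks for 3072 MB while the largest container the cluster will grant is 1024 MB, so the request has to fit under that cap. The setting above lowers the request. Raising the cluster's maximum container size (for instance via `yarn.scheduler.maximum-allocation-mb`) would be the other direction. A sketch that extracts both numbers from the message and compares them, where `mem_after` and `fits` are hypothetical helper names:

```shell
#!/bin/sh
# Return the memory figure that follows a given label in the YARN kill message.
mem_after() {
  printf '%s\n' "$2" | sed -n "s/.*$1: *<memory:\([0-9][0-9]*\),.*/\1/p"
}

# Succeed when the mapper request fits inside the largest allowed container.
fits() {
  req=$(mem_after mapResourceRequest "$1")
  max=$(mem_after maxContainerCapability "$1")
  [ "$req" -le "$max" ]
}

msg='mapResourceRequest: <memory:3072, vCores:1> maxContainerCapability:<memory:1024, vCores:2>'
if fits "$msg"; then echo "request fits"; else echo "request exceeds container cap"; fi
# prints: request exceeds container cap
```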
    

    7. KeeperException$SessionExpiredException

    org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master

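    1) Cause
    By default the HBase process triggers GC when old-generation occupancy reaches 90%. This percentage is set with -XX:CMSInitiatingOccupancyFraction=N. A concurrent mode failure happens in the following scenario: when the old generation reaches 90%, CMS starts a concurrent collection while the young generation keeps promoting objects into the old generation at a high rate. If the old generation fills up before CMS finishes concurrent marking, CMS has no memory left to work with, has to abandon marking, and triggers a stop-the-world pause of the whole JVM (all threads suspended). It then cleans up all garbage with a single-threaded copying collection, that is, a full GC.
    Our bulk job starts with frequent table drops and creations, which use up a lot of young-generation memory on the master, so the scenario above occurs and a full GC follows. When such a pause lasts longer than the ZooKeeper session timeout, the session expires, which is the error shown above.

    2) Solution
    Set CMSInitiatingOccupancyFraction to 70 so that CMS starts when the old generation is about 70% full. Full GCs then no longer occur, or occur only rarely.

    3) Step 1
    Open the file with vim $HBASE_HOME/conf/hbase-env.sh and find export HBASE_OPTS. Above it, add export HBASE_LOG_DIR=${HBASE_HOME}/logs (so that the variable is defined before it is used), then set HBASE_OPTS: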

    export HBASE_OPTS="$HBASE_OPTS -verbose:gc -Xloggc:$HBASE_LOG_DIR/hbase.gc.log -XX:ErrorFile=$HBASE_LOG_DIR/hs_err_pid.log -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=70"
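
Each JVM flag has to be a single token without whitespace. A stray space inside a flag (for example between `-XX:+` and the option name) leaves a bare word on the command line that the JVM will not parse as intended. A rough sanity check, assuming none of the paths contain spaces. `check_opts` is a made-up name.

```shell
#!/bin/sh
# Report any whitespace-separated token that does not look like a JVM flag.
check_opts() {
  bad=0
  for tok in $1; do          # unquoted on purpose: split on whitespace
    case "$tok" in
      -*) ;;                 # looks like a flag
      *)  echo "suspicious token: $tok"; bad=1 ;;
    esac
  done
  return $bad
}

check_opts "-verbose:gc -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70" && echo "ok"
# prints: ok
```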
    

    4) Step 2: raise the ZooKeeper session timeouts in zoo.cfg

    [alex@hadoop102 conf]$ pwd
    /opt/module/zookeeper-3.4.10/conf
    [alex@hadoop102 conf]$ vim zoo.cfg
    tickTime=600000
    maxClientCnxns=60
    minSessionTimeout=600000
    maxSessionTimeout=6000000
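
For context, ZooKeeper derives its default session-timeout bounds from `tickTime` (minimum 2 x tickTime, maximum 20 x tickTime) unless `minSessionTimeout` and `maxSessionTimeout` are set explicitly, and a client's requested timeout is clamped into that range. A small sketch of the arithmetic follows. The function names are illustrative, and 2000 ms is only ZooKeeper's commonly shipped sample tickTime, not a value taken from this article.

```shell
#!/bin/sh
# Default session-timeout bounds (ms) that ZooKeeper derives from tickTime.
default_bounds() {
  echo "$(( $1 * 2 )) $(( $1 * 20 ))"
}

# Clamp a requested session timeout into [min, max], as the server does when negotiating.
clamp() {
  req=$1; min=$2; max=$3
  if [ "$req" -lt "$min" ]; then echo "$min"
  elif [ "$req" -gt "$max" ]; then echo "$max"
  else echo "$req"
  fi
}

default_bounds 2000            # prints: 4000 40000
clamp 600000 4000 40000        # prints: 40000
```

This is why a long timeout requested on the HBase side only takes effect if ZooKeeper's maximum allows it. Since tickTime also drives heartbeats and the initLimit/syncLimit windows, it may be preferable to keep tickTime small and raise only maxSessionTimeout.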
    

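    8. java.lang.RuntimeException: HRegionServer Aborted

    java.lang.RuntimeException: HRegionServer Aborted

    Solution: set hbase.coprocessor.abortonerror to false so that a coprocessor that fails to load or throws an error does not abort the RegionServer:
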
    <property>
        <name>hbase.coprocessor.abortonerror</name>
        <value>false</value>
    </property>
    

    9. hive's usability failed

    ERROR: Check hive's usability failed, please check the status of your cluster
    bin/kylin.sh start

    Article link: https://www.haomeiwen.com/subject/bdymxhtx.html