Kylo Troubleshooting Notes (1)
Getting unexpected error in Data Transformation: "Unable to instantiate SparkSession with LLAP".
The Kylo table page also does not work with a Hive ZooKeeper service-discovery URI (a workaround sketch follows).
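A hedged workaround for the ZooKeeper discovery limitation: point hive.datasource.url in /opt/kylo/kylo-services/conf/application.properties (the same property shown later in these notes) directly at a single HiveServer2 host. The hosts below are hypothetical examples.
#hive.datasource.url=jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
hive.datasource.url=jdbc:hive2://hs2-host.example.com:10000/default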
Spark configuration
cp /etc/hive/conf/hive-site.xml /etc/spark/conf/hive-site.xml
# Snappy isn't working well for Spark on Cloudera
echo "spark.io.compression.codec=lz4" >> /etc/spark/conf/spark-defaults.conf
Issues after installing Kylo and NiFi
- Firewall disabled:
systemctl stop firewalld.service
- Ports 8400 and 8079 open (a sketch covering both ports follows this list):
iptables -A INPUT -p tcp --dport 8400 -j ACCEPT
- MySQL has a kylo database and a kylo user with the correct grants
- spark-shell starts normally
- The relevant user has permission to operate on Hive (on clusters using Sentry)
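A minimal sketch for opening both ports with iptables (assuming iptables is in use rather than firewalld; persisting the rules depends on your distribution):
# 8400 is the Kylo UI; 8079 is the NiFi port in Kylo's default setup
iptables -A INPUT -p tcp --dport 8400 -j ACCEPT
iptables -A INPUT -p tcp --dport 8079 -j ACCEPT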
Issues
Directories and permissions
If the following directories are not created automatically after installation, check permissions or create them manually and assign the appropriate ownership.
hdfs dfs -mkdir /user/kylo
hdfs dfs -chown kylo:kylo /user/kylo
hdfs dfs -mkdir /user/nifi
hdfs dfs -chown nifi:nifi /user/nifi
hdfs dfs -mkdir /etl
hdfs dfs -chown nifi:nifi /etl
hdfs dfs -mkdir /model.db
hdfs dfs -chown nifi:nifi /model.db
hdfs dfs -mkdir /archive
hdfs dfs -chown nifi:nifi /archive
hdfs dfs -mkdir -p /app/warehouse
hdfs dfs -chown nifi:nifi /app/warehouse
The local /tmp directory also needs appropriate permissions; if /tmp/kylo-nifi/ cannot be read and written, errors may occur.
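A minimal sketch for preparing the local directory (owner and mode are assumptions; match them to your kylo/nifi service users):
mkdir -p /tmp/kylo-nifi
chown nifi:nifi /tmp/kylo-nifi
chmod 777 /tmp/kylo-nifi  # permissive mode so both the kylo and nifi users can read and write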
Elasticsearch indexes
EsIndexException in Kylo services logs
Problem
Kylo services log contains errors similar to this: org.modeshape.jcr.index.elasticsearch.EsIndexException: java.io.IOException: Not Found
Solution
Pre-create the indexes used by Kylo in Elasticsearch. Execute this script: /opt/kylo/bin/create-kylo-indexes-es.sh
The script takes 4 parameters.
<host> <rest-port> <num-shards> <num-replicas>
Example values:
host: localhost
rest-port: 9200
num-shards: 1
num-replicas: 1
Note: num-shards and num-replicas can be set to 1 for a development environment.
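Putting the example values together, the invocation would be:
/opt/kylo/bin/create-kylo-indexes-es.sh localhost 9200 1 1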
Spark default compression codec
Error message: UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
cp /etc/hive/conf/hive-site.xml /etc/spark/conf/hive-site.xml
# Snappy isn't working well for Spark on Cloudera
echo "spark.io.compression.codec=lz4" >> /etc/spark/conf/spark-defaults.conf
return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
hdfs dfs -mkdir /user/dladmin
hdfs dfs -chown dladmin:dladmin /user/dladmin
- Add the setting mapreduce.job.reduces=1 to hive-site.xml (see the snippet below).
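A sketch of the corresponding hive-site.xml entry:
<property>
  <name>mapreduce.job.reduces</name>
  <value>1</value>
</property>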
Default paths assume HDP
In the NiFi templates, most flows default to HDP-style paths, so they must be changed to the correct paths for the current environment; otherwise Not Found errors will occur.
In the data_transformation template, the script generated by Prepare Script is broken
This is not fully confirmed: either the conditional in Prepare Script is wrong, or the environment is misconfigured, causing the generated script to contain both insertInto() and partitionBy(). The current workaround is to comment out the relevant code:
if (!isPreFeed && (sparkVersion == null || sparkVersion == "1")) {
    // the following line has been commented out as the workaround
    //script = script + ".partitionBy(\"processing_dttm\")"
}
Importing data with the template
The data-type detection in File Filter is not entirely accurate and can break downstream processing, so when creating a Feed, verify the real data type of every field.
Source vs. target data types
In the NiFi flow, the data format defined in the Feed is the target format, produced by ETL from the source data; the source and target formats must therefore match, or an error is thrown.
This error occurs because spark-shell fails to start.
Checking the configuration shows it was caused by failed Kerberos principal authentication for the nifi service.
This error is caused by failure to connect to HiveServer2:
[root@kylo2 soft]# beeline
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Beeline version 1.1.0-cdh5.15.0 by Apache Hive
beeline>
beeline> !connect jdbc:hive2://10.88.88.120:10000/default;principal=hive/kylo1.hypers.cc@KYLO.CC
scan complete in 2ms
Connecting to jdbc:hive2://10.88.88.120:10000/default;principal=hive/kylo1.hypers.cc@KYLO.CC
18/09/30 17:10:54 [main]: ERROR transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
... 35 more
Unknown HS2 problem when communicating with Thrift server.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://10.88.88.120:10000/default;principal=hive/kylo1.hypers.cc@KYLO.CC: GSS initiate failed (state=08S01,code=0)
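"Failed to find any Kerberos tgt" means the client has no valid ticket. A hedged fix before retrying beeline (the keytab path is hypothetical; the principal is the one from the JDBC URI above):
kinit -kt /etc/security/keytabs/hive.service.keytab hive/kylo1.hypers.cc@KYLO.CC
klist  # verify the ticket is now cached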
Hive connection failure
From kylo-service.log:
2018-10-12 13:56:54 ERROR http-nio-8420-exec-6:ConnectionPool:182 - Unable to create initial connections of pool.
java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://10.88.88.120:10000/default: java.net.ConnectException: Connection refused (Connection refused)
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:215)
at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:163)
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:248)
at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:190)
... 147 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
... 150 more
Solution
vim /opt/kylo/kylo-services/conf/application.properties
hive.datasource.driverClassName=org.apache.hive.jdbc.HiveDriver
hive.datasource.url=jdbc:hive2://10.88.88.120:10000/default
hive.datasource.username=hive
hive.datasource.password=hive
hive.datasource.validationQuery=show tables 'test'
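After editing the properties, restart the service and watch the log (a sketch; kylo-services is the standard Kylo init script, and the log path is the Kylo default):
service kylo-services restart
tail -f /var/log/kylo-services/kylo-services.log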
Issue
From kylo-service.log:
2018-10-12 14:07:17 ERROR http-nio-8420-exec-5:ThrowableMapper:43 - toResponse() caught throwable
org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLInvalidAuthorizationSpecException: Could not connect: Access denied for user 'kylo'@'10.88.88.122' (using password: NO)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLInvalidAuthorizationSpecException: Could not connect: Access denied for user 'kylo'@'10.88.88.122' (using password: NO)
at org.mariadb.jdbc.internal.util.ExceptionMapper.get(ExceptionMapper.java:135)
at org.mariadb.jdbc.internal.util.ExceptionMapper.getException(ExceptionMapper.java:101)
at org.mariadb.jdbc.internal.util.ExceptionMapper.throwException(ExceptionMapper.java:91)
at org.mariadb.jdbc.Driver.connect(Driver.java:109)
at org.apache.tomcat.jdbc.pool.PooledConnection.connectUsingDriver(PooledConnection.java:307)
at org.apache.tomcat.jdbc.pool.PooledConnection.connect(PooledConnection.java:200)
at org.apache.tomcat.jdbc.pool.ConnectionPool.createConnection(ConnectionPool.java:710)
at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:644)
at org.apache.tomcat.jdbc.pool.ConnectionPool.init(ConnectionPool.java:466)
at org.apache.tomcat.jdbc.pool.ConnectionPool.<init>(ConnectionPool.java:143)
at org.apache.tomcat.jdbc.pool.DataSourceProxy.pCreatePool(DataSourceProxy.java:115)
at org.apache.tomcat.jdbc.pool.DataSourceProxy.createPool(DataSourceProxy.java:102)
at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:126)
at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111)
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77)
... 126 more
Caused by: org.mariadb.jdbc.internal.util.dao.QueryException: Could not connect: Access denied for user 'kylo'@'10.88.88.122' (using password: NO)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.authentication(AbstractConnectProtocol.java:557)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.handleConnectionPhases(AbstractConnectProtocol.java:499)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connect(AbstractConnectProtocol.java:384)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connectWithoutProxy(AbstractConnectProtocol.java:825)
at org.mariadb.jdbc.internal.util.Utils.retrieveProxy(Utils.java:469)
at org.mariadb.jdbc.Driver.connect(Driver.java:104)
... 137 more
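The "(using password: NO)" part shows Kylo is connecting to MySQL/MariaDB without a password. A hedged sketch of the fix (the password and grant scope are examples; spring.datasource.* are Kylo's metadata-store keys in /opt/kylo/kylo-services/conf/application.properties):
# grant access from the Kylo host and set a password
mysql -u root -p -e "GRANT ALL PRIVILEGES ON kylo.* TO 'kylo'@'10.88.88.122' IDENTIFIED BY 'kylo'; FLUSH PRIVILEGES;"
# then set the matching credentials in application.properties
spring.datasource.username=kylo
spring.datasource.password=kylo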
Visual Query
hive.datasource.driverClassName=org.apache.hive.jdbc.HiveDriver
hive.datasource.url=jdbc:hive2://10.16.4.68:10000/default;principal=hive/kylo-cdh5.cs1cloud.internal@CS1CLOUD.INTERNAL
hive.datasource.username=hive
hive.datasource.password=123456
hive.datasource.validationQuery=show tables 'test'
...
##Also, the Cloudera url should be /metastore instead of /hive
hive.metastore.datasource.driverClassName=org.mariadb.jdbc.Driver
hive.metastore.datasource.url=jdbc:mysql://10.16.4.68:3306/hive
#hive.metastore.datasource.url=jdbc:mysql://10.16.4.68:3306/hive
hive.metastore.datasource.username=metastore
hive.metastore.datasource.password=hive123
hive.metastore.datasource.validationQuery=SELECT 1
hive.metastore.datasource.testOnBorrow=true
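A quick connectivity check for the metastore datasource configured above (plain mysql client, credentials from the snippet):
mysql -h 10.16.4.68 -P 3306 -u metastore -phive123 hive -e 'SELECT 1'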
The Hive principal can reuse the nifi principal.
Give the nifi keytab 777 permissions (chmod 777).
Set Hive impersonation to false (see the snippet below).
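Assuming "impersonation" here refers to HiveServer2's hive.server2.enable.doAs setting (the usual knob on CDH), the hive-site.xml entry would be:
<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>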
2018-11-23 15:30:46,289 ERROR [Timer-Driven Process Thread-2] c.t.nifi.v2.ingest.HdiMergeTable HdiMergeTable[id=0b301887-5de0-3dce-725d-b6e81225713d] Unable to execute merge doMerge for StandardFlowFileRecord[uuid=c822a690-7d3e-4a1d-8a28-d18dbc35e18d,claim=,offset=0,name=701401108440816,size=0] due to java.lang.RuntimeException: Failed to execute query; routing to failure: java.lang.RuntimeException: Failed to execute query
java.lang.RuntimeException: Failed to execute query
at com.thinkbiganalytics.ingest.TableMergeSyncSupport.doExecuteSQL(TableMergeSyncSupport.java:759)
at com.thinkbiganalytics.ingest.HdiTableMergeSyncSupport.doRolling(HdiTableMergeSyncSupport.java:97)
at com.thinkbiganalytics.nifi.v2.ingest.HdiMergeTable.onTrigger(HdiMergeTable.java:267)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1147)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:175)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
Verify that Hive's hive.metastore connection settings are correct.
In the Hive logs, identify the user that has not been added, then create it on HDFS with the appropriate permissions (a sketch follows).
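A sketch following the same pattern as the directory setup above, with a hypothetical missing user someuser:
hdfs dfs -mkdir -p /user/someuser
hdfs dfs -chown someuser:someuser /user/someuser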
echo "spark.io.compression.codec=lz4" >> /etc/spark/conf/spark-defaults.conf
</PERFLOG method=releaseLocks start=1544077817490 end=1544077817541 duration=51 from=org.apache.hadoop.hive.ql.Driver>
2018-12-06 14:30:17,611 ERROR org.apache.hive.service.cli.operation.Operation: [HiveServer2-Background-Pool: Thread-67]: Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:400)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:238)
at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:89)
at org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:301)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
at org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:314)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
2018-12-06 14:30:17,627 INFO org.apache.hive.service.cli.operation.OperationManager: [HiveServer2-Handler-Pool: Thread-54]: Closing operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=6dc999b0-deb5-43cc-9b54-a63945f81bc3]
Issue
Compression codec com.hadoop.compression.lzo.LzoCodec not found
2019-01-11 16:28:21,995 ERROR [Timer-Driven Process Thread-7] o.apache.nifi.processors.hadoop.PutHDFS PutHDFS[id=10f86baf-cc95-32df-4f6a-95237b99d4ed] Failed to write to HDFS due to java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:139)
at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:180)
at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor.getCompressionCodec(AbstractHadoopProcessor.java:415)
Solution:
Processor "Archive Originals":
Additional Classpath Resources: /usr/lib/hadoop-lzo/lib/hadoop-lzo.jar
Processor "Upload to HDFS":
Additional Classpath Resources: /usr/lib/hadoop-lzo/lib/hadoop-lzo.jar
Transform Exception: An error occurred while initializing the Spark Shell.
Solution: after restarting Kylo, click Visual Query in the UI and wait patiently... only run the query once the following appears in kylo-spark-shell.log:
...Successfully registered client.
...
INFO SparkShellApp: Started SparkShellApp in 156.081 seconds (JVM running for 156.983)
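To watch for those lines after the restart, a hedged one-liner (the log path is the Kylo default; adjust if yours differs):
tail -f /var/log/kylo-services/kylo-spark-shell.log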