It's been a long while since I last used Spark, but recent work called for it, and when submitting Spark jobs I noticed the jar I was building was always huge. I come from C++, and following the usual online advice I had always built jars in IDEA via File -> Project Structure -> Artifacts.
That route sometimes threw errors (META-INF problems and the like), which I managed to resolve, but the resulting jars came out huge, anywhere from 90 MB to 160 MB.
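As an aside, the META-INF errors you hit with fat jars are commonly the "Invalid signature file digest" kind, caused by signed dependency jars being merged; I don't remember exactly which variant I hit back then, but the usual workaround is stripping the signature files from the final jar (my-app.jar below is a placeholder name):

zip -d my-app.jar 'META-INF/*.SF' 'META-INF/*.DSA' 'META-INF/*.RSA'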
Today, when packaging up the job, I just used Maven's package goal directly. (Being a C++ guy I'm a bit out of date here; couldn't even build a package, embarrassing... anyway, worth writing down.)
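The packaging step itself is just the stock Maven goal; the tar.gz with a lib directory described below presumably comes from an assembly plugin already configured in the project's pom, which is project-specific:

mvn clean package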
Maven produced a corresponding tar.gz, and unpacking it revealed a lib directory full of jars. So when submitting the job, I just bring all of those jars along by adding a --jars parameter:
--jars ./lib/fastjson-1.2.39.jar,./lib/kafka-clients-0.10.0.1.jar,./lib/profiler-4.0.5.jar,./lib/sdk-2.3.jar,./lib/spark-core_2.11-2.1.0.jar,./lib/spark-hive_2.11-2.1.0.jar,./lib/spark-streaming_2.11-2.1.0.jar,./lib/spark-streaming-kafka-0-10_2.11-2.1.0.jar
The whole thing came to only about 20 MB, so this is how I'll do it from now on. The jars are joined with commas, and there must be no spaces around the commas. Of course with this many jars you're not going to type the names in one by one; a small shell script handles it, as sketched below.
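A minimal sketch of such a script, assuming the dependency jars sit in ./lib:

#!/usr/bin/env bash
# Join every jar under ./lib into one comma-separated list,
# with no spaces around the commas, as --jars requires.
JARS=$(ls ./lib/*.jar | tr '\n' ',' | sed 's/,$//')
echo "$JARS"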
In the end there are two things on the driver side: a lib directory (holding all the dependency jars) and my own packaged jar (the one Maven builds for the project itself, only a few tens of KB). Then submit with spark-submit.
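Put together, the submit command looks roughly like this; the master, deploy mode, and application jar name are placeholders for illustration (JavaReceiver is the main class from the stack trace below, package omitted):

#!/usr/bin/env bash
JARS=$(ls ./lib/*.jar | tr '\n' ',' | sed 's/,$//')
# --jars ships the dependency jars; the last argument is the small application jar.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class JavaReceiver \
  --jars "$JARS" \
  ./my-app.jar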
Then one problem showed up:
18/09/03 15:18:02 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on BJHTYD-Hope-27-34.hadoop.local:7949 (size: 2.2 KB, free: 2.8 GB)
18/09/03 15:18:03 WARN TaskSetManager: Lost task 7.0 in stage 0.0 (TID 1, BJHTYD-Hope-26-3.hadoop.local, executor 2): java.lang.NoClassDefFoundError: Could not initialize class.HbaseConnectionPool
at .JavaReceiver.buildHbaseClient(JavaReceiver.java:149)
at .JavaReceiver$1$1.call(JavaReceiver.java:75)
at .JavaReceiver$1$1.call(JavaReceiver.java:69)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:219)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1955)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1955)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
18/09/03 15:18:03 INFO TaskSetManager: Lost task 1.0 in stage 0.0 (TID 4) on BJHTYD-Hope-53-71.hadoop.local, executor 1: java.lang.NoClassDefFoundError (Could not initialize class Service.HbaseConnectionPool) [duplicate 1]
18/09/03 15:18:03 WARN TaskSetManager: Lost task 4.0 in stage 0.0 (TID 9, BJHTYD-Hope-53-71.hadoop.local, executor 1): java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/CellScannable
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at Service.HbaseConnectionPool.<clinit>(HbaseConnectionPool.java:27)
at JavaReceiver.buildHbaseClient(JavaReceiver.java:149)
at JavaReceiver$1$1.call(JavaReceiver.java:75)
at Receiver$1$1.call(JavaReceiver.java:69)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:219)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1955)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1955)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.CellScannable
Looking at the submit command, I realized no HBase jars were on it. I added the HBase jars to lib, listed them in --jars, resubmitted, and it succeeded.
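For the record, with the script above the fix is just dropping the HBase client-side jars into ./lib so they get picked up automatically. The exact jar names and versions depend on your cluster (CellScannable itself lives in hbase-common); something like:

# Copy the HBase client-side jars into lib; paths and versions are illustrative.
cp $HBASE_HOME/lib/hbase-client-*.jar   ./lib/
cp $HBASE_HOME/lib/hbase-common-*.jar   ./lib/
cp $HBASE_HOME/lib/hbase-protocol-*.jar ./lib/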