Spark UDF reports "Task not serializable"

Author: 南修子 | Published 2020-06-15 18:37

A Java UDF whose closure referenced a javax.script.ScriptEngineManager failed on YARN with the following error:
    20/06/08 16:41:06 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 327.2 KB, free 912.0 MB)
    20/06/08 16:41:06 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 30.1 KB, free 912.0 MB)
    20/06/08 16:41:06 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.42.76:35893 (size: 30.1 KB, free: 912.3 MB)
    20/06/08 16:41:06 INFO spark.SparkContext: Created broadcast 0 from checkpoint at DataProcessingNew.java:323
    20/06/08 16:41:07 INFO codegen.CodeGenerator: Code generated in 351.641059 ms
    20/06/08 16:41:07 ERROR yarn.ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Task not serializable
    org.apache.spark.SparkException: Task not serializable
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
        at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:2094)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:840)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:839)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
        at org.apache.spark.rdd.RDD.mapPartitionsWithIndex(RDD.scala:839)
        at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:371)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)
        at org.apache.spark.sql.Dataset.checkpoint(Dataset.scala:512) 
    
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:646)
    Caused by: java.io.NotSerializableException: javax.script.ScriptEngineManager
    Serialization stack:
        - object not serializable (class: javax.script.ScriptEngineManager, value: javax.script.ScriptEngineManager@78aa31f2) 
        - field (class: org.apache.spark.sql.UDFRegistration$$anonfun$register$26, name: f$21, type: interface org.apache.spark.sql.api.java.UDF2)
        - object (class org.apache.spark.sql.UDFRegistration$$anonfun$register$26, <function1>)
        - field (class: org.apache.spark.sql.UDFRegistration$$anonfun$register$26$$anonfun$apply$2, name: $outer, type: class org.apache.spark.sql.UDFRegistration$$anonfun$register$26)
        - object (class org.apache.spark.sql.UDFRegistration$$anonfun$register$26$$anonfun$apply$2, <function2>)
        - field (class: org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$3, name: func$3, type: interface scala.Function2)
        - object (class org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$3, <function1>)
        - field (class: org.apache.spark.sql.catalyst.expressions.ScalaUDF, name: f, type: interface scala.Function1)
        - object (class org.apache.spark.sql.catalyst.expressions.ScalaUDF, UDF(input[2, double, true], 3*x+2))
        - element of array (index: 0)
        - array (class [Ljava.lang.Object;, size 2)
        - field (class: org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8, name: references$1, type: class [Ljava.lang.Object;)
        - object (class org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8, <function2>)
        at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
        at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
        ... 26 more
    20/06/08 16:41:07 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: org.apache.spark.SparkException: Task not serializable)
    20/06/08 16:41:07 INFO spark.SparkContext: Invoking stop() from shutdown hook
    

Fix: have the UDF class implement java.io.Serializable. Note that javax.script.ScriptEngineManager itself is not serializable (it is the root cause in the serialization stack above), so the engine must not travel with the task closure: hold it in a transient field, or construct it inside the UDF's call method, so that each executor builds its own instance.
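A minimal sketch of that fix (class and method names here are hypothetical, not from the original job): the function object implements Serializable, while the non-serializable ScriptEngine lives in a transient field rebuilt lazily after deserialization. In the real Spark job this class would additionally implement org.apache.spark.sql.api.java.UDF2 and be registered via spark.udf().register.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

// Hypothetical sketch: keep the closure serializable by excluding the
// engine from serialization and recreating it on the executor.
class ExprUdf implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: skipped by Java serialization, so Spark's ClosureCleaner
    // no longer trips over javax.script.ScriptEngineManager
    private transient ScriptEngine engine;

    private ScriptEngine engine() {
        if (engine == null) {
            // Rebuilt lazily after deserialization on each executor.
            // Which engines exist ("js"/Nashorn etc.) depends on the JDK.
            engine = new ScriptEngineManager().getEngineByName("js");
        }
        return engine;
    }

    // Same shape as UDF2<String, Double, Double>.call, e.g. expr = "3*x+2"
    public Double call(String expr, Double x) throws Exception {
        engine().put("x", x);
        return ((Number) engine().eval(expr)).doubleValue();
    }

    public static void main(String[] args) throws Exception {
        // Prove the object survives the same serialization round trip
        // Spark performs before shipping the task to executors.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(new ExprUdf());
        ExprUdf copy = (ExprUdf) new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();
        System.out.println("serializable ok: " + (copy != null));
    }
}
```

The same effect can be had by constructing the ScriptEngineManager inside call(), at the cost of re-creating it per invocation; the transient-plus-lazy-init field is the usual compromise.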


Source: https://www.haomeiwen.com/subject/unwjtktx.html