Spark UDF reports "Task not serializable"

Author: 南修子 | Published 2020-06-15 18:37

A Java UDF whose closure referenced a javax.script.ScriptEngineManager failed on YARN with the following error:
    20/06/08 16:41:06 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 327.2 KB, free 912.0 MB)
    20/06/08 16:41:06 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 30.1 KB, free 912.0 MB)
    20/06/08 16:41:06 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.42.76:35893 (size: 30.1 KB, free: 912.3 MB)
    20/06/08 16:41:06 INFO spark.SparkContext: Created broadcast 0 from checkpoint at DataProcessingNew.java:323
    20/06/08 16:41:07 INFO codegen.CodeGenerator: Code generated in 351.641059 ms
    20/06/08 16:41:07 ERROR yarn.ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Task not serializable
    org.apache.spark.SparkException: Task not serializable
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
        at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:2094)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:840)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:839)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
        at org.apache.spark.rdd.RDD.mapPartitionsWithIndex(RDD.scala:839)
        at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:371)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)
        at org.apache.spark.sql.Dataset.checkpoint(Dataset.scala:512) 
    
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:646)
    Caused by: java.io.NotSerializableException: javax.script.ScriptEngineManager
    Serialization stack:
        - object not serializable (class: javax.script.ScriptEngineManager, value: javax.script.ScriptEngineManager@78aa31f2) 
        - field (class: org.apache.spark.sql.UDFRegistration$$anonfun$register$26, name: f$21, type: interface org.apache.spark.sql.api.java.UDF2)
        - object (class org.apache.spark.sql.UDFRegistration$$anonfun$register$26, <function1>)
        - field (class: org.apache.spark.sql.UDFRegistration$$anonfun$register$26$$anonfun$apply$2, name: $outer, type: class org.apache.spark.sql.UDFRegistration$$anonfun$register$26)
        - object (class org.apache.spark.sql.UDFRegistration$$anonfun$register$26$$anonfun$apply$2, <function2>)
        - field (class: org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$3, name: func$3, type: interface scala.Function2)
        - object (class org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$3, <function1>)
        - field (class: org.apache.spark.sql.catalyst.expressions.ScalaUDF, name: f, type: interface scala.Function1)
        - object (class org.apache.spark.sql.catalyst.expressions.ScalaUDF, UDF(input[2, double, true], 3*x+2))
        - element of array (index: 0)
        - array (class [Ljava.lang.Object;, size 2)
        - field (class: org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8, name: references$1, type: class [Ljava.lang.Object;)
        - object (class org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8, <function2>)
        at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
        at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
        ... 26 more
    20/06/08 16:41:07 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: org.apache.spark.SparkException: Task not serializable)
    20/06/08 16:41:07 INFO spark.SparkContext: Invoking stop() from shutdown hook
    

Fix: have the UDF class implement java.io.Serializable. Note that javax.script.ScriptEngineManager itself is not serializable (it is the root cause in the serialization stack above), so the engine must not travel with the task closure: hold it in a transient field, or construct it inside the UDF's call method, so that each executor builds its own instance.
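A minimal sketch of that fix (class and method names here are hypothetical, not from the original job): the function object implements Serializable, while the non-serializable ScriptEngine lives in a transient field rebuilt lazily after deserialization. In the real Spark job this class would additionally implement org.apache.spark.sql.api.java.UDF2 and be registered via spark.udf().register.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

// Hypothetical sketch: keep the closure serializable by excluding the
// engine from serialization and recreating it on the executor.
class ExprUdf implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: skipped by Java serialization, so Spark's ClosureCleaner
    // no longer trips over javax.script.ScriptEngineManager
    private transient ScriptEngine engine;

    private ScriptEngine engine() {
        if (engine == null) {
            // Rebuilt lazily after deserialization on each executor.
            // Which engines exist ("js"/Nashorn etc.) depends on the JDK.
            engine = new ScriptEngineManager().getEngineByName("js");
        }
        return engine;
    }

    // Same shape as UDF2<String, Double, Double>.call, e.g. expr = "3*x+2"
    public Double call(String expr, Double x) throws Exception {
        engine().put("x", x);
        return ((Number) engine().eval(expr)).doubleValue();
    }

    public static void main(String[] args) throws Exception {
        // Prove the object survives the same serialization round trip
        // Spark performs before shipping the task to executors.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(new ExprUdf());
        ExprUdf copy = (ExprUdf) new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();
        System.out.println("serializable ok: " + (copy != null));
    }
}
```

The same effect can be had by constructing the ScriptEngineManager inside call(), at the cost of re-creating it per invocation; the transient-plus-lazy-init field is the usual compromise.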


Source: https://www.haomeiwen.com/subject/unwjtktx.html