【Spark】Spark DataFrame Schema Conversion

Author: PowerMe | Published 2017-08-24 17:04 · 102 views

Suppose the original table's schema looks like this:


[image.png: screenshot of the original table's schema]

We now want to convert this DataFrame's schema to:
id: String
goods_name: String
price: Array<String>

  1. SQL conversion
    spark.sql("""
      create table speedup_tmp_test_spark_schema_parquet12 using parquet as
      select cast(id as string), cast(goods_name as string), cast(price as array<string>)
      from tmp_test_spark_schema_parquet
    """)
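What `cast(price as array<string>)` does is an element-wise conversion: every integer in the array becomes its string representation. A minimal plain-Scala sketch of that semantics (no Spark session; the sample values are hypothetical, not from the article's table):

```scala
// Plain-Scala sketch of Spark's array<int> -> array<string> cast semantics.
// The sample prices below are made-up illustration values.
object CastSketch {
  // Mimic one row's price column stored as array<int>
  val price: Seq[Int] = Seq(100, 250, 399)

  // cast(... as array<string>) turns each element into its string form
  val priceAsString: Seq[String] = price.map(_.toString)

  def main(args: Array[String]): Unit = {
    assert(priceAsString == Seq("100", "250", "399"))
    println(priceAsString.mkString(","))
  }
}
```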

  2. case class transformation
    case class newSchemaClass(id: String, goods_name: String, price: Array[String])

// Original DataFrame
val df = spark.sql("select * from tmp_test_spark_schema_parquet")

// New DataFrame: toDF() on an RDD of case classes needs the session's implicits
import spark.implicits._
val newDF = df.rdd.map { r =>
  newSchemaClass(r(0).toString, r(1).toString, r.getSeq[Int](2).map(_.toString).toArray)
}.toDF()

// Fetch a concrete value: the price column of the third row
newDF.collect()(2).getList[String](2)
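The per-row mapping in method 2 can be exercised without a cluster by simulating rows as tuples. A sketch of the same logic in plain Scala (the sample rows are hypothetical; the original table's data is not shown in the article):

```scala
// Plain-Scala sketch of method 2's row mapping, without a Spark session.
// The sample rows below are invented for illustration only.
object RowMapSketch {
  case class newSchemaClass(id: String, goods_name: String, price: Array[String])

  // Simulate rows of the original schema: (id: Int, goods_name: String, price: Seq[Int])
  val rows: Seq[(Int, String, Seq[Int])] = Seq(
    (1, "apple", Seq(100, 120)),
    (2, "pear",  Seq(80))
  )

  // Same shape as df.rdd.map { r => ... }: stringify id, map each price element
  val converted: Seq[newSchemaClass] = rows.map { case (id, name, price) =>
    newSchemaClass(id.toString, name, price.map(_.toString).toArray)
  }

  def main(args: Array[String]): Unit = {
    assert(converted.head.id == "1")
    assert(converted.head.price.sameElements(Array("100", "120")))
    println(converted.map(_.goods_name).mkString(","))
  }
}
```

The key detail is `getSeq[Int](2)`: Spark's `Row` hands the array column back as a `Seq[Int]`, which the map converts element by element before repacking into the case class.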


Source: https://www.haomeiwen.com/subject/rdyedxtx.html