Writing CSV Output in Scala

Author: Reflection_ | Published 2018-03-04 05:01
    1. Spark RDD: strip the parentheses and write a single CSV file
       rating is a DataFrame; convert it to an RDD first.
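        // Turn each Row into "col0,col1": build a tuple, take its string form
        // "(col0,col1)", then strip the surrounding parentheses.
        val avgs = rating.rdd
          .map(t => (t(0), t(1)).toString().replaceAll("\\(", "").replaceAll("\\)", ""))
          .collect() // brings all rows to the driver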
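
    The tuple-plus-replaceAll trick also strips any parentheses that occur inside the values themselves. A minimal sketch of an alternative, assuming the fields are available as a plain sequence (the helper names toCsvLine and viaTuple and the sample values are hypothetical); on a Spark Row, t.mkString(",") should behave the same way:

```scala
// Hypothetical helper: joins fields with commas, so no parentheses are added.
def toCsvLine(fields: Seq[Any]): String = fields.mkString(",")

// The tuple-based approach from the text, kept here for comparison.
def viaTuple(a: Any, b: Any): String =
  (a, b).toString().replaceAll("\\(", "").replaceAll("\\)", "")

val plain = toCsvLine(Seq("B000123", 4.5))          // "B000123,4.5"
val withParens = toCsvLine(Seq("Item (used)", 4.5)) // parentheses in the value survive
val stripped = viaTuple("Item (used)", 4.5)         // parentheses in the value are lost
```

    Neither version quotes fields that contain commas, so both are only safe for simple values like the IDs and averages used here.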
    

    Output:

        printToFile(new File("Output/task1.csv")) {
          p => avgs.foreach(p.println) // note: this version writes no header row
        }
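
    The call above relies on a printToFile helper that is not shown at this point. A minimal sketch, assuming the helper only opens a PrintWriter, hands it to the caller, and closes it (this header-less version is an assumption; sampleRows and the temp file stand in for avgs and the real output path):

```scala
import java.io.{File, PrintWriter}
import scala.io.Source

// Assumed header-less helper: opens the file, runs op, always closes the writer.
def printToFile(f: File)(op: PrintWriter => Unit): Unit = {
  val p = new PrintWriter(f)
  try op(p) finally p.close()
}

// Stand-in data; in the text this would be the collected avgs array.
val sampleRows = Array("B000123,4.5", "B000456,3.0")
val out = File.createTempFile("task1", ".csv")

printToFile(out) { p => sampleRows.foreach(line => p.println(line)) }

// Read the file back to check what was written.
val src = Source.fromFile(out)
val written = try src.getLines().toList finally src.close()
```

    Passing the writer to a function keeps the open/close logic in one place, so the caller cannot forget to close the file.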
    
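    2. Writing the DataFrame directly produces a folder that contains the CSV part file and a _SUCCESS marker
        import org.apache.spark.sql.SaveMode

        // Creates a folder named Output/Firstname_Li_task1.csv
        val saveOptions = Map("header" -> "true", "path" -> "Output/Firstname_Li_task1.csv")
        rating.coalesce(1) // one partition, so a single part file
          .write.mode(SaveMode.Overwrite).format("csv")
          .options(saveOptions)
          .save()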
    
        // Shorter form; without coalesce(1) it writes one part file per partition
        rating.write.option("header", "true").csv("Output/Firstname_Li_task1.csv")
    
    
        // Spark 1.x with the external spark-csv package; Spark 2.0+ can use the built-in format("csv")
        rating.repartition(1)
          .write.mode(SaveMode.Overwrite).format("com.databricks.spark.csv")
          .option("header", "true")
          .save("Output/Firstname_Li_task2.csv")
    
        // Also creates a folder; each Row is written via toString, e.g. [a,b], with no header
        rating.toJavaRDD
          .coalesce(1)
          .saveAsTextFile("Firstname_Li_task1.csv")
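
    Each variant above leaves the data in a part-... file inside the output folder. One common follow-up is to move that part file to the desired file name. A minimal sketch using java.nio, assuming a local filesystem and a single part file (the helper name extractSingleCsv and all paths are hypothetical; on HDFS the Hadoop FileSystem API would be needed instead). The folder is simulated here so the sketch runs without Spark:

```scala
import java.io.File
import java.nio.file.{Files, StandardCopyOption}

// Hypothetical helper: moves the first part-* file found in a Spark output
// folder to `target`. Returns false if no part file is found.
def extractSingleCsv(outputDir: File, target: File): Boolean = {
  val parts = Option(outputDir.listFiles()).getOrElse(Array.empty[File])
    .filter(f => f.isFile && f.getName.startsWith("part-"))
  parts.headOption match {
    case Some(part) =>
      Files.move(part.toPath, target.toPath, StandardCopyOption.REPLACE_EXISTING)
      true
    case None => false
  }
}

// Simulate a Spark output folder (one part file plus the _SUCCESS marker).
val dir = Files.createTempDirectory("task1.csv").toFile
Files.write(new File(dir, "part-00000.csv").toPath,
  "asin,rating_avg\nB000123,4.5\n".getBytes("UTF-8"))
Files.write(new File(dir, "_SUCCESS").toPath, Array.empty[Byte])

val target = File.createTempFile("task1_single", ".csv")
val moved = extractSingleCsv(dir, target)
val content = new String(Files.readAllBytes(target.toPath), "UTF-8")
```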
    
    3. Writing a single CSV file with a header
        import java.io.{File, PrintWriter}

        // Opens f, writes the header row, runs op, and always closes the writer.
        def printToFile(f: File)(op: PrintWriter => Unit): Unit = {
          val p = new PrintWriter(f)
          try {
            p.write("asin,rating_avg\n") // header row
            op(p)
          } finally {
            p.close()
          }
        }
    
        // Same conversion as in item 1: one "col0,col1" string per row.
        val avgs = rating.rdd
          .map(t => (t(0), t(1)).toString().replaceAll("\\(", "").replaceAll("\\)", ""))
          .collect()

        printToFile(new File("Output/Firstname_Li_task1.csv")) {
          p => avgs.foreach(p.println) // data rows follow the header
        }
    
