How to save a Spark DataFrame as CSV on disk


By 雨笋情缘 | Published 2019-03-15 10:10

    Apache Spark did not originally support native CSV output on disk; built-in support arrived in Spark 2.x.

    You have 4 available solutions though:

    1. You can convert your DataFrame into an RDD:

    Option 1:

    def convertToReadableString(r: Row) = ???

    df.rdd.map { convertToReadableString }.saveAsTextFile(filepath)

    This will create a folder at filepath. Under that path you'll find one file per partition (e.g. part-000*).

    What I usually do if I want to concatenate all the partitions into one big CSV is

    cat filePath/part* > mycsvfile.csv
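    A caveat with the plain cat approach: if the part files were written with a header row, every partition repeats it. Here is a hedged shell sketch that keeps only the first header; the sample part files and the /tmp/csvdemo path are fabricated so the example is self-contained:

```shell
# Sketch: merge Spark part files into one CSV while keeping a single header.
# The two sample part files below stand in for real df.write output.
rm -rf /tmp/csvdemo && mkdir -p /tmp/csvdemo
printf 'id,name\n1,alice\n' > /tmp/csvdemo/part-00000
printf 'id,name\n2,bob\n'   > /tmp/csvdemo/part-00001

out=/tmp/csvdemo/merged.csv
first=1
for f in /tmp/csvdemo/part-*; do
  if [ "$first" -eq 1 ]; then
    cat "$f" > "$out"          # first file: keep its header line
    first=0
  else
    tail -n +2 "$f" >> "$out"  # later files: skip the repeated header
  fi
done
cat "$out"
```

    With a header-free write (as in Option 1 above), the plain cat one-liner is all you need.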

    Some will use coalesce(1, false) to create one partition from the RDD. It's usually a bad practice, since it funnels all of the data you are collecting through a single task, which can overwhelm the node that ends up holding it.

    Note that df.rdd will return an RDD[Row].
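    A minimal sketch of what convertToReadableString might look like. The quoting rules here are an assumption, not part of the original answer; since Row.toSeq exposes a row's column values as Seq[Any], the helper below works on a plain Seq and can be tried without a Spark session:

```scala
// Hypothetical helper: turn one row's column values into a CSV line.
// In Spark you would use it as:
//   df.rdd.map(r => toCsvLine(r.toSeq)).saveAsTextFile(filepath)
def toCsvLine(values: Seq[Any]): String =
  values.map {
    case null => ""                                   // empty field for null
    case s: String if s.contains(",") || s.contains("\"") =>
      "\"" + s.replace("\"", "\"\"") + "\""           // quote and escape
    case v => v.toString                              // numbers, booleans, etc.
  }.mkString(",")
```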

    2. With Spark < 2.0, you can use the Databricks spark-csv library:

    Spark 1.4+:

    Option 2:

    df.write.format("com.databricks.spark.csv").save(filepath)

    Spark 1.3:

    Option 3:

    df.save(filepath, "com.databricks.spark.csv")

    3. With Spark 2.x the spark-csv package is not needed, as it's included in Spark:

    Option 4:

    df.write.format("csv").save(filepath)
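    The built-in writer also accepts the usual CSV options; a short sketch, assuming you want a header row (the option values are illustrative, not from the original answer):

```scala
df.write
  .format("csv")
  .option("header", "true")   // write column names as the first line
  .option("sep", ",")         // field separator (default is ",")
  .mode("overwrite")          // replace filepath if it already exists
  .save(filepath)
```

    This still produces a folder of part-* files, one per partition, like the other write paths.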

    4. You can convert to a local pandas DataFrame and use its to_csv method (PySpark only).

    Note: Solutions 1, 2 and 3 will result in CSV format files (part-*) generated by the underlying Hadoop API that Spark calls when you invoke save. You will have one part- file per partition.

    Saving as a .txt file

    Option 1:

    bank.rdd.repartition(1).saveAsTextFile("/tmp/df2.txt")

    Note: bank is a DataFrame.

    Original sources:

    https://stackoverflow.com/questions/33174443/how-to-save-a-spark-dataframe-as-csv-on-disk

    https://community.hortonworks.com/questions/42838/storage-dataframe-as-textfile-in-hdfs.html
