Reading data from MongoDB with Spark and writing it to HDFS
Author: 枫隐_5f5f | Published 2019-06-12 18:22

from pyspark.sql import SparkSession
if __name__ == "__main__":
    spark = SparkSession.builder \
        .getOrCreate()

    # Connection string and collection name for the source MongoDB instance
    mongo_read_uri = "mongodb://user:passwd@ip:port/database_name"
    table = "table_name"

    # Load the collection through the MongoDB Spark connector
    device_statis_df = spark.read \
        .option("uri", mongo_read_uri) \
        .option("collection", table) \
        .format("com.mongodb.spark.sql") \
        .load()

    # Register a temp view so the data can be queried with Spark SQL
    device_statis_df.createOrReplaceTempView("devicestatistics")

    sql_str = """
        select * from devicestatistics
    """
    sqlDF = spark.sql(sql_str)

    # Write the result to HDFS as Parquet, overwriting any existing output
    sqlDF.repartition(10).write.format("parquet").mode("overwrite").save("/path/to/hdfs")

    print("Done ====")
    spark.stop()
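To sanity-check the job, a minimal sketch like the one below can be run afterwards. It is not part of the original post: the connector coordinates passed to spark.jars.packages are an example for Spark 2.x / Scala 2.11 and depend on your cluster, and the output path is the same placeholder used above. The sketch simply rebuilds a session and reads the Parquet output back from HDFS to confirm the write.

from pyspark.sql import SparkSession

# Assumption: connector coordinates are illustrative; adjust the version to
# match your Spark/Scala setup, or supply the jar via spark-submit --packages.
spark = SparkSession.builder \
    .config("spark.jars.packages",
            "org.mongodb.spark:mongo-spark-connector_2.11:2.3.2") \
    .getOrCreate()

# Read back the Parquet files written by the job above and inspect them.
written_df = spark.read.parquet("/path/to/hdfs")
written_df.printSchema()
print(written_df.count())

spark.stop()

Note that the connector package is only needed for the MongoDB read step; reading Parquet back from HDFS uses core Spark alone. In practice the connector is more commonly supplied with --packages on the spark-submit command line than in the builder config.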