
Creating a DataFrame in Spark

Author: iE简 | Published 2020-12-15 18:26

Creating from a list

# -*- coding: utf-8 -*-
from pyspark.sql import SparkSession

spark = SparkSession\
    .builder\
    .master("local")\
    .appName("create df")\
    .getOrCreate()

# The list elements are lists
list1 = [["Bom", 20, 97.6, 165],
         ["Alice", 23, 90.0, 160]]

df1 = spark.createDataFrame(list1, ["name", "age", "weight", "height"])
df1.show()
# +-----+---+------+------+
# | name|age|weight|height|
# +-----+---+------+------+
# |  Bom| 20|  97.6|   165|
# |Alice| 23|  90.0|   160|
# +-----+---+------+------+

# The list elements are tuples
list2 = [("Bom", 20, 97.6, 165),
         ("Alice", 23, 90.0, 160)]

df2 = spark.createDataFrame(list2, ["name", "age", "weight", "height"])
df2.show()
# +-----+---+------+------+
# | name|age|weight|height|
# +-----+---+------+------+
# |  Bom| 20|  97.6|   165|
# |Alice| 23|  90.0|   160|
# +-----+---+------+------+

df2_no_header = spark.createDataFrame(list2)
df2_no_header.show()
# +-----+---+----+---+
# |   _1| _2|  _3| _4|
# +-----+---+----+---+
# |  Bom| 20|97.6|165|
# |Alice| 23|90.0|160|
# +-----+---+----+---+

# The list elements are dicts
list3 = [{"name": "Bom", "age": 20, "weight": 97.6, "height": 165},
         {"name": "Alice", "age": 23, "weight": 90.0, "height": 160}]

df3 = spark.createDataFrame(list3)
df3.show()
# +---+------+-----+------+
# |age|height| name|weight|
# +---+------+-----+------+
# | 20|   165|  Bom|  97.6|
# | 23|   160|Alice|  90.0|
# +---+------+-----+------+

A DataFrame can be created from a list whose elements are either lists or tuples (a list of dicts also works, as shown above). If no column names are supplied, Spark auto-generates _1, _2, and so on.
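
The column names passed above are purely positional, and Spark infers the column types from the data. If you also want to fix the types, createDataFrame accepts an explicit schema built from StructType/StructField. A minimal sketch, reusing list2 and the spark session from above:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DoubleType

# Explicit schema: fixes both the column names and the column types
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True),
    StructField("weight", DoubleType(), True),
    StructField("height", IntegerType(), True),
])

df4 = spark.createDataFrame(list2, schema)
df4.printSchema()
# root
#  |-- name: string (nullable = true)
#  |-- age: integer (nullable = true)
#  |-- weight: double (nullable = true)
#  |-- height: integer (nullable = true)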

Creating from a JSON file

The JSON file people.json:

{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}

The Spark code:

# -*- coding: utf-8 -*-
from pyspark.sql import SparkSession

spark = SparkSession\
    .builder\
    .master("local")\
    .appName("create df from json")\
    .getOrCreate()

df = spark.read.json("file:///Users/zhi/Documents/pycharm/spark_project/spark_test/people.json")
df.show()

# +----+-------+
# | age|   name|
# +----+-------+
# |null|Michael|
# |  30|   Andy|
# |  19| Justin|
# +----+-------+
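
By default spark.read.json infers the schema from the data, which is why age comes back as a nullable numeric column and Michael's missing age shows up as null. If you want the types pinned down (and want to skip the inference pass), you can give the reader a schema up front. A small sketch, reusing the same people.json path:

from pyspark.sql.types import StructType, StructField, StringType, LongType

people_schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", LongType(), True),
])

# .schema(...) tells the reader the column types instead of letting it infer them
df_typed = spark.read.schema(people_schema).json(
    "file:///Users/zhi/Documents/pycharm/spark_project/spark_test/people.json")
df_typed.printSchema()
# root
#  |-- name: string (nullable = true)
#  |-- age: long (nullable = true)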

Creating from a dict

I have not yet found a way to build a DataFrame directly from a dict; for now the workaround is to go through pandas, converting the dict to a pandas DataFrame first and then to a Spark DataFrame:

# -*- coding: utf-8 -*-
from pyspark.sql import SparkSession
import pandas as pd

spark = SparkSession\
    .builder\
    .master("local")\
    .appName("create df")\
    .getOrCreate()

dict1 = {"name": ["Bom", "Alice"],
         "age": [20, 23],
         "weight": [97.6, 90.0],
         "height": [165, 160]}

df1 = pd.DataFrame(dict1)
spark_df = spark.createDataFrame(df1)
spark_df.show()

# +-----+---+------+------+
# | name|age|weight|height|
# +-----+---+------+------+
# |  Bom| 20|  97.6|   165|
# |Alice| 23|  90.0|   160|
# +-----+---+------+------+

If you know of a direct conversion method, please leave a comment. Much appreciated :)
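
For what it's worth, one pandas-free workaround that seems to do the job (a sketch, assuming a column-oriented dict like dict1 above and the same spark session): zip the per-column lists into row tuples and hand them to createDataFrame together with the keys as column names.

# Sketch: turn a column-oriented dict into rows, then build the DataFrame directly.
# Relies on dicts preserving insertion order (Python 3.7+).
columns = list(dict1.keys())        # ["name", "age", "weight", "height"]
rows = list(zip(*dict1.values()))   # [("Bom", 20, 97.6, 165), ("Alice", 23, 90.0, 160)]

spark_df2 = spark.createDataFrame(rows, columns)
spark_df2.show()
# +-----+---+------+------+
# | name|age|weight|height|
# +-----+---+------+------+
# |  Bom| 20|  97.6|   165|
# |Alice| 23|  90.0|   160|
# +-----+---+------+------+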
