Creating a DataFrame from a list
# -*- coding: utf-8 -*-
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .master("local") \
    .appName("create df") \
    .getOrCreate()

# a list of lists
list1 = [["Bom", 20, 97.6, 165],
         ["Alice", 23, 90.0, 160]]
df1 = spark.createDataFrame(list1, ["name", "age", "weight", "height"])
df1.show()
# +-----+---+------+------+
# | name|age|weight|height|
# +-----+---+------+------+
# | Bom| 20| 97.6| 165|
# |Alice| 23| 90.0| 160|
# +-----+---+------+------+
# a list of tuples
list2 = [("Bom", 20, 97.6, 165),
         ("Alice", 23, 90.0, 160)]
df2 = spark.createDataFrame(list2, ["name", "age", "weight", "height"])
df2.show()
# +-----+---+------+------+
# | name|age|weight|height|
# +-----+---+------+------+
# | Bom| 20| 97.6| 165|
# |Alice| 23| 90.0| 160|
# +-----+---+------+------+
# without column names, Spark auto-generates _1, _2, ...
df2_no_header = spark.createDataFrame(list2)
df2_no_header.show()
# +-----+---+----+---+
# | _1| _2| _3| _4|
# +-----+---+----+---+
# | Bom| 20|97.6|165|
# |Alice| 23|90.0|160|
# +-----+---+----+---+
# a list of dicts; note the columns come out in alphabetical order,
# and newer Spark versions warn that inferring the schema from dicts
# is deprecated (use pyspark.sql.Row or an explicit schema instead)
list3 = [{"name": "Bom", "age": 20, "weight": 97.6, "height": 165},
         {"name": "Alice", "age": 23, "weight": 90.0, "height": 160}]
df3 = spark.createDataFrame(list3)
df3.show()
# +---+------+-----+------+
# |age|height| name|weight|
# +---+------+-----+------+
# | 20| 165| Bom| 97.6|
# | 23| 160|Alice| 90.0|
# +---+------+-----+------+
So a DataFrame can be created from a list whose elements are lists, tuples, or dicts.
Creating from a JSON file
The JSON file people.json:
{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}
Spark code:
# -*- coding: utf-8 -*-
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .master("local") \
    .appName("create df from json") \
    .getOrCreate()
df = spark.read.json("file:///Users/zhi/Documents/pycharm/spark_project/spark_test/people.json")
df.show()
# +----+-------+
# | age| name|
# +----+-------+
# |null|Michael|
# | 30| Andy|
# | 19| Justin|
# +----+-------+
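Note that spark.read.json expects the JSON Lines format: one complete JSON object per line, not a single pretty-printed array. A quick pure-Python sketch of how such a file is parsed line by line (the people_jsonl string below stands in for the file contents):

```python
import json

# the same three records as people.json, one JSON object per line (JSON Lines)
people_jsonl = """{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}"""

# each non-empty line must be a complete, self-contained JSON object;
# records may have different keys, and Spark fills the missing ones with null
records = [json.loads(line) for line in people_jsonl.splitlines() if line.strip()]

print(records[0])             # {'name': 'Michael'}
print(records[1].get("age"))  # 30
print(records[0].get("age"))  # None -> shows up as null in the DataFrame
```

This is why Michael's age appears as null in the output above: his record simply has no "age" key.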
Creating from a dict
I haven't found a way to build a Spark DataFrame directly from a dict; for now the workaround is to go through pandas first and then convert the pandas DataFrame to a Spark one:
# -*- coding: utf-8 -*-
from pyspark.sql import SparkSession
import pandas as pd

spark = SparkSession \
    .builder \
    .master("local") \
    .appName("create df") \
    .getOrCreate()

dict1 = {"name": ["Bom", "Alice"],
         "age": [20, 23],
         "weight": [97.6, 90.0],
         "height": [165, 160]}
df1 = pd.DataFrame(dict1)
spark_df = spark.createDataFrame(df1)
spark_df.show()
# +-----+---+------+------+
# | name|age|weight|height|
# +-----+---+------+------+
# | Bom| 20| 97.6| 165|
# |Alice| 23| 90.0| 160|
# +-----+---+------+------+
If there is a direct conversion method, please leave a comment — much appreciated :)
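For what it's worth, one pandas-free route is to transpose the column-oriented dict into row tuples with zip and feed those to createDataFrame via the list-of-tuples path shown above. A sketch (the transposition itself is plain Python; the final Spark call is commented out and assumes the spark session from earlier):

```python
# the same column-oriented dict as above
dict1 = {"name": ["Bom", "Alice"],
         "age": [20, 23],
         "weight": [97.6, 90.0],
         "height": [165, 160]}

# transpose columns -> rows: zip(*values) pairs up the i-th entry of every column
columns = list(dict1.keys())
rows = list(zip(*dict1.values()))
print(rows)  # [('Bom', 20, 97.6, 165), ('Alice', 23, 90.0, 160)]

# then reuse the list-of-tuples path from the first section:
# spark_df = spark.createDataFrame(rows, columns)
```

This relies on dicts preserving insertion order (guaranteed from Python 3.7), so the column names line up with the row values.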