Creating a DataFrame in Spark

Author: iE简 | Published 2020-12-15 18:26

    Create from a list

    # -*- coding: utf-8 -*-
    from pyspark.sql import SparkSession
    
    spark = SparkSession\
        .builder\
        .master("local")\
        .appName("create df")\
        .getOrCreate()
    
    # elements are lists (list of lists)
    list1 = [["Bom", 20, 97.6, 165],
             ["Alice", 23, 90.0, 160]]
    
    df1 = spark.createDataFrame(list1, ["name", "age", "weight", "height"])
    df1.show()
    # +-----+---+------+------+
    # | name|age|weight|height|
    # +-----+---+------+------+
    # |  Bom| 20|  97.6|   165|
    # |Alice| 23|  90.0|   160|
    # +-----+---+------+------+
    
    # elements are tuples (list of tuples)
    list2 = [("Bom", 20, 97.6, 165),
             ("Alice", 23, 90.0, 160)]
    
    df2 = spark.createDataFrame(list2, ["name", "age", "weight", "height"])
    df2.show()
    # +-----+---+------+------+
    # | name|age|weight|height|
    # +-----+---+------+------+
    # |  Bom| 20|  97.6|   165|
    # |Alice| 23|  90.0|   160|
    # +-----+---+------+------+
    
    # without column names, Spark auto-names the columns _1, _2, ...
    df2_no_header = spark.createDataFrame(list2)
    df2_no_header.show()
    # +-----+---+----+---+
    # |   _1| _2|  _3| _4|
    # +-----+---+----+---+
    # |  Bom| 20|97.6|165|
    # |Alice| 23|90.0|160|
    # +-----+---+----+---+
    
    # elements are dicts (list of dicts)
    list3 = [{"name": "Bom", "age": 20, "weight": 97.6, "height": 165},
             {"name": "Alice", "age": 23, "weight": 90.0, "height": 160}]
    
    # column names are inferred from the dict keys (note the alphabetical order in the output)
    df3 = spark.createDataFrame(list3)
    df3.show()
    # +---+------+-----+------+
    # |age|height| name|weight|
    # +---+------+-----+------+
    # | 20|   165|  Bom|  97.6|
    # | 23|   160|Alice|  90.0|
    # +---+------+-----+------+
    

    When creating a DataFrame from a list, the elements can be lists or tuples; dicts also work, as df3 above shows.
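
    If you want to control the column types rather than rely on inference, createDataFrame also accepts an explicit schema. Below is a minimal sketch reusing list2 from above; the choice of LongType/DoubleType is an assumption about the intended types:

    from pyspark.sql.types import StructType, StructField, StringType, LongType, DoubleType

    # spell out column names and types instead of letting Spark infer them
    schema = StructType([
        StructField("name", StringType(), True),
        StructField("age", LongType(), True),
        StructField("weight", DoubleType(), True),
        StructField("height", LongType(), True),
    ])

    df2_typed = spark.createDataFrame(list2, schema)
    df2_typed.printSchema()
    df2_typed.show()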

    Create from a JSON file

    The JSON file people.json:

    {"name":"Michael"}
    {"name":"Andy", "age":30}
    {"name":"Justin", "age":19}
    

    Spark code:

    # -*- coding: utf-8 -*-
    from pyspark.sql import SparkSession
    
    spark = SparkSession\
        .builder\
        .master("local")\
        .appName("create df from json")\
        .getOrCreate()
    
    df = spark.read.json("file:///Users/zhi/Documents/pycharm/spark_project/spark_test/people.json")
    df.show()
    
    # +----+-------+
    # | age|   name|
    # +----+-------+
    # |null|Michael|
    # |  30|   Andy|
    # |  19| Justin|
    # +----+-------+
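
    Note that spark.read.json expects JSON Lines input by default, i.e. one complete JSON object per line, exactly as in people.json above. If the file instead holds a single JSON document or array spread over multiple lines, the multiLine option can be enabled. A minimal sketch (the file name and path here are placeholders, not from the original example):

    # hypothetical file containing a standard JSON array such as [{...}, {...}]
    df_multi = spark.read.option("multiLine", "true").json("file:///path/to/people_array.json")
    df_multi.show()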
    

    Create from a dict

    I have not yet found a way to build a DataFrame directly from a dict; for now the workaround is to convert the dict to a pandas DataFrame first and then convert that into a Spark DataFrame:

    # -*- coding: utf-8 -*-
    from pyspark.sql import SparkSession
    import pandas as pd
    
    spark = SparkSession\
        .builder\
        .master("local")\
        .appName("create df")\
        .getOrCreate()
    
    dict1 = {"name": ["Bom", "Alice"],
             "age": [20, 23],
             "weight": [97.6, 90.0],
             "height": [165, 160]}
    
    df1 = pd.DataFrame(dict1)
    spark_df = spark.createDataFrame(df1)
    spark_df.show()
    
    # +-----+---+------+------+
    # | name|age|weight|height|
    # +-----+---+------+------+
    # |  Bom| 20|  97.6|   165|
    # |Alice| 23|  90.0|   160|
    # +-----+---+------+------+
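
    As a hedged aside, if the pandas dependency is unwanted, one workaround (still not a one-step dict conversion, just a sketch) is to zip the column-oriented dict into row tuples and pass the keys as column names:

    # build row tuples by zipping the dict's value lists together
    cols = list(dict1.keys())
    rows = list(zip(*dict1.values()))  # [("Bom", 20, 97.6, 165), ("Alice", 23, 90.0, 160)]
    spark_df2 = spark.createDataFrame(rows, cols)
    spark_df2.show()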
    

    If you know of a truly direct conversion method, please leave a comment. Much appreciated :)
