Spark Learning (10): DataFrame

Author: CocoMama190227 | Published: 2019-03-22 15:00

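A DataFrame makes it convenient to process large-scale structured data. In the Scala API, DataFrame is simply a type alias of Dataset[Row]. (See the original source.)
The example below shows a few basic DataFrame operations, suitable for beginners, including: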

  • Creating a DataFrame
  • Setting new column names
  • Adding a new column
  • Changing a column's element type
  • Selecting columns
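
The type-alias statement above can be checked directly in code. The following is a minimal sketch, not part of the original example: the object name, the helper `firstLabel`, and the column names `id` and `label` are made up for illustration, and it assumes a local Spark installation on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

object DataFrameAliasSketch {
  // Declared to take a Dataset[Row]; a DataFrame can be passed as-is
  // because DataFrame is only another name for Dataset[Row].
  def firstLabel(rows: Dataset[Row]): String =
    rows.first().getAs[String]("label")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("DataFrameAliasSketch")
      .getOrCreate()
    import spark.implicits._

    // A tiny in-memory DataFrame (hypothetical columns "id" and "label").
    val df: DataFrame = Seq((1, "a"), (2, "b")).toDF("id", "label")

    // Compiles without any conversion call.
    val ds: Dataset[Row] = df

    println(firstLabel(df))
    println(ds.columns.mkString(","))

    spark.stop()
  }
}
```

Because each element is a generic Row, fields are read by name or position at run time rather than checked by the compiler.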

Example

import org.apache.spark.sql.SparkSession

object DataFrame_test {
  def main(args: Array[String]): Unit = {
    println("-------------------------------------Create a DataFrame directly from a file-------------------------------------------")
    val path = "F:/ScalaProject/test/collaborativeFilter/src/main/resources/S1LQG.cvs"
    val spark = SparkSession.builder().master("local").appName("Spark CSV Reader").getOrCreate()
    // The first line of the file is used as the header; no schema is given, so every column is read as a string
    val df = spark.read.format("csv").option("header", "true").load(path)
    df.show()

    println("-----------------------------------------Set new column names--------------------------------------------------")
    val newNames = (0 until 17).map(_.toString)  // column names "0" through "16"
    val dfRename = df.toDF(newNames: _*)
    dfRename.show()

    println("------------------------------------------Add a new column---------------------------------------------------")
    // Column "2" was read as a string; the product comes out as a double (4920.0 in the output)
    val df2 = dfRename.withColumn("newColumn", dfRename("2") * 2)
    df2.show()

    println("--------------------------------------Change a column's element type-------------------------------------------")
    val df3 = df2.withColumn("newColumn", df2("newColumn").cast("int"))  // cast to int
    val df4 = df3.select("newColumn")  // select the column to return
    df4.show()

    spark.stop()
  }
}
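
The example above depends on a CSV file at a local path. For readers who want to try the rename, add-column, cast, and select steps without that file, here is a minimal sketch on an in-memory DataFrame. The object and helper names are hypothetical, the starting column names `a`, `b`, `c` are placeholders, and the values are the first three columns of the first two rows shown in the article's output. Unlike the original, the sketch casts the string column to double explicitly before multiplying, so it does not rely on Spark's implicit string-to-number coercion.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DataFrameStepsSketch {
  // Rename every column to its positional index: "0", "1", "2", ...
  def renameByIndex(df: DataFrame): DataFrame =
    df.toDF(df.columns.indices.map(_.toString): _*)

  // Add "newColumn" = 2 * column "2", stored as int.
  // The explicit cast to double is this sketch's choice, not the article's.
  def doubledAsInt(df: DataFrame): DataFrame =
    df.withColumn("newColumn", (df("2").cast("double") * 2).cast("int"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("DataFrameStepsSketch")
      .getOrCreate()
    import spark.implicits._

    // String-typed values, as a CSV read without a schema would produce.
    val raw = Seq(
      ("235.0", "62340.0", "2460.0"),
      ("235.0", "60740.0", "2450.0")
    ).toDF("a", "b", "c")

    val result = doubledAsInt(renameByIndex(raw)).select("newColumn")
    result.show()

    spark.stop()
  }
}
```

Deriving the new names from `df.columns.indices` avoids hard-coding the column count, which the original fixes at 17 to match its input file.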

Output

-------------------------------------Create a DataFrame directly from a file-------------------------------------------
+---------+---------+-------+--------+---------+-----+--------+---------+---------+-------+------+-------+--------+---------+------+-------+-------+
|受端设备侧主轨低频|受端设备侧主轨电压|送端电缆侧电流|接收入口主轨低频|受端电缆侧主轨低频| 功出低频|接收入口主轨电压|受端电缆侧主轨载频|受端电缆侧主轨电压|送端电缆侧载频|  功出电压|送端电缆侧电压|接收入口主轨载频|受端设备侧主轨载频|  功出电流|   功出载频|送端电缆侧低频|
+---------+---------+-------+--------+---------+-----+--------+---------+---------+-------+------+-------+--------+---------+------+-------+-------+
|    235.0|  62340.0| 2460.0|   235.0|    235.0|235.0|  4860.0|  14380.0|    199.0|14380.0|1386.0|  793.0| 14390.0|  14380.0|3490.0|14390.0|  235.0|
|    235.0|  60740.0| 2450.0|   235.0|    235.0|235.0|  4860.0|  14380.0|    199.0|14380.0|1386.0|  793.0| 14390.0|  14380.0|3490.0|14390.0|  235.0|
|    235.0|  60740.0| 2450.0|   235.0|    235.0|235.0|  4860.0|  14380.0|    199.0|14380.0|1386.0|  793.0| 14390.0|  14380.0|3490.0|14390.0|  235.0|

-----------------------------------------Set new column names--------------------------------------------------
+-----+-------+------+-----+-----+-----+------+-------+-----+-------+------+-----+-------+-------+------+-------+-----+
|    0|      1|     2|    3|    4|    5|     6|      7|    8|      9|    10|   11|     12|     13|    14|     15|   16|
+-----+-------+------+-----+-----+-----+------+-------+-----+-------+------+-----+-------+-------+------+-------+-----+
|235.0|62340.0|2460.0|235.0|235.0|235.0|4860.0|14380.0|199.0|14380.0|1386.0|793.0|14390.0|14380.0|3490.0|14390.0|235.0|
|235.0|60740.0|2450.0|235.0|235.0|235.0|4860.0|14380.0|199.0|14380.0|1386.0|793.0|14390.0|14380.0|3490.0|14390.0|235.0|
|235.0|60740.0|2450.0|235.0|235.0|235.0|4860.0|14380.0|199.0|14380.0|1386.0|793.0|14390.0|14380.0|3490.0|14390.0|235.0|

------------------------------------------Add a new column---------------------------------------------------
+-----+-------+------+-----+-----+-----+------+-------+-----+-------+------+-----+-------+-------+------+-------+-----+---------+
|    0|      1|     2|    3|    4|    5|     6|      7|    8|      9|    10|   11|     12|     13|    14|     15|   16|newColumn|
+-----+-------+------+-----+-----+-----+------+-------+-----+-------+------+-----+-------+-------+------+-------+-----+---------+
|235.0|62340.0|2460.0|235.0|235.0|235.0|4860.0|14380.0|199.0|14380.0|1386.0|793.0|14390.0|14380.0|3490.0|14390.0|235.0|   4920.0|
|235.0|60740.0|2450.0|235.0|235.0|235.0|4860.0|14380.0|199.0|14380.0|1386.0|793.0|14390.0|14380.0|3490.0|14390.0|235.0|   4900.0|
|235.0|60740.0|2450.0|235.0|235.0|235.0|4860.0|14380.0|199.0|14380.0|1386.0|793.0|14390.0|14380.0|3490.0|14390.0|235.0|   4900.0|

--------------------------------------Change a column's element type-------------------------------------------
+---------+
|newColumn|
+---------+
|     4920|
|     4900|
|     4900|
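
The output shows values changing from 4920.0 to 4920, but `show()` does not print column types. As a rough sketch of how the types could be inspected, the code below writes a small temporary CSV, reads it with and without the `inferSchema` option, and prints the type of one column. The object name, the helpers `writeSample` and `typeOf`, and the column names `a` and `b` are hypothetical and not from the original article.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object ColumnTypeSketch {
  // Write a two-column CSV to a temporary file and return its URI.
  def writeSample(): String = {
    val file = Files.createTempFile("dataframe-sketch", ".csv")
    val content = "a,b\n235.0,2460.0\n235.0,2450.0\n"
    Files.write(file, content.getBytes(StandardCharsets.UTF_8))
    file.toUri.toString
  }

  // Spark's short name for a column's type, e.g. "string", "double", "int".
  def typeOf(df: DataFrame, name: String): String =
    df.schema(name).dataType.simpleString

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("ColumnTypeSketch")
      .getOrCreate()
    val path = writeSample()

    // Without inferSchema, CSV columns are read as strings.
    val asStrings = spark.read.option("header", "true").csv(path)
    // With inferSchema, Spark scans the data and picks numeric types.
    val inferred = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(path)
    // An explicit cast replaces the column with one of the requested type.
    val casted = asStrings.withColumn("b", asStrings("b").cast("double").cast("int"))

    println(typeOf(asStrings, "b")) // string
    println(typeOf(inferred, "b"))  // double
    println(typeOf(casted, "b"))    // int

    spark.stop()
  }
}
```

Schema inference costs an extra pass over the file, so an explicit cast on only the columns that are needed, as in the article, can be the cheaper choice.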

Article link: https://www.haomeiwen.com/subject/jcnkvqtx.html