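Other commonly used Dataset functions include:
Date functions: current_date, current_timestamp
Math functions: round
Random functions: rand
String functions: concat, concat_ws
User-defined functions: UDF and UDAF
Official function reference: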
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions
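The list above mentions user-defined functions, which the practice code in this post does not exercise. A minimal sketch of a scalar UDF might look like the following. The object name UdfSketch, the helper salaryBand, and the 20000 threshold are hypothetical and chosen only for illustration. UDAFs are left out because their API differs between Spark versions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object UdfSketch {
  // Hypothetical helper: the 20000 threshold is illustrative, not from the original post.
  def salaryBand(salary: Long): String =
    if (salary >= 20000L) "high" else "normal"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("UdfSketch")
      .master("local")
      .getOrCreate()
    import spark.implicits._

    // Small in-memory stand-in for employee.json.
    val employee = Seq(("Leo", 20000L), ("Jack", 15000L)).toDF("name", "salary")

    // Wrap the plain Scala function so it can be applied to a Column.
    val salaryBandUdf = udf((salary: Long) => salaryBand(salary))

    employee
      .select(col("name"), salaryBandUdf(col("salary")).as("band"))
      .show()

    spark.stop()
  }
}
```

Keeping the logic in an ordinary method and wrapping it with udf only at the call site means the rule can be unit-tested without starting a SparkSession.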
Practice:
Input data:
employee.json:
{"name": "Leo", "age": 25, "depId": 1, "gender": "male", "salary": 20000}
{"name": "Marry", "age": 30, "depId": 2, "gender": "female", "salary": 25000}
{"name": "Jack", "age": 35, "depId": 1, "gender": "male", "salary": 15000}
{"name": "Tom", "age": 42, "depId": 3, "gender": "male", "salary": 18000}
{"name": "Kattie", "age": 21, "depId": 3, "gender": "female", "salary": 21000}
{"name": "Jen", "age": 30, "depId": 2, "gender": "female", "salary": 28000}
{"name": "Jen", "age": 19, "depId": 2, "gender": "female", "salary": 8000}
department.json:
{"id": 1, "name": "Technical Department"}
{"id": 2, "name": "Financial Department"}
{"id": 3, "name": "HR Department"}
Code:
package com.spark.ds

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object OtherFunction {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("OtherFunction")
      .master("local")
      .config("spark.sql.warehouse.dir", "D:/spark-warehouse")
      .getOrCreate()

    val employee = spark.read.json("inputData/employee.json")
    // Loaded alongside the employee data; not used by the select below.
    val department = spark.read.json("inputData/department.json")

    // Apply one or two built-in functions of each kind to every employee row.
    employee.select(
      employee("name"),
      current_date(),                                        // date of the query
      current_timestamp(),                                   // timestamp of the query
      rand(),                                                // random double in [0, 1)
      round(employee("salary"), 2),                          // round to 2 decimal places
      concat(employee("gender"), employee("age")),           // join without a separator
      concat_ws("|", employee("gender"), employee("age")))   // join with "|" as separator
      .show()

    spark.stop()
  }
}
Output:
+------+--------------+--------------------+-------------------------+----------------+-------------------+-------------------------+
|  name|current_date()| current_timestamp()|rand(9122844294398711437)|round(salary, 2)|concat(gender, age)|concat_ws(|, gender, age)|
+------+--------------+--------------------+-------------------------+----------------+-------------------+-------------------------+
|   Leo|    2020-07-21|2020-07-21 00:47:...|       0.7503891080032685|           20000|             male25|                  male|25|
| Marry|    2020-07-21|2020-07-21 00:47:...|       0.4340990615089132|           25000|           female30|                female|30|
|  Jack|    2020-07-21|2020-07-21 00:47:...|       0.2020792875471602|           15000|             male35|                  male|35|
|   Tom|    2020-07-21|2020-07-21 00:47:...|       0.1784916488061976|           18000|             male42|                  male|42|
|Kattie|    2020-07-21|2020-07-21 00:47:...|       0.3918989540118957|           21000|           female21|                female|21|
|   Jen|    2020-07-21|2020-07-21 00:47:...|       0.3349504449575764|           28000|           female30|                female|30|
|   Jen|    2020-07-21|2020-07-21 00:47:...|       0.8679772995821763|            8000|           female19|                female|19|
+------+--------------+--------------------+-------------------------+----------------+-------------------+-------------------------+
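Two details of the output are worth a closer look. The rand column header shows a seed that Spark generated, so the values change from run to run. And because every row has a gender and an age, the output does not show how concat and concat_ws differ on null input: concat returns null if any argument is null, while concat_ws skips null arguments. The sketch below illustrates both points. The object name FunctionNotes, the seed 42, and the sample rows (including the null gender) are assumptions made for illustration, not part of the original data.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat, concat_ws, rand, round}

object FunctionNotes {
  // Builds a small DataFrame and applies the same functions as the example above.
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Hypothetical rows; the second one has a null gender.
    val people = Seq[(String, Option[String], Int, Double)](
      ("Leo", Some("male"), 25, 20000.456),
      ("Ann", None, 30, 25000.0)
    ).toDF("name", "gender", "age", "salary")

    people.select(
      col("name"),
      rand(42L).as("seeded_rand"),                  // fixed seed instead of a generated one
      round(col("salary"), 2).as("salary_2dp"),     // rounding is visible on a fractional value
      concat(col("gender"), col("age")).as("concat"),              // null when gender is null
      concat_ws("|", col("gender"), col("age")).as("concat_ws")    // null gender is skipped
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("FunctionNotes")
      .master("local")
      .getOrCreate()
    build(spark).show()
    spark.stop()
  }
}
```

A fixed seed makes rand repeatable only as long as the data is partitioned the same way, so it is best treated as a convenience for tests rather than a guarantee.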