Spark from Beginner to Expert 66: Other Common Dataset Functions

Author: 勇于自信 | Published 2020-07-21 00:47

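    Other commonly used Dataset functions include:
    Date functions: current_date, current_timestamp
    Math functions: round
    Random functions: rand
    String functions: concat, concat_ws
    Custom UDF and UDAF functions
    The official function reference:
    http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions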
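The list mentions custom UDF and UDAF functions, but the program further down only exercises the built-in ones. The following is a minimal sketch of both, not the author's code: the case classes `Emp` and `SumCount`, the `salaryBand` helper, its 20000 threshold, and the in-memory rows are all hypothetical and exist only for illustration. The UDAF uses the typed `Aggregator` API, which is one of several ways to define an aggregate in Spark.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical record types used only for this sketch.
case class Emp(name: String, salary: Long)
case class SumCount(sum: Long, count: Long)

// Typed UDAF: averages salary by carrying a running (sum, count) buffer.
object AvgSalary extends Aggregator[Emp, SumCount, Double] {
  def zero: SumCount = SumCount(0L, 0L)
  def reduce(buf: SumCount, e: Emp): SumCount =
    SumCount(buf.sum + e.salary, buf.count + 1)
  def merge(a: SumCount, b: SumCount): SumCount =
    SumCount(a.sum + b.sum, a.count + b.count)
  def finish(buf: SumCount): Double =
    if (buf.count == 0) 0.0 else buf.sum.toDouble / buf.count
  def bufferEncoder: Encoder[SumCount] = Encoders.product[SumCount]
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

object UdfUdafSketch {
  // Plain Scala function wrapped by the UDF; the threshold is arbitrary.
  def salaryBand(salary: Long): String =
    if (salary >= 20000L) "high" else "normal"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("UdfUdafSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val employee = Seq(
      Emp("Leo", 20000L), Emp("Jack", 15000L), Emp("Jen", 28000L)
    ).toDS()

    // UDF: applied row by row, like any built-in column function.
    val salaryBandUdf = udf(salaryBand _)
    employee.select(col("name"), salaryBandUdf(col("salary")).as("band")).show()

    // UDAF: collapses the whole Dataset into a single average.
    employee.select(AvgSalary.toColumn.name("avg_salary")).show()

    spark.stop()
  }
}
```

Keeping the logic in an ordinary method such as `salaryBand` means it can be unit tested without starting a Spark session, and the same holds for the `reduce`, `merge`, and `finish` steps of the aggregator.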

    Practice:
    Input data:
    employee.json:

    {"name": "Leo", "age": 25, "depId": 1, "gender": "male", "salary": 20000}
    {"name": "Marry", "age": 30, "depId": 2, "gender": "female", "salary": 25000}
    {"name": "Jack", "age": 35, "depId": 1, "gender": "male", "salary": 15000}
    {"name": "Tom", "age": 42, "depId": 3, "gender": "male", "salary": 18000}
    {"name": "Kattie", "age": 21, "depId": 3, "gender": "female", "salary": 21000}
    {"name": "Jen", "age": 30, "depId": 2, "gender": "female", "salary": 28000}
    {"name": "Jen", "age": 19, "depId": 2, "gender": "female", "salary": 8000}
    

    department.json:

    {"id": 1, "name": "Technical Department"}
    {"id": 2, "name": "Financial Department"}
    {"id": 3, "name": "HR Department"}
    

    Code:

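    package com.spark.ds
    
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    
    object OtherFunction {
    
      def main(args: Array[String]): Unit = {
        // Local session; the warehouse directory is a Windows path, adjust as needed.
        val spark = SparkSession
          .builder()
          .appName("OtherFunction")
          .master("local")
          .config("spark.sql.warehouse.dir", "D:/spark-warehouse")
          .getOrCreate()
    
        // Each input file holds one JSON object per line.
        val employee = spark.read.json("inputData/employee.json")
        // Loaded alongside employee, but not used in the query below.
        val department = spark.read.json("inputData/department.json")
    
        employee.select(
          employee("name"),
          current_date(),                                       // date of query evaluation
          current_timestamp(),                                  // timestamp of query evaluation
          rand(),                                               // uniform random value in [0, 1)
          round(employee("salary"), 2),                         // round to 2 decimal places
          concat(employee("gender"), employee("age")),          // join without a separator
          concat_ws("|", employee("gender"), employee("age")))  // join with "|" as separator
          .show()
    
        spark.stop()
      }
    }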
    
    

    Output:

    +------+--------------+--------------------+-------------------------+----------------+-------------------+-------------------------+
    |  name|current_date()| current_timestamp()|rand(9122844294398711437)|round(salary, 2)|concat(gender, age)|concat_ws(|, gender, age)|
    +------+--------------+--------------------+-------------------------+----------------+-------------------+-------------------------+
    |   Leo|    2020-07-21|2020-07-21 00:47:...|       0.7503891080032685|           20000|             male25|                  male|25|
    | Marry|    2020-07-21|2020-07-21 00:47:...|       0.4340990615089132|           25000|           female30|                female|30|
    |  Jack|    2020-07-21|2020-07-21 00:47:...|       0.2020792875471602|           15000|             male35|                  male|35|
    |   Tom|    2020-07-21|2020-07-21 00:47:...|       0.1784916488061976|           18000|             male42|                  male|42|
    |Kattie|    2020-07-21|2020-07-21 00:47:...|       0.3918989540118957|           21000|           female21|                female|21|
    |   Jen|    2020-07-21|2020-07-21 00:47:...|       0.3349504449575764|           28000|           female30|                female|30|
    |   Jen|    2020-07-21|2020-07-21 00:47:...|       0.8679772995821763|            8000|           female19|                female|19|
    +------+--------------+--------------------+-------------------------+----------------+-------------------+-------------------------+
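A few details in this output are easy to miss. The `rand(...)` header shows the seed Spark generated for that run, so the values change between runs unless a seed is passed explicitly. `round(salary, 2)` leaves the salaries unchanged because they are already integers. And since no row has a missing gender or age, the difference in null handling between `concat` (null if any input is null) and `concat_ws` (null inputs are skipped) never shows up. The sketch below makes these visible; the rows, the `bonus` column, and the seed `42` are assumptions made up for illustration and are not part of the original data.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat, concat_ws, rand, round}

object FunctionNotes {
  // Builds a small demo DataFrame and applies the same functions as the article.
  def demo(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Hypothetical rows: a fractional bonus, and one row with no gender.
    val df = Seq(
      ("Leo", Some("male"), 25, 1234.5678),
      ("Ann", None, 30, 99.5)
    ).toDF("name", "gender", "age", "bonus")

    df.select(
      col("name"),
      rand(42L).as("rand_seeded"),               // explicit seed instead of a generated one
      round(col("bonus"), 2).as("bonus_2dp"),    // rounding a value that has decimals
      concat(col("gender"), col("age")).as("concat"),             // null when gender is null
      concat_ws("|", col("gender"), col("age")).as("concat_ws")   // skips the null gender
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("FunctionNotes")
      .master("local[*]")
      .getOrCreate()
    demo(spark).show()
    spark.stop()
  }
}
```

For the row without a gender, the `concat` column is null while `concat_ws` yields just `30`, which is usually the reason to prefer `concat_ws` when inputs may be missing.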
    
