美文网首页
186、Spark 2.0之Dataset开发详解-初步体验un

186、Spark 2.0之Dataset开发详解-初步体验un

作者: ZFH__ZJ | 来源:发表于2019-02-11 17:38 被阅读0次

    需求

    计算部门平均年龄与薪资
    计算部门性别平均年龄与薪资

    数据

    department.json

    {"id": 1, "name": "Technical Department"}
    {"id": 2, "name": "Financial Department"}
    {"id": 3, "name": "HR Department"}
    

    employee.json

    {"name": "Leo", "age": 25, "depId": 1, "gender": "male", "salary": 20000}
    {"name": "Marry", "age": 30, "depId": 2, "gender": "female", "salary": 25000}
    {"name": "Jack", "age": 35, "depId": 1, "gender": "male", "salary": 15000}
    {"name": "Tom", "age": 42, "depId": 3, "gender": "male", "salary": 18000}
    {"name": "Kattie", "age": 21, "depId": 3, "gender": "female", "salary": 21000}
    {"name": "Jen", "age": 30, "depId": 2, "gender": "female", "salary": 28000}
    {"name": "Jen", "age": 19, "depId": 2, "gender": "female", "salary": 8000}
    

    代码

    object DepartmentAvgSalaryAndAgeStat {
    
      def main(args: Array[String]): Unit = {
        val sparkSession= SparkSession
          .builder()
          .master("local")
          .appName("DepartmentAvgSalaryAndAgeStatScala")
          .getOrCreate()
    
        import sparkSession.implicits._
        import org.apache.spark.sql.functions._
    
        val deptmentPath = this.getClass.getClassLoader.getResource("department.json").getPath
        val employeePath = this.getClass.getClassLoader.getResource("employee.json").getPath
    
        val deptmentDF = sparkSession.read.json(deptmentPath)
        val employeeDF = sparkSession.read.json(employeePath)
    
    
        deptmentDF.show()
        deptmentDF.printSchema()
        employeeDF.show()
        employeeDF.printSchema()
        // 需求1
        deptmentDF.join(employeeDF, $"id" === $"depId")
            .groupBy(deptmentDF("id"))
            .agg(avg(employeeDF("age")), avg(employeeDF("salary")))
            .show()
        // 需求2
        employeeDF
          .filter("age > 20")
          .join(deptmentDF, $"depId"===$"id")
          .groupBy(deptmentDF("name"), employeeDF("gender"))
          .agg(avg(employeeDF("salary")), avg(employeeDF("age")))
          .show()
      }
    }
    

    相关文章

      网友评论

          本文标题:186、Spark 2.0之Dataset开发详解-初步体验un

          本文链接:https://www.haomeiwen.com/subject/kikqeqtx.html