数据透视表属于数据分析中比较常见的功能,如下面的数据:

我们以姓名为行,科目为列,数据透视后的结果为:

那么hive该如何实现这样的功能呢?我们首先把数据写入到hive表中:
val seqData = Seq(
("小A","语文",80),
("小A","数学",65),
("小A","英语",70),
("小B","语文",82),
("小B","数学",95),
("小B","英语",60),
("小C","语文",88),
("小C","数学",85),
("小C","英语",83)
)
val seq2df = seqData.toDF("name","course","score")
seq2df.write.saveAsTable("default.classinfo12")
结果如下:

hive中其实是没有直接实现数据透视功能的函数的,不过我们可以通过groupby实现类似的功能:
select
name,
sum(if(course = '语文',score,0)) as chinese_score,
sum(if(course = '数学',score,0)) as math_score,
sum(if(course = '英语',score,0)) as english_score
from
default.classinfo12
group by name
结果如下:

可以看到,通过groupby,我们就实现了类似pivot的功能,不过这样的写法也仅适合类别比较少的情况。当类别比较多的时候,可以使用spark来解决:
val df = spark.sql(
"""
|select
| *
|from
| default.classinfo12
""".stripMargin)
.groupBy("name")
.pivot("course")
.agg(sum("score") as "score")
结果如下:

好了,本篇就到这里啦~~
网友评论