1.mysql中的连接为group_concat()
hive中的连接为collect_set()
把每个分组的其他字段,按照逗号进行拼接,得到一个最终字符串;
https://blog.csdn.net/sam_hzj/article/details/80099735
2.hive中字段的合并
contact:简单合并功能;
select concat(`$date`, ' ',`$time`) as operate_time, course_id, `$tid`, `$code` from friday.t_fa_event_r_easicare where `$tid` = 'c420c8f2e42a49c78a642e7eb6894c35' order by operate_time
3.hive中在一个排序的基础上对另一字段进行排序
使用order by多个条件完成。
select * from
(select concat(easicare.`$date`, ' ',easicare.`$time`) as operate_time,
easicare.`$tid`,
easicare.`$code`,
trace.event_id
from friday.t_fa_event_r_easicare easicare
left join tmp_db.tmp_parent_event_trace trace
on easicare.`$code`= trace.event)friday_event_disorder
where friday_event_disorder.event_id is not null and friday_event_disorder.$tid != ''
order by friday_event_disorder.`$tid` desc ,friday_event_disorder.operate_time desc
其结果是先对数据表中的$tid
进行降序排列,然后再对降序排列后的顺序再跟据operate_time进行排序;
4.group多个字段
select count(a),b,c from test group by b,c
5.查询仅出现一次的数据
https://blog.csdn.net/u012408083/article/details/51648956
https://blog.csdn.net/chwow/article/details/79745508
select * from chat where nickname in (select nickname from chat group by nickname having count(nickname)>1);
6.如果存在相同的user_id,但是每个create_time不一样,现在的需求是根据create_time创建时间选取最早的,那么思路是现根据user_id进行分组,然后根据user_id,create_time进行排序,取row_number 为1的值
方法:ROW_NUMBER() OVER
SELECT * FROM(SELECT *, ROW_NUMBER() OVER(PARTITION BY id1,id2 ORDER BY id1,id2 DESC) RN FROM test_table)T WHERE T.RN = 1 AND id1 is not null and id2 is not null
其中partition by意思是根据其后面的条件进行分组;order by是根据其后面的条件进行排序;row_number() over会根据该两个条件自动生成RN,此时再通过where语句找到对应值为1的即可
网友评论