1.mysql中的连接为group_concat()

hive中的连接为collect_set()
把每个分组的其他字段，按照逗号进行拼接，得到一个最终字符串;
https://blog.csdn.net/sam_hzj/article/details/80099735

2.hive中字段的合并

contact：简单合并功能；

select concat(`$date`, ' ',`$time`) as operate_time, course_id, `$tid`, `$code` from friday.t_fa_event_r_easicare where `$tid` = 'c420c8f2e42a49c78a642e7eb6894c35' order by operate_time

3.hive中在一个排序的基础上对另一字段进行排序

使用order by多个条件完成。

select * from 
    (select concat(easicare.`$date`, ' ',easicare.`$time`) as operate_time,
        easicare.`$tid`,
        easicare.`$code`,
        trace.event_id
    from friday.t_fa_event_r_easicare easicare
    left join tmp_db.tmp_parent_event_trace trace
    on easicare.`$code`= trace.event)friday_event_disorder
where friday_event_disorder.event_id is not null and friday_event_disorder.$tid != ''
order by friday_event_disorder.`$tid` desc ,friday_event_disorder.operate_time desc

其结果是先对数据表中的$tid进行降序排列，然后再对降序排列后的顺序再跟据operate_time进行排序；
4.group多个字段

select count(a),b,c from test group by b,c

5.查询仅出现一次的数据
https://blog.csdn.net/u012408083/article/details/51648956
https://blog.csdn.net/chwow/article/details/79745508

select * from chat where nickname in (select nickname from chat group by nickname having count(nickname)>1);

6.如果存在相同的user_id，但是每个create_time不一样，现在的需求是根据create_time创建时间选取最早的，那么思路是现根据user_id进行分组，然后根据user_id,create_time进行排序，取row_number 为1的值

方法：ROW_NUMBER() OVER
SELECT * FROM(SELECT *, ROW_NUMBER() OVER(PARTITION BY id1,id2 ORDER BY id1,id2 DESC) RN FROM test_table)T WHERE T.RN = 1 AND id1 is not null and id2 is not null

其中partition by意思是根据其后面的条件进行分组；order by是根据其后面的条件进行排序；row_number() over会根据该两个条件自动生成RN，此时再通过where语句找到对应值为1的即可