HiveSql: Common Syntax

Author: LeonTung | Published 2020-03-08 21:53

# Sorted concatenation

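    select category_id,
           regexp_replace(
               concat_ws(',',             -- join the sorted array with commas
                   sort_array(            -- sort the array (strings sort lexicographically)
                       collect_list(      -- collapse the group's rows into an array
                           -- zero-pad rn to a fixed width so that string order matches numeric order
                           concat_ws(':', lpad(cast(rn as string), 4, '0'), cast(topic_id as string))
                       )
                   )
               ),
               '\\d+:', ''                -- strip the prefix that was added only for sorting, e.g. 0001:
           )
    from topic_recommend_score
    where rn >= 1 and rn <= 1000
    group by category_id
    ;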
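To see why the zero-padding matters, here is a minimal plain-Python model of the query for a single category. It is a sketch of the semantics only, not Hive code: the function name `join_topics_by_rank` and the sample rows are made up for illustration, and `width=0` is used here to mean "no padding".

```python
import re

def join_topics_by_rank(rows, width=4):
    """Model the query above for one category.

    rows: (rn, topic_id) pairs in arbitrary order, since collect_list
    gives no ordering guarantee. width=0 disables the zero-padding.
    """
    # concat_ws(':', lpad(cast(rn as string), 4, '0'), cast(topic_id as string))
    keyed = [f"{str(rn).zfill(width)}:{topic_id}" for rn, topic_id in rows]
    # sort_array sorts the strings lexicographically, concat_ws(',', ...) joins them
    joined = ",".join(sorted(keyed))
    # regexp_replace(..., '\\d+:', '') removes the sort prefixes again
    return re.sub(r"\d+:", "", joined)

# Hypothetical sample data: ranks 10, 2 and 1 for topics 7, 9 and 8
rows = [(10, 7), (2, 9), (1, 8)]

print(join_topics_by_rank(rows))           # topics in rank order: 8,9,7
print(join_topics_by_rank(rows, width=0))  # without padding "10:7" sorts before "1:8": 7,8,9
```

Without padding, the string "10:7" sorts ahead of "1:8" because the character '0' is smaller than ':', which is exactly the problem the lpad call avoids.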

# Parsing JSON strings

get_json_object(page_attr,'$.goods_id')
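As a rough illustration of what this call returns, the following plain-Python sketch models the lookup of a top-level key. It is an approximation under stated assumptions, not Hive's implementation: the helper name `get_json_object_model` and the sample `page_attr` value are hypothetical, and only the simple `$.key` path form is covered.

```python
import json

def get_json_object_model(json_str, key):
    """Approximate get_json_object(json_str, '$.<key>') for a top-level key."""
    try:
        obj = json.loads(json_str)
    except (TypeError, ValueError):
        return None  # malformed or missing input gives NULL
    if not isinstance(obj, dict) or obj.get(key) is None:
        return None  # absent key (or JSON null) gives NULL
    value = obj[key]
    # The result is always a string: string values as-is, others as JSON text
    if isinstance(value, str):
        return value
    return json.dumps(value, separators=(",", ":"))

# Hypothetical sample value for the page_attr column
page_attr = '{"goods_id": 1024, "tab": "home"}'
print(get_json_object_model(page_attr, "goods_id"))
print(get_json_object_model(page_attr, "missing"))
```

The point to take away is that the result is a string (or NULL), so a numeric field such as goods_id usually needs a cast before it is compared or joined as a number.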

# Date and time functions

(1) Get the current time: from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss')

# Conditional median

percentile_approx(if(click_ps_cnt>0 and label=1,click_ps_cnt,null),0.5) -- rows that fail the condition become null and are ignored by the aggregate
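The trick is that the if(...) turns every non-matching row into null, and the aggregate skips nulls. A minimal plain-Python sketch of that idea follows. It uses an exact median as a stand-in for percentile_approx, which is approximate in Hive, and the function name and sample rows are made up for illustration.

```python
import statistics

def conditional_median(rows):
    """Median of click_ps_cnt over rows with click_ps_cnt > 0 and label = 1."""
    kept = [
        r["click_ps_cnt"]
        for r in rows
        # if(click_ps_cnt > 0 and label = 1, click_ps_cnt, null): other rows drop out
        if r["click_ps_cnt"] is not None and r["click_ps_cnt"] > 0 and r["label"] == 1
    ]
    # With no qualifying rows the aggregate returns NULL
    return statistics.median(kept) if kept else None

# Hypothetical sample rows
rows = [
    {"click_ps_cnt": 0, "label": 1},  # excluded: count is not positive
    {"click_ps_cnt": 3, "label": 1},
    {"click_ps_cnt": 5, "label": 0},  # excluded: label is not 1
    {"click_ps_cnt": 7, "label": 1},
    {"click_ps_cnt": 9, "label": 1},
]
print(conditional_median(rows))  # median of [3, 7, 9]
```

The same pattern works for other aggregates (sum, avg, count) whenever a filter should apply to one expression rather than to the whole query.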

# Ranking

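In practice each function takes a window clause such as over (partition by ... order by ...); it is abbreviated to over() here. The comments show the output for three ordered rows whose first two values tie.

(1) row_number() over() -- 1, 2, 3: consecutive numbers, tied rows still get distinct numbers

(2) rank() over() -- 1, 1, 3: tied rows share a rank and the next rank is skipped

(3) dense_rank() over() -- 1, 1, 2: tied rows share a rank and no rank is skipped

(4) ntile(3) over() -- 1, 1, 1, 2, 2, 3, 3 for 7 rows: splits the rows into 3 buckets whose sizes differ by at most 1, with any extra rows going to the earliest buckets; often used for normalization, e.g. rn/3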
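The differences between the four functions are easiest to see side by side. The sketch below models them in plain Python for values that are already in window order. It illustrates the standard SQL semantics, not Hive internals, and the function signatures are simplified for the example.

```python
def row_number(ordered):
    """Consecutive numbers, regardless of ties."""
    return list(range(1, len(ordered) + 1))

def rank(ordered):
    """Ties share a rank; the rank after a tie skips ahead."""
    out = []
    for i, value in enumerate(ordered):
        tied = i > 0 and value == ordered[i - 1]
        out.append(out[-1] if tied else i + 1)
    return out

def dense_rank(ordered):
    """Ties share a rank; no ranks are skipped."""
    out = []
    for i, value in enumerate(ordered):
        if i == 0:
            out.append(1)
        elif value == ordered[i - 1]:
            out.append(out[-1])
        else:
            out.append(out[-1] + 1)
    return out

def ntile(n, row_count):
    """Bucket numbers for row_count ordered rows split into n buckets."""
    base, extra = divmod(row_count, n)
    out = []
    for bucket in range(1, n + 1):
        # The first `extra` buckets each receive one additional row
        size = base + (1 if bucket <= extra else 0)
        out.extend([bucket] * size)
    return out

scores = [90, 90, 80]  # already sorted, first two rows tie
print(row_number(scores), rank(scores), dense_rank(scores))
print(ntile(3, 7))  # bucket sizes 3, 2, 2
print(ntile(3, 8))  # bucket sizes 3, 3, 2
```

Note that with 8 rows the two extra rows are spread over the first two buckets rather than both landing in the first one.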

# Splitting strings

(1) lateral view explode (split(tagids, ',')) s as tagid -- hive

(2) cross join unnest (split(tagids, ',')) as s (tagid) -- presto
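Both forms turn one row holding a delimited string into one row per element. A minimal plain-Python sketch of that reshaping is shown below. The `row_id` column and the sample rows are hypothetical, and the sketch assumes a literal separator (Hive's split actually takes a regular expression).

```python
def explode_split(rows, sep=","):
    """rows: (row_id, tagids) pairs. Returns one (row_id, tagid) pair per tag."""
    out = []
    for row_id, tagids in rows:
        if tagids is None:
            # A null input produces no elements, so the row disappears
            # from the result of a plain lateral view / cross join
            continue
        for tagid in tagids.split(sep):
            out.append((row_id, tagid))
    return out

# Hypothetical sample rows
rows = [(1, "a,b"), (2, "c"), (3, None)]
print(explode_split(rows))
```

If rows with a null or empty list must be kept, Hive offers lateral view outer for that purpose.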

# Checking whether column a contains the value of column b

(1) a like concat('%', b, '%')

(2) array_contains(split(a, ' '), b) -- only checks whether b equals one of the elements obtained by splitting a; for example, with a='xy' and b='x' it returns false.
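The two checks answer different questions: the first is a substring match, the second is an exact match against whole elements. The plain-Python sketch below models both. It assumes b contains no % or _ characters, which like would treat as wildcards, and the function names are made up for illustration.

```python
def like_contains(a, b):
    """Model of: a like concat('%', b, '%'), for b without % or _ wildcards."""
    return b in a

def array_contains_split(a, b, sep=" "):
    """Model of: array_contains(split(a, ' '), b)."""
    return b in a.split(sep)

print(like_contains("xy", "x"))          # substring match succeeds
print(array_contains_split("xy", "x"))   # "x" is not a whole element of ["xy"]
print(array_contains_split("x y", "x"))  # "x" is an element of ["x", "y"]
```

Which one is appropriate depends on whether a partial match such as 'x' inside 'xy' should count as a hit.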


Article link: https://www.haomeiwen.com/subject/aeqsdhtx.html
