特别注意:所有ClickHouse SQL中的函数
(以下用高亮
注明)都区分大小写,需要严格按照文档的写法!
- 基础逻辑
- union → union all
- 双引号(") → 单引号(')
- null 判断:
isNotNull
(),isNull
() - case 判断,分层处理:
- 改为
if
或者multiIf
-
multiIf
(left < right, 'left is smaller', left > right, 'left is greater', left = right, 'Both equal', 'Null value')
- 改为
- cast(注:cast/floor/ceiling/round不属于
函数
,可自行选用全大写或全小写)- 取整
- CAST(**foo **as bigint) 或 CAST(foo, 'bigint')均可
- CAST(**foo **as int) 或 CAST(foo, 'int')均可
- CAST(floor(foo) as int)
- CAST(ceiling(foo) as int)
- CAST(round(foo, 0) as int)
- 取整
- 字符串长度
- length()
- 字符个数
CHAR_LENGTH
()
- 数字→ 字符串
- cast(1111111 as
String
) - cast(1111111, '
String
') - 注意:String首字母大写
- cast(1111111 as
- UDF (日期相关、IP、汇率、数组处理)
-
find in set
-
find_in_set(channel,'{self.channel_list_str}') >= 1 →
has
(splitByChar
(',', '{self.channel_list_str}'), channel) = 1 -
注:{self.channel_list_str}不可带括号,逗号分隔之间不能有空格,否则需要进行处理
-
-
split
-
split(ip,'#')[0] → splitByChar('#',ip)[1]
-
注:clickhouse数组第一位index=1
-
-
汇率原UDF→
dictGet
('test.currency_exchange_rate_mysql', 'exchange_rate', (upper(assumeNotNull
(currencytype)), 'CNY',subString(ds,1,6))) as exchange_rate- 注意:如果原来是JOIN的可以暂时不做修改,但采用字典方式效率会高很多
-
日期相关
-
日期计算
- day + 1
- (yyyyMMdd或yyyy-MM-dd → yyyyMMdd)→
formatDateTime
(addDays
(parseDateTimeBestEffort
(ds), 1), '%Y%m%d')
- (yyyyMMdd或yyyy-MM-dd → yyyyMMdd)→
- day - 1
- (yyyyMMdd或yyyy-MM-dd → yyyyMMdd)→
formatDateTime
(addDays
(parseDateTimeBestEffort
(ds), -1), '%Y%m%d') - subtractDays(parseDateTimeBestEffort(ds),12)
- (yyyyMMdd或yyyy-MM-dd → yyyyMMdd)→
- day + 1
-
求一年的周数→
toISOWeek
(parseDateTimeBestEffort
(ds)) -
求当前日期是第几周→toYearWeek(toDate(regist_date))
-
两个日期相减→dateDiff('day',toDate(last_login_date),toDate(regist_date))
-
日期转月份(yyyy-MM)
- from_unixtime(to_unix_timestamp(ds,'yyyyMMdd'),'yyyy-MM') →
formatDateTime
(parseDateTimeBestEffort
(ds), '%Y-%m') - 注:parseDateTimeBestEffort,传入yyyyMMdd/yyyy-MM-dd都可
- from_unixtime(to_unix_timestamp(ds,'yyyyMMdd'),'yyyy-MM') →
-
日期转格式(yyyyMMdd与yyyy-MM-dd互转)
- from_unixtime(to_unix_timestamp(ds, 'yyyyMMdd'), 'yyyy-MM-dd')→
formatDateTime
(parseDateTimeBestEffort
(ds), '%Y-%m-%d') - from_unixtime(to_unix_timestamp(ds,'yyyy-MM-dd'), 'yyyyMMdd')→
formatDateTime
(parseDateTimeBestEffort
(ds), '%Y%m%d')
- from_unixtime(to_unix_timestamp(ds, 'yyyyMMdd'), 'yyyy-MM-dd')→
-
时间戳 → 日期时间
-
FROM_UNIXTIME
(CAST(floor(1600000000000 / 1000) as int))
-
-
时间戳 → 日期 (yyyyMMdd/yyyy-MM-dd)
-
formatDateTime
(FROM_UNIXTIME
(CAST(floor(1600000000000 / 1000) as int)), '%Y%m%d') -
formatDateTime
(FROM_UNIXTIME
(CAST(floor(1600000000000 / 1000) as int)), '%Y-%m-%d')
-
-
日期时间字符串 → 时间戳
- 采用默认时区
toUnixTimestamp
('2020-09-13 20:26:40') == 1600000000 - 限定北京时区
toUnixTimestamp
('2020-09-13 20:26:40', 'Asia/Shanghai') == 1600000000
- 采用默认时区
-
注:ds就是String类型,可以省略
toString
-
-
explore:使用array join
-
- UDTF (数组处理)
- 窗口分析函数
-- rank 取top n
order by cid, event_time desc
limit 1 by cid
-- rownumber 取top n
select cid, event_time , currencyamount, currencytype, row_number
from (
select cid,
groupArray(event_time) AS event_time,
groupArray(currencyamount) AS currencyamount,
groupArray(currencytype) AS currencytype,
arrayEnumerate(event_time) as row_number
from (
select cid, event_time, currencyamount, currencytype
from db_game_apps.game_s_currency
where ds = '20201212'
and appid = 'APPID_TEST'
order by cid, event_time asc
limit 3 by cid
) a
GROUP BY cid
) array join event_time , currencyamount, currencytype, row_number
-
game_event中字典字段的使用
- 用过在key中查询位置来定位:parameters.Value[indexOf(parameters.Key, 'tag_id')] AS tag_id,
- 后续可以等ClickHouse支持map type https://github.com/ClickHouse/ClickHouse/pull/15806
-
join
- local join需要满足左表不使用subquery,如果distributed_product_mode='local',右表可以使用分布式表。否则右表需要使用local表
-
其他
- select distinct a, b, c → select a, b, c ... group by a, b, c
- 注意:datatype,timestamp都是按数字类型建表了,等值筛选条件注意去掉单引号
- 注意:付费表都经过了实时去重和离线去重,不需要使用distinct了
-
可选
- 优化掉不必要的distinct
网友评论