一 官网安装脚本
用的是centos版本,命令如下:
sudo yum install yum-utils
sudo rpm --import https://repo.clickhouse.com/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.com/rpm/clickhouse.repo
sudo yum install clickhouse-server clickhouse-client
sudo /etc/init.d/clickhouse-server start
clickhouse-client # or "clickhouse-client --password" if you set up a password.
速度还挺快的顺利安装完毕。
更改下数据存储目录,便于测试大数据:
[root@localhost test]# grep test /etc/clickhouse-server/config.xml
<path>/home/test/clickhouse</path>
<tmp_path>/home/test/clickhouse/tmp/</tmp_path>
<user_files_path>/home/test/clickhouse/user_files/</user_files_path>
<path>/home/test/clickhouse/access/</path>
<format_schema_path>/home/test/clickhouse/format_schemas/</format_schema_path>
注意下,test目录需要更改下权限,简单点操作:
chown -R clickhouse:clickhouse /home/test/clickhouse
二 启动服务器
[root@localhost miaohq]# service clickhouse-server start
[root@localhost miaohq]# clickhouse-client
ClickHouse client version 22.1.3.7 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 22.1.3 revision 54455.
localhost :) show databases;
SHOW DATABASES
Query id: 83b6b5ae-3108-4c2f-970e-9091ab843052
┌─name───────────────┐
│ INFORMATION_SCHEMA │
│ default │
│ information_schema │
│ system │
└────────────────────┘
4 rows in set. Elapsed: 0.001 sec.
localhost :)
三 基本操作命令
- 新建数据库
create database flow;
use flow;
- 创建表
create table flow_org(
flow_id String,
print_timestamp DateTime,
create_date Date,
timestamp UInt32,
src_mac String,
dst_mac String,
ipversion UInt8,
src_ip String,
dst_ip String,
proto UInt8,
src_port UInt16,
dst_port UInt16
)engine=MergeTree(create_date,(src_ip,dst_ip,src_port,dst_port,timestamp),8192);
建表必须有一个date字段默认用于分区,还有其他很多标引擎,这里面用的是MergeTree,最常用。
含义:create_date 时间字段默认用于分区,如果不写这个字段,这个MergeTree里面不能有其他值了(见下面),(src_ip,dst_ip,src_port,dst_port,timestamp)构成主键。
如果不必须定义主键,可以用其他字段得到分区字段。
CREATE TABLE stats
(
`datetime` DateTime('UTC') DEFAULT now(),
`worker_id` UInt64,
`project_id` UInt64,
`platform_type` UInt8,
`temp` Array(Int32),
`fan` Array(Int32),
`units` Int32,
`power` Array(Float32),
`power_total` Float32
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(datetime)
ORDER BY (worker_id, datetime);
- 导入数据
clickhouse-client -m --format_csv_delimiter="," --query="insert into flow.flow_org format CSV" < 1644541860.csv
- 删除数据
虽然可以实现修改和删除,但是和一般的OLTP数据库不一样,Mutation语句是一种很“重”的操作,而且不支持事务。
“重”的原因主要是每次修改或者删除都会导致放弃目标数据的原有分区,重建新分区。所以尽量做批量的变更,不要进行频繁小数据的操作。
# 推荐删除方法
alter table flow_org drop partition '202202';
# 不推荐
alter table flow_org delete WHERE id=56;
- 查看数据的压缩比:
SELECT
sum(rows) AS `总行数`,
formatReadableSize(sum(data_uncompressed_bytes)) AS `原始大小`,
formatReadableSize(sum(data_compressed_bytes)) AS `压缩大小`,
round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 0) AS `压缩率`
FROM system.parts;
SELECT
table AS `表名`,
sum(rows) AS `总行数`,
formatReadableSize(sum(data_uncompressed_bytes)) AS `原始大小`,
formatReadableSize(sum(data_compressed_bytes)) AS `压缩大小`,
round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 0) AS `压缩率`
FROM system.parts
WHERE table IN ('flow_org')
GROUP BY table
粗略看了下,我们的数据压缩的后的数据占原来数据的大小为17%。
- 查看表结构
和mysql的很多命令都差不多:
#比如查看表结构的定义:
desc test;
- 导出数据
clickhouse-client --query "select * from test where create_time = '2022-02-14 12:00:00'" --format CSVWithNames> /opt/module/data/rs1.csv
四 参考网址
比较基础比较详细
https://www.cnblogs.com/shengyang17/p/10894490.html#_label3_0_0_3
操作函数
https://www.cnblogs.com/pengpenghuhu/p/14546305.html
手册:
https://clickhouse.com/docs/zh/sql-reference/aggregate-functions/reference/topk/
网友评论