美文网首页
ClickHouse 安装和基本操作

ClickHouse 安装和基本操作

作者: 明翼 | 来源:发表于2022-02-14 20:17 被阅读0次

    一 官网安装脚本

    用的是centos版本,命令如下:

    sudo yum install yum-utils
    sudo rpm --import https://repo.clickhouse.com/CLICKHOUSE-KEY.GPG
    sudo yum-config-manager --add-repo https://repo.clickhouse.com/rpm/clickhouse.repo
    sudo yum install clickhouse-server clickhouse-client
    
    sudo /etc/init.d/clickhouse-server start
    clickhouse-client # or "clickhouse-client --password" if you set up a password.
    

    速度还挺快的顺利安装完毕。

    更改下数据存储目录,便于测试大数据:

    [root@localhost test]# grep test  /etc/clickhouse-server/config.xml
        <path>/home/test/clickhouse</path>
        <tmp_path>/home/test/clickhouse/tmp/</tmp_path>
        <user_files_path>/home/test/clickhouse/user_files/</user_files_path>
                <path>/home/test/clickhouse/access/</path>
        <format_schema_path>/home/test/clickhouse/format_schemas/</format_schema_path>
    

    注意下,test目录需要更改下权限,简单点操作:

    chown -R clickhouse:clickhouse /home/test/clickhouse
    

    二 启动服务器

    [root@localhost miaohq]# service clickhouse-server start
    [root@localhost miaohq]# clickhouse-client
    ClickHouse client version 22.1.3.7 (official build).
    Connecting to localhost:9000 as user default.
    Connected to ClickHouse server version 22.1.3 revision 54455.
    
    localhost :) show databases;
    
    SHOW DATABASES
    
    Query id: 83b6b5ae-3108-4c2f-970e-9091ab843052
    
    ┌─name───────────────┐
    │ INFORMATION_SCHEMA │
    │ default            │
    │ information_schema │
    │ system             │
    └────────────────────┘
    
    4 rows in set. Elapsed: 0.001 sec. 
    
    localhost :) 
    
    

    三 基本操作命令

    1. 新建数据库
    create database flow;
    use flow;
    
    1. 创建表
    create table flow_org(
        flow_id String,
        print_timestamp DateTime,
        create_date Date,
        timestamp UInt32,
        src_mac String,
        dst_mac String,
        ipversion UInt8,
        src_ip String,
        dst_ip String,
        proto UInt8,
        src_port UInt16,
        dst_port UInt16
    )engine=MergeTree(create_date,(src_ip,dst_ip,src_port,dst_port,timestamp),8192);
    

    建表必须有一个date字段默认用于分区,还有其他很多标引擎,这里面用的是MergeTree,最常用。
    含义:create_date 时间字段默认用于分区,如果不写这个字段,这个MergeTree里面不能有其他值了(见下面),(src_ip,dst_ip,src_port,dst_port,timestamp)构成主键。

    如果不必须定义主键,可以用其他字段得到分区字段。

    CREATE TABLE stats
    (
        `datetime`      DateTime('UTC') DEFAULT now(),
        `worker_id`     UInt64,
        `project_id`    UInt64,
        `platform_type` UInt8,
        `temp`          Array(Int32),
        `fan`           Array(Int32),
        `units`         Int32,
        `power`         Array(Float32),
        `power_total`   Float32
    )
    ENGINE = MergeTree()
    PARTITION BY toYYYYMM(datetime)
    ORDER BY (worker_id, datetime);
    
    1. 导入数据
    clickhouse-client -m --format_csv_delimiter="," --query="insert into flow.flow_org format CSV" < 1644541860.csv 
    
    
    1. 删除数据
      虽然可以实现修改和删除,但是和一般的OLTP数据库不一样,Mutation语句是一种很“重”的操作,而且不支持事务。
      “重”的原因主要是每次修改或者删除都会导致放弃目标数据的原有分区,重建新分区。所以尽量做批量的变更,不要进行频繁小数据的操作。
    # 推荐删除方法
    alter table flow_org drop partition '202202';
    # 不推荐
    alter table flow_org delete WHERE id=56;
    
    1. 查看数据的压缩比:
    SELECT 
        sum(rows) AS `总行数`,
        formatReadableSize(sum(data_uncompressed_bytes)) AS `原始大小`,
        formatReadableSize(sum(data_compressed_bytes)) AS `压缩大小`,
        round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 0) AS `压缩率`
    FROM system.parts;
    
     
    
    SELECT 
        table AS `表名`,
        sum(rows) AS `总行数`,
        formatReadableSize(sum(data_uncompressed_bytes)) AS `原始大小`,
        formatReadableSize(sum(data_compressed_bytes)) AS `压缩大小`,
        round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 0) AS `压缩率`
    FROM system.parts
    WHERE table IN ('flow_org')
    GROUP BY table
    
    

    粗略看了下,我们的数据压缩的后的数据占原来数据的大小为17%。

    1. 查看表结构
      和mysql的很多命令都差不多:
    #比如查看表结构的定义:
    desc test;
    
    1. 导出数据
    clickhouse-client  --query "select * from test where create_time = '2022-02-14 12:00:00'"  --format  CSVWithNames>  /opt/module/data/rs1.csv
    

    四 参考网址

    比较基础比较详细
    https://www.cnblogs.com/shengyang17/p/10894490.html#_label3_0_0_3

    操作函数
    https://www.cnblogs.com/pengpenghuhu/p/14546305.html

    手册:
    https://clickhouse.com/docs/zh/sql-reference/aggregate-functions/reference/topk/

    相关文章

      网友评论

          本文标题:ClickHouse 安装和基本操作

          本文链接:https://www.haomeiwen.com/subject/nquclrtx.html