大牛手把手来教!阿里云DLA如何分析Table Store的数据

作者: 阿里云数据库 | 来源:发表于2019-08-20 09:46 被阅读3次

    0. Data Lake Analytics(简称DLA)介绍

    数据湖(Data Lake)是时下热门的概念,更多阅读可以参考:https://en.wikipedia.org/wiki/Data_lake

    基于数据湖,可以不用做任何ETL、数据搬迁等过程,实现跨各种异构数据源进行大数据关联分析,从而极大的节省成本和提升用户体验。

    以及AWS和Azure关于Data Lake的解读:

    AWS:https://amazonaws-china.com/big-data/datalakes-and-analytics/what-is-a-data-lake/

    Azure:「链接」

    终于,阿里云现在也有了自己的数据湖分析产品:「链接」

    可以点击申请使用(目前公测阶段还属于邀测模式),体验本教程分析OTS数据之旅。

    产品文档:Data Lake Analytics-阿里云

    1. Table Store(简称OTS)介绍

    关于Table Store的详细介绍,请看:

    什么是表格存储产品简介表格存储-阿里云

    关于DLA+Table Store的入门体验:阿里云登录 - 欢迎登录阿里云,安全稳定的云计算服务平台

    2. DLA和OTS等存储引擎间的关系

    DLA是上层MPP架构的SQL执行引擎,以MySQL语法作为语法API而实现,解决实时OLAP分析需求;

    OTS是下层数据存储引擎,基于LSM实现,与HBase、BigTable有类似的设计和实现

    DLA支持连接多种存储引擎,除了OTS,还有OSS、ADS、RDS等,并且这些引擎可以做实时混合分析

    DLA在计算一个Query时,通过OTS核心接口,查询下层数据并参与上层分析

    DLA是大小写不敏感的,而OTS是大小写敏感的;

    3. DLA分析OTS最核心的链路

    image

    4. OTS和DLA元信息映射逻辑

    库和表等概念映射

    image

    OTS的概念与DLA的概念映射

    字段的映射关系

    OTS的概念与DLA的类型映射

    5. 购买OTS的实例,并写入数据

    下面,我们开始真正的操作:

    开通OTS服务,并购买OTS实例:概述快速入门表格存储-阿里云

    进入OTS的管理控制台,选择合适的region,创建实例和表:阿里云登录 - 欢迎登录阿里云,安全稳定的云计算服务平台

    当然,也可以选择通过SDK来创建表并写入数据:前言Java SDK_SDK 参考表格存储-阿里云

    6. 查看OTS的实例,获取关键信息

    下面,我就以我们的测试数据,来开启整个过程(跳过具体的申请步骤):

    a) 查看目前DLA已开通的Region(阿里云登录 - 欢迎登录阿里云,安全稳定的云计算服务平台),并确保与你的OTS在同一个Region:

    b) 进入OTS管理控制台,选择杭州Region,查看我的实例(标准TPC-H生成的测试集,有8张表;已提前建好库表,并通过SDK写入了数据):

    image

    c) 查看实例信息,看到相关的endpoint(DLA目前支持公网,所以请选择私网),这里以hz-tpch-1x-vol作测试:

    d) 查看nation表定义(表名、主键名、主键类型、多主键顺序等)和数据,用作后续对比测试:

    7. 用户开通DLA账号步骤:

    用户具备了阿里云账号(主账号);

    用户进入产品介绍页,开通DLA并进入控制台:Data Lake Analytics_云上交互式数据查询分析服务_数据分析平台|系统 - 阿里云

    等用户开通之后,会在你的短信、站内信、邮箱收到账号相关的信息(内容模板可能会升级):

    用户通过在页面上查看一下,得到如下的访问入口信息:

    如下是基于mysql/jdbc方式通过公网经典endpoint连接到dla杭州集群:

    MySQL命令行:
    
    mysql -h<您的DLA经典endpoint,在DLA的console上> -P10000 -u -p -c -A
    
    JDBC URL:
    
    jdbc:mysql://<您的DLA经典endpoint,在DLA的console上>:10000/
    
    username=
    
    password=
    

    8. DLA和OTS网络连通性问题

    目前DLA和OTS服务之间,通过VPC相关的策略,是直接为用户打通网络环境的,用户无需担心这个过程。但DLA目前不支持公网访问,请务必使用OTS的VPC Endpoint!

    9. 使用DLA,连接你的OTS,进行查询和分析

    注:我们是多租户场景的,所以新用户刚进去时看不到任何库表;

    1)创建自己的DLA库(相关信息从上述过程中查找):

    mysql> create database hangzhou_ots_test with dbproperties (
    
    catalog = 'ots',
    
    location = 'https://hz-tpch-1x-vol.cn-hangzhou.vpc.tablestore.aliyuncs.com',
    
    instance = 'hz-tpch-1x-vol'
    
    );
    
    Query OK, 0 rows affected (0.23 sec)
    
    #hangzhou_ots_test ---请注意库名,允许字母、数字、下划线
    
    #catalog = 'ots', ---指定为ots,是为了区分其他数据源,比如oss、rds等
    
    #location = 'https://xxx' ---ots的endpoint,从实例上可以看到
    
    #instance = 'hz-tpch-1x-vol' ---指定instance名,因为endpoint可以不带实例名;最终映射到DLA的schema
    

    2)查看自己创建的库:

    mysql> show databases;
    
    +------------------------------+
    
    | Database |
    
    +------------------------------+
    
    | hangzhou_ots_test |
    
    +------------------------------+
    
    1 rows in set (0.22 sec)
    
    mysql> show create database hangzhou_ots_test;
    
    +-------------------+-------------------------------------------------------------------------+
    
    | Database | Create Database |
    
    +-------------------+-------------------------------------------------------------------------+
    
    | hangzhou_ots_test | CREATE DATABASE `hangzhou_ots_test`
    
    WITH DBPROPERTIES (
    
    CATALOG = 'ots',
    
    LOCATION = 'https://hz-tpch-1x-vol.cn-hangzhou.vpc.tablestore.aliyuncs.com',
    
    INSTANCE = 'hz-tpch-1x-vol'
    
    ) |
    
    +-------------------+-------------------------------------------------------------------------+
    
    1 row in set (0.31 sec)
    

    3)查看自己的DLA表:

    mysql> use hangzhou_ots_test;
    
    Database changed
    
    mysql> show tables;
    
    Empty set (0.30 sec)
    

    4)创建DLA表,映射到OTS的表:

    mysql> CREATE EXTERNAL TABLE `nation` (
    
    `N_NATIONKEY` bigint not NULL ,
    
    `N_COMMENT` varchar(100) NULL ,
    
    `N_NAME` varchar(100) NULL ,
    
    `N_REGIONKEY` bigint NULL ,
    
    PRIMARY KEY (`N_NATIONKEY`)
    
    );
    
    Query OK, 0 rows affected (0.36 sec)
    
    ## `N_NATIONKEY` int not NULL ---- 如果是主键的话,必须要not null
    
    ## PRIMARY KEY (`N_NATIONKEY`) ---- 务必与ots中的主键顺序相同;名称的话也要对应
    

    5)查看自己创建的表和相关的DDL语句:

    mysql> show tables;
    
    +------------+
    
    | Table_Name |
    
    +------------+
    
    | nation |
    
    +------------+
    
    1 row in set (0.35 sec)
    
    mysql> show create table nation;
    
    +--------+--------------------------------------------------------------------------------------------------------------------------+
    
    | Table | Create Table |
    
    +--------+--------------------------------------------------------------------------------------------------------------------------+
    
    | nation | CREATE EXTERNAL TABLE `nation` (
    
    `n_nationkey` int not NULL COMMENT '',
    
    `n_comment` varchar(100) NULL COMMENT '',
    
    `n_name` varchar(100) NULL COMMENT '',
    
    `n_regionkey` int NULL COMMENT '',
    
    PRIMARY KEY (`n_nationkey`)
    
    )
    
    TBLPROPERTIES (COLUMN_MAPPING = 'n_nationkey,N_NATIONKEY; n_comment,N_COMMENT; n_name,N_NAME; n_regionkey,N_REGIONKEY; ')
    
    COMMENT '' |
    
    +--------+-------------------------------------------------------------------------------------------------------------------------+
    
    1 row in set (0.30 sec)
    

    6)开始查询和分析(没有做太复杂的query;用户可以分析自己的数据,符合mysql的语法)

    
    +-------+
    
    | count(*) |
    
    +-------+
    
    | 25 |
    
    +-------+
    
    1 row in set (1.19 sec)
    
    mysql> select * from nation;
    
    +-------------+--------------------------------------------------------------------------------------------------------------------+----------------+-------------+
    
    | n_nationkey | n_comment | n_name | n_regionkey |
    
    +-------------+--------------------------------------------------------------------------------------------------------------------+----------------+-------------+
    
    | 0 | haggle. carefully final deposits detect slyly agai | ALGERIA | 0 |
    
    | 1 | al foxes promise slyly according to the regular accounts. bold requests alon | ARGENTINA | 1 |
    
    | 2 | y alongside of the pending deposits. carefully special packages are about the ironic forges. slyly special | BRAZIL | 1 |
    
    | 3 | eas hang ironic, silent packages. slyly regular packages are furiously over the tithes. fluffily bold | CANADA | 1 |
    
    | 4 | y above the carefully unusual theodolites. final dugouts are quickly across the furiously regular d | EGYPT | 4 |
    
    | 5 | ven packages wake quickly. regu | ETHIOPIA | 0 |
    
    | 6 | refully final requests. regular, ironi | FRANCE | 3 |
    
    | 7 | l platelets. regular accounts x-ray: unusual, regular acco | GERMANY | 3 |
    
    | 8 | ss excuses cajole slyly across the packages. deposits print aroun | INDIA | 2 |
    
    | 9 | slyly express asymptotes. regular deposits haggle slyly. carefully ironic hockey players sleep blithely. carefull | INDONESIA | 2 |
    
    | 10 | efully alongside of the slyly final dependencies. | IRAN | 4 |
    
    | 11 | nic deposits boost atop the quickly final requests? quickly regula | IRAQ | 4 |
    
    | 12 | ously. final, express gifts cajole a | JAPAN | 2 |
    
    | 13 | ic deposits are blithely about the carefully regular pa | JORDAN | 4 |
    
    | 14 | pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t | KENYA | 0 |
    
    | 15 | rns. blithely bold courts among the closely regular packages use furiously bold platelets? | MOROCCO | 0 |
    
    | 16 | s. ironic, unusual asymptotes wake blithely r | MOZAMBIQUE | 0 |
    
    | 17 | platelets. blithely pending dependencies use fluffily across the even pinto beans. carefully silent accoun | PERU | 1 |
    
    | 18 | c dependencies. furiously express notornis sleep slyly regular accounts. ideas sleep. depos | CHINA | 2 |
    
    | 19 | ular asymptotes are about the furious multipliers. express dependencies nag above the ironically ironic account | ROMANIA | 3 |
    
    | 20 | ts. silent requests haggle. closely express packages sleep across the blithely | SAUDI ARABIA | 4 |
    
    | 21 | hely enticingly express accounts. even, final | VIETNAM | 2 |
    
    | 22 | requests against the platelets use never according to the quickly regular pint | RUSSIA | 3 |
    
    | 23 | eans boost carefully special requests. accounts are. carefull | UNITED KINGDOM | 3 |
    
    | 24 | y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be | UNITED STATES | 1 |
    
    +-------------+--------------------------------------------------------------------------------------------------------------------+----------------+-------------+
    
    25 rows in set (1.63 sec)
    

    从图中的id,可以看到,与ots中的数据相同:

    10. 其他相关的文档参考:

    DLA文档专栏:Data Lake Analytics - 知乎

    Data Lake Analytics使用场景:使用场景_产品简介_Data Lake Analytics-阿里云

    OLAP on TableStore——基于Data Lake Analytics的Serverless SQL大数据分析OLAP on TableStore:基于Data Lake Analytics的Serverless SQL大数据分析-云栖社区-阿里云

    使用Data Lake Analytics从OSS清洗数据到AnalyticDB:使用Data Lake Analytics从OSS清洗数据到AnalyticDB-云栖社区-阿里云

    使用Data Lake Analytics 分析OSS数据:阿里云帮助中心-阿里云,领先的云计算服务提供商

    Data Lake Analytics数据库的连接方式:阿里云帮助中心-阿里云,领先的云计算服务提供商

    DLA用户与权限操作:DLA的权限操作的基本体验 - 知乎

    相关文章

      网友评论

        本文标题:大牛手把手来教!阿里云DLA如何分析Table Store的数据

        本文链接:https://www.haomeiwen.com/subject/qkwqsctx.html