美文网首页
Hive Table Sampling – Concept an

Hive Table Sampling – Concept an

作者: 光剑书架上的书 | 来源:发表于2021-09-09 23:21 被阅读0次

    Hive Table Sampling - Concept, Methods and Example, Hive Table Sampling Concept

    The Relational databases like SQL server supports writing queries on a relatively small number of rows from the very large table. In this article, we will check Hive table sampling concept, methods and some examples.

    The Hive TABLESAMPLE clause allows the users to write queries for samples of the data instead of the whole table. The sampling comes handy when you are working on the large tables and it takes time to return results. The TABLESAMPLE clause can be added to any table in the FROM clause.

    Type of Hive Sampling

    There are two type of Hive tables sampling

    • Sampling Bucketized Table
    • Hive Block Sampling

    Hive Table Sampling Syntax

    Bucketized Sampling

    Following is the syntax of the Bucketized Sampling

    SELECT *
    FROM source TABLESAMPLE (BUCKET x OUT OF y [ON colname]) s;
    

    Block Sampling

    SELECT *
    FROM source TABLESAMPLE (n PERCENT) s;
    

    Where, the BUCKET is numbered starting from 1. colname indicates the column on which to sample each row in the table. Instead of colname, use rand() indicating sampling on the entire row instead of an individual column.

    And n is the percent of data size in case of block sampling.

    Hive Sampling Bucketized Table

    The sampling Bucketized table allows you to get sample records using the number of buckets. The Bucketized sampling method can be used when your tables are bucketed.

    You can provide the bucket number starting from 1 along with colname on which to sample each row in the Hive table. You can also use rand() indicating sampling on the entire row instead of an individual column.

    For example, following example provides random sample rows from the bucket 1.

    SELECT *
    FROM SAMPLE_DEMO TABLESAMPLE(BUCKET 1 OUT OF 3 ON rand()) s;
    
    +-------+---------+--------+--+
    | s.id  | s.name  | s.mnt  |
    +-------+---------+--------+--+
    | 3     | ccc     | 1      |
    | 11    | kkk     | 4      |
    +-------+---------+--------+--+
    

    Hive Block Sampling

    This sampling method will allow Hive to pick up at least n% data size. Note that, PERCENT doesn’t necessarily mean the number of rows, it is the percentage of table size. If your table is small then it may return all rows.

    For example, in the following example the input size 0.1% or more will be used for the query.

    SELECT *
    FROM SAMPLE_DEMO TABLESAMPLE(0.01 PERCENT) s;
    
    +-------+---------+--------+--+
    | s.id  | s.name  | s.mnt  |
    +-------+---------+--------+--+
    | 1     | aaa     | 1      |
    | 2     | bbb     | 1      |
    | 3     | ccc     | 1      |
    +-------+---------+--------+--+
    

    Hope this helps 🙂

    相关文章

      网友评论

          本文标题:Hive Table Sampling – Concept an

          本文链接:https://www.haomeiwen.com/subject/hktgwltx.html