Viewing Parquet files with the parquet-tools utility

Author: 代码足迹 | Published 2021-12-30 14:08

CDH installs parquet-tools by default. On my CDH 6.2 installation it is located at /opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/bin/parquet-tools

If you cannot find it there, locate it with the following command:

find / -name parquet-tools
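A minimal sketch of the same lookup as a reusable shell function, assuming a POSIX shell. The function name find_parquet_tools is hypothetical. It checks PATH first with command -v and only then tries the CDH parcel directories under /opt/cloudera/parcels, which avoids scanning the whole file system with find.

```shell
# Hypothetical helper: print the path of a usable parquet-tools executable,
# or report an error and return 1 if none is found.
find_parquet_tools() {
    # 1. Prefer whatever is already on PATH.
    if command -v parquet-tools >/dev/null 2>&1; then
        command -v parquet-tools
        return 0
    fi
    # 2. Fall back to the CDH parcel directories (assumes the default
    #    /opt/cloudera/parcels layout); print the first executable match.
    for candidate in /opt/cloudera/parcels/CDH*/bin/parquet-tools; do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "parquet-tools not found" >&2
    return 1
}
```

Usage: PT=$(find_parquet_tools) && "$PT" prints the help text described below.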

Run it directly, without arguments:

/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/bin/parquet-tools
This prints the help text, which lists eight main subcommands:

  • cat: prints all records (works like the Linux cat command)
parquet-tools cat:
Prints the content of a Parquet file. The output contains only the data, no
metadata is displayed
usage: parquet-tools cat [option...] <input>
where option is one of:
       --debug     Enable debug output
    -h,--help      Show this help string
    -j,--json      Show records in JSON format.
       --no-color  Disable color output even if supported
where <input> is the parquet file to print to stdout
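As an illustration, the -j/--json and --no-color flags listed above can be combined to export a file's records as JSON. This is a minimal sketch assuming a POSIX shell; the helper name pq_to_json and the PARQUET_TOOLS variable are hypothetical (the variable defaults to parquet-tools on PATH and can be pointed at the full CDH path instead).

```shell
# Hypothetical override variable; defaults to parquet-tools on PATH.
PARQUET_TOOLS="${PARQUET_TOOLS:-parquet-tools}"

# Hypothetical helper: write all records of Parquet file $1 as JSON to file $2.
# --no-color is added defensively so no color codes end up in the saved file.
pq_to_json() {
    "$PARQUET_TOOLS" cat --json --no-color "$1" > "$2"
}
```

Usage: pq_to_json data.parquet data.json (both file names are placeholders).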

  • head: prints the first few records, 5 by default (works like the Linux head command)
parquet-tools head:
Prints the first n record of the Parquet file
usage: parquet-tools head [option...] <input>
where option is one of:
       --debug          Enable debug output
    -h,--help           Show this help string
    -n,--records <arg>  The number of records to show (default: 5)
       --no-color       Disable color output even if supported
where <input> is the parquet file to print to stdout
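A sketch of using head with -n to peek at several files at once, assuming a POSIX shell. pq_peek and PARQUET_TOOLS are hypothetical names (PARQUET_TOOLS defaults to parquet-tools on PATH). Each file's records are preceded by a header line so the outputs can be told apart.

```shell
# Hypothetical override variable; defaults to parquet-tools on PATH.
PARQUET_TOOLS="${PARQUET_TOOLS:-parquet-tools}"

# Hypothetical helper: show the first N records of every file given.
# Usage: pq_peek N FILE [FILE...]
pq_peek() {
    n=$1
    shift
    for f in "$@"; do
        printf '== %s ==\n' "$f"
        "$PARQUET_TOOLS" head -n "$n" --no-color "$f"
    done
}
```

Usage: pq_peek 3 part-00000.parquet part-00001.parquet (file names are placeholders).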

  • schema: prints the schema of a Parquet file
parquet-tools schema:
Prints the schema of Parquet file(s)
usage: parquet-tools schema [option...] <input>
where option is one of:
    -d,--detailed      Show detailed information about the schema.
       --debug         Enable debug output
    -h,--help          Show this help string
       --no-color      Disable color output even if supported
    -o,--originalType  Print logical types in OriginalType representation.
where <input> is the parquet file containing the schema to show
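One use of schema is checking whether two files share the same structure before loading or merging them together. A minimal sketch, assuming a POSIX shell; pq_same_schema and PARQUET_TOOLS are hypothetical names. It only compares the printed schema text, so two files whose schemas differ merely in the message name would be reported as different.

```shell
# Hypothetical override variable; defaults to parquet-tools on PATH.
PARQUET_TOOLS="${PARQUET_TOOLS:-parquet-tools}"

# Hypothetical helper: return 0 if files $1 and $2 print identical schemas,
# 1 if they differ, 2 if parquet-tools fails on either file.
pq_same_schema() {
    a=$("$PARQUET_TOOLS" schema --no-color "$1") || return 2
    b=$("$PARQUET_TOOLS" schema --no-color "$2") || return 2
    [ "$a" = "$b" ]
}
```

Usage: if pq_same_schema a.parquet b.parquet; then echo "same schema"; fi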

  • meta: prints the metadata, which shows whether the file is compressed and with which codec
parquet-tools meta:
Prints the metadata of Parquet file(s)
usage: parquet-tools meta [option...] <input>
where option is one of:
       --debug         Enable debug output
    -h,--help          Show this help string
       --no-color      Disable color output even if supported
    -o,--originalType  Print logical types in OriginalType representation.
where <input> is the parquet file to print to stdout
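To pull just the compression information out of meta, its output can be filtered for codec names. This sketch assumes that meta prints codec names in upper case, as they appear in the Parquet format's CompressionCodec enum (UNCOMPRESSED, SNAPPY, GZIP, LZO, BROTLI, LZ4, ZSTD); check that against your own output first. pq_codecs and PARQUET_TOOLS are hypothetical names.

```shell
# Hypothetical override variable; defaults to parquet-tools on PATH.
PARQUET_TOOLS="${PARQUET_TOOLS:-parquet-tools}"

# Hypothetical helper: list the distinct codec names found in the meta output.
# Caveat: a column whose name contains one of these words would also match.
pq_codecs() {
    "$PARQUET_TOOLS" meta --no-color "$1" |
        grep -Eo 'UNCOMPRESSED|SNAPPY|GZIP|LZO|BROTLI|LZ4|ZSTD' |
        sort -u
}
```

Usage: pq_codecs data.parquet (placeholder file name).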

  • dump: prints both the content and the metadata
parquet-tools dump:
Prints the content and metadata of a Parquet file
usage: parquet-tools dump [option...] <input>
where option is one of:
    -c,--column <arg>  Dump only the given column, can be specified more than
                       once
    -d,--disable-data  Do not dump column data
       --debug         Enable debug output
    -h,--help          Show this help string
    -m,--disable-meta  Do not dump row group and page metadata
    -n,--disable-crop  Do not crop the output based on console width
       --no-color      Disable color output even if supported
where <input> is the parquet file to print to stdout
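For example, -c (which may be repeated), -m and -n can be combined to dump only the values of selected columns, without row-group and page metadata and without cropping to the console width. A minimal sketch assuming a POSIX shell; pq_dump_columns and PARQUET_TOOLS are hypothetical names.

```shell
# Hypothetical override variable; defaults to parquet-tools on PATH.
PARQUET_TOOLS="${PARQUET_TOOLS:-parquet-tools}"

# Hypothetical helper: dump only the data of the given column(s).
# Usage: pq_dump_columns FILE COLUMN [COLUMN...]
pq_dump_columns() {
    file=$1
    shift
    # Rebuild the positional parameters as "-c col1 -c col2 ...".
    # The for-list is expanded once, so the set/shift inside is safe.
    for col in "$@"; do
        set -- "$@" -c "$col"
        shift
    done
    "$PARQUET_TOOLS" dump -m -n "$@" "$file"
}
```

Usage: pq_dump_columns data.parquet id name (file and column names are placeholders).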

  • merge: judging from the description, merges multiple files into one (I have not tried it myself)
parquet-tools merge:
Merges multiple Parquet files into one. The command doesn't merge row groups,
just places one after the other. When used to merge many small files, the
resulting file will still contain small row groups, which usually leads to bad
query performance.
usage: parquet-tools merge [option...] <input> [<input> ...] <output>
where option is one of:
       --debug     Enable debug output
    -h,--help      Show this help string
       --no-color  Disable color output even if supported
where <input> is the source parquet files/directory to be merged
   <output> is the destination parquet file
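The usage line above suggests a simple wrapper; the following is only a sketch, untested against a real cluster, that refuses to overwrite an existing output file. It assumes a POSIX shell; pq_merge and PARQUET_TOOLS are hypothetical names, and the existence check only works for local paths. As the help text warns, row groups are not rewritten, so merging many small files still leaves small row groups.

```shell
# Hypothetical override variable; defaults to parquet-tools on PATH.
PARQUET_TOOLS="${PARQUET_TOOLS:-parquet-tools}"

# Hypothetical helper: merge INPUT files into OUTPUT unless OUTPUT exists.
# Usage: pq_merge OUTPUT INPUT [INPUT...]
pq_merge() {
    out=$1
    shift
    if [ -e "$out" ]; then
        echo "refusing to overwrite $out" >&2
        return 1
    fi
    # parquet-tools expects the inputs first and the output last.
    "$PARQUET_TOOLS" merge "$@" "$out"
}
```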

  • rowcount: prints the number of records
parquet-tools rowcount:
Prints the count of rows in Parquet file(s)
usage: parquet-tools rowcount [option...] <input>
where option is one of:
    -d,--detailed  Detailed rowcount of each matching file
       --debug     Enable debug output
    -h,--help      Show this help string
       --no-color  Disable color output even if supported
where <input> is the parquet file to count rows to stdout
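A sketch that prints the row count of several files, pairing each file name with the tool's output, assuming a POSIX shell and that rowcount prints a single line per file. pq_rowcounts and PARQUET_TOOLS are hypothetical names. The -d/--detailed flag above is the tool's own way to report per-file counts; this loop just makes the file-to-count pairing explicit.

```shell
# Hypothetical override variable; defaults to parquet-tools on PATH.
PARQUET_TOOLS="${PARQUET_TOOLS:-parquet-tools}"

# Hypothetical helper: print "<file><TAB><rowcount output>" for each file.
pq_rowcounts() {
    for f in "$@"; do
        printf '%s\t' "$f"
        "$PARQUET_TOOLS" rowcount --no-color "$f"
    done
}
```

Usage: pq_rowcounts part-*.parquet (placeholder glob).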

  • size: prints the file size
parquet-tools size:
Prints the size of Parquet file(s)
usage: parquet-tools size [option...] <input>
where option is one of:
    -d,--detailed      Detailed size of each matching file
       --debug         Enable debug output
    -h,--help          Show this help string
       --no-color      Disable color output even if supported
    -p,--pretty        Pretty size
    -u,--uncompressed  Uncompressed size
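To compare on-disk and uncompressed size (a rough indication of how well a file compresses), size can be run twice, with and without -u. This sketch assumes a POSIX shell and that without -u the tool reports the compressed size; pq_sizes and PARQUET_TOOLS are hypothetical names.

```shell
# Hypothetical override variable; defaults to parquet-tools on PATH.
PARQUET_TOOLS="${PARQUET_TOOLS:-parquet-tools}"

# Hypothetical helper: print human-readable compressed and uncompressed sizes.
pq_sizes() {
    printf 'compressed:   %s\n' "$("$PARQUET_TOOLS" size --pretty --no-color "$1")"
    printf 'uncompressed: %s\n' "$("$PARQUET_TOOLS" size --pretty --uncompressed --no-color "$1")"
}
```

Usage: pq_sizes data.parquet (placeholder file name).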


Article link: https://www.haomeiwen.com/subject/rfeiqrtx.html