python读取excel生成HiveDDL

作者: 文艺小卿年 | 来源:发表于2020-02-10 10:38 被阅读0次

python读取excel生成HiveDDL
用python·批量生成含变量字段的excel文档
python操作excel
ArcPy基于Excel数据创建矢量图层并生成属性表字段、内容
Excel读取并存入数据库通用处理【原创】
python处理excel表格
Python生成文本格式的excel\xlwt生成文本格式的ex
python如何读取excel_Markdown测试
python如何读取excel
python 读取 excel 生成 html 页面

需求说明：

数仓建模过程中，一般会在excel中先将数仓各层的表和字段都梳理好，然后再建表和开发。该程序可以方便的利用excel中的字段名、字段类型、字段描述等信息生成HiveDDL语句

数据示例：

在这里插入图片描述

输出示例：

在这里插入图片描述

代码：

import os
import sys

import xlrd

# 要读取的excel的名称
excel_name = sys.argv[1]
# 存放ddl的目录
dir = sys.argv[2]
# 开始读取的sheet页的下标（默认要读取的sheet页是以此为分割）
num=sys.argv[3]


# 检验是否全是英文字符
def is_all_English(strs):
    for _char in str(strs):
        if '\u4e00' <= _char <= '\u9fa5':
            return False
    return True


book = xlrd.open_workbook(excel_name)
for sheet_num in range(int(num), len(book.sheets()) - 1):
    # 工作sheet
    sheet_index = book.sheets()[sheet_num]

    if sheet_index.cell(0,0).value.strip().startswith('SELECT'):
        print('-------请检查'+ sheet_index.name+'的数据------')

    else:
        # 如果是明确标出表名和标注释的格式
        if sheet_index.cell(1, 1).value != '':
            # 表注释
            table_comment = sheet_index.cell(1, 1).value

            # 表的总行数
            row_cnt = sheet_index.nrows

            # 拼接DDL语句
            sql = 'CREATE TABLE IF NOT EXISTS ' + sheet_index.name + '(  \n'
            str_columns = ''
            for i in range(3, row_cnt):
                # 判断第一列数据是否为英文，中文字符是注意事项，不是正常数据,空数据也不是正常数据
                if is_all_English(sheet_index.cell(i, 0).value) and sheet_index.cell(i, 0).value != '':
                    str_columns += '        ,' + '{:<30}'.format(sheet_index.cell(i, 1).value.strip()) + '{:<10}'.format(
                        sheet_index.cell(i, 2).value) + '              '
                    if sheet_index.cell(i, 3).value.replace("\n", ' ').replace('comment \"','').replace('\"','').replace('\"','')  != '':
                        str_columns += 'COMMENT  \'' + sheet_index.cell(i, 3).value.replace("\n", ' ').replace('comment \"','').replace('\"','').replace('\"','')  + '\''
                    str_columns += '\n'

        else:
            # 表注释
            table_comment = sheet_index.cell(0, 1).value

            # 表的总行数
            row_cnt = sheet_index.nrows

            # 拼接DDL语句
            sql = 'CREATE TABLE IF NOT EXISTS ' + sheet_index.name + '(  \n'
            str_columns = ''
            for i in range(3, row_cnt):
                # 判断第一列数据是否为英文，中文字符是注意事项，不是正常数据,空数据也不是正常数据
                if is_all_English(sheet_index.cell(i, 0).value) and sheet_index.cell(i, 0).value != '':
                    str_columns += '        ,' + '{:<30}'.format(sheet_index.cell(i, 0).value.strip()) + '{:<5}'.format(
                        sheet_index.cell(i, 1).value) + '              '
                    if sheet_index.cell(i, 2).value.replace("\n", ' ').replace('comment \"','').replace('\"','').replace('\"','')  != '':
                        str_columns += 'COMMENT  \'' + sheet_index.cell(i, 2).value.replace("\n", ' ').replace('comment \"','').replace('\"','').replace('\"','') + '\''
                    str_columns += '\n'

        str_columns = '         ' + str_columns.strip()[1:]

        sql += str_columns
        sql += '\n ) COMMENT \'' + table_comment + "\' \n ROW FORMAT DELIMITED \n FIELDS TERMINATED BY '\\t'  \n STORED AS parquet;\n"
        # print(sql)
    with open(os.path.join(dir,sheet_index.name + '.sql'), 'a')as file_handle:
        file_handle.write(sql)
    print(os.path.join(dir, sheet_index.name + '.sql') + '写入完毕')

总结：

梳理字段的过程中，也要有一定的开发规范，这样也便于不同人员之间的沟通和维护，保障其设计理念、处理方法在不同阶段保持连续性。

网友评论

本文标题：python读取excel生成HiveDDL

本文链接：https://www.haomeiwen.com/subject/tmxaxhtx.html

延伸阅读

深度阅读

您也可以注册成为美文阅读网的作者，发表您的原创作品、分享您的心情！

python读取excel生成HiveDDL

需求说明：

数据示例：

输出示例：

代码：

总结：

相关文章

python读取excel生成HiveDDL

用python·批量生成含变量字段的excel文档

python操作excel

ArcPy基于Excel数据创建矢量图层并生成属性表字段、内容

Excel读取并存入数据库通用处理【原创】

python处理excel表格

Python生成文本格式的excel\xlwt生成文本格式的ex

python如何读取excel_Markdown测试

python如何读取excel

python 读取 excel 生成 html 页面

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读

呆鸟的Python数据分析

python数据分析人工智能机器学习

大数据爬虫Python AI Sql