5. Importing Data from Hive into Elasticsearch

Author: 依米兒 | Published 2020-04-10 15:46

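A batch of user data needs to be imported into Elasticsearch. The table is named users and has the fields name, sex, age, and id.

Importing data from Hive into Elasticsearch

First check whether the data is already in Hive. If it is not and still exists only as a file, load it into Hive first.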

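// Create the users table in Hive
hive> create table users(name string, sex string, age int, id string) row format delimited fields terminated by '\t' stored as textfile;
// row format …: declares the delimiter that separates fields in the table's data ('\t' is a tab); the table is stored as a text file.
// Data loaded into Hive later is split on this delimiter, so use the same delimiter here as in the data file.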

// Load the text data into Hive
hive> load data [local] inpath '<path to the file>' into table users;
// [local]: if the file is on the server's local filesystem, add local:
load data local inpath…
// If the file is on HDFS, leave local out

// Query the data to check that it was loaded correctly
hive> select * from users limit 100;

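Reference: Hive column types
Note: load data inpath moves an HDFS file into the table's directory, so the file no longer exists at its original path; load data local inpath copies the local file and leaves the original in place. If the HDFS source file needs to be kept, back it up first.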
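
Because the users table is declared with fields terminated by '\t', every line of the input file needs exactly four tab-separated fields in the order name, sex, age, id. The following is a minimal Python sketch of a pre-load check, not part of the original workflow; the function name `find_bad_rows` and the sample rows are hypothetical.

```python
import re

EXPECTED_COLUMNS = ["name", "sex", "age", "id"]  # column order of the users table
INTEGER = re.compile(r"-?[0-9]+")


def find_bad_rows(text, delimiter="\t"):
    """Return (line_number, reason) pairs for rows that would not load cleanly."""
    problems = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split(delimiter)
        if len(fields) != len(EXPECTED_COLUMNS):
            problems.append(
                (line_number, "expected 4 fields, got %d" % len(fields))
            )
            continue
        age = fields[2]
        if not INTEGER.fullmatch(age):
            # A non-numeric value in an int column is read back as NULL by Hive.
            problems.append((line_number, "age is not an integer: %r" % age))
    return problems


# Hypothetical sample: line 2 uses commas, line 3 has a non-numeric age.
sample = "Alice\tF\t30\tu001\nBob,M,25,u002\nCarol\tF\tunknown\tu003\n"
for problem in find_bad_rows(sample):
    print(problem)
```

Running a check like this before load data makes delimiter mismatches visible up front, instead of surfacing later as NULL columns in select * from users.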

Once the data is confirmed to be in Hive, create an external table that connects to Elasticsearch

// First add the elasticsearch-hadoop jar to Hive; the jar version should match the Elasticsearch version of the cluster
hive> add jar <path to the jar>;
// Create the external table in Hive. Set es.nodes to the IP address of the Elasticsearch master node and es.port to the configured port (default 9200)
hive> create external table ex_users(name string, sex string, age int, id string) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource'='test/user','es.nodes'='ip','es.port'='9200','es.nodes.wan.only'='true');

Configuration reference: ES configuration parameters, introduction to ES field types
Jar download link: es-hadoop jar download
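
The table properties are easy to mistype, and a misspelled key is not one the connector recognizes. As a rough sketch, the DDL can be generated from a fixed list of keys. The keys used here (es.resource, es.nodes, es.port, es.nodes.wan.only) are ES-Hadoop settings; the helper name `es_table_ddl` and its parameters are hypothetical, and 'ip' stays a placeholder as in the text above.

```python
def es_table_ddl(table, columns, resource, nodes, port=9200, wan_only=True):
    """Build the Hive DDL for an Elasticsearch-backed external table."""
    column_sql = ", ".join(
        "%s %s" % (name, hive_type) for name, hive_type in columns
    )
    # Property keys are fixed here so they cannot be misspelled per call.
    properties = [
        ("es.resource", resource),
        ("es.nodes", nodes),
        ("es.port", str(port)),
        ("es.nodes.wan.only", "true" if wan_only else "false"),
    ]
    property_sql = ", ".join(
        "'%s'='%s'" % (key, value) for key, value in properties
    )
    return (
        "create external table %s(%s) "
        "STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' "
        "TBLPROPERTIES(%s);" % (table, column_sql, property_sql)
    )


ddl = es_table_ddl(
    "ex_users",
    [("name", "string"), ("sex", "string"), ("age", "int"), ("id", "string")],
    resource="test/user",
    nodes="ip",  # placeholder: replace with the real node address
)
print(ddl)
```

The printed statement can be pasted at the hive> prompt after the add jar step.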

Finally, insert the data from the Hive users table into ex_users

hive> insert into table ex_users select * from users;
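
One way to check the result is to compare the document count in Elasticsearch with select count(*) from users in Hive. The sketch below uses Elasticsearch's `_count` endpoint over plain HTTP. It assumes the cluster is reachable without authentication; the function names are hypothetical and the sample response body is illustrative, not real output.

```python
import json
from urllib.request import urlopen


def count_url(host, index, port=9200):
    """URL of the _count endpoint for one index."""
    return "http://%s:%d/%s/_count" % (host, port, index)


def parse_count(body):
    """Extract the document count from a _count response body."""
    return json.loads(body)["count"]


def fetch_count(host, index, port=9200, timeout=10):
    """Ask Elasticsearch how many documents the index holds."""
    with urlopen(count_url(host, index, port), timeout=timeout) as response:
        return parse_count(response.read().decode("utf-8"))


# Illustrative response body, only to show the parsing step.
example_body = '{"count": 3, "_shards": {"total": 1, "successful": 1, "failed": 0}}'
print(count_url("127.0.0.1", "test"))
print(parse_count(example_body))
```

Calling fetch_count with the node address and the index name from es.resource (test in this article) should return the same number as the Hive row count if every row was indexed as a separate document.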

For reference, here is how to export data from Hive to the local filesystem or to HDFS

// Write to the local server; users is the Hive table to export
insert overwrite local directory '<local path where the exported files will be saved>' select * from users;

// To write to HDFS instead, drop local from the statement above:
insert overwrite directory '<HDFS path where the exported files will be saved>' select * from users;

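// By default the exported files separate fields with Ctrl-A (\001), not with the delimiter set when users was created.
// To export with tabs, put row format delimited fields terminated by '\t' before the select (Hive 0.11 and later).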
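
When an export is written without a row format clause, the fields are separated by the Ctrl-A character (\x01) and NULL values appear as \N. A small sketch of reading such lines back in Python follows; the function name `parse_export_line` and the sample lines are hypothetical, and the delimiter is a parameter so that '\t' can be passed for exports written with a tab row format.

```python
HIVE_NULL = "\\N"  # how Hive's text output represents NULL


def parse_export_line(line, delimiter="\x01"):
    """Split one exported line into the four users columns."""
    name, sex, age, user_id = line.rstrip("\n").split(delimiter)
    return {
        "name": name,
        "sex": sex,
        "age": None if age == HIVE_NULL else int(age),
        "id": user_id,
    }


# Hypothetical sample lines in the default export format.
sample_lines = ["Alice\x01F\x0130\x01u001\n", "Bob\x01M\x01\\N\x01u002\n"]
for line in sample_lines:
    print(parse_export_line(line))
```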