分享一个今天做的题:
题目:
有原始json数据如下:
{"movie":"1193","rate":"5","timeStamp":"978300760","uid":"1"}
{"movie":"661","rate":"3","timeStamp":"978302109","uid":"1"}
{"movie":"914","rate":"3","timeStamp":"978301968","uid":"1"}
{"movie":"3408","rate":"4","timeStamp":"978300275","uid":"1"}
{"movie":"2355","rate":"5","timeStamp":"978824291","uid":"1"}
{"movie":"1197","rate":"3","timeStamp":"978302268","uid":"1"}
{"movie":"1287","rate":"5","timeStamp":"978302039","uid":"1"}
需要将数据导入到hive数据仓库中,我不管你中间用几个表,最终我要得到一个结果表:
1193 5 978300760 1
661 3 978302109 1
914 3 978301968 1
3408 4 978300275 1
2355 5 978824291 1
1197 3 978302268 1
1287 5 978302039 1
step 1
第一步:先创建一个初始的表,表中只含有一个字段,这个字段包含了原始表中的一行数据。
create table if not exists basicJson(jsonname string)row format delimited fields terminated by '\t';
导入原始数据
load data local inpath "/export/softwares/movieJson.csv";
查询之后获得
{"movie":"1193","rate":"5","timeStamp":"978300760","uid":"1"}
{"movie":"661","rate":"3","timeStamp":"978302109","uid":"1"}
{"movie":"914","rate":"3","timeStamp":"978301968","uid":"1"}
{"movie":"3408","rate":"4","timeStamp":"978300275","uid":"1"}
{"movie":"2355","rate":"5","timeStamp":"978824291","uid":"1"}
{"movie":"1197","rate":"3","timeStamp":"978302268","uid":"1"}
{"movie":"1287","rate":"5","timeStamp":"978302039","uid":"1"}
step 2
打开java工程,新建项目,
public Text evaluate(final Text jsonObject) throws Exception {
String jsonString = jsonObject.toString();
if (null != jsonString && !jsonString.toString().equals("")){
return new Text(change2String(jsonString));
}else{
return new Text("");
}
}
public static String change2String(String jsonString) throws Exception{
JSONObject json = new JSONObject(jsonString);
String movie = json.getString("movie");
String rate = json.getString("rate");
String timeStamp = json.getString("timeStamp");
String uid = json.getString("uid");
String finalString = movie+"\t"+rate+"\t"+timeStamp+"\t"+uid+"\t";
return finalString;
}
这里我们用到了JSONObject类,需要在Maven中导入依赖
<dependency>
<groupId>org.json</groupId>
<artifactId>json</artifactId>
<version>20180813</version>
</dependency>
然后打包获得jar包导入到hive中
add jar /export/servers/hive-1.1.0-cdh5.14.0/lib/json2String.jar;
Hive中关联相关函数
create temporary function json2String as 'cn.leon.json.json2String';
step 3
在hive中使用相关函数获得生成的查询内容并且导入到本地文件中
insert overwrite local directory '/export/softwares/json' select json2String(jsonname) from basicjson;
然后hive创建一个新表包含四个字段
create table if not exists finalMovies(movie string,rate string,timesstamp string,uid string)row format delimited fields terminated by '\t';
最后导入本地数据
load data local inpath '/export/softwares/json' into table finalMovies;
查询出来的结果为正确
1193 5 978300760 1
661 3 978302109 1
914 3 978301968 1
3408 4 978300275 1
2355 5 978824291 1
1197 3 978302268 1
1287 5 978302039 1
网友评论