Hive实战之Youtube数据集

作者: 和心数据 | 来源:发表于2017-07-19 21:54 被阅读189次

    1 数据来源

    本次实战的数据来自于"YouTube视频统计与社交网络"的数据集,是西蒙弗雷泽大学计算机学院在2008年所爬取的数据
    数据集地址

    1. 1 Youtube视频表格式如下:

    列名 注释
    视频ID 一个11位字符串,是唯一的
    上传 一个字符串的视频上传者的用户名
    年龄 视频上传日期和2007年2月15日之间的整数天(YouTube的设立)
    类别 由上传者选择的视频类别的字符串
    长度 视频长度的整数v
    观看数 一整数的视图
    一个浮点数的视频速率
    评分 整数的评分
    评论数 一整数的评论
    相关视频ID 最多20个字符串的相关视频ID

    数据之间采用"\t"作为分隔符

    具体数据如下:

    video ID uploader age category length views rate ratings comments related IDs
    ifnlnji-Y4s Hooran 1162 Travel & Events 239 189 4.8 10 3 tpAL3I0urI4 ... ifnlnji-Y4s

    数据量大小为1G,条数为500万+

    1.2 用户表

    列名 uploader videos friends
    类型 string int int
    解释 上传者 上传视频数 朋友数

    2 实战演练准备

    2.1 环境搭建

    使用环境为
    hive-1.1.0-cdh5.4.5
    hadoop-2.6.0-cdh5.4.5

    演示形式为使用hive shell

    2.2 数据清洗

    我们一起来看看数据

    video ID uploader age category length views rate ratings comments related IDs
    ifnlnji-Y4s Hooran 1162 Travel & Events 239 189 4.8 10 3 tpAL3I0urI4 ... ifnlnji-Y4s

    主要的问题在于category和relatedIDs处理,由于Hive是支持array格式的,所以我们想到的是使用array来存储category和relatedIDs,但是我们发现category的分割符是"&"而realatedIDs的分隔符是"\t",我们在创建表格的时候能够指定array的分隔符,但是只能指定一个,所以再将数据导入到Hive表格之前我们需要对数据进行一定转换和清洗

    并且数据中肯定会存在一些不完整数据和一些奇怪的格式,所以数据的清洗是必要的,我在这里所使用的数据清洗方式是使用Spark进行清洗,也可以使用自定义UDF函数来进行清洗

    数据清洗注意点
    1)我们可以看到每行数据以"\t"作为分隔符,每行有十列数据,最后一列关联ID可以为空,那么我们对数据进行split之后数组的大小要大于8
    2)数据中存在"uNiKXDA8eyQ KRQE 1035 News & Politics 107"这样格式的数据,所以在处理category时需要注意 News & Politics中间的&

    处理后的数据如下:

    video ID uploader age category length views rate ratings comments related IDs
    PkGUU_ggO3k theresident 704 Entertainment 262 11235 3.85 247 280 PkGUU_ggO3k&EYC5bWF0ss8&...shU2hfHKmU0&p0lq5-8IDqY
    RX24KLBhwMI lemonette 697 People&Blogs 512 24149 4.22 315 474 t60tW0WevkE&WZgoejVDZlo&...s8xf4QX1UvA&2cKd9ERh5-8

    下面的实战都是基于数据清洗后的数据进行的

    2.3 创建表格和数据导入

    2.3.1 youtube表格的创建和导入

    1)youtube1的创建,文件格式为textfile
    create table youtube1(videoId string, uploader string, age int, category array<string>, length int, views int, rate float, ratings int, comments int,relatedId array<string>)
    row format delimited
    fields terminated by "\t"
    collection items terminated by "&"
    stored as textfile;

    2)youtube2的创建,文件格式为orc
    create table youtube2(videoId string, uploader string, age int, category array<string>, length int, views int, rate float, ratings int, comments int,relatedId array<string>)
    row format delimited
    fields terminated by "\t"
    collection items terminated by "&"
    stored as orc;

    3)youtube3的创建,文件格式为orc,进行桶分区
    create table youtube3(videoId string, uploader string, age int, category array<string>, length int, views int, rate float, ratings int, comments int,relatedId array<string>)
    clustered by (uploader) into 8 buckets
    row format delimited
    fields terminated by "\t"
    collection items terminated by "&"
    stored as orc;

    数据导入:
    1)load data inpath "path" into table youtube1;
    2)由于无法将textfile格式的数据导入到orc格式的表格,所以数据需要从youtube1导入到youtube2和youtube3:
    insert into table youtube2 select * from youtube1;
    insert into table youtube3 select * from youtube1;

    2.3.2 user表格的创建和导入

    1)user_tmp的创建,文件格式textfile,24buckets
    create table user_tmp(uploader string,videos int,friends int)
    clustered by (uploader) into 24 buckets
    row format delimited
    fields terminated by "\t"
    stored as textfile;

    2)user的创建,文件格式orc,24buckets
    create table user(uploader string,videos int,friends int)
    clustered by (uploader) into 24 buckets
    row format delimited
    fields terminated by "\t"
    stored as orc;

    user表的数据导入也是同理
    数据导入:
    1)load data inpath "path" into table user_tmp;
    2)由于无法将textfile格式的数据导入到orc格式的表格,所以数据需要从user_tmp导入到user:
    insert into table user select * from user_tmp;

    3 实战需求

    1)统计出观看数最多的10个视频
    2)统计出视频类别热度的前10个类型
    3)统计出视频观看数最高的50个视频的所属类别
    4)统计出观看数最多的前N个视频所关联的视频的所属类别排行
    5)筛选出每个类别中热度最高的前10个视频
    6)筛选出每个类别中评分最高的前10个视频
    7)找出用户中上传视频最多的10个用户的所有视频
    8)筛选出每个类别中观看数Top10

    4 实战演练

    4.1 统计出观看数最多的10个视频

    select * from youtube3 order by views desc limit 10;

    结果如下:

    hive> select * from youtube3 order by views desc limit 10;
    Query ID = hadoop_20170710155353_4bc057f0-bbd5-4bfe-a66c-ec0e17cb3ca9
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    Starting Job = job_1499153664137_0101, Tracking URL = http://master:8088/proxy/application_1499153664137_0101/
    Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0101
    Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 1
    2017-07-10 15:53:55,297 Stage-1 map = 0%,  reduce = 0%
    ...
    2017-07-10 15:56:06,210 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 106.93 sec
    MapReduce Total cumulative CPU time: 1 minutes 46 seconds 930 msec
    Ended Job = job_1499153664137_0101
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 4  Reduce: 1   Cumulative CPU: 106.93 sec   HDFS Read: 574632526 HDFS Write: 2916 SUCCESS
    Total MapReduce CPU Time Spent: 1 minutes 46 seconds 930 msec
    OK
    dMH0bHeiRNg judsonlaipply   415 ["Comedy"]  360 79897120    4.65    287260  131356  ["HSoVKUVOnfQ","pv5zWaTEVkI","v1U8f7TmYkA","ZCT3MIUTc3o","mDiBOF8XI44","P9LmHXXWiJs","fo_QVq2lGMs","EtGQgSY9Nn4","5P6UU6m3cqk","61heClTFc5w","h3gdSHGcUU4","OPmYbP0F4Zw","gsOaQGF7kiQ","H2gw9VE16mo","rGkIUlYEQT8","innfyQZHPpo","cu8tUy14zQo","cQ25-glGRzI","WqDbn2iLwwY","jvz0bvYmnto"]
    cQ25-glGRzI RCARecords  742 ["Music"]   227 77674728    4.45    188154  186858  ["otMB3WVQNVg","CAoo71VEYhs","6gqSk1UBFyo","pULa7F1gWZM","C2eUCfREAbw","neZw2CH1h9s","2fjW33W7424","aVnl9QCfNyE","DKhnmUdmz74","tPW70sgrHiY","W4kR8OQCrlQ","-WtcXpnMIT8","YhRZx2QIJKg","5iaYFN5eERA","KChJyERIclc","xsRWpK4pf90","lOq-Q60zWQs","ywNZPURFJNU","dMH0bHeiRNg","tvkh29RKFRY"]
    12Z3J1uzd0Q kaejane 404 ["Film","Animation"]    615 65341925    3.03    9189    5508    ["innfyQZHPpo","-_CSo1gOd48","1Al4crLW9EM","nhSZs-aAZbo","Af9aPlG2WFw","5P6UU6m3cqk","uBHsOTYO6AI","lW19DnWz6vg","vr3x_RRJdd4","5GE82tqcYYQ","D2kJZOfq7zk","lsO6D1rwrKc","dMH0bHeiRNg","kNLXjXxj3J8","vQbYvjmfbr4","G-HonZGBWus","MuOvqeABHvQ","-2caf6KlSyw","cQ25-glGRzI","pv5zWaTEVkI"]
    LpAI8TzQDes IMVUinc 848 ["Entertainment"]   68  65078772    1.38    5678    2499    ["c2eP-UADAi8","5P6UU6m3cqk","cQ25-glGRzI","FYbqcgd97NQ","12Z3J1uzd0Q","VS4wFOWFUt8","RB-wUgnyGv0","sBQLq2VmZcA","w2xUzv6iZWo","oOF3T9Zvizc","244qR7SvvX0","pv5zWaTEVkI","dMH0bHeiRNg","b3u65f4CRLk"]
    7AVHXe-ol-s internmarket    603 ["Music"]   264 60349673    3.16    1033    594 ["trrRo3kGRv0","CCsGkN1PEec","l5mQKJDO-nY","DP_4rQcU0G8","7xz5aOvP2YA","kCjKIus3iBA","nD0MILEXZP0","IQcyLMa716k","d05KUL8aW4E","szeeHnu2DM8","7b_wObF6vgs","UazqVaOg9uc","LRBD9l3Nv6Y","108YigkYFgM","_qCau44yfX0","NkkGj4_1m9A","kuyRSrklGU8","uLr7IJYyNnM","IEuyk_277xY","XmAaYo-CQNw"]
    244qR7SvvX0 donotasyoudo    960 ["Entertainment"]   6   57790943    1.45    66412   14913   ["v3ARyAb_1Bs","nhSZs-aAZbo","2pNTrYd-4FQ","dMH0bHeiRNg","kHmvkRoEowc","yh0xYO3dYx8","5GE82tqcYYQ","ktUSIJEiOug","6PrDw6T3-rM","-xEzGIuY7kw","innfyQZHPpo","pv5zWaTEVkI","E7QOUJfRs3Y","9oRQJK61MWs"]
    ePyRrb2-fzs 1988basti   826 ["Music"]   204 45984219    4.87    51039   27065   ["fm0T7_SGee4","h8oBykb_Pqs","iWg3IMN_rhU","MdOAr_4FJvc","xNp7OxgFJJM","nzaw_Gk9BAU","u9rl4apDFfY","fMzMm4sJYGo","b5eEa5a2Rio","aNR0tW_afdk","Kj_MceyjS6g","GojTUmjxVHU","CGPUuPHdHQg","SIM4DCn7AlE","ktUSIJEiOug","Dn6kGzRSjhU","gxxU68z8quM","HRSrFm_8YK8","2__Qdd11rfA","XewBty95JVg"]
    xsRWpK4pf90 universalmusicgroup 902 ["Music"]   240 44614530    4.73    99373   53441   ["a4X7eFbP3u4","4WAapKx2TvM","PgZgubuT2DM","FIUv3dOBbCk","Dk8NQB_T1ws","7NAbTHGIPP8","mekQxrnhQ3I","OouU4PFUw3Y","rwgXvL9o0QQ","4eeyhtlJp5A","ktUSIJEiOug","UZStqFeYk3Y","sF84pIhP5UM","FKXm3Qg7sBo","a1ti0KccvOg","ERV-wh4VwZI","iWg3IMN_rhU","i1PKmEaUqes","gsAN_Zfc1nc","CmBCLWkrysY"]
    ktUSIJEiOug aliciakeys  953 ["Music"]   251 43583367    4.85    103074  59672   ["_VD-PDzmW4M","vmjl5ARtQWM","X_SmJpAmirw","glBHQSasVMQ","IGj3T7C-RdE","6U_4ANczMPY","oh3Pkw6cokk","j97WOHviXbE","FqsGSvrcfDE","2PWfB4lurT4","Yci8zVNMEdg","sF84pIhP5UM","dDfEjBSWj54","Zdm7FJktDOM","a4X7eFbP3u4","jhPAK8HjcPI","ePyRrb2-fzs","8_8KJfSNgC4","kN9vm95SocU","4DC4Rb9quKk"]
    b3u65f4CRLk srcrecords  729 ["Music"]   256 43511791    4.76    55148   27818   ["4fZF9UsS8LY","5f7kfhHHH_A","H1vaszd6NnA","6Di-dAIR9RA","HiBcQeIax4g","JVciSFc5Wrw","Md6rURKhZmA","IG5ReXP0SSg","D9g2szHsoz0","EflOarvoeVs","7PxBGHjABnU","Gjq1g3j7WFc","mDvCcrU-Ob8","V9YE8dHMxvw","iWg3IMN_rhU","Un2dbprtZFE","WpgMuHwAdC4","RtQ5JENdx5I","ZEYgAgKVuO4","D_2jb8D8AOI"]
    Time taken: 172.566 seconds, Fetched: 10 row(s)
    

    4.2 统计出视频类别热度的前10个类型

    select tagId, count(a.videoid) as sum from (select videoid,tagId from youtube3 lateral view explode(category) catetory as tagId) a group by a.tagId order by sum desc limit 10;

    结果:

    hive> select tagId, count(a.videoid) as sum from (select videoid,tagId from youtube3 lateral view explode(category) catetory as tagId) a group by a.tagId order by sum desc limit 10;
    Query ID = hadoop_20170710155757_614ec9dd-ffa0-465d-8d62-c47b5ad585f0
    Total jobs = 2
    Launching Job 1 out of 2
    Number of reduce tasks not specified. Defaulting to jobconf value of: 2
    Starting Job = job_1499153664137_0102, Tracking URL = http://master:8088/proxy/application_1499153664137_0102/
    Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0102
    Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 2
    2017-07-10 15:58:23,797 Stage-1 map = 0%,  reduce = 0%
    ...
    2017-07-10 15:59:16,341 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 19.51 sec
    MapReduce Total cumulative CPU time: 19 seconds 510 msec
    Ended Job = job_1499153664137_0102
    Launching Job 2 out of 2
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1499153664137_0103, Tracking URL = http://master:8088/proxy/application_1499153664137_0103/
    Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0103
    Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
    2017-07-10 15:59:51,645 Stage-2 map = 0%,  reduce = 0%
    2017-07-10 16:00:18,373 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 0.97 sec
    2017-07-10 16:00:48,410 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 2.57 sec
    MapReduce Total cumulative CPU time: 2 seconds 570 msec
    Ended Job = job_1499153664137_0103
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 4  Reduce: 2   Cumulative CPU: 19.51 sec   HDFS Read: 47327614 HDFS Write: 900 SUCCESS
    Stage-Stage-2: Map: 1  Reduce: 1   Cumulative CPU: 2.57 sec   HDFS Read: 5596 HDFS Write: 148 SUCCESS
    Total MapReduce CPU Time Spent: 22 seconds 80 msec
    OK
    Entertainment   1304724
    Music   1274825
    Comedy  449652
    Blogs   447581
    People  447581
    Film    442109
    Animation   442109
    Sports  390619
    Politics    186753
    News    186753
    Time taken: 185.399 seconds, Fetched: 10 row(s)
    

    4.3 统计出视频观看数最高的50个视频的所属类别

    select tagId, count(a.videoid) as sum from (select videoid,tagId from (select * from youtube3 order by views desc limit 20) e lateral view explode(category) catetory as tagId) a group by a.tagId order by sum desc;

    结果:

    hive> select tagId, count(a.videoid) as sum from (select videoid,tagId from (select * from youtube3 order by views desc limit 20) e lateral view explode(category) catetory as tagId) a group by a.tagId order by sum desc;
    Query ID = hadoop_20170710160909_c6cbbe29-4df3-4c0b-ad70-bd34857acc80
    Total jobs = 3
    Launching Job 1 out of 3
    Number of reduce tasks determined at compile time: 1
    Starting Job = job_1499153664137_0104, Tracking URL = http://master:8088/proxy/application_1499153664137_0104/
    Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0104
    Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 1
    2017-07-10 16:09:48,197 Stage-1 map = 0%,  reduce = 0%
    2017-07-10 16:10:17,734 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 3.09 sec
    ...
    2017-07-10 16:10:59,048 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 47.74 sec
    MapReduce Total cumulative CPU time: 47 seconds 740 msec
    Ended Job = job_1499153664137_0104
    Launching Job 2 out of 3
    Number of reduce tasks not specified. Defaulting to jobconf value of: 2
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1499153664137_0105, Tracking URL = http://master:8088/proxy/application_1499153664137_0105/
    Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0105
    Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 2
    2017-07-10 16:11:35,860 Stage-2 map = 0%,  reduce = 0%
    2017-07-10 16:12:04,633 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 1.07 sec
    2017-07-10 16:12:23,187 Stage-2 map = 100%,  reduce = 50%, Cumulative CPU 2.61 sec
    2017-07-10 16:12:32,480 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 3.93 sec
    MapReduce Total cumulative CPU time: 3 seconds 930 msec
    Ended Job = job_1499153664137_0105
    Launching Job 3 out of 3
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1499153664137_0106, Tracking URL = http://master:8088/proxy/application_1499153664137_0106/
    Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0106
    Hadoop job information for Stage-3: number of mappers: 2; number of reducers: 1
    2017-07-10 16:13:09,244 Stage-3 map = 0%,  reduce = 0%
    2017-07-10 16:13:37,263 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 2.18 sec
    2017-07-10 16:14:05,048 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 3.5 sec
    MapReduce Total cumulative CPU time: 3 seconds 500 msec
    Ended Job = job_1499153664137_0106
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 4  Reduce: 1   Cumulative CPU: 47.74 sec   HDFS Read: 57839263 HDFS Write: 228 SUCCESS
    Stage-Stage-2: Map: 1  Reduce: 2   Cumulative CPU: 3.93 sec   HDFS Read: 6065 HDFS Write: 324 SUCCESS
    Stage-Stage-3: Map: 2  Reduce: 1   Cumulative CPU: 3.5 sec   HDFS Read: 6600 HDFS Write: 53 SUCCESS
    Total MapReduce CPU Time Spent: 55 seconds 170 msec
    OK
    Music   12
    Comedy  3
    Entertainment   3
    Film    2
    Animation   2
    Time taken: 297.466 seconds, Fetched: 5 row(s)
    

    4.4 统计出观看数最多的前50个视频所关联的视频的所属类别排行

    思路:

    1. 首先筛选出前50个视频所关联的视频,
      select * from youtube3 order by views desc limit 50
    2. 再将结果和youtube3进行join
      select explode(relatedId) as videoId from (select * from youtube3 order by views desc limit 50) a) b join youtube3 c on b.videoId = c.videoId
    3. 然后去重
      select distinct(b.videoId),c.category from (select explode(relatedId) as videoId from (select * from youtube3 order by views desc limit 50) a) b join youtube3 c on b.videoId = c.videoId
    4. 最后得出所有视频的类别排行
      select tagId, count(e.videoid) as sum from (select videoid,tagId from (select distinct(b.videoId),c.category from (select explode(relatedId) as videoId from (select * from youtube3 order by views desc limit 50) a) b join youtube3 c on b.videoId = c.videoId) d lateral view explode(category) catetory as tagId) e group by tagId order by sum desc;

    结果:

    hive> select tagId, count(e.videoid) as sum from (select videoid,tagId from (select distinct(b.videoId),c.category from (select explode(relatedId) as videoId from (select * from youtube3 order by views desc limit 50) a) b join youtube3 c on b.videoId = c.videoId) d lateral view explode(category) catetory as tagId) e group by tagId order by sum desc;
    Query ID = hadoop_20170710170808_cfbfde68-c016-4c5d-860b-fe840a2d50cb
    Total jobs = 7
    Launching Job 1 out of 7
    Number of reduce tasks determined at compile time: 1
    
    Starting Job = job_1499153664137_0111, Tracking URL = http://master:8088/proxy/application_1499153664137_0111/
    Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0111
    Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 1
    2017-07-10 17:08:48,662 Stage-1 map = 0%,  reduce = 0%
    2017-07-10 17:09:25,175 Stage-1 map = 17%,  reduce = 0%, Cumulative CPU 20.83 sec
    ...
    2017-07-10 17:10:15,276 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 81.01 sec
    MapReduce Total cumulative CPU time: 1 minutes 21 seconds 10 msec
    Ended Job = job_1499153664137_0111
    Stage-10 is filtered out by condition resolver.
    Stage-11 is selected by condition resolver.
    Stage-2 is filtered out by condition resolver.
    17/07/10 17:10:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Execution log at: /tmp/hadoop/hadoop_20170710170808_cfbfde68-c016-4c5d-860b-fe840a2d50cb.log
    2017-07-10 05:10:23 Starting to launch local task to process map join;  maximum memory = 518979584
    2017-07-10 05:10:34 Dump the side-table for tag: 0 with group count: 743 into file: file:/tmp/hadoop/146c18a2-9a94-440d-9302-72e50ede944e/hive_2017-07-10_17-08-08_628_1950388697756865753-1/-local-10009/HashTable-Stage-8/MapJoin-mapfile30--.hashtable
    2017-07-10 05:10:34 Uploaded 1 File to: file:/tmp/hadoop/146c18a2-9a94-440d-9302-72e50ede944e/hive_2017-07-10_17-08-08_628_1950388697756865753-1/-local-10009/HashTable-Stage-8/MapJoin-mapfile30--.hashtable (22611 bytes)
    2017-07-10 05:10:34 End of local task; Time Taken: 11.392 sec.
    Execution completed successfully
    MapredLocal task succeeded
    Launching Job 3 out of 7
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1499153664137_0112, Tracking URL = http://master:8088/proxy/application_1499153664137_0112/
    Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0112
    Hadoop job information for Stage-8: number of mappers: 4; number of reducers: 0
    2017-07-10 17:11:13,493 Stage-8 map = 0%,  reduce = 0%
    2017-07-10 17:11:57,229 Stage-8 map = 100%,  reduce = 0%, Cumulative CPU 15.79 sec
    MapReduce Total cumulative CPU time: 15 seconds 790 msec
    Ended Job = job_1499153664137_0112
    Launching Job 4 out of 7
    Number of reduce tasks not specified. Defaulting to jobconf value of: 2
    Starting Job = job_1499153664137_0113, Tracking URL = http://master:8088/proxy/application_1499153664137_0113/
    Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0113
    Hadoop job information for Stage-3: number of mappers: 3; number of reducers: 2
    2017-07-10 17:12:31,982 Stage-3 map = 0%,  reduce = 0%
    2017-07-10 17:13:17,467 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 6.53 sec
    MapReduce Total cumulative CPU time: 6 seconds 530 msec
    Ended Job = job_1499153664137_0113
    Launching Job 5 out of 7
    Number of reduce tasks not specified. Defaulting to jobconf value of: 2
    Starting Job = job_1499153664137_0114, Tracking URL = http://master:8088/proxy/application_1499153664137_0114/
    Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0114
    Hadoop job information for Stage-4: number of mappers: 2; number of reducers: 2
    2017-07-10 17:13:55,123 Stage-4 map = 0%,  reduce = 0%
    2017-07-10 17:14:22,876 Stage-4 map = 50%,  reduce = 0%, Cumulative CPU 0.91 sec
    2017-07-10 17:14:23,905 Stage-4 map = 100%,  reduce = 0%, Cumulative CPU 1.99 sec
    2017-07-10 17:14:40,601 Stage-4 map = 100%,  reduce = 50%, Cumulative CPU 3.35 sec
    2017-07-10 17:14:52,033 Stage-4 map = 100%,  reduce = 100%, Cumulative CPU 4.96 sec
    MapReduce Total cumulative CPU time: 4 seconds 960 msec
    Ended Job = job_1499153664137_0114
    Launching Job 6 out of 7
    Number of reduce tasks determined at compile time: 1
    Starting Job = job_1499153664137_0115, Tracking URL = http://master:8088/proxy/application_1499153664137_0115/
    Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0115
    Hadoop job information for Stage-5: number of mappers: 2; number of reducers: 1
    2017-07-10 17:15:27,647 Stage-5 map = 0%,  reduce = 0%
    2017-07-10 17:15:54,560 Stage-5 map = 50%,  reduce = 0%, Cumulative CPU 0.91 sec
    2017-07-10 17:15:55,587 Stage-5 map = 100%,  reduce = 0%, Cumulative CPU 2.03 sec
    2017-07-10 17:16:23,357 Stage-5 map = 100%,  reduce = 100%, Cumulative CPU 3.59 sec
    MapReduce Total cumulative CPU time: 3 seconds 590 msec
    Ended Job = job_1499153664137_0115
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 4  Reduce: 1   Cumulative CPU: 81.01 sec   HDFS Read: 463074822 HDFS Write: 26949 SUCCESS
    Stage-Stage-8: Map: 4   Cumulative CPU: 15.79 sec   HDFS Read: 47322670 HDFS Write: 30829 SUCCESS
    Stage-Stage-3: Map: 3  Reduce: 2   Cumulative CPU: 6.53 sec   HDFS Read: 43694 HDFS Write: 1126 SUCCESS
    Stage-Stage-4: Map: 2  Reduce: 2   Cumulative CPU: 4.96 sec   HDFS Read: 8888 HDFS Write: 725 SUCCESS
    Stage-Stage-5: Map: 2  Reduce: 1   Cumulative CPU: 3.59 sec   HDFS Read: 7001 HDFS Write: 207 SUCCESS
    Total MapReduce CPU Time Spent: 1 minutes 51 seconds 880 msec
    OK
    Music   399
    Entertainment   103
    Comedy  94
    People  36
    Blogs   36
    Film    19
    Animation   19
    Pets    12
    Animals 12
    UNA 11
    Style   6
    Politics    6
    News    6
    Howto   6
    Sports  4
    Vehicles    3
    Autos   3
    Travel  2
    Events  2
    Technology  1
    Science 1
    Time taken: 496.822 seconds, Fetched: 21 row(s)
    

    4.5 筛选出某个类别(如music)中热度最高的前10个视频

    思路:

    • 创建一个表格,存储每个视频对应一个标签的信息
      create table youtube_category(videoId string, uploader string, age int, categoryId string, length int, views int, rate float, ratings int, comments int,relatedId array<string>)
      row format delimited
      fields terminated by "\t"
      collection items terminated by "&"
      stored as orc;
    • 将转换后的数据进行插入
      insert into table youtube_category select videoid,uploader,age,categoryId,length,views,rate,ratings,comments,relatedId from youtube3 lateral view explode(category) catetory as categoryId;
    • 根据观看数和类别进行查询
      select * from youtube_category where categoryId="Music" order by views desc limit 10;

    结果如下:

    hive> create table youtube_category(videoId string, uploader string, age int, categoryId string, length int, views int, rate float, ratings int, comments int,relatedId array<string>)
        > row format delimited
        > fields terminated by "\t"
        > collection items terminated by "&"
        > stored as orc;
    OK
    Time taken: 0.256 seconds
    hive> insert into youtube_category select videoid,uploader,age,categoryId,length,views,rate,ratings,comments,relatedId from youtube3 lateral view explode(category) catetory as categoryId;
    Query ID = hadoop_20170711091010_7c76d7bd-5af0-43f9-812f-e30c612ee60b
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1499153664137_0116, Tracking URL = http://master:8088/proxy/application_1499153664137_0116/
    Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0116
    Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 0
    2017-07-11 09:11:18,741 Stage-1 map = 0%,  reduce = 0%
    ...
    2017-07-11 09:19:05,239 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 402.71 sec
    MapReduce Total cumulative CPU time: 6 minutes 42 seconds 710 msec
    Ended Job = job_1499153664137_0116
    Stage-4 is selected by condition resolver.
    Stage-3 is filtered out by condition resolver.
    Stage-5 is filtered out by condition resolver.
    Moving data to: hdfs://bydcluster1/user/hive/warehouse/youtube_category/.hive-staging_hive_2017-07-11_09-10-38_970_8191962088714912782-1/-ext-10000
    Loading data to table default.youtube_category
    Table default.youtube_category stats: [numFiles=4, numRows=6742209, totalSize=608748677, rawDataSize=10187373874]
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 4   Cumulative CPU: 404.25 sec   HDFS Read: 574634998 HDFS Write: 608749047 SUCCESS
    Total MapReduce CPU Time Spent: 6 minutes 44 seconds 250 msec
    OK
    Time taken: 511.914 seconds
    hive> select * from youtube_category where categoryId="music" order by views desc limit 10;
    Query ID = hadoop_20170711091919_a48aa98b-9cfa-4f58-a106-9da85484f0dd
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1499153664137_0117, Tracking URL = http://master:8088/proxy/application_1499153664137_0117/
    Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0117
    Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
    2017-07-11 09:20:37,607 Stage-1 map = 0%,  reduce = 0%
    ...
    2017-07-11 09:21:32,382 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 36.15 sec
    MapReduce Total cumulative CPU time: 36 seconds 150 msec
    Ended Job = job_1499153664137_0117
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 3  Reduce: 1   Cumulative CPU: 36.15 sec   HDFS Read: 607706126 HDFS Write: 0 SUCCESS
    Total MapReduce CPU Time Spent: 36 seconds 150 msec
    OK
    Time taken: 96.836 seconds
    hive> select * from youtube_category where catagoryId="Music" order by views desc limit 10;
    FAILED: SemanticException [Error 10004]: Line 1:37 Invalid table alias or column reference 'catagoryId': (possible column names are: videoid, uploader, age, categoryid, length, views, rate, ratings, comments, relatedid)
    hive> select * from youtube_category where catagoryid="Music" order by views desc limit 10;
    FAILED: SemanticException [Error 10004]: Line 1:37 Invalid table alias or column reference 'catagoryid': (possible column names are: videoid, uploader, age, categoryid, length, views, rate, ratings, comments, relatedid)
    hive> select * from youtube_category where categoryid="Music" order by views desc limit 10;
    Query ID = hadoop_20170711092525_53f32ccf-7d04-4615-b446-9009cf16dc7f
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1499153664137_0118, Tracking URL = http://master:8088/proxy/application_1499153664137_0118/
    Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0118
    Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
    2017-07-11 09:26:08,727 Stage-1 map = 0%,  reduce = 0%
    ...
    2017-07-11 09:27:17,297 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 57.89 sec
    MapReduce Total cumulative CPU time: 57 seconds 890 msec
    Ended Job = job_1499153664137_0118
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 3  Reduce: 1   Cumulative CPU: 58.14 sec   HDFS Read: 607706243 HDFS Write: 3043 SUCCESS
    Total MapReduce CPU Time Spent: 58 seconds 140 msec
    OK
    cQ25-glGRzI RCARecords  742 Music   227 77674728    4.45    188154  186858  ["otMB3WVQNVg","CAoo71VEYhs","6gqSk1UBFyo","pULa7F1gWZM","C2eUCfREAbw","neZw2CH1h9s","2fjW33W7424","aVnl9QCfNyE","DKhnmUdmz74","tPW70sgrHiY","W4kR8OQCrlQ","-WtcXpnMIT8","YhRZx2QIJKg","5iaYFN5eERA","KChJyERIclc","xsRWpK4pf90","lOq-Q60zWQs","ywNZPURFJNU","dMH0bHeiRNg","tvkh29RKFRY"]
    7AVHXe-ol-s internmarket    603 Music   264 60349673    3.16    1033    594 ["trrRo3kGRv0","CCsGkN1PEec","l5mQKJDO-nY","DP_4rQcU0G8","7xz5aOvP2YA","kCjKIus3iBA","nD0MILEXZP0","IQcyLMa716k","d05KUL8aW4E","szeeHnu2DM8","7b_wObF6vgs","UazqVaOg9uc","LRBD9l3Nv6Y","108YigkYFgM","_qCau44yfX0","NkkGj4_1m9A","kuyRSrklGU8","uLr7IJYyNnM","IEuyk_277xY","XmAaYo-CQNw"]
    ePyRrb2-fzs 1988basti   826 Music   204 45984219    4.87    51039   27065   ["fm0T7_SGee4","h8oBykb_Pqs","iWg3IMN_rhU","MdOAr_4FJvc","xNp7OxgFJJM","nzaw_Gk9BAU","u9rl4apDFfY","fMzMm4sJYGo","b5eEa5a2Rio","aNR0tW_afdk","Kj_MceyjS6g","GojTUmjxVHU","CGPUuPHdHQg","SIM4DCn7AlE","ktUSIJEiOug","Dn6kGzRSjhU","gxxU68z8quM","HRSrFm_8YK8","2__Qdd11rfA","XewBty95JVg"]
    xsRWpK4pf90 universalmusicgroup 902 Music   240 44614530    4.73    99373   53441   ["a4X7eFbP3u4","4WAapKx2TvM","PgZgubuT2DM","FIUv3dOBbCk","Dk8NQB_T1ws","7NAbTHGIPP8","mekQxrnhQ3I","OouU4PFUw3Y","rwgXvL9o0QQ","4eeyhtlJp5A","ktUSIJEiOug","UZStqFeYk3Y","sF84pIhP5UM","FKXm3Qg7sBo","a1ti0KccvOg","ERV-wh4VwZI","iWg3IMN_rhU","i1PKmEaUqes","gsAN_Zfc1nc","CmBCLWkrysY"]
    ktUSIJEiOug aliciakeys  953 Music   251 43583367    4.85    103074  59672   ["_VD-PDzmW4M","vmjl5ARtQWM","X_SmJpAmirw","glBHQSasVMQ","IGj3T7C-RdE","6U_4ANczMPY","oh3Pkw6cokk","j97WOHviXbE","FqsGSvrcfDE","2PWfB4lurT4","Yci8zVNMEdg","sF84pIhP5UM","dDfEjBSWj54","Zdm7FJktDOM","a4X7eFbP3u4","jhPAK8HjcPI","ePyRrb2-fzs","8_8KJfSNgC4","kN9vm95SocU","4DC4Rb9quKk"]
    b3u65f4CRLk srcrecords  729 Music   256 43511791    4.76    55148   27818   ["4fZF9UsS8LY","5f7kfhHHH_A","H1vaszd6NnA","6Di-dAIR9RA","HiBcQeIax4g","JVciSFc5Wrw","Md6rURKhZmA","IG5ReXP0SSg","D9g2szHsoz0","EflOarvoeVs","7PxBGHjABnU","Gjq1g3j7WFc","mDvCcrU-Ob8","V9YE8dHMxvw","iWg3IMN_rhU","Un2dbprtZFE","WpgMuHwAdC4","RtQ5JENdx5I","ZEYgAgKVuO4","D_2jb8D8AOI"]
    iWg3IMN_rhU TimbalandMusic  853 Music   216 43323757    4.8 58761   36142   ["SIM4DCn7AlE","GojTUmjxVHU","ePyRrb2-fzs","2__Qdd11rfA","P_TKpULdDvo","Dn6kGzRSjhU","1iLDIj0pDHk","xsRWpK4pf90","XewBty95JVg","y4jiyjDQGLY","h8oBykb_Pqs","xNp7OxgFJJM","b3u65f4CRLk","S-783rzHIS4","cZd1Js0QaOI","kZGEgVxyPHU","43o0vwAmFM8","BMMUWvavORI","v_-1peCW6Ok","9gkjyM7ZOUc"]
    innfyQZHPpo chai0322    468 Music   210 41564032    2.61    13564   6126    ["nhSZs-aAZbo","hh0nVc0NYTE","12Z3J1uzd0Q","V1s9queYhF8","1Al4crLW9EM","61e1h4vALS0","8h98jb9Lk74","Z-HjmP7BCVc","o_pIQIV_NuU","82BDV1gYjdM","Af9aPlG2WFw","cC27PTFP4V8","-_CSo1gOd48","00c08ijIlHo","cIKxlWuTviE","dMH0bHeiRNg","iWg3IMN_rhU","v3ARyAb_1Bs","5P6UU6m3cqk","QjA5faZF1A8"]
    Lt6o8NlrbHg seankingston    867 Music   257 41171303    4.74    101352  67579   ["qwflMOAaOf0","aVB5ViG_0yE","_QmajsQI9SM","Skz6h2gb-t8","t2dkCGDgVzk","HJgsbsMSe5w","BXhN2DbI0hc","sFJQAfW_jgw","RZJ32D-lrTE","U1bf7XLGGRE","ktUSIJEiOug","TWEez0U-9ag","BhkIjh-nuRo","a4X7eFbP3u4","b3u65f4CRLk","sm2fTDpuyyM","9pzro0T8rgY","TBzfqR4mrgE","c2cyose42jU","iWg3IMN_rhU"]
    QjA5faZF1A8 guitar90    308 Music   320 40294882    4.83    329108  169563  ["r2BOApUvFpw","ATub40Npxik","m7Jh1BV1EOc","owAj5LiXG5w","Ddn4MGaS3N4","GxplDa3M5Io","by8oyJztzwo","aZpD0btOZx8","pZ9jrBg4Lwc","heISA256CRo","6wpPk8qk3uQ","i4BYMvVvMg0","2xjJXT0C0X4","p-VR-cXghko","-9ao_vOsZkg","wfqu4YEufYc","WetVXbYRfWk","JdxkVQy7QLM","fdjamy75C4A","dVUgd8ot6BE"]
    Time taken: 111.631 seconds, Fetched: 10 row(s)
    

    4.6 筛选出每个类别中评分最高的前10个视频

    select * from youtube_category where categoryId="Music" order by ratings desc limit 10;

    结果如下:

    hive> select * from youtube_category where categoryId="Music" order by ratings desc limit 10;
    Query ID = hadoop_20170711093838_6ab5573a-964f-486e-b0dc-77e5853fcfb5
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1499153664137_0119, Tracking URL = http://master:8088/proxy/application_1499153664137_0119/
    Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0119
    Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
    2017-07-11 09:39:27,903 Stage-1 map = 0%,  reduce = 0%
    ...
    2017-07-11 09:40:36,298 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 57.41 sec
    MapReduce Total cumulative CPU time: 57 seconds 410 msec
    Ended Job = job_1499153664137_0119
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 3  Reduce: 1   Cumulative CPU: 57.41 sec   HDFS Read: 607706243 HDFS Write: 2957 SUCCESS
    Total MapReduce CPU Time Spent: 57 seconds 410 msec
    OK
    QjA5faZF1A8 guitar90    308 Music   320 40294882    4.83    329108  169563  ["r2BOApUvFpw","ATub40Npxik","m7Jh1BV1EOc","owAj5LiXG5w","Ddn4MGaS3N4","GxplDa3M5Io","by8oyJztzwo","aZpD0btOZx8","pZ9jrBg4Lwc","heISA256CRo","6wpPk8qk3uQ","i4BYMvVvMg0","2xjJXT0C0X4","p-VR-cXghko","-9ao_vOsZkg","wfqu4YEufYc","WetVXbYRfWk","JdxkVQy7QLM","fdjamy75C4A","dVUgd8ot6BE"]
    cQ25-glGRzI RCARecords  742 Music   227 77674728    4.45    188154  186858  ["otMB3WVQNVg","CAoo71VEYhs","6gqSk1UBFyo","pULa7F1gWZM","C2eUCfREAbw","neZw2CH1h9s","2fjW33W7424","aVnl9QCfNyE","DKhnmUdmz74","tPW70sgrHiY","W4kR8OQCrlQ","-WtcXpnMIT8","YhRZx2QIJKg","5iaYFN5eERA","KChJyERIclc","xsRWpK4pf90","lOq-Q60zWQs","ywNZPURFJNU","dMH0bHeiRNg","tvkh29RKFRY"]
    pv5zWaTEVkI OkGo    531 Music   184 32022043    4.83    121516  39974   ["dMH0bHeiRNg","x49WZRyXGe0","RbdbVhBGETQ","yRmqZRPgK1w","DjCL0_0Il7w","gq7r3F1SoX0","k66epna2Sss","1LNbzqoOPu4","vr3x_RRJdd4","ERV-wh4VwZI","xsRWpK4pf90","iAQZ_uui1SY","5P6UU6m3cqk","bav63MWNUKg"]
    ktUSIJEiOug aliciakeys  953 Music   251 43583367    4.85    103074  59672   ["_VD-PDzmW4M","vmjl5ARtQWM","X_SmJpAmirw","glBHQSasVMQ","IGj3T7C-RdE","6U_4ANczMPY","oh3Pkw6cokk","j97WOHviXbE","FqsGSvrcfDE","2PWfB4lurT4","Yci8zVNMEdg","sF84pIhP5UM","dDfEjBSWj54","Zdm7FJktDOM","a4X7eFbP3u4","jhPAK8HjcPI","ePyRrb2-fzs","8_8KJfSNgC4","kN9vm95SocU","4DC4Rb9quKk"]
    Lt6o8NlrbHg seankingston    867 Music   257 41171303    4.74    101352  67579   ["qwflMOAaOf0","aVB5ViG_0yE","_QmajsQI9SM","Skz6h2gb-t8","t2dkCGDgVzk","HJgsbsMSe5w","BXhN2DbI0hc","sFJQAfW_jgw","RZJ32D-lrTE","U1bf7XLGGRE","ktUSIJEiOug","TWEez0U-9ag","BhkIjh-nuRo","a4X7eFbP3u4","b3u65f4CRLk","sm2fTDpuyyM","9pzro0T8rgY","TBzfqR4mrgE","c2cyose42jU","iWg3IMN_rhU"]
    xsRWpK4pf90 universalmusicgroup 902 Music   240 44614530    4.73    99373   53441   ["a4X7eFbP3u4","4WAapKx2TvM","PgZgubuT2DM","FIUv3dOBbCk","Dk8NQB_T1ws","7NAbTHGIPP8","mekQxrnhQ3I","OouU4PFUw3Y","rwgXvL9o0QQ","4eeyhtlJp5A","ktUSIJEiOug","UZStqFeYk3Y","sF84pIhP5UM","FKXm3Qg7sBo","a1ti0KccvOg","ERV-wh4VwZI","iWg3IMN_rhU","i1PKmEaUqes","gsAN_Zfc1nc","CmBCLWkrysY"]
    K2cYWfq--Nw FrEckleStudios  841 Music   224 17005186    4.82    90991   50788   ["SyIC3Munnyw","cZd1Js0QaOI","lLYD_-A_X5E","alqM0IYeH54","wQVEPFzkhaM","oGECJP3phyY","nPLOiBM8hLk","VpBDqtUEWcM","MJPdVVOmbz4","bPZJYQXQsm8","nPBmXEO3yUU","3jzSh_MLNcY","_EXeBLvmllg","Cva_sGN_0VA","Sr2JneittqQ","SYpYmkcadRA","71eBjNbdXDg","DgBgnoEY4iM","bl6RJyZdBSU","FAK_jtOf70g"]
    -xEzGIuY7kw alyankovic  580 Music   171 24095019    4.84    90091   45517   ["HYokLWfqbaU","N26KWq7MmSc","JCAt9WcCFbM","Rt1_6uz_sVU","Nh9mVsBKwYs","p9Zt8mn14hY","8n7ncJEFuSw","FT060JGp9sQ","E6Zc9NyYH-k","ixyTNd-Ln38","zIllRdSzSug","GsfVw9xxoNY","XkDeJgGrtdU","fqz1ojIQTBk","5GE82tqcYYQ","ODdGhOOUOpI","v3ARyAb_1Bs","XbVtbc_XzrI","U1ULxKM75rY","Jw00EUh0GT4"]
    EwTZ2xpQwpA TayZonday   796 Music   292 16841569    4.23    83514   129200  ["2x2W12A8Qow","P6dUCOS1bM0","NattlyH0IeM","nTQOpibv_OA","9mSKBgvHdoE","hjD6iigdB-g","caIBKOztlAo","xUz2YMmiq0k","aWY3eYOX3U0","mD5_GUovjiM","1oFS-q8BIps","deXAEN70CDY","0pElTyjfxe0","eyDuGwlrFRs","m6SjPfc_xNA","qYGvGWY1FDs","N0amCfgnwY8","JPu4uErBFks","pgSA-ErKd8c","ZZgGGlOGyUg"]
    xWHf_vYZzQ8 universalmusicgroup 771 Music   262 25607299    4.79    83419   70529   ["jvz0bvYmnto","zElEs8yw7fw","9FcBnaLjxY4","95wgKdSJGDo","ueOgZPBXY4A","k3O5uy-MBBk","O3sGTaQ9s9c","aT434G38OBg","4PdDPrwIwhI","Y-VifE8EK8w","9wpxno6qUd0","B17SYROT3GI","wJtUuxmm-B0","LBDnkJ5h1ho","CfzpBjKpSPE","QF2AmC2xyXM","eoa6Gx4HxTc","mvPvcV44rCc"]
    Time taken: 109.053 seconds, Fetched: 10 row(s)
    

    4.7 找出用户中上传视频最多的10个用户的所有视频

    思路:

    • 筛选出用户表中上传视频数最多的前十位
      select * from user order by videos desc limit 10;
    • 将筛选结果跟youtube3进行join得出结果
      select b.*,a.videos,a.friends from (select * from user order by videos desc limit 10) a join youtube3 b on a.uploader = b.uploader order by views desc limit 20;

    结果如下:

    hive> select b.*,a.videos,a.friends from (select * from user order by videos desc limit 10) a join youtube3 b on a.uploader = b.uploader order by views desc limit 20;
    Query ID = hadoop_20170711101616_d9ebb69e-7fbe-4f09-ad55-54f4dfd11ebe
    Total jobs = 5
    Launching Job 1 out of 5
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1499153664137_0125, Tracking URL = http://master:8088/proxy/application_1499153664137_0125/
    Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0125
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2017-07-11 10:17:30,594 Stage-1 map = 0%,  reduce = 0%
    2017-07-11 10:18:03,439 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.03 sec
    2017-07-11 10:18:25,025 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 10.28 sec
    MapReduce Total cumulative CPU time: 10 seconds 280 msec
    Ended Job = job_1499153664137_0125
    Stage-8 is filtered out by condition resolver.
    Stage-9 is selected by condition resolver.
    Stage-2 is filtered out by condition resolver.
    17/07/11 10:18:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Execution log at: /tmp/hadoop/hadoop_20170711101616_d9ebb69e-7fbe-4f09-ad55-54f4dfd11ebe.log
    2017-07-11 10:18:33 Starting to launch local task to process map join;  maximum memory = 518979584
    2017-07-11 10:18:44 Dump the side-table for tag: 0 with group count: 10 into file: file:/tmp/hadoop/146c18a2-9a94-440d-9302-72e50ede944e/hive_2017-07-11_10-16-50_653_2968482866464413222-1/-local-10007/HashTable-Stage-6/MapJoin-mapfile90--.hashtable
    2017-07-11 10:18:44 Uploaded 1 File to: file:/tmp/hadoop/146c18a2-9a94-440d-9302-72e50ede944e/hive_2017-07-11_10-16-50_653_2968482866464413222-1/-local-10007/HashTable-Stage-6/MapJoin-mapfile90--.hashtable (611 bytes)
    2017-07-11 10:18:44 End of local task; Time Taken: 11.244 sec.
    Execution completed successfully
    MapredLocal task succeeded
    Launching Job 3 out of 5
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1499153664137_0126, Tracking URL = http://master:8088/proxy/application_1499153664137_0126/
    Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0126
    Hadoop job information for Stage-6: number of mappers: 4; number of reducers: 0
    2017-07-11 10:19:25,285 Stage-6 map = 0%,  reduce = 0%
    ...
    2017-07-11 10:20:26,162 Stage-6 map = 100%,  reduce = 0%, Cumulative CPU 44.09 sec
    MapReduce Total cumulative CPU time: 44 seconds 90 msec
    Ended Job = job_1499153664137_0126
    Launching Job 4 out of 5
    Number of reduce tasks determined at compile time: 1
    Starting Job = job_1499153664137_0127, Tracking URL = http://master:8088/proxy/application_1499153664137_0127/
    Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0127
    Hadoop job information for Stage-3: number of mappers: 2; number of reducers: 1
    2017-07-11 10:21:01,746 Stage-3 map = 0%,  reduce = 0%
    2017-07-11 10:21:30,462 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 4.27 sec
    2017-07-11 10:21:48,968 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 6.36 sec
    MapReduce Total cumulative CPU time: 6 seconds 360 msec
    Ended Job = job_1499153664137_0127
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 10.28 sec   HDFS Read: 9796823 HDFS Write: 434 SUCCESS
    Stage-Stage-6: Map: 4   Cumulative CPU: 44.09 sec   HDFS Read: 574641054 HDFS Write: 2133846 SUCCESS
    Stage-Stage-3: Map: 2  Reduce: 1   Cumulative CPU: 6.36 sec   HDFS Read: 2144607 HDFS Write: 6335 SUCCESS
    Total MapReduce CPU Time Spent: 1 minutes 0 seconds 730 msec
    OK
    lUOe76YPY7M expertvillage   801 ["Howto","Style"]   119 1014983 3.36    596 689 ["5cEsa-rswrg","HOoQPbcF8Rk","ht3DMDx1spo","VFnf17cp3Eg","IP35rm_31RQ","hXuiuwZPX0M","s8pH5jgSuP8","KIEkwm_6DD4","ovFSahIPaVQ","rQtwcSDh3Hk","COeT3WR7SFc","ZNLBKRpWt7w","qncPBZ9DRRk","tOBqg3yDlyE","YgfWUXsbr7w","qBB3AsgjN1M","XX8ouWx6xm8","BtF9dYRuiGU","NRanI5qR81I","evfl0NGLjVE"]   86228   5659
    2Aj2wx9PYxI expertvillage   815 ["Howto","Style"]   170 918553  3.82    1596    1533    ["hb5QaCfm7bg","YVDPSZe21KI","C1elpMjZ4wE","8gD1RG-tnNk","d-IDrRSVkCQ","IAOr6SYGAyw","3mbx03mP5eg","o-pN8qAiZhQ","K_qw03-3gFg","JgsO3A_vbtk","ffKr98Ium-M","WKYcBKa8_7Q","dgeX8CZmqvo","jgUCs72RDHs","PaNhROW-wEo","WhRCVm-1r2k","5I5O8P-r5Rk","reJSIZ3ugsE","-51iHvqP0Rs","9wItsn3r_kc"]   86228   5659
    VBGHer3yFyc libertaddigitaltv   998 ["News","Politics"] 29  828616  4.68    1337    10763   ["HL9p8I3lOFA","Fb3HZ7K1gTk","AARX6XOA5h0","t3DPDKbRxio","VAmz8MNZdlk","WrC6_uBe4eA","d9X8DYOA5DE","pZuZv8UK7YA","NwswzwoA4pA","g1J2gvjp4fs","3SMhrQZdABc","h6M42Il4kN8","izM_JwEkWPs","_FM_IoPIFq4","uMZCCPtl_Do","5cZFinVFubQ","om0iHwO4d6o","lUZxlXkbaxM","iSJ3qKpC-3o","tRdq9_Si0Ws"]   6874    1
    07jnqD8wvyE expertvillage   776 ["Howto","Style"]   217 574365  4.55    1073    1526    ["3wNpOp50uGI","-taU9d26wT4","_kly-fVUi1I","kZtRmnly_sU","LKe50AJvhz8","0GAUnuuBkW4","eM6Bpz6-dio","i9xf62PKC5M","mwFfk7igYC0","ilLTA4p4MMw","qyoLuTjguJA","aC-KOYQsIvU","0V6LWQZjYRk","_uZkvzYEXp0","auQbi_fkdGE","qgiUSEpg8Xc","4I79yc3uKfE","7Lz-XGjynN8","taHOwsdQXWI","2Aj2wx9PYxI"]   86228   5659
    NDwZcLGekE8 expertvillage   801 ["Howto","Style"]   94  491582  4.35    553 588 ["CEmYX76UJy0","0sF-PoXR9ZU","qDfoNYpozxk","T4P3yp6mFfY","rzcj_Wq6BIE","A2LTVhqHAdo","4gyH0mJPqY8","m3PsQCMz3sY","wckYj_PAhgM","YkvwdM9Xstc","Ph2UAjGfeB4","vT_uXPfSAVE","89mAouUu8YM","0t5pPZ4AXX0","sLNFN75CYbw","QQGHXEPfMcQ","p6gcz4hkOmw","qk6R-X-YmUw","NR_9VlHFOu8","0gA_3BAxtVM"]   86228   5659
    CEmYX76UJy0 expertvillage   801 ["Howto","Style"]   90  422284  4.48    448 459 ["NDwZcLGekE8","jRAKgd50Jh0","0sF-PoXR9ZU","IjjJWXgqWL4","m3PsQCMz3sY","8Di58l1dfPA","DpEVqy954OY","dmYMeImNy1s","qk6R-X-YmUw","wckYj_PAhgM","qZSD3JLPURk","YkvwdM9Xstc","fdSFh_z0HJY","n5LDyWyHJTo","Dmm5O1DPQYc","A2LTVhqHAdo","I2mM0r76b7o","iE16Z9Gh3TQ","rzcj_Wq6BIE","8szPfWiK-ag"]   86228   5659
    E6VWi7IaroA expertvillage   946 ["Howto","Style"]   228 338889  4.46    300 251 ["fLy3M2jfe4w","cC1FBmbYgMk","xEJhInSxsQg","RxMLoqzKZ8s","Z9a--4RQ9O8","7neOXKLrgzk","JS-lS9ngMtY","aKIFfgPLY8I","EsqPNhtkWIo","NJUPNtZN2rc","vV7pajY0zOk","GTtZapaRDbg","-0UVDD6ZwYE","B9jNaUP59Vo","2QDGPe50E4Y","lJDZO_pX4aw","wuD2CemlvP0","ZTx2ZFI1spg","CzZghYdupiQ","5ueEKSJ8J-U"]   86228   5659
    0sF-PoXR9ZU expertvillage   801 ["Sports"]  77  326365  4.21    512 663 ["24FJ58Q22D4","m3PsQCMz3sY","qk6R-X-YmUw","CEmYX76UJy0","NDwZcLGekE8","QiIunH47qew","I2mM0r76b7o","c_qSPdLPReM","5eYBaQ5Cg-c","fdSFh_z0HJY","8szPfWiK-ag","Dmm5O1DPQYc","A2LTVhqHAdo","n-Cvzk84X8o","T4P3yp6mFfY","1vo3CCBIcaY","fnGFR1AFCJc","n5LDyWyHJTo","YkvwdM9Xstc","emNfLmLTQb0"]    86228  5659
    ohdiRHLSw5Y expertvillage   447 ["Howto","Style"]   78  321521  3.79    139 102 ["12_nJamoyTk","dXLhjYgMZ68","ukzFwRoFj4k","sOUgrJo2kIg","HhO39nCDfMg","3iVP0tzwhVc","MFNA0PqLynY","ZlcMzLhiBjg","7F8ajh_DDYs","6vaPIz6S6sk","-L_uhGXOtF0","-libzR5AV58","CSdSH7XKTtQ","KZX5jXPAWIk","Vkj5fbrQpNs","YCgnFIk5Acg","hjPDmVf6KJw","BZnhMl85dq4","iuqVpMdb1NM","GO2_3q6euug"]   86228   5659
    0sF-PoXR9ZU expertvillage   801 ["Howto","Style"]   77  294737  4.21    438 599 ["24FJ58Q22D4","m3PsQCMz3sY","CEmYX76UJy0","NDwZcLGekE8","A2LTVhqHAdo","qk6R-X-YmUw","5eYBaQ5Cg-c","fdSFh_z0HJY","jGdXcitOUzY","n-Cvzk84X8o","wckYj_PAhgM","I2mM0r76b7o","QMZaxtjhZ5k","Dmm5O1DPQYc","vT_uXPfSAVE","8szPfWiK-ag","QiIunH47qew","YkvwdM9Xstc","DpEVqy954OY","c_qSPdLPReM"]   86228   5659
    3Xc-kxSznZ0 expertvillage   430 ["Howto","Style"]   107 292329  3.33    565 964 ["9YAKfkVvvEc","wEkcPjBaHjs","zMl3ixv1kHw","Q4ZUPEgbzu0","-jIEGZwLPvo","YxdxGLBCCRA","Jp0LaJ_ftT0","PgCRWvwdFHM","yZL2MOFk6I0","qDJk3-ofbk0","e4nHAAuDiPE","cuRk9bTbU6A","z53cPMciVio","MREw0dIWaQ0","237AQAvXtw8","a3dnOupmt7Y","3-va5g_QVss","mFdaiY8YhEY","D6li4tYmKf0","bEVHBHKqBfQ"]   86228   5659
    VxYkdycN3WE expertvillage   801 ["Howto","Style"]   77  257702  4.19    153 137 ["ZpCE7mnYJK0","NYDdCGtbr8E","904S9W9AZg8","Rl5KEODnCfI","7-i2gl9288I","TC1XlhRwhsU","rD_4nCKf5Ns","pFKZZ120fyU","oN4Esih54V8","N1Ov6cCUXkc","rDM97S4jPiw","IK2uLNND3i0","OZXbM6kRuuA","5pvyXkFTa18","bExUJAnCLZw","VoREkZvkyI8","BWhfpzAItx4","EEybT3IeyzA","QteH0KOJx0g","dbXHvd_SGkk"]   86228   5659
    -libzR5AV58 expertvillage   447 ["Howto","Style"]   391 249682  3.34    73  62  ["-L_uhGXOtF0","4LvTYBbgMjE","-wO07skEauo","CSdSH7XKTtQ","kzQDoXKDgTM","e1kzLj-FZ6k","Guhgb3pWRAQ","KIbyySDSGEc","DGUBzs-GNxE","KZX5jXPAWIk","ohdiRHLSw5Y","6vaPIz6S6sk","5MrAuq_F2dU","gOv-yPqZ3LE","BZnhMl85dq4","oCDdDCqygOg","12_nJamoyTk","7F8ajh_DDYs","HhO39nCDfMg","X3-6IT-7S80"]   86228   5659
    HL9p8I3lOFA libertaddigitaltv   998 ["News","Politics"] 17  235313  4.61    114 581 ["VBGHer3yFyc","Fb3HZ7K1gTk","Z3Pe8ff37gk","t3DPDKbRxio","pZuZv8UK7YA","_FM_IoPIFq4","1P_EQjd7lq4","yEzNx2g7ors","_neKPvLdG8Y","7i_WDprxACU","zCM0nsym0cE","om0iHwO4d6o","h6M42Il4kN8","Bjk1McjrWtg","VAmz8MNZdlk","-t2hwljZmA8","gH6PHFrA6g8","ysAwzxqbWD8","U5lX_EIFOOA","4zu_X_3qHLA"]   6874    1
    oE5ZhBHy6Rs expertvillage   776 ["Howto","Style"]   143 234551  4.64    358 280 ["rCWRUtqJgzI","476vNb6thyM","eg3eo8x_9rw","grK43Poye1U","HGncDLMNPRI","iylaKSDfLSc","JFqiDcvRW2Y","XrbXGRLIDTc","NmxoR9lsu-c","JW8FJefw82k","n0Mi550FDng","QXsW44Wh6tY","6oJDYT5MlEA","IJy05ssELhg","TvT-M7tTQHE","GYNVEWCO_X0","5pJ_o19GVRA","q_Nn_CnvD9s","uqpQNkFxVAU","KKP8vWeEc30"]   86228   5659
    wOnP_oAXUMA expertvillage   429 ["Howto","Style"]   120 228842  3.56    236 601 ["BlDWdfTAx8o","WF1kRLJ_4B8","6bu9csQC45c","xTAYAl4g7HE","871VTEF7qIY","UheCchftswc","TKg_cdwq9l4","ADQ4Bjq42wE","cSJCDcAKShA","SYklbxHP2tI","fktuPPNkQpQ","L6267-JZu3E","_IwJ3Aj2XrQ","hf-4ppNmH-U","aO8evzfTR8E","E-kNUEv0YgA","Hsr1-xcXFL8","VCiLZMjo_PQ","rMEgKgZOJi8","YeYU_OE5oIU"]   86228   5659
    MVPCc4eODys expertvillage   448 ["Howto","Style"]   69  225879  4.29    171 315 ["iPPgmvcsJNw","xc8sysT31mQ","svt_fnKE_80","kBjUDCyDCuI","ysa50-plo48","yutDlQL9Ct0","gCra4qOrjFw","CN1QJDGinwE","dFJjaj7pXsA","LvJzFdcYSag","4I79yc3uKfE","AcryZ1o4RQE","WyyB0M6wAV8","A0sOVKbYR20","8XCyd0XEejc","5BmVKKpu_CU","8_QNzZ9WPRo","pNiX_l-HEGM","CQJSZs-euZU","ZwHHYzK8UCg"]   86228   5659
    7neOXKLrgzk expertvillage   946 ["Howto","Style"]   111 215943  4.07    215 314 ["xEJhInSxsQg","PSR5kinMX3Y","E6VWi7IaroA","5ovGW0CzQp4","DeMcNhz0Vz8","CzZghYdupiQ","72EYsQEw35k","tScm-eZInBE","Hz4lnt7d88s","dgLUbXSqwSs","G3P4VFrB_zM","pU0O2-o67C0","pDGP8Q3Zzb8","hQhdK8l6CuY","VK9VflDsunI","LWgzG8I9bWY","csL7WNTWK9U","LE93Q5k5-fI","Uirspi_t3xM","wGLNJK4zcdE"]   86228   5659
    AaO1aLwk53Y expertvillage   443 ["Howto","Style"]   156 204450  4.14    210 290 ["Q8YYZGvKM-8","Bn59zha-uAQ","BjnkKuI-YAY","aQeXHXkL6ow","Qtp2ibwd2Ms","nstbrkjk3OM","Odj4ATUPh9w","v2mm2-9JQ-I","328YVJVtMKU","B-jzkyKxDLo","yB-yGNxNEZg","b0msCCnzmNA","sfh5AJhRto8","SEILiSkGzrY","jm4uxgVDU6o","svOBjuvvy-k","y_1nj8fBQOE","zb8ArkvxOxo","VgWayeJdV0w","Yor1dHH6orE"]   86228   5659
    m3PsQCMz3sY expertvillage   801 ["Howto","Style"]   99  203912  4.38    177 207 ["CEmYX76UJy0","0sF-PoXR9ZU","NDwZcLGekE8","qk6R-X-YmUw","DpEVqy954OY","Ph2UAjGfeB4","fdSFh_z0HJY","YkvwdM9Xstc","8szPfWiK-ag","Dmm5O1DPQYc","T4P3yp6mFfY","I2mM0r76b7o","7GtVwKZBf9U","mugebyUqK6o","qDfoNYpozxk","XA7dRqkp8YA","a28QcMXjHyI","n5LDyWyHJTo","ovoNU_CbmfA","Q2j6MlACIoc"]   86228   5659
    Time taken: 300.414 seconds, Fetched: 20 row(s)
    

    4.8 筛选出每个类别中观看数Top10

    ** select a.* from (select videoId,categoryId,views,row_number() over(partition by categoryId order by views desc) rank from youtube_category) a where rank<=10;**

    hive> select a.* from (select videoId,categoryId,views,row_number() over(partition by categoryId order by views desc) rank from youtube_category) a where rank<=10;
    Query ID = hadoop_20170713170101_76ffdd80-e72c-4c7f-b9ad-9b8c3f7e17ab
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks not specified. Estimated from input data size: 3
    Starting Job = job_1499153664137_0143, Tracking URL = http://master:8088/proxy/application_1499153664137_0143/
    Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0143
    Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 3
    2017-07-13 17:01:52,690 Stage-1 map = 0%,  reduce = 0%
    2017-07-13 17:02:24,374 Stage-1 map = 44%,  reduce = 0%, Cumulative CPU 10.49 sec
    ...
    2017-07-13 17:03:08,803 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 89.07 sec
    MapReduce Total cumulative CPU time: 1 minutes 29 seconds 70 msec
    Ended Job = job_1499153664137_0143
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 3  Reduce: 3   Cumulative CPU: 89.07 sec   HDFS Read: 73070026 HDFS Write: 7487 SUCCESS
    Total MapReduce CPU Time Spent: 1 minutes 29 seconds 70 msec
    OK
    qruSOZq-wJg Activism    2673823 1
    tFxk7glmMbo Activism    1063928 2
    dYN278GB2Kg Activism    831836  3
    n-Akqik3hME Activism    778987  4
    LtqRAYWjz2Q Activism    768422  5
    2v-whrgAbf4 Activism    742172  6
    ja4d02s0WKQ Activism    733014  7
    IpAOYskH1s8 Activism    714580  8
    ieuVTOJyLBs Activism    640994  9
    6CXBgbP_ZKs Activism    562163  10
    12Z3J1uzd0Q Animation   65341925    1
    bFytHZHFXhA Animation   38937813    2
    sdUUx5FdySs Animation   16151661    3
    vQbYvjmfbr4 Animation   12114231    4
    rNKefRRvV7g Animation   11884311    5
    6HIavxnUHls Animation   11735507    6
    gm5DHKI8o5o Animation   9381674 7
    316BF17k5d8 Animation   9372064 8
    ZuQMn6Z0k_w Animation   9195588 9
    6B26asyGKDo Animation   8915176 10
    dMH0bHeiRNg Comedy  79897120    1
    5P6UU6m3cqk Comedy  42525795    2
    Tx1XIm6q4r4 Comedy  38378910    3
    Q5im0Ssyyus Comedy  21170471    4
    k66epna2Sss Comedy  17944367    5
    AYxu_MQSTTY Comedy  15665340    6
    pYak2F1hUYA Comedy  15634357    7
    _OBlgSz8sSM Comedy  14567743    8
    wCF3ywukQYA Comedy  14227802    9
    sHzdsFiBbFc Comedy  13606292    10
    Mz9BlgiV45I Education   2807232 1
    VnGb6UQvwJs Education   2441432 2
    j6lADdhmAec Education   2306689 3
    pYfTQpsErsM Education   2066429 4
    7GNE6AhL0qI Education   2037257 5
    NiGZf15W9xc Education   1867126 6
    -FeAK-q5Cok Education   1741504 7
    N5P6o2YaqVI Education   1656512 8
    Dl-agXA63nw Education   1529566 9
    JX3VmDgiFnY Education   1373008 10
    cQ25-glGRzI Music   77674728    1
    7AVHXe-ol-s Music   60349673    2
    ePyRrb2-fzs Music   45984219    3
    xsRWpK4pf90 Music   44614530    4
    ktUSIJEiOug Music   43583367    5
    b3u65f4CRLk Music   43511791    6
    iWg3IMN_rhU Music   43323757    7
    innfyQZHPpo Music   41564032    8
    Lt6o8NlrbHg Music   41171303    9
    QjA5faZF1A8 Music   40294882    10
    qruSOZq-wJg Nonprofits  2673823 1
    tFxk7glmMbo Nonprofits  1063928 2
    dYN278GB2Kg Nonprofits  831836  3
    n-Akqik3hME Nonprofits  778987  4
    LtqRAYWjz2Q Nonprofits  768422  5
    2v-whrgAbf4 Nonprofits  742172  6
    ja4d02s0WKQ Nonprofits  733014  7
    IpAOYskH1s8 Nonprofits  714580  8
    ieuVTOJyLBs Nonprofits  640994  9
    6CXBgbP_ZKs Nonprofits  562163  10
    SkELRp4wKPs Politics    12730681    1
    AFFQrUyi8-s Politics    11953598    2
    YgW7or1TuFk Politics    8516433 3
    JgiGrXpOhYg Politics    8211043 4
    up5jmbSjWkw Politics    7869023 5
    hr23tpWX8lM Politics    7209885 6
    Kje7NUNebL8 Politics    6707868 7
    I4u3449L5VI Politics    6675798 8
    jjXyqcx-mYY Politics    6462051 9
    a9Vde3FHMmc Politics    6028642 10
    8h98jb9Lk74 UNA 33880568    1
    AR5yq0aMxI8 UNA 18409445    2
    DH7FVB15EPU UNA 15487752    3
    Qvfx6UiAAG0 UNA 13977592    4
    n-FdoZdfpmE UNA 12241555    5
    MddPeH1DAvY UNA 12211192    6
    sI8Sus_KRpY UNA 10460047    7
    14oqm9ywLIw UNA 10344366    8
    LytRWuhn4BM UNA 9899270 9
    iSByZARPJSU UNA 9316897 10
    Oro28yg7W74 Gaming  727015  1
    0to1HDxtkM4 Gaming  529346  2
    M-w-gLVfi30 Gaming  396471  3
    i6aH2F1WjsY Gaming  382350  4
    EgbUSsblCSQ Gaming  266718  5
    7SMnWr9MqwQ Gaming  210147  6
    NQMBIRipp5A Gaming  207783  7
    vi1lVqJSbsM Gaming  186709  8
    aId2hK4pI2M Gaming  186054  9
    tWPWcyLdrko Gaming  181979  10
    sLGLum5SyKQ Howto   31121122    1
    eMhGpzyFdhE Howto   16320501    2
    KPOOWvP_dd8 Howto   11131527    3
    91wuBqlny50 Howto   8579613 4
    mr5ghuaTK14 Howto   8361812 5
    GfPJeDssBOM Howto   6101232 6
    6gmP4nk0EOE Howto   4970382 7
    mM-30cmM33s Howto   4936417 8
    STQ3nhXuuEM Howto   4655542 9
    XZGgeGHU1Bs Howto   4424698 10
    LU8DDYz68kM Pets    27721690    1
    epUk3T2Kfno Pets    10352882    2
    z3U0udLH974 Pets    9461084 3
    kkT7A3jegBc Pets    9269896 4
    TZ860P4iTaM Pets    9009434 5
    7tRWRSfcDuQ Pets    8538635 6
    Qit3ALTelOo Pets    7939352 7
    Zi9GOvR3Ynw Pets    7351184 8
    Kxa0mnDj0bs Pets    7289545 9
    PadauuWF94w Pets    6271287 10
    W1czBcnX1Ww Science 3234852 1
    D99NHb6B03s Science 3176792 2
    tk_F2Y-F2kE Science 3121903 3
    nhyH7lQ6D2k Science 2879861 4
    JCbKv9yiLiQ Science 2672391 5
    U5vs2ly_grk Science 2611389 6
    M0ODskdEPnQ Science 2555284 7
    p4ebtj1jR7c Science 2536109 8
    8wTlureUMP8 Science 2477804 9
    NZNTgglPbUA Science 2230729 10
    vt4X7zFfv4k Sports  12598542    1
    P-bWsOK-h98 Sports  12101588    2
    OS5tQvQOB-Y Sports  9047732 3
    P9LmHXXWiJs Sports  7415813 4
    NIKdK-T-jZM Sports  7175403 5
    euMu1SKi-ak Sports  7040023 6
    zKQgTiqhPbw Sports  7017699 7
    yeXoxNP8_xY Sports  6422039 8
    q8t7iSGAKik Sports  6312667 9
    COcczatkNP4 Sports  6258611 10
    sLGLum5SyKQ Style   31121122    1
    eMhGpzyFdhE Style   16320501    2
    KPOOWvP_dd8 Style   11131527    3
    91wuBqlny50 Style   8579613 4
    mr5ghuaTK14 Style   8361812 5
    GfPJeDssBOM Style   6101232 6
    6gmP4nk0EOE Style   4970382 7
    mM-30cmM33s Style   4936417 8
    STQ3nhXuuEM Style   4655542 9
    XZGgeGHU1Bs Style   4424698 10
    ZeBd_F2Bz5Y Vehicles    8623041 1
    ju6t-yyoU8s Vehicles    7325889 2
    uUurALr_Ckk Vehicles    7258142 3
    WShY1ObPvhQ Vehicles    7047156 4
    q3idQKi5EqM Vehicles    6470816 5
    9JWywFpZkg4 Vehicles    6465830 6
    tth9krDtxII Vehicles    6394563 7
    npTRXr4Sgxg Vehicles    5450317 8
    aCamHfJwSGU Vehicles    5354163 9
    S-ppGiwc0wQ Vehicles    5039415 10
    LU8DDYz68kM Animals 27721690    1
    epUk3T2Kfno Animals 10352882    2
    z3U0udLH974 Animals 9461084 3
    kkT7A3jegBc Animals 9269896 4
    TZ860P4iTaM Animals 9009434 5
    7tRWRSfcDuQ Animals 8538635 6
    Qit3ALTelOo Animals 7939352 7
    Zi9GOvR3Ynw Animals 7351184 8
    Kxa0mnDj0bs Animals 7289545 9
    PadauuWF94w Animals 6271287 10
    ZeBd_F2Bz5Y Autos   8623041 1
    ju6t-yyoU8s Autos   7325889 2
    uUurALr_Ckk Autos   7258142 3
    WShY1ObPvhQ Autos   7047156 4
    q3idQKi5EqM Autos   6470816 5
    9JWywFpZkg4 Autos   6465830 6
    tth9krDtxII Autos   6394563 7
    npTRXr4Sgxg Autos   5450317 8
    aCamHfJwSGU Autos   5354163 9
    S-ppGiwc0wQ Autos   5039415 10
    v3ARyAb_1Bs Blogs   31812447    1
    5GE82tqcYYQ Blogs   30209692    2
    ervaMPt4Ha0 Blogs   23859297    3
    uWow42TCwzg Blogs   22389389    4
    -_CSo1gOd48 Blogs   21176701    5
    D2kJZOfq7zk Blogs   20458159    6
    nhSZs-aAZbo Blogs   20423158    7
    GuMMfgWhm3g Blogs   18634621    8
    4jbkRGPxvaM Blogs   15980887    9
    hMnk7lh9M3o Blogs   13553069    10
    LpAI8TzQDes Entertainment   65078772    1
    244qR7SvvX0 Entertainment   57790943    2
    1uwOL4rB-go Entertainment   39883413    3
    w2xUzv6iZWo Entertainment   25222946    4
    vr3x_RRJdd4 Entertainment   25093671    5
    lj3iNxZ8Dww Entertainment   23737579    6
    RB-wUgnyGv0 Entertainment   23067889    7
    lsO6D1rwrKc Entertainment   21758300    8
    7iYWxfNSjYk Entertainment   19029677    9
    5pGJCkCDK5A Entertainment   18858695    10
    p0aQvKDA1K0 Events  12239023    1
    bNF_P281Uu4 Events  9125026 2
    AlPqL7IUT6M Events  6441558 3
    J833f9fqWBA Events  4776832 4
    xIvIWJbzimo Events  4284770 5
    QuTj9a04o-s Events  4053317 6
    eejQPUyeNiY Events  3573812 7
    3QL97xldoXc Events  3010296 8
    r43yCiKlbCo Events  2806995 9
    z42fchrzhHY Events  2565257 10
    12Z3J1uzd0Q Film    65341925    1
    bFytHZHFXhA Film    38937813    2
    sdUUx5FdySs Film    16151661    3
    vQbYvjmfbr4 Film    12114231    4
    rNKefRRvV7g Film    11884311    5
    6HIavxnUHls Film    11735507    6
    gm5DHKI8o5o Film    9381674 7
    316BF17k5d8 Film    9372064 8
    ZuQMn6Z0k_w Film    9195588 9
    6B26asyGKDo Film    8915176 10
    SkELRp4wKPs News    12730681    1
    AFFQrUyi8-s News    11953598    2
    YgW7or1TuFk News    8516433 3
    JgiGrXpOhYg News    8211043 4
    up5jmbSjWkw News    7869023 5
    hr23tpWX8lM News    7209885 6
    Kje7NUNebL8 News    6707868 7
    I4u3449L5VI News    6675798 8
    jjXyqcx-mYY News    6462051 9
    a9Vde3FHMmc News    6028642 10
    v3ARyAb_1Bs People  31812447    1
    5GE82tqcYYQ People  30209692    2
    ervaMPt4Ha0 People  23859297    3
    uWow42TCwzg People  22389389    4
    -_CSo1gOd48 People  21176701    5
    D2kJZOfq7zk People  20458159    6
    nhSZs-aAZbo People  20423158    7
    GuMMfgWhm3g People  18634621    8
    4jbkRGPxvaM People  15980887    9
    hMnk7lh9M3o People  13553069    10
    W1czBcnX1Ww Technology  3234852 1
    D99NHb6B03s Technology  3176792 2
    tk_F2Y-F2kE Technology  3121903 3
    nhyH7lQ6D2k Technology  2879861 4
    JCbKv9yiLiQ Technology  2672391 5
    U5vs2ly_grk Technology  2611389 6
    M0ODskdEPnQ Technology  2555284 7
    p4ebtj1jR7c Technology  2536109 8
    8wTlureUMP8 Technology  2477804 9
    NZNTgglPbUA Technology  2230729 10
    p0aQvKDA1K0 Travel  12239023    1
    bNF_P281Uu4 Travel  9125026 2
    AlPqL7IUT6M Travel  6441558 3
    J833f9fqWBA Travel  4776832 4
    xIvIWJbzimo Travel  4284770 5
    QuTj9a04o-s Travel  4053317 6
    eejQPUyeNiY Travel  3573812 7
    3QL97xldoXc Travel  3010296 8
    r43yCiKlbCo Travel  2806995 9
    z42fchrzhHY Travel  2565257 10
    Time taken: 121.381 seconds, Fetched: 250 row(s)
    

    5 总结

    上面的8个例子向大家展示Hive中简单和稍微复杂的操作,有的启动一个Job就可以完成,有的则需要启动多个Job才能完成,我们也可以看到,启动的Job越多运行时间就会越长,但是实际工作中的操作只会远比我们所演示要更加复杂,越是复杂的操作就更加需要去优化,来达到减少运行时间的目的,所以下一篇我们来看看Hive的优化实践

    相关文章

      网友评论

        本文标题:Hive实战之Youtube数据集

        本文链接:https://www.haomeiwen.com/subject/asguhxtx.html