
Using Hive for MapReduce computation

Author: tonyemail_st | Published 2017-09-29 21:38

Create the table

hive> create table wordcount(line String);

Load the data. Note that LOAD DATA INPATH without the LOCAL keyword moves the file from its current HDFS location into the table's warehouse directory, and OVERWRITE replaces any existing table contents.

hive> load data inpath '/wcinput' overwrite into table wordcount;
Loading data to table default.wordcount
OK
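As a hedged variant (the path /tmp/wc.txt is illustrative, not from the original article): adding the LOCAL keyword loads from the local filesystem instead, copying the file rather than moving it within HDFS:

```sql
-- Illustrative only: load from the local filesystem (copies instead of moves)
LOAD DATA LOCAL INPATH '/tmp/wc.txt' OVERWRITE INTO TABLE wordcount;
```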
hive> select * from wordcount;
OK
hello world
hello world
hello java
hello c
hello c++
hello java
Time taken: 0.16 seconds, Fetched: 6 row(s)
hive> select split(line, ' ') from wordcount;
OK
["hello","world"]
["hello","world"]
["hello","java"]
["hello","c"]
["hello","c++"]
["hello","java"]
Time taken: 0.571 seconds, Fetched: 6 row(s)
hive> select explode(split(line, ' ')) from wordcount;
OK
hello
world
hello
world
hello
java
hello
c
hello
c++
hello
java
Time taken: 0.183 seconds, Fetched: 12 row(s)
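Hive's split() returns an array, and explode() is a UDTF that emits one output row per array element — effectively a flatMap. As a rough illustration of the semantics in plain Python (not Hive itself), using the six sample rows above:

```python
# The six rows of the wordcount table shown above
lines = [
    "hello world", "hello world", "hello java",
    "hello c", "hello c++", "hello java",
]

# split(line, ' ') produces an array per row;
# explode() flattens each array into one row per element
words = [word for line in lines for word in line.split(" ")]
print(words)  # 12 rows, one word each
```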

hive> select word, count(1) as count from (select explode(split(line, ' ')) as word from wordcount) w group by word;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170929034857_f67738b5-8744-4694-8621-f855bcc57cf5
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1506604275847_0002, Tracking URL = http://master:8088/proxy/application_1506604275847_0002/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1506604275847_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-09-29 03:49:17,089 Stage-1 map = 0%,  reduce = 0%
2017-09-29 03:49:29,157 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.04 sec
2017-09-29 03:49:37,721 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.48 sec
MapReduce Total cumulative CPU time: 3 seconds 480 msec
Ended Job = job_1506604275847_0002
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.48 sec   HDFS Read: 8812 HDFS Write: 180 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 480 msec
OK
c   1
c++ 1
hello   6
java    2
world   2
Time taken: 42.133 seconds, Fetched: 5 row(s)
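The full query is the classic word-count MapReduce: the inner SELECT with explode() is the map phase (emit one word per row), GROUP BY word is the shuffle, and count(1) is the reduce. A minimal Python sketch of the same aggregation on the sample data:

```python
from collections import Counter

# Same six rows as in the wordcount table
lines = [
    "hello world", "hello world", "hello java",
    "hello c", "hello c++", "hello java",
]

# map: emit one word per row; reduce: count occurrences per word
counts = Counter(word for line in lines for word in line.split(" "))
for word in sorted(counts):
    print(word, counts[word])
```

This prints c 1, c++ 1, hello 6, java 2, world 2, matching the Hive output above.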



Original link: https://www.haomeiwen.com/subject/ulviextx.html