I. Configure the Spark environment variables, then copy the following jars from Spark's jars directory into Hive's lib directory
spark-core_2.12-3.0.0.jar
spark-kvstore_2.12-3.0.0.jar
spark-launcher_2.12-3.0.0.jar
spark-network-common_2.12-3.0.0.jar
spark-network-shuffle_2.12-3.0.0.jar
spark-tags_2.12-3.0.0.jar
spark-unsafe_2.12-3.0.0.jar
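The copy step above can be sketched as a small shell loop. `copy_spark_jars` is a hypothetical helper name, and the `spark_home`/`hive_home` arguments are the install paths assumed in this guide; adjust them to your layout.

```shell
#!/bin/sh
# Copy the seven Spark jars that Hive on Spark needs into Hive's lib directory.
# copy_spark_jars is a hypothetical helper; the example paths are assumptions.
copy_spark_jars() {
    spark_home=$1   # e.g. /opt/module/spark-3.0.0-bin-hadoop3.2
    hive_home=$2    # e.g. /opt/module/hive
    for jar in spark-core spark-kvstore spark-launcher \
               spark-network-common spark-network-shuffle \
               spark-tags spark-unsafe; do
        cp "${spark_home}/jars/${jar}_2.12-3.0.0.jar" "${hive_home}/lib/"
    done
}

# On the cluster you would run something like:
# copy_spark_jars /opt/module/spark-3.0.0-bin-hadoop3.2 /opt/module/hive
```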
II. Create the Spark configuration file for Hive
1. vim /opt/module/hive/conf/spark-defaults.conf
spark.master yarn
spark.eventLog.enabled true
spark.eventLog.dir hdfs://hadoop102:8020/spark-history
spark.executor.memory 1g
spark.driver.memory 1g
2. hadoop fs -mkdir /spark-history
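The two steps above (writing spark-defaults.conf and creating the event-log directory) can be sketched as follows. `write_spark_defaults` is a hypothetical helper name; the settings written are exactly the ones listed above.

```shell
#!/bin/sh
# Generate Hive's spark-defaults.conf with the settings listed above.
# write_spark_defaults is a hypothetical helper name.
write_spark_defaults() {
    cat > "$1" <<'EOF'
spark.master              yarn
spark.eventLog.enabled    true
spark.eventLog.dir        hdfs://hadoop102:8020/spark-history
spark.executor.memory     1g
spark.driver.memory       1g
EOF
}

# On the cluster you would run something like:
# write_spark_defaults /opt/module/hive/conf/spark-defaults.conf
# hadoop fs -mkdir /spark-history   # the directory the event log points at
```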
III. Upload the "without-hadoop" (pure) Spark jars to HDFS
1. hadoop fs -mkdir /spark-jars
2. hadoop fs -put /opt/module/spark-3.0.0-bin-without-hadoop/jars/* /spark-jars
IV. Modify hive-site.xml in Hive
<!-- Location of the Spark dependency jars on HDFS -->
<property>
<name>spark.yarn.jars</name>
<value>hdfs://hadoop102:8020/spark-jars/*</value>
</property>
<!-- Hive execution engine -->
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
<!-- Hive-to-Spark client connection timeout -->
<property>
<name>hive.spark.client.connect.timeout</name>
<value>10000ms</value>
</property>
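All three `<property>` blocks above must sit inside the `<configuration>` element of hive-site.xml. As a small illustrative sketch (the `hive_property` helper below is hypothetical, not part of Hive), the XML boilerplate can be generated rather than hand-typed:

```shell
#!/bin/sh
# Emit a Hadoop-style <property> block for a name/value pair.
# hive_property is a hypothetical helper; paste its output inside the
# <configuration> element of hive-site.xml.
hive_property() {
    printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

hive_property spark.yarn.jars 'hdfs://hadoop102:8020/spark-jars/*'
hive_property hive.execution.engine spark
hive_property hive.spark.client.connect.timeout 10000ms
```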
V. Configure the related YARN settings
cd /opt/module/hadoop-3.1.4/etc/hadoop/
vim capacity-scheduler.xml
1. Capacity scheduler concurrency (raise the resource cap for ApplicationMasters so more jobs can run at once)
<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>0.5</value>
</property>
2. Capacity scheduler multi-queue setup
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>default,hive</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.capacity</name>
<value>50</value>
<description>Default queue target capacity.</description>
</property>
3. Add the remaining properties for the hive queue
<property>
<name>yarn.scheduler.capacity.root.hive.capacity</name>
<value>50</value>
<description>The hive queue's capacity is 50%.</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.hive.user-limit-factor</name>
<value>1</value>
<description>The multiple of the queue's capacity a single user may consume; 1 means at most the queue's full capacity.</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.hive.maximum-capacity</name>
<value>80</value>
<description>The hive queue's maximum capacity: the ceiling it may reach by borrowing from other queues when its own share is exhausted.</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.hive.state</name>
<value>RUNNING</value>
<description>Set the hive queue to RUNNING; the queue cannot be used unless its state is set.</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.hive.acl_submit_applications</name>
<value>*</value>
<description>Access control: who may submit applications to this queue; * means anyone.</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.hive.acl_administer_queue</name>
<value>*</value>
<description>Access control: who may administer this queue's applications (including submitting and killing them); * means anyone.</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.hive.acl_application_max_priority</name>
<value>*</value>
<description>Access control: which users may submit applications with a configured priority.</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.hive.maximum-application-lifetime</name>
<value>-1</value>
<description>Maximum lifetime, in seconds, of an application submitted to the hive queue. Any value of zero or less is treated as disabled.</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.hive.default-application-lifetime</name>
<value>-1</value>
<description>Default lifetime, in seconds, of an application submitted to the hive queue. Any value of zero or less is treated as disabled.</description>
</property>
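Before pushing the edited capacity-scheduler.xml to the cluster, it is worth checking that the sibling queue capacities under root sum to 100, which the capacity scheduler requires. The check below runs locally; the final `yarn rmadmin -refreshQueues` command (commented out) is what actually reloads the queue configuration without restarting YARN.

```shell
#!/bin/sh
# Sanity check: capacities of sibling queues under a parent must total 100.
default_capacity=50   # yarn.scheduler.capacity.root.default.capacity
hive_capacity=50      # yarn.scheduler.capacity.root.hive.capacity
total=$((default_capacity + hive_capacity))
if [ "$total" -eq 100 ]; then
    echo "queue capacities OK: ${total}%"
else
    echo "queue capacities must sum to 100, got ${total}%" >&2
    exit 1
fi

# Apply the edited capacity-scheduler.xml without restarting the ResourceManager:
# yarn rmadmin -refreshQueues
```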
// Test the hive queue
hadoop jar /opt/module/hadoop-3.1.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.4.jar pi -Dmapreduce.job.queuename=hive 1 1