Pig and ES Integration


Author: 逸章 | Published: 2020-07-24 10:34

1. Importing data into ES

1.1 Importing from a CSV data source

Start the processes

yay@yay-ThinkPad-T470-W10DG:~/software/hadoop-2.6.0/sbin$ ./start-dfs.sh

yay@yay-ThinkPad-T470-W10DG:~/software/hadoop-2.6.0/sbin$ ./start-yarn.sh

// Mainly provides lookup of MapReduce job history
yay@yay-ThinkPad-T470-W10DG:~/software/hadoop-2.6.0/sbin$ ./mr-jobhistory-daemon.sh start historyserver
Process status from jps: [image: 图片.png]
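
A quick way to confirm that the three start scripts worked is to compare the jps listing against the daemons they are expected to launch. The sketch below is only an illustration and not part of the original walkthrough: the daemon names are the ones a single-node Hadoop 2.x setup normally shows in jps, while the function name `missing_daemons` and the sample listing are made up for this example.

```python
# Daemons normally started by start-dfs.sh, start-yarn.sh and
# mr-jobhistory-daemon.sh on a single-node Hadoop 2.x setup (assumed set).
EXPECTED = {
    "NameNode", "DataNode", "SecondaryNameNode",
    "ResourceManager", "NodeManager", "JobHistoryServer",
}

def missing_daemons(jps_output, expected=EXPECTED):
    """Return the expected daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Each jps line looks like "<pid> <MainClassName>"; skip anything else.
        if len(parts) >= 2 and parts[0].isdigit():
            running.add(parts[1])
    return sorted(set(expected) - running)

# Hypothetical listing, not the one from the screenshot.
sample = """4821 NameNode
4990 DataNode
5203 SecondaryNameNode
5377 ResourceManager
5712 Jps
"""
print(missing_daemons(sample))
```

In practice the text would come from running jps itself, for example via `subprocess.run(["jps"], capture_output=True, text=True).stdout`; an empty result means every expected daemon is up.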

Put the es-hadoop jar and the CSV file on HDFS:

yay@yay-ThinkPad-T470-W10DG:~/software/hadoop-2.6.0/sbin$ hdfs dfs -mkdir /lib
yay@yay-ThinkPad-T470-W10DG:~/software/hadoop-2.6.0/sbin$ hdfs dfs -put /home/yay/software/elasticsearch-hadoop-2.1.1/dist/elasticsearch-hadoop-2.1.1.jar /lib/elasticsearch-hadoop-2.1.1.jar

yay@yay-ThinkPad-T470-W10DG:~/software/hadoop-2.6.0/sbin$ hdfs dfs -mkdir /ch07
yay@yay-ThinkPad-T470-W10DG:~/software/hadoop-2.6.0/sbin$ hdfs dfs -put /home/yay/下载/Elasticsearch-for-Hadoop_code/eshadoop-master/Chapter7/data/crimes_dataset.csv /ch07/crime_dataset.csv

Commands run inside Pig:

yay@yay-ThinkPad-T470-W10DG:~/software/pig-0.17.0/bin$ ./pig

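// 1. First register the ES-Hadoop jar file in Pig
// Note: the job output further down reports Elasticsearch Hadoop v7.8.0, so that run evidently registered the 7.8.0 jar; register whichever version matches your Elasticsearch cluster.
//grunt> REGISTER hdfs://localhost:9000/lib/elasticsearch-hadoop-7.8.0.jar;
grunt> REGISTER hdfs://localhost:9000/lib/elasticsearch-hadoop-2.1.1.jar;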


// 2. Load the CSV data file
grunt> SOURCE = load '/ch07/crime_dataset.csv' using PigStorage(',') as (id:chararray, caseNumber:chararray,date:datetime,block:chararray,iucr:chararray,primaryType:chararray,description:chararray,location:chararray,arrest:boolean,domestic:boolean, lat:double,lon:double);

// 3. Generate a data structure that matches the ES index
grunt> TARGET = foreach SOURCE generate id, caseNumber,date,block,iucr,primaryType,description,location,arrest,domestic, TOTUPLE(lon, lat) AS geoLocation;

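// 4. Store the data in the ES index (this actually runs a MapReduce job)
// Note: the setting is named es.index.auto.create; the command as originally run (echoed in the output below) spelled it es_index.auto.create.
grunt> STORE TARGET INTO 'es_pig/crimes' USING org.elasticsearch.hadoop.pig.EsStorage('es.http.timeout = 5m','es.index.auto.create = true', 'es.mapping.names=arrest:isArrest, domestic:isDomestic','es.mapping.id=id');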
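
To make steps 3 and 4 more concrete, the sketch below mimics in plain Python what one row might look like as the document sent to ES: `es.mapping.names` renames arrest and domestic, `es.mapping.id` uses the id field as the document _id, and the (lon, lat) tuple becomes a two-element array, which is the longitude-first order Elasticsearch expects when a geo_point is written as an array. The function `to_es_doc` and the sample values are hypothetical; the real serialization is done by EsStorage, and this only approximates its result under the default tuple-as-array behaviour.

```python
import json

# Field renames taken from 'es.mapping.names=arrest:isArrest, domestic:isDomestic'.
RENAMES = {"arrest": "isArrest", "domestic": "isDomestic"}

def to_es_doc(row):
    """Approximate the (_id, document) pair for one TARGET row."""
    doc = {}
    for name, value in row.items():
        if name in ("lat", "lon"):
            continue  # folded into geoLocation below
        doc[RENAMES.get(name, name)] = value
    # TOTUPLE(lon, lat): longitude first, matching the geo_point array form.
    doc["geoLocation"] = [row["lon"], row["lat"]]
    return row["id"], doc  # es.mapping.id=id picks the document _id

# Hypothetical row; the values are invented for illustration.
row = {"id": "10001", "caseNumber": "HX000001", "primaryType": "THEFT",
       "arrest": False, "domestic": True, "lat": 41.88, "lon": -87.63}
doc_id, doc = to_es_doc(row)
print(doc_id, json.dumps(doc, sort_keys=True))
```

Putting longitude first in TOTUPLE is the design choice that matters here: swapping the two values would still index without error but would place every point at the wrong location.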

Here is the output echoed when the last command was run:

grunt> STORE TARGET INTO 'es_pig/crimes' USING org.elasticsearch.hadoop.pig.EsStorage('es.http.timeout = 5m','es_index.auto.create = true', 'es.mapping.names=arrest:isArrest, domestic:isDomestic','es.mapping.id=id');
2020-07-24 10:04:57,323 [main] WARN  org.elasticsearch.hadoop.mr.EsOutputFormat - Speculative execution enabled for reducer - consider disabling it to prevent data corruption
2020-07-24 10:04:57,339 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2020-07-24 10:04:57,375 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2020-07-24 10:04:57,408 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NestedLimitOptimizer, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2020-07-24 10:04:57,474 [main] INFO  org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (PS Old Gen) of size 699400192 to monitor. collectionUsageThreshold = 489580128, usageThreshold = 489580128
2020-07-24 10:04:57,570 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2020-07-24 10:04:57,605 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2020-07-24 10:04:57,605 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2020-07-24 10:04:57,644 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - session.id is deprecated. Instead, use dfs.metrics.session-id
2020-07-24 10:04:57,645 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2020-07-24 10:04:57,709 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2020-07-24 10:04:57,716 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2020-07-24 10:04:57,716 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2020-07-24 10:04:57,719 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2020-07-24 10:04:57,725 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2020-07-24 10:04:57,738 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.submit.replication is deprecated. Instead, use mapreduce.client.submit.file.replication
2020-07-24 10:04:57,971 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/tmp/pig1493399292987591061tmp/elasticsearch-hadoop-7.8.0.jar to DistributedCache through /tmp/temp-1797060858/tmp-1378164927/elasticsearch-hadoop-7.8.0.jar
2020-07-24 10:04:58,081 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/yay/software/pig-0.17.0/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp-1797060858/tmp1140381131/pig-0.17.0-core-h2.jar
2020-07-24 10:04:58,170 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/yay/software/pig-0.17.0/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-1797060858/tmp-1965548869/automaton-1.11-8.jar
2020-07-24 10:04:58,249 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/yay/software/pig-0.17.0/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-1797060858/tmp2116659353/antlr-runtime-3.4.jar
2020-07-24 10:04:58,339 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/yay/software/pig-0.17.0/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp-1797060858/tmp1679799996/joda-time-2.9.3.jar
2020-07-24 10:04:58,408 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2020-07-24 10:04:58,427 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2020-07-24 10:04:58,427 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2020-07-24 10:04:58,427 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2020-07-24 10:04:58,470 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2020-07-24 10:04:58,471 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2020-07-24 10:04:58,476 [JobControl] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2020-07-24 10:04:58,491 [JobControl] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2020-07-24 10:04:58,493 [JobControl] WARN  org.elasticsearch.hadoop.mr.EsOutputFormat - Speculative execution enabled for reducer - consider disabling it to prevent data corruption
2020-07-24 10:04:58,670 [JobControl] WARN  org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2020-07-24 10:04:58,707 [JobControl] INFO  org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2020-07-24 10:04:58,714 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2020-07-24 10:04:58,714 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2020-07-24 10:04:58,739 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2020-07-24 10:04:58,799 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2020-07-24 10:04:58,897 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local1469778342_0001
2020-07-24 10:04:59,236 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /home/yay/hadoop_tmp/mapred/local/1595556299054/elasticsearch-hadoop-7.8.0.jar <- /home/yay/software/pig-0.17.0/bin/elasticsearch-hadoop-7.8.0.jar
2020-07-24 10:04:59,277 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp-1797060858/tmp-1378164927/elasticsearch-hadoop-7.8.0.jar as file:/home/yay/hadoop_tmp/mapred/local/1595556299054/elasticsearch-hadoop-7.8.0.jar
2020-07-24 10:04:59,300 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /home/yay/hadoop_tmp/mapred/local/1595556299055/pig-0.17.0-core-h2.jar <- /home/yay/software/pig-0.17.0/bin/pig-0.17.0-core-h2.jar
2020-07-24 10:04:59,309 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp-1797060858/tmp1140381131/pig-0.17.0-core-h2.jar as file:/home/yay/hadoop_tmp/mapred/local/1595556299055/pig-0.17.0-core-h2.jar
2020-07-24 10:04:59,309 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /home/yay/hadoop_tmp/mapred/local/1595556299056/automaton-1.11-8.jar <- /home/yay/software/pig-0.17.0/bin/automaton-1.11-8.jar
2020-07-24 10:04:59,315 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp-1797060858/tmp-1965548869/automaton-1.11-8.jar as file:/home/yay/hadoop_tmp/mapred/local/1595556299056/automaton-1.11-8.jar
2020-07-24 10:04:59,316 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /home/yay/hadoop_tmp/mapred/local/1595556299057/antlr-runtime-3.4.jar <- /home/yay/software/pig-0.17.0/bin/antlr-runtime-3.4.jar
2020-07-24 10:04:59,323 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp-1797060858/tmp2116659353/antlr-runtime-3.4.jar as file:/home/yay/hadoop_tmp/mapred/local/1595556299057/antlr-runtime-3.4.jar
2020-07-24 10:04:59,323 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /home/yay/hadoop_tmp/mapred/local/1595556299058/joda-time-2.9.3.jar <- /home/yay/software/pig-0.17.0/bin/joda-time-2.9.3.jar
2020-07-24 10:04:59,328 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp-1797060858/tmp1679799996/joda-time-2.9.3.jar as file:/home/yay/hadoop_tmp/mapred/local/1595556299058/joda-time-2.9.3.jar
2020-07-24 10:04:59,374 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/home/yay/hadoop_tmp/mapred/local/1595556299054/elasticsearch-hadoop-7.8.0.jar
2020-07-24 10:04:59,374 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/home/yay/hadoop_tmp/mapred/local/1595556299055/pig-0.17.0-core-h2.jar
2020-07-24 10:04:59,374 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/home/yay/hadoop_tmp/mapred/local/1595556299056/automaton-1.11-8.jar
2020-07-24 10:04:59,374 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/home/yay/hadoop_tmp/mapred/local/1595556299057/antlr-runtime-3.4.jar
2020-07-24 10:04:59,374 [JobControl] INFO  org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/home/yay/hadoop_tmp/mapred/local/1595556299058/joda-time-2.9.3.jar
2020-07-24 10:04:59,377 [JobControl] INFO  org.apache.hadoop.mapreduce.Job - The url to track the job: http://localhost:8080/
2020-07-24 10:04:59,378 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local1469778342_0001
2020-07-24 10:04:59,378 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases SOURCE,TARGET
2020-07-24 10:04:59,378 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: SOURCE[1,9],TARGET[-1,-1] C:  R: 
2020-07-24 10:04:59,383 [Thread-22] INFO  org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter set in config null
2020-07-24 10:04:59,383 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2020-07-24 10:04:59,384 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_local1469778342_0001]
2020-07-24 10:04:59,408 [Thread-22] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2020-07-24 10:04:59,409 [Thread-22] INFO  org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter is org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter
2020-07-24 10:04:59,456 [Thread-22] INFO  org.apache.hadoop.mapred.LocalJobRunner - Waiting for map tasks
2020-07-24 10:04:59,457 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local1469778342_0001_m_000000_0
2020-07-24 10:04:59,523 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapred.Task -  Using ResourceCalculatorProcessTree : [ ]
2020-07-24 10:04:59,528 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapred.MapTask - Processing split: Number of splits :1
Total Length = 131156
Input split[0]:
   Length = 131156
   ClassName: org.apache.hadoop.mapreduce.lib.input.FileSplit
   Locations:

-----------------------

2020-07-24 10:04:59,542 [LocalJobRunner Map Task Executor #0] INFO  org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2020-07-24 10:04:59,545 [LocalJobRunner Map Task Executor #0] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed hdfs://localhost:9000/ch07/crime_dataset.csv:0+131156
2020-07-24 10:04:59,578 [LocalJobRunner Map Task Executor #0] INFO  org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (PS Old Gen) of size 699400192 to monitor. collectionUsageThreshold = 489580128, usageThreshold = 489580128
2020-07-24 10:04:59,579 [LocalJobRunner Map Task Executor #0] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2020-07-24 10:04:59,596 [LocalJobRunner Map Task Executor #0] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: SOURCE[1,9],TARGET[-1,-1] C:  R: 
2020-07-24 10:04:59,730 [LocalJobRunner Map Task Executor #0] INFO  org.elasticsearch.hadoop.util.Version - Elasticsearch Hadoop v7.8.0 [5a921b0e52]
2020-07-24 10:04:59,752 [LocalJobRunner Map Task Executor #0] WARN  org.elasticsearch.hadoop.rest.Resource - Detected type name in resource [es_pig/crimes]. Type names are deprecated and will be removed in a later release.
2020-07-24 10:04:59,752 [LocalJobRunner Map Task Executor #0] INFO  org.elasticsearch.hadoop.mr.EsOutputFormat - Writing to [es_pig/crimes]
2020-07-24 10:04:59,778 [LocalJobRunner Map Task Executor #0] WARN  org.elasticsearch.hadoop.rest.Resource - Detected type name in resource [es_pig/crimes]. Type names are deprecated and will be removed in a later release.
2020-07-24 10:05:02,038 [LocalJobRunner Map Task Executor #0] WARN  org.elasticsearch.hadoop.rest.Resource - Detected type name in resource [es_pig/crimes]. Type names are deprecated and will be removed in a later release.
2020-07-24 10:05:02,074 [LocalJobRunner Map Task Executor #0] WARN  org.elasticsearch.hadoop.rest.Resource - Detected type name in resource [es_pig/crimes]. Type names are deprecated and will be removed in a later release.
2020-07-24 10:05:02,252 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapred.LocalJobRunner - 
2020-07-24 10:05:02,562 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapred.Task - Task:attempt_local1469778342_0001_m_000000_0 is done. And is in the process of committing
2020-07-24 10:05:02,571 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapred.LocalJobRunner - map
2020-07-24 10:05:02,571 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapred.Task - Task 'attempt_local1469778342_0001_m_000000_0' done.
2020-07-24 10:05:02,571 [LocalJobRunner Map Task Executor #0] INFO  org.apache.hadoop.mapred.LocalJobRunner - Finishing task: attempt_local1469778342_0001_m_000000_0
2020-07-24 10:05:02,571 [Thread-22] INFO  org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2020-07-24 10:05:02,886 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2020-07-24 10:05:02,886 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_local1469778342_0001]
2020-07-24 10:05:04,395 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2020-07-24 10:05:04,408 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2020-07-24 10:05:04,409 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2020-07-24 10:05:04,409 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
2020-07-24 10:05:04,410 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2020-07-24 10:05:04,461 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2020-07-24 10:05:04,463 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics: 

HadoopVersion   PigVersion  UserId  StartedAt   FinishedAt  Features
2.6.0   0.17.0  yay 2020-07-24 10:04:57 2020-07-24 10:05:04 UNKNOWN

Success!

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime  MinMapTime  AvgMapTime  MedianMapTime   MaxReduceTime   MinReduceTime   AvgReduceTime   MedianReducetime    Alias   Feature Outputs
job_local1469778342_0001    1   0   n/a n/a n/a n/a 0   0   0   0   SOURCE,TARGET   MAP_ONLY    es_pig/crimes,

Input(s):
Successfully read 999 records (7914968 bytes) from: "/ch07/crime_dataset.csv"

Output(s):
Successfully stored 999 records (6769797 bytes) in: "es_pig/crimes"

Counters:
Total records written : 999
Total bytes written : 6769797
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_local1469778342_0001


2020-07-24 10:05:04,464 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2020-07-24 10:05:04,465 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2020-07-24 10:05:04,466 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2020-07-24 10:05:04,471 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 1009 time(s).
2020-07-24 10:05:04,472 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
grunt>  
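
The statistics block at the end of the output is the quickest place to check whether the import was complete. As a hedged sketch (the function name `summarize` and the regular expressions are written against the lines shown in this output, not against any documented Pig log format), the following pulls out the read and stored record counts and any warning counters:

```python
import re

def summarize(log_text):
    """Extract record counts and warning counters from Pig's job output."""
    read = re.search(r"Successfully read (\d+) records", log_text)
    stored = re.search(r"Successfully stored (\d+) records", log_text)
    warnings = dict(
        (name, int(count))
        for name, count in re.findall(
            r"Encountered Warning (\w+) (\d+) time\(s\)", log_text)
    )
    return {
        "read": int(read.group(1)) if read else None,
        "stored": int(stored.group(1)) if stored else None,
        "warnings": warnings,
    }

# Lines copied from the job output in this article.
log = """Successfully read 999 records (7914968 bytes) from: "/ch07/crime_dataset.csv"
Successfully stored 999 records (6769797 bytes) in: "es_pig/crimes"
... Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 1009 time(s).
"""
result = summarize(log)
print(result)
```

In this run all 999 rows were stored, but the FIELD_DISCARDED_TYPE_CONVERSION_FAILED counter shows that 1009 field values could not be cast to the types declared in the load schema, so Pig replaced them with null. It is worth finding out which columns were affected before relying on the index; the datetime column is a likely candidate, though that is only a guess.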

1.2 Importing from a JSON data source

Upload the JSON file to HDFS:

yay@yay-ThinkPad-T470-W10DG:~/software/hadoop-2.6.0/sbin$ hdfs dfs -put /home/yay/下载/Elasticsearch-for-Hadoop_code/eshadoop-master/Chapter7/data/crimes.json /ch07/crimes.json


Link to this article: https://www.haomeiwen.com/subject/yptjlktx.html