14 ElasticSearch For Hadoop: Source Code Issues

Author: 逸章 | Published 2020-06-23 23:55

    Chapter 1 Issues

    1. Issues

    1. The hadoop-hdfs dependency (artifactId) needs to be changed:

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>3.2.1</version>
        </dependency>
    

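    2. An error containing the message "which might be less than configured maximum allocation=<memory:8192, vCores:4>"

    3. The job hangs
    3.1 Cause 1: not enough resources are allocated to YARN (this was the cause in my case).
    Setting the following parameters in yarn-site.xml fixes it: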

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>20480</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>2048</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>2.1</value>
    </property>
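
To see what these three values imply, the arithmetic can be sketched as below. This is an illustration only: the class and method names are made up here, the 1536 MB request is a hypothetical example, and rounding a request up to a multiple of the minimum allocation is an assumption that matches the Capacity Scheduler's default normalization (other schedulers or settings may round differently).

```java
public class YarnMemorySketch {
    /** Rounds a container request up to the next multiple of the minimum allocation (assumed rule). */
    public static int normalizedRequestMb(int requestedMb, int minAllocationMb) {
        int multiples = (requestedMb + minAllocationMb - 1) / minAllocationMb;
        return Math.max(1, multiples) * minAllocationMb;
    }

    /** How many containers of the given size fit into one NodeManager's memory. */
    public static int containersPerNode(int nodeMemoryMb, int containerMb) {
        return nodeMemoryMb / containerMb;
    }

    /** Virtual memory ceiling for a container: physical size times vmem-pmem-ratio. */
    public static double vmemLimitMb(int containerMb, double vmemPmemRatio) {
        return containerMb * vmemPmemRatio;
    }

    public static void main(String[] args) {
        int nodeMemoryMb = 20480;   // yarn.nodemanager.resource.memory-mb
        int minAllocationMb = 2048; // yarn.scheduler.minimum-allocation-mb
        double ratio = 2.1;         // yarn.nodemanager.vmem-pmem-ratio

        int container = normalizedRequestMb(1536, minAllocationMb); // hypothetical 1536 MB request
        System.out.println("container size MB: " + container);
        System.out.println("containers per node: " + containersPerNode(nodeMemoryMb, container));
        System.out.println("vmem limit MB: " + vmemLimitMb(container, ratio));
    }
}
```

If a job needs more containers than the node can hold at once, the remaining requests wait in the queue, which from the client side can look like a hung job.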
    

    3.2 Possible cause 2 (reported online, unrelated to my case): the job was interrupted with Ctrl+C during an earlier run. The symptom looks like this:

    yay@yay-ThinkPad-T470-W10DG:~/下载/Elasticsearch for Hadoop_code/eshadoop-master/Chapter1/target$ hadoop jar ch01-0.0.1-job.jar /input/ch01/sample.txt
    2020-06-23 19:58:08,230 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
    2020-06-23 19:58:08,414 WARN mr.EsOutputFormat: Speculative execution enabled for reducer - consider disabling it to prevent data corruption
    2020-06-23 19:58:08,768 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    2020-06-23 19:58:08,845 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/yay/.staging/job_1592904887052_0004
    2020-06-23 19:58:09,052 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
    2020-06-23 19:58:10,658 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
    2020-06-23 19:58:12,441 INFO input.FileInputFormat: Total input files to process : 1
    2020-06-23 19:58:13,754 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
    2020-06-23 19:58:14,284 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
    2020-06-23 19:58:14,324 INFO mapreduce.JobSubmitter: number of splits:1
    2020-06-23 19:58:14,520 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
    2020-06-23 19:58:14,549 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1592904887052_0004
    2020-06-23 19:58:14,549 INFO mapreduce.JobSubmitter: Executing with tokens: []
    2020-06-23 19:58:14,732 INFO conf.Configuration: resource-types.xml not found
    2020-06-23 19:58:14,732 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
    2020-06-23 19:58:14,794 INFO impl.YarnClientImpl: Submitted application application_1592904887052_0004
    2020-06-23 19:58:14,832 INFO mapreduce.Job: The url to track the job: http://yay-ThinkPad-T470-W10DG:8088/proxy/application_1592904887052_0004/
    2020-06-23 19:58:14,832 INFO mapreduce.Job: Running job: job_1592904887052_0004
    

    After that, nothing more happens.
    After restarting YARN and HDFS and running the job again, an error message finally appears:

    yay@yay-ThinkPad-T470-W10DG:~/下载/Elasticsearch for Hadoop_code/eshadoop-master/Chapter1/target$ hadoop jar ch01-0.0.1-job.jar /input/ch01/sample.txt
    2020-06-23 20:02:30,428 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
    2020-06-23 20:02:30,632 WARN mr.EsOutputFormat: Speculative execution enabled for reducer - consider disabling it to prevent data corruption
    2020-06-23 20:02:31,036 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    2020-06-23 20:02:31,057 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/yay/.staging/job_1592913738368_0001
    Exception in thread "main" org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /tmp/hadoop-yarn/staging/yay/.staging/job_1592913738368_0001. Name node is in safe mode.
    The reported blocks 21 has reached the threshold 0.9990 of total blocks 21. The minimum number of live datanodes is not required. In safe mode extension. Safe mode will be turned off automatically in 3 seconds. NamenodeHostName:localhost
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.newSafemodeException(FSNamesystem.java:1476)
    
    

    The fix is to take the NameNode out of safe mode:

    yay@yay-ThinkPad-T470-W10DG:~/下载/Elasticsearch for Hadoop_code/eshadoop-master/Chapter1/target$ hadoop dfsadmin -safemode leave
    WARNING: Use of this script to execute dfsadmin is deprecated.
    WARNING: Attempting to execute replacement "hdfs dfsadmin" instead.
    
    yay@yay-ThinkPad-T470-W10DG:~/下载/Elasticsearch for Hadoop_code/eshadoop-master/Chapter1/target$ 
    

    Change the following settings in yarn-site.xml from

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>1</value>
    </property>

    to

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>1536</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>4</value>
    </property>

    4. Hadoop 2.6.0 needs to be used here.

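    2. Other notes

    [images]

    Chapter 2 Issues

    1. An error (occurs when running in the China region)

    [image]

    2. Other notes

    2.1 How fields are formed in ES

    [image]

    2.2 An example of ES aggregation (using Top N)

    [image]

    The Top 5 query: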

    curl -XPOST 'http://localhost:9200/esh_network/_search?pretty' -d '{
      "aggs": {
        "top-catagories": {
          "terms": {
            "field": "category",
            "size": 5
          }
        }
      },
      "size": 0
    }'
    
    Result: [image]
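
As a rough mental model, a terms aggregation with "size": 5 groups documents by the field value, counts each group, and returns the largest buckets first. The sketch below reproduces that idea in plain Java over an in-memory list. The class name and the sample category values are invented for illustration, and it ignores how Elasticsearch actually computes the counts across shards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TopCategoriesSketch {
    /** Counts each category and returns the n most frequent, highest count first. */
    public static Map<String, Long> topN(List<String> categories, int n) {
        Map<String, Long> counts = new TreeMap<>();
        for (String c : categories) {
            counts.merge(c, 1L, Long::sum);
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        // Sort by count, descending. The sort is stable, so ties keep the TreeMap's alphabetical order.
        entries.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()));
        Map<String, Long> top = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : entries.subList(0, Math.min(n, entries.size()))) {
            top.put(e.getKey(), e.getValue());
        }
        return top;
    }

    public static void main(String[] args) {
        // Invented sample values standing in for the "category" field.
        List<String> sample = Arrays.asList("news", "adult", "news", "sports", "news", "sports");
        System.out.println(topN(sample, 2));
    }
}
```

Setting "size": 0 at the top level of the real query plays a different role: it suppresses the individual hits so that only the aggregation buckets come back.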

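    2.3 For large data volumes, create one index per day

    Note: the run may report errors, but the per-day indices are in fact all created.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.elasticsearch.hadoop.mr.EsOutputFormat;

    public class Driver {

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Elasticsearch nodes to connect to
            conf.set("es.nodes", "localhost:9200");
            // Target resource in {indexName}/{typeName} format.
            // Single index: conf.set("es.resource", "esh_network/network_logs_{action}");
            // One index per day, named after each document's @timestamp:
            conf.set("es.resource", "esh_network_{@timestamp:YYYY.MM.dd}/network_logs_{action}");
            // EEE MMM dd hh:mm:ss yyyy

            // Create the Job instance (Job.getInstance replaces the deprecated constructor)
            Job job = Job.getInstance(conf, "network monitor mapper");
            // Set the driver class and the mapper
            job.setJarByClass(Driver.class);
            job.setMapperClass(NetworkLogsMapper.class);
            // Write output through EsOutputFormat from the elasticsearch-hadoop jar
            job.setOutputFormatClass(EsOutputFormat.class);
            // Map-only job: no reducers
            job.setNumReduceTasks(0);
            FileInputFormat.addInputPath(job, new Path(args[0]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }

    }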
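
The {@timestamp:YYYY.MM.dd} placeholder makes the target index depend on each document's @timestamp field, so documents from different days land in different indices. The sketch below only imitates that naming in plain Java for illustration; it is not ES-Hadoop's implementation. The class name, the sample timestamps, and the ISO-8601 input format are assumptions. Note that java.time needs lowercase yyyy here, because uppercase YYYY means week-based year in DateTimeFormatter.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class DailyIndexNameSketch {
    // java.time pattern for the date suffix; lowercase "yyyy" is the ordinary year.
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy.MM.dd");

    /** Builds "<prefix><yyyy.MM.dd>" from an ISO-8601 local timestamp such as 2020-06-23T19:58:08. */
    public static String dailyIndex(String prefix, String isoTimestamp) {
        LocalDateTime ts = LocalDateTime.parse(isoTimestamp);
        return prefix + DAY.format(ts);
    }

    public static void main(String[] args) {
        // Two events on different days end up in two different indices.
        System.out.println(dailyIndex("esh_network_", "2020-06-23T19:58:08"));
        System.out.println(dailyIndex("esh_network_", "2020-06-24T00:00:01"));
    }
}
```

Because every index shares the esh_network_ prefix, a pattern such as esh_network_* can address all of the daily indices at once.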
    
    Deleting indices with a wildcard pattern: [image]

    2.4 Loading tweet data into ES, then into HDFS
    2.4.1 Loading data into ES

    yay@yay-ThinkPad-T470-W10DG:~/下载/Elasticsearch for Hadoop_code/eshadoop-master/Chapter2$ hdfs dfs -put data/tweets.csv /input/ch02/tweets.csv
    yay@yay-ThinkPad-T470-W10DG:~/下载/Elasticsearch for Hadoop_code/eshadoop-master/Chapter2$ hdfs dfs -ls /input/ch02
    Found 2 items
    -rw-r--r--   1 yay supergroup    7330547 2020-06-27 00:20 /input/ch02/network-logs.txt
    -rw-r--r--   1 yay supergroup     391727 2020-06-30 16:53 /input/ch02/tweets.csv
    yay@yay-ThinkPad-T470-W10DG:~/下载/Elasticsearch for Hadoop_code/eshadoop-master/Chapter2$ hadoop jar target/ch02-0.0.1-tweets2es-job.jar /input/ch02/tweets.csv

    2.4.2 Reading data from ES into HDFS
    Note: the output directory on HDFS must not already exist (I used /input/ch02/tohdfs). If it does, delete it first: $ hdfs dfs -rm -r /input/ch02/tohdfs

    yay@yay-ThinkPad-T470-W10DG:~/下载/Elasticsearch for Hadoop_code/eshadoop-master/Chapter2$ hadoop jar target/ch02-0.0.1-tweets2hdfs-job.jar /input/ch02/tohdfs
    
    Chapter 3 Issues

    2. Other notes

    When a curl command spans multiple lines, wrap the request body in single quotes:

    yay@yay-ThinkPad-T470-W10DG:~/下载/Elasticsearch for Hadoop_code/eshadoop-master/Chapter2$ curl -XPUT http://localhost:9200/hrms1/candidate/1?pretty -d '
    {"firstname":"ay",
    "lastname":"Y",
    "skill":["Java","Scala","what"]
    }'
    {
      "_index" : "hrms1",
      "_type" : "candidate",
      "_id" : "1",
      "_version" : 1,
      "created" : true
    }
    yay@yay-ThinkPad-T470-W10DG:~/下载/Elasticsearch for Hadoop_code/eshadoop-master/Chapter2$ curl -XPOST http://localhost:9200/hrms1/candidate/1/_update?pretty -d '
    {"doc":{"newkey":"newvalue"}}'
    {
      "_index" : "hrms1",
      "_type" : "candidate",
      "_id" : "1",
      "_version" : 2
    }
    yay@yay-ThinkPad-T470-W10DG:~/下载/Elasticsearch for Hadoop_code/eshadoop-master/Chapter2$ curl -XGET http://localhost:9200/hrms1/candidate/1?pretty
    {
      "_index" : "hrms1",
      "_type" : "candidate",
      "_id" : "1",
      "_version" : 2,
      "found" : true,
      "_source":{"firstname":"ay","lastname":"Y","skill":["Java","Scala","what"],"newkey":"newvalue"}
    }
    yay@yay-ThinkPad-T470-W10DG:~/下载/Elasticsearch for Hadoop_code/eshadoop-master/Chapter2$ 
    
    2.1 URI search

    curl 'http://localhost:9200/hrms/candidate/_search?pretty=true&q=skills:elasticsearch'
    

    2.2 match_all query

    yay@yay-ThinkPad-T470-W10DG:~$ curl -XPOST http://localhost:9200/hrms/candidate/_search?pretty -d '
    {
       "query":
       {
        "match_all":{}
       }
    }'
    

    2.3 term query

    yay@yay-ThinkPad-T470-W10DG:~$ curl -XPOST http://localhost:9200/hrms/candidate/_search?pretty -d '
    {
      "query": {
        "term": {
          "skills": {
            "value": "elasticsearch"
          }
        }
      },
      "size": 2
    }'

    2.4 bool query

    yay@yay-ThinkPad-T470-W10DG:~$ curl -XPOST http://localhost:9200/hrms/candidate/_search?pretty -d '
    {
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "address.city": {
                  "value": "Mumbai"
                }
              }
            }
          ],
          "should": [
            {
              "terms": {
                "skills": ["elasticsearch", "lucene"]
              }
            }
          ]
        }
      }
    }'

    2.5 match query

    yay@yay-ThinkPad-T470-W10DG:~$ curl -XPOST http://localhost:9200/hrms/candidate/_search?pretty -d '
    {
      "query": {
        "match": {
          "comments": {
            "query": "hacking java"
          }
        }
      }
    }'

    Adding "type": "phrase" restricts the match to the exact phrase:

    yay@yay-ThinkPad-T470-W10DG:~$ curl -XPOST http://localhost:9200/hrms/candidate/_search?pretty -d '
    {
      "query": {
        "match": {
          "comments": {
            "query": "Ethical hacking",
            "type": "phrase"
          }
        }
      }
    }'

    2.6 range query

    yay@yay-ThinkPad-T470-W10DG:~$ curl -XPOST http://localhost:9200/hrms/candidate/_search?pretty -d '
    {
      "query": {
        "range": {
          "experience": {
            "gte": 5,
            "lte": 10
          }
        }
      }
    }'

    2.7 wildcard query (wildcard matching on exact terms)

    yay@yay-ThinkPad-T470-W10DG:~$ curl -XPOST http://localhost:9200/hrms/candidate/_search?pretty -d '
    {
      "query": {
        "wildcard": {
          "address.city": {
            "value": "Mu*"
          }
        }
      }
    }'

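    3 Filters

    Unlike queries, filters have no notion of relevance: a filter only decides whether a document matches and does not score or rank the results.

    3.1 exists (the field exists and is not null)

    Note: the filtered query used below is legacy syntax and is deprecated in newer versions.

    yay@yay-ThinkPad-T470-W10DG:~$ curl -XPOST http://localhost:9200/hrms/candidate/_search?pretty -d '
    {
      "query": {
        "filtered": {
          "filter": {
            "exists": {
              "field": "achievements"
            }
          }
        }
      }
    }'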
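
In newer Elasticsearch versions, the filtered query is replaced by a bool query with a filter clause. The sketch below builds that request body as a string in Java. The class and method names are hypothetical, and only the body is built; sending it (for example with curl, as in the commands above) is left out.

```java
public class ExistsFilterBodySketch {
    /** Returns a bool/filter search body that matches documents where the field exists. */
    public static String existsFilterBody(String field) {
        // Naive escaping of backslashes and quotes only; enough for simple field names.
        String escaped = field.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\n"
             + "  \"query\": {\n"
             + "    \"bool\": {\n"
             + "      \"filter\": {\n"
             + "        \"exists\": { \"field\": \"" + escaped + "\" }\n"
             + "      }\n"
             + "    }\n"
             + "  }\n"
             + "}";
    }

    public static void main(String[] args) {
        System.out.println(existsFilterBody("achievements"));
    }
}
```

The filter clause keeps the behaviour described above: matching documents are returned without contributing to a relevance score.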

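    Chapter 4 Issues

    1. Issues

    2. Other notes

    1.

    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import java.util.Locale;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Writable;

    // Converts an MM/dd/yyyy date string to epoch milliseconds; null or unparseable input yields NullWritable.
    private static Writable convertMMddYYTimeToWritable(String timeStr) {
        if (timeStr == null) {
            return NullWritable.get();
        }
        SimpleDateFormat dateFormat = new SimpleDateFormat("MM/dd/yyyy", Locale.ENGLISH);
        try {
            return new LongWritable(dateFormat.parse(timeStr).getTime());
        } catch (ParseException e) {
            e.printStackTrace();
            return NullWritable.get();
        }
    }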
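
Passing an explicit Locale matters as soon as a pattern contains textual parts such as month or weekday names (for example the EEE MMM dd hh:mm:ss yyyy shape noted in the Chapter 2 driver). The sketch below, with an invented sample timestamp and a hypothetical class name, shows the same string parsing under Locale.ENGLISH and failing under Locale.CHINA. It illustrates JDK behaviour only and is not a diagnosis of any specific error mentioned in these notes.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class LocaleParseSketch {
    /** Returns true if the text can be parsed with the given pattern and locale. */
    public static boolean parses(String text, String pattern, Locale locale) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, locale);
        format.setLenient(false);
        try {
            format.parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String sample = "Tue Jun 23 19:58:08 2020"; // invented sample with English names
        String pattern = "EEE MMM dd HH:mm:ss yyyy"; // HH because the sample uses a 24-hour clock
        System.out.println("ENGLISH: " + parses(sample, pattern, Locale.ENGLISH));
        System.out.println("CHINA:   " + parses(sample, pattern, Locale.CHINA));
    }
}
```

On a machine whose default locale is Chinese, a SimpleDateFormat created without a Locale argument behaves like the second call, which is why fixing the locale in code makes the parsing independent of where the job runs.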
    
    2. Stacked bar chart

    [images]

    Area chart: [image]

    Donut chart: [images]
