Problems using Logstash's webhdfs output


Author: 不正经运维 | Published 2018-04-25 10:35, 30 reads

    Wednesday, April 25, 2018, 10:11

    Symptom

    After configuring Logstash's webhdfs output plugin, nothing was written to HDFS, and the log reported:

    [2018-04-25T00:00:26,915][WARN ][logstash.outputs.webhdfs ] Failed to flush outgoing items {:outgoing_count=>1, :exception=>"WebHDFS::ServerError", :backtrace=>["/opt/logstash-6.2.4/vendor/bundle/jruby/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:351:in `request'", "/opt/logstash-6.2.4/vendor/bundle/jruby/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:270:in `operate_requests'", "/opt/logstash-6.2.4/vendor/bundle/jruby/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:73:in `create'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs.rb:228:in `write_data'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs.rb:211:in `block in flush'", "org/jruby/RubyHash.java:1343:in `each'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs.rb:199:in `flush'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/buffer.rb:219:in `block in buffer_flush'", "org/jruby/RubyHash.java:1343:in `each'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/buffer.rb:216:in `buffer_flush'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/buffer.rb:159:in `buffer_receive'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs.rb:182:in `receive'", "/opt/logstash/logstash-core/lib/logstash/outputs/base.rb:92:in `block in multi_receive'", "org/jruby/RubyArray.java:1734:in `each'", "/opt/logstash/logstash-core/lib/logstash/outputs/base.rb:92:in `multi_receive'", "/opt/logstash/logstash-core/lib/logstash/output_delegator_strategies/legacy.rb:22:in `multi_receive'", "/opt/logstash/logstash-core/lib/logstash/output_delegator.rb:49:in `multi_receive'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:477:in `block in output_batch'", "org/jruby/RubyHash.java:1343:in `each'", 
"/opt/logstash/logstash-core/lib/logstash/pipeline.rb:476:in `output_batch'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:428:in `worker_loop'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:386:in `block in start_workers'"]}
    

    Analysis

    Checking the configuration

    Since this error clearly occurs while talking to WebHDFS, the first step is to check the configuration.

    The configuration was:

    input {
      beats {
        port => "5044"
      }
    }
    output {
      stdout {
        codec => rubydebug
      }
      webhdfs {
        host => "x.x.x.x"
        port => 9870
        path => "/weblog/iis/%{@source_host}/%{+YYYY-MM-dd}/iislog-%{@source_host}-%{YYYYMMddHH}.log"
        user => "root"
        retry_times => 100
      }
    }
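To rule out the configuration itself, it helps to reproduce the request the plugin makes outside of Logstash. A minimal sketch of the WebHDFS v1 CREATE URL that logstash-output-webhdfs issues first (the host, port, and file path here are placeholders, not values from the original setup):

```python
# Build the WebHDFS v1 CREATE URL that logstash-output-webhdfs issues first.
# "namenode", port 9870, and the file path are placeholders.
def webhdfs_create_url(host, port, path, user):
    # The WebHDFS REST API prefixes every file path with /webhdfs/v1
    return f"http://{host}:{port}/webhdfs/v1{path}?op=CREATE&user.name={user}"

url = webhdfs_create_url("namenode", 9870, "/weblog/test.log", "root")
print(url)
```

Issuing this URL with curl (`curl -i -X PUT "$url"`) shows whether the namenode is reachable at all, independently of Logstash.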
    

    WebHDFS::ServerError

    I hadn't used Logstash before, though I knew it was meant to be simple, so I searched for the error message directly and found the article logstash-output-webhdfs Failed to flush outgoing items, which says:

    It seems you should set the user option of logstash-output-webhdfs to the HDFS superuser, which is the user you use to start HDFS. For example, if you use root to run start-dfs.sh, then the user option should be root.
    In addition, you should edit /etc/hosts and add the HDFS cluster node list.

    This points to two common problems:

    1. The HDFS user account;
    2. Hostname resolution for the HDFS nodes.

    Solution

    The HDFS account

    This one is easy to confirm: the account used on HDFS is indeed root.

    HDFS hostname resolution

    Checking /etc/hosts showed that only the namenode had an entry.

    A little thought suggests why: Logstash most likely addresses the datanodes by hostname, and what it gets back from the namenode is also a hostname. That is why the answer says to add the full cluster node list.
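Concretely, when a client issues a CREATE against the namenode, the namenode replies with a 307 redirect whose Location header names a datanode, typically by hostname. A sketch of pulling that hostname out of a sample header (the header text and node name here are made up for illustration):

```python
from urllib.parse import urlparse

# Sample Location header from a namenode 307 redirect; the datanode
# hostname "datanode-1" is hypothetical.
location = ("http://datanode-1:9864/webhdfs/v1/weblog/test.log"
            "?op=CREATE&namenoderpcaddress=namenode:8020")

# Logstash must resolve this hostname next; if it is in neither DNS nor
# /etc/hosts, the write fails and surfaces as WebHDFS::ServerError.
datanode_host = urlparse(location).hostname
print(datanode_host)  # datanode-1
```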

    Adding hosts entries

    Put every Hadoop node's hostname/IP mapping into /etc/hosts.
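A quick way to verify the coverage is to compare the cluster node list against what /etc/hosts actually declares. A small sketch (the node names and addresses are placeholders, not the real cluster):

```python
# Check that every Hadoop node hostname appears in an /etc/hosts-style file.
# Node names and IPs below are placeholders for an actual cluster.
cluster_nodes = ["namenode", "datanode-1", "datanode-2"]

hosts_content = """\
127.0.0.1   localhost
10.0.0.10   namenode
10.0.0.11   datanode-1
"""

# Every field after the IP on a non-comment line is a hostname or alias.
known = {name for line in hosts_content.splitlines()
         if line.strip() and not line.startswith("#")
         for name in line.split()[1:]}

missing = [n for n in cluster_nodes if n not in known]
print(missing)  # datanode-2 has no entry yet
```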

    Updating the configuration

    Then update the Logstash configuration.

    input {
      beats {
        port => "5044"
      }
    }
    output {
      stdout {
        codec => rubydebug
      }
      webhdfs {
        host => "namenode"
        port => 9870
        path => "/weblog/iis/%{+YYYY-MM-dd}/%{@source_host}/iislog-%{+HH}.log"
        user => "root"
        retry_times => 100
      }
    }
    

    Verifying the result

    Check HDFS: if the expected directories and files have been created, it works.

    Open questions

    A few problems remain:

    1. In the official example, the date part is prefixed with dt=; it is unclear what that is for.
    2. %{@source_host} does not resolve.
    3. %{+HH} is not rendered in UTC+0800.
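On the third point, Logstash's %{+...} date codes format the event's @timestamp, which is stored in UTC, so they are not rendered in the local zone. A quick illustration of the eight-hour gap (plain Python standing in for the formatting, not Logstash itself):

```python
from datetime import datetime, timedelta, timezone

# An event timestamp as Logstash stores it: always UTC.
ts = datetime(2018, 4, 25, 2, 11, tzinfo=timezone.utc)

# %{+HH} formats the UTC value ...
utc_hour = ts.strftime("%H")  # "02"

# ... while the wall clock in UTC+0800 is eight hours ahead.
local_hour = ts.astimezone(timezone(timedelta(hours=8))).strftime("%H")  # "10"
print(utc_hour, local_hour)
```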

    References

    1. logstash-output-webhdfs Failed to flush outgoing items


        Link: https://www.haomeiwen.com/subject/puuelftx.html