Stripping HTML from a large MySQL content field with Logstash


Author: alfred88 | Published 2019-08-07 18:58

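    Many beginners have legacy data made up of HTML fragments generated by a rich-text editor. If these are imported into Elasticsearch unchanged, search highlighting displays the raw HTML, which makes for a poor experience. When syncing with Logstash, add a filter that strips the HTML first.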

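    filter {
        mutate {
            # gsub takes triples of field, regex, replacement and applies them in order.
            # (?m) lets "." match newlines (Ruby regex), so multi-line blocks are removed;
            # (?i) ignores tag case.
            # Replacements are "", which suits Chinese text; use " " for space-delimited text.
            gsub => [
                "content", "(?im)<script.*?</script>", "",
                "content", "(?im)<iframe.*?</iframe>", "",
                "content", "(?im)<style.*?</style>", "",
                "content", "(?m)<.*?>", "",
                "content", "&nbsp;", ""
            ]
        }
    }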
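
    To check the patterns before running a full sync, the same chain can be tried offline. The following is only a rough Python sketch, not part of the Logstash pipeline: `strip_html` and `sample` are hypothetical names, and Python's `re` is not the Ruby regex engine that Logstash uses. In particular, Python's `(?s)` flag is what makes `.` match newlines, which is the role `(?m)` plays in Ruby.

```python
import re

# Patterns mirror the gsub chain, in the same order.
# (?s): "." also matches newlines; (?i): ignore tag case.
STRIP_PATTERNS = [
    r"(?is)<script.*?</script>",
    r"(?is)<iframe.*?</iframe>",
    r"(?is)<style.*?</style>",
    r"(?s)<.*?>",
    r"&nbsp;",
]


def strip_html(content: str) -> str:
    """Apply each pattern in order, replacing every match with an empty string."""
    for pattern in STRIP_PATTERNS:
        content = re.sub(pattern, "", content)
    return content


# Hypothetical editor output with a multi-line, upper-case script block.
sample = "<p>Hello <b>world</b>&nbsp;</p>\n<SCRIPT>\nalert('x');\n</SCRIPT>"
print(repr(strip_html(sample)))
```

    Without the newline-matching flag, the first pattern would not match a script block that spans several lines. The generic tag pattern would then remove only the opening and closing tags and leave the script body in the indexed text.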
    

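    Many fields need to be pre-processed in MySQL first, especially date/time fields. When creating the index, also specify the date format in the mapping: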

    "format"=>"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis||strict_date_optional_time"
    
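    -- content_old keeps the unfiltered HTML, since the filter only rewrites content.
    -- Alias b is not defined in the query as posted; its JOIN is missing.
    SELECT a.id, a.title,
           b.content, b.content AS content_old,
           CONCAT(a.addtime) AS addtime,
           CONCAT(a.autotime) AS autotime,
           a.views, a.zans, a.type_a, a.type_b,
           CONCAT(a.isshow) AS isshow,
           CONCAT(a.isdelete) AS isdelete,
           if(isnull(a.deletetime),0,a.deletetime) AS deletetime
    FROM web_information a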
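
    As an illustration of where the format string goes, here is a hedged Python sketch that builds a mapping body for the time fields. It assumes Elasticsearch 7 or later, where mappings have no type name. The field names come from the SELECT above, but treating these three as `date` fields is an assumption. The `epoch_millis` alternative is presumably what lets the `0` returned for a NULL `deletetime` be parsed.

```python
import json

# Format string quoted from the post; "||" separates accepted alternatives.
DATE_FORMAT = "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis||strict_date_optional_time"

# Assumption: these columns from the SELECT are the ones indexed as dates.
DATE_FIELDS = ["addtime", "autotime", "deletetime"]

# Typeless mapping body (Elasticsearch 7+ style).
mapping_body = {
    "mappings": {
        "properties": {
            field: {"type": "date", "format": DATE_FORMAT}
            for field in DATE_FIELDS
        }
    }
}

# Print the JSON that would be sent as the body of an index-creation request.
print(json.dumps(mapping_body, indent=2))
```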
    

