Web Scraping Tools and Elasticsearch (ES) Query Console Search Commands

Author: 木木_bfe8 | Published: 2018-08-11 18:10

    User-agent module (fake-useragent):
    https://github.com/hellysmile/fake-useragent
    HTTP proxies: Xici proxy list (西刺代理)

    http://www.xicidaili.com/
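    Scrapers usually combine these two pieces by picking a random User-Agent and a random HTTP proxy for each request. The sketch below does this with the standard library only. USER_AGENTS and PROXIES are hypothetical placeholders for the values that fake-useragent and a proxy list such as Xici would supply, and build_request is a hypothetical helper. Nothing is sent over the network.

```python
import random
import urllib.request

# Hypothetical placeholder values. In practice they would come from
# fake-useragent and from a proxy list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6)",
]
PROXIES = ["http://127.0.0.1:8080", "http://127.0.0.1:8081"]


def build_request(url, rng=random):
    """Return (opener, request) with a random User-Agent and HTTP proxy."""
    proxy = rng.choice(PROXIES)
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy})
    )
    request = urllib.request.Request(
        url, headers={"User-Agent": rng.choice(USER_AGENTS)}
    )
    return opener, request


opener, req = build_request("http://example.com/")
# opener.open(req) would send the request through the chosen proxy.
print(req.get_header("User-agent"))
```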

    Python query library: elasticsearch-dsl-py

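    The _analyze API shows how an analyzer tokenizes text. This example uses ik_smart from the IK Chinese analysis plugin:
    GET _analyze
    {
      "analyzer": "ik_smart",
      "text": "python工程师"
    }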
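    You can make the same _analyze call from Python. This sketch makes two assumptions: Elasticsearch listens on http://localhost:9200, and the IK plugin is installed. It only builds the request. tokens_from_response (a hypothetical helper) reads the token strings out of the JSON response, where each entry has a token key.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local Elasticsearch node


def analyze_request(text, analyzer="ik_smart"):
    """Build an /_analyze request (Elasticsearch accepts POST as well as GET)."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        ES_URL + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokens_from_response(response_json):
    """Extract the token strings from an _analyze response."""
    return [t["token"] for t in response_json["tokens"]]


req = analyze_request("python工程师")
# With a running node:
# with urllib.request.urlopen(req) as resp:
#     print(tokens_from_response(json.load(resp)))
```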
    ES search. In the examples below, jobbole is the index and article is the type.
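    1. match: analyzes (tokenizes) the query text first. With the default OR operator, a document matches if it contains any of the resulting terms.
    GET jobbole/article/_search
    {
      "query": {
        "match": { "content": "27" }
      }
    }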
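    Each console request in this list is an HTTP call to /jobbole/article/_search. A small standard-library helper can send any of these JSON bodies. search_request is a hypothetical name, and the sketch assumes a node at localhost:9200. The index/type URL form follows the ES 6-era examples here. Newer versions deprecate and then remove mapping types.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local Elasticsearch node


def search_request(index, doc_type, body):
    """Build a POST /{index}/{doc_type}/_search request for a query body."""
    return urllib.request.Request(
        "{}/{}/{}/_search".format(ES_URL, index, doc_type),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


match_body = {"query": {"match": {"content": "27"}}}
req = search_request("jobbole", "article", match_body)
# With a running node:
# with urllib.request.urlopen(req) as resp:
#     for hit in json.load(resp)["hits"]["hits"]:
#         print(hit["_id"], hit["_score"])
```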
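    2. term: does not analyze the query. It looks up the exact value in the index. For a single value, write "term": { "content": "27" }. To match any of several exact values, use terms with an array. from is the 0-based offset of the first hit to return, and size is the maximum number of hits:
    GET jobbole/article/_search
    {
      "query": {
        "terms": { "content": ["27", "29", "30"] }
      },
      "from": 1,
      "size": 3
    }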
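    A toy inverted index shows why match and term behave differently. This is a simplified model, not Elasticsearch's implementation. The hypothetical analyze function only lowercases and splits on whitespace, standing in for a real analyzer. match analyzes the query and ORs together the documents for each resulting term. term looks up the raw value unchanged. from/size then slice the list of hits.

```python
def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()


DOCS = {1: "Python Engineer", 2: "Java engineer", 3: "python 27"}

# Inverted index: term -> set of document ids that contain it.
INDEX = {}
for doc_id, text in DOCS.items():
    for tok in analyze(text):
        INDEX.setdefault(tok, set()).add(doc_id)


def match(query):
    """Analyze the query, then OR the postings of every resulting term."""
    hits = set()
    for tok in analyze(query):
        hits |= INDEX.get(tok, set())
    return sorted(hits)


def term(value):
    """Look the value up as-is, without analysis."""
    return sorted(INDEX.get(value, set()))


def paginate(hits, from_=0, size=10):
    """Mimic from/size: skip from_ hits, then return at most size of them."""
    return hits[from_:from_ + size]


print(match("Python"))  # [1, 3]
print(term("Python"))   # [] because the index only holds "python"
```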
    3. match_all: matches every document. Its body {} is empty.
    GET jobbole/article/_search
    {
      "query": {
        "match_all": {}
      }
    }
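    To walk through every document that match_all returns, you can advance from by size on each request. This sketch only generates the request bodies. page_bodies is a hypothetical helper, and total would come from hits.total of an earlier response. The sketch keeps from + size within 10,000, the default index.max_result_window. For deeper paging, Elasticsearch offers the scroll API or search_after.

```python
MAX_RESULT_WINDOW = 10000  # Elasticsearch's default index.max_result_window


def page_bodies(total, size=100):
    """Yield match_all bodies whose from/size ranges cover `total` hits."""
    for start in range(0, min(total, MAX_RESULT_WINDOW), size):
        # Shrink the last page so that from + size stays inside the window.
        page_size = min(size, MAX_RESULT_WINDOW - start)
        yield {"query": {"match_all": {}}, "from": start, "size": page_size}


bodies = list(page_bodies(total=250, size=100))
print([(b["from"], b["size"]) for b in bodies])
# [(0, 100), (100, 100), (200, 100)]
```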
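    4. match_phrase: analyzes the query first. A document matches only if it contains all of the resulting terms, in the same order. slop sets how many positions apart the terms may be. If they are farther apart than slop allows, the document does not match.
    GET jobbole/article/_search
    {
      "query": {
        "match_phrase": {
          "content": {
            "query": "python视频",
            "slop": 3
          }
        }
      }
    }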
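    The slop rule can be modeled on token positions. This is a simplified sketch that only handles terms appearing in query order. Lucene also accepts reordered terms, at a higher cost. Here a document matches when the phrase terms occur in order with at most slop extra positions between them in total. phrase_matches is a hypothetical helper, and the example tokens are made up.

```python
def phrase_matches(doc_tokens, phrase_tokens, slop=0):
    """True if phrase_tokens occur in order with a total gap <= slop."""
    if not phrase_tokens:
        return False
    positions = {}
    for pos, tok in enumerate(doc_tokens):
        positions.setdefault(tok, []).append(pos)
    if any(tok not in positions for tok in phrase_tokens):
        return False
    first, rest = phrase_tokens[0], phrase_tokens[1:]
    for start in positions[first]:
        prev, ok = start, True
        for tok in rest:
            # Taking the earliest occurrence after the previous term
            # keeps the span as short as possible.
            later = [p for p in positions[tok] if p > prev]
            if not later:
                ok = False
                break
            prev = later[0]
        # Extra positions = span length minus the positions the phrase needs.
        if ok and prev - start - len(rest) <= slop:
            return True
    return False


doc = ["python", "基础", "教学", "视频"]
print(phrase_matches(doc, ["python", "视频"], slop=2))  # True
```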
    5. multi_match: runs the query against several fields.
    GET jobbole/article/_search
    {
      "query": {
        "multi_match": {
          "query": "python",
          "fields": ["content", "url"]
        }
      }
    }
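    multi_match also accepts per-field boosts written as field^boost. Its default type is best_fields, which scores a document by its best-matching field. The sketch below produces the body above and can optionally weight content higher than url. multi_match_body is a hypothetical helper.

```python
import json


def multi_match_body(query, fields, boosts=None):
    """Build a multi_match body. boosts maps field name -> weight."""
    boosts = boosts or {}
    weighted = [
        "{}^{}".format(f, boosts[f]) if f in boosts else f for f in fields
    ]
    return {"query": {"multi_match": {"query": query, "fields": weighted}}}


print(json.dumps(multi_match_body("python", ["content", "url"], {"content": 2})))
# {"query": {"multi_match": {"query": "python", "fields": ["content^2", "url"]}}}
```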
    6. Return only specific fields with stored_fields. This requires store to be true for that field in the mapping.
    GET jobbole/article/_search
    {
      "query": {
        "match": {
          "content": "python"
        }
      },
      "stored_fields": ["content"]
    }
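    In the response, values requested with stored_fields appear under each hit's fields key, and each value is an array. The sketch below collects them from a response that has already been parsed from JSON. stored_values is a hypothetical helper.

```python
def stored_values(response, field):
    """Collect the stored values of `field` from every hit in a search response."""
    values = []
    for hit in response["hits"]["hits"]:
        # Stored fields come back as lists, e.g. {"content": ["..."]}.
        values.extend(hit.get("fields", {}).get(field, []))
    return values
```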
    7. Sorting (sort) and similar options are not covered here. See the documentation.
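    8. Wildcard query: * matches any sequence of characters (including none) and ? matches exactly one character. boost raises this clause's weight in the relevance score.
    GET jobbole/article/_search
    {
      "query": {
        "wildcard": {
          "content": { "value": "pyth*n", "boost": 2 }
        }
      }
    }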
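    A regular expression can mimic the wildcard syntax, which helps you check which indexed terms a pattern would hit. wildcard_to_regex is a hypothetical helper. In Elasticsearch, the pattern is not analyzed; it is compared against the indexed terms. Patterns that start with * or ? are slow because many more terms have to be checked.

```python
import re


def wildcard_to_regex(pattern):
    """Translate wildcard syntax (* and ?) into an anchored regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts) + r"\Z", re.DOTALL)


rx = wildcard_to_regex("pyth*n")
terms = ["python", "pythn", "pytorch", "java"]
print([t for t in terms if rx.match(t)])  # ['python', 'pythn']
```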
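    9. Compound query: the bool query. must clauses must match and contribute to the score. filter clauses must match but do not affect scoring. must_not clauses must not match. When must or filter clauses are present, should clauses are optional and only raise the score of documents they match. FIELD, TEXT and VALUE are placeholders:
    GET jobbole/article/_search
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "FIELD": "TEXT" } }
          ],
          "filter": [
            { "term": { "FIELD": "VALUE" } }
          ],
          "must_not": [
            { "match": { "FIELD": "TEXT" } }
          ],
          "should": [
            { "match": { "FIELD": "TEXT" } }
          ]
        }
      }
    }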
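    The way the four bool clauses combine can be sketched with plain predicates. This is a toy model. Each clause is a hypothetical Python function over a document dict, and the score simply counts matching must and should clauses rather than computing Elasticsearch's relevance score. It follows the documented default: when there are no must or filter clauses, at least one should clause has to match.

```python
def bool_query(docs, must=(), filter_=(), must_not=(), should=()):
    """Return (doc, score) pairs that satisfy the bool clauses, best first."""
    min_should = 1 if should and not must and not filter_ else 0
    results = []
    for doc in docs:
        if not all(c(doc) for c in must) or not all(c(doc) for c in filter_):
            continue
        if any(c(doc) for c in must_not):
            continue
        should_hits = sum(1 for c in should if c(doc))
        if should_hits < min_should:
            continue
        # Only must and should add to this toy score; filter and must_not never do.
        results.append((doc, len(must) + should_hits))
    return sorted(results, key=lambda r: r[1], reverse=True)


def has(word):
    """Hypothetical clause: the content field contains `word` as a token."""
    return lambda doc: word in doc["content"].split()


docs = [{"content": "python 视频"}, {"content": "java 视频"}, {"content": "python 教程"}]
res = bool_query(docs, must=[has("python")], must_not=[has("java")], should=[has("视频")])
print([(d["content"], s) for d, s in res])  # [('python 视频', 2), ('python 教程', 1)]
```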
    10. Highlighting. By default, highlighted terms are wrapped in <em> and </em>. You can change the tags with pre_tags and post_tags. "type": "plain" selects the plain highlighter:
    GET jobbole/article/_search
    {
      "query": {
        "match": { "content": "小姐姐" }
      },
      "highlight": {
        "pre_tags": ["<tag1>", "<tag2>"],
        "post_tags": ["</tag1>", "</tag2>"],
        "fields": {
          "content": { "type": "plain" }
        }
      }
    }
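    Conceptually, highlighting wraps each matched term in the configured tags. The sketch below is minimal. highlight is a hypothetical helper that does plain substring replacement and uses a single pre/post tag pair. Elasticsearch's highlighters work on analyzed terms and return fragments.

```python
import re


def highlight(text, terms, pre_tag="<em>", post_tag="</em>"):
    """Wrap every occurrence of any term in pre_tag/post_tag."""
    if not terms:
        return text
    # Try longer terms first so that a longer overlapping match wins.
    pattern = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.sub(pattern, lambda m: pre_tag + m.group(0) + post_tag, text)


print(highlight("python 小姐姐", ["小姐姐"], "<tag1>", "</tag1>"))
# python <tag1>小姐姐</tag1>
```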

    11. Suggestions (Suggesters). The term suggester proposes terms from the given field for the input in "text":
    POST jobbole/article/_search
    {
      "query": {
        "match": {
          "content": "。。。"
        }
      },
      "suggest": {
        "my-suggestion": {
          "text": "",
          "term": {
            "field": "content"
          }
        }
      }
    }
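    The term suggester proposes indexed terms that are within a small edit distance of each input term. The sketch below approximates that idea using plain Levenshtein distance over a hypothetical vocabulary. The max_edits=2 cap mirrors the term suggester's max_edits option, whose default is 2. Elasticsearch selects and ranks candidates differently.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def suggest(word, vocabulary, max_edits=2):
    """Return vocabulary terms within max_edits of word, closest first."""
    scored = [(levenshtein(word, t), t) for t in vocabulary if t != word]
    return [t for d, t in sorted(scored) if d <= max_edits]


print(suggest("pyhton", ["python", "pytorch", "java"]))  # ['python']
```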

    Article link: https://www.haomeiwen.com/subject/idtdbftx.html