A Spur-of-the-Moment Summary 2: XPath (2018-11-02)


Author: 研小生 | Published: 2018-11-02 09:07

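    This is mainly about using the loop unit (the repeated node that each record lives in). Instead of copying the HTML, I now copy the XPath.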

    1. When an element clearly contains a separator character, like this one:


      image.png

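      In the program, [0] is added to the whole expression (the part circled in red) because xpath() returns a list, while split() only works on a string.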

    name=info.xpath('div[2]/p[2]/span/text()')[0]   # why [0]? xpath() returns a list; [0] takes the first string
    name1=name.split('-')[0]
    name2 = name.split('-')[1]
    
    image.png
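
To make the reason for [0] concrete, here is a minimal sketch that runs without the network. The HTML fragment and the text "CompanyA-Shanghai" are made up for illustration; the real page's markup may differ.

```python
from lxml import etree

# Hypothetical fragment standing in for one company name on the page
html = etree.HTML('<div><p><span>CompanyA-Shanghai</span></p></div>')

result = html.xpath('//span/text()')
print(type(result))              # xpath() with text() gives back a list
print(hasattr(result, 'split'))  # a list has no split() method

name = result[0]                 # [0] takes the first string out of the list
name1 = name.split('-')[0]
name2 = name.split('-')[1]
print(name1, name2)
```

Without [0], the call to split() would be made on the list itself and fail with an AttributeError.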

    2. Adding [0] after the xpath() call: the difference is whether the value printed below is wrapped in [].


    image.png
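
The difference in the printed output can be reproduced with plain lists. The sketch below also covers a case the post does not discuss: if the XPath matches nothing, the list is empty and [0] raises an IndexError. The helper `first` and the sample values are hypothetical, not part of the original program.

```python
def first(values, default=''):
    """Return the first item of a list, or default if the list is empty."""
    return values[0] if values else default

matched = ['Full-time']   # stand-in for what an xpath(...) call returned
empty = []                # stand-in for an xpath(...) call that matched nothing

print(matched)            # the whole list, shown with brackets
print(matched[0])         # only the string, shown without brackets
print(first(empty, 'N/A'))  # falls back instead of raising IndexError
```

Whether a fallback value is appropriate depends on the data; for some fields it may be better to let the error surface.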

    The reason for the grayed-out part still needs to be worked out.


    image.png
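    1. To save the results, the code has to be wrapped in a function, just as with the earlier regex version.
      Before defining the function for saving: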
    import requests
    from lxml import etree
    
    headers = {
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
    }
    url='https://xiaoyuan.zhaopin.com/full/industry/0/0_0_0_0_-1_0_1_0'
    
    res = requests.get(url, headers=headers)
    html = etree.HTML(res.text)
    infos = html.xpath('//ul[@class="searchResultListUl"]/li')
    for info in infos:
        # rank_1=info.xpath('span[3]')[0]
        # rank=rank_1.xpath('string(.)').strip()
        name=info.xpath('div[2]/p[2]/span/text()')[0]   # why [0]? xpath() returns a list; [0] takes the first string
        name1=name.split('-')[0]
        name2 = name.split('-')[1]
        job=info.xpath('div[2]/p[1]/a/text()')[0]
        place=info.xpath('div[2]/p[3]/span[1]/span/em/text()')[0]
        job_type=info.xpath('div[2]/p[4]/span[4]/span/em/text()')   # no [0] here, so this stays a list and prints with []
        print(name1,name2,job,place,job_type)
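
The loop-unit pattern used above can be tried offline. This is only a sketch: the markup below is invented to mimic the shape the XPath expressions expect (a ul with class searchResultListUl whose li items hold the fields in the second div), and the job and company values are placeholders.

```python
from lxml import etree

# Hypothetical markup: two repeated <li> items, fields inside the second <div>
page = '''
<ul class="searchResultListUl">
  <li><div></div><div><p><a>Engineer</a></p><p><span>CompanyA-Shanghai</span></p></div></li>
  <li><div></div><div><p><a>Analyst</a></p><p><span>CompanyB-Beijing</span></p></div></li>
</ul>
'''

html = etree.HTML(page)
rows = []
# The absolute XPath selects every loop unit once
for info in html.xpath('//ul[@class="searchResultListUl"]/li'):
    # Paths without a leading slash are evaluated relative to the current <li>
    job = info.xpath('div[2]/p[1]/a/text()')[0]
    name = info.xpath('div[2]/p[2]/span/text()')[0]
    rows.append((job, name))

print(rows)
```

Because the inner paths are relative, each pass through the loop only sees the fields of its own li item.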
    
    

    After defining the function for saving:

    1. The regex version after defining the function:


      image.png

      The XPath version after defining the function:


      image.png
    2. The complete code after these improvements:
    import requests
    from lxml import etree
    import csv
    
    headers = {
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
    }
    url='https://xiaoyuan.zhaopin.com/full/industry/0/0_0_0_0_-1_0_1_0'
    
    def get_info(url, writer):
        # fetch one result page and parse it into an element tree
        res = requests.get(url, headers=headers)
        html = etree.HTML(res.text)
        # each <li> is one job posting (the loop unit)
        infos = html.xpath('//ul[@class="searchResultListUl"]/li')
        for info in infos:
            # rank_1=info.xpath('span[3]')[0]
            # rank=rank_1.xpath('string(.)').strip()
            # xpath() returns a list, so [0] takes the first matched string
            name=info.xpath('div[2]/p[2]/span/text()')[0]
            name1=name.split('-')[0]
            name2 = name.split('-')[1]
            job=info.xpath('div[2]/p[1]/a/text()')[0]
            place=info.xpath('div[2]/p[3]/span[1]/span/em/text()')[0]
            job_type=info.xpath('div[2]/p[4]/span[4]/span/em/text()')[0]
            print(name1,name2,job,place,job_type)
            # save the record; printing alone leaves the CSV with only its header
            writer.writerow([name1, name2, job, place, job_type])
    
    if __name__ == '__main__':
        # the with block closes the file once all pages have been written
        with open('C:/Users/秦振凯/Desktop/text2.csv', 'w', encoding='utf-8', newline='') as fp:
            writer = csv.writer(fp)
            writer.writerow(['name1', 'name2','job','place','job_type'])
            urls = ['https://xiaoyuan.zhaopin.com/full/industry/0/0_0_0_0_-1_0_{}_0'.format(str(i)) for i in range(0,5)]
            for url in urls:
                get_info(url, writer)
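
For rows to end up in the CSV file, writer.writerow() has to be called once per record, not only for the header. The sketch below shows that flow with an in-memory buffer in place of a file on disk. The function `parse_rows` and the sample record are hypothetical stand-ins for the XPath extraction, and str.partition is swapped in for split so that a name without '-' yields an empty second part instead of an IndexError.

```python
import csv
import io

def parse_rows(records):
    """Yield one CSV row per record; stands in for the XPath extraction."""
    for name, job, place, job_type in records:
        # partition() always returns three parts, even when '-' is missing
        name1, _, name2 = name.partition('-')
        yield [name1, name2, job, place, job_type]

# Hypothetical record in place of scraped data
records = [('CompanyA-Shanghai', 'Engineer', 'Shanghai', 'Full-time')]

buffer = io.StringIO(newline='')   # in-memory file, so nothing is written to disk
writer = csv.writer(buffer)
writer.writerow(['name1', 'name2', 'job', 'place', 'job_type'])
for row in parse_rows(records):
    writer.writerow(row)           # one call per scraped record

print(buffer.getvalue())
```

Replacing the buffer with a file opened via open(..., 'w', encoding='utf-8', newline='') gives the same rows on disk.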
    
