A First Try at Python: Scraping New-Home Prices and Addresses for a Local City

Author: 木䬕 | Published: 2020-05-28 10:24

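    This post uses the requests library to fetch the listing pages and XPath (via lxml) to parse out the prices. The rough steps are:

    1. Find the target URL and inspect the page source;
    2. Spoof the User-Agent header, send the request, and get the response;
    3. Parse the data out of the HTML tags;
    4. Loop over the parsed items, extract each field, and save the results.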
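Steps 3 and 4 can be tried offline before touching the live site. The sketch below runs XPath queries against a small hand-written HTML fragment. The fragment and every value in it are invented for illustration and only imitate the class names used in the script; `first_text` and `parse_listings` are hypothetical helper names, and the real page layout may have changed since this post was written.

```python
from lxml import etree

# Hypothetical fragment that imitates the listing markup; all values are made up.
SAMPLE_HTML = """
<div class="key-list imglazyload">
  <div>
    <div>
      <a class="lp-name"><span>Sample Garden</span></a>
      <a class="address"><span>[ Area A ] 1 Example Road</span></a>
    </div>
    <a class="favor-pos"><p><span>5000</span></p></a>
  </div>
  <div>
    <div>
      <a class="lp-name"><span>Demo Court</span></a>
      <a class="address"><span>[ Area B ] 2 Example Street</span></a>
    </div>
  </div>
</div>
"""


def first_text(node, path, default=""):
    """Return the first XPath text result, or `default` when nothing matches."""
    result = node.xpath(path)
    return result[0].strip() if result else default


def parse_listings(html):
    """Return one (name, price, address) tuple per listing div."""
    tree = etree.HTML(html)
    rows = []
    for div in tree.xpath('//div[@class="key-list imglazyload"]/div'):
        rows.append((
            first_text(div, './div/a[@class="lp-name"]/span/text()'),
            first_text(div, './a[@class="favor-pos"]/p/span/text()'),
            first_text(div, './div/a[@class="address"]/span/text()'),
        ))
    return rows


rows = parse_listings(SAMPLE_HTML)
for row in rows:
    print("\t".join(row))
```

The second listing has no price element, so its price comes back as an empty string instead of raising an IndexError, which is what indexing with `[0]` on an empty result would do. The full script against the live site follows the same pattern.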
    import os
    import re

    import requests
    from lxml import etree

    # Spoof a browser User-Agent so the request looks like it comes from Chrome
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36'
    }
    file_path = './python_learning/xiantao58.txt'
    # Create the output directory if it does not exist yet
    os.makedirs(os.path.dirname(file_path), exist_ok=True)

    # Write as UTF-8 so the Chinese text is saved correctly on any platform;
    # the with block closes the file even if a request or lookup fails
    with open(file_path, "w", encoding="utf-8") as f:
        for page in range(1, 3):  # pages 1 and 2
            url = 'https://xiantao.58.com/xinfang/loupan/all/p{0:d}/'.format(page)
            response = requests.get(url=url, headers=headers, timeout=10)
            response.encoding = "utf-8"
            tree = etree.HTML(response.text)
            # Each child div of the listing container is one housing development
            div_list = tree.xpath('//div[@class= "key-list imglazyload"]/div')
            for div in div_list:
                house_name = div.xpath('./div/a[@class="lp-name"]/span/text()')[0]
                house_price = div.xpath('./a[@class="favor-pos"]/p/span/text()')[0]
                house_address = div.xpath('./div/a[@class="address"]/span/text()')[0]
                # Strip brackets and parentheses, then drop the first
                # whitespace-separated token and join the rest
                cleaned = re.sub(r'[\[\]()]', '', house_address)
                address = "".join(cleaned.split()[1:])
                f.write(house_name + "\t" + house_price + "\t" + address + "\n")
                print(house_name + " downloaded successfully!")
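The address cleanup at the end of the loop is easy to check in isolation. This is a minimal sketch, assuming the raw string starts with a bracketed area label followed by the street address. `clean_address` is a hypothetical helper name and the sample string is made up; real strings from the site may be shaped differently.

```python
import re


def clean_address(raw):
    """Remove [, ], ( and ), drop the first whitespace-separated token, join the rest."""
    no_brackets = re.sub(r"[\[\]()]", "", raw)
    return "".join(no_brackets.split()[1:])


# Hypothetical input, not taken from the site
sample = "[ DistrictA SubareaB ] 88MainRoad"
print(clean_address(sample))  # prints SubareaB88MainRoad
```

The pieces are joined without a separator, which suits Chinese addresses because they contain no spaces between words. An input with only one token, or none, yields an empty string.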
    
