Python Practical Plan Study Notes: week1_3 Scraping Rental Listings


Author: luckywoo | Published 2016-06-29 09:15 | Read 43 times

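    Day 3 of learning web scraping: scraping rental listings from Xiaozhu (xiaozhu.com).
    The site has been redesigned and no longer shows the host's gender, so I dropped that field from the exercise.
    http://bj.xiaozhu.com/search-duanzufang-p1-0/
    The code: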

    #!/usr/bin/env python
    # coding: utf-8
    __author__ = 'lucky'
    from bs4 import BeautifulSoup
    import requests
    # Extract the details from a single listing page
    def get_info(url):
        wb_data = requests.get(url)
        soup = BeautifulSoup(wb_data.text, 'lxml')
        titles = soup.select('div.con_l > div.pho_info > h4 > em')
        addresses = soup.select('div.con_l > div.pho_info > p > span.pr5')
        rents = soup.select('#pricePart > div.day_l > span')
        imgs = soup.select('#curBigImage')
        host_imgs = soup.select('div.member_pic > a > img')
        host_names = soup.select('div.w_240 > h6 > a')
        for title, address, rent, img, host_img, host_name in zip(titles, addresses, rents, imgs, host_imgs, host_names):
            # The dict and the print both belong inside the loop body
            data = {
                "title": title.get_text(),
                "address": address.get_text().split('\n')[0],
                "rent": rent.get_text(),
                "img": img.get('src'),
                "host_img": host_img.get('src'),
                "host_name": host_name.get_text()
            }
            print(data)
    
    # Collect the listing links from one search-results page
    def get_links(one_url):
        wb_data = requests.get(one_url)
        soup = BeautifulSoup(wb_data.text, 'lxml')
        links = soup.select('#page_list > ul > li > a')
        for link in links:
            href = link.get("href")  # URL of each listing
            get_info(href)  # visit the link and extract the listing details
    
    # range(1, 10) yields search-results pages 1 through 9
    url_links = ["http://bj.xiaozhu.com/search-duanzufang-p{}-0/".format(number) for number in range(1, 10)]

    for url in url_links:
        get_links(url)
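
One thing worth noting about the loop in get_info: zip stops at the shortest input, so if any single selector matches nothing (for example after another page redesign), the loop body never runs and nothing is printed. A minimal sketch of that behavior, with plain lists standing in for the selector results; the sample values and the helper name combine are made up for illustration:

```python
# Plain lists stand in for the results of soup.select(); the values are made up.
titles = ["Cozy room near the subway"]
rents = ["328"]
host_names = []  # pretend this selector matched nothing


def combine(*columns):
    """Pair up the columns the same way get_info does, using zip."""
    return list(zip(*columns))


rows_all_matched = combine(titles, rents)
rows_one_missing = combine(titles, rents, host_names)

print(rows_all_matched)  # one row
print(rows_one_missing)  # empty, because zip stops at the shortest input
```

If the script prints nothing at all, checking the length of each selector result separately is a quick way to find which selector no longer matches.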
    
    [Image: week1_3.png]

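    Summary:

    1. A better understanding of making GET requests with requests.get.
    2. More practice locating page elements with CSS selectors.
    3. A refresher on wrapping logic in functions and calling them.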
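
The element lookups from point 2 can also be tried offline, without requesting the site. A minimal sketch, assuming an HTML fragment invented for illustration that only mimics the structure the selectors expect (it is not the real page markup), and using Python's built-in html.parser instead of lxml so no extra parser is needed:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment, invented for illustration; not copied from xiaozhu.com.
SAMPLE_HTML = """
<div class="con_l">
  <div class="pho_info">
    <h4><em>Sample listing title</em></h4>
    <p><span class="pr5">Sample address
    </span></p>
  </div>
</div>
<div id="pricePart"><div class="day_l"><span>328</span></div></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select() returns a list of matching tags; take the first match of each.
title = soup.select("div.con_l > div.pho_info > h4 > em")[0].get_text()
# The span text continues on a second line, so keep only the first line.
address = soup.select("div.con_l > div.pho_info > p > span.pr5")[0].get_text().split("\n")[0]
rent = soup.select("#pricePart > div.day_l > span")[0].get_text()

print(title, address, rent)
```

The `>` in each selector means "direct child", so a selector stops matching as soon as the site inserts an extra wrapper element between two levels.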
