Learning Python, Lesson 3 Practice Project: Scraping Rental Listings

Author: 碾香年年念 | Published: 2016-06-29 09:36

    Problem

    [Image: question.png]

    Code

    from bs4 import BeautifulSoup
    import requests
    
    def gender_get(classname):
        # Tag.get('class') returns a list of class names (or None if the attribute is absent).
        if classname == ['member_boy_ico']:
            return 'boy'
        else:
            return 'girl'
    
    def wb_analyse(url):
        # Fetch one listing's detail page and print the fields of interest.
        wb_data = requests.get(url)
        soup = BeautifulSoup(wb_data.text, 'lxml')
        titles = soup.select('div.pho_info > h4 > em')
        addresses = soup.select('div.pho_info > p > span.pr5')
        rents = soup.select('div.day_l > span')
        imgs = soup.select('#curBigImage')
        ownerimgs = soup.select('div.js_box.clearfix > div.member_pic > a > img')
        ownnames = soup.select('div.js_box.clearfix > div.w_240 > h6 > a')
        genders = soup.select('div.js_box.clearfix > div.w_240 > h6 > span')
    
        # zip() stops at the shortest list, so nothing is printed if any selector matches nothing.
        for title, address, rent, img, ownerimg, ownname, gender in zip(
                titles, addresses, rents, imgs, ownerimgs, ownnames, genders):
            data = {
                'title': title.get_text(),
                'address': address.get_text(),
                'rent': rent.get_text(),
                'img': img.get('src'),
                'ownerimg': ownerimg.get('src'),
                'ownname': ownname.get_text(),
                'gender': gender_get(gender.get('class')),
            }
            print(data)
    
    def url_get(wbpage):
        # Collect the detail-page links from one search-result page and analyse each of them.
        wbdata = requests.get(wbpage)
        soup = BeautifulSoup(wbdata.text, 'lxml')
        links = soup.select('#page_list > ul > li > a')
        for link in links:
            urlwb = link.get('href')
            wb_analyse(urlwb)
    
    # range(1, 2) yields only page 1; widen the range to crawl more result pages.
    urls = ["http://bj.xiaozhu.com/search-duanzufang-p{}-0/".format(number) for number in range(1, 2)]
    
    for single_url in urls:
        url_get(single_url)
    

    Summary

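    1. At first I tried to work out the pattern behind each listing's link; later I realized the links can simply be scraped from the search-result page itself.
    2. For determining gender, I initially did not know how to read the contents of class; it turns out it is read the same way as an image's 'src', with .get(). The only difference is that .get('class') returns a list of class names rather than a string.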
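
Both points can be tried offline on a small piece of markup. The sketch below is only an illustration: `SAMPLE_HTML` and the `example.com` URLs are made up, loosely modeled on the selectors used in the code above, and the real page structure may differ. It also uses the standard-library `html.parser` instead of `lxml`, purely so that it runs without an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, not copied from the real site.
SAMPLE_HTML = """
<div id="page_list">
  <ul>
    <li><a href="http://example.com/fangzi/1.html"><img src="http://example.com/1.jpg"></a></li>
    <li><a href="http://example.com/fangzi/2.html"><img src="http://example.com/2.jpg"></a></li>
  </ul>
</div>
<h6><a>owner</a><span class="member_boy_ico"></span></h6>
"""

# Assumption: html.parser stands in for the lxml parser used in the original code.
soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# Point 1: the detail-page links are read from the listing page itself.
links = [a.get('href') for a in soup.select('#page_list > ul > li > a')]

# Point 2: 'class' is read with .get() just like 'src',
# but it is a multi-valued attribute, so a list comes back.
img_src = soup.select('#page_list img')[0].get('src')
gender_class = soup.select('h6 > span')[0].get('class')

print(links)
print(img_src)
print(gender_class)
```

Because `.get('class')` returns a list, the comparison in `gender_get` is made against `['member_boy_ico']` rather than against a plain string.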


    Article link: https://www.haomeiwen.com/subject/sstxjttx.html