Python实战计划 Study Notes: week1_3 Scraping Rental Listings


Author: luckywoo | Published 2016-06-29 09:15

Day 3 of learning web scraping: scraping rental listings from xiaozhu.com.
Because the site has since been redesigned, the gender information is no longer displayed, so I dropped that field from the exercise.
http://bj.xiaozhu.com/search-duanzufang-p1-0/
The code is as follows:

#!/usr/bin/env python
# coding: utf-8
__author__ = 'lucky'
from bs4 import BeautifulSoup
import requests

# Extract the details from a single listing page
def get_info(url):
    wb_data = requests.get(url)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    titles = soup.select('div.con_l > div.pho_info > h4 > em')
    addresses = soup.select('div.con_l > div.pho_info > p > span.pr5')
    rents = soup.select('#pricePart > div.day_l > span')
    imgs = soup.select('#curBigImage')
    host_imgs = soup.select('div.member_pic > a > img')
    host_names = soup.select('div.w_240 > h6 > a')
    # zip() walks the six result lists in lockstep, one listing per iteration
    for title, address, rent, img, host_img, host_name in zip(titles, addresses, rents, imgs, host_imgs, host_names):
        data = {
            "title": title.get_text(),
            "address": address.get_text().split('\n')[0],
            "rent": rent.get_text(),
            "img": img.get('src'),
            "host_img": host_img.get('src'),
            "host_name": host_name.get_text()
        }
        print(data)

# Collect the listing links from one search-results page
def get_links(one_url):
    wb_data = requests.get(one_url)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    links = soup.select('#page_list > ul > li > a')
    for link in links:
        href = link.get("href")  # the URL of each listing
        get_info(href)           # open the link and extract its details

# Pages 1-9 of the short-term rental search results
url_links = ["http://bj.xiaozhu.com/search-duanzufang-p{}-0/".format(number) for number in range(1, 10)]

for url in url_links:
    get_links(url)
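As a side note, the same `select` / `get` pattern can be tried offline without hitting the site. The sketch below runs BeautifulSoup (using the stdlib `html.parser`, so `lxml` is not required) over a small hand-written HTML fragment that mimics the `#page_list` structure; the fragment and its URLs are invented for illustration.

```python
from bs4 import BeautifulSoup

# A made-up fragment shaped like the search-results list (illustration only)
html = '''
<div id="page_list">
  <ul>
    <li><a href="http://bj.xiaozhu.com/fangzi/1.html">Listing 1</a></li>
    <li><a href="http://bj.xiaozhu.com/fangzi/2.html">Listing 2</a></li>
  </ul>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
links = soup.select('#page_list > ul > li > a')  # same selector as get_links()
hrefs = [link.get('href') for link in links]
print(hrefs)
```

Because the parser and the selector are the same ones the scraper uses, this is a convenient way to debug a selector before pointing it at the live site.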
(Screenshot of the output: week1_3.png)

Summary:

1. Deepened my understanding of the GET request method in requests.
2. More practice locating page elements with CSS selectors.
3. Reviewed how to encapsulate logic in functions and call them.
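The zip-and-dict pattern used in `get_info` can also be seen in isolation with plain lists; the sample values below are invented:

```python
# Parallel lists, standing in for the results of the select() calls
titles = ["Cozy loft near Sanlitun", "Sunny studio"]
rents = ["398", "520"]

# zip() pairs the i-th element of each list, one listing per dict
listings = [{"title": t, "rent": r} for t, r in zip(titles, rents)]
print(listings)
```

Note that `zip()` stops at the shortest list, so if one `select()` call matches fewer elements than the others, the extra items are silently dropped.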


Original article: https://www.haomeiwen.com/subject/aiztjttx.html