Python Hands-on Plan Study Notes 0630

Author: 个十滴水 | Published 2016-06-30 01:47

    Day 3 of the hands-on plan: I scraped 300 records.

    The final result looks like this:

    [Screenshot: Paste_Image.png]

    My code:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # The shebang above names the interpreter to run; env finds its actual location through the PATH environment variable
    from bs4 import BeautifulSoup
    import time
    import requests
    
    # Listing pages 1 to 9 share one URL pattern; only the page number changes
    urls = ['http://bj.xiaozhu.com/search-duanzufang-p{}-0/'.format(i) for i in range(1, 10)]
    
    def house_source(url, data=None):
        """Scrape one listing detail page and print its fields."""
        wb_data = requests.get(url)
        time.sleep(1)  # pause between requests so we don't hammer the site
        soup = BeautifulSoup(wb_data.text, 'lxml')
    
        titles = soup.select("body > div.wrap.clearfix.con_bg > div.con_l > div.pho_info > h4 > em")
        addresses = soup.select('body > div.wrap.clearfix.con_bg > div.con_l > div.pho_info > p > span.pr5')
        prices = soup.select('#pricePart > div.day_l > span')
        imgs = soup.select('img[id="curBigImage"]')
        names = soup.select('#floatRightBox > div.js_box.clearfix > div.w_240 > h6 > a')
        picts = soup.select('#floatRightBox > div.js_box.clearfix > div.member_pic > a > img')
        # Match both gender icons: member_ico (male) and member_ico1 (female).
        # Selecting only member_ico1 leaves this list empty for male hosts, so zip() below yields nothing.
        males = soup.select('div.member_ico, div.member_ico1')
    
        for title, address, price, img, name, pict, male in zip(titles, addresses, prices, imgs, names, picts, males):
            data = {
                'title': title.get_text(),
                'address': address.get_text(),
                'price': price.get_text(),
                'img': img.get('src'),
                'name': name.get_text(),
                'pict': pict.get('src'),
                'male': get_lorder_male(male.get('class'))  # a helper function maps the class to a gender
            }
            print(data)
    
    def get_links(url):
        """Collect the detail-page links on one listing page and scrape each of them."""
        wb_data = requests.get(url)
        time.sleep(2)
        soup = BeautifulSoup(wb_data.text, 'lxml')
        links = soup.select("#page_list > ul > li > a")
        for link in links:
            href = link.get('href')
            house_source(href)
    
    def get_lorder_male(class_name):
        # Conditional check: the male icon has class member_ico, the female icon member_ico1
        if class_name == ['member_ico']:
            return '男'  # male
        else:
            return '女'  # female
    
    if __name__ == '__main__':
        for single_url in urls:
            get_links(single_url)
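    One detail worth knowing about the loop in house_source: zip() stops at its shortest input. If any single selector matches fewer elements than the others, the extra items are silently dropped. Each detail page describes one listing, so every list normally holds one element, and a selector that matches nothing (for example, the gender selector on some pages) makes zip() yield no record at all for that page. A minimal stdlib-only illustration follows. The sample values are made up and stand in for the lists that soup.select() returns.

```python
from itertools import zip_longest

# Hypothetical sample values standing in for the lists soup.select() returns
titles = ['Room A', 'Room B', 'Room C']
prices = ['288', '198', '350']
gender_classes = [['member_ico1']]  # only one element matched the gender selector

# zip() pairs items position by position and stops at the shortest input
records = [
    {'title': t, 'price': p, 'gender_class': g}
    for t, p, g in zip(titles, prices, gender_classes)
]
print(len(records))  # 1: the other two items were silently dropped

# zip_longest() runs to the longest input and pads the gaps instead
padded = list(zip_longest(titles, prices, gender_classes, fillvalue=None))
print(len(padded))   # 3
print(padded[2])     # ('Room C', '350', None)
```

    Padding with None at least makes a missing field visible in the printed dict instead of hiding the whole listing.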
    
    

    Summary

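    • format(str(i)) for i in range(1,10,1): spot the numbering pattern in the listing-page URLs and generate every page URL from it
    • When building the dict, work out how to extract each key's value, e.g. read an attribute of an element selected via CSS with male.get('class') or link.get('href')
    • Put logic into a function, e.g. def get_lorder_male(class_name):
      if class_name == ['member_ico']: return '男' (a conditional check that maps the icon's CSS class to the host's gender)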
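    The summary points can be tried out without touching the network. The sketch below rebuilds the page-URL list with str.format and reworks the gender check as a hypothetical get_host_gender. That function tests membership in the class list instead of comparing the whole list, and it returns None when neither icon class is present. BeautifulSoup returns the class attribute as a list of strings because it is a multi-valued attribute, and Tag.get returns None when the attribute is missing. Plain lists stand in for male.get('class') here. The name get_host_gender and the sample inputs are illustrative assumptions, not part of the original code.

```python
# Listing pages follow one numbering pattern: ...-p1-0/, ...-p2-0/, ..., ...-p9-0/
BASE = 'http://bj.xiaozhu.com/search-duanzufang-p{}-0/'
urls = [BASE.format(i) for i in range(1, 10)]  # range(1, 10) stops before 10


def get_host_gender(class_name):
    """Hypothetical variant of get_lorder_male.

    class_name is what Tag.get('class') returns: a list of class strings,
    or None if the element has no class attribute.
    """
    classes = class_name or []
    if 'member_ico' in classes:
        return '男'  # male
    if 'member_ico1' in classes:
        return '女'  # female
    return None      # unknown: neither icon class is present


print(len(urls))                          # 9
print(urls[0])                            # http://bj.xiaozhu.com/search-duanzufang-p1-0/
print(get_host_gender(['member_ico']))    # 男
print(get_host_gender(['member_ico1']))   # 女
print(get_host_gender(None))              # None
```

    Checking membership rather than equality means the result does not change if the icon element ever gains an extra class.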

          Article link: https://www.haomeiwen.com/subject/yibxjttx.html