Python Web Scraper 1

Author: 关中宇 | Published 2017-12-16 12:40


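I have recently been learning to write web scrapers, so I picked a site to practice on. This time the target is the 2345 weather forecast site (2345天气预报), and the approach is requests plus bs4: requests downloads the page and bs4 parses the HTML.
The code is as follows: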


import requests
import bs4

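def get_html(url):
    """Fetch the page and return its text, or "ERROR" if the request fails."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        # The site is encoded as GBK. The usual template line
        # r.encoding = r.apparent_encoding produced garbled text here.
        r.encoding = 'gbk'
        return r.text
    except requests.RequestException:
        return "ERROR"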

def print_result(url):
    '''
    Look up the current weather and print it in a formatted layout.
    '''
    html = get_html(url)
    if html == "ERROR":
        # get_html signals a failed request with this sentinel string
        print("Failed to fetch the page.")
        return
    soup = bs4.BeautifulSoup(html, 'lxml')
    # First find <div class="unit-1">, which wraps all the information
    weatherall1 = soup.find('div', attrs={'class':'unit-1'})
    # The main data sits inside <ul class="clearfix">
    weatherall2 = weatherall1.find('ul', attrs={'class':'clearfix'})
    # Each value lives in its own <li> tag
    weathers = weatherall2.find_all('li')
    # Collect the text of the <i> tag inside every <li>
    weather_lists = []
    for weather in weathers:
        weather_lists.append(weather.find('i').text.strip())
    print("Current temperature: {}\nHumidity: {}\nWind: {}\nPressure: {}\nSunrise: {}\nSunset: {}\nUV intensity: {}\n"
          .format(*weather_lists[:7]))

def main():
    url = "http://tianqi.2345.com/××××"
    print_result(url)

if __name__ == '__main__':
    main()
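
The encoding comment in get_html is worth a closer look. Here is a small offline sketch of why the codec matters, assuming a page whose bytes are GBK. The helper name decode_body and the sample string are made up for illustration, and the sketch does not reproduce what r.apparent_encoding actually guessed for the real site; it only shows that decoding GBK bytes with a different codec gives garbled text rather than an error.

```python
def decode_body(raw: bytes, encoding: str) -> str:
    """Decode raw response bytes, replacing undecodable bytes instead of raising."""
    return raw.decode(encoding, errors="replace")


# Hypothetical sample text standing in for part of a GBK-encoded page.
sample = "当前气温"
raw = sample.encode("gbk")  # the bytes a GBK server would send

# With the right codec the text round-trips cleanly.
right = decode_body(raw, "gbk")

# With a wrong guess the same bytes turn into mojibake.
wrong_latin1 = decode_body(raw, "latin-1")
wrong_utf8 = decode_body(raw, "utf-8")

print(right == sample)     # the GBK decode matches the original text
print(repr(wrong_latin1))  # unreadable characters, but no exception
print(repr(wrong_utf8))    # mostly replacement characters
```

Setting r.encoding = 'gbk' before reading r.text plays the same role as choosing "gbk" here: it tells requests which codec to use when turning the response bytes into a string.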

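The lookup chain in print_result (the div with class unit-1, then the ul with class clearfix, then each li and its i tag) can also be tried without touching the network. The sketch below runs on made-up markup that only mimics the structure the script expects; the real page may look different. The name extract_values and the SAMPLE_HTML string are hypothetical, and the sketch uses the built-in html.parser instead of lxml so it needs no extra install beyond bs4. Unlike the script, it returns an empty list when a tag is missing instead of crashing.

```python
import bs4

# Hypothetical markup imitating the layout the scraper relies on.
SAMPLE_HTML = """
<div class="unit-1">
  <ul class="clearfix">
    <li>temperature<i> 25 </i></li>
    <li>humidity<i> 60% </i></li>
  </ul>
</div>
"""


def extract_values(html):
    """Return the stripped text of the <i> tag in each <li>, or [] if the layout is missing."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    unit = soup.find("div", attrs={"class": "unit-1"})
    if unit is None:
        return []
    ul = unit.find("ul", attrs={"class": "clearfix"})
    if ul is None:
        return []
    values = []
    for li in ul.find_all("li"):
        tag = li.find("i")
        if tag is not None:  # skip list items that carry no <i> value
            values.append(tag.get_text(strip=True))
    return values


print(extract_values(SAMPLE_HTML))  # ['25', '60%']
print(extract_values("ERROR"))      # []
```

Checking each find result for None means a changed page layout or a failed download leads to an empty result rather than an AttributeError.
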
PS: This is the first of my web scraping articles. So, world of programming, here I come!