Scraping Weather Forecasts

Author: 波波在敲代码 | Published 2019-09-18 16:14

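Once I got comfortable with this framework, scraping static pages started to feel routine. The MOOC course 《Python网络爬虫程序技术》 (Shenzhen Institute of Information Technology) includes a case study on scraping the weather forecast for a specified city. I tried it with this framework and fetched the data without trouble. The code below works only for the site http://www.weather.com.cn, and it only scrapes the 7-day forecast for a specified city. (Forecasts of other lengths would require re-analyzing the page content.)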
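In the book 《Python爬虫开发从入门到实践》 I came across a new way to check whether a path already exists. It is more concise than the method I had been using, so I switched to it.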
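The book's idiom is not quoted in this post, so the following is only a sketch of the kind of simplification described. It assumes the concise form is `os.makedirs(..., exist_ok=True)`, which is what `fJudgeFile` in the script uses. The function names `fEnsureDirOld` and `fEnsureDirNew` and the temporary directory are hypothetical and exist only for illustration.

```python
import os
import tempfile

def fEnsureDirOld(vDir):
    # Older pattern: check whether the path exists, then create it.
    if not os.path.exists(vDir):
        os.makedirs(vDir)

def fEnsureDirNew(vDir):
    # Concise pattern: create the folder, and do nothing if it already exists.
    os.makedirs(vDir, exist_ok=True)

# Try both helpers inside a throwaway temporary directory.
with tempfile.TemporaryDirectory() as vTmp:
    vTarget = os.path.join(vTmp, "PythonData", "weather")
    fEnsureDirNew(vTarget)   # creates the nested folders
    fEnsureDirNew(vTarget)   # calling it again raises no error
    fEnsureDirOld(vTarget)   # the old pattern is a no-op at this point
    vCreated = os.path.isdir(vTarget)

print(vCreated)
```

Both helpers leave the folder in place. The second form simply removes the explicit existence check.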
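Going forward, I plan to build a small scraper for each site that hosts information I check every day, have them generate a single web page file, and serve that file from a server as my own customized daily headlines.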
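Taking the 7-day forecast for Xi'an as an example, the code is as follows (for another city, just swap in that city's 7-day forecast URL and the file name):
(Convention: variable names start with a lowercase v, and user-defined functions start with a lowercase f.)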

# import library
import requests
from bs4 import BeautifulSoup
import os

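# fetch the HTML text
def fGetHtmlText(vUrl):
    vHeaders = {"User-Agent": "Mozilla/5.0"}
    try:
        # a timeout keeps the script from hanging on an unresponsive server
        r = requests.get(vUrl, headers=vHeaders, timeout=10)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException as e:
        # report the failure; the caller receives None
        print(f"there is something wrong: {e}")
        return None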

# parse the HTML text
def fSoup(vHtml):
    vSoup = BeautifulSoup(vHtml, "lxml")
    vCitys = vSoup.find("title").text
    global vCity
    # take the part of the title before the first comma and strip "天气预报"
    vCity = vCitys.split(",")[0].replace("天气预报", "")
    vList = vSoup.select("ul[class='t clearfix'] li")
    for vLi in vList:
        vData = vLi.find("h1").text
        vWeather = vLi.find("p", class_="wea").text
        # The maximum temperature is a special case: for the current day the
        # temperature may already be settled, so the page sometimes omits it.
        vTem = vLi.find("p", class_="tem")
        if vTem.span:
            vTemMaxinum = vTem.span.text
        else:
            vTemMaxinum = ""
        vTemMinimum = vTem.i.text
        vWind = vLi.find("em").span["title"]
        vWindPower = vLi.find("p", class_="win").i.text
        fSaveData(vCity, vData, vWeather, vTemMaxinum, vTemMinimum, vWind, vWindPower)

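# save data: append one line per forecast day
def fSaveData(vCity, vData, vWeather, vTemMaxinum, vTemMinimum, vWind, vWindPower):
    # an explicit encoding avoids depending on the system default for Chinese text
    with open(vPath, "a", encoding="utf-8") as f:
        f.write(f'{vCity}, {vData}, {vWeather}, {vTemMaxinum}, {vTemMinimum}, {vWind}, {vWindPower}\n')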

# make sure the folder exists and remove any output file from an earlier run
def fJudgeFile():
    os.makedirs("F:\\PythonData\\weather\\", exist_ok=True)
    # vPath is global so that fSaveData can use it
    global vPath
    vPath = "F:\\PythonData\\weather\\西安weather.csv"
    if os.path.exists(vPath):
        os.remove(vPath)

# main function
def main(vUrl):
    fJudgeFile()
    vHtml = fGetHtmlText(vUrl)
    # fGetHtmlText returns None when the request fails, so skip parsing then
    if vHtml is not None:
        fSoup(vHtml)

# call the main function
print("**** scraping started ****")
vUrl = "http://www.weather.com.cn/weather/101110101.shtml"
main(vUrl)
print("**** scraping finished ****")
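
As noted before the listing, switching to another city only requires a different 7-day forecast URL and output file name. A minimal sketch of that substitution follows, assuming the URL pattern seen in `vUrl` holds for other cities on the site. The helper `fBuildTarget` and its `vFolder` default are hypothetical. Only the Xi'an code `101110101` comes from the script, and codes for other cities would need to be looked up on the site.

```python
def fBuildTarget(vCityCode, vCityName, vFolder="F:\\PythonData\\weather\\"):
    # Hypothetical helper: derive the forecast URL and the CSV path for one city.
    # Assumes other cities follow the same URL pattern as the Xi'an page.
    vUrl = f"http://www.weather.com.cn/weather/{vCityCode}.shtml"
    vPath = f"{vFolder}{vCityName}weather.csv"
    return vUrl, vPath

# The Xi'an code is the one used in the script above.
vUrl, vPath = fBuildTarget("101110101", "西安")
print(vUrl)
print(vPath)
```

To use something like this with the script, `fJudgeFile` would need to take the path as an argument instead of hard-coding it.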

Article link: https://www.haomeiwen.com/subject/cjskuctx.html