Introduction to Async and aiohttp

Author: 奕剑听雨 | Published 2018-08-13 16:05

    [async/await] asyncio: asynchronous I/O; async: asynchronous.
    async/await frees asynchronous code from the old yield-based coroutine style.
    async is generally placed before a function definition (or an async with / async for statement) to indicate that the code inside contains asynchronous operations.
    await is placed before a concrete operation to mark that operation as asynchronous.

    #!/usr/local/bin/python3.5

    import asyncio
    from aiohttp import ClientSession

    async def hello():
        # both the session and the request are opened asynchronously
        async with ClientSession() as session:
            async with session.get("http://httpbin.org/headers") as response:
                body = await response.read()   # read the body asynchronously
                print(body)

    loop = asyncio.get_event_loop()
    loop.run_until_complete(hello())

    Using async and await makes the function asynchronous. The hello() above actually performs two asynchronous operations: first it asynchronously obtains the response, then it asynchronously reads the response body.
    aiohttp recommends ClientSession as the main interface for issuing requests. ClientSession lets cookies and related state be shared across multiple requests.
    A session must be closed after use, and closing the session is itself another asynchronous operation, which is why async with is used each time.
    To actually run the coroutines they must be scheduled on an event loop, so create an asyncio loop instance and submit the tasks to it.
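    To make the "closing the session is an asynchronous operation" point concrete, here is a minimal sketch of the same request written without async with, assuming a recent aiohttp version where session.close() is awaitable:

    import asyncio
    from aiohttp import ClientSession

    async def hello_explicit_close():
        session = ClientSession()
        try:
            response = await session.get("http://httpbin.org/headers")
            body = await response.read()
            print(body)
        finally:
            # this is the awaitable cleanup that async with performs for you
            await session.close()

    asyncio.get_event_loop().run_until_complete(hello_explicit_close())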

    [aiohttp]
    Basic usage:
    async with aiohttp.get('https://github.com') as r:   # issue the request asynchronously
        await r.text()                                    # read the body asynchronously
    (The module-level aiohttp.get() helper comes from older aiohttp releases; newer versions use a ClientSession instead, as shown below.)

    Setting a timeout:
    with aiohttp.Timeout(0.001):
        async with aiohttp.get('https://github.com') as r:
            await r.text()
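    aiohttp.Timeout is likewise an older helper; a sketch of the session-based way to set timeouts, assuming aiohttp 3.x where ClientTimeout and the timeout= argument exist:

    import asyncio
    import aiohttp

    async def fetch_with_timeout():
        # total= bounds the whole request; the same object can also be passed per request
        timeout = aiohttp.ClientTimeout(total=10)
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.get('https://github.com') as r:
                return await r.text()

    asyncio.get_event_loop().run_until_complete(fetch_with_timeout())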

    Building a session:
    async with aiohttp.ClientSession() as session:
        async with session.get('https://api.github.com/events') as resp:
            print(resp.status)
            print(await resp.text())
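    Since a ClientSession keeps cookies and its connection pool across requests (as noted above), one session is usually reused for several calls; a minimal sketch, with the URLs chosen only for illustration:

    import asyncio
    import aiohttp

    async def fetch_many():
        urls = ['https://api.github.com/events', 'https://api.github.com/emojis']
        async with aiohttp.ClientSession() as session:
            # the same session (and connection pool) serves every request
            for url in urls:
                async with session.get(url) as resp:
                    print(url, resp.status)

    asyncio.get_event_loop().run_until_complete(fetch_many())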

    Setting headers:
    url = 'https://api.github.com/some/endpoint'
    headers = {'content-type': 'application/json'}
    await session.get(url, headers=headers)
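    Headers can also be set once for the whole session instead of per request; a small sketch, reusing the same placeholder URL from above:

    async def fetch_with_default_headers():
        # headers passed to ClientSession apply to every request made through it
        async with aiohttp.ClientSession(headers={'content-type': 'application/json'}) as session:
            async with session.get('https://api.github.com/some/endpoint') as resp:
                return await resp.text()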

    Using a proxy:
    EG_1:
    conn = aiohttp.ProxyConnector(proxy="http://some.proxy.com")   # create the proxy connector
    session = aiohttp.ClientSession(connector=conn)
    async with session.get('http://python.org') as resp:
        print(resp.status)
    EG_2:
    conn = aiohttp.ProxyConnector(
        proxy="http://some.proxy.com",
        proxy_auth=aiohttp.BasicAuth('user', 'pass')
    )
    session = aiohttp.ClientSession(connector=conn)
    async with session.get('http://python.org') as r:
        assert r.status == 200
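    ProxyConnector comes from older aiohttp versions; in current releases the proxy is passed per request instead. A sketch, assuming aiohttp 3.x:

    import asyncio
    import aiohttp

    async def fetch_via_proxy():
        async with aiohttp.ClientSession() as session:
            # proxy= and proxy_auth= are given on the request itself
            async with session.get('http://python.org',
                                   proxy="http://some.proxy.com",
                                   proxy_auth=aiohttp.BasicAuth('user', 'pass')) as resp:
                print(resp.status)

    asyncio.get_event_loop().run_until_complete(fetch_via_proxy())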

    Custom cookies:
    url = 'http://httpbin.org/cookies'
    async with ClientSession(cookies={'cookies_are': 'working'}) as session:
        async with session.get(url) as resp:
            assert await resp.json() == {"cookies": {"cookies_are": "working"}}
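    Cookies set by the server are also remembered by the session for later requests; a small sketch against httpbin's cookie endpoints, assuming ClientSession is imported as above:

    async def show_cookie_persistence():
        async with ClientSession() as session:
            async with session.get('http://httpbin.org/cookies/set?k=v'):
                pass   # the server sets a cookie that the session remembers
            # the same session sends the cookie back on the next request
            async with session.get('http://httpbin.org/cookies') as resp:
                print(await resp.json())   # expected: {"cookies": {"k": "v"}}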

    Crawler example:
    import urllib.request as request
    from bs4 import BeautifulSoup as bs
    import asyncio
    import aiohttp

    async def getPage(url, res_list):
        # fetch one page and append its HTML to the shared result list
        print(url)
        headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
        # conn = aiohttp.ProxyConnector(proxy="http://127.0.0.1:8087")
        async with aiohttp.ClientSession() as session:
            async with session.get(url, headers=headers) as resp:
                assert resp.status == 200
                res_list.append(await resp.text())

    class parseListPage():
        def __init__(self, page_str):
            self.page_str = page_str
        def __enter__(self):
            page_str = self.page_str
            page = bs(page_str, 'lxml')
            # collect the article links from the list page
            articles = page.find_all('div', attrs={'class': 'article_title'})
            art_urls = []
            for a in articles:
                x = a.find('a')['href']
                art_urls.append('http://blog.csdn.net' + x)
            return art_urls
        def __exit__(self, exc_type, exc_val, exc_tb):
            pass

    page_num = 5
    page_url_base = 'http://blog.csdn.net/u014595019/article/list/'
    page_urls = [page_url_base + str(i + 1) for i in range(page_num)]
    loop = asyncio.get_event_loop()
    ret_list = []
    tasks = [getPage(host, ret_list) for host in page_urls]
    loop.run_until_complete(asyncio.wait(tasks))

    articles_url = []
    for ret in ret_list:
        with parseListPage(ret) as tmp:
            articles_url += tmp
    ret_list = []

    tasks = [getPage(url, ret_list) for url in articles_url]
    loop.run_until_complete(asyncio.wait(tasks))
    loop.close()
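    As a variation, the shared ret_list can be avoided by letting each coroutine return its page and collecting the results with asyncio.gather; a minimal sketch of the first stage written that way (fetch() here is a hypothetical helper, not part of the original script):

    import asyncio
    import aiohttp

    async def fetch(url):
        # return the page body instead of appending to a shared list
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as resp:
                return await resp.text()

    async def main():
        urls = ['http://blog.csdn.net/u014595019/article/list/' + str(i + 1) for i in range(5)]
        # gather preserves order and returns the results directly
        pages = await asyncio.gather(*(fetch(u) for u in urls))
        print(len(pages))

    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    loop.close()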

