Writing Code as Gracefully as Writing a Love Letter

Author: SevenBy | Published 2018-09-08 19:05

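    It has been a while since I graduated and came to Shanghai to make my way. Daily life is fairly normal, but the rent that comes standard with this city, plus everyday living costs, reminds everyone who arrives here chasing a dream that they have to work even harder. During a weekend of overtime I wrote a simple script and added some comments, mainly so that readers who use it as a reference can pick up a few coding conventions. We should write code as gracefully as we write love letters. Have a nice weekend!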

    #!/usr/bin/python3
    # -*- coding: utf-8 -*-
    # The shebang must be the first line of the file to take effect.
    import json
    import re
    from multiprocessing import Pool
    #from pathos.multiprocessing import ProcessingPool as Pool
    
    import pymongo
    import requests
    from requests.exceptions import RequestException
    
    def getOnePage(url):
        '''
        Fetch one page. Notes on using the requests module:
            1. set the request headers
            2. verify that response.status_code == 200
            3. good practice: catch RequestException
        '''
        headers = {
            'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36'
        }
        try:
            response = requests.get(url, headers=headers)
            if response.status_code == 200:
                return response.text
            return None
        except RequestException:
            # covers connection errors, timeouts and other request failures
            return None
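    def parseOnePage(html):
        '''
        Match every item with a regular expression and yield it as a dict (generator).
        '''
        # Raw strings keep \d from being read as a string escape; each item ends at </dd>.
        pattern = re.compile(r'<dd>.*?board-index.*?>(\d+)</i>.*?data-src="(.*?)".*?name"><a'
                             r'.*?>(.*?)</a>.*?star">(.*?)</p>.*?releasetime">(.*?)</p>'
                             r'.*?integer">(.*?)</i>.*?fraction">(.*?)</i>.*?</dd>', re.S)
        items = re.findall(pattern, html)
        for item in items:
            print(item)
            yield {
                'index': item[0],
                'image': item[1],
                'title': item[2],
                'actor': item[3].strip()[3:],   # drop the leading 3-character label
                'time': item[4].strip()[5:],    # drop the leading 5-character label
                'score': item[5] + item[6]      # integer part + fractional part
            }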
    def writeToFile(content):
        '''
        Append one record to a local file as a line of JSON.
        '''
        with open('result.txt', 'a', encoding='utf-8') as wf:
            # the with block flushes and closes the file on exit
            wf.write(json.dumps(content, ensure_ascii=False) + '\n')
    
    def main(offset):
        # configure your database
        myclient = pymongo.MongoClient('mongodb://192.168.88.114:27017/')
        mydb = myclient['home']
        mycol = mydb['maoyan']
    
        # build the URL and fetch the HTML
        url = 'https://maoyan.com/board/4?offset=' + str(offset)
        html = getOnePage(url)
        if html is None:
            # the request failed or returned a non-200 status, so there is nothing to parse
            return
    
        # insert the data into the database
        for item in parseOnePage(html):
            try:
                #writeToFile(item)
                mycol.insert_one(item)
            except pymongo.errors.PyMongoError:
                print("check your database configuration")
    
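    if __name__ == '__main__':
        # create a process pool (multiprocessing.Pool starts processes, not threads);
        # each worker handles one page offset: 0, 10, ..., 90
        with Pool() as pool:
            pool.map(main, [i*10 for i in range(10)])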
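
The parsing step can be tried without touching the network. The sketch below is only an illustration: `SAMPLE_HTML` is a made-up fragment shaped the way the pattern expects (the real page may differ), `parse_sample` is a hypothetical name, and the Chinese labels in front of the actor list and release date are assumptions chosen to match the `[3:]` and `[5:]` slices in `parseOnePage`.

```python
import re

# Same capture groups as parseOnePage: index, image, title, actor, time, score parts.
PATTERN = re.compile(
    r'<dd>.*?board-index.*?>(\d+)</i>.*?data-src="(.*?)".*?name"><a'
    r'.*?>(.*?)</a>.*?star">(.*?)</p>.*?releasetime">(.*?)</p>'
    r'.*?integer">(.*?)</i>.*?fraction">(.*?)</i>.*?</dd>', re.S)

# Hypothetical markup, invented for this example only.
SAMPLE_HTML = '''
<dd>
  <i class="board-index board-index-1">1</i>
  <img data-src="https://example.com/poster1.jpg" class="board-img" />
  <p class="name"><a href="/films/1" title="Film A">Film A</a></p>
  <p class="star">
      主演:Actor One,Actor Two
  </p>
  <p class="releasetime">上映时间:1993-01-01</p>
  <p class="score"><i class="integer">9.</i><i class="fraction">5</i></p>
</dd>
<dd>
  <i class="board-index board-index-2">2</i>
  <img data-src="https://example.com/poster2.jpg" class="board-img" />
  <p class="name"><a href="/films/2" title="Film B">Film B</a></p>
  <p class="star">
      主演:Actor Three
  </p>
  <p class="releasetime">上映时间:1994-09-10</p>
  <p class="score"><i class="integer">9.</i><i class="fraction">4</i></p>
</dd>
'''


def parse_sample(html):
    """Return one dict per <dd>...</dd> item found in html."""
    results = []
    for item in PATTERN.findall(html):
        results.append({
            'index': item[0],
            'image': item[1],
            'title': item[2],
            'actor': item[3].strip()[3:],  # skip the 3-character label
            'time': item[4].strip()[5:],   # skip the 5-character label
            'score': item[5] + item[6],
        })
    return results


movies = parse_sample(SAMPLE_HTML)
for movie in movies:
    print(movie['index'], movie['title'], movie['score'])
```

Ending the pattern with `</dd>` matters here: if it ended with the next opening `<dd>` instead, each match would swallow the start of the following item and every second entry would be skipped.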
    
    

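    Below are a few points to watch out for when using the Requests library:
    1. When a site's certificate fails verification, you can, depending on the situation, work around it as follows. Note that verify=False switches certificate verification off, so use it only for hosts you trust.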

    import requests
    from requests.packages import urllib3
    urllib3.disable_warnings()
    response =requests.get('https://www.12306.cn',verify=False)
    print(response.status_code)
    

    2. Loading a local (client-side) certificate file

    import requests
    response = requests.get('https://www.12306.cn',cert=('/path/server.crt','/path/key'))
    print(response.status_code)
    

    3. Setting a proxy

    import requests
    proxies={'http':"http://127.0.0.1:9743",'https':"https://127.0.0.1:9743"}
    response = requests.get('https://taobao.com',proxies=proxies)
    print(response.status_code)
    

    4. Using a proxy that requires a password

    import requests
    proxies={'http':"http://user:password@127.0.0.1:9743",'https':"https://user:pw@127.0.0.1:9743"}
    response = requests.get('https://taobao.com',proxies=proxies)
    print(response.status_code)
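
Credentials embedded in a proxy URL have to be valid URL text, so a password containing characters such as `@`, `:` or `/` should be percent-encoded first. Here is a minimal sketch using only the standard library; `build_proxies` is a hypothetical helper, the host and port are copied from the example above, and it assumes the local proxy speaks plain HTTP for both `http` and `https` traffic.

```python
from urllib.parse import quote


def build_proxies(user, password, host="127.0.0.1", port=9743):
    """Return a proxies dict with percent-encoded credentials (hypothetical helper)."""
    # safe="" makes quote() encode every reserved character, including '/' and ':'.
    credentials = "%s:%s" % (quote(user, safe=""), quote(password, safe=""))
    proxy_url = "http://%s@%s:%d" % (credentials, host, port)
    # Assumption: the same plain-HTTP proxy endpoint handles both schemes.
    return {"http": proxy_url, "https": proxy_url}


proxies = build_proxies("user", "p@ss:word/1")
print(proxies["http"])
```

The resulting dict can be passed as `proxies=proxies` to `requests.get` exactly as in the examples above.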
    

    5. Using a SOCKS5 proxy with requests. Install the optional SOCKS dependency first:
    pip3 install "requests[socks]"

    import requests
    proxies={'http':"socks5://127.0.0.1:9742",'https':"socks5://127.0.0.1:9742"}
    response = requests.get('https://taobao.com',proxies=proxies)
    print(response.status_code)
    

    6. Setting a timeout and catching the exception

    import requests
    from requests.exceptions import Timeout
    try:
        response = requests.get('https://taobao.com',timeout=3)
        print(response.status_code)
    except Timeout:  # base class of both ConnectTimeout and ReadTimeout
        print("timeout")
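
A timeout is often worth a retry rather than an immediate failure. The following is a rough sketch, not part of the original script: `get_with_retries` and `flaky_get` are hypothetical names, and the retry count, backoff and `(connect, read)` timeout values are arbitrary assumptions. The stand-in `flaky_get` lets the example run without any network access.

```python
import time

import requests
from requests.exceptions import ReadTimeout, Timeout


def get_with_retries(url, attempts=3, timeout=(3.05, 10), backoff=0.5, get=requests.get):
    """Call get(url, timeout=...) and retry when a timeout occurs (hypothetical helper).

    timeout is a (connect, read) tuple in seconds, as accepted by requests.
    """
    for attempt in range(1, attempts + 1):
        try:
            return get(url, timeout=timeout)
        except Timeout:
            # Timeout covers both ConnectTimeout and ReadTimeout.
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            time.sleep(backoff * attempt)  # wait a little longer each time


def flaky_get(url, timeout):
    """Stand-in for requests.get: times out twice, then succeeds."""
    flaky_get.calls += 1
    if flaky_get.calls < 3:
        raise ReadTimeout("simulated slow server")
    return "response for " + url


flaky_get.calls = 0

result = get_with_retries("https://taobao.com", backoff=0, get=flaky_get)
print(result, flaky_get.calls)
```

In real use the `get` argument is simply left at its default, so the helper calls `requests.get` and returns the usual response object.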
    

    7. Authentication

    import requests
    from requests.auth import HTTPBasicAuth
    
    response = requests.get('https://taobao.com',auth=HTTPBasicAuth('user','123'))
    print(response.status_code)
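
For context, HTTP Basic authentication just sends the user name and password, joined by a colon and base64-encoded, in an `Authorization` header. The sketch below rebuilds that header with the standard library for ASCII credentials; `basic_auth_header` is a hypothetical helper written for illustration, not a function from requests.

```python
import base64


def basic_auth_header(user, password):
    """Build the value of an HTTP Basic Authorization header (hypothetical helper)."""
    raw = ("%s:%s" % (user, password)).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


header = basic_auth_header("user", "123")
print(header)  # Basic dXNlcjoxMjM=
```

Because base64 is an encoding and not encryption, Basic authentication should only be used over HTTPS. requests also accepts the shorthand `auth=('user', '123')`, which it treats as Basic authentication.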
    


          Article link: https://www.haomeiwen.com/subject/ikjlgftx.html