The HTML from requests.get(url) does not match the elements of the page as loaded in the browser

Author: realnickman | Source: published 2019-03-17 00:11, read 0 times

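The goal of this small task is to type keywords on the command line and have the top few Google search results open automatically (a simplified version of "I'm Feeling Lucky"). Along the way I found that the page returned by requests.get() differs from the HTML elements shown in the browser inspector. For example, the target link I need to scrape, i.e. the element with class="iUh30" (see figure), simply could not be found with bs4, no matter what I tried.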

Inspecting the elements with Chrome Developer Tools
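One way to confirm the mismatch before blaming the parser is to check whether the class is present at all in the HTML that was actually downloaded. The following is a minimal sketch that uses only the standard library's html.parser rather than bs4; `ClassFinder` and `find_text_by_class` are hypothetical helper names, and the two HTML strings are made-up stand-ins for a browser-style and a script-style response, not real Google output.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Collects the text of every <tag> element that carries css_class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.matches = []     # text of each matching element
        self._inside = False  # True while inside a matching element

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self._inside = True
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag == self.tag:
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.matches[-1] += data


def find_text_by_class(html, tag, css_class):
    """Return the text of all matching elements found in an HTML string."""
    finder = ClassFinder(tag, css_class)
    finder.feed(html)
    return finder.matches


# Made-up sample markup (assumption): same content, with and without the class.
browser_like = '<div><cite class="iUh30">https://www.python.org/</cite></div>'
script_like = '<div><cite>https://www.python.org/</cite></div>'

print(find_text_by_class(browser_like, "cite", "iUh30"))
print(find_text_by_class(script_like, "cite", "iUh30"))
```

If the second kind of result (an empty list) comes back for the downloaded HTML, the element is missing from the response itself, so the selector is not the problem.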

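In this case you need to set the User-Agent (UA) in the request headers; here I also switched from requests to urllib.request. In Python 3, urllib2 has been merged into urllib, so you can import urllib.request directly. For the "SSL: CERTIFICATE_VERIFY_FAILED" error, import ssl and pass gcontext = ssl.SSLContext() to urlopen, which works around the error by skipping certificate verification. The code is as follows: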

#! python3
# Opens several Google search results at once

import urllib.request
import requests, sys, webbrowser
from bs4 import BeautifulSoup
import ssl

print("googling...")

if len(sys.argv) > 1:
    kw = "+".join(sys.argv[1:])
    user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36"
    link = "https://www.google.com/search?q=" + kw
    print("link:", link)
    headers = {'User-Agent': user_agent}
    try:
        # res = requests.get(link)  # the HTML returned differs from what the browser shows!
        request = urllib.request.Request(url=link, headers=headers)
        gcontext = ssl.SSLContext()  # bypass the "SSL: CERTIFICATE_VERIFY_FAILED" error (skips certificate verification)
        html = urllib.request.urlopen(request,context=gcontext).read()

        google_soup = BeautifulSoup(html,"html.parser")
        g_blocks = google_soup.select("cite.iUh30")

        for block in g_blocks:
            target_link = block.get_text()
            # webbrowser.open(target_link)  # open the link directly in the browser
            print(target_link)

    except Exception as err:
        print("something wrong:", err)
else:
    print("Please type search keywords as arguments: python3 xx.py keyword")
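The fetch step of the script can also be factored into small functions, which makes the two workarounds easier to see in isolation. This is only a sketch: `build_request`, `make_context` and `fetch` are hypothetical names that are not in the original script, and it uses ssl.create_default_context() so that certificate verification stays on unless it is explicitly turned off (as far as I know, a bare ssl.SSLContext() does not verify certificates). Nothing below touches the network until `fetch` is called.

```python
import ssl
import urllib.request

# Same browser User-Agent string as in the script above.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that identifies itself with a browser User-Agent."""
    return urllib.request.Request(url=url, headers={"User-Agent": user_agent})


def make_context(verify=True):
    """Return an SSL context; verify=False mirrors the script's bypass."""
    context = ssl.create_default_context()
    if not verify:
        # check_hostname must be switched off before CERT_NONE is accepted.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


def fetch(url, verify=True):
    """Download url with the browser User-Agent and return the raw bytes."""
    request = build_request(url)
    with urllib.request.urlopen(request, context=make_context(verify)) as resp:
        return resp.read()


request = build_request("https://www.google.com/search?q=python")
# urllib stores header names capitalized, hence "User-agent" here.
print(request.get_header("User-agent"))
print(make_context().verify_mode == ssl.CERT_REQUIRED)
```

Calling `fetch(link, verify=False)` would behave roughly like the script's urlopen call; trying `verify=True` first shows whether the certificate error still occurs on a given machine.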

Output:

c18pxxx:Py4e Nick$ python3 webscraping-feelinglucky.py python
googling...
link: http://www.google.com/search?q=python
https://www.python.org/
https://en.wikipedia.org/wiki/Python_(programming_language)
https://sv.wikipedia.org/wiki/Python_(programspråk)
https://www.w3schools.com/python/
https://www.codecademy.com/learn/learn-python
https://www.tutorialspoint.com/python/
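The search link in the run above is built by joining the keywords with "+", which is fine for a plain word like python but can break for keywords that contain characters such as "+" or "&". A possible variant, sketched here with the standard library's urllib.parse.urlencode, escapes the query instead; `build_search_url` and `SEARCH_URL` are hypothetical names, not part of the original script.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://www.google.com/search"


def build_search_url(keywords):
    """Build a search URL from a list of keywords, escaping special characters."""
    query = urlencode({"q": " ".join(keywords)})
    return SEARCH_URL + "?" + query


print(build_search_url(["python"]))
# → https://www.google.com/search?q=python
print(build_search_url(["c++", "tutorial"]))
# → https://www.google.com/search?q=c%2B%2B+tutorial
```

In the script this would replace the two lines that compute kw and link, e.g. `link = build_search_url(sys.argv[1:])`.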