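The usual way to capture traffic is to press F12 in the browser to open its built-in network tools, analyze the web requests, extract the request headers and parameters, and then write the crawler by hand in a language such as Python or Java. This works, but it is slow. Humans sit at the top of the food chain because we use tools, so below I introduce a faster, more efficient way to go from a captured request to crawler code.
First, capture the traffic with Fiddler. I won't cover how to set up capturing in Fiddler here; you can search for that. As shown in the figure below, we log in to the Jianshu home page and capture its request. Following the steps in the figure, you can see the Raw request sent to Jianshu.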

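Then, in the session list on the left, click the link you want to inspect. Open the Composer tab on the right and drag the link from the left into it. You can clear the session list on the left first, then click Execute; a request is now sent to Jianshu. Clicking several times sends several requests, similar to a replay attack.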
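The replay that Composer performs can be approximated in Python. The following is only a minimal sketch using the standard library; the helper names `build_request` and `replay` and the shortened header set are assumptions for illustration, not something Fiddler produces.

```python
import urllib.request


def build_request(url, headers):
    """Build (but do not send) a GET request carrying the captured headers."""
    return urllib.request.Request(url, headers=headers, method="GET")


def replay(request, times, send=urllib.request.urlopen):
    """Send the same request `times` times and collect the status codes.

    `send` is injectable so the loop can be exercised without network access.
    """
    statuses = []
    for _ in range(times):
        with send(request) as response:
            statuses.append(response.status)
    return statuses


# Hypothetical, shortened header set; a real replay would use the captured headers.
req = build_request(
    "https://www.jianshu.com/",
    {"User-Agent": "Mozilla/5.0", "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8"},
)
# Uncomment to actually send the request three times:
# print(replay(req, times=3))
```

Keeping the sending function as a parameter makes it easy to try the loop against a stub before pointing it at a real site.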

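OK. After sending the request a few times in the step above, copy the Raw request. An HTTP request has two parts, the request headers and the request body, separated by a blank line; the body may be empty. In the example below, the first line is the request line and the body is empty, but the blank line between the headers and the body cannot be omitted: the last line below is a blank line and must be kept.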
GET https://www.jianshu.com/ HTTP/1.1
Host: www.jianshu.com
Connection: keep-alive
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8
Cookie: __yadk_uid=AFXc3m0QhoWYKzHTlaOb5lpJUvepcIMq; read_mode=day; default_font=font2; locale=zh-CN; Hm_lvt_0c0e9d9b1e7d617b3e6842e85b9fb068=1564068289,1564155331,1564155349,1564155353; remember_user_token=W1s4NzQ2OTA3XSwiJDJhJDExJHhrakNvbjdKLmhRanZPVmt0c0Y0WXUiLCIxNTY0ODQyMDYwLjgxMDM4MjQiXQ%3D%3D--deff2e1e9ec55f5f3c64bd00bd11a1d987de45ee; _m7e_session_core=094e6f9ef939baa86ed84097f2315b3e; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%2216b27cf05cfab1-049233ccf59a98-37c153e-1327104-16b27cf05d0939%22%2C%22%24device_id%22%3A%2216b27cf05cfab1-049233ccf59a98-37c153e-1327104-16b27cf05d0939%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E7%9B%B4%E6%8E%A5%E6%B5%81%E9%87%8F%22%2C%22%24latest_referrer%22%3A%22%22%2C%22%24latest_referrer_host%22%3A%22%22%2C%22%24latest_search_keyword%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC_%E7%9B%B4%E6%8E%A5%E6%89%93%E5%BC%80%22%2C%22%24latest_utm_source%22%3A%22desktop%22%2C%22%24latest_utm_medium%22%3A%22notes-included-collection%22%2C%22%24latest_utm_campaign%22%3A%22maleskine%22%2C%22%24latest_utm_content%22%3A%22note%22%7D%2C%22first_id%22%3A%22%22%7D; Hm_lpvt_0c0e9d9b1e7d617b3e6842e85b9fb068=1564842175
If-None-Match: W/"59ba10ba20969a6bd6d15c129cdd78f0"
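To make the structure described above concrete, here is a minimal sketch that splits a raw request into its request line, headers, and body at the first blank line. The function name `parse_raw_request` is an assumption for illustration, and the sample uses only a few of the headers shown above.

```python
def parse_raw_request(raw):
    """Split a raw HTTP request into (method, url, version, headers, body)."""
    # Normalise line endings, then split the head from the body at the first blank line.
    text = raw.replace("\r\n", "\n")
    head, _, body = text.partition("\n\n")
    lines = head.split("\n")
    # The first line is the request line: METHOD URL VERSION.
    method, url, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            continue
        # Split on the first colon only, since header values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, url, version, headers, body


raw = (
    "GET https://www.jianshu.com/ HTTP/1.1\n"
    "Host: www.jianshu.com\n"
    "Connection: keep-alive\n"
    "Accept-Encoding: gzip, deflate, br\n"
    "\n"
)
method, url, version, headers, body = parse_raw_request(raw)
print(method, url, version)
print(headers)
print(repr(body))
```

The `headers` dictionary has the same shape as the one in the Python code that Postman generates later in this post.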
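The first line of the raw request shows a GET request to https://www.jianshu.com/ . Set that method and URL in Postman, then copy the request headers (everything except the request line) into the Headers tab, as follows: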

After clicking Bulk Edit, paste the request headers (without the request line) here.

After clicking Key-Value Edit as shown above, you can see that the request headers are all set.

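Now click the blue Send button to send the request.
If the address uses HTTPS and clicking Send fails with the error "Could not get any response", one more setting is needed: go to File => Settings => General and turn off SSL certificate verification, as shown in the figure below.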
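The same switch exists on the Python side. With the `requests` library it is the `verify=False` argument; the sketch below shows the standard-library equivalent. The helper name `make_unverified_opener` is an assumption for illustration, and skipping certificate checks is only reasonable while debugging, for example when a capturing proxy is presenting its own certificate.

```python
import ssl
import urllib.request


def make_unverified_opener():
    """Return an opener that skips TLS certificate checks (debugging only)."""
    context = ssl.create_default_context()
    # check_hostname must be turned off before verify_mode can be set to CERT_NONE.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    handler = urllib.request.HTTPSHandler(context=context)
    return urllib.request.build_opener(handler), context


opener, context = make_unverified_opener()
# opener.open("https://www.jianshu.com/") would now accept any certificate.
print(context.verify_mode)
```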

Once the request succeeds, you can see the response data.

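At this point we know that the target address can be reached, at least with the parameters above. Next we convert the request into Python or Java code: click Code, then choose the language you want. The generated code can be run directly.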

The Python code is as follows:
import requests
url = "https://www.jianshu.com/"
headers = {
'Host': "www.jianshu.com",
'Connection': "keep-alive",
'Cache-Control': "max-age=0",
'Upgrade-Insecure-Requests': "1",
'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36",
'Accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
'Accept-Encoding': "gzip, deflate, br",
'Accept-Language': "zh-CN,zh;q=0.9,en;q=0.8",
# The Cookie header from the captured request, kept as one string on one line.
'Cookie': "__yadk_uid=AFXc3m0QhoWYKzHTlaOb5lpJUvepcIMq; read_mode=day; default_font=font2; locale=zh-CN; Hm_lvt_0c0e9d9b1e7d617b3e6842e85b9fb068=1564068289,1564155331,1564155349,1564155353; remember_user_token=W1s4NzQ2OTA3XSwiJDJhJDExJHhrakNvbjdKLmhRanZPVmt0c0Y0WXUiLCIxNTY0ODQyMDYwLjgxMDM4MjQiXQ%3D%3D--deff2e1e9ec55f5f3c64bd00bd11a1d987de45ee; _m7e_session_core=094e6f9ef939baa86ed84097f2315b3e; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%2216b27cf05cfab1-049233ccf59a98-37c153e-1327104-16b27cf05d0939%22%2C%22%24device_id%22%3A%2216b27cf05cfab1-049233ccf59a98-37c153e-1327104-16b27cf05d0939%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E7%9B%B4%E6%8E%A5%E6%B5%81%E9%87%8F%22%2C%22%24latest_referrer%22%3A%22%22%2C%22%24latest_referrer_host%22%3A%22%22%2C%22%24latest_search_keyword%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC_%E7%9B%B4%E6%8E%A5%E6%89%93%E5%BC%80%22%2C%22%24latest_utm_source%22%3A%22desktop%22%2C%22%24latest_utm_medium%22%3A%22notes-included-collection%22%2C%22%24latest_utm_campaign%22%3A%22maleskine%22%2C%22%24latest_utm_content%22%3A%22note%22%7D%2C%22first_id%22%3A%22%22%7D; Hm_lpvt_0c0e9d9b1e7d617b3e6842e85b9fb068=1564842175",
# Single quotes outside, so the double quotes inside the ETag need no escaping.
'If-None-Match': 'W/"59ba10ba20969a6bd6d15c129cdd78f0"',
'Postman-Token': "f2d54dc2-67f6-451d-88a6-c5495bb8d2de,bd6acef5-9ef9-405f-888e-6d782198d06d",
'cache-control': "no-cache"
}
response = requests.request("GET", url, headers=headers)
print(response.text)
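A long Cookie header is easier to inspect or edit once it is split into name/value pairs. The following is a minimal sketch; the function name `cookie_header_to_dict` is an assumption for illustration, and the sample string uses only three of the cookies above.

```python
def cookie_header_to_dict(cookie_header):
    """Turn a 'name=value; name2=value2' Cookie header into a dict."""
    cookies = {}
    for pair in cookie_header.split(";"):
        # Split on the first '=' only, because cookie values may contain '='.
        name, sep, value = pair.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


sample = "read_mode=day; default_font=font2; locale=zh-CN"
cookies = cookie_header_to_dict(sample)
print(cookies)
# prints {'read_mode': 'day', 'default_font': 'font2', 'locale': 'zh-CN'}
```

With `requests`, such a dictionary can be passed through the `cookies=` argument instead of keeping the whole string in the headers.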
The Java code is as follows:
OkHttpClient client = new OkHttpClient();
Request request = new Request.Builder()
.url("https://www.jianshu.com/")
.get()
.addHeader("Host", "www.jianshu.com")
.addHeader("Connection", "keep-alive")
.addHeader("Cache-Control", "max-age=0")
.addHeader("Upgrade-Insecure-Requests", "1")
.addHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36")
.addHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3")
.addHeader("Accept-Encoding", "gzip, deflate, br")
.addHeader("Accept-Language", "zh-CN,zh;q=0.9,en;q=0.8")
// The Cookie header from the captured request, kept as one string literal on one line.
.addHeader("Cookie", "__yadk_uid=AFXc3m0QhoWYKzHTlaOb5lpJUvepcIMq; read_mode=day; default_font=font2; locale=zh-CN; Hm_lvt_0c0e9d9b1e7d617b3e6842e85b9fb068=1564068289,1564155331,1564155349,1564155353; remember_user_token=W1s4NzQ2OTA3XSwiJDJhJDExJHhrakNvbjdKLmhRanZPVmt0c0Y0WXUiLCIxNTY0ODQyMDYwLjgxMDM4MjQiXQ%3D%3D--deff2e1e9ec55f5f3c64bd00bd11a1d987de45ee; _m7e_session_core=094e6f9ef939baa86ed84097f2315b3e; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%2216b27cf05cfab1-049233ccf59a98-37c153e-1327104-16b27cf05d0939%22%2C%22%24device_id%22%3A%2216b27cf05cfab1-049233ccf59a98-37c153e-1327104-16b27cf05d0939%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E7%9B%B4%E6%8E%A5%E6%B5%81%E9%87%8F%22%2C%22%24latest_referrer%22%3A%22%22%2C%22%24latest_referrer_host%22%3A%22%22%2C%22%24latest_search_keyword%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC_%E7%9B%B4%E6%8E%A5%E6%89%93%E5%BC%80%22%2C%22%24latest_utm_source%22%3A%22desktop%22%2C%22%24latest_utm_medium%22%3A%22notes-included-collection%22%2C%22%24latest_utm_campaign%22%3A%22maleskine%22%2C%22%24latest_utm_content%22%3A%22note%22%7D%2C%22first_id%22%3A%22%22%7D; Hm_lpvt_0c0e9d9b1e7d617b3e6842e85b9fb068=1564842175")
// The double quotes inside the ETag value are escaped.
.addHeader("If-None-Match", "W/\"59ba10ba20969a6bd6d15c129cdd78f0\"")
.addHeader("Postman-Token", "f2d54dc2-67f6-451d-88a6-c5495bb8d2de,c5e2f8e3-61ba-4846-816d-1ab1118f3c2a")
.addHeader("cache-control", "no-cache")
.build();
Response response = client.newCall(request).execute();
End of this post.