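I previously wrote a Python script that simulated logging in to Zhihu and scraped the answers under a given question. Zhihu's login flow has changed since then and that code can no longer log in, so this article re-analyzes the flow and implements a simulated login against the current site.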
1. Background
Date: 2019/3/5
Python: 3.6
OS: Win10 Professional
Login: by email; by default no captcha is required.
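2. Analysis
First, log in normally in Chrome and watch the network activity that a normal login produces.
zhihu_login.png
Be sure to tick [Preserve log] so that the recorded requests are not cleared when the page navigates.
The capture above shows the requests made during a normal login. The key one is sign_in, so let's look at its request details.
zhihu_sign_in.png
This capture shows that the data POSTed at login is now encrypted, not sent in plaintext as it used to be. So how do we construct the POST data?
The answer is to step through the page in the debugger.
The full URL of sign_in is https://www.zhihu.com/api/v3/oauth/sign_in. Searching the HTML of the Zhihu login page turns up nothing related to this URL, so where do we look next?
Since the HTML has nothing, we search the JS files instead.
zhihu_find.png
Once the JS file is found, the rest is straightforward: log in once and watch what this JS does, paying particular attention to what data is put into the body.
The analysis also suggests that the encryption is performed by zsEncrypt.
From the debugger view you can see exactly what the POST data looks like before it is encrypted.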
{
    captcha: "",  // not always required
    client_id: "c3cef7c66a1843f8b3a9e6a1e3160e20",  // fixed value
    grant_type: "password",  // authentication type
    lang: "cn",
    password: "pwwwdddd",  // the password entered
    ref_source: "homepage",  // fixed value
    signature: "4e96133ef904d338a25a3733e0944562441b9daf",  // must be computed
    source: "com.zhihu.web",  // fixed value
    timestamp: 1551682889032,  // timestamp in milliseconds
    username: "xxx@163.com",
    utm_source: ""
}
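As a rough illustration of how these fields map onto the form-encoded string that is later encrypted, the sketch below builds the plaintext body with urllib.parse.urlencode. The field order follows the string assembled in ZhiHuSpider.login further down. The helper name build_plain_body is made up for this example, the default lang="en" mirrors the Python code and not the "cn" seen in the capture, and the signature and timestamp passed in are simply the sample values shown above.

```python
from urllib.parse import urlencode

CLIENT_ID = "c3cef7c66a1843f8b3a9e6a1e3160e20"  # fixed value seen in the capture


def build_plain_body(username, password, timestamp, signature, lang="en"):
    """Return the unencrypted, form-encoded login body (hypothetical helper)."""
    # A list of pairs keeps the field order explicit on every Python version.
    fields = [
        ("client_id", CLIENT_ID),
        ("grant_type", "password"),
        ("timestamp", timestamp),
        ("source", "com.zhihu.web"),
        ("signature", signature),
        ("username", username),
        ("password", password),
        ("captcha", ""),
        ("lang", lang),
        ("ref_source", "homepage"),
        ("utm_source", ""),
    ]
    # urlencode percent-encodes reserved characters such as "@" in the e-mail address.
    return urlencode(fields)


if __name__ == "__main__":
    print(build_plain_body("xxx@163.com", "pwwwdddd", 1551682889032,
                           "4e96133ef904d338a25a3733e0944562441b9daf"))
```

Letting urlencode do the escaping avoids hand-written string concatenation breaking on passwords that contain characters like "&" or "=".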
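The next task is to construct signature and timestamp.
To find out where signature comes from, search the JS again and see how it is built.
zhihu_signature.png
As the screenshot shows, signature is computed from several of the fields above (in the Python code below: grant_type, client_id, source and timestamp, fed into HMAC-SHA1) and then encoded as a hex string.
The timestamp is simple to construct: str(int(time.time()*1000)).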
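The two values can be sketched in isolation as follows. This is a minimal sketch that reuses the HMAC key and the field order from get_signature in the spider code further down; the function names make_timestamp and make_signature are invented for this example, and whether the key is still valid depends on Zhihu's current JS.

```python
import hashlib
import hmac
import time

# Key and client id as they appear in the spider code in this article.
HMAC_KEY = b"d1b964811afb40118a12068ff74a12f4"
CLIENT_ID = "c3cef7c66a1843f8b3a9e6a1e3160e20"


def make_timestamp():
    """Current time in milliseconds, as a string."""
    return str(int(time.time() * 1000))


def make_signature(timestamp, grant_type="password",
                   client_id=CLIENT_ID, source="com.zhihu.web"):
    """HMAC-SHA1 over grant_type + client_id + source + timestamp, hex encoded."""
    message = (grant_type + client_id + source + timestamp).encode("utf-8")
    return hmac.new(HMAC_KEY, message, hashlib.sha1).hexdigest()


if __name__ == "__main__":
    ts = make_timestamp()
    print(ts, make_signature(ts))
```

The timestamp fed into the signature has to be the same one sent in the form, so it is generated once and passed to both.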
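I did not extract the routine that encrypts the POST data myself; I borrowed a file that someone had already published online and saved it as zhihu.js.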
ZhiHuSpider:
import requests
import re
import execjs
import sys
import time
import hmac
from hashlib import sha1
from urllib.parse import quote
import os
import http.cookiejar


class ZhiHuSpider(object):
    def __init__(self):
        self.session = requests.session()
        self.headers = {
            'content-type': 'application/x-www-form-urlencoded',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36',
            'x-zse-83': '3_1.1'
        }
        # LWPCookieJar stores cookies in the Set-Cookie3 file format,
        # whereas MozillaCookieJar stores them in the cookies.txt format.
        self.session.cookies = http.cookiejar.LWPCookieJar("cookie")
        # If a cookie file exists locally there is no need to POST the login data again
        try:
            self.session.cookies.load(ignore_discard=True)
            print('Cookie loaded.')
        except IOError:
            print('Could not load cookie.')

    def login(self, username, password):
        # Request login_url, udid_url and captcha_url to pick up the cookies the login needs
        login_url = 'https://www.zhihu.com/signup?next=/'
        resp = self.session.get(login_url, headers=self.headers)
        print("Requested {}, response status code: {}".format(login_url, resp.status_code))
        udid_url = 'https://www.zhihu.com/udid'
        resp = self.session.post(udid_url, headers=self.headers)
        print("Requested {}, response status code: {}".format(udid_url, resp.status_code))
        captcha_url = 'https://www.zhihu.com/api/v3/oauth/captcha?lang=en'
        resp = self.session.get(captcha_url, headers=self.headers)
        print("Requested {}, response status code: {}".format(captcha_url, resp.status_code))
        # Check whether a captcha is required; if so, just exit (I have not hit this case yet)
        if re.search('true', resp.text):
            print('Captcha required')
            sys.exit()
        # Compute the signature; the same timestamp goes into the signature and the form
        time_str = str(int(time.time() * 1000))
        signature = self.get_signature(time_str)
        # print(signature)
        # Build the string to be encrypted. Adjacent literals are joined without
        # stray whitespace; the credentials are percent-encoded, as in the sample
        # string at the end of zhihu.js (username=819201111%40qq.com).
        string = (
            "client_id=c3cef7c66a1843f8b3a9e6a1e3160e20"
            "&grant_type=password"
            "&timestamp={}"
            "&source=com.zhihu.web"
            "&signature={}"
            "&username={}"
            "&password={}"
            "&captcha="
            "&lang=en"
            "&ref_source=homepage"
            "&utm_source="
        ).format(time_str, signature, quote(username, safe=''), quote(password, safe=''))
        encrypt_string = self.encrypt(string)
        # POST to the login endpoint
        post_url = "https://www.zhihu.com/api/v3/oauth/sign_in"
        resp = self.session.post(post_url, data=encrypt_string, headers=self.headers)
        print("Requested {}, response status code: {}".format(post_url, resp.status_code))
        # Check whether the login succeeded
        if re.search('user_id', resp.text):
            print('Login succeeded')
            # Save session cookies too, matching load(ignore_discard=True) above
            self.session.cookies.save(ignore_discard=True)
        else:
            print("Login failed")
            sys.exit()

    def encrypt(self, string):
        # zhihu.js is expected to sit next to this script
        file_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'zhihu.js')
        with open(file_path, 'r', encoding='utf-8') as f:
            js = f.read()
        result = execjs.compile(js).call('encrypt', string)
        return result

    def get_signature(self, time_str):
        # HMAC-SHA1 over grant_type + client_id + source + timestamp
        h = hmac.new(key='d1b964811afb40118a12068ff74a12f4'.encode('utf-8'), digestmod=sha1)
        grant_type = 'password'
        client_id = 'c3cef7c66a1843f8b3a9e6a1e3160e20'
        source = 'com.zhihu.web'
        h.update((grant_type + client_id + source + time_str).encode('utf-8'))
        return h.hexdigest()

    def isLogin(self):
        # Decide whether we are logged in by requesting the account settings page
        url = "https://www.zhihu.com/settings/account"
        # Disable redirects; otherwise a failed login is redirected to the home page, which also returns 200
        login_code = self.session.get(
            url, headers=self.headers, allow_redirects=False, verify=False).status_code
        return login_code == 200


if __name__ == "__main__":
    spider = ZhiHuSpider()
    if spider.isLogin():
        print('You are already logged in.')
    else:
        username = input('Username: ')
        password = input('Password: ')
        spider.login(username, password)
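The cookie persistence used in __init__ can be tried on its own with the standard library. The sketch below is only an illustration: the cookie name, value and domain are invented, and make_cookie and roundtrip are hypothetical helpers. It shows why ignore_discard matters: a session cookie (no expiry) is written to and read from an LWPCookieJar file only when ignore_discard=True is passed.

```python
import http.cookiejar
import os
import tempfile


def make_cookie(name, value, domain):
    """Build a session cookie (no expiry date) purely for illustration."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def roundtrip(path, keep_session):
    """Save one session cookie to `path`, reload the file, return {name: value}."""
    jar = http.cookiejar.LWPCookieJar(path)
    jar.set_cookie(make_cookie("demo_token", "abc123", ".example.com"))
    jar.save(ignore_discard=keep_session)

    loaded = http.cookiejar.LWPCookieJar(path)
    loaded.load(ignore_discard=keep_session)
    return {cookie.name: cookie.value for cookie in loaded}


if __name__ == "__main__":
    cookie_file = os.path.join(tempfile.mkdtemp(), "cookie")
    print(roundtrip(cookie_file, keep_session=True))
    print(roundtrip(cookie_file, keep_session=False))
```

With keep_session=False the file contains only the format header, so the reloaded jar is empty.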
zhihu.js
// window object
window = {
    'encodeURIComponent': encodeURIComponent
};
// navigator object
navigator = {
    'userAgent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36'
}
// atob function
function atob(e) {
    return new Buffer(e, 'base64').toString('binary');
}
"use strict";
function s(e) {
return (s = "function" == typeof Symbol && "symbol" == typeof Symbol.t ? function(e) {
return typeof e
}
: function(e) {
return e && "function" == typeof Symbol && e.constructor === Symbol && e !== Symbol.prototype ? "symbol" : typeof e
}
)(e)
}
function i() {}
function h(e) {
this.s = (2048 & e) >> 11,
this.i = (1536 & e) >> 9,
this.h = 511 & e,
this.A = 511 & e
}
function A(e) {
this.i = (3072 & e) >> 10,
this.A = 1023 & e
}
function n(e) {
this.n = (3072 & e) >> 10,
this.e = (768 & e) >> 8,
this.a = (192 & e) >> 6,
this.s = 63 & e
}
function e(e) {
this.i = e >> 10 & 3,
this.h = 1023 & e
}
function a() {}
function c(e) {
this.n = (3072 & e) >> 10,
this.e = (768 & e) >> 8,
this.a = (192 & e) >> 6,
this.s = 63 & e
}
function o(e) {
this.A = (4095 & e) >> 2,
this.s = 3 & e
}
function r(e) {
this.i = e >> 10 & 3,
this.h = e >> 2 & 255,
this.s = 3 & e
}
function k(e) {
this.s = (4095 & e) >> 10,
this.i = (1023 & e) >> 8,
this.h = 1023 & e,
this.A = 63 & e
}
function B(e) {
this.s = (4095 & e) >> 10,
this.n = (1023 & e) >> 8,
this.e = (255 & e) >> 6
}
function f(e) {
this.i = (3072 & e) >> 10,
this.A = 1023 & e
}
function u(e) {
this.A = 4095 & e
}
function C(e) {
this.i = (3072 & e) >> 10
}
function b(e) {
this.A = 4095 & e
}
function g(e) {
this.s = (3840 & e) >> 8,
this.i = (192 & e) >> 6,
this.h = 63 & e
}
function G() {
this.c = [0, 0, 0, 0],
this.o = 0,
this.r = [],
this.k = [],
this.B = [],
this.f = [],
this.u = [],
this.C = !1,
this.b = [],
this.g = [],
this.G = !1,
this.Q = null,
this.R = null,
this.w = [],
this.x = 0,
this.D = {
0: i,
1: h,
2: A,
3: n,
4: e,
5: a,
6: c,
7: o,
8: r,
9: k,
10: B,
11: f,
12: u,
13: C,
14: b,
15: g
}
}
// Object.defineProperty(exports, "__esModule", {
// value: !0
// });
var t = "1.1"
, __g = {};
i.prototype.M = function(e) {
e.G = !1
}
,
h.prototype.M = function(e) {
switch (this.s) {
case 0:
e.c[this.i] = this.h;
break;
case 1:
e.c[this.i] = e.k[this.A]
}
}
,
A.prototype.M = function(e) {
e.k[this.A] = e.c[this.i]
}
,
n.prototype.M = function(e) {
switch (this.s) {
case 0:
e.c[this.n] = e.c[this.e] + e.c[this.a];
break;
case 1:
e.c[this.n] = e.c[this.e] - e.c[this.a];
break;
case 2:
e.c[this.n] = e.c[this.e] * e.c[this.a];
break;
case 3:
e.c[this.n] = e.c[this.e] / e.c[this.a];
break;
case 4:
e.c[this.n] = e.c[this.e] % e.c[this.a];
break;
case 5:
e.c[this.n] = e.c[this.e] == e.c[this.a];
break;
case 6:
e.c[this.n] = e.c[this.e] >= e.c[this.a];
break;
case 7:
e.c[this.n] = e.c[this.e] || e.c[this.a];
break;
case 8:
e.c[this.n] = e.c[this.e] && e.c[this.a];
break;
case 9:
e.c[this.n] = e.c[this.e] !== e.c[this.a];
break;
case 10:
e.c[this.n] = s(e.c[this.e]);
break;
case 11:
e.c[this.n] = e.c[this.e]in e.c[this.a];
break;
case 12:
e.c[this.n] = e.c[this.e] > e.c[this.a];
break;
case 13:
e.c[this.n] = -e.c[this.e];
break;
case 14:
e.c[this.n] = e.c[this.e] < e.c[this.a];
break;
case 15:
e.c[this.n] = e.c[this.e] & e.c[this.a];
break;
case 16:
e.c[this.n] = e.c[this.e] ^ e.c[this.a];
break;
case 17:
e.c[this.n] = e.c[this.e] << e.c[this.a];
break;
case 18:
e.c[this.n] = e.c[this.e] >>> e.c[this.a];
break;
case 19:
e.c[this.n] = e.c[this.e] | e.c[this.a]
}
}
,
e.prototype.M = function(e) {
e.r.push(e.o),
e.B.push(e.k),
e.o = e.c[this.i],
e.k = [];
for (var t = 0; t < this.h; t++)
e.k.unshift(e.f.pop());
e.u.push(e.f),
e.f = []
}
,
a.prototype.M = function(e) {
e.o = e.r.pop(),
e.k = e.B.pop(),
e.f = e.u.pop()
}
,
c.prototype.M = function(e) {
switch (this.s) {
case 0:
e.C = e.c[this.n] >= e.c[this.e];
break;
case 1:
e.C = e.c[this.n] <= e.c[this.e];
break;
case 2:
e.C = e.c[this.n] > e.c[this.e];
break;
case 3:
e.C = e.c[this.n] < e.c[this.e];
break;
case 4:
e.C = e.c[this.n] == e.c[this.e];
break;
case 5:
e.C = e.c[this.n] != e.c[this.e];
break;
case 6:
e.C = e.c[this.n];
break;
case 7:
e.C = !e.c[this.n]
}
}
,
o.prototype.M = function(e) {
switch (this.s) {
case 0:
e.o = this.A;
break;
case 1:
e.C && (e.o = this.A);
break;
case 2:
e.C || (e.o = this.A);
break;
case 3:
e.o = this.A,
e.Q = null
}
e.C = !1
}
,
r.prototype.M = function(e) {
switch (this.s) {
case 0:
for (var t = [], n = 0; n < this.h; n++)
t.unshift(e.f.pop());
e.c[3] = e.c[this.i](t[0], t[1]);
break;
case 1:
for (var r = e.f.pop(), o = [], i = 0; i < this.h; i++)
o.unshift(e.f.pop());
e.c[3] = e.c[this.i][r](o[0], o[1]);
break;
case 2:
for (var a = [], c = 0; c < this.h; c++)
a.unshift(e.f.pop());
e.c[3] = new e.c[this.i](a[0],a[1])
}
}
,
k.prototype.M = function(e) {
switch (this.s) {
case 0:
e.f.push(e.c[this.i]);
break;
case 1:
e.f.push(this.h);
break;
case 2:
e.f.push(e.k[this.A]);
break;
case 3:
e.f.push(e.g[this.A])
}
}
,
B.prototype.M = function(t) {
switch (this.s) {
case 0:
var s = t.f.pop();
t.c[this.n] = t.c[this.e][s];
break;
case 1:
var i = t.f.pop()
, h = t.f.pop();
t.c[this.e][i] = h;
break;
case 2:
var A = t.f.pop();
t.c[this.n] = eval(A)
}
}
,
f.prototype.M = function(e) {
e.c[this.i] = e.g[this.A]
}
,
u.prototype.M = function(e) {
e.Q = this.A
}
,
C.prototype.M = function(e) {
throw e.c[this.i]
}
,
b.prototype.M = function(e) {
var t = this
, n = [0];
e.k.forEach(function(e) {
n.push(e)
});
var r = function(r) {
var o = new G;
return o.k = n,
o.k[0] = r,
o.J(e.b, t.A, e.g, e.w),
o.c[3]
};
r.toString = function() {
return "() { [native code] }"
}
,
e.c[3] = r
}
,
g.prototype.M = function(e) {
switch (this.s) {
case 0:
for (var t = {}, n = 0; n < this.h; n++) {
var r = e.f.pop();
t[e.f.pop()] = r
}
e.c[this.i] = t;
break;
case 1:
for (var o = [], i = 0; i < this.h; i++)
o.unshift(e.f.pop());
e.c[this.i] = o
}
}
,
G.prototype.v = function(e) {
for (var t = atob(e), n = [], r = 0; r < t.length - 1; r += 2)
n.push(t.charCodeAt(r) << 8 | t.charCodeAt(r + 1));
this.b = n
}
,
G.prototype.y = function(e) {
for (var t = atob(e), n = 66, r = [], o = 0; o < t.length; o++) {
var i = 24 ^ t.charCodeAt(o) ^ n;
r.push(String.fromCharCode(i)),
n = i
}
return r.join("")
}
,
G.prototype.F = function(e) {
var t = this;
this.g = e.map(function(e) {
return "string" == typeof e ? t.y(e) : e
})
}
,
G.prototype.J = function(e, t, n) {
for (t = t || 0,
n = n || [],
this.o = t,
"string" == typeof e ? (this.F(n),
this.v(e)) : (this.b = e,
this.g = n),
this.G = !0,
this.x = Date.now(); this.G; ) {
var r = this.b[this.o++];
if ("number" != typeof r)
break;
var o = Date.now();
if (500 < o - this.x)
return;
this.x = o;
try {
this.M(r)
} catch (e) {
if (this.R = e,
!this.Q)
throw "execption at " + this.o + ": " + e;
this.o = this.Q
}
}
}
,
G.prototype.M = function(e) {
var t = (61440 & e) >> 12;
new this.D[t](e).M(this)
}
,
"undefined" != typeof window && (new G).J("4AeTAJwAqACcAaQAAAAYAJAAnAKoAJwDgAWTACwAnAKoACACGAESOTRHkQAkAbAEIAMYAJwFoAASAzREJAQYBBIBNEVkBnCiGAC0BjRAJAAYBBICNEVkBnDGGAC0BzRAJACwCJAAnAmoAJwKoACcC4ABnAyMBRAAMwZgBnESsA0aADRAkQAkABgCnA6gABoCnA+hQDRHGAKcEKAAMQdgBnFasBEaADRAkQAkABgCnBKgABoCnBOhQDRHZAZxkrAUGgA0QJEAJAAYApwVoABgBnG6sBYaADRAkQAkABgCnBegAGAGceKwGBoANECRACQAnAmoAJwZoABgBnIOsBoaADRAkQAkABgCnBugABoCnByhQDRHZAZyRrAdGgA0QJEAJAAQACAFsB4gBhgAnAWgABIBNEEkBxgHEgA0RmQGdJoQCBoFFAE5gCgFFAQ5hDSCJAgYB5AAGACcH4AFGAEaCDRSEP8xDzMQIAkQCBoFFAE5gCgFFAQ5hDSCkQAkCBgBGgg0UhD/MQ+QACAIGAkaBxQBOYGSABoAnB+EBRoIN1AUCDmRNJMkCRAIGgUUATmAKAUUBDmENIKRACQIGAEaCDRSEP8xD5AAIAgYCRoHFAI5gZIAGgCcH4QFGgg3UBQQOZE0kyQJGAMaCRQ/OY+SABoGnCCEBTTAJAMYAxoJFAY5khI/Nk+RABoGnCCEBTTAJAMYAxoJFAw5khI/Nk+RABoGnCCEBTTAJAMYAxoJFBI5khI/Nk+RABoGnCCEBTTAJAMYBxIDNEEkB3JsHgNQAA==", 0, ["BRgg", "BSITFQkTERw=", "LQYfEhMA", "PxMVFBMZKB8DEjQaBQcZExMC", "", "NhETEQsE", "Whg=", "Wg==", "MhUcHRARDhg=", "NBcPBxYeDQMF", "Lx4ODys+GhMC", "LgM7OwAKDyk6Cg4=", "Mx8SGQUvMQ==", "SA==", "ORoVGCQgERcCAxo=", "BTcAERcCAxo=", "BRg3ABEXAgMaFAo=", "SQ==", "OA8LGBsP", "GC8LGBsP", "Tg==", "PxAcBQ==", "Tw==", "KRsJDgE=", "TA==", "LQofHg4DBwsP", "TQ==", "PhMaNCwZAxoUDQUeGQ==", "PhMaNCwZAxoUDQUeGTU0GQIeBRsYEQ8=", "Qg==", "BWpUGxkfGRsZFxkbGR8ZGxkHGRsZHxkbGRcZG1MbGR8ZGxkXGRFpGxkfGRsZFxkbGR8ZGxkHGRsZHxkbGRcZGw==", "ORMRCyk0Exk8LQ==", "ORMRCyst"]);
var Q = function(e) {
return __g._encrypt(e)
};
// exports.ENCRYPT_VERSION = t,
// exports.default = Q
// console.log(Q("client_id=c3cef7c66a1843f8b3a9e6a1e3160e20&grant_type=password&timestamp=1551062570616&source=com.zhihu.web&signature=e3ab73425750a4dbcf9ab357f6030fc281ceeb22&username=819201111%40qq.com&password=123456&captcha=&lang=en&ref_source=homepage&utm_source="))
function encrypt(s){
return Q(s);
}