Simulating a Zhihu Login with Python


Author: 24K男 | Published: 2019-03-05 10:49

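I previously wrote a Python script that logged in to Zhihu and scraped the answers under a given question. Zhihu's login flow has changed since then, and that code can no longer log in. This article analyzes the flow again and implements a simulated login against Zhihu as it works today.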

1. Background

Date: 2019/3/5
Python: 3.6
OS: Windows 10 Professional
Login: by email address; by default no captcha is required.

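2. Analysis

First, log in normally in Chrome and watch the network activity that a normal login generates.

Be sure to tick [Preserve log] so that earlier requests are not wiped out when the page navigates.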

zhihu_login.png

The figure above shows the network requests made during a normal login. The key one is sign_in, so let's look at the details of that request.

zhihu_sign_in.png

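The figure above shows that the data posted at login is now encrypted rather than sent in plain text as before. So how do we construct the POST data?

The answer is to step through the page in the debugger.
The full URL of sign_in is https://www.zhihu.com/api/v3/oauth/sign_in, but searching the Zhihu login page turns up nothing related to that URL. So where do we look next?

Since the HTML does not contain it, we search the JS files instead.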


zhihu_find.png

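Once the JS file is found, the rest is straightforward: log in once more and watch what this script does, keeping a close eye on what data is put into the request body.
The analysis also suggests that the encryption is performed by something named zsEncrypt.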

zhihu_body.png

The figure above shows what the POST data looks like before it is encrypted.

 {
    captcha: "", // not always required
    client_id: "c3cef7c66a1843f8b3a9e6a1e3160e20", // fixed value
    grant_type: "password", // authentication type
    lang: "cn",
    password: "pwwwdddd", // the password that was entered
    ref_source: "homepage", // fixed value
    signature: "4e96133ef904d338a25a3733e0944562441b9daf", // value we have to construct
    source: "com.zhihu.web", // fixed value
    timestamp: 1551682889032, // timestamp in milliseconds
    username: "xxx@163.com",
    utm_source: ""
}
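
To make the structure above concrete, here is a minimal sketch of how these fields could be serialized into the plaintext form body that is later encrypted. The helper name build_form_body is hypothetical. The field order and lang=en follow the ZhiHuSpider script in this article, and the percent-encoding of values is an assumption based on the sample string in zhihu.js, where the '@' in the username appears as %40.

```python
from urllib.parse import quote, urlencode

CLIENT_ID = "c3cef7c66a1843f8b3a9e6a1e3160e20"  # fixed value from the article


def build_form_body(username, password, timestamp, signature, lang="en"):
    """Serialize the login fields into an x-www-form-urlencoded string.

    Hypothetical helper: the order of the fields mirrors the article's script.
    """
    fields = [
        ("client_id", CLIENT_ID),
        ("grant_type", "password"),
        ("timestamp", timestamp),
        ("source", "com.zhihu.web"),
        ("signature", signature),
        ("username", username),
        ("password", password),
        ("captcha", ""),
        ("lang", lang),
        ("ref_source", "homepage"),
        ("utm_source", ""),
    ]
    # quote_via=quote encodes a space as %20 rather than '+'
    return urlencode(fields, quote_via=quote)


if __name__ == "__main__":
    print(build_form_body("xxx@163.com", "pwwwdddd", "1551682889032",
                          "4e96133ef904d338a25a3733e0944562441b9daf"))
```

Building the body from a list of pairs keeps the field order explicit and avoids stray whitespace that hand-written string concatenation can introduce.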

The next task is to construct signature and timestamp.

To find out where signature comes from, search the JS again for the code that builds it.


zhihu_signature.png

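As the figure shows, signature is derived from several of the fields above and then encoded; in the script below it is an HMAC-SHA1 hex digest over grant_type, client_id, source and timestamp.
The timestamp is simple to construct: str(int(time.time()*1000)).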
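
A minimal, standalone sketch of that construction follows. It assumes the HMAC key and the concatenation order used in the ZhiHuSpider script in this article; the function name make_signature is made up for illustration. The important detail is that the same timestamp string must be used for both the signature and the timestamp field.

```python
import hmac
import time
from hashlib import sha1

# Values taken from the article's script; treat them as assumptions
# that may no longer match what Zhihu expects.
HMAC_KEY = b"d1b964811afb40118a12068ff74a12f4"
GRANT_TYPE = "password"
CLIENT_ID = "c3cef7c66a1843f8b3a9e6a1e3160e20"
SOURCE = "com.zhihu.web"


def make_signature(timestamp_ms: str) -> str:
    """Return the HMAC-SHA1 hex digest over the concatenated fields."""
    message = GRANT_TYPE + CLIENT_ID + SOURCE + timestamp_ms
    return hmac.new(HMAC_KEY, message.encode("utf-8"), sha1).hexdigest()


if __name__ == "__main__":
    # Millisecond timestamp, built the same way as in the article
    timestamp = str(int(time.time() * 1000))
    print(timestamp, make_signature(timestamp))
```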

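I did not bother extracting the routine that encrypts the POST data myself; I borrowed a file that someone shared online and saved it as zhihu.js.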

ZhiHuSpider:

import hmac
import http.cookiejar
import os
import re
import sys
import time
from hashlib import sha1
from urllib.parse import quote, urlencode

import execjs
import requests


class ZhiHuSpider(object):

    def __init__(self):
        self.session = requests.session()
        self.headers = {
            'content-type': 'application/x-www-form-urlencoded',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36',
            'x-zse-83': '3_1.1'
        }
        # LWPCookieJar stores cookies in the Set-Cookie3 file format,
        # whereas MozillaCookieJar uses the cookies.txt format.
        self.session.cookies = http.cookiejar.LWPCookieJar("cookie")
        # If cookies were saved locally, there is no need to post the credentials again
        try:
            self.session.cookies.load(ignore_discard=True)
            print('Cookies loaded.')
        except IOError:
            print('Could not load cookies.')

    def login(self, username, password):

        # Request login_url, udid_url and captcha_url to pick up the cookies the login needs
        login_url = 'https://www.zhihu.com/signup?next=/'
        resp = self.session.get(login_url, headers=self.headers)
        print("Requested {}, status code: {}".format(login_url, resp.status_code))

        udid_url = 'https://www.zhihu.com/udid'
        resp = self.session.post(udid_url, headers=self.headers)
        print("Requested {}, status code: {}".format(udid_url, resp.status_code))

        captcha_url = 'https://www.zhihu.com/api/v3/oauth/captcha?lang=en'
        resp = self.session.get(captcha_url, headers=self.headers)
        print("Requested {}, status code: {}".format(captcha_url, resp.status_code))

        # Check whether a captcha is required; if so, exit (I have not run into one yet)
        if re.search('true', resp.text):
            print('Captcha required')
            sys.exit(1)

        # The signature and the timestamp field must use the same timestamp value
        time_str = str(int(time.time() * 1000))
        signature = self.get_signature(time_str)
        # print(signature)

        # Assemble the string to encrypt. urlencode percent-encodes the values
        # (for example '@' becomes '%40') and adds no stray whitespace.
        string = urlencode([
            ('client_id', 'c3cef7c66a1843f8b3a9e6a1e3160e20'),
            ('grant_type', 'password'),
            ('timestamp', time_str),
            ('source', 'com.zhihu.web'),
            ('signature', signature),
            ('username', username),
            ('password', password),
            ('captcha', ''),
            ('lang', 'en'),
            ('ref_source', 'homepage'),
            ('utm_source', ''),
        ], quote_via=quote)
        encrypt_string = self.encrypt(string)

        # POST to the sign-in endpoint
        post_url = "https://www.zhihu.com/api/v3/oauth/sign_in"
        resp = self.session.post(post_url, data=encrypt_string, headers=self.headers)
        print("Requested {}, status code: {}".format(post_url, resp.status_code))

        # Check whether the login succeeded
        if re.search('user_id', resp.text):
            print('Login succeeded')
            # ignore_discard=True also writes session cookies, matching load() above
            self.session.cookies.save(ignore_discard=True)
        else:
            print('Login failed')
            sys.exit(1)

    def encrypt(self, string):
        # zhihu.js is expected to sit in the same directory as this script
        file_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'zhihu.js')
        with open(file_path, 'r', encoding='utf-8') as f:
            js = f.read()
        result = execjs.compile(js).call('encrypt', string)
        return result

    def get_signature(self, time_str):
        # HMAC-SHA1 over grant_type + client_id + source + timestamp
        h = hmac.new(key='d1b964811afb40118a12068ff74a12f4'.encode('utf-8'), digestmod=sha1)
        grant_type = 'password'
        client_id = 'c3cef7c66a1843f8b3a9e6a1e3160e20'
        source = 'com.zhihu.web'
        h.update((grant_type + client_id + source + time_str).encode('utf-8'))
        return h.hexdigest()

    def isLogin(self):
        # Decide whether we are logged in by requesting the account settings page
        url = "https://www.zhihu.com/settings/account"
        # Disable redirects; otherwise a failed check that redirects to the home page also returns 200
        login_code = self.session.get(
            url, headers=self.headers, allow_redirects=False).status_code
        return login_code == 200


if __name__ == "__main__":
    spider = ZhiHuSpider()
    if spider.isLogin():
        print('You are already logged in.')
    else:
        username = input('Username: ')
        password = input('Password: ')
        spider.login(username, password)
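
The script loads its cookie file with ignore_discard=True. As a network-free illustration of what that flag controls, the sketch below saves a session cookie (one with no expiry, which http.cookiejar marks as discardable) with and without the flag and counts what can be read back. The cookie name, value and the helper names are invented for the demonstration.

```python
import os
import tempfile
from http.cookiejar import Cookie, LWPCookieJar


def make_session_cookie(name, value, domain):
    """Build a session cookie: expires=None and discard=True (hypothetical helper)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def saved_cookie_count(ignore_discard):
    """Save one session cookie, reload the file, and return how many cookies survived."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cookie")
        jar = LWPCookieJar(path)
        jar.set_cookie(make_session_cookie("demo", "demo-value", ".zhihu.com"))
        jar.save(ignore_discard=ignore_discard)

        restored = LWPCookieJar(path)
        restored.load(ignore_discard=True)
        return len(restored)


if __name__ == "__main__":
    print("saved without ignore_discard:", saved_cookie_count(False))
    print("saved with ignore_discard:", saved_cookie_count(True))
```

If the login cookies turn out to be session cookies, calling save() without ignore_discard=True would write a file that contains none of them, so the next run would have to log in again.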

zhihu.js

// window object
window={
    'encodeURIComponent':encodeURIComponent
};
// navigator object
navigator = {
    'userAgent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36'
}
// atob shim; Buffer is a Node.js API, so execjs has to run this file under Node.js
function atob(e){
    return Buffer.from(e,'base64').toString('binary');
}
"use strict";
function s(e) {
    return (s = "function" == typeof Symbol && "symbol" == typeof Symbol.t ? function(e) {
        return typeof e
    }
    : function(e) {
        return e && "function" == typeof Symbol && e.constructor === Symbol && e !== Symbol.prototype ? "symbol" : typeof e
    }
    )(e)
}
function i() {}
function h(e) {
    this.s = (2048 & e) >> 11,
    this.i = (1536 & e) >> 9,
    this.h = 511 & e,
    this.A = 511 & e
}
function A(e) {
    this.i = (3072 & e) >> 10,
    this.A = 1023 & e
}
function n(e) {
    this.n = (3072 & e) >> 10,
    this.e = (768 & e) >> 8,
    this.a = (192 & e) >> 6,
    this.s = 63 & e
}
function e(e) {
    this.i = e >> 10 & 3,
    this.h = 1023 & e
}
function a() {}
function c(e) {
    this.n = (3072 & e) >> 10,
    this.e = (768 & e) >> 8,
    this.a = (192 & e) >> 6,
    this.s = 63 & e
}
function o(e) {
    this.A = (4095 & e) >> 2,
    this.s = 3 & e
}
function r(e) {
    this.i = e >> 10 & 3,
    this.h = e >> 2 & 255,
    this.s = 3 & e
}
function k(e) {
    this.s = (4095 & e) >> 10,
    this.i = (1023 & e) >> 8,
    this.h = 1023 & e,
    this.A = 63 & e
}
function B(e) {
    this.s = (4095 & e) >> 10,
    this.n = (1023 & e) >> 8,
    this.e = (255 & e) >> 6
}
function f(e) {
    this.i = (3072 & e) >> 10,
    this.A = 1023 & e
}
function u(e) {
    this.A = 4095 & e
}
function C(e) {
    this.i = (3072 & e) >> 10
}
function b(e) {
    this.A = 4095 & e
}
function g(e) {
    this.s = (3840 & e) >> 8,
    this.i = (192 & e) >> 6,
    this.h = 63 & e
}
function G() {
    this.c = [0, 0, 0, 0],
    this.o = 0,
    this.r = [],
    this.k = [],
    this.B = [],
    this.f = [],
    this.u = [],
    this.C = !1,
    this.b = [],
    this.g = [],
    this.G = !1,
    this.Q = null,
    this.R = null,
    this.w = [],
    this.x = 0,
    this.D = {
        0: i,
        1: h,
        2: A,
        3: n,
        4: e,
        5: a,
        6: c,
        7: o,
        8: r,
        9: k,
        10: B,
        11: f,
        12: u,
        13: C,
        14: b,
        15: g
    }
}
// Object.defineProperty(exports, "__esModule", {
//     value: !0
// });
var t = "1.1"
  , __g = {};
i.prototype.M = function(e) {
    e.G = !1
}
,
h.prototype.M = function(e) {
    switch (this.s) {
    case 0:
        e.c[this.i] = this.h;
        break;
    case 1:
        e.c[this.i] = e.k[this.A]
    }
}
,
A.prototype.M = function(e) {
    e.k[this.A] = e.c[this.i]
}
,
n.prototype.M = function(e) {
    switch (this.s) {
    case 0:
        e.c[this.n] = e.c[this.e] + e.c[this.a];
        break;
    case 1:
        e.c[this.n] = e.c[this.e] - e.c[this.a];
        break;
    case 2:
        e.c[this.n] = e.c[this.e] * e.c[this.a];
        break;
    case 3:
        e.c[this.n] = e.c[this.e] / e.c[this.a];
        break;
    case 4:
        e.c[this.n] = e.c[this.e] % e.c[this.a];
        break;
    case 5:
        e.c[this.n] = e.c[this.e] == e.c[this.a];
        break;
    case 6:
        e.c[this.n] = e.c[this.e] >= e.c[this.a];
        break;
    case 7:
        e.c[this.n] = e.c[this.e] || e.c[this.a];
        break;
    case 8:
        e.c[this.n] = e.c[this.e] && e.c[this.a];
        break;
    case 9:
        e.c[this.n] = e.c[this.e] !== e.c[this.a];
        break;
    case 10:
        e.c[this.n] = s(e.c[this.e]);
        break;
    case 11:
        e.c[this.n] = e.c[this.e]in e.c[this.a];
        break;
    case 12:
        e.c[this.n] = e.c[this.e] > e.c[this.a];
        break;
    case 13:
        e.c[this.n] = -e.c[this.e];
        break;
    case 14:
        e.c[this.n] = e.c[this.e] < e.c[this.a];
        break;
    case 15:
        e.c[this.n] = e.c[this.e] & e.c[this.a];
        break;
    case 16:
        e.c[this.n] = e.c[this.e] ^ e.c[this.a];
        break;
    case 17:
        e.c[this.n] = e.c[this.e] << e.c[this.a];
        break;
    case 18:
        e.c[this.n] = e.c[this.e] >>> e.c[this.a];
        break;
    case 19:
        e.c[this.n] = e.c[this.e] | e.c[this.a]
    }
}
,
e.prototype.M = function(e) {
    e.r.push(e.o),
    e.B.push(e.k),
    e.o = e.c[this.i],
    e.k = [];
    for (var t = 0; t < this.h; t++)
        e.k.unshift(e.f.pop());
    e.u.push(e.f),
    e.f = []
}
,
a.prototype.M = function(e) {
    e.o = e.r.pop(),
    e.k = e.B.pop(),
    e.f = e.u.pop()
}
,
c.prototype.M = function(e) {
    switch (this.s) {
    case 0:
        e.C = e.c[this.n] >= e.c[this.e];
        break;
    case 1:
        e.C = e.c[this.n] <= e.c[this.e];
        break;
    case 2:
        e.C = e.c[this.n] > e.c[this.e];
        break;
    case 3:
        e.C = e.c[this.n] < e.c[this.e];
        break;
    case 4:
        e.C = e.c[this.n] == e.c[this.e];
        break;
    case 5:
        e.C = e.c[this.n] != e.c[this.e];
        break;
    case 6:
        e.C = e.c[this.n];
        break;
    case 7:
        e.C = !e.c[this.n]
    }
}
,
o.prototype.M = function(e) {
    switch (this.s) {
    case 0:
        e.o = this.A;
        break;
    case 1:
        e.C && (e.o = this.A);
        break;
    case 2:
        e.C || (e.o = this.A);
        break;
    case 3:
        e.o = this.A,
        e.Q = null
    }
    e.C = !1
}
,
r.prototype.M = function(e) {
    switch (this.s) {
    case 0:
        for (var t = [], n = 0; n < this.h; n++)
            t.unshift(e.f.pop());
        e.c[3] = e.c[this.i](t[0], t[1]);
        break;
    case 1:
        for (var r = e.f.pop(), o = [], i = 0; i < this.h; i++)
            o.unshift(e.f.pop());
        e.c[3] = e.c[this.i][r](o[0], o[1]);
        break;
    case 2:
        for (var a = [], c = 0; c < this.h; c++)
            a.unshift(e.f.pop());
        e.c[3] = new e.c[this.i](a[0],a[1])
    }
}
,
k.prototype.M = function(e) {
    switch (this.s) {
    case 0:
        e.f.push(e.c[this.i]);
        break;
    case 1:
        e.f.push(this.h);
        break;
    case 2:
        e.f.push(e.k[this.A]);
        break;
    case 3:
        e.f.push(e.g[this.A])
    }
}
,
B.prototype.M = function(t) {
    switch (this.s) {
    case 0:
        var s = t.f.pop();
        t.c[this.n] = t.c[this.e][s];
        break;
    case 1:
        var i = t.f.pop()
          , h = t.f.pop();
        t.c[this.e][i] = h;
        break;
    case 2:
        var A = t.f.pop();
        t.c[this.n] = eval(A)
    }
}
,
f.prototype.M = function(e) {
    e.c[this.i] = e.g[this.A]
}
,
u.prototype.M = function(e) {
    e.Q = this.A
}
,
C.prototype.M = function(e) {
    throw e.c[this.i]
}
,
b.prototype.M = function(e) {
    var t = this
      , n = [0];
    e.k.forEach(function(e) {
        n.push(e)
    });
    var r = function(r) {
        var o = new G;
        return o.k = n,
        o.k[0] = r,
        o.J(e.b, t.A, e.g, e.w),
        o.c[3]
    };
    r.toString = function() {
        return "() { [native code] }"
    }
    ,
    e.c[3] = r
}
,
g.prototype.M = function(e) {
    switch (this.s) {
    case 0:
        for (var t = {}, n = 0; n < this.h; n++) {
            var r = e.f.pop();
            t[e.f.pop()] = r
        }
        e.c[this.i] = t;
        break;
    case 1:
        for (var o = [], i = 0; i < this.h; i++)
            o.unshift(e.f.pop());
        e.c[this.i] = o
    }
}
,
G.prototype.v = function(e) {
    for (var t = atob(e), n = [], r = 0; r < t.length - 1; r += 2)
        n.push(t.charCodeAt(r) << 8 | t.charCodeAt(r + 1));
    this.b = n
}
,
G.prototype.y = function(e) {
    for (var t = atob(e), n = 66, r = [], o = 0; o < t.length; o++) {
        var i = 24 ^ t.charCodeAt(o) ^ n;
        r.push(String.fromCharCode(i)),
        n = i
    }
    return r.join("")
}
,
G.prototype.F = function(e) {
    var t = this;
    this.g = e.map(function(e) {
        return "string" == typeof e ? t.y(e) : e
    })
}
,
G.prototype.J = function(e, t, n) {
    for (t = t || 0,
    n = n || [],
    this.o = t,
    "string" == typeof e ? (this.F(n),
    this.v(e)) : (this.b = e,
    this.g = n),
    this.G = !0,
    this.x = Date.now(); this.G; ) {
        var r = this.b[this.o++];
        if ("number" != typeof r)
            break;
        var o = Date.now();
        if (500 < o - this.x)
            return;
        this.x = o;
        try {
            this.M(r)
        } catch (e) {
            if (this.R = e,
            !this.Q)
                throw "execption at " + this.o + ": " + e;
            this.o = this.Q
        }
    }
}
,
G.prototype.M = function(e) {
    var t = (61440 & e) >> 12;
    new this.D[t](e).M(this)
}
,
"undefined" != typeof window && (new G).J("4AeTAJwAqACcAaQAAAAYAJAAnAKoAJwDgAWTACwAnAKoACACGAESOTRHkQAkAbAEIAMYAJwFoAASAzREJAQYBBIBNEVkBnCiGAC0BjRAJAAYBBICNEVkBnDGGAC0BzRAJACwCJAAnAmoAJwKoACcC4ABnAyMBRAAMwZgBnESsA0aADRAkQAkABgCnA6gABoCnA+hQDRHGAKcEKAAMQdgBnFasBEaADRAkQAkABgCnBKgABoCnBOhQDRHZAZxkrAUGgA0QJEAJAAYApwVoABgBnG6sBYaADRAkQAkABgCnBegAGAGceKwGBoANECRACQAnAmoAJwZoABgBnIOsBoaADRAkQAkABgCnBugABoCnByhQDRHZAZyRrAdGgA0QJEAJAAQACAFsB4gBhgAnAWgABIBNEEkBxgHEgA0RmQGdJoQCBoFFAE5gCgFFAQ5hDSCJAgYB5AAGACcH4AFGAEaCDRSEP8xDzMQIAkQCBoFFAE5gCgFFAQ5hDSCkQAkCBgBGgg0UhD/MQ+QACAIGAkaBxQBOYGSABoAnB+EBRoIN1AUCDmRNJMkCRAIGgUUATmAKAUUBDmENIKRACQIGAEaCDRSEP8xD5AAIAgYCRoHFAI5gZIAGgCcH4QFGgg3UBQQOZE0kyQJGAMaCRQ/OY+SABoGnCCEBTTAJAMYAxoJFAY5khI/Nk+RABoGnCCEBTTAJAMYAxoJFAw5khI/Nk+RABoGnCCEBTTAJAMYAxoJFBI5khI/Nk+RABoGnCCEBTTAJAMYBxIDNEEkB3JsHgNQAA==", 0, ["BRgg", "BSITFQkTERw=", "LQYfEhMA", "PxMVFBMZKB8DEjQaBQcZExMC", "", "NhETEQsE", "Whg=", "Wg==", "MhUcHRARDhg=", "NBcPBxYeDQMF", "Lx4ODys+GhMC", "LgM7OwAKDyk6Cg4=", "Mx8SGQUvMQ==", "SA==", "ORoVGCQgERcCAxo=", "BTcAERcCAxo=", "BRg3ABEXAgMaFAo=", "SQ==", "OA8LGBsP", "GC8LGBsP", "Tg==", "PxAcBQ==", "Tw==", "KRsJDgE=", "TA==", "LQofHg4DBwsP", "TQ==", "PhMaNCwZAxoUDQUeGQ==", "PhMaNCwZAxoUDQUeGTU0GQIeBRsYEQ8=", "Qg==", "BWpUGxkfGRsZFxkbGR8ZGxkHGRsZHxkbGRcZG1MbGR8ZGxkXGRFpGxkfGRsZFxkbGR8ZGxkHGRsZHxkbGRcZGw==", "ORMRCyk0Exk8LQ==", "ORMRCyst"]);
var Q = function(e) {
    return __g._encrypt(e)
};
// exports.ENCRYPT_VERSION = t,
// exports.default = Q
// console.log(Q("client_id=c3cef7c66a1843f8b3a9e6a1e3160e20&grant_type=password&timestamp=1551062570616&source=com.zhihu.web&signature=e3ab73425750a4dbcf9ab357f6030fc281ceeb22&username=819201111%40qq.com&password=123456&captcha=&lang=en&ref_source=homepage&utm_source="))
function encrypt(s){
    return Q(s);
}
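
The inputs handed to (new G).J(...) in zhihu.js are obfuscated. Reading the code, G.prototype.y base64-decodes each string constant and XORs every byte with 24 and with the previously decoded character code, starting from 66; G.prototype.v packs the bytecode into 16-bit words; and G.prototype.M uses the top four bits of each word as the handler index. As a reading aid only, a rough Python port of those steps might look like this. The function names decode_constant, unpack_words and split_word are made up, and the port is a sketch rather than a replacement for the JS.

```python
import base64


def decode_constant(encoded: str) -> str:
    """Port of G.prototype.y: base64-decode, then undo the chained XOR."""
    previous = 66
    chars = []
    for byte in base64.b64decode(encoded):
        code = 24 ^ byte ^ previous
        chars.append(chr(code))
        previous = code
    return "".join(chars)


def unpack_words(encoded: str) -> list:
    """Port of G.prototype.v: combine byte pairs into 16-bit big-endian words."""
    raw = base64.b64decode(encoded)
    return [raw[i] << 8 | raw[i + 1] for i in range(0, len(raw) - 1, 2)]


def split_word(word: int) -> tuple:
    """Port of the dispatch in G.prototype.M: (handler index, 12-bit operand)."""
    return (word & 0xF000) >> 12, word & 0x0FFF


if __name__ == "__main__":
    # The first and third entries of the string table in zhihu.js
    print(decode_constant("BRgg"))      # __g
    print(decode_constant("LQYfEhMA"))  # window
    # The first four base64 characters of the bytecode string
    print([split_word(w) for w in unpack_words("4AeT")])  # [(14, 7)]
```

Decoding the table this way shows which globals the virtual machine touches (for example window and __g), which helps explain why zhihu.js has to stub out window and navigator before the bytecode runs.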
