Simulating a Zhihu Login with Python


Author: 24K男 | Published: 2019-03-05 10:49

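    I previously wrote a Python script that simulated logging in to Zhihu and scraped the answers under a given question. Zhihu's login flow has changed since then, and that code can no longer log in. This article analyzes the current flow again and implements a simulated login that works against it.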

    1. Background

    Date: 2019/3/5
    Python: 3.6
    OS: Windows 10 Professional
    Login: by email address; by default no captcha is required.

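    2. Analysis

    First, log in normally in Chrome and watch the network activity that the login generates.

    Make sure [Preserve log] is checked so that earlier requests are not cleared when the page navigates.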

    zhihu_login.png

    The figure above shows the network requests made during a normal login. The key one is sign_in, so let's look at the details of that request.

    zhihu_sign_in.png

    The figure above shows that the data posted at login is now encrypted rather than sent in plaintext as before. So how do we construct the post data?

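    The answer is to debug the page's JavaScript.
    The full sign_in URL is https://www.zhihu.com/api/v3/oauth/sign_in. Nothing in the HTML of the Zhihu login page refers to this URL, so where do we look next?

    Since it is not in the HTML, search the JS files instead.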


    zhihu_find.png

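    Once the JS is located, the rest is straightforward: log in once with a breakpoint set and see what this JS does, paying close attention to what goes into the body.
    From the code, the body appears to be encrypted with zsEncrypt.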

    zhihu_body.png

    The figure above shows exactly what the post data contains before encryption.

    {
        captcha: "", // not always required
        client_id: "c3cef7c66a1843f8b3a9e6a1e3160e20", // fixed value
        grant_type: "password", // authentication type
        lang: "cn",
        password: "pwwwdddd", // the password entered
        ref_source: "homepage", // fixed value
        signature: "4e96133ef904d338a25a3733e0944562441b9daf", // value we need to construct
        source: "com.zhihu.web", // fixed value
        timestamp: 1551682889032, // timestamp in milliseconds
        username: "xxx@163.com",
        utm_source: ""
    }
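
    As a rough sketch, the plaintext fields above can be assembled into the form string that is later passed to the encryption routine. The helper name build_form_body is made up for illustration, the field order simply follows the string used in the ZhiHuSpider code in this article, and lang defaults to "en" as in that code (the captured request shows "cn"). Whether the server depends on the field order is not something the article establishes.

```python
from urllib.parse import quote, urlencode

# Fixed client_id observed in the captured request shown in the article.
CLIENT_ID = "c3cef7c66a1843f8b3a9e6a1e3160e20"


def build_form_body(username, password, timestamp, signature,
                    captcha="", lang="en"):
    """Return the plaintext, percent-encoded form body (before encryption).

    build_form_body is a hypothetical helper; only the field names and
    fixed values come from the article.
    """
    fields = [
        ("client_id", CLIENT_ID),
        ("grant_type", "password"),
        ("timestamp", timestamp),
        ("source", "com.zhihu.web"),
        ("signature", signature),
        ("username", username),
        ("password", password),
        ("captcha", captcha),
        ("lang", lang),
        ("ref_source", "homepage"),
        ("utm_source", ""),
    ]
    # quote_via=quote percent-encodes reserved characters such as "@"
    # and encodes spaces as %20 rather than "+".
    return urlencode(fields, quote_via=quote)


if __name__ == "__main__":
    print(build_form_body("xxx@163.com", "pwwwdddd", "1551682889032",
                          "4e96133ef904d338a25a3733e0944562441b9daf"))
```

    Building the body from a list of pairs keeps stray whitespace out of the data and makes sure characters such as "@" in the email address are percent-encoded.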
    

    The remaining task is to construct signature and timestamp.

    To find out where signature comes from, search the JS again and look at how it is built.


    zhihu_signature.png

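    As the figure shows, signature is computed from several of the fields above: grant_type, client_id, source and timestamp are concatenated and hashed with HMAC-SHA1 (see get_signature in the code below).
    The timestamp is simple to construct: str(int(time.time()*1000)), i.e. the current time in milliseconds.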
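
    The two steps can be sketched in isolation as follows. The HMAC key and the fixed field values are copied from the ZhiHuSpider code in this article; the function names make_timestamp and make_signature are illustrative only, and the key may well have changed on Zhihu's side since the article was written.

```python
import hmac
import time
from hashlib import sha1

# Key and fixed fields as they appear in the article's get_signature().
HMAC_KEY = b"d1b964811afb40118a12068ff74a12f4"
GRANT_TYPE = "password"
CLIENT_ID = "c3cef7c66a1843f8b3a9e6a1e3160e20"
SOURCE = "com.zhihu.web"


def make_timestamp():
    """Current time in milliseconds, as a decimal string."""
    return str(int(time.time() * 1000))


def make_signature(timestamp):
    """HMAC-SHA1 over grant_type + client_id + source + timestamp (hex)."""
    message = (GRANT_TYPE + CLIENT_ID + SOURCE + timestamp).encode("utf-8")
    return hmac.new(HMAC_KEY, message, digestmod=sha1).hexdigest()


if __name__ == "__main__":
    ts = make_timestamp()
    print(ts, make_signature(ts))
```

    The same timestamp string has to be used both in the signature and in the timestamp field of the post data; otherwise the two will not match.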

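    I did not extract the post-data encryption routine from the site myself; I borrowed a file that another developer shared online and saved it as zhihu.js.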

    ZhiHuSpider:

    import requests
    import re
    import execjs
    import time
    import hmac
    from hashlib import sha1
    import os
    import http.cookiejar
    
    class ZhiHuSpider(object):
    
        def __init__(self):
            self.session = requests.session()
            self.headers = {
                'content-type': 'application/x-www-form-urlencoded',
                'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36',
                'x-zse-83': '3_1.1'
            }
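            # LWPCookieJar stores cookies in the Set-Cookie3 (libwww-perl) file format,
            # whereas MozillaCookieJar stores them in the Mozilla cookies.txt format.
            self.session.cookies = http.cookiejar.LWPCookieJar("cookie")
            # If a cookie file already exists locally, there is no need to post the login data again.
            try:
                self.session.cookies.load(ignore_discard=True)
                print('Cookies loaded.')
            except IOError:
                print('Could not load cookies.')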
    
        def login(self, username, password):

            # Request login_url, udid_url and captcha_url to collect the cookies the login needs
            login_url = 'https://www.zhihu.com/signup?next=/'
            resp = self.session.get(login_url, headers=self.headers)
            print("Requested {}, status code: {}".format(login_url, resp.status_code))

            udid_url = 'https://www.zhihu.com/udid'
            resp = self.session.post(udid_url, headers=self.headers)
            print("Requested {}, status code: {}".format(udid_url, resp.status_code))

            captcha_url = 'https://www.zhihu.com/api/v3/oauth/captcha?lang=en'
            resp = self.session.get(captcha_url, headers=self.headers)
            print("Requested {}, status code: {}".format(captcha_url, resp.status_code))

            # Check whether a captcha is required; if so, give up (I have not run into this yet)
            if re.search('true', resp.text):
                raise SystemExit('Captcha required')

            # Build the signature; get_signature() reads the timestamp from self.time_str
            self.time_str = str(int(time.time() * 1000))
            signature = self.get_signature()
            # print(signature)

            # Assemble the string to encrypt. Adjacent literals are concatenated, so unlike a
            # backslash continuation inside one literal, no indentation leaks into the data.
            # username and password are percent-encoded (e.g. "@" becomes "%40").
            string = (
                "client_id=c3cef7c66a1843f8b3a9e6a1e3160e20"
                "&grant_type=password"
                "&timestamp={}"
                "&source=com.zhihu.web"
                "&signature={}"
                "&username={}"
                "&password={}"
                "&captcha="
                "&lang=en"
                "&ref_source=homepage"
                "&utm_source="
            ).format(self.time_str, signature,
                     requests.utils.quote(username, safe=''),
                     requests.utils.quote(password, safe=''))
            encrypt_string = self.encrypt(string)

            # POST to the sign-in endpoint
            post_url = "https://www.zhihu.com/api/v3/oauth/sign_in"
            resp = self.session.post(post_url, data=encrypt_string, headers=self.headers)
            print("Requested {}, status code: {}".format(post_url, resp.status_code))

            # Check whether the login succeeded
            if re.search('user_id', resp.text):
                print('Login succeeded')
                # Session cookies are only written to disk when ignore_discard=True
                self.session.cookies.save(ignore_discard=True)
            else:
                raise SystemExit('Login failed')
    
        def encrypt(self, string):
            # Load zhihu.js from the directory containing this script and call its encrypt()
            file_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'zhihu.js')
            with open(file_path, 'r', encoding='utf-8') as f:
                js = f.read()
            return execjs.compile(js).call('encrypt', string)
    
        def get_signature(self):
            # HMAC-SHA1 over grant_type + client_id + source + timestamp, returned as hex
            h = hmac.new(key='d1b964811afb40118a12068ff74a12f4'.encode('utf-8'), digestmod=sha1)
            grant_type = 'password'
            client_id = 'c3cef7c66a1843f8b3a9e6a1e3160e20'
            source = 'com.zhihu.web'
            # Millisecond timestamp string; must be the same value sent in the post data
            now = self.time_str
            h.update((grant_type + client_id + source + now).encode('utf-8'))
            return h.hexdigest()
        
        def isLogin(self):
            # Decide whether we are logged in by requesting the account settings page
            url = "https://www.zhihu.com/settings/account"
            # Disable redirects; otherwise a failed check is redirected to the home page, which also returns 200
            login_code = self.session.get(
                url, headers=self.headers, allow_redirects=False, verify=False).status_code
            return login_code == 200
    
           
    
    if __name__ == "__main__":
        spider = ZhiHuSpider()
        if spider.isLogin():
            print('You are already logged in.')
        else:
            username = input('Account: ')
            password = input('Password: ')
            spider.login(username, password)
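
    One detail of the cookie handling is worth illustrating. The constructor loads the cookie file with ignore_discard=True, and LWPCookieJar.save() likewise only writes session cookies (those flagged discard) when ignore_discard=True is passed. The sketch below shows that behaviour using only the standard library; the cookie name demo_token, its value and the domain are invented for the example and are not Zhihu's real cookies.

```python
import http.cookiejar
import os
import tempfile


def make_session_cookie(name, value, domain):
    """Build a session cookie (no expiry, discard=True) for the demo."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def saved_cookie_names(ignore_discard):
    """Save one session cookie, reload the file, return the names found."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cookie")
        jar = http.cookiejar.LWPCookieJar(path)
        jar.set_cookie(make_session_cookie("demo_token", "abc123", ".example.com"))
        jar.save(ignore_discard=ignore_discard)

        reloaded = http.cookiejar.LWPCookieJar(path)
        reloaded.load(ignore_discard=True)
        return [cookie.name for cookie in reloaded]


if __name__ == "__main__":
    print(saved_cookie_names(ignore_discard=False))
    print(saved_cookie_names(ignore_discard=True))
```

    If the login cookies turn out to be session cookies, a plain save() would leave the file without them and the next run would have to log in again.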
    

    zhihu.js

    // window object
    window={
        'encodeURIComponent':encodeURIComponent
    };
    // navigator object
    navigator = {
        'userAgent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36'
    }
    // atob function (Buffer.from replaces the deprecated new Buffer constructor in Node.js)
    function atob(e){
        return Buffer.from(e,'base64').toString('binary');
    }
    "use strict";
    function s(e) {
        return (s = "function" == typeof Symbol && "symbol" == typeof Symbol.t ? function(e) {
            return typeof e
        }
        : function(e) {
            return e && "function" == typeof Symbol && e.constructor === Symbol && e !== Symbol.prototype ? "symbol" : typeof e
        }
        )(e)
    }
    function i() {}
    function h(e) {
        this.s = (2048 & e) >> 11,
        this.i = (1536 & e) >> 9,
        this.h = 511 & e,
        this.A = 511 & e
    }
    function A(e) {
        this.i = (3072 & e) >> 10,
        this.A = 1023 & e
    }
    function n(e) {
        this.n = (3072 & e) >> 10,
        this.e = (768 & e) >> 8,
        this.a = (192 & e) >> 6,
        this.s = 63 & e
    }
    function e(e) {
        this.i = e >> 10 & 3,
        this.h = 1023 & e
    }
    function a() {}
    function c(e) {
        this.n = (3072 & e) >> 10,
        this.e = (768 & e) >> 8,
        this.a = (192 & e) >> 6,
        this.s = 63 & e
    }
    function o(e) {
        this.A = (4095 & e) >> 2,
        this.s = 3 & e
    }
    function r(e) {
        this.i = e >> 10 & 3,
        this.h = e >> 2 & 255,
        this.s = 3 & e
    }
    function k(e) {
        this.s = (4095 & e) >> 10,
        this.i = (1023 & e) >> 8,
        this.h = 1023 & e,
        this.A = 63 & e
    }
    function B(e) {
        this.s = (4095 & e) >> 10,
        this.n = (1023 & e) >> 8,
        this.e = (255 & e) >> 6
    }
    function f(e) {
        this.i = (3072 & e) >> 10,
        this.A = 1023 & e
    }
    function u(e) {
        this.A = 4095 & e
    }
    function C(e) {
        this.i = (3072 & e) >> 10
    }
    function b(e) {
        this.A = 4095 & e
    }
    function g(e) {
        this.s = (3840 & e) >> 8,
        this.i = (192 & e) >> 6,
        this.h = 63 & e
    }
    function G() {
        this.c = [0, 0, 0, 0],
        this.o = 0,
        this.r = [],
        this.k = [],
        this.B = [],
        this.f = [],
        this.u = [],
        this.C = !1,
        this.b = [],
        this.g = [],
        this.G = !1,
        this.Q = null,
        this.R = null,
        this.w = [],
        this.x = 0,
        this.D = {
            0: i,
            1: h,
            2: A,
            3: n,
            4: e,
            5: a,
            6: c,
            7: o,
            8: r,
            9: k,
            10: B,
            11: f,
            12: u,
            13: C,
            14: b,
            15: g
        }
    }
    // Object.defineProperty(exports, "__esModule", {
    //     value: !0
    // });
    var t = "1.1"
      , __g = {};
    i.prototype.M = function(e) {
        e.G = !1
    }
    ,
    h.prototype.M = function(e) {
        switch (this.s) {
        case 0:
            e.c[this.i] = this.h;
            break;
        case 1:
            e.c[this.i] = e.k[this.A]
        }
    }
    ,
    A.prototype.M = function(e) {
        e.k[this.A] = e.c[this.i]
    }
    ,
    n.prototype.M = function(e) {
        switch (this.s) {
        case 0:
            e.c[this.n] = e.c[this.e] + e.c[this.a];
            break;
        case 1:
            e.c[this.n] = e.c[this.e] - e.c[this.a];
            break;
        case 2:
            e.c[this.n] = e.c[this.e] * e.c[this.a];
            break;
        case 3:
            e.c[this.n] = e.c[this.e] / e.c[this.a];
            break;
        case 4:
            e.c[this.n] = e.c[this.e] % e.c[this.a];
            break;
        case 5:
            e.c[this.n] = e.c[this.e] == e.c[this.a];
            break;
        case 6:
            e.c[this.n] = e.c[this.e] >= e.c[this.a];
            break;
        case 7:
            e.c[this.n] = e.c[this.e] || e.c[this.a];
            break;
        case 8:
            e.c[this.n] = e.c[this.e] && e.c[this.a];
            break;
        case 9:
            e.c[this.n] = e.c[this.e] !== e.c[this.a];
            break;
        case 10:
            e.c[this.n] = s(e.c[this.e]);
            break;
        case 11:
            e.c[this.n] = e.c[this.e]in e.c[this.a];
            break;
        case 12:
            e.c[this.n] = e.c[this.e] > e.c[this.a];
            break;
        case 13:
            e.c[this.n] = -e.c[this.e];
            break;
        case 14:
            e.c[this.n] = e.c[this.e] < e.c[this.a];
            break;
        case 15:
            e.c[this.n] = e.c[this.e] & e.c[this.a];
            break;
        case 16:
            e.c[this.n] = e.c[this.e] ^ e.c[this.a];
            break;
        case 17:
            e.c[this.n] = e.c[this.e] << e.c[this.a];
            break;
        case 18:
            e.c[this.n] = e.c[this.e] >>> e.c[this.a];
            break;
        case 19:
            e.c[this.n] = e.c[this.e] | e.c[this.a]
        }
    }
    ,
    e.prototype.M = function(e) {
        e.r.push(e.o),
        e.B.push(e.k),
        e.o = e.c[this.i],
        e.k = [];
        for (var t = 0; t < this.h; t++)
            e.k.unshift(e.f.pop());
        e.u.push(e.f),
        e.f = []
    }
    ,
    a.prototype.M = function(e) {
        e.o = e.r.pop(),
        e.k = e.B.pop(),
        e.f = e.u.pop()
    }
    ,
    c.prototype.M = function(e) {
        switch (this.s) {
        case 0:
            e.C = e.c[this.n] >= e.c[this.e];
            break;
        case 1:
            e.C = e.c[this.n] <= e.c[this.e];
            break;
        case 2:
            e.C = e.c[this.n] > e.c[this.e];
            break;
        case 3:
            e.C = e.c[this.n] < e.c[this.e];
            break;
        case 4:
            e.C = e.c[this.n] == e.c[this.e];
            break;
        case 5:
            e.C = e.c[this.n] != e.c[this.e];
            break;
        case 6:
            e.C = e.c[this.n];
            break;
        case 7:
            e.C = !e.c[this.n]
        }
    }
    ,
    o.prototype.M = function(e) {
        switch (this.s) {
        case 0:
            e.o = this.A;
            break;
        case 1:
            e.C && (e.o = this.A);
            break;
        case 2:
            e.C || (e.o = this.A);
            break;
        case 3:
            e.o = this.A,
            e.Q = null
        }
        e.C = !1
    }
    ,
    r.prototype.M = function(e) {
        switch (this.s) {
        case 0:
            for (var t = [], n = 0; n < this.h; n++)
                t.unshift(e.f.pop());
            e.c[3] = e.c[this.i](t[0], t[1]);
            break;
        case 1:
            for (var r = e.f.pop(), o = [], i = 0; i < this.h; i++)
                o.unshift(e.f.pop());
            e.c[3] = e.c[this.i][r](o[0], o[1]);
            break;
        case 2:
            for (var a = [], c = 0; c < this.h; c++)
                a.unshift(e.f.pop());
            e.c[3] = new e.c[this.i](a[0],a[1])
        }
    }
    ,
    k.prototype.M = function(e) {
        switch (this.s) {
        case 0:
            e.f.push(e.c[this.i]);
            break;
        case 1:
            e.f.push(this.h);
            break;
        case 2:
            e.f.push(e.k[this.A]);
            break;
        case 3:
            e.f.push(e.g[this.A])
        }
    }
    ,
    B.prototype.M = function(t) {
        switch (this.s) {
        case 0:
            var s = t.f.pop();
            t.c[this.n] = t.c[this.e][s];
            break;
        case 1:
            var i = t.f.pop()
              , h = t.f.pop();
            t.c[this.e][i] = h;
            break;
        case 2:
            var A = t.f.pop();
            t.c[this.n] = eval(A)
        }
    }
    ,
    f.prototype.M = function(e) {
        e.c[this.i] = e.g[this.A]
    }
    ,
    u.prototype.M = function(e) {
        e.Q = this.A
    }
    ,
    C.prototype.M = function(e) {
        throw e.c[this.i]
    }
    ,
    b.prototype.M = function(e) {
        var t = this
          , n = [0];
        e.k.forEach(function(e) {
            n.push(e)
        });
        var r = function(r) {
            var o = new G;
            return o.k = n,
            o.k[0] = r,
            o.J(e.b, t.A, e.g, e.w),
            o.c[3]
        };
        r.toString = function() {
            return "() { [native code] }"
        }
        ,
        e.c[3] = r
    }
    ,
    g.prototype.M = function(e) {
        switch (this.s) {
        case 0:
            for (var t = {}, n = 0; n < this.h; n++) {
                var r = e.f.pop();
                t[e.f.pop()] = r
            }
            e.c[this.i] = t;
            break;
        case 1:
            for (var o = [], i = 0; i < this.h; i++)
                o.unshift(e.f.pop());
            e.c[this.i] = o
        }
    }
    ,
    G.prototype.v = function(e) {
        for (var t = atob(e), n = [], r = 0; r < t.length - 1; r += 2)
            n.push(t.charCodeAt(r) << 8 | t.charCodeAt(r + 1));
        this.b = n
    }
    ,
    G.prototype.y = function(e) {
        for (var t = atob(e), n = 66, r = [], o = 0; o < t.length; o++) {
            var i = 24 ^ t.charCodeAt(o) ^ n;
            r.push(String.fromCharCode(i)),
            n = i
        }
        return r.join("")
    }
    ,
    G.prototype.F = function(e) {
        var t = this;
        this.g = e.map(function(e) {
            return "string" == typeof e ? t.y(e) : e
        })
    }
    ,
    G.prototype.J = function(e, t, n) {
        for (t = t || 0,
        n = n || [],
        this.o = t,
        "string" == typeof e ? (this.F(n),
        this.v(e)) : (this.b = e,
        this.g = n),
        this.G = !0,
        this.x = Date.now(); this.G; ) {
            var r = this.b[this.o++];
            if ("number" != typeof r)
                break;
            var o = Date.now();
            if (500 < o - this.x)
                return;
            this.x = o;
            try {
                this.M(r)
            } catch (e) {
                if (this.R = e,
                !this.Q)
                    throw "execption at " + this.o + ": " + e;
                this.o = this.Q
            }
        }
    }
    ,
    G.prototype.M = function(e) {
        var t = (61440 & e) >> 12;
        new this.D[t](e).M(this)
    }
    ,
    "undefined" != typeof window && (new G).J("4AeTAJwAqACcAaQAAAAYAJAAnAKoAJwDgAWTACwAnAKoACACGAESOTRHkQAkAbAEIAMYAJwFoAASAzREJAQYBBIBNEVkBnCiGAC0BjRAJAAYBBICNEVkBnDGGAC0BzRAJACwCJAAnAmoAJwKoACcC4ABnAyMBRAAMwZgBnESsA0aADRAkQAkABgCnA6gABoCnA+hQDRHGAKcEKAAMQdgBnFasBEaADRAkQAkABgCnBKgABoCnBOhQDRHZAZxkrAUGgA0QJEAJAAYApwVoABgBnG6sBYaADRAkQAkABgCnBegAGAGceKwGBoANECRACQAnAmoAJwZoABgBnIOsBoaADRAkQAkABgCnBugABoCnByhQDRHZAZyRrAdGgA0QJEAJAAQACAFsB4gBhgAnAWgABIBNEEkBxgHEgA0RmQGdJoQCBoFFAE5gCgFFAQ5hDSCJAgYB5AAGACcH4AFGAEaCDRSEP8xDzMQIAkQCBoFFAE5gCgFFAQ5hDSCkQAkCBgBGgg0UhD/MQ+QACAIGAkaBxQBOYGSABoAnB+EBRoIN1AUCDmRNJMkCRAIGgUUATmAKAUUBDmENIKRACQIGAEaCDRSEP8xD5AAIAgYCRoHFAI5gZIAGgCcH4QFGgg3UBQQOZE0kyQJGAMaCRQ/OY+SABoGnCCEBTTAJAMYAxoJFAY5khI/Nk+RABoGnCCEBTTAJAMYAxoJFAw5khI/Nk+RABoGnCCEBTTAJAMYAxoJFBI5khI/Nk+RABoGnCCEBTTAJAMYBxIDNEEkB3JsHgNQAA==", 0, ["BRgg", "BSITFQkTERw=", "LQYfEhMA", "PxMVFBMZKB8DEjQaBQcZExMC", "", "NhETEQsE", "Whg=", "Wg==", "MhUcHRARDhg=", "NBcPBxYeDQMF", "Lx4ODys+GhMC", "LgM7OwAKDyk6Cg4=", "Mx8SGQUvMQ==", "SA==", "ORoVGCQgERcCAxo=", "BTcAERcCAxo=", "BRg3ABEXAgMaFAo=", "SQ==", "OA8LGBsP", "GC8LGBsP", "Tg==", "PxAcBQ==", "Tw==", "KRsJDgE=", "TA==", "LQofHg4DBwsP", "TQ==", "PhMaNCwZAxoUDQUeGQ==", "PhMaNCwZAxoUDQUeGTU0GQIeBRsYEQ8=", "Qg==", "BWpUGxkfGRsZFxkbGR8ZGxkHGRsZHxkbGRcZG1MbGR8ZGxkXGRFpGxkfGRsZFxkbGR8ZGxkHGRsZHxkbGRcZGw==", "ORMRCyk0Exk8LQ==", "ORMRCyst"]);
    var Q = function(e) {
        return __g._encrypt(e)
    };
    // exports.ENCRYPT_VERSION = t,
    // exports.default = Q
    // console.log(Q("client_id=c3cef7c66a1843f8b3a9e6a1e3160e20&grant_type=password&timestamp=1551062570616&source=com.zhihu.web&signature=e3ab73425750a4dbcf9ab357f6030fc281ceeb22&username=819201111%40qq.com&password=123456&captcha=&lang=en&ref_source=homepage&utm_source="))
    function encrypt(s){
        return Q(s);
    }
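
    To make the obfuscation in zhihu.js a little less opaque, here is a small Python port of two of its decoding helpers, written as a reading aid rather than a replacement for the JS. decode_string mirrors G.prototype.y (base64 decode, then a running XOR with the constants 24 and 66), decode_words mirrors G.prototype.v (pairs of bytes become 16-bit instruction words), and split_word mirrors the opcode extraction in G.prototype.M. The Python function names are my own.

```python
import base64


def decode_string(encoded):
    """Port of G.prototype.y: base64 decode, then a chained XOR."""
    prev = 66
    chars = []
    for byte in base64.b64decode(encoded):
        code = 24 ^ byte ^ prev
        chars.append(chr(code))
        prev = code  # each output character feeds into the next XOR
    return "".join(chars)


def decode_words(encoded):
    """Port of G.prototype.v: combine byte pairs into 16-bit words."""
    raw = base64.b64decode(encoded)
    return [raw[i] << 8 | raw[i + 1] for i in range(0, len(raw) - 1, 2)]


def split_word(word):
    """Top 4 bits select the handler in this.D; the low 12 bits are its operand bits."""
    return (word & 0xF000) >> 12, word & 0x0FFF


if __name__ == "__main__":
    # First two entries of the string table passed to J() in zhihu.js.
    print(decode_string("BRgg"), decode_string("BSITFQkTERw="))
    # First instruction word of the bytecode string.
    print(split_word(decode_words("4AeT")[0]))
```

    The first two string-table entries decode to __g and _encrypt, which matches the __g._encrypt(e) call that Q wraps. Each handler class (h, A, n, ...) then slices the low 12 bits of its instruction word in its own way.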
    
    
