A student uses Python to crack CAPTCHAs and simulate logging in to the academic affairs system to check grades and grab courses!

Author: 1a076099f916 | Source: published 2018-10-20 15:51, read 10 times



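Let's get straight to the point: at the end of every semester we have to scramble for courses, and the school's servers are terrible.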


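Let's start with a simple approach: capture the packets and keep re-sending the course selection request. It has one obvious drawback, though: the cookies expire easily.

Then you have to log in again and swap in the new cookies. That is what led to today's post.

It breaks down into roughly two parts: 1. logging in to the academic affairs system automatically;

2. checking grades and grabbing courses.

Since some readers have no image-processing experience, I will hand out the trained model directly. Today I'll cover the first part, automatic login.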

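Automatic login to the academic affairs system

Automatic login has two main problems to solve: **1. recognizing the CAPTCHA; 2. obtaining the cookies**

CAPTCHA recognition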

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">1. To recognize the CAPTCHA, you first need to know its URL so that you can download it. The URL is: http://{}/CheckCode.aspx
</pre>

Here {} stands for the domain name or IP address of the server.

2. Download the CAPTCHA. The code for this is very simple and is given later in the post. The downloaded CAPTCHA looks like this:

[Figure: the downloaded CAPTCHA image]
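
As a rough sketch of steps 1 and 2, the snippet below fills a host into the URL template and saves the image to disk. The host `jwc.example.edu`, the helper names, and the use of the standard library's `urllib.request` are assumptions for illustration only; the article's own download code, shown later, uses `requests`.

```python
from urllib.request import urlopen

CAPTCHA_URL = "http://{}/CheckCode.aspx"  # URL template from the article


def build_captcha_url(host):
    """Fill the domain name or IP address into the URL template."""
    return CAPTCHA_URL.format(host)


def download_captcha(host, path="yzm.png"):
    """Fetch the CAPTCHA image and write the raw bytes to `path`."""
    with urlopen(build_captcha_url(host), timeout=10) as resp:
        data = resp.read()
    with open(path, "wb") as f:
        f.write(data)
    return path


if __name__ == "__main__":
    # Hypothetical host; replace it with the real server before downloading.
    print(build_captcha_url("jwc.example.edu"))
```

Note that this plain download discards the cookies that come with the image, which is why the login code later in the post fetches the CAPTCHA through a dedicated class instead.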

3. The next step is the important one: apply digital image processing to the CAPTCHA and cut the letters out of the picture one by one. How?

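3.1 First, notice that all the characters in the image are blue, so we can extract the B channel from the three RGB channels. The result:

[Figure: B channel of the CAPTCHA]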
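
To make the channel extraction concrete, here is a dependency-free sketch that treats an image as rows of (R, G, B) tuples and keeps only the B value. The pixel values and the function name `blue_channel` are made up for illustration; the article's own code later does the same thing with Pillow's `Image.split()`.

```python
def blue_channel(pixels):
    """Return the B value of every (R, G, B) pixel, row by row."""
    return [[b for (_r, _g, b) in row] for row in pixels]


# Tiny made-up image: dark-blue "ink" next to white background pixels.
sample = [
    [(20, 30, 150), (255, 255, 255)],
    [(255, 255, 255), (20, 30, 150)],
]
blue = blue_channel(sample)
print(blue)
```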

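3.2 Notice the small dots in the picture? In digital image processing this is called salt-and-pepper noise, and the usual remedy is a median filter. You could also write a better-suited algorithm of your own based on the characteristics of the B-channel image. I was short on time, so I simply applied a median filter followed by image enhancement. The result:

[Figure: CAPTCHA after median filtering and enhancement]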
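
The idea behind the median filter can be sketched without any library: each pixel is replaced by the median of its neighborhood, so an isolated dot is outvoted by its neighbors. This is only an illustration under my own assumptions (a list-of-lists grayscale image, windows clipped at the border, the name `median_filter`); the article's code uses Pillow's `ImageFilter.MedianFilter()` instead.

```python
def median_filter(img, size=3):
    """Replace every pixel with the median of its size x size neighborhood."""
    height, width = len(img), len(img[0])
    radius = size // 2
    out = [row[:] for row in img]
    for y in range(height):
        for x in range(width):
            # Collect the neighborhood, clipped at the image border.
            window = sorted(
                img[yy][xx]
                for yy in range(max(0, y - radius), min(height, y + radius + 1))
                for xx in range(max(0, x - radius), min(width, x + radius + 1))
            )
            out[y][x] = window[len(window) // 2]
    return out


# A light 5x5 patch with a single dark noise dot in the middle.
noisy = [[200] * 5 for _ in range(5)]
noisy[2][2] = 0
cleaned = median_filter(noisy)
print(cleaned[2][2])
```

A single outlier never reaches the middle of the sorted window, so the dot disappears while larger structures such as character strokes mostly survive.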

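3.3 One more remark: I think a hand-written algorithm should do much better than this generic approach, and I strongly encourage anyone capable to write one; there is probably code for it on GitHub or in blogs already. Starting from the image above, binarize it: first look at its gray-level histogram and settle on a threshold of 160 (the threshold can of course be tuned), then binarize.

[Figure: gray-level histogram]

[Figure: image after binarization]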
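
The histogram and thresholding step can be sketched in a few lines of plain Python. The threshold of 160 comes from the article; the sample values and the function names `gray_histogram` and `binarize` are assumptions made for this example.

```python
def gray_histogram(img):
    """Count how many pixels take each gray level from 0 to 255."""
    counts = [0] * 256
    for row in img:
        for value in row:
            counts[value] += 1
    return counts


def binarize(img, threshold=160):
    """Map each pixel to True (brighter than threshold) or False."""
    return [[value > threshold for value in row] for row in img]


sample = [
    [150, 255],
    [160, 161],
]
hist = gray_histogram(sample)
binary = binarize(sample)
print(binary)
```

In a real histogram the character pixels and the background pixels form two separate peaks, and the threshold is picked somewhere in the valley between them.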

3.4 Split the image into individual characters, using the split positions [5,16,29,38,53]

[Figure: segmentation result]

[Figure: all characters segmented]
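
Splitting at fixed x positions amounts to pairing each boundary with the next one and slicing every row between them. The sketch below uses the split positions quoted above on a made-up image; the function name `split_columns` and the test image are assumptions for illustration.

```python
def split_columns(img, boundaries):
    """Cut the image into vertical strips between consecutive boundaries."""
    return [
        [row[x_min:x_max] for row in img]
        for x_min, x_max in zip(boundaries[:-1], boundaries[1:])
    ]


# Made-up 2-row image, 60 pixels wide; each pixel stores its own x coordinate.
image = [list(range(60)) for _ in range(2)]
pieces = split_columns(image, [5, 16, 29, 38, 53])
widths = [len(piece[0]) for piece in pieces]
print(widths)
```

Five boundaries yield four strips, one per character. A classifier that expects fixed-length inputs needs strips of equal width, so the boundaries may have to be adjusted accordingly.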

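3.5 Train a model on the segmented characters and the correct CAPTCHA text using KNN, a machine learning algorithm. Other algorithms such as an RNN would work too. I did not measure precision or any other metric; the accuracy feels like roughly 80%. My little laptop can't handle much data, so please bear with me!

Of course, you can train with other algorithms and compare their performance. Save the trained model so it can be used for later predictions. Don't worry, the trained result is provided as an attachment; you can use it as-is.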
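
For readers who have not met KNN before, the following is a minimal sketch of the idea: a new character image gets the label that is most common among its k closest training examples. It is not the author's training code (which uses sklearn); the toy data, the function name `knn_predict`, and the choice of k=3 are assumptions for illustration.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training samples."""
    # Squared Euclidean distance; for 0/1 pixels this counts differing pixels.
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(sample, x)), label)
        for sample, label in zip(train_X, train_y)
    )
    votes = Counter(label for _distance, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy "images" of 4 binary pixels each, with made-up labels.
train_X = [
    [1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0],
    [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1],
]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, [1, 1, 0, 1]))
print(knn_predict(train_X, train_y, [0, 0, 1, 1]))
```

Training a KNN model is little more than storing the labeled samples, which is why the saved model can be reused directly for prediction.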

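There is no code above, but the approach should be clear. My code is a bit messy and I'm a little embarrassed to post it. If you want it, leave a comment under the post; if enough people ask, I will tidy it up and publish it. Oh, I almost forgot: the libraries used are numpy, PIL (on Python 2.7.x; the Python 3 counterpart should be Pillow, if I remember correctly), sklearn, and matplotlib.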

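Automatic login

Before writing the code, let's take a quick look at the project structure

[Figure: project directory structure]

To explain: cache holds the CAPTCHA images; ImageIdentification holds the image processing and recognition functions; model holds the trained model; network holds the classes for network requests. I used a little object-oriented design along the way, so I factored out a config class. Analysis of the Fangzheng (方正) system shows that the page hides an important value called __VIEWSTATE, which is needed in every request, so the config code is as follows: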

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># encoding=utf-8
import requests
from bs4 import BeautifulSoup

try:
    from urllib import quote_plus  # Python 2
except ImportError:
    from urllib.parse import quote_plus  # Python 3


class config(object):
    '''
    Base class shared by the request classes.
    '''
    def __init__(self, url):
        # Domain name or IP address of the academic affairs server
        self.ip = "xxx.xxx.xxx.xxx"
        self.url = url.format(self.ip)
        self.headers = {
            # This value is truncated ("…") in the source; restore it before use
            'Accept': 'text/html,application/xhtml+xm…plication/xml;q=0.9,*/*;q=0.8',
            'Accept-Encoding': 'gzip, deflate',
            'Cache-Control': 'no-cache',
            'Connection': 'keep-alive',
            'Content-Type': 'application/x-www-form-urlencoded',
            'Host': self.ip,
            'Pragma': 'no-cache',
            'Referer': self.url,
            'Upgrade-Insecure-Requests': '1',
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0'
        }

    def getVaules(self, cookies=None):
        # Fetch the page and locate the hidden __VIEWSTATE field
        html = requests.get(self.url, headers=self.headers, cookies=cookies)
        soup = BeautifulSoup(html.content, 'html.parser', from_encoding='gbk')
        view_state = soup.find('input', {'type': 'hidden', 'name': '__VIEWSTATE'})
        # URL-encode the value so it can be placed in a form body
        return quote_plus(view_state.get('value'))
</pre>

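Logging in obviously requires cookies. So how do you obtain cookies that won't make the CAPTCHA check fail? It's simple: when fetching the CAPTCHA, grab the cookies that come with it, and send those same cookies when logging in. The code is as follows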

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># encoding=utf-8
from BaseClass import config
import requests


class cookie(config):
    def getcookies(self):
        # self.url is the CAPTCHA URL here; the response carries the cookies
        content = requests.get(self.url, headers=self.headers)
        # Save the CAPTCHA image for the recognition step
        with open('./cache/yzm.png', 'wb') as f:
            f.write(content.content)
        # Reuse these cookies when logging in so the CAPTCHA matches
        return content.cookies
</pre>

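After all that groundwork, it's finally time to write the login. The first step, of course, is to capture the login request. The captured request parameters are: __VIEWSTATE={}&txtUserName={}&Textbox1=&TextBox2={}&txtSecretCode={}&RadioButtonList1=%D1%A7%C9%FA&Button1=&lbLanguage=&hidPdrs=&hidsc=

__VIEWSTATE: this parameter is already present in the page; it only needs to be parsed out.

txtUserName: the student ID

TextBox2: the password

txtSecretCode: the CAPTCHA text

RadioButtonList1: decoded (GBK encoding), this value means 学生 (student).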
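
To show where `%D1%A7%C9%FA` comes from, here is a small sketch that builds the same form body with the standard library's `urlencode`, percent-encoding every value as GBK. The function name `build_login_body` and the sample values are hypothetical. Unlike the article's template, this version takes the raw `__VIEWSTATE` value and encodes it itself.

```python
from urllib.parse import urlencode, unquote


def build_login_body(view_state, student_id, password, captcha):
    """Build the login form body, percent-encoding values as GBK."""
    fields = [
        ("__VIEWSTATE", view_state),
        ("txtUserName", student_id),
        ("Textbox1", ""),
        ("TextBox2", password),
        ("txtSecretCode", captcha),
        ("RadioButtonList1", "学生"),  # "student"
        ("Button1", ""),
        ("lbLanguage", ""),
        ("hidPdrs", ""),
        ("hidsc", ""),
    ]
    return urlencode(fields, encoding="gbk")


# Made-up values, only to show the shape of the result.
body = build_login_body("dDw+/=", "20180001", "secret", "ab12")
print(body)
print(unquote("%D1%A7%C9%FA", encoding="gbk"))
```

Building the body from a list of pairs keeps the parameter order of the captured request and avoids hand-escaping any value.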

The code is as follows

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># encoding=utf-8
import requests
from BaseClass import config
from bs4 import BeautifulSoup


class login(config):
    def __init__(self, url):
        # Form body template; RadioButtonList1 is "student" encoded as GBK
        self.data = '''__VIEWSTATE={}&txtUserName={}&Textbox1=&TextBox2={}&txtSecretCode={}&RadioButtonList1=%D1%A7%C9%FA&Button1=&lbLanguage=&hidPdrs=&hidsc='''
        super(login, self).__init__(url)

    def login(self, xh, pwd, yzm, cookies):
        # xh: student ID, pwd: password, yzm: recognized CAPTCHA text
        try:
            data = self.data.format(self.getVaules(cookies=cookies), xh, pwd, yzm)
            s = requests.session()
            s.cookies = cookies
            con = s.post(self.url, headers=self.headers, data=data)
            if con.url == self.url:
                # Still on the login page, so the login was rejected
                print("The CAPTCHA may be wrong! Please log in again")
                print(con.content.decode('gbk'))
                return False, None, None, None
            else:
                soup = BeautifulSoup(con.content, 'html.parser', from_encoding='gbk')
                # Drop the last two characters of the span text (the original
                # bound, len(tag) - 3, equals -2 for a span with one text node)
                xm = soup.find('span', attrs={'id': 'xhxm'}).text[:-2]
                print(xm)
                return True, cookies, xh, xm
        except Exception as e:
            print(e.args)
            print(e.__doc__)
            print(u"An error occurred! Please log in again")
            return False, None, None, None
</pre>

Image processing code

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># encoding=utf-8
from PIL import Image, ImageFilter


def process():
    file_path = './cache/yzm.png'
    # x coordinates that separate the four characters
    split_lines = [5, 17, 29, 41, 53]
    img = Image.open(file_path)
    img = img.convert('RGB')
    # Keep only the blue channel
    r, g, b = img.split()
    # Median filter to suppress the salt-and-pepper noise
    image_b_median = b.filter(ImageFilter.MedianFilter())
    # Binarize with a threshold of 160
    image_b_median_binary = image_b_median.point(lambda i: i > 160, mode='1')
    # Crop each character (22 px high) and save it as yzm-1.png ... yzm-4.png
    for c, (x_min, x_max) in enumerate(zip(split_lines[:-1], split_lines[1:]), start=1):
        image_b_median_binary.crop((x_min, 0, x_max, 22)).save('./cache/yzm-{}.png'.format(c))
    print(u"Image processing finished")
</pre>

Prediction code

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># encoding=utf-8
try:
    import joblib  # standalone package, used with recent scikit-learn
except ImportError:
    from sklearn.externals import joblib  # older scikit-learn releases
from PIL import Image
import numpy as np


def predict():
    photo_path = './cache/yzm-{}.png'
    X = []
    # Load the four character images written by process()
    for i in range(1, 5):
        img = Image.open(photo_path.format(i))
        # Flatten the binary image into a 1-D boolean feature vector
        X.append(np.array(img, dtype=bool).ravel())
    # Load the trained KNN model and predict one label per character
    file_path = './model/knn.pkl'
    knn = joblib.load(file_path)
    ls = knn.predict(X)
    return "".join(ls)</pre>
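
Since the recognizer is only right about 80% of the time by the estimate above, one plausible way to tie the pieces together is a small retry loop that fetches a fresh CAPTCHA on every attempt. This is a sketch under my own assumptions, not the author's code: `login_with_retries` and its three callables are hypothetical stand-ins for the cookie, image processing/prediction, and login steps, and the demo uses fake functions instead of real requests.

```python
def login_with_retries(fetch_captcha, recognize, submit, max_attempts=5):
    """Try to log in up to max_attempts times.

    Returns the number of the successful attempt, or None if all fail.
    """
    for attempt in range(1, max_attempts + 1):
        cookies = fetch_captcha()  # download a new CAPTCHA, keep its cookies
        code = recognize()         # process the image and predict its text
        if submit(code, cookies):  # log in with the matching cookies
            return attempt
    return None


# Fake steps for demonstration: the second guess is the right one.
guesses = iter(["x1y2", "ab12"])
attempts = login_with_retries(
    fetch_captcha=lambda: {"session": "fake"},
    recognize=lambda: next(guesses),
    submit=lambda code, cookies: code == "ab12",
)
print(attempts)
```

Capping the number of attempts keeps a misbehaving recognizer from hammering the server.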
