Getting around Maoyan's dynamically loaded fonts breaks down into the following steps.
Preparation: fetch the page content:
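import re
import requests
from fontTools.ttLib import TTFont
from lxml import etree

url = 'https://maoyan.com/cinema/2158?poi=1425678&movieId=1212'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
}
# Fetch the Maoyan page. headers must be passed by keyword:
# the second positional argument of requests.get() is params.
response = requests.get(url, headers=headers).content.decode()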
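1. Download the page's font file and save it as the reference font base.woff. The code below only needs to run once and can be commented out afterwards.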
# Extract the font file URL from the page
woff_url = 'http:' + re.findall(r"//vfile\.meituan\.net/colorstone/.*?\.woff", response)[0]
# Download the reference font; open it in the Baidu font editor (http://fontstore.baidu.com/static/editor/) to obtain the mapping between basenumlist and baseunicode
font_response = requests.get(woff_url, headers=headers).content
with open('base.woff', 'wb') as f:
    f.write(font_response)
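The URL extraction can be tried offline on a small HTML fragment. The sketch below is an illustration only: the helper name `extract_woff_url`, the sample fragment, and the file names in it are hypothetical, and the pattern assumes the font file name consists of word characters. Limiting the match to `\w+` keeps it from running across a neighbouring URL on the same line, which a lazy `.*?` could do.

```python
import re

# Restrict the file name to word characters so the match cannot
# run across another URL that appears earlier on the same line.
WOFF_PATTERN = re.compile(r"//vfile\.meituan\.net/colorstone/\w+\.woff")


def extract_woff_url(html):
    """Return the absolute .woff URL found in html, or None if there is none."""
    match = WOFF_PATTERN.search(html)
    return 'http:' + match.group(0) if match else None


# Hypothetical fragment of an @font-face rule, for illustration only.
sample_html = (
    "src: url('//vfile.meituan.net/colorstone/aaa111.eot'); "
    "src: url('//vfile.meituan.net/colorstone/bbb222.woff') format('woff');"
)
print(extract_woff_url(sample_html))
```

Returning None instead of indexing `[0]` lets the caller decide what to do when the page contains no font URL, for example when the request was blocked.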
2. Open base.woff in the Baidu font editor (http://fontstore.baidu.com/static/editor/) to read off which digit each glyph code in the reference font draws, and record the result as basenumlist and baseunicode.
# basenumlist and baseunicode correspond one to one, by position
# basenumlist = ['6', '3', '1', '7', '5', '9', '0', '4', '8', '2']
# baseunicode = ['uniF8C5', 'uniF216', 'uniE7C4', 'uniE5B0', 'uniE8BD', 'uniF2FD', 'uniF087', 'uniF0C6', 'uniF869',
# 'uniEB10']
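Since the two lists line up by position, they can also be zipped into a single lookup table. This is a small sketch using the values listed above; the name `base_digits` is introduced here for illustration and does not appear in the original code.

```python
# Values copied from the lists read off the reference font.
basenumlist = ['6', '3', '1', '7', '5', '9', '0', '4', '8', '2']
baseunicode = ['uniF8C5', 'uniF216', 'uniE7C4', 'uniE5B0', 'uniE8BD',
               'uniF2FD', 'uniF087', 'uniF0C6', 'uniF869', 'uniEB10']

# zip() pairs the lists by position, so each glyph name maps to its digit.
base_digits = dict(zip(baseunicode, basenumlist))
print(base_digits['uniF8C5'])
```

A dictionary makes the pairing explicit and avoids having to keep two index-aligned lists in sync by hand.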
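3. Because the font is loaded dynamically, the font file changes on every visit to a Maoyan page. Request the page again and download the font file it currently references as maoyan.woff.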
# Extract the font file URL from the page
woff_url = 'http:' + re.findall(r"//vfile\.meituan\.net/colorstone/.*?\.woff", response)[0]
# Download the dynamic font file
font_response = requests.get(woff_url, headers=headers).content
with open('maoyan.woff', 'wb') as f:
    f.write(font_response)
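4. Compare the current font file with the reference font to find out which digit each of its glyphs draws, then pair those digits with the font's glyph codes to build a dictionary.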
# Decode the font
def modify_maoyan():
    basefont = TTFont('base.woff')
    maoyanfont = TTFont('maoyan.woff')
    # GlyphOrder holds 12 names; the first two are not needed
    unilist = maoyanfont.getGlyphOrder()
    # unilist = maoyanfont['cmap'].tables[0].ttFont.getGlyphOrder()
    print(unilist)
    numlist = []
    basenumlist = ['6', '3', '1', '7', '5', '9', '0', '4', '8', '2']
    baseunicode = ['uniF8C5', 'uniF216', 'uniE7C4', 'uniE5B0', 'uniE8BD', 'uniF2FD', 'uniF087', 'uniF0C6', 'uniF869', 'uniEB10']
    # Work out which digit each glyph of maoyanfont draws, skipping the first two entries
    for i in range(2, 12):
        maoyangly = maoyanfont['glyf'][unilist[i]]
        for j in range(10):
            basegly = basefont['glyf'][baseunicode[j]]
            # Compare the glyph outlines of the two fonts
            if maoyangly == basegly:
                numlist.append(basenumlist[j])
                break
    # Build the HTML entities as they appear in the page source
    rowlist = []
    for i in unilist[2:]:
        i = i.replace('uni', '&#x').lower() + ';'
        rowlist.append(i)
    # Pair each entity with its digit
    mapping = dict(zip(rowlist, numlist))
    print(mapping)
    return mapping
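The matching logic can be exercised without any font files. The sketch below is a hedged illustration, not the original implementation: `build_digit_map` and `glyph_name_to_entity` are hypothetical helpers, and plain tuples of points stand in for the glyph objects that fontTools would return. It writes each match straight into the dictionary, so a glyph with no counterpart in the reference font is left out instead of shifting the pairing of all the glyphs after it.

```python
def glyph_name_to_entity(name):
    """Turn a glyph name such as 'uniF8C5' into the entity '&#xf8c5;'."""
    return name.replace('uni', '&#x').lower() + ';'


def build_digit_map(page_outlines, base_outlines, base_digits):
    """Map each page glyph's entity to the digit of the equal reference outline.

    page_outlines: {glyph name: outline} for the freshly downloaded font
    base_outlines: {glyph name: outline} for the reference font
    base_digits:   {glyph name: digit} read off the reference font by hand
    """
    digit_map = {}
    for page_name, outline in page_outlines.items():
        for base_name, base_outline in base_outlines.items():
            if outline == base_outline:
                digit_map[glyph_name_to_entity(page_name)] = base_digits[base_name]
                break  # stop at the first matching reference glyph
    return digit_map


# Toy outlines (tuples of points) standing in for real glyph objects.
base_outlines = {'uniF8C5': ((0, 0), (1, 2)), 'uniF216': ((0, 0), (3, 1))}
base_digits = {'uniF8C5': '6', 'uniF216': '3'}
page_outlines = {
    'uniE123': ((0, 0), (3, 1)),  # same outline as the reference glyph for '3'
    'uniE456': ((0, 0), (1, 2)),  # same outline as the reference glyph for '6'
    'uniE789': ((9, 9),),         # no counterpart in the reference font
}
digit_map = build_digit_map(page_outlines, base_outlines, base_digits)
print(digit_map)
```

Note that this approach, like the function above, relies on outlines being exactly equal between the two fonts; if the site perturbs the coordinates slightly, an approximate comparison would be needed instead.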
5. Replace the obfuscated entities in the fetched page, then parse the page to extract the correct data.
# Replace the obfuscated entities in the response
for key, value in modify_maoyan().items():
    if key in response:
        response = response.replace(key, value)
# Extract the decoded numbers from the page
maoyan_html = etree.HTML(response)
luanma_data = maoyan_html.xpath('//td/span/span/text()')
print(luanma_data)
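The replacement step on its own can be checked against a short string. In this sketch the helper `decode_digits`, the sample markup, and the mapping values are all made up for illustration; a real mapping would come from modify_maoyan().

```python
def decode_digits(html, digit_map):
    """Replace every obfuscated entity in html with the digit it stands for."""
    for entity, digit in digit_map.items():
        html = html.replace(entity, digit)
    return html


# Hypothetical markup and mapping, for illustration only.
sample = '<span>&#xe123;&#xe456;.&#xe123;</span>'
digit_map = {'&#xe123;': '3', '&#xe456;': '6'}
decoded = decode_digits(sample, digit_map)
print(decoded)
```

The replacement has to happen on the raw response text, before parsing, because an HTML parser would otherwise convert the entities into private-use characters that no longer match the dictionary keys.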
6. Results: running the script prints the glyph order of maoyan.woff, the entity-to-digit dictionary, and finally the list of decoded numbers extracted from the page.