Python3 中的默认编码是 UTF-8,这给大家写 Python 代码带来了很大的便利,不用再像 Python2.x 那样为数据编码操碎了心。但是,由于全面转向 UTF-8 编码,Python3 里面会有一些小细节,稍有不慎容易栽坑。本文就对二进制数据 XOR 编码这一种操作,浅析 Py2/Py3 中默认编码相关的一个细节小差异而引起的小 Bug。
XOR 编码是最简单有效的编码方法之一,虽然简单,但仍然应用广泛。在分析恶意样本时,经常会遇到样本内置的隐秘数据或者网络通信数据,用到了 XOR 编码。比如,一个典型就是 XOR.DDoS 家族,它样本内部关键字符串全用 XOR 编码过,而且其网络通信中 Bot 发给 C2 的上线数据包和 C2 给 Bot 下发的控制指令数据包中均涉及 XOR 编码/解码操作。
对于这类样本,分析的时候我们不免要写一些自动化的解析脚本,把其中的编码数据还原成名文以便分析。在其他开发场景中也偶尔会用 Python 写一些 XOR 编码/解码的程序。网上一搜 「Python XOR 编码 加密」或者「Python XOR encoding crypt」,都会搜出很多别人发出来的 Python XOR 编解码脚本,大多数情况下拿来直接用就行。比如我搜来的几个中文帖子中的相关脚本(本人不保证下面截图里代码的正确性):
Python学习交流群:1004391443,这里有资源共享,技术解答,还有小编从最基础的Python资料到项目实战的学习资料都有整理,希望能帮助你更了解python,学习python
<tt-image data-tteditor-tag="tteditorTag" contenteditable="false" class="syl1558939540091" data-render-status="finished" data-syl-blot="image" style="box-sizing: border-box; cursor: text; color: rgb(34, 34, 34); font-family: "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "WenQuanYi Micro Hei", "Helvetica Neue", Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: block;"> image <input class="pgc-img-caption-ipt" placeholder="图片描述(最多50字)" value="" style="box-sizing: border-box; outline: 0px; color: rgb(102, 102, 102); position: absolute; left: 187.5px; transform: translateX(-50%); padding: 6px 7px; max-width: 100%; width: 375px; text-align: center; cursor: text; font-size: 12px; line-height: 1.5; background-color: rgb(255, 255, 255); background-image: none; border: 0px solid rgb(217, 217, 217); border-radius: 4px; transition: all 0.2s cubic-bezier(0.645, 0.045, 0.355, 1) 0s;"></tt-image> <tt-image data-tteditor-tag="tteditorTag" contenteditable="false" class="syl1558939540096" data-render-status="finished" data-syl-blot="image" style="box-sizing: border-box; cursor: text; color: rgb(34, 34, 34); font-family: "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "WenQuanYi Micro Hei", "Helvetica Neue", Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: block;"> image <input class="pgc-img-caption-ipt" placeholder="图片描述(最多50字)" value="" style="box-sizing: border-box; outline: 0px; color: rgb(102, 102, 102); position: absolute; left: 187.5px; transform: translateX(-50%); padding: 6px 7px; max-width: 100%; width: 375px; text-align: center; cursor: text; font-size: 12px; line-height: 1.5; background-color: rgb(255, 255, 255); background-image: none; border: 0px solid rgb(217, 217, 217); border-radius: 4px; transition: all 0.2s cubic-bezier(0.645, 0.045, 0.355, 1) 0s;"></tt-image> <tt-image data-tteditor-tag="tteditorTag" contenteditable="false" class="syl1558939540100" data-render-status="finished" data-syl-blot="image" style="box-sizing: border-box; cursor: text; color: rgb(34, 34, 34); font-family: "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "WenQuanYi Micro Hei", "Helvetica Neue", Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: block;"> image <input class="pgc-img-caption-ipt" placeholder="图片描述(最多50字)" value="" style="box-sizing: border-box; outline: 0px; color: rgb(102, 102, 102); position: absolute; left: 187.5px; transform: translateX(-50%); padding: 6px 7px; max-width: 100%; width: 375px; text-align: center; cursor: text; font-size: 12px; line-height: 1.5; background-color: rgb(255, 255, 255); background-image: none; border: 0px solid rgb(217, 217, 217); border-radius: 4px; transition: all 0.2s cubic-bezier(0.645, 0.045, 0.355, 1) 0s;"></tt-image> <tt-image data-tteditor-tag="tteditorTag" contenteditable="false" class="syl1558939540103" data-render-status="finished" data-syl-blot="image" style="box-sizing: border-box; cursor: text; color: rgb(34, 34, 34); font-family: "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "WenQuanYi Micro Hei", "Helvetica Neue", Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: block;"> image<input class="pgc-img-caption-ipt" placeholder="图片描述(最多50字)" value="" style="box-sizing: border-box; outline: 0px; color: rgb(102, 102, 102); position: absolute; left: 187.5px; transform: translateX(-50%); padding: 6px 7px; max-width: 100%; width: 375px; text-align: center; cursor: text; font-size: 12px; line-height: 1.5; background-color: rgb(255, 255, 255); background-image: none; border: 0px solid rgb(217, 217, 217); border-radius: 4px; transition: all 0.2s cubic-bezier(0.645, 0.045, 0.355, 1) 0s;"></tt-image>
这些脚本,在 Python2 环境下都没有问题,都可以正确进行 XOR 编解码,然而如果直接拿到 Python3 环境下去运行,却会发生一个不容易发现的小 Bug。来看一段在 ipthon3 里的操作记录:
<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">In [1]: def xor_crypt(data, key):
...: cipher_data = []
...: len_data = len(data)
...: len_key = len(key)
...: for idx in range(len_data):
...: bias = key[idx % len_key]
...: curr_byte = data[idx]
...: cipher_data.append(chr(bias ^ curr_byte))
...: return bytearray("".join(cipher_data).encode())
...:
In [2]: xor_key = b'0123456789'
In [3]: sam1 = b'abcdefgh'
In [4]: sam2 = b'abcdefghijklmnopqrstuvwxyz'
In [5]: print(xor_crypt(sam1,xor_key))
bytearray(b'QSQWQSQ_')
In [6]: print(xor_crypt(xor_crypt(sam1,xor_key), xor_key))
bytearray(b'abcdefgh')
In [7]: print(xor_crypt(sam2, xor_key))
bytearray(b'QSQWQSQ_QS[]_][EGEKMEGEKMO')
In [8]: print(xor_crypt(xor_crypt(sam2,xor_key), xor_key))
bytearray(b'abcdefghijklmnopqrstuvwxyz')
In [9]: sam3 = b'\x7f\x80\x81\x90\x91\xA0\xA1\xB0\xB1\xC0\xC1\xD0\xD1\xE0\xE1\xF0\xF1\xFA'
In [10]: print(xor_crypt(sam3, xor_key))
bytearray(b'O\xc2\xb1\xc2\xb3\xc2\xa3\xc2\xa5\xc2\x95\xc2\x97\xc2\x87\xc2\x89\xc3\xb9\xc3\xb1\xc3\xa1\xc3\xa3\xc3\x93\xc3\x95\xc3\x85\xc3\x87\xc3\x8d')
In [11]: print(xor_crypt(xor_crypt(sam3,xor_key), xor_key))
bytearray(b'\x7f\xc3\xb3\xc2\x83\xc3\xb1\xc2\x87\xc3\xb7\xc2\x95\xc3\xb5\xc2\x9d\xc3\xbb\xc2\xa5\xc3\xb3\xc2\xa5\xc3\xb1\xc2\xb3\xc3\xb7\xc2\xbf\xc3\xb4\xc2\x81\xc3\xba\xc2\x81\xc3\xb2\xc2\x93\xc3\xb0\xc2\x97\xc3\xb6\xc2\xa5\xc3\xb4\xc2\xad\xc3\xba\xc2\xb5\xc3\xb2\xc2\xb5\xc3\xb0\xc2\xb9')
In [12]: print(len(sam3))
18
In [13]: print(len(xor_crypt(xor_crypt(sam3,xor_key), xor_key)))
69
</pre>
可以看到,仿照 Python2 环境下那些常用的 XOR 编码操作写的函数,在 Python3 环境下,偶尔会出现意料之外的结果:上面的操作记录中,对于 sam1 和 sam2 两个全都是可打印字符的字节串进行 XOR 编解码是没有问题的;但是对于 sam3 ,一个内含大量 HEX 值大于 0x7F 的非可打印字符字节串,原本是 18 个字节,进行两次 XOR 操作之后竟然变成了 69 个字节。
这就十分蹊跷了。问题出在哪个环节?是函数内部的字节列表 cipher_data 的问题,还是最后 bytearray() 操作出了问题,还是进行 XOR 计算的时候, chr() 函数的问题?
经过一番排查,发现这是 chr() 函数的问题。先看这个函数在 Python2 和 Python3 中各有什么表现:
<tt-image data-tteditor-tag="tteditorTag" contenteditable="false" class="syl1558939540119" data-render-status="finished" data-syl-blot="image" style="box-sizing: border-box; cursor: text; color: rgb(34, 34, 34); font-family: "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "WenQuanYi Micro Hei", "Helvetica Neue", Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: block;"> image <input class="pgc-img-caption-ipt" placeholder="图片描述(最多50字)" value="" style="box-sizing: border-box; outline: 0px; color: rgb(102, 102, 102); position: absolute; left: 187.5px; transform: translateX(-50%); padding: 6px 7px; max-width: 100%; width: 375px; text-align: center; cursor: text; font-size: 12px; line-height: 1.5; background-color: rgb(255, 255, 255); background-image: none; border: 0px solid rgb(217, 217, 217); border-radius: 4px; transition: all 0.2s cubic-bezier(0.645, 0.045, 0.355, 1) 0s;"></tt-image> <tt-image data-tteditor-tag="tteditorTag" contenteditable="false" class="syl1558939540124" data-render-status="finished" data-syl-blot="image" style="box-sizing: border-box; cursor: text; color: rgb(34, 34, 34); font-family: "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "WenQuanYi Micro Hei", "Helvetica Neue", Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: block;"> image<input class="pgc-img-caption-ipt" placeholder="图片描述(最多50字)" value="" style="box-sizing: border-box; outline: 0px; color: rgb(102, 102, 102); position: absolute; left: 187.5px; transform: translateX(-50%); padding: 6px 7px; max-width: 100%; width: 375px; text-align: center; cursor: text; font-size: 12px; line-height: 1.5; background-color: rgb(255, 255, 255); background-image: none; border: 0px solid rgb(217, 217, 217); border-radius: 4px; transition: all 0.2s cubic-bezier(0.645, 0.045, 0.355, 1) 0s;"></tt-image>
在 Python2 版本中,除了 chr() 还有一个 unichr() ,可以看到 Py2 中的 unichr() 与Py3 中的 chr() 行为是一致的:对于 HEX 值大于 0x7F 的字符,返回值占 2 Bytes;对于 HEX 值小于或等于 0x7F 的字符,返回值占 1 Byte。
为什么会出现这么个差异?刚开始一直以为 chr() 函数只会返回 1 Byte 的结果,对此感到很是不解。
查阅一下 Py2 中 chr() 和 unichr() 的文档如下:
<tt-image data-tteditor-tag="tteditorTag" contenteditable="false" class="syl1558939540133" data-render-status="finished" data-syl-blot="image" style="box-sizing: border-box; cursor: text; color: rgb(34, 34, 34); font-family: "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "WenQuanYi Micro Hei", "Helvetica Neue", Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: block;"> image <input class="pgc-img-caption-ipt" placeholder="图片描述(最多50字)" value="" style="box-sizing: border-box; outline: 0px; color: rgb(102, 102, 102); position: absolute; left: 187.5px; transform: translateX(-50%); padding: 6px 7px; max-width: 100%; width: 375px; text-align: center; cursor: text; font-size: 12px; line-height: 1.5; background-color: rgb(255, 255, 255); background-image: none; border: 0px solid rgb(217, 217, 217); border-radius: 4px; transition: all 0.2s cubic-bezier(0.645, 0.045, 0.355, 1) 0s;"></tt-image> <tt-image data-tteditor-tag="tteditorTag" contenteditable="false" class="syl1558939540139" data-render-status="finished" data-syl-blot="image" style="box-sizing: border-box; cursor: text; color: rgb(34, 34, 34); font-family: "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "WenQuanYi Micro Hei", "Helvetica Neue", Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: block;"> image<input class="pgc-img-caption-ipt" placeholder="图片描述(最多50字)" value="" style="box-sizing: border-box; outline: 0px; color: rgb(102, 102, 102); position: absolute; left: 187.5px; transform: translateX(-50%); padding: 6px 7px; max-width: 100%; width: 375px; text-align: center; cursor: text; font-size: 12px; line-height: 1.5; background-color: rgb(255, 255, 255); background-image: none; border: 0px solid rgb(217, 217, 217); border-radius: 4px; transition: all 0.2s cubic-bezier(0.645, 0.045, 0.355, 1) 0s;"></tt-image>
而 Py3 中 chr() 函数的文档说明如下:
<tt-image data-tteditor-tag="tteditorTag" contenteditable="false" class="syl1558939540143" data-render-status="finished" data-syl-blot="image" style="box-sizing: border-box; cursor: text; color: rgb(34, 34, 34); font-family: "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "WenQuanYi Micro Hei", "Helvetica Neue", Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: block;"> image<input class="pgc-img-caption-ipt" placeholder="图片描述(最多50字)" value="" style="box-sizing: border-box; outline: 0px; color: rgb(102, 102, 102); position: absolute; left: 187.5px; transform: translateX(-50%); padding: 6px 7px; max-width: 100%; width: 375px; text-align: center; cursor: text; font-size: 12px; line-height: 1.5; background-color: rgb(255, 255, 255); background-image: none; border: 0px solid rgb(217, 217, 217); border-radius: 4px; transition: all 0.2s cubic-bezier(0.645, 0.045, 0.355, 1) 0s;"></tt-image>
从文档来看, Py3 中的 chr() 函数确实对应到了 Py2 中的 unichr() 函数,只返回 Unicode 编码的结果。在点破最后的一层窗户纸之前,我们再去 CPython 的源码里瞅一眼,以便把这个结论锤结实了。
Py3 中的 chr() 函数,源码中是这样实现的:
<tt-image data-tteditor-tag="tteditorTag" contenteditable="false" class="syl1558939540150" data-render-status="finished" data-syl-blot="image" style="box-sizing: border-box; cursor: text; color: rgb(34, 34, 34); font-family: "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "WenQuanYi Micro Hei", "Helvetica Neue", Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: block;"> image <input class="pgc-img-caption-ipt" placeholder="图片描述(最多50字)" value="" style="box-sizing: border-box; outline: 0px; color: rgb(102, 102, 102); position: absolute; left: 187.5px; transform: translateX(-50%); padding: 6px 7px; max-width: 100%; width: 375px; text-align: center; cursor: text; font-size: 12px; line-height: 1.5; background-color: rgb(255, 255, 255); background-image: none; border: 0px solid rgb(217, 217, 217); border-radius: 4px; transition: all 0.2s cubic-bezier(0.645, 0.045, 0.355, 1) 0s;"></tt-image> <tt-image data-tteditor-tag="tteditorTag" contenteditable="false" class="syl1558939540152" data-render-status="finished" data-syl-blot="image" style="box-sizing: border-box; cursor: text; color: rgb(34, 34, 34); font-family: "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "WenQuanYi Micro Hei", "Helvetica Neue", Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: block;"> image <input class="pgc-img-caption-ipt" placeholder="图片描述(最多50字)" value="" style="box-sizing: border-box; outline: 0px; color: rgb(102, 102, 102); position: absolute; left: 187.5px; transform: translateX(-50%); padding: 6px 7px; max-width: 100%; width: 375px; text-align: center; cursor: text; font-size: 12px; line-height: 1.5; background-color: rgb(255, 255, 255); background-image: none; border: 0px solid rgb(217, 217, 217); border-radius: 4px; transition: all 0.2s cubic-bezier(0.645, 0.045, 0.355, 1) 0s;"></tt-image> <tt-image data-tteditor-tag="tteditorTag" contenteditable="false" class="syl1558939540156" data-render-status="finished" data-syl-blot="image" style="box-sizing: border-box; cursor: text; color: rgb(34, 34, 34); font-family: "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "WenQuanYi Micro Hei", "Helvetica Neue", Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: block;"> image<input class="pgc-img-caption-ipt" placeholder="图片描述(最多50字)" value="" style="box-sizing: border-box; outline: 0px; color: rgb(102, 102, 102); position: absolute; left: 187.5px; transform: translateX(-50%); padding: 6px 7px; max-width: 100%; width: 375px; text-align: center; cursor: text; font-size: 12px; line-height: 1.5; background-color: rgb(255, 255, 255); background-image: none; border: 0px solid rgb(217, 217, 217); border-radius: 4px; transition: all 0.2s cubic-bezier(0.645, 0.045, 0.355, 1) 0s;"></tt-image>
至于其中的 unicode_char() 函数如何实现,我们就不深究了,知道它就是返回一个 Unicode 编码的字符即可。再看 Py2 中 unichr() 函数:
<tt-image data-tteditor-tag="tteditorTag" contenteditable="false" class="syl1558939540161" data-render-status="finished" data-syl-blot="image" style="box-sizing: border-box; cursor: text; color: rgb(34, 34, 34); font-family: "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "WenQuanYi Micro Hei", "Helvetica Neue", Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: block;"> image<input class="pgc-img-caption-ipt" placeholder="图片描述(最多50字)" value="" style="box-sizing: border-box; outline: 0px; color: rgb(102, 102, 102); position: absolute; left: 187.5px; transform: translateX(-50%); padding: 6px 7px; max-width: 100%; width: 375px; text-align: center; cursor: text; font-size: 12px; line-height: 1.5; background-color: rgb(255, 255, 255); background-image: none; border: 0px solid rgb(217, 217, 217); border-radius: 4px; transition: all 0.2s cubic-bezier(0.645, 0.045, 0.355, 1) 0s;"></tt-image>
如出一辙有木有。
那 最后一层窗户纸 到底是什么?就是 Py3 默认的 UTF-8 编码了。在 http://www.utf-8.com 网站上有这么一段话:
UTF-8 encodes each Unicode character as a variable number of 1 to 4 octets , where the number of octets depends on the integer value assigned to the Unicode character. It is an efficient encoding of Unicode documents that use mostly US-ASCII characters because it represents each character in the range U+0000 through U+007F as a single octet .
注意上面加粗部分的重点:
- UTF-8 编码的字符占 1~4 个字节;
- 字符 U+0000 到 U+007F 都用一个字节来表示,其它字符 1 个字节不够,就用 2~4 个字节来表示。
这样就明确上面问题的原因了:Py3 中的 chr() 函数,只有在参数的 HEX 值位于 [0x00, 0x7F] 区间内的时候才返回 1 Byte 的结果,这个结果同于 Py2 中的 chr() 函数;当 HEX 值大于 0x7F ,其返回值占 2 Bytes,行为同于 Py2 中的 unichr() 函数。
那么 Py3 中正确的 XOR 编解码姿势是什么?上面 ipython3 操作记录中的函数稍加改动即可:
<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">def xor_crypt(data, key):
cipher_data = []
len_data = len(data)
len_key = len(key)
for idx in range(len_data):
bias = key[idx % len_key]
curr_byte = data[idx]
cipher_data.append(bias ^ curr_byte)
return bytearray(cipher_data)
</pre>
当然,还有更简洁的写法:
<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">def XORCrypt(data, key):
return bytearray(a^b for a, b in zip(*map(bytearray, [data, key])))
</pre>
网友评论