Introduction
How computers represent data
'111001001011100010100101'
- Binary
- The smallest unit is the bit
1 byte = 8 bits
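As a quick illustration of the bit/byte relationship, the sketch below cuts the 24-bit string shown above into three bytes. The helper name `bits_to_bytes` is made up for this example, and decoding the result as UTF-8 is an assumption about how that bit string was produced.

```python
def bits_to_bytes(bits: str) -> bytes:
    """Pack a string of '0'/'1' characters into bytes, 8 bits per byte."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


bits = '111001001011100010100101'
data = bits_to_bytes(bits)
print(len(bits), len(data))    # 24 3
print(data.hex())              # e4b8a5
print(data.decode('utf-8'))    # 严 (assuming the bits are UTF-8 text)
```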
ASCII
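- American Standard Code for Information Interchange
- Standard ASCII defines 128 (2^7) characters using 7 bits; each character is stored in 1 byte (8 bits) with the highest bit set to 0. A full byte can represent 256 (2^8) states.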
# Python 3.7.6
>>> bin(ord('A'))
'0b1000001'
Text (128 characters) => ASCII => binary
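The text => ASCII => binary chain can be sketched as a round trip. The function names `text_to_bits` and `bits_to_text` are hypothetical helpers for illustration, not part of any library.

```python
def text_to_bits(text: str) -> str:
    """Encode ASCII text as space-separated 8-bit groups."""
    return ' '.join(format(b, '08b') for b in text.encode('ascii'))


def bits_to_text(bits: str) -> str:
    """Inverse of text_to_bits: parse each 8-bit group back into a character."""
    return bytes(int(group, 2) for group in bits.split()).decode('ascii')


encoded = text_to_bits('Hi')
print(encoded)                 # 01001000 01101001
print(bits_to_text(encoded))   # Hi
```

Every ASCII byte starts with a 0 bit, because only 7 of the 8 bits are used. Text outside the ASCII range makes `encode('ascii')` raise `UnicodeEncodeError`.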
UTF-8
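- Unicode is a universal character encoding standard that aims to cover the characters of all the world's writing systems
- UTF-8 uses 1 to 4 bytes to represent one character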
# Python 3.7.6
>>> '中'.encode('UTF-8')
b'\xe4\xb8\xad'
Unicode is the standard; UTF-8 is one encoding that implements it
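To make the 1 to 4 byte range concrete, the sketch below prints the Unicode code point and the UTF-8 bytes for four sample characters. The sample characters are chosen for this example only.

```python
# 'A', 'é', '中', and an emoji, written as escapes to avoid editor encoding issues.
samples = ['A', '\u00e9', '\u4e2d', '\U0001f600']

for ch in samples:
    encoded = ch.encode('utf-8')
    # The code point is defined by Unicode; the byte sequence is defined by UTF-8.
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex()}")

# U+0041 -> 1 byte(s): 41
# U+00E9 -> 2 byte(s): c3a9
# U+4E2D -> 3 byte(s): e4b8ad
# U+1F600 -> 4 byte(s): f09f9880
```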
Base64
- Represents binary data using 64 printable characters (A-Z, a-z, 0-9, +, /)
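Character | 中 | A
Hexadecimal | E4 | B8 | AD | 41
8-bit binary | 1110 0100 | 1011 1000 | 1010 1101 | 0100 0001
6-bit groups | 111001 | 001011 | 100010 | 101101 | 010000 | 010000 | (pad) | (pad)
Index | 57 | 11 | 34 | 45 | 16 | 16 | (pad) | (pad)
Output | 5 | L | i | t | Q | Q | = | =
The last 6-bit group is the remaining 01 followed by four zero bits of padding; the two missing groups are written as "=".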
# Python 3.7.6
>>> import base64
>>> base64.b64encode('中A'.encode('UTF-8'))
b'5LitQQ=='
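- The drawback of Base64 is lower transmission efficiency: every 3 bytes of input become 4 output characters, so the data grows by about one third
Binary => Base64 => text (64 characters)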
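The regrouping shown in the table can be sketched by hand and checked against the standard library. `b64encode_sketch` and `ALPHABET` are names invented for this example; it favours readability over speed and is not how `base64.b64encode` is implemented.

```python
import base64

ALPHABET = (
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    'abcdefghijklmnopqrstuvwxyz'
    '0123456789+/'
)


def b64encode_sketch(data: bytes) -> str:
    """Encode bytes as Base64 by regrouping 8-bit bytes into 6-bit indexes."""
    bits = ''.join(format(b, '08b') for b in data)
    # Zero-pad the bit string so its length is a multiple of 6.
    bits += '0' * (-len(bits) % 6)
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Pad the output with '=' so its length is a multiple of 4.
    chars.append('=' * (-len(chars) % 4))
    return ''.join(chars)


data = '中A'.encode('utf-8')
encoded = b64encode_sketch(data)
print(encoded)                                       # 5LitQQ==
print(encoded == base64.b64encode(data).decode())    # True
print(len(data), len(encoded))                       # 4 8
```

The last line shows the size cost: 4 input bytes become 8 output characters here, and in general every 3 input bytes become 4 characters.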
URL Encode
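- Percent-encoding: the text is encoded as UTF-8, and every byte other than the unreserved characters (A-Z, a-z, 0-9, and - . _ ~) is written as "%" followed by its two-digit hexadecimal value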
# Python 3.7.6
>>> from urllib.parse import urlencode
>>> urlencode({"中" : "A%"})
'%E4%B8%AD=A%25'
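The percent-encoding rule itself fits in a few lines. The sketch below is a minimal illustration, assuming the RFC 3986 unreserved set; `percent_encode` and `UNRESERVED` are hypothetical names, and the last line compares the result with `urllib.parse.quote` called with `safe=''`.

```python
from urllib.parse import quote

# Unreserved characters per RFC 3986: letters, digits, and "-._~".
UNRESERVED = frozenset(
    b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    b'abcdefghijklmnopqrstuvwxyz'
    b'0123456789-._~'
)


def percent_encode(text: str) -> str:
    """Percent-encode every UTF-8 byte of text that is not unreserved."""
    return ''.join(
        chr(b) if b in UNRESERVED else '%{:02X}'.format(b)
        for b in text.encode('utf-8')
    )


print(percent_encode('中'))       # %E4%B8%AD
print(percent_encode('A%'))       # A%25
print(percent_encode('a b/c'))    # a%20b%2Fc
print(percent_encode('a b/c') == quote('a b/c', safe=''))    # True
```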
# Simplified excerpt from urllib/parse.py, listed in call order
def urlencode(query, doseq=False, safe='', encoding=None, errors=None,
              quote_via=quote_plus):
    # Accept a mapping as well as a sequence of (key, value) pairs.
    if hasattr(query, "items"):
        query = query.items()
    l = []
    for k, v in query:
        if isinstance(k, bytes):
            k = quote_via(k, safe)
        else:
            k = quote_via(str(k), safe, encoding, errors)
        if isinstance(v, bytes):
            v = quote_via(v, safe)
        else:
            v = quote_via(str(v), safe, encoding, errors)
        l.append(k + '=' + v)
    return '&'.join(l)
def quote_plus(string, safe='', encoding=None, errors=None):
    # No spaces: plain quote() is enough.
    if ((isinstance(string, str) and ' ' not in string) or
            (isinstance(string, bytes) and b' ' not in string)):
        return quote(string, safe, encoding, errors)
    if isinstance(safe, str):
        space = ' '
    else:
        space = b' '
    # Treat the space as safe so it survives quoting, then turn it into '+'.
    string = quote(string, safe + space, encoding, errors)
    return string.replace(' ', '+')
def quote(string, safe='/', encoding=None, errors=None):
    if isinstance(string, str):
        if not string:
            return string
        if encoding is None:
            encoding = 'utf-8'
        if errors is None:
            errors = 'strict'
        # Percent-encoding works on bytes, so encode the text first.
        string = string.encode(encoding, errors)
    return quote_from_bytes(string, safe)
def quote_from_bytes(bs, safe='/'):
    if isinstance(safe, str):
        # Normalize 'safe' to bytes, dropping non-ASCII characters.
        safe = safe.encode('ascii', 'ignore')
    else:
        safe = bytes([c for c in safe if c < 128])
    # Fast path: nothing needs quoting.
    if not bs.rstrip(_ALWAYS_SAFE_BYTES + safe):
        return bs.decode()
    try:
        quoter = _safe_quoters[safe]
    except KeyError:
        _safe_quoters[safe] = quoter = Quoter(safe).__getitem__
    return ''.join([quoter(char) for char in bs])
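import collections
# Bytes that never need quoting (the RFC 3986 unreserved characters).
_ALWAYS_SAFE = frozenset(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
                         b'abcdefghijklmnopqrstuvwxyz'
                         b'0123456789_.-~')
_ALWAYS_SAFE_BYTES = bytes(_ALWAYS_SAFE)
# Cache of quoter callables, keyed by the safe bytes.
_safe_quoters = {}
class Quoter(collections.defaultdict):
    def __init__(self, safe):
        self.safe = _ALWAYS_SAFE.union(safe)
    def __repr__(self):
        return "<%s %r>" % (self.__class__.__name__, dict(self))
    def __missing__(self, b):
        # Cache miss: compute the quoted form of byte b, store it, and return it.
        res = chr(b) if b in self.safe else '%{:02X}'.format(b)
        self[b] = res
        return res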
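The behaviour that follows from the source above can be checked directly against the standard library. The inputs below are arbitrary examples: `quote` keeps "/" by default, `quote_plus` turns spaces into "+", and `urlencode` joins the quoted pairs with "&".

```python
from urllib.parse import quote, quote_plus, urlencode

print(quote('a b/中'))              # a%20b/%E4%B8%AD
print(quote('a b/中', safe=''))     # a%20b%2F%E4%B8%AD
print(quote_plus('a b/中'))         # a+b%2F%E4%B8%AD
print(quote_plus('1+1'))            # 1%2B1
print(urlencode({'q': 'a b', '中': 'A%'}))    # q=a+b&%E4%B8%AD=A%25
```

The `safe` default differs between the two functions ('/' for `quote`, '' for `quote_plus`), which is why only `quote` leaves the slash untouched.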