str and unicode

Author: MA木易YA | Published: 2018-11-28 22:43


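In Python 2, str and unicode are both subclasses of basestring. Python 3 has neither a built-in basestring nor a separate unicode type.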

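In Python 3, use isinstance to check a value's type. Because basestring no longer exists there (numpy.compat exposes a basestring name on Python 3, but it is only an alias for str), check against str directly to test whether a variable is a string.

s = "hello world"  # named s rather than str, to avoid shadowing the built-in type

print("Is s a string:", isinstance(s, str))
print("Type of s:", type(s))

# Output
Is s a string: True
Type of s: <class 'str'>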

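Differences between the two

Both Py2 and Py3 use the str type for strings, but the name means different things. In Py2, str is a byte string and bytes is simply an alias for it, while unicode is a separate type for text. In Py3, str is Unicode text (what Py2 called unicode) and bytes is a separate type. Note also that Py2 implicitly converts between byte strings and unicode using ASCII, so b'hello' == u'hello' is True there; Py3 performs no such conversion, and the same comparison is False.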

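a = 'hello'    # plain literal: str
b = u'hello'   # the u prefix is still accepted in Python 3 (3.3+) and also yields str
c = b'hello'   # the b prefix yields bytes
print("Type of a:", type(a))
print("Type of b:", type(b))
print("Type of c:", type(c))

# Output (Python 3)
Type of a: <class 'str'>
Type of b: <class 'str'>
Type of c: <class 'bytes'>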
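
To make the practical consequence concrete, here is a minimal sketch (Python 3 only; the variable names are illustrative) showing that str and bytes neither compare equal nor combine without an explicit conversion:

```python
text = 'hello'    # str: a sequence of Unicode code points
data = b'hello'   # bytes: a sequence of raw byte values

# No implicit conversion: a str never compares equal to a bytes object.
same = (text == data)
print("text == data:", same)

# Mixing the two types in one operation raises TypeError.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False
print("text + data allowed:", mixed_ok)

# After an explicit encode, both sides are bytes and can be compared.
equal_after_encode = (text.encode('ascii') == data)
print("text.encode('ascii') == data:", equal_after_encode)
```

In Python 2 the first comparison would be True, because byte strings and unicode are converted implicitly through ASCII.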

Converting bytes to characters is decoding (decode); the reverse, converting characters to bytes, is encoding (encode).

Characters -> bytes

a = '中文'
print("a encoded to bytes:", a.encode('utf-8'))
print("Type after encoding:", type(a.encode('utf-8')))

# Output
a encoded to bytes: b'\xe4\xb8\xad\xe6\x96\x87'
Type after encoding: <class 'bytes'>

Bytes -> characters

a = b'\xe4\xb8\xad\xe6\x96\x87'
print("a decoded to characters:", a.decode('utf-8'))
print("Type after decoding:", type(a.decode('utf-8')))

# Output
a decoded to characters: 中文
Type after decoding: <class 'str'>
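
The two conversions above are inverses only when the same codec is used in both directions. The following sketch is an illustration, not part of the original examples. The choice of gbk and ascii as contrasting codecs is an assumption made for demonstration. It shows a round trip, the effect of the codec on the byte length, and what happens when the wrong codec is used to decode:

```python
text = '中文'

# Round trip: encoding and then decoding with the same codec restores the text.
utf8_bytes = text.encode('utf-8')
round_trip = utf8_bytes.decode('utf-8')
print("round trip ok:", round_trip == text)

# The same text maps to different bytes under a different codec.
gbk_bytes = text.encode('gbk')
print("utf-8 length:", len(utf8_bytes), "gbk length:", len(gbk_bytes))

# Decoding with a codec that cannot represent the bytes raises UnicodeDecodeError.
try:
    utf8_bytes.decode('ascii')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
print("strict ascii decode failed:", strict_failed)

# errors='replace' substitutes U+FFFD for undecodable bytes instead of raising.
lossy = utf8_bytes.decode('ascii', errors='replace')
print("lossy decode contains U+FFFD:", '\ufffd' in lossy)
```

The errors='replace' form avoids the exception but loses information, so it is better suited to logging or display than to data that must survive a round trip.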

Finally, because a Py3 str is already Unicode text, there is nothing left to decode, so str has no decode method. Calling it raises an AttributeError:

a = '中文'
a.decode('utf-8')

# Output
Traceback (most recent call last):
  File "F:/ServerveManager/Pycharm/PyCharm 2018.2.2/files/python_test/test_11_28/one.py", line 6, in <module>
    a.decode('utf-8')
AttributeError: 'str' object has no attribute 'decode'
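
One way to avoid this error when a value may arrive as either str or bytes is to check its type before decoding. The helper below is a hypothetical sketch: the name to_text and the utf-8 default are assumptions, not something from the standard library or the original article.

```python
def to_text(value, encoding='utf-8'):
    """Return value as str, decoding it first if it is bytes (hypothetical helper)."""
    if isinstance(value, str):
        # Already text: str has no decode method in Python 3, so return it unchanged.
        return value
    if isinstance(value, (bytes, bytearray)):
        # Raw bytes: decode with the given codec (utf-8 is an assumed default).
        return value.decode(encoding)
    raise TypeError("expected str or bytes, got " + type(value).__name__)


from_str = to_text('中文')
from_bytes = to_text(b'\xe4\xb8\xad\xe6\x96\x87')
print(from_str, from_bytes, from_str == from_bytes)
```

Both calls return the same str, so code further along can work with text without caring which form the input took.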