If you write Python programs that handle text, you have probably run into garbled characters more than once. This problem bothered me for a long time, so I spent some time working out what is actually going on.
Two blog posts were useful to me:
For character encoding in Python 2.7, I think the following points are important:
-
Think of Chinese text stored on a computer as something encrypted: the computer stores the real words as bytes, which we can't read directly, and a character encoding such as utf-8 or gbk is the password that turns those bytes back into readable characters (Python 2.7 holds the decoded result in a unicode object).
-
When you open a file and read it, you have to tell Python which encoding the file uses. Python will not work it out for you. Note that the coding declaration on the first line of a script, such as:
# coding=utf-8
only tells Python how the script's own source is encoded, not the files the script opens (and there must be no space before the "=", or the declaration is not recognized). If you say nothing, implicit conversions fall back to the default encoding, which is ASCII in Python 2.7.
So decode the file's contents with the decode() method: Python uses this password to turn the bytes into Chinese characters, or any other characters, that you can read.
After reading, if you want to save the file, encode the readable text with the encode() method: Python uses the password to turn the characters back into bytes and writes those bytes to disk.
-
Python uses unicode as its internal form for text. If you give it a str and do not say which encoding the str is in, Python falls back to ASCII, so any non-ASCII byte raises UnicodeDecodeError instead of giving you the characters you wanted.
-
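Don't call decode() on a unicode object, and don't call encode() on a str. Both methods exist in Python 2.7, but each one first does an implicit conversion through ASCII, so the call fails as soon as the data contains non-ASCII characters.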
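
The points above can be put together in a small sketch. It is only an illustration, not the one right way to do it: the helper names `save_text` and `load_text` and the file name `demo.txt` are made up for this example, and the code sticks to calls that behave the same on Python 2.7 and Python 3 (byte literals, `u''` literals, `io.open` in binary mode).

```python
import io
import os
import tempfile

# The two characters for "Chinese" (zhong wen), written with escapes so the
# source file itself stays pure ASCII and needs no coding declaration.
text = u'\u4e2d\u6587'

# encode(): readable text -> bytes. Each "password" gives different bytes.
utf8_bytes = text.encode('utf-8')   # 3 bytes per character here
gbk_bytes = text.encode('gbk')      # 2 bytes per character here

# decode(): bytes -> readable text. The matching password recovers the text.
from_utf8 = utf8_bytes.decode('utf-8')
from_gbk = gbk_bytes.decode('gbk')

# The wrong password does not work: these gbk bytes are not valid utf-8.
try:
    gbk_bytes.decode('utf-8')
    wrong_password_failed = False
except UnicodeDecodeError:
    wrong_password_failed = True

# ASCII is what Python 2.7 uses when no encoding is given, and it
# cannot handle these bytes either.
try:
    utf8_bytes.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True


def save_text(path, content, encoding):
    """Hypothetical helper: encode text and write the bytes to disk."""
    with io.open(path, 'wb') as f:
        f.write(content.encode(encoding))


def load_text(path, encoding):
    """Hypothetical helper: read bytes from disk and decode them to text."""
    with io.open(path, 'rb') as f:
        return f.read().decode(encoding)


# Round trip through a temporary file: encode on save, decode on load.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'demo.txt')
save_text(path, text, 'utf-8')
round_trip = load_text(path, 'utf-8')
os.remove(path)
os.rmdir(tmp_dir)
```

The rule of thumb this suggests is to decode bytes as soon as they enter the program, work with unicode text in the middle, and encode only when writing out, always naming the encoding explicitly.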