Python Basics 008 -- String methods, decode and encode

Author: 不一样的丶我们 | Posted 2018-02-25 19:16

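Goals: master the common string methods, get fluent with decode and encode, and understand the difference between str and unicode and when to use each.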

String methods

# Reversing a string: slice with [::-1], or use the built-in reversed()
In [83]: s = "hello world"
In [84]: s[::-1]
Out[84]: 'dlrow olleh'
In [87]: "".join(reversed(s))
Out[87]: 'dlrow olleh'
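Both techniques above also work in Python 3. The following minimal sketch wraps them in small helpers and uses one of them for a palindrome check. The names `reverse_string`, `reverse_string_iter` and `is_palindrome` are hypothetical and do not come from the original.

```python
def reverse_string(s: str) -> str:
    """Reverse s with a slice whose step is -1 (walks from the end to the start)."""
    return s[::-1]


def reverse_string_iter(s: str) -> str:
    """Same result via reversed(), which yields characters back to front; join glues them."""
    return "".join(reversed(s))


def is_palindrome(s: str) -> bool:
    """A string is a palindrome if it equals its own reverse."""
    return s == reverse_string(s)


print(reverse_string("hello world"))       # dlrow olleh
print(reverse_string_iter("hello world"))  # dlrow olleh
print(is_palindrome("level"))              # True
```

The slice form is the more common idiom. `reversed()` is useful when you want an iterator rather than a new string.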

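# Indexing and slicing
s = "abcdefg"
print(s)          # ---> prints the whole string: abcdefg
print(s[0])       # ---> prints the first character: a
print(s[0:-1])    # ---> prints everything except the last character: abcdef
print(s[2:4])     # ---> prints cd; s[m:n] runs from index m up to n-1
print(s[2:])      # ---> from index 2 to the end: cdefg
print(s * 2)      # ---> the whole string twice: abcdefgabcdefg
# Converting between strings and integers comes up a lot
n = int("2")      # ---> n == 2, an int
t = str(2)        # ---> t == "2", a string
print(s[::-1])    # ---> reversed string: gfedcba
print(s[::2])     # ---> every second character (step 2): aceg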
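The slicing rules above have a few edge cases that are easier to see in running code. This Python 3 sketch uses arbitrary sample values. It covers negative indexes, slices whose bounds fall outside the string, and the ValueError that int() raises on non-numeric text.

```python
s = "abcdefg"

# Negative indexes count from the end: -1 is the last character.
assert s[-1] == "g"
assert s[-3:] == "efg"          # the last three characters

# Slices never raise IndexError; out-of-range bounds are simply clipped.
assert s[5:100] == "fg"
assert s[10:] == ""

# A plain index past the end does raise IndexError.
try:
    s[10]
except IndexError as exc:
    print("IndexError:", exc)

# int() and str() convert in both directions; int() rejects non-numeric text.
assert int("2") + 3 == 5
assert str(2) + "3" == "23"
try:
    int("abc")
except ValueError as exc:
    print("ValueError:", exc)
```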

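# String case methods
In [93]: x = "abcdefg"
In [94]: len(x)             # length of the string
Out[94]: 7
In [95]: x.upper()          # convert to uppercase
Out[95]: 'ABCDEFG'
In [96]: x.lower()          # convert to lowercase
Out[96]: 'abcdefg'
In [97]: x1 = "heLLo wORld"
In [98]: x1.isupper()       # True only if every cased character is uppercase (and there is at least one)
Out[98]: False
In [99]: x1.islower()       # True only if every cased character is lowercase (and there is at least one)
Out[99]: False
In [100]: x1.swapcase()     # swap case: uppercase letters become lowercase and vice versa
Out[100]: 'HEllO WorLD'
In [101]: x.capitalize()    # uppercase the first character and lowercase the rest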
Out[101]: 'Abcdefg'
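A common practical use of these methods is comparing strings without regard to case. The following is a small Python 3 sketch. `equals_ignore_case` is a hypothetical helper, and lowercasing both sides is simply one reasonable way to do it.

```python
def equals_ignore_case(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings ignoring letter case."""
    return a.lower() == b.lower()


x1 = "heLLo wORld"
print(x1.capitalize())                          # Hello world
print(equals_ignore_case(x1, "HELLO WORLD"))    # True

# isupper()/islower() ignore non-letters, but need at least one cased character.
print("ABC 123".isupper())   # True
print("123".isupper())       # False
```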

# Predicate methods: startswith / endswith ---> check whether a string starts or ends with a given prefix or suffix
In [103]: x1.startswith("index")
Out[103]: False
In [104]: x1
Out[104]: 'heLLo wORld'
In [105]: x1.startswith("he")
Out[105]: True
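The example above only tries `startswith`. The sketch below uses `endswith` to filter file names by extension, which is a typical use. Both methods also accept a tuple of candidates and an optional start index. The file names are made up for illustration.

```python
filenames = ["report.txt", "photo.JPG", "notes.md", "archive.tar.gz"]

# endswith (like startswith) accepts a tuple: True if any of the suffixes matches.
text_files = [name for name in filenames if name.endswith((".txt", ".md"))]
print(text_files)   # ['report.txt', 'notes.md']

# Matching is case-sensitive, so normalize the case first when that matters.
images = [name for name in filenames if name.lower().endswith((".jpg", ".png"))]
print(images)       # ['photo.JPG']

# An optional start index says where the check begins.
print("heLLo wORld".startswith("wO", 6))   # True
```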

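# Search methods
find  ---> returns the index of the first occurrence of a substring, or -1 if it is not found
index ---> like find, but raises ValueError when the substring is not found
rfind ---> like find, but searches from the right and returns the index of the last occurrence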
In [106]: s = 'Return the lower index in S where substring sub is found'
In [107]: s.find("in")
Out[107]: 17
In [108]: s.find("hh")
Out[108]: -1
In [109]: s.rfind("is")
Out[109]: 48
In [110]: s.find("is",20)       # start searching at index 20
Out[110]: 48
In [111]: s.index("the")
Out[111]: 7
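Because `find` accepts a start index, you can call it in a loop to collect every occurrence, not just the first. The following Python 3 sketch does that and also shows the `index` failure mode. `find_all` is a hypothetical helper, not a built-in.

```python
from typing import List


def find_all(s: str, sub: str) -> List[int]:
    """Hypothetical helper: every index where sub starts in s, found via repeated find()."""
    positions = []
    start = s.find(sub)
    while start != -1:
        positions.append(start)
        start = s.find(sub, start + 1)   # resume one past the previous match
    return positions


s = 'Return the lower index in S where substring sub is found'
print(find_all(s, "in"))   # [17, 23, 40]

# index() behaves like find() but raises instead of returning -1.
try:
    s.index("hh")
except ValueError:
    print("'hh' not found")
```

Resuming at `start + 1` means overlapping matches are counted. For example, `"aaa"` contains `"aa"` at both 0 and 1.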

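# Splitting and stripping
strip() ---> removes the given characters from the start and end of the string only; characters in the middle are left alone
s = "0000000this is string 0000example....wow!!!0000000"
print(s.strip('0'))
Result: this is string 0000example....wow!!!
lstrip() ---> removes leading whitespace (or the given characters) from the left end
rstrip() ---> removes trailing whitespace (or the given characters) from the right end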
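One detail of the strip family often surprises people. The argument is a *set of characters*, not a prefix or suffix string, so any combination of those characters is removed from the ends. The following Python 3 sketch reuses the sample string from above.

```python
s = "0000000this is string 0000example....wow!!!0000000"

print(s.lstrip("0"))   # this is string 0000example....wow!!!0000000
print(s.rstrip("0"))   # 0000000this is string 0000example....wow!!!

# The argument is a set of characters: w, ., c, o, m are stripped in any order.
print("www.example.com".strip("w.com"))   # example

# With no argument, all leading/trailing whitespace (spaces, tabs, newlines) goes.
print(repr("  \t padded \n".strip()))     # 'padded'
```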

split() ---> splits a string on a separator and returns a list of the pieces
s = "abcdefg"
s.split("c")
Result: ['ab', 'defg']
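`split` behaves differently with and without a separator, and `join` reverses it. The following is a short Python 3 sketch with made-up inputs.

```python
line = "  alpha   beta\tgamma  "

# With no argument, split() splits on runs of whitespace and drops empty pieces.
print(line.split())               # ['alpha', 'beta', 'gamma']

# With an explicit separator, adjacent separators produce empty strings.
print("a,b,,c".split(","))        # ['a', 'b', '', 'c']

# maxsplit limits how many splits are made, counting from the left.
print("k=v=w".split("=", 1))      # ['k', 'v=w']

# join is the inverse of split: it glues the pieces back with a separator.
parts = "abcdefg".split("c")
print("c".join(parts))            # abcdefg
```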

# String formatting with format
1. Positional fields: empty {} placeholders or explicit indexes
In [114]: "{} is apple".format("apple")
Out[114]: 'apple is apple'
In [115]: "{0} is apple".format("apple")
Out[115]: 'apple is apple'
2. Keyword fields (here supplied by unpacking a dict)
In [116]: dic = {"a":1,"b":2,"c":3}
In [118]: "{a} is 1, {b} is 2,{c} is 3,{a} little {c}".format(**dic)
Out[118]: '1 is 1, 2 is 2,3 is 3,1 little 3'
3. Other format features
In [120]: "{:.2f}".format(3.1415926)        # keep two decimal places
Out[120]: '3.14'
In [121]: "{:10.2f}".format(3.1415926)      # pad to a total width of 10 (numbers right-align by default)
Out[121]: '      3.14'
In [122]: "{:^10.2f}".format(3.1415926)     # ^ centers the value in the field
Out[122]: '   3.14   '
In [124]: "{:_^10.2f}".format(3.1415926)    # use _ instead of spaces as the fill character
Out[124]: '___3.14___'
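The width, alignment and precision options above combine naturally when printing aligned columns. The following Python 3 sketch is illustrative only: the item names and prices are invented, and `format_row` is a hypothetical helper.

```python
def format_row(name: str, price: float) -> str:
    """Hypothetical helper: name left-aligned in 10 columns, price right-aligned in 8 with 2 decimals."""
    return "{:<10}{:>8.2f}".format(name, price)


rows = [("apple", 3.1415926), ("banana", 12.5), ("cherry", 0.25)]
for name, price in rows:
    print(format_row(name, price))

# A few more format specs: thousands separator, zero padding, percentage.
print("{:,}".format(1234567))     # 1,234,567
print("{:05d}".format(42))        # 00042
print("{:.1%}".format(0.25))      # 25.0%
```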

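decode and encode
- decode ---> decoding; encode ---> encoding
- First understand that Python (2) represents text internally as unicode. Converting between encodings therefore normally goes through unicode as the intermediate form: first decode the byte string from its source encoding into unicode, then encode that unicode into the target encoding.
  - decode converts a string in some encoding into unicode; e.g. str1.decode("gb2312") decodes the gb2312-encoded string str1 into unicode
  - encode converts unicode into a string in some encoding; e.g. str2.encode("gb2312") encodes the unicode string str2 as gb2312
- In short: to convert text from another encoding to utf-8, decode it into unicode first and then encode it as utf-8. Unicode is the go-between.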
```
In [150]: ss = "中文"
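In [151]: l = ss.decode("utf-8")                # decode the utf-8 bytes into unicode
In [152]: isinstance(l,unicode)                 
Out[152]: True
In [153]: l = l.encode("utf-8")                 # encode the unicode back into utf-8 bytes
In [154]: isinstance(l,unicode)
Out[154]: False
In [155]: import sys
In [156]: print sys.getdefaultencoding()        # Python 2 defaults to ascii (not specific to Linux)
ascii

# Change the default encoding (Python 2 only; Python 3 has no sys.setdefaultencoding)
In [167]: print sys.getdefaultencoding()        # get the default encoding
ascii
In [168]: reload(sys)                           # site.py deletes setdefaultencoding at startup; reload restores it
<module 'sys' (built-in)>
In [169]: sys.setdefaultencoding("utf8")        # set the default encoding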
In [170]: print sys.getdefaultencoding()
utf8
```
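The session above is Python 2. In Python 3 the two roles belong to two types: `str` is Unicode text and `bytes` holds encoded data, so `decode` exists only on `bytes` and `encode` only on `str`. The following minimal Python 3 sketch converts UTF-8 bytes to GBK with Unicode as the go-between, as described above. `transcode` is a hypothetical helper.

```python
import sys


def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Hypothetical helper: re-encode bytes from src to dst via a Unicode str."""
    text = data.decode(src)     # bytes in the src encoding -> Unicode str
    return text.encode(dst)     # Unicode str -> bytes in the dst encoding


utf8_bytes = "中文".encode("utf-8")
print(utf8_bytes)                        # b'\xe4\xb8\xad\xe6\x96\x87'
gbk_bytes = transcode(utf8_bytes, "utf-8", "gbk")
print(gbk_bytes)                         # b'\xd6\xd0\xce\xc4'
print(gbk_bytes.decode("gbk"))           # 中文

print(sys.getdefaultencoding())          # utf-8 in Python 3
```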
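Differences between str and unicode, and when to use each
- The core idea: str -> decode("coding") -> unicode -> encode("coding") -> str
- str and unicode are both subclasses of basestring
- The difference: str is a byte string, i.e. a sequence of bytes such as the result of encoding (encode) unicode; unicode is a sequence of characters
- A Chinese character counts as one character in unicode, takes 2 bytes in GBK and 3 bytes in UTF-8
  - The default encoding on Chinese-locale Windows is GBK, on Linux it is usually UTF-8, and Python 2 source files (.py) default to ASCII
- Because .py files default to ASCII, a file that contains non-ASCII characters needs an encoding declaration at the top
  - # -*- coding: utf-8 -*- or #coding=utf-8
  - With coding=utf-8 declared (and the file saved as UTF-8), a = '中文' holds UTF-8 bytes
  - With coding=gb2312 declared (and the file saved as GB2312), a = '中文' holds GB2312 bytes
- Default rule of thumb
  - Don't call encode on a str, and don't call decode on a unicode (Python 2 would first convert implicitly using the ascii default encoding, which fails on non-ASCII text)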
```
In [8]: a = "中文"
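In [14]: u = a.decode("utf-8")      # str (utf-8 bytes) -> unicode
In [15]: type(u)
Out[15]: unicode
In [16]: u = u.encode("utf-8")      # unicode -> str (utf-8 bytes)
In [17]: type(u)
Out[17]: str
In [18]: len(u)                     # 2 characters x 3 bytes each in utf-8
Out[18]: 6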
```
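The byte counts claimed above (one character in Unicode, 2 bytes in GBK, 3 bytes in UTF-8) are easy to check in Python 3. The following sketch also shows how Python 3's type split enforces the "don't encode a str / don't decode a unicode" rule: the methods simply do not exist on the wrong type. The sample text is the same "中文" used above.

```python
text = "中文"

print(len(text))                    # 2  (characters)
print(len(text.encode("utf-8")))    # 6  (3 bytes per character)
print(len(text.encode("gbk")))      # 4  (2 bytes per character)

# In Python 3, str has no decode() and bytes has no encode().
print(hasattr(text, "decode"))                  # False
print(hasattr(text.encode("utf-8"), "encode"))  # False

# Encoding to a codec that cannot represent the text fails loudly.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("UnicodeEncodeError:", exc.reason)
```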