Python | Common Methods of the Built-in Data Structures

Author: shwzhao | Published: 2021-09-08 17:32

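What is Python good for? Or rather, what is Python good for in my own work?

As a test, I wrote a small script that extracts gene sequences; it puts several of the data structures below to use.
My Python is not great, so criticism is welcome.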

$ cat cds.fa # FASTA sequence file
>gene3
ATCGACCGTAGCC
AGCGAGAGCGACG
TTAGCATTTAC
>gene2
AATCAGGGCACCT
CCATGCAGGGC
>gene1
ACAGCAGTTTCAGT
CAG
>gene4
GCAGCTTATTTCGA
GCAGTCACAGTACA
CGATCAG

$ cat geneid.txt # IDs of the genes to extract
gene1
gene2
gene3

$ python3 extract_gene.py # run the script
>gene1
ACAGCAGTTTCAGTCAG
>gene2
AATCAGGGCACCTCCATGCAGGGC
>gene3
ATCGACCGTAGCCAGCGAGAGCGACGTTAGCATTTAC

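import sys

fasta_dict = {}
ids = []

# Read the wanted gene IDs, one per line, skipping blank lines
with open('geneid.txt') as id_file:
    for line in id_file:
        if line.strip():
            ids.append(line.strip())

# Build {gene ID: sequence}; a sequence may span several lines
key = None
with open('cds.fa') as fasta_file:
    for line in fasta_file:
        if line.startswith(">"):
            key = line.split()[0].lstrip(">")  # ID = first word of the header
            fasta_dict[key] = ''
        elif key is not None:  # ignore anything before the first header
            fasta_dict[key] += line.strip()

# Print the requested records in the order listed in geneid.txt
for gene_id in ids:
    if gene_id in fasta_dict:
        print('>' + gene_id + "\n" + fasta_dict[gene_id])
    else:
        print('warning: ' + gene_id + ' not found in cds.fa', file=sys.stderr)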
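The same parsing logic can also be wrapped in a function, which makes it easy to try without any files on disk. This is only a sketch: the function name parse_fasta, the in-memory sample text, and the extra ID gene9 are made up for illustration, and it assumes every header line has an ID right after the ">".

```python
import io

def parse_fasta(handle):
    """Return {sequence_id: sequence} for a FASTA file-like object."""
    records = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            current = line.split()[0][1:]  # first word of the header, minus '>'
            records[current] = []
        elif current is not None:
            records[current].append(line)
    # Join each record once instead of concatenating strings line by line
    return {name: "".join(parts) for name, parts in records.items()}

# Hypothetical in-memory data standing in for cds.fa and geneid.txt
fasta_text = ">gene2\nAATCAGGGCACCT\nCCATGCAGGGC\n>gene1\nACAGCAGTTTCAGT\nCAG\n"
wanted = ["gene1", "gene2", "gene9"]

sequences = parse_fasta(io.StringIO(fasta_text))
found = {gene_id: sequences[gene_id] for gene_id in wanted if gene_id in sequences}
missing = [gene_id for gene_id in wanted if gene_id not in sequences]
print(found)
print(missing)
```

Collecting the lines in a list and joining at the end is a design choice, not a requirement; for small files the original string concatenation works just as well.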

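The dir() function
Called without an argument, it returns the list of names (variables, functions, classes) defined in the current scope; called with an argument, it returns the list of that object's attributes and methods.
For example, dir(dict) shows which methods a dictionary has.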

>>> dir(dict)
['__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'items', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']
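Most of that list is double-underscore "special" methods. A small sketch of how one might filter them out to see only the everyday methods (the variable names are arbitrary):

```python
# Keep only the "public" names, i.e. those that do not start with an underscore
public_dict_methods = [name for name in dir(dict) if not name.startswith("_")]
print(public_dict_methods)

# dir() also accepts an instance; here an empty string stands in for any str
public_str_methods = [name for name in dir("") if not name.startswith("_")]
print("split" in public_str_methods, "strip" in public_str_methods)
```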

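1. Data structures

1.1 Strings

1.1.1 Changing case

str.capitalize(): uppercases the first character of the string and lowercases the rest
str.title(): uppercases the first letter of every word and lowercases the rest
str.upper(): all uppercase
str.lower(): all lowercase
str.swapcase(): swaps upper and lower case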
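A quick illustration of the five methods on one arbitrary sample string. Strings are immutable, so each call returns a new string and leaves the original untouched.

```python
text = "hello WORLD python"

capitalized = text.capitalize()  # 'Hello world python'
titled = text.title()            # 'Hello World Python'
uppered = text.upper()           # 'HELLO WORLD PYTHON'
lowered = text.lower()           # 'hello world python'
swapped = text.swapcase()        # 'HELLO world PYTHON'

print(capitalized, titled, uppered, lowered, swapped, sep="\n")
print(text)  # unchanged
```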

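1.1.2 Splitting

str.split(): splits on the given separator (any whitespace by default); the maximum number of splits can be set
str.splitlines(): splits at line boundaries such as \n, \r, and \r\n
str.partition(): splits at the first occurrence of the separator and returns the part before it, the separator itself, and the part after it
str.rpartition(): same as str.partition(), but searches from the right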
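A short sketch of the four methods side by side. The header text and the key=value string are invented sample data.

```python
header = ">gene1 hypothetical protein"
fields = header.split()              # ['>gene1', 'hypothetical', 'protein']
first_split = "a,b,c".split(",", 1)  # at most one split: ['a', 'b,c']

lines = "ATCG\nGGCC\r\nTTAA".splitlines()  # ['ATCG', 'GGCC', 'TTAA']

# partition cuts at the first '=', rpartition at the last one
before, sep, after = "key=value=more".partition("=")
r_before, r_sep, r_after = "key=value=more".rpartition("=")

print(fields, first_split, lines)
print((before, sep, after), (r_before, r_sep, r_after))
```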

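1.1.3 Stripping characters from both ends

str.strip([chars]): removes the characters in chars from both ends of the string; the default is whitespace (spaces, tabs, newlines)
str.lstrip(): strips the start only, l = left
str.rstrip(): strips the end only, r = right

When reading a file line by line, remember to strip the trailing \n from each line.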
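One detail that is easy to miss: the chars argument is treated as a set of characters, not as a prefix or suffix. A small sketch with made-up strings:

```python
raw_line = ">gene1 \n"

no_newline = raw_line.rstrip("\n")  # '>gene1 ' (only the newline is removed)
clean = raw_line.strip()            # '>gene1' (all surrounding whitespace)
gene_id = clean.lstrip(">")         # 'gene1'

# Every leading/trailing character found in "w.moc" is removed, in any order
domain = "www.example.com".strip("w.moc")  # 'example'

print(repr(no_newline), repr(clean), repr(gene_id), repr(domain))
```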

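1.1.4 Searching

str.count(): counts the non-overlapping occurrences of a substring
str.find(): checks whether the string contains a substring; returns the index of the first occurrence, or -1 if there is none
str.index(): like str.find(), but raises ValueError when the substring is not found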
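The difference between find() and index() only shows up when nothing is found. A minimal sketch on an arbitrary short sequence:

```python
seq = "ATCGATCG"

cg_count = seq.count("CG")  # 2
first_g = seq.find("G")     # 3 (indices start at 0)
no_n = seq.find("N")        # -1, no exception

# index() raises instead of returning -1
index_raised = False
try:
    seq.index("N")
except ValueError:
    index_raised = True

print(cg_count, first_g, no_n, index_raised)
```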

1.1.5 Replacing

str.replace(old, new[, count]): replaces the substring old with new; count limits the number of replacements
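For example, on an arbitrary DNA string (the result is a new string; the original is not modified):

```python
dna = "ATGCATGC"

rna = dna.replace("T", "U")              # 'AUGCAUGC', every T replaced
first_only = dna.replace("AT", "--", 1)  # '--GCATGC', only the first match

print(rna, first_only, dna)
```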

str.maketrans(), str.translate(), and str.expandtabs() will be added later; I do not understand them yet.
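As a starting point, here is one common use of these three methods. It is a sketch rather than a full explanation: str.maketrans() builds a character-to-character mapping, str.translate() applies it, and str.expandtabs() replaces tabs with spaces up to the next tab stop. Taking the complement of a DNA string is simply a convenient example.

```python
dna = "ATGCATGC"

# Map each base to its partner: A<->T, G<->C
complement_table = str.maketrans("ATGC", "TACG")
complement = dna.translate(complement_table)  # 'TACGTACG'
reverse_complement = complement[::-1]         # 'GCATGCAT'

# With a tab size of 4, the tab after 'a' is padded up to column 4
expanded = "a\tb".expandtabs(4)  # 'a   b'

print(complement, reverse_complement, repr(expanded))
```

Unlike chained replace() calls, translate() substitutes all characters in one pass, so A becoming T does not later get turned back into A.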

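1.1.6 Padding

str.center(width[, fillchar]): centers the string; the fill character fillchar defaults to a space
str.rjust(width[, fillchar]): right-aligns the string; the fill character fillchar defaults to a space
str.zfill(width): right-aligns the string and pads it on the left with 0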

>>> 'hello'.center(20); 'hello world'.center(20); 'hello'.zfill(20); 'hello'.rjust(20)
'       hello        '
'    hello world     '
'000000000000000hello'
'               hello'
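A few more cases that the transcript above does not cover, as a small sketch: a custom fill character, a sign in front of the number, and zero-padded labels (the gene labels are invented).

```python
starred = "hi".center(6, "*")  # '**hi**'
dotted = "17".rjust(5, ".")    # '...17'
signed = "-42".zfill(5)        # '-0042', zeros go after the sign

# Zero-padding numbers keeps labels in order when sorted as text
labels = ["gene" + str(n).zfill(3) for n in (1, 25, 300)]

print(starred, dotted, signed, labels)
```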

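1.1.7 Tests

str.startswith(): tests whether the string starts with a given prefix
There are also methods that test the ending (str.endswith()), the case (str.isupper(), str.islower()), whether the string is alphabetic (str.isalpha()), and so on.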
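A few of these tests in action, on made-up strings. Each call returns True or False.

```python
is_header = ">gene1".startswith(">")             # True
is_fasta = "cds.fa".endswith((".fa", ".fasta"))  # True, a tuple of suffixes is allowed
all_upper = "ATCG".isupper()                     # True
all_lower = "ATCG".islower()                     # False
only_letters = "gene1".isalpha()                 # False, because of the digit
only_digits = "2021".isdigit()                   # True

print(is_header, is_fasta, all_upper, all_lower, only_letters, only_digits)
```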

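1.2 Lists

1.2.1 Adding

list.append(): adds a new element to the end of the list; if the new element is itself a list, the whole list is added as a single element
list.extend(): if the argument is a list, its elements are added one by one
list.insert(): inserts an element at the given position; if the new element is itself a list, the whole list is inserted as a single element
list1 + list2: gives the same elements as list1.extend(list2), but builds a new list and leaves the originals unchanged, so it uses more memory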
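The four ways of adding, shown on tiny throwaway lists so the difference between "add the list" and "add its elements" is visible:

```python
appended = [1, 2]
appended.append([3, 4])     # [1, 2, [3, 4]], the list is one nested element

extended = [1, 2]
extended.extend([3, 4])     # [1, 2, 3, 4]

inserted = [1, 2]
inserted.insert(1, [3, 4])  # [1, [3, 4], 2]

left = [1, 2]
right = [3, 4]
combined = left + right     # new list; left and right stay as they were

print(appended, extended, inserted, combined, left, right)
```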

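1.2.2 Removing

list.remove(): removes by value (only the first matching element)
list.pop(): removes and returns the last element, or the element at a given index
list.clear(): empties the list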
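A short sketch with an invented list of gene IDs. Note that remove() raises ValueError if the value is not in the list.

```python
genes = ["gene1", "gene2", "gene3", "gene2"]

genes.remove("gene2")        # only the first 'gene2' goes away
after_remove = list(genes)   # ['gene1', 'gene3', 'gene2']

last = genes.pop()           # 'gene2'
first = genes.pop(0)         # 'gene1'
after_pops = list(genes)     # ['gene3']

genes.clear()                # []

print(after_remove, last, first, after_pops, genes)
```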

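1.2.3 Sorting

list.sort(): sorts the list in place, in ascending order
list.reverse(): reverses the current order of the elements in place (it does not sort)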
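Both methods modify the list itself and return None. A minimal sketch (the sample values are arbitrary), which also shows that a descending sort is sort(reverse=True) rather than reverse():

```python
ids = ["gene3", "gene2", "gene1", "gene4"]

ids.sort()
ascending = list(ids)    # ['gene1', 'gene2', 'gene3', 'gene4']

ids.sort(reverse=True)
descending = list(ids)   # ['gene4', 'gene3', 'gene2', 'gene1']

order = [3, 1, 2]
order.reverse()          # [2, 1, 3], reversed but not sorted

seqs = ["ATCG", "AT", "ATC"]
seqs.sort(key=len)       # sort by length: ['AT', 'ATC', 'ATCG']

print(ascending, descending, order, seqs)
```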

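1.2.4 Other methods

list.copy(): shallow copy
list.index(x[, start[, end]]): returns the position of the first element equal to x
list.count(): counts how many times an element occurs in the list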

>>> a = [1, 2, 3, 4, 5]
>>> c = a.copy()
>>> c.reverse()
>>> c
[5, 4, 3, 2, 1]
>>> a
[1, 2, 3, 4, 5]

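Changing list c does not change list a. The copy is shallow, though: nested mutable elements are still shared between the two lists.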
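What "shallow" means becomes visible once the list contains other lists. A small sketch, together with index() and count(), on throwaway data:

```python
nested = [[1, 2], [3, 4]]
shallow = nested.copy()

shallow.append([5, 6])   # affects only the copy
shallow[0].append(99)    # the inner list is shared, so nested[0] changes too

letters = ["a", "b", "a", "c", "a"]
first_a = letters.index("a")     # 0
next_a = letters.index("a", 1)   # 2, searching from position 1
a_count = letters.count("a")     # 3

print(nested, shallow, first_a, next_a, a_count)
```

If fully independent copies of nested lists are needed, copy.deepcopy() from the standard library is the usual tool.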

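1.2.5 Built-in functions and statements for lists

sorted(list): returns a new sorted list and leaves the original unchanged
del list[2]: deletes by position
len(list): number of elements in the list
sum(list): sum of the elements in the list
max(list): largest element in the list
min(list): smallest element in the list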
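All of these in one go, on an illustrative list of sequence lengths (the numbers are sample values):

```python
lengths = [37, 24, 17, 35]

ordered = sorted(lengths)  # [17, 24, 35, 37]; lengths itself is unchanged
del lengths[2]             # removes the element at index 2 -> [37, 24, 35]

count = len(lengths)       # 3
total = sum(lengths)       # 96
longest = max(lengths)     # 37
shortest = min(lengths)    # 24

print(ordered, lengths, count, total, longest, shortest)
```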

1.3 Tuples

Much like lists, except that they cannot be modified.

>>> a, b, c = ("first", "second", "third") # unpacking
>>> a
'first'
>>> b
'second'
>>> c
'third'
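A small sketch of what "cannot be modified" means in practice, plus one thing tuples can do that lists cannot: serve as dictionary keys. The record and the chromosome position are invented examples.

```python
record = ("gene1", 17)
gene_id, length = record  # unpacking, as above

# Assigning to an element of a tuple raises TypeError
assignment_failed = False
try:
    record[0] = "gene2"
except TypeError:
    assignment_failed = True

# Tuples are hashable (if their elements are), so they work as dict keys
positions = {("chr1", 100): "gene1"}
looked_up = positions[("chr1", 100)]

print(gene_id, length, assignment_failed, looked_up)
```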

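1.4 Dictionaries

1.4.1 Common dictionary methods

dict.keys(): the keys
dict.values(): the values
dict.items(): the key-value pairs
dict.get(): returns the value for the given key, or a default (None unless specified) if the key is missing
dict.copy(): shallow copy
dict.popitem(): removes and returns a key-value pair (the most recently inserted one in Python 3.7+)
dict.clear(): removes all elements from the dictionary
dict1.update(dict2): merges dict2 into dict1; values of keys that already exist are overwritten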
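A compact sketch of these methods on an illustrative dictionary of sequence lengths. The popitem() result shown assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
lengths = {"gene1": 17, "gene2": 24}

keys = list(lengths.keys())      # ['gene1', 'gene2']
values = list(lengths.values())  # [17, 24]
pairs = list(lengths.items())    # [('gene1', 17), ('gene2', 24)]

known = lengths.get("gene1")        # 17
unknown = lengths.get("gene9")      # None instead of a KeyError
fallback = lengths.get("gene9", 0)  # 0

lengths.update({"gene2": 25, "gene3": 37})  # gene2 overwritten, gene3 added
last_pair = lengths.popitem()               # ('gene3', 37)

print(keys, values, pairs, known, unknown, fallback, last_pair, lengths)
```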

dict.fromkeys(), dict.setdefault()... I do not know yet what these would be useful for.
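For what it is worth, here is one typical use of each, as a sketch with made-up data: dict.fromkeys() creates a dictionary with the same starting value for every key, and dict.setdefault() is handy for grouping values under a key without first checking whether the key exists.

```python
ids = ["gene1", "gene2", "gene3"]
empty_records = dict.fromkeys(ids, "")  # every ID starts with an empty sequence

# Hypothetical (chromosome, gene) pairs to group by chromosome
pairs = [("chr1", "gene1"), ("chr2", "gene2"), ("chr1", "gene3")]
by_chromosome = {}
for chromosome, gene in pairs:
    # Creates the empty list on first sight of a chromosome, then appends
    by_chromosome.setdefault(chromosome, []).append(gene)

print(empty_records)
print(by_chromosome)
```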

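1.4.2 Built-in functions and statements for dictionaries

del dict: deletes the dictionary itself (del dict[key] deletes a single entry)
len(dict): number of elements in the dictionary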
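The two forms of del behave quite differently, so here is a minimal sketch on an illustrative dictionary: one removes an entry, the other removes the name itself.

```python
lengths = {"gene1": 17, "gene2": 24, "gene3": 37}

del lengths["gene2"]   # removes one entry
size = len(lengths)    # 2

del lengths            # removes the variable; the name no longer exists

still_defined = True
try:
    lengths
except NameError:
    still_defined = False

print(size, still_defined)
```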

Comprehensions and the like will be covered another time.

1.5 Sets

A set contains no duplicates and has no order, so it can be used to remove duplicates.

>>> a = [1,2,3,4,5,1,3,4]
>>> set(a) # remove duplicates
{1, 2, 3, 4, 5}
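Because a set has no order, the original order of the list is not guaranteed to survive. A small sketch of two ways to get a list back, on arbitrary numbers: sorting the set, or removing duplicates with dict.fromkeys(), which keeps the order of first appearance (Python 3.7+).

```python
a = [3, 1, 2, 3, 1, 5, 4, 4]

unique = set(a)
unique_sorted = sorted(unique)            # [1, 2, 3, 4, 5]
unique_in_order = list(dict.fromkeys(a))  # [3, 1, 2, 5, 4]

has_three = 3 in unique  # membership tests on a set are fast
has_nine = 9 in unique

print(unique_sorted, unique_in_order, has_three, has_nine)
```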

Article link: https://www.haomeiwen.com/subject/nrudcltx.html