DataFrame operations: a summary

Author: 锦绣拾年 | Published 2021-07-14 22:42

1. Deduplication

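dups = df[df.duplicated(subset=['col_name'])]  # rows whose 'col_name' value already appeared earlier; the first occurrence is not included
dups = df[df.duplicated()]  # same idea, but whole rows are compared
dups = df[df.duplicated(subset=['col_name'], keep=False)]  # every row involved in a duplicate, first occurrence included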
keep : {'first', 'last', False}, default 'first'
Determines which duplicates (if any) to mark.

first : Mark duplicates as True except for the first occurrence.

last : Mark duplicates as True except for the last occurrence.

False : Mark all duplicates as True.
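A minimal sketch of how `keep` changes the result, assuming a small hypothetical DataFrame in which `name` is the column being checked:

```python
import pandas as pd

# Hypothetical toy data: rows 0 and 2 are identical; rows 1 and 3 share only "name".
df = pd.DataFrame({
    "name": ["a", "b", "a", "b", "c"],
    "score": [1, 2, 1, 3, 4],
})

# Whole-row duplicates: only the later copy (index 2) is marked.
dups_all_cols = df[df.duplicated()]

# Duplicates judged on "name" only, with each keep option.
first = df.duplicated(subset=["name"], keep="first")  # [False, False, True, True, False]
last = df.duplicated(subset=["name"], keep="last")    # [True, True, False, False, False]
every = df.duplicated(subset=["name"], keep=False)    # [True, True, True, True, False]
```

`duplicated()` only returns a boolean mask, so `df[mask]` pulls out the duplicate rows and `df[~mask]` keeps the rest.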

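Further reading: 如何使用drop_duplicates进行简单去重(入门篇) ("How to use drop_duplicates for simple deduplication, beginner's guide"), article by 侦探L on Zhihu: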
https://zhuanlan.zhihu.com/p/116884554
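To drop duplicates instead of just marking them, `drop_duplicates()` takes the same `subset` and `keep` arguments. A short sketch on the same kind of hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data (same shape as the duplicated() example).
df = pd.DataFrame({
    "name": ["a", "b", "a", "b", "c"],
    "score": [1, 2, 1, 3, 4],
})

# Drop whole-row duplicates, keeping the first occurrence (the default).
dedup_rows = df.drop_duplicates()  # keeps index 0, 1, 3, 4

# Keep only the last row for each "name", then renumber the index 0..n-1.
latest_per_name = (
    df.drop_duplicates(subset=["name"], keep="last")
      .reset_index(drop=True)
)
# latest_per_name: name a/b/c with score 1/3/4
```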

2. Merging

import pandas as pd

result = pd.merge(dfun, dftmp, how='left', on=['col_name'])  # left join: keep every row of dfun, attach matching columns from dftmp
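A minimal sketch of what a left join produces, assuming two small hypothetical tables that share the key column `col_name`; `indicator=True` is an optional `pd.merge` argument that records where each row came from:

```python
import pandas as pd

# Hypothetical left and right tables sharing the key column "col_name".
dfun = pd.DataFrame({"col_name": [1, 2, 3], "left_val": ["x", "y", "z"]})
dftmp = pd.DataFrame({"col_name": [1, 3, 4], "right_val": ["p", "q", "r"]})

# how="left": every row of dfun is kept; key 2 has no match, so right_val is NaN there.
# Key 4 exists only in dftmp and is dropped.
# indicator=True adds a "_merge" column: "both" or "left_only" for a left join.
result = pd.merge(dfun, dftmp, how="left", on=["col_name"], indicator=True)
```

If the key repeats in `dftmp`, a left join returns one row per match, so the result can have more rows than `dfun`; passing `validate="many_to_one"` makes pandas raise an error in that case.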

3. Filtering

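df2 = df.loc[df['col_name'] == value]  # rows where the column equals value
df2 = df[df['col_name'] == value]  # same result without .loc
df2 = df[df['col_name'].isin(values)]  # values is a list; keep rows whose value is in it
df2 = df[~df['col_name'].isin(values)]  # ~ inverts the mask: drop those rows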
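A runnable sketch of the four filters, assuming a small hypothetical DataFrame and hypothetical `value` / `values`; the last line shows combining two conditions, which each need their own parentheses:

```python
import pandas as pd

# Hypothetical data and filter values.
df = pd.DataFrame({"col_name": ["a", "b", "c", "a"], "n": [1, 2, 3, 4]})
value = "a"
values = ["a", "c"]

# Boolean mask: True where the column equals value.
eq_loc = df.loc[df["col_name"] == value]    # index 0, 3
eq_plain = df[df["col_name"] == value]      # same rows

# isin() takes the list itself (not [values]); ~ inverts the mask.
in_list = df[df["col_name"].isin(values)]       # index 0, 2, 3
not_in_list = df[~df["col_name"].isin(values)]  # index 1

# Combine conditions with & (and) or | (or), each wrapped in parentheses.
combined = df[(df["col_name"] == "a") & (df["n"] > 1)]  # index 3
```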

4. Random sampling (shuffling rows)

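df = df.sample(frac=1).reset_index(drop=True)  # frac=1 draws every row in random order (a shuffle); reset_index renumbers rows 0..n-1
# .sample(frac=...): frac is the fraction of rows to draw, e.g. frac=0.5 keeps a random half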
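A short sketch of the common `sample()` variants, assuming a hypothetical 10-row DataFrame; `random_state` fixes the seed so the draw is reproducible:

```python
import pandas as pd

# Hypothetical data: 10 rows.
df = pd.DataFrame({"x": range(10)})

# frac=1: all rows in random order; reset_index(drop=True) discards the old index.
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)

# frac=0.3 draws 30% of the rows (3 here); n=4 draws exactly 4 rows instead.
# Sampling is without replacement by default (replace=False).
part = df.sample(frac=0.3, random_state=0)
four = df.sample(n=4, random_state=0)
```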


      Article link: https://www.haomeiwen.com/subject/cprelltx.html