1. Deduplication
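dups = df[df.duplicated(subset=['column_name'])]  # rows duplicated on the given column(s); by default the first occurrence is not marked
dups = df[df.duplicated()]  # same, but compares all columns
dups = df[df.duplicated(subset=['column_name'], keep=False)]  # marks every occurrence, including the first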
keep : {'first', 'last', False}, default 'first'
    Determines which duplicates (if any) to mark.
    first : Mark duplicates as True except for the first occurrence.
    last : Mark duplicates as True except for the last occurrence.
    False : Mark all duplicates as True.
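A minimal sketch of how the three `keep` options differ, on a small made-up DataFrame (the columns `name` and `score` are hypothetical). `duplicated` returns a boolean mask; indexing `df` with it returns the flagged rows.

```python
import pandas as pd

# Toy data (hypothetical): rows 0, 2 and 3 share the same "name"
df = pd.DataFrame({
    "name": ["a", "b", "a", "a"],
    "score": [1, 2, 1, 3],
})

# keep='first' (default): later repeats are True, the first "a" is False
mask_first = df.duplicated(subset=["name"])
# keep='last': earlier repeats are True, the last "a" is False
mask_last = df.duplicated(subset=["name"], keep="last")
# keep=False: every row whose "name" appears more than once is True
mask_all = df.duplicated(subset=["name"], keep=False)

print(mask_first.tolist())  # [False, False, True, True]
print(mask_last.tolist())   # [True, False, True, False]
print(mask_all.tolist())    # [True, False, True, True]

# Without subset, all columns are compared: only row 2 repeats row 0 exactly
full_dups = df[df.duplicated()]
print(full_dups.index.tolist())  # [2]
```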
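Further reading: "如何使用drop_duplicates进行简单去重(入门篇)" (How to use drop_duplicates for simple deduplication, beginner's guide), article by 侦探L on Zhihu: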
https://zhuanlan.zhihu.com/p/116884554
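The snippets above only flag duplicates; to actually remove them, `drop_duplicates` accepts the same `subset` and `keep` parameters. A minimal sketch on the same hypothetical toy data:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "a", "a"],
    "score": [1, 2, 1, 3],
})

# Drop rows that are exact duplicates across all columns (keeps the first occurrence)
dedup_all = df.drop_duplicates()
# Keep one row per "name": the last occurrence of each
dedup_name = df.drop_duplicates(subset=["name"], keep="last")
# Equivalent to keeping the rows NOT flagged by duplicated() with the same keep
same = df[~df.duplicated(subset=["name"], keep="last")]

print(dedup_all.index.tolist())   # [0, 1, 3]
print(dedup_name.index.tolist())  # [1, 3]
print(dedup_name.equals(same))    # True
```

The original index labels are preserved; chain `.reset_index(drop=True)` (or pass `ignore_index=True` in pandas 1.0 and later) to renumber from 0.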
Merging
result = pd.merge(dfun, dftmp, how='left', on=['column_name'])  # left join: keep every row of dfun, attach matching columns from dftmp
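A minimal sketch of the left join, with small hypothetical frames standing in for `dfun` and `dftmp` and a hypothetical join column `key`. Keys of `dfun` with no match in `dftmp` get `NaN` in the right-hand columns.

```python
import pandas as pd

# Hypothetical stand-ins for dfun and dftmp; "key" plays the role of the join column
dfun = pd.DataFrame({"key": [1, 2, 3], "left_val": ["x", "y", "z"]})
dftmp = pd.DataFrame({"key": [1, 3, 4], "right_val": ["p", "q", "r"]})

# how='left': keep every row of dfun; key 2 has no match, so right_val is NaN
result = pd.merge(dfun, dftmp, how="left", on=["key"])
print(result)

# indicator=True adds a "_merge" column saying whether each row matched
check = pd.merge(dfun, dftmp, how="left", on=["key"], indicator=True)
print(check["_merge"].tolist())  # ['both', 'left_only', 'both']
```

One caveat: if the join column is not unique in `dftmp`, a left join can return more rows than `dfun` has; `validate="many_to_one"` makes pandas raise an error in that case instead.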
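Filtering
df2 = df.loc[df['column_name'] == value]  # rows where the column equals value
df2 = df[df['column_name'] == value]  # same result without .loc
df2 = df[df['column_name'].isin(values)]  # rows whose value is in the list values
df2 = df[~df['column_name'].isin(values)]  # rows whose value is NOT in values (~ negates the mask)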
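A minimal runnable sketch of these filters on a hypothetical DataFrame (`city` and `sales` are made-up columns). The key point is that the comparison goes outside the column selection, `df['col'] == value`, which produces a boolean Series used as the row mask.

```python
import pandas as pd

df = pd.DataFrame({"city": ["BJ", "SH", "GZ", "SH"], "sales": [10, 20, 30, 40]})

# Equality filter: the comparison builds a boolean Series, which then indexes df
eq = df[df["city"] == "SH"]
eq_loc = df.loc[df["city"] == "SH"]      # same rows via .loc

# Membership filter: pass the list itself to isin, not a list wrapped in another list
wanted = ["BJ", "GZ"]
inside = df[df["city"].isin(wanted)]
outside = df[~df["city"].isin(wanted)]   # ~ negates the mask

# Several conditions: combine with & or |, each condition in parentheses
combo = df[(df["city"] == "SH") & (df["sales"] > 25)]

print(eq.index.tolist())       # [1, 3]
print(inside.index.tolist())   # [0, 2]
print(outside.index.tolist())  # [1, 3]
print(combo.index.tolist())    # [3]
```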
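Random sampling
df = df.sample(frac=1).reset_index(drop=True)  # shuffle all rows, then renumber the index from 0
.sample(frac=1)  # sampling; frac is the fraction of rows to draw (frac=1 returns every row in random order)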
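A minimal sketch contrasting a full shuffle with drawing a subset, on a hypothetical 10-row frame. `frac` and `n` are alternatives (pass one or the other), and `random_state` fixes the seed so the same rows come back on every run.

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})

# frac=1 draws 100% of rows without replacement, i.e. shuffles the frame;
# reset_index(drop=True) renumbers 0..n-1 and discards the old index
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)

# frac=0.3 draws 30% of the rows; n=3 draws exactly 3 rows
part = df.sample(frac=0.3, random_state=0)
three = df.sample(n=3, random_state=0)

print(sorted(shuffled["x"].tolist()) == list(range(10)))  # True
print(len(part), len(three))  # 3 3
```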