Deduplicating Excel Data with Python's collections Module

Author: 火卫控 | Published 2021-06-22 00:29

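    Sometimes we need to deduplicate the data in an Excel sheet. When the sheet has many rows, this is hard to do by eye. Instead, we can read the sheet with Python, pick one column, and use Counter from the standard-library collections module to count how often each value appears in it. This tells us which values are repeated and how many times, and we can then delete rows from the original Excel sheet as needed.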

    The Python code is as follows:

    # -*- coding:utf-8 -*-
    # @Time      :2021/6/21 11:45
    # @Author    :
    # @File      :excel_undupl.py

    from collections import Counter

    import pandas as pd

    df = pd.read_excel(r"D:\s1s1-06220003.xlsx")
    # print(df)

    # Count how many times each value appears in the 'Gene Name' column
    gene_counts = Counter(df['Gene Name'])
    # most_common() returns (value, count) pairs sorted by count,
    # highest count first
    sorted_counts = gene_counts.most_common()
    # Keep only the values whose count is greater than 1,
    # i.e. the values that are duplicated
    duplicates = [item[0] for item in sorted_counts if item[1] > 1]
    print(duplicates)
    print(sorted_counts)
    

    The output is:

    ['pp220', 'CP2475L', 'A137R', 'p30', 'p72', 'A224L']
    [('pp220', 3), ('CP2475L', 2), ('A137R', 2), ('p30', 2), ('p72', 2), ('A224L', 2),('War-045', 1), ('B646L', 1), ('CP312R', 1), ('CP530R', 1), ('K205R', 1), ('KP177R', 1), ('B602L', 1), ('I215L', 1), ('K145R', 1), ('PE184L', 1), ('p12', 1), ('E146L', 1), ('B475L', 1), ('H124R', 1), ('B125R', 1), ('NP1450L', 1), ('EP1242L', 1), ('BA71V-NP868R', 1), ('MGF_110-3L', 1), ('F334L', 1), ('L60L', 1), ('H359L', 1), ('BA71V-F165R', 1), ('BA71V-D117L', 1), ('BA71V-Q706L', 1), ('BA71V-EP84R', 1), ('p54', 1), ('E111R', 1), ('H339R', 1), ('p14.5', 1), ('C147L', 1), ('C129R', 1), ('D205R', 1), ('F778R', 1), ('H171R', 1), ('H233R', 1), ('QP509L', 1), ('D79L', 1), ('M448R', 1), ('A151R', 1), ('B407L', 1), ('DNA', 1), ('I73R', 1), ('G1211R', 1), ('L11L', 1), ('CP123L', 1), ('DP238L', 1), ('E248R', 1), ('BA71V-I226R', 1), ('I177L', 1), ('BA71V-K421R', 1), ('CP80R', 1), ('BA71V-B385R', 1), ('E423R', 1), ('110-6L', 1), ('B175L', 1), ('D345L', 1), ('BA71V-I267L', 1), ('D1133L', 1), ('BA71V-K78R', 1), ('EP424R', 1), ('Mal-139', 1), ('BA71V-S183', 1), ('MGF_110-9L', 1), ('I243L', 1), ('285L', 1), ('BA71V-F1055L', 1), ('BA71V-S273R', 1), ('g5R War-112', 1), ('B962L', 1), ('BA71V-C962R', 1), ('MGF360-19R', 1), ('F317L', 1), ('C122R', 1), ('E301R', 1), ('C84L', 1), ('BA71V-G1340L', 1), ('G1340L', 1), ('PL83L', 1), ('MGF-360-11L', 1), ('ASFV_G_ACD_00600', 1), ('M1107L', 1), ('I9R', 1), ('EP364R', 1), ('B119L', 1), ('MGF100-3L', 1)]
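To make the two printed lists easier to read, here is a minimal sketch of the same counting logic on a tiny in-memory list. The sample values are made up for illustration and are not taken from the spreadsheet.

```python
from collections import Counter

# Hypothetical sample values standing in for the 'Gene Name' column.
names = ["p72", "p30", "p72", "A137R", "p30", "p72", "B646L"]

counts = Counter(names)

# (value, count) pairs, highest count first.
ranked = counts.most_common()

# Values that occur more than once.
duplicated = [name for name, n in ranked if n > 1]

print(ranked)
print(duplicated)
```

The first list pairs every distinct value with its count. The second keeps only the values that appear at least twice, which corresponds to the first line of the output above.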
    

    Finally, use this result to delete the rows that contain the duplicated values, which gives an Excel sheet with no duplicate data.
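The script above only reports which values are repeated; the deletion itself is left as a manual step. If you would rather do it in pandas, one possible sketch follows. It assumes the first row of each repeated value is the one to keep, which the original post does not specify. The DataFrame, the Score column, and the output file name are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical stand-in for the sheet; in practice df would come from pd.read_excel(...).
df = pd.DataFrame({
    "Gene Name": ["p72", "p30", "p72", "A137R", "p30", "p72", "B646L"],
    "Score": [1, 2, 3, 4, 5, 6, 7],
})

# Every row whose 'Gene Name' occurs more than once (all copies), for manual review.
repeated_rows = df[df.duplicated(subset="Gene Name", keep=False)]

# Assumption: keep the first row for each 'Gene Name' and drop the later copies.
deduped = df.drop_duplicates(subset="Gene Name", keep="first").reset_index(drop=True)

print(repeated_rows)
print(deduped)

# Writing the result back to Excel (hypothetical file name):
# deduped.to_excel("deduplicated.xlsx", index=False)
```

Looking at repeated_rows before dropping anything is useful when the repeated rows differ in other columns, because you then need to decide which copy to keep.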


    Article link: https://www.haomeiwen.com/subject/gmfpyltx.html