美文网首页
处理 CSV 数据文件

处理 CSV 数据文件

作者: 三方斜阳 | 来源:发表于2021-02-27 20:24 被阅读0次

    记录python 使用pandas 处理 csv 文件常规程序:

    1. 读取 csv 文件,获取数据:

    • 1. pandas
    import pandas as pd
    import csv
    data=pd.read_csv('zi_202105071017.csv',encoding='utf-8')
    print(data.head())# 查看前5行数据
    print(data.tail())# 查看最后5行数据,括号里可以指定查看行数
    >>
          zi_id  fanti  struct zi_str     freq
    0      1    NaN     NaN      一  1338743
    1      2    NaN     NaN      丁    11857
    2      3    NaN     NaN      七    14477
    3      4    NaN     NaN      万    28095
    4      5    NaN     NaN      丈    15697
          zi_id  fanti  struct zi_str  freq
    4961   4962    NaN     NaN      龋    27
    4962   4963    NaN     NaN      龙  7012
    4963   4964    NaN     NaN      龚    57
    4964   4965    NaN     NaN      龛   379
    4965   4966    NaN     NaN      龟   647
    [4966 rows x 5 columns]
    ==================================================================================================================
    print(data['zi_str'])#访问指定的列
    print(data['zi_str'].values)#取出对应的值,依次放入 list 
    >>
    0       一
    1       丁
    2       七
    3       万
    4       丈
    Name: zi_str, Length: 4966, dtype: object
    ['一' '丁' '七' ... '龚' '龛' '龟']
    ======================================================================================================
    data.drop('freq', axis=1, inplace=True)
    data.drop('struct', axis=1, inplace=True)#删除指定的列
    >>
          zi_id  fanti zi_str
    0         1    NaN      一
    1         2    NaN      丁
    2         3    NaN      七
    3         4    NaN      万
    4         5    NaN      丈
    [4966 rows x 3 columns]
    
    • 2. csv.reader()
    with open("zi_202105071017.csv",'r',encoding='utf-8') as f: 
            rows = [row for row in csv.reader(f)]
    print(rows[:5])
    >>
    [['zi_id', 'fanti', 'struct', 'zi_str', 'freq'], ['1', '', '', '一', '1338743'], ['2', '', '', '丁', '11857'], ['3', '', '', '七', '14477'], ['4', '', '', '万', '28095']]
    

    读取 CSV 文件并取出指定行写入新的 CSV文件

    import csv
    count=0
    with open("out.csv", 'r',encoding='utf-8', newline='') as file:
        with open("train.csv", 'w',encoding='utf-8', newline='') as trian:
            with open("valid.csv", 'w',encoding='utf-8', newline='') as valid:
                csvreader = csv.reader(file)
                trian_csvwriter = csv.writer(trian)
                valid_csvwriter = csv.writer(valid)
                valid_csvwriter.writerow(['text','ner_tags'])
                for row in csvreader:
                    if count<10001:
                        trian_csvwriter.writerow(row)
                    elif count>=10001 and count<=13000:
                        valid_csvwriter.writerow(row)
                    else:
                        break
                    count+=1
    

    3. 数据写入 csv:

    pd.DataFrame(data=pred_label, index=range(len(pred_label))).to_csv('pred.csv')
    

    eg2:

    with open('test.csv','w',encoding="utf-8",errors='ignore',newline='')as f:
      csv_write=csv.writer(f,dialect='excel')
      csv_write.writerow(['text','ner_tags'])
      stu1=['今天天气还行','B,I,O,I,I,O']
      csv_write.writerow(stu1)
    

    相关文章

      网友评论

          本文标题:处理 CSV 数据文件

          本文链接:https://www.haomeiwen.com/subject/zylifltx.html