Extracting base-filtering information from QC log files (Python implementation)


Author: 曹草BioInfo | Published: 2022-09-18 01:16

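First, prepare a file (pwd.list) that lists the paths of the fastp log files, one per line. Its first two lines look like this: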

    head -n 2 pwd.list
    /public1/home/scb6498/02_clean_data/final_300/disk1/wait/GM002-F300_disk1-1_fastp.txt
    /public1/home/scb6498/02_clean_data/final_300/disk1/wait/GM003-F300_disk1-1_fastp.txt
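
One way to build pwd.list, sketched in Python. This assumes the logs are named *_fastp.txt and all sit somewhere under a single root directory; the function name write_path_list and its default arguments are hypothetical.

```python
from pathlib import Path


def write_path_list(root, out="pwd.list", pattern="*_fastp.txt"):
    """Find fastp logs under `root` (recursively) and write their absolute
    paths to `out`, one per line, in sorted order. Returns the path list."""
    # Hypothetical helper: assumes every log matches `pattern`
    paths = sorted(str(p.resolve()) for p in Path(root).rglob(pattern) if p.is_file())
    with open(out, "w") as fh:
        for p in paths:
            fh.write(p + "\n")
    return paths
```

For example, write_path_list("/public1/home/scb6498/02_clean_data/final_300") would collect the logs under the disk1/wait directories shown above. Sorting keeps the sample order in the final table stable between runs.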
    

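The script below reads each path in pwd.list, takes the "total reads:" and "total bases:" values from that fastp log, adds Read1 and Read2 together, and writes one tab-separated line per log to samstat.txt.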

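    #!/usr/bin/env python3
    # Collect read and base counts before/after filtering from fastp logs.
    # Input: pwd.list, one fastp log path per line.
    # Output: samstat.txt, tab-separated, one line per log.
    import re
    
    with open("pwd.list") as fi:
      paths = [line.strip() for line in fi if line.strip()]
    
    with open("samstat.txt", "w") as fo:
      # header
      fo.write("name\tpwd\tbeforereads\tafterreads\tbeforebases\tafterbases\n")
      for pwd in paths:
        readslist = []
        baseslist = []
        with open(pwd) as readfile:
          for line in readfile:
            # int() ignores the surrounding spaces and the trailing newline
            if "total reads:" in line:
              readslist.append(int(line.split(":")[-1]))
            elif "total bases:" in line:
              baseslist.append(int(line.split(":")[-1]))
        # Paired-end logs list Read1 and Read2 before filtering (items 0, 1),
        # then Read1 and Read2 after filtering (items 2, 3)
        before_reads = readslist[0] + readslist[1]
        after_reads = readslist[2] + readslist[3]
        before_bases = baseslist[0] + baseslist[1]
        after_bases = baseslist[2] + baseslist[3]
        # Sample name: file name up to "_disk"/"-disk", e.g. GM002-F300
        name = re.split(r"/|_disk|-disk", pwd)[-2]
        # Run ID: file name without "_fastp.txt", e.g. GM002-F300_disk1-1
        run_id = re.split(r"/|_fastp\.", pwd)[-2]
        fields = [name, run_id, before_reads, after_reads, before_bases, after_bases]
        fo.write("\t".join(map(str, fields)) + "\n")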
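
The script picks out the "total reads:" and "total bases:" values by position, so it assumes exactly two blocks before and two after filtering (a paired-end run); a log with fewer entries, such as a single-end run, would raise IndexError. A minimal alternative sketch, assuming each block in the log starts with a header line such as "Read1 before filtering:" or "Read1 after filtering:" (the same layout the positional indexing already presumes), keys on those headers instead and sums whatever Read1/Read2 blocks it finds. The function names summarize_fastp_log and names_from_path are hypothetical and not part of fastp.

```python
import re


def summarize_fastp_log(text):
    """Sum 'total reads' and 'total bases' over every Read1/Read2 block,
    separately for blocks whose header says 'before' or 'after' filtering."""
    totals = {"before": {"reads": 0, "bases": 0},
              "after": {"reads": 0, "bases": 0}}
    stage = None  # current block: "before", "after" or None (any other block)
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith(":"):
            # Assumed header line, e.g. "Read1 before filtering:", starts a block
            low = line.lower()
            if "filtering" in low and "before" in low:
                stage = "before"
            elif "filtering" in low and "after" in low:
                stage = "after"
            else:
                stage = None  # e.g. "Filtering result:"
            continue
        if stage is None or ":" not in line:
            continue
        key, value = line.split(":", 1)
        if key == "total reads":
            totals[stage]["reads"] += int(value)
        elif key == "total bases":
            totals[stage]["bases"] += int(value)
    return totals


def names_from_path(path):
    """Return (sample name, run ID) with the script's split patterns,
    written as raw strings."""
    sample = re.split(r"/|_disk|-disk", path)[-2]
    run_id = re.split(r"/|_fastp\.", path)[-2]
    return sample, run_id
```

For a single log, summarize_fastp_log(open(pwd).read())["after"]["bases"] gives the base count after filtering, and names_from_path(pwd) returns the two name columns the script writes.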
    

Article link: https://www.haomeiwen.com/subject/gwqyortx.html