美文网首页ncRNA
一整天写完一个烂脚本

一整天写完一个烂脚本

作者: 纳灰灰 | 来源:发表于2019-04-03 22:05 被阅读26次

之前写过一个从sRNA-seq数据中查找miRNA表达量的脚本,但是很傻,每换一个物种,需要守在电脑旁,输入下一个物种的miRNA成熟序列和物种名,当时做2个miRNA花了2天时间守在电脑旁。

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import sys
import os
miRseq = input("请输入miRNA的成熟序列:")
File = sys.argv[1]
if ".gz" in File:
    input_file = os.popen("zcat " +File)
else:
    input_file = open(File,'r')
Dict = dict()
Total = 0
for line in input_file:
    Line = line.rstrip()
    if not Line: break
    if '>' in Line:
        abund = Line.split('-')[1]
        Total += int(abund)
    else:
        Dict[Line] = abund
for key,value in Dict.items():
    if key == miRseq:
        FPKM = int(value)*10000000/Total
        print(value,Total,FPKM)
input_file.close()

现在又要面临同样的工作,之前那么傻的操作再也不想重新经历一次了,于是想进行批量操作,于是想到借助工具

import sys
import os
input_file = open(sys.argv[1],'r')
os.system("ls > totalmcfile.txt")
for line1 in input_file:
    line1 = line1.rstrip()
    inlist = line1.split()
    #print(inlist[0])
    mcfile = open('totalmcfile.txt','r')
    for mcfilename in mcfile:
        mcfilename = mcfilename.rstrip()
        print(inlist[0],mcfilename)
        if inlist[0] in mcfilename:
            prefix = mcfilename.split('.')[0]
            print(prefix)
            os.system("java -cp /tools/TBtools_JRE1.6.jar biocjava.sRNA.Miner.sRnaReadFinder --inRefMiR "+inlist[5]+" --inFa "+mcfilename+" --outFa "+prefix+".miR3954")
            os.system("perl -lane 'if(/>\d+-(\d+)/){$exp=$1}else{print join qq{\t},$ARGV,$_,$exp}' "+prefix+".miR3954|perl -F'\t' -lane 'print if length($F[1])==22'|perl -lane '$count{$F[1]}+=$F[2];END{for(sort {$count{$a}<=>$count{$b}} keys %count){print join qq{\t},$_,$count{$_}}}' > "+prefix+".abundance.miR3954")
input_file.close()

改了一系列的难缠的bug之后,最后发现依然有我无法解决甚至看不懂的bug,临近崩溃的我决定放弃,毕竟不是专业写代码的,我先做到能出结果就好了,于是决定换一个思路。

import sys
import os
input_file = open(sys.argv[1],'r')
os.system("ls > totalmcfile.txt")
output_file = open ('RPM.xls', 'w')
for line1 in input_file:
    line1 = line1.rstrip()
    inlist = line1.split()
    #print(inlist[0])
    mcfile = open('totalmcfile.txt','r')
    for mcfilename in mcfile:
        mcfilename = mcfilename.rstrip()
        if inlist[0] in mcfilename:
            print(mcfilename)
            prefix = mcfilename.split('.')[0]
            Dict = {}
            Total = 0
            for line2 in os.popen("zcat "+mcfilename+""):
                line2 = line2.rstrip()
                if '>' in line2:
                    #print(line2)
                    abundance = line2.split('-')[1]
                    try:
                        Total += int(float(abundance))
                    except ValueError:
                        pass
                else:
                    Dict[line2] = abundance
            for key,value in Dict.items():
                if key == inlist[5]:
                    print(key,value)
                    RPM = int(value)*10000000/Total
                    print(mcfilename,key,value,RPM,file = output_file)
input_file.close()
output_file.close()

运行命令

python ../get_miRNA_abundance.py ../good_miR3954_mod.txt

运行结果


image.png

慢就慢一点好了,好歹能给个结果
奈何还是基础太差!

相关文章

网友评论

    本文标题:一整天写完一个烂脚本

    本文链接:https://www.haomeiwen.com/subject/xwitiqtx.html