之前写过一个从sRNA-seq数据中查找miRNA表达量的脚本,但是很傻,每换一个物种,需要守在电脑旁,输入下一个物种的miRNA成熟序列和物种名,当时做2个miRNA花了2天时间守在电脑旁。
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import sys
import os
miRseq = input("请输入miRNA的成熟序列:")
File = sys.argv[1]
if ".gz" in File:
input_file = os.popen("zcat " +File)
else:
input_file = open(File,'r')
Dict = dict()
Total = 0
for line in input_file:
Line = line.rstrip()
if not Line: break
if '>' in Line:
abund = Line.split('-')[1]
Total += int(abund)
else:
Dict[Line] = abund
for key,value in Dict.items():
if key == miRseq:
FPKM = int(value)*10000000/Total
print(value,Total,FPKM)
input_file.close()
现在又要面临同样的工作,之前那么傻的操作再也不想重新经历一次了,于是想进行批量操作,于是想到借助工具
import sys
import os
input_file = open(sys.argv[1],'r')
os.system("ls > totalmcfile.txt")
for line1 in input_file:
line1 = line1.rstrip()
inlist = line1.split()
#print(inlist[0])
mcfile = open('totalmcfile.txt','r')
for mcfilename in mcfile:
mcfilename = mcfilename.rstrip()
print(inlist[0],mcfilename)
if inlist[0] in mcfilename:
prefix = mcfilename.split('.')[0]
print(prefix)
os.system("java -cp /tools/TBtools_JRE1.6.jar biocjava.sRNA.Miner.sRnaReadFinder --inRefMiR "+inlist[5]+" --inFa "+mcfilename+" --outFa "+prefix+".miR3954")
os.system("perl -lane 'if(/>\d+-(\d+)/){$exp=$1}else{print join qq{\t},$ARGV,$_,$exp}' "+prefix+".miR3954|perl -F'\t' -lane 'print if length($F[1])==22'|perl -lane '$count{$F[1]}+=$F[2];END{for(sort {$count{$a}<=>$count{$b}} keys %count){print join qq{\t},$_,$count{$_}}}' > "+prefix+".abundance.miR3954")
input_file.close()
改了一系列的难缠的bug之后,最后发现依然有我无法解决甚至看不懂的bug,临近崩溃的我决定放弃,毕竟不是专业写代码的,我先做到能出结果就好了,于是决定换一个思路。
import sys
import os
input_file = open(sys.argv[1],'r')
os.system("ls > totalmcfile.txt")
output_file = open ('RPM.xls', 'w')
for line1 in input_file:
line1 = line1.rstrip()
inlist = line1.split()
#print(inlist[0])
mcfile = open('totalmcfile.txt','r')
for mcfilename in mcfile:
mcfilename = mcfilename.rstrip()
if inlist[0] in mcfilename:
print(mcfilename)
prefix = mcfilename.split('.')[0]
Dict = {}
Total = 0
for line2 in os.popen("zcat "+mcfilename+""):
line2 = line2.rstrip()
if '>' in line2:
#print(line2)
abundance = line2.split('-')[1]
try:
Total += int(float(abundance))
except ValueError:
pass
else:
Dict[line2] = abundance
for key,value in Dict.items():
if key == inlist[5]:
print(key,value)
RPM = int(value)*10000000/Total
print(mcfilename,key,value,RPM,file = output_file)
input_file.close()
output_file.close()
运行命令
python ../get_miRNA_abundance.py ../good_miR3954_mod.txt
运行结果

慢就慢一点好了,好歹能给个结果
奈何还是基础太差!
网友评论