This exercise comes from the 生信技能树 (biotrainee) forum.
![](https://img.haomeiwen.com/i5011192/d221d02a3b6d11dc.png)
First, download the TSS file:
wget http://www.biotrainee.com/jmzeng/tmp/hg38.tss
chr7 148697841 148698941
chr7 148698942 148699029
chr7 148699911 148701053
chr7 148701109 148701307
chr7 148701354 148702694
chr7 148703100 148703520
chr7 148703831 148704175
chr7 148704484 148704734
chr7 148704857 148705937
chr7 148706271 148706671
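Then look up the sites listed above (they are the contents of the `test.txt` passed to the script further down).
The script is as follows: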
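import sys

args = sys.argv
filename1 = args[1]  # annotation file, tab-separated: gene_ID, chromosome, start, end
filename2 = args[2]  # query file, tab-separated: chromosome, start, end

# Index the annotated intervals by chromosome: {chromosome: {(start, end): gene_ID}}
aDict = {}
with open(filename1) as handle:
    for line in handle:
        lineL = line.strip().split("\t")
        gene_ID = lineL[0]
        chr_name = lineL[1]
        start = int(lineL[2])
        end = int(lineL[3])
        if chr_name not in aDict:
            aDict[chr_name] = {}
        aDict[chr_name][start, end] = gene_ID

# For each query site, print every annotated interval that overlaps it
with open(filename2) as handle:
    for line in handle:
        lineList = line.strip().split("\t")
        Chr_name = lineList[0]
        Start = int(lineList[1])
        End = int(lineList[2])
        # .get() avoids a KeyError when a chromosome is missing from the annotation
        for k1, v1 in aDict.get(Chr_name, {}).items():
            if k1[0] <= Start <= k1[1] or k1[0] <= End <= k1[1] or Start <= k1[0] <= End or Start <= k1[1] <= End:
                print(Chr_name, Start, End, v1)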
For a different scenario, just change the `if` condition on the second-to-last line of the script.
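As a rough illustration of what "changing the condition" might look like, here is a minimal sketch of three possible predicates. The function names and the coordinates are hypothetical, not part of the original script, and the sketch assumes closed intervals with start <= end.

```python
def overlaps(q_start, q_end, r_start, r_end):
    """True if the query and the region share at least one position."""
    return q_start <= r_end and r_start <= q_end


def query_within_region(q_start, q_end, r_start, r_end):
    """True if the query lies entirely inside the region."""
    return r_start <= q_start and q_end <= r_end


def region_within_query(q_start, q_end, r_start, r_end):
    """True if the region lies entirely inside the query."""
    return q_start <= r_start and r_end <= q_end


# Hypothetical coordinates, chosen only to show how the predicates differ
region = (100, 200)
print(overlaps(150, 250, *region))             # True: partial overlap counts
print(query_within_region(150, 250, *region))  # False: query extends past the region
print(query_within_region(120, 180, *region))  # True
print(region_within_query(50, 300, *region))   # True
print(overlaps(201, 300, *region))             # False: query starts after the region ends
```

Under the same assumption, `overlaps` accepts exactly the same pairs as the four-clause condition in the script; it is only a shorter way to write the "any overlap" case.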
python3 exercise.py hg38.tss test.txt
chr7 148697841 148698941 NM_003592
chr7 148698942 148699029 NM_003592
chr7 148699911 148701053 NM_003592
Only these 3 of the 10 sites were matched.
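To see which sites were not matched, one option is to collect hits and misses separately instead of printing only the hits. The sketch below is an assumption-laden variant, not the original script: `split_hits` is a hypothetical helper, the index uses the same layout as `aDict` but with integer coordinates, and the data is made up for illustration.

```python
def split_hits(index, queries):
    """Separate query intervals into those with and without an overlapping entry.

    index:   {chromosome: {(start, end): gene_ID}} with integer coordinates
    queries: iterable of (chromosome, start, end) tuples
    """
    hits, misses = [], []
    for chrom, start, end in queries:
        # Collect the distinct gene IDs whose interval overlaps this query
        genes = sorted({
            gene_id
            for (r_start, r_end), gene_id in index.get(chrom, {}).items()
            if r_start <= end and start <= r_end
        })
        if genes:
            hits.append((chrom, start, end, genes))
        else:
            misses.append((chrom, start, end))
    return hits, misses


# Hypothetical data for illustration only (not taken from hg38.tss)
index = {"chr1": {(100, 200): "geneA", (500, 600): "geneB"}}
queries = [("chr1", 150, 250), ("chr1", 300, 400), ("chr2", 10, 20)]
hits, misses = split_hits(index, queries)
print(hits)    # [('chr1', 150, 250, ['geneA'])]
print(misses)  # [('chr1', 300, 400), ('chr2', 10, 20)]
```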