pavcp软件简介及基本操作流程:
泛基因组Psvcp测试 - 简书 (jianshu.com)
对于代码,
/public/home/fengting/demo/psvcp/test/hhz_new/structure_variation_calling_script/1depth_call_pav.py
测试:
# Usage: python3 $0 depth.infor.gz.file out.file depth_threshold gap_threshold
# The input file is the depth information file from mosdepth output, it can be per-base.bed.gz or regions.bed.gz
python3 ../structure_variation_calling_script/1depth_call_pav.py DL539-1.regions.bed.gz 539 5 50
![](https://img.haomeiwen.com/i25274977/e59df4d6af4d5c17.png)
# The output file show the present/absent situation in one sample. This file have 8 columns. On every line, the 2 column means the begining of the absence region and the 6 column means the ending of the absence region. The 2 column also means the ending of a presence region of last line and he 6 column means the begining of a presence region of next line. the 4 and 8 columns contain the depth information.
输出文件显示了一个样本中存在/缺失情况。该文件有8列。在每一行中,第2列表示缺失区域的开始,第6列表示缺失区域的结束。第2列也表示上一行存在区域的结束,第6列表示下一行存在区域的开始。第4列和第8列包含深度信息。
``python
for line in depths:
line = line.decode('utf-8')
# Code for processing each line in the depths file
这是一个for循环,用于遍历深度信息文件的每一行。在循环内部,使用line.decode('utf-8')
将每一行进行解码,以确保正确处理字符串。
部分代码:
if chromosome.split()[0] == line.split()[0]:
# Code for handling lines with the same chromosome
else:
# Code for handling lines with different chromosomes
这部分代码基于染色体的信息来处理具有相同染色体的行和具有不同染色体的行。如果染色体信息匹配,则执行第一个分支,否则执行第二个分支。
if float(line.split()[3]) <= float(depth_threshold):
# Code for handling lines with depth below the threshold
if av_or_not:
absence_present_line = line.rstrip()
av_or_not = False
chromosome = line
else:
chromosome = line
else:
# Code for handling lines with depth above the threshold
if av_or_not:
chromosome = line
else:
absence_present_line = absence_present_line + '\t' + line.rstrip()
这部分代码根据深度信息是否低于阈值来处理行。如果深度低于阈值,则执行第一个分支,将行作为缺失区域的开始。如果深度高于阈值,则执行第二个分支,将行添加到缺失区域的结束。
if float(absence_present_line.split()[5]) - float(absence_present_line.split()[1]) > float(gap_threshold):
print(absence_present_line, file=outputfile)
av_or_not = True
chromosome = line
这部分代码用于判断缺失区域的长度是否超过间隙阈值
,如果超过,则将缺失区域信息输出到输出文件,并将标志位av_or_not
设置为True,以处理下一个存在区域的开始。同时,将当前行作为下一个存在区域的开始。
![](https://img.haomeiwen.com/i25274977/c9ebc3c939460401.png)
![](https://img.haomeiwen.com/i25274977/f14638cc715e1b65.png)
网友评论