@小光amateur
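Hello. I open the whole BAM file with a reader, and I would like to parallelize the work by chromosome: each core processes one chromosome and produces one result, and at the end the results from all chromosomes are merged into a single file.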
using DataFrames, CSV, XAM, GenomicFeatures, BioSequences

# Collect every mapped record overlapping chromosomename:start-final into a DataFrame.
function generatechdf(reader, chromosomename::AbstractString, start::Int64, final::Int64)
    chdf = Vector{Tuple{String,Int64,Int64,Char,String,LongDNASeq}}()
    for record in eachoverlap(reader, chromosomename, start:final)
        if BAM.ismapped(record)
            # f(record) is my own helper (not shown here); it fills the :strand column.
            a = BAM.refname(record), BAM.position(record), BAM.rightposition(record), f(record), BAM.cigar(record), BAM.sequence(record)
            push!(chdf, a)
        end
    end
    rename!(DataFrame(chdf), [:refname, :position, :rightposition, :strand, :cigar, :sequence])
end

# chr is a grouped DataFrame containing all chromosomes (shown further down).
# Every iteration uses the same `reader`.
Threads.@threads for number in 1:chr.ngroups
    chromosome = chr[number]
    start = chromosome[1, 3]; final = chromosome[end, 3]
    chromosomename = chromosome[1, 1]  # like "chr1", "chr2", ...
    chdf = generatechdf(reader, chromosomename, start, final)
end
This is what chr looks like:
julia> chr
GroupedDataFrame with 25 groups based on key: Column1
First Group (1482189 rows): Column1 = "chr1"
     Row │ Column1  Column2  Column3
         │ String7  String1  Int64
─────────┼─────────────────────────────
       1 │ chr1     C          10469
       2 │ chr1     C          10471
       3 │ chr1     C          10484
       4 │ chr1     C          10489
       5 │ chr1     C          10589
       6 │ chr1     G          10590
       7 │ chr1     G          10610
       8 │ chr1     C          10617
       9 │ chr1     G          10618
      10 │ chr1     C          10620
      11 │ chr1     C          10633
When I run this with multiple threads (I am not sure whether multithreading is even the right tool here), I get an error:
Stacktrace:
[1] wait
@ ./task.jl:334 [inlined]
[2] threading_run(func::Function)
@ Base.Threads ./threadingconstructs.jl:38
[3] top-level scope
@ ./threadingconstructs.jl:97
nested task error: zlib failed to inflate a compressed block
Stacktrace:
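When I run it single-threaded, the problem does not occur. Judging from the error, it looks like the same reader cannot be used by several threads at once. What should I do? I want to process the chromosomes in parallel and then write all the results into one DataFrame. Thank you for your guidance.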
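One direction that might be worth trying, if the cause really is the shared reader, is to give every loop iteration its own reader and to store each result in its own slot. The sketch below shows only that pattern, using a plain text file as a stand-in so that it runs without a BAM file. The names `process_per_item`, `open_reader` and `work` are hypothetical and not part of XAM. With XAM, `open_reader` could presumably be something like `() -> open(BAM.Reader, bampath; index = bampath * ".bai")`, where `bampath` is an assumed path to an indexed BAM file.

```julia
# Sketch only: one private reader per iteration, one result slot per item.
# `process_per_item`, `open_reader` and `work` are hypothetical names.
function process_per_item(open_reader, work, items::AbstractVector)
    results = Vector{Any}(undef, length(items))  # slot i is written only by iteration i
    Threads.@threads for i in eachindex(items)
        reader = open_reader()                   # never shared between iterations
        try
            results[i] = work(reader, items[i])
        finally
            close(reader)                        # always release the handle
        end
    end
    return results
end

# Stand-in data: a small tab-separated text file instead of a BAM file.
path, io = mktemp()
write(io, "chr1\t10469\nchr1\t10471\nchr2\t500\n")
close(io)

# Stand-in for generatechdf: count the lines that belong to one chromosome.
count_lines(reader, name) = count(line -> startswith(line, name * "\t"), eachline(reader))

chroms = ["chr1", "chr2"]
counts = process_per_item(() -> open(path, "r"), count_lines, chroms)

# Merge step, done once after the threaded loop has finished.
combined = collect(zip(chroms, counts))
```

If each slot held a DataFrame instead of a count, the merge step could presumably be `reduce(vcat, results)` followed by `CSV.write`. Opening one reader per chromosome costs a little more than sharing one, but it means no two threads touch the same decompression state.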
Comments