How to run Julia code in parallel

Author: ZHANG先森_5850 | Published: 2022-10-23 14:46

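@小光amateur
Hello. I open an entire BAM file with a single reader and want to parallelize by chromosome: each core processes one chromosome and produces one result, and at the end the results for all chromosomes are merged into a single file.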

using DataFrames, CSV, XAM, GenomicFeatures, BioSequences

# Collect every mapped read overlapping chromosomename:start-final into a DataFrame.
function generatechdf(reader, chromosomename::AbstractString, start::Int64, final::Int64)
    chdf = Vector{Tuple{String,Int64,Int64,Char,String,LongDNASeq}}()
    for record in eachoverlap(reader, chromosomename, start:final)
        if BAM.ismapped(record)
            # f(record) is a helper not shown in this post; the tuple type implies it returns the strand as a Char
            a = (BAM.refname(record), BAM.position(record), BAM.rightposition(record), f(record), BAM.cigar(record), BAM.sequence(record))
            push!(chdf, a)
        end
    end
    rename!(DataFrame(chdf), [:refname, :position, :rightposition, :strand, :cigar, :sequence])
end

Threads.@threads for number in 1:chr.ngroups  # chr is a GroupedDataFrame with one group per chromosome (shown below)
    chromosome = chr[number]
    start = chromosome[1, 3]; final = chromosome[end, 3]
    chromosomename = chromosome[1, 1]  # e.g. "chr1", "chr2", ...
    chdf = generatechdf(reader, chromosomename, start, final)
end

This is what chr looks like:

julia> chr
GroupedDataFrame with 25 groups based on key: Column1
First Group (1482189 rows): Column1 = "chr1"
     Row │ Column1  Column2  Column3
         │ String7  String1  Int64
─────────┼─────────────────────────────
       1 │ chr1     C            10469
       2 │ chr1     C            10471
       3 │ chr1     C            10484
       4 │ chr1     C            10489
       5 │ chr1     C            10589
       6 │ chr1     G            10590
       7 │ chr1     G            10610
       8 │ chr1     C            10617
       9 │ chr1     G            10618
      10 │ chr1     C            10620
      11 │ chr1     C            10633

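When I run this with multiple threads (I am not sure multithreading is the right tool here), it fails with the following error: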

Stacktrace:
 [1] wait
   @ ./task.jl:334 [inlined]
 [2] threading_run(func::Function)
   @ Base.Threads ./threadingconstructs.jl:38
 [3] top-level scope
   @ ./threadingconstructs.jl:97

    nested task error: zlib failed to inflate a compressed block
    Stacktrace:

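With a single thread the problem does not occur, so judging from the error, the same reader apparently cannot be used by several threads at once. What should I do? I want to process the chromosomes in parallel and then combine all the results into one DataFrame. Thank you for any pointers.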
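
One possible direction, sketched under the assumption that the shared reader really is the cause: let each loop iteration open its own reader, store each per-chromosome DataFrame in its own slot of a preallocated vector, and concatenate at the end. The helper name `map_chromosomes` and the file names in the usage comment are hypothetical and do not come from the post. The BAM-specific part is left as a commented usage sketch because it depends on `generatechdf` and on the actual files.

```julia
using DataFrames

# Apply `work(name, start, final)` to every chromosome group of `chr` in
# parallel and stack the per-chromosome DataFrames into one.
# `work` is expected to open and close its own file handle, so that no
# reader is shared between threads.
function map_chromosomes(work, chr::GroupedDataFrame)
    # Extract (name, first position, last position) serially, before threading.
    regions = [(String(g[1, 1]), g[1, 3], g[end, 3]) for g in chr]
    results = Vector{DataFrame}(undef, length(regions))
    Threads.@threads for i in eachindex(regions)
        name, start, final = regions[i]
        results[i] = work(name, start, final)  # each iteration writes only its own slot
    end
    return reduce(vcat, results)               # one DataFrame for all chromosomes
end

# Usage sketch with generatechdf from the question
# ("sample.bam" and "sample.bam.bai" are placeholder paths):
#
# alldf = map_chromosomes(chr) do name, start, final
#     reader = open(BAM.Reader, "sample.bam", index = "sample.bam.bai")
#     try
#         generatechdf(reader, name, start, final)
#     finally
#         close(reader)
#     end
# end
# CSV.write("all_chromosomes.csv", alldf)
```

Julia has to be started with more than one thread (for example `julia -t 8`) for the loop to run in parallel; `Threads.nthreads()` reports the current count. Writing into `results[i]` instead of calling `push!` on a shared vector avoids a second data race when collecting the output.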

