Literature Notes 29: Leucaena (Leucaena trichandra)

Author: 小明的数据分析笔记本 | Published: 2020-08-12 16:25
    Paper title

    PacBio-Based Mitochondrial Genome Assembly of Leucaena trichandra (Leguminosae) and an Intrageneric Assessment of Mitochondrial RNA Editing

    Journal, institutions, year

    GBE Genome Biology and Evolution
    Accepted: August 17, 2018
    New Mexico State University
    Department of Systematic and Evolutionary Botany, University of Zurich, Switzerland
    Local file name of the paper: evy179.pdf

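    For now my main interest is the method for assembling a complete mitochondrial genome. The paper's data are public, and the authors also released the shell script used for the assembly, so I will try to reproduce the assembly process.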

    DNA Extraction and Sequencing

    sapling: a young tree
    polysaccharide: a long-chain carbohydrate
    Aquagenomic DNA extraction protocol
    For each extraction 10 mg of fresh young leaf material was obtained from a L. trichandra sapling that had been kept in the dark for 24h to reduce polysaccharide concentration.
    DNA with an average fragment size of 21 kbp was submitted for sequencing.
    PacBio P6-C4 chemistry

    Genome Assembly

    followed an iterative approach
    begins with the assembly of highly conserved regions and extends from that starting point.
    The pipeline involved:

    • using BLASR to map raw reads against the reference
    • filtering hits by a minimum aligned length (500 bp)
    • recovering the qualifying reads to a new fastq file using seqtk
    • assembling reads with Canu.

    The L. trichandra PacBio reads provided sufficient long read data to also assemble the mitochondrial genome.
    Nonetheless, when we identified likely mt-genome contigs recovered from assemblies derived from all the available reads (which includes mitochondrial, nuclear, and plastid data in large computationally intensive analyses), the mitochondrial portion was moderately fragmented (> 7 contigs).

    Computing resources: The project primarily employed an AMD7252 32 core server with 256 GB of RAM.

    After replacing the paths and data with my own, I ran the script and hit the following error:

    [Pomgroup@localhost Pome_Mito_practice]$ bash Iternative_assembly_Pome_Mito.sh 
    Iternative_assembly_Pome_Mito.sh: line 2: $'\r': command not found
    Iternative_assembly_Pome_Mito.sh: line 4: syntax error near unexpected token `$'\r''
    'ternative_assembly_Pome_Mito.sh: line 4: `
    

    Fix

    https://hacpai.com/article/1488765818607

    sed -i 's/\r$//' Iternative_assembly_Pome_Mito.sh
    

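    Explanation of the cause

    The script file has Windows-style CRLF line endings, so bash sees a trailing carriage return (\r) at the end of every line and treats it as part of the command. The sed command above strips that character.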

    https://blog.csdn.net/Lnho2015/article/details/51322289
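
The effect of the fix can be checked on a throwaway file. The sketch below is only an illustration: the file names are made up, and instead of writing \r inside the sed expression (which relies on GNU sed) it passes a literal carriage return through a shell variable so that it also works with other sed implementations. Line 2 of the demo file holds nothing but a carriage return, which is the situation that produces the "command not found" message.

```shell
# Build a tiny script with CRLF line endings in a scratch directory.
demo_dir=$(mktemp -d)
printf 'echo one\r\n\r\necho two\r\n' > "$demo_dir/crlf_demo.sh"

cr=$(printf '\r')   # a literal carriage return character

# Count the lines that end in a carriage return before the fix.
crlf_before=$(grep -c "${cr}\$" "$demo_dir/crlf_demo.sh" || true)

# Strip the trailing carriage return from every line (same idea as sed -i 's/\r$//').
sed "s/${cr}\$//" "$demo_dir/crlf_demo.sh" > "$demo_dir/unix_demo.sh"

# Count again, then run the cleaned script.
crlf_after=$(grep -c "${cr}\$" "$demo_dir/unix_demo.sh" || true)
demo_output=$(bash "$demo_dir/unix_demo.sh")

echo "CRLF lines before: $crlf_before, after: $crlf_after"
```

Running the same grep count on a downloaded script before executing it is a quick way to find out whether it needs this conversion.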

    I still have a lot of Linux basics to study carefully!

    Link to the script
    https://github.com/cdb3ny/Mitochondrial-Genome-Scripts/blob/master/Iternative_assembly_script.sh

    Line-by-line explanation of the commands used in the script
    • First, the blasr alignment
      Usage:
    blasr nanopore.fastq reference.fasta --nproc 16 > blasr.out
    

    blasr.out appears to correspond to the -m 1 format described at https://github.com/PacificBiosciences/blasr/wiki/Blasr-Output-Format

    • Processing the output file blasr.out
    awk '{a=$8-$7;print $0,a;}' blasr.out
    

    Subtract column 7 from column 8, store the result in a, and append a as the last column of every line.
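
To see what the appended column holds, here is a minimal sketch on two made-up alignment lines. It assumes the 13-column -m 1 layout listed on the BLASR wiki (qName tName qStrand tStrand score percentSimilarity tStart tEnd tLength qStart qEnd qLength nCells), in which columns 7 and 8 would be the start and end of the alignment on the reference. The read names and numbers are invented.

```shell
# Two invented alignment lines in the assumed 13-column layout.
work=$(mktemp -d)
cat > "$work/blasr.out" <<'EOF'
read_A ref1 0 0 -4200 88.5 100 1300 50000 10 1250 1400 9000
read_B ref1 0 1 -900 85.1 2000 2300 50000 5 310 600 4000
EOF

# Append (column 8 - column 7) as a new last field on every line.
with_len=$(awk '{a=$8-$7; print $0, a}' "$work/blasr.out")
echo "$with_len"

# Every line now has one more field than before, so the new value is field 14.
field_counts=$(echo "$with_len" | awk '{print NF}' | sort -u)
lengths=$(echo "$with_len" | awk '{print $14}' | paste -sd, -)
echo "fields per line: $field_counts; appended values: $lengths"
```

Under that assumption the appended value is the aligned length on the reference, which is why the later steps refer to column 14.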

    awk '{a=$8-$7;print $0,a;}' blasr.out | sort -n -r -k14,14
    

    Sort numerically by column 14 in descending order.

    awk '{a=$8-$7;print $0,a;}' blasr.out | sort -n -r -k14,14 | awk '$14>500'
    

    Keep only the lines whose column 14 is greater than 500.

    awk '{a=$8-$7;print $0,a;}' blasr.out | sort -n -r -k14,14 | awk '$14>500' | cut -d ' ' -f1
    

    Split each line on spaces and extract the first column.
    This gives the IDs of the fastq reads whose aligned length is greater than 500.
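
The whole chain can be tried on a few invented lines. This is a sketch under the same assumed 13-column layout. The name ids_raw.txt is hypothetical, and the final sort -u is my own addition for the case where one read aligns more than once. It is not something the notes above attribute to the original script.

```shell
# Four invented alignments; read_A aligns twice, read_B is too short.
work=$(mktemp -d)
cat > "$work/blasr.out" <<'EOF'
read_A ref1 0 0 -4200 88.5 100 1300 50000 10 1250 1400 9000
read_B ref1 0 1 -900 85.1 2000 2300 50000 5 310 600 4000
read_C ref1 0 0 -7000 90.2 5000 8000 50000 0 3100 3200 20000
read_A ref1 0 0 -2500 87.0 30000 30800 50000 20 830 1400 7000
EOF

# Same steps as above: append the length, sort by it, filter, keep the read name.
awk '{a=$8-$7; print $0, a}' "$work/blasr.out" \
  | sort -n -r -k14,14 \
  | awk '$14>500' \
  | cut -d ' ' -f1 > "$work/ids_raw.txt"

# Collapse repeated read names so each ID is listed once.
sort -u "$work/ids_raw.txt" > "$work/ids.txt"

raw_ids=$(paste -sd, - < "$work/ids_raw.txt")
unique_ids=$(paste -sd, - < "$work/ids.txt")
echo "raw: $raw_ids"
echo "unique: $unique_ids"
```

The sort step changes only the order of the IDs, not which ones pass the 500 bp filter.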

    grep -F -x -v -f
    

    I don't yet know what this command does.
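
The individual flags are standard grep options, so their combined effect can be worked out even though the file arguments are not shown here: -f FILE reads patterns from FILE, -F treats them as fixed strings, -x requires a pattern to match a whole line, and -v inverts the match. Together they print the lines of the searched file that are not listed in the pattern file. A minimal sketch with hypothetical file names (which files the script actually passes is not shown in these notes):

```shell
# Hypothetical inputs: a full list of read IDs and a list of IDs to exclude.
work=$(mktemp -d)
printf 'read_A\nread_B\nread_C\nread_D\n' > "$work/all_ids.txt"
printf 'read_A\nread_C\n' > "$work/used_ids.txt"

# -f FILE : take the patterns from FILE, one per line
# -F      : patterns are fixed strings, not regular expressions
# -x      : a pattern has to match the entire line
# -v      : print the lines that do NOT match any pattern
grep -F -x -v -f "$work/used_ids.txt" "$work/all_ids.txt" > "$work/unused_ids.txt"

unused=$(paste -sd, - < "$work/unused_ids.txt")
echo "not in used_ids.txt: $unused"
```

Without -x, an ID such as read_1 would also exclude read_10, which is presumably why whole-line matching is requested.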

    Extract the sequences (fastq) by ID
    seqtk subseq nanopore.fastq ids.txt > aligned.fastq
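
To make clear what this extraction step selects, the sketch below does the same job with plain awk on a tiny invented file. It is a stand-in for illustration, not seqtk itself, and it assumes every FASTQ record occupies exactly four lines; the file reads.fastq and its contents are made up.

```shell
# A three-record FASTQ file and a list of two wanted IDs (all invented).
work=$(mktemp -d)
cat > "$work/reads.fastq" <<'EOF'
@read_A some description
ACGTACGT
+
IIIIIIII
@read_B
TTTTGGGG
+
IIIIIIII
@read_C
GGCCAATT
+
IIIIIIII
EOF
printf 'read_A\nread_C\n' > "$work/ids.txt"

# First file: remember every wanted ID.
# Second file: on each header line (line 1 of every 4) strip the leading "@"
# and decide whether the record is wanted; print all lines of wanted records.
awk 'NR==FNR { want[$1]; next }
     FNR%4==1 { name=substr($1,2); keep=(name in want) }
     keep' "$work/ids.txt" "$work/reads.fastq" > "$work/aligned.fastq"

kept_headers=$(awk 'NR%4==1 {print $1}' "$work/aligned.fastq" | paste -sd, -)
kept_lines=$(awk 'END {print NR}' "$work/aligned.fastq")
echo "kept: $kept_headers ($kept_lines lines)"
```

For real data seqtk remains the better choice, since it also handles FASTA input and multi-line records.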
    
    Assembly with canu
    canu -p hehuan -d hehuan-oxford genomeSize=2000k -nanopore-raw aligned.fastq
    

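    Finally, the canu assembly is used as the new reference sequence and the whole process is repeated. The loop written as for i in 1:10 in the paper's script amounts to repeating this process 10 times.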
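
The way each round feeds the next can be sketched as a small plan generator that prints the commands instead of running them, so the control flow can be checked without blasr, seqtk or canu installed. This is not the authors' script. The round_N names are hypothetical, the sort step is left out because it does not affect which IDs pass the filter, the sort -u is my own addition, and the path round_N/hehuan.contigs.fasta assumes that canu writes its contigs as PREFIX.contigs.fasta inside the -d directory.

```shell
# Print the commands for a number of rounds; nothing is executed here.
# The function body runs in a subshell, so its variables stay local.
plan_rounds() (
  rounds=$1
  reference=$2
  i=1
  while [ "$i" -le "$rounds" ]; do
    out="round_$i"   # hypothetical per-round name
    printf '%s\n' "blasr nanopore.fastq $reference --nproc 16 > $out.blasr.out"
    printf '%s\n' "awk '{a=\$8-\$7; print \$0, a}' $out.blasr.out | awk '\$14>500' | cut -d ' ' -f1 | sort -u > $out.ids.txt"
    printf '%s\n' "seqtk subseq nanopore.fastq $out.ids.txt > $out.aligned.fastq"
    printf '%s\n' "canu -p hehuan -d $out genomeSize=2000k -nanopore-raw $out.aligned.fastq"
    # The next round maps the reads against the contigs of this round
    # (assumed output location, see the note above).
    reference="$out/hehuan.contigs.fasta"
    i=$((i + 1))
  done
)

plan=$(plan_rounds 2 reference.fasta)
printf '%s\n' "$plan"
```

Printing the plan first makes it easy to review file names before committing to a long run; once it looks right, the output of plan_rounds 10 reference.fasta could be piped into bash.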

    That is as far as I have read in this paper for now.
