How can you easily get started with sequence mutation analysis?

Author: 生信雀 | Published 2020-08-22 20:10

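    The sequence set may be small (a few sequences) or large (many sequences), and the analysis may cover a single gene fragment or whole genomes.

    The sequences may be non-coding genes, a mix of non-coding and coding genes, purely coding genes, or protein (amino acid) sequences.

    In every case, one figure and one tool should be enough.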

    BioAider download: https://github.com/ZhijianZhou01/BioAider


    Gene mutation analysis in BioAider

    This function analyzes the mutation characteristics of large numbers of sequenced strains. The sequence data must be aligned in advance. They can be nucleotide sequences, protein (amino acid) sequences, or coding gene fragments. For nucleotide and protein sequences, BioAider summarizes every mutation site together with its frequency and the strains that carry it.
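    The following Python sketch shows what a per-site mutation summary computes. It assumes each strain is compared with one chosen reference sequence in the alignment. The post does not say how BioAider picks its reference, so this is an assumption. `summarize_mutations` and the toy alignment are hypothetical illustrations. They are not BioAider's code or output format.

```python
from __future__ import annotations

from collections import defaultdict


def summarize_mutations(aligned: dict[str, str], reference: str) -> dict[tuple[int, str, str], list[str]]:
    """Map (1-based column, reference residue, variant residue) to the strains carrying it.

    Hypothetical helper: every strain in an already-aligned dataset is compared
    column by column with the chosen reference strain.
    """
    ref_seq = aligned[reference].upper()
    if any(len(seq) != len(ref_seq) for seq in aligned.values()):
        raise ValueError("all sequences must be aligned to the same length")
    sites = defaultdict(list)
    for name, seq in aligned.items():
        if name == reference:
            continue
        for pos, (ref_res, alt_res) in enumerate(zip(ref_seq, seq.upper()), start=1):
            if ref_res != alt_res:
                sites[(pos, ref_res, alt_res)].append(name)
    return dict(sites)


# Toy alignment (made-up data); a site's frequency is the number of strains carrying it.
alignment = {"ref": "ATGCCA", "s1": "ATGCTA", "s2": "ATGCTA", "s3": "ACGCCA"}
summary = summarize_mutations(alignment, "ref")
for (pos, ref_res, alt_res), strains in sorted(summary.items()):
    print(f"{pos}\t{ref_res}->{alt_res}\tfrequency={len(strains)}\t{','.join(strains)}")
```

    Here column 5 (C->T) is carried by two strains and column 2 (T->C) by one strain.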

    If the data are coding genes, BioAider offers several codon tables. It scans every codon site in the aligned dataset and identifies the type of mutation: synonymous, non-synonymous, insertion, deletion, or early termination (a premature stop codon). Finally, BioAider automatically summarizes and outputs the results.
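    The next Python sketch shows codon-level classification. It makes three assumptions. First, each strain is compared pairwise with a reference strain. Second, only the standard genetic code (NCBI table 1) is used, whereas BioAider offers several codon tables. Third, a `---` codon on the reference side counts as an insertion and a `---` codon on the strain side counts as a deletion. The `classify_codon` and `scan_codons` helpers and their labels are hypothetical, not BioAider's internals.

```python
# Standard genetic code (NCBI table 1), codon bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_codon(ref: str, alt: str) -> str:
    """Label the change from a reference codon to a variant codon."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return "identical"
    if ref == "---":
        return "insertion"
    if alt == "---":
        return "deletion"
    if ref not in CODON_TABLE or alt not in CODON_TABLE:
        return "ambiguous"  # partial gaps or non-ACGT bases
    ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
    if alt_aa == "*" and ref_aa != "*":
        return "early termination"
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"


def scan_codons(ref_seq: str, alt_seq: str):
    """Yield (codon number, ref codon, alt codon, type) for every changed codon."""
    for start in range(0, len(ref_seq) - len(ref_seq) % 3, 3):
        ref_codon, alt_codon = ref_seq[start:start + 3], alt_seq[start:start + 3]
        kind = classify_codon(ref_codon, alt_codon)
        if kind != "identical":
            yield start // 3 + 1, ref_codon, alt_codon, kind


ref_cds = "ATGGCTTGG---GAAAAA"
alt_cds = "ATGGCCTGAAAA---AGA"
changes = list(scan_codons(ref_cds, alt_cds))
for change in changes:
    print(change)
```

    The made-up pair above produces one change of each type, from codon 2 (GCT->GCC, both alanine, synonymous) to codon 6 (AAA->AGA, lysine to arginine, non-synonymous).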

    Note: coding gene sequences must be aligned with a translation-alignment method before mutation analysis. BioAider bundles three multiple sequence alignment programs (MAFFT, MUSCLE, and Clustal Omega) in its graphical interface, and it also provides translation alignment.
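    Translation alignment usually works in two steps. The translated proteins are aligned first. Then the original codons are threaded back onto the protein alignment. As a result, every gap spans exactly three nucleotides and the reading frame is preserved.

    The Python sketch below shows only the threading step. It assumes the protein alignment has already been produced by an aligner such as MAFFT. `back_translate` is a hypothetical helper, not BioAider's implementation.

```python
from __future__ import annotations


def back_translate(aligned_protein: str, cds: str) -> str:
    """Thread codons from an ungapped coding sequence onto an aligned protein.

    Each residue is replaced by its codon and each '-' by '---', so gaps in the
    resulting nucleotide alignment always come in whole codons.
    """
    cds = cds.replace("-", "").upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    out: list[str] = []
    next_codon = 0
    for residue in aligned_protein:
        if residue == "-":
            out.append("---")
            continue
        if next_codon >= len(codons):
            raise ValueError("protein alignment has more residues than the CDS has codons")
        out.append(codons[next_codon])
        next_codon += 1
    return "".join(out)


# Protein alignment (assumed to come from an external aligner) plus each strain's raw CDS.
protein_alignment = {"s1": "MAKW", "s2": "M-KW"}
cds = {"s1": "ATGGCTAAATGG", "s2": "ATGAAATGG"}
codon_alignment = {name: back_translate(protein_alignment[name], cds[name]) for name in cds}
print(codon_alignment)
```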

    Whether the data are nucleotides, amino acids, or coding genes, BioAider can plot the frequency distribution of mutation sites using substitution-frequency groups that the user defines.

    Example: mutation analysis of aligned SARS-CoV-2 ORF3a gene (a coding gene) sequences.

    First, create the frequency groups in a table editor:

    Each substitution-frequency group consists of a start value and an end value separated by a tab. Note that a group's start value is not included in its range. The frequencies covered by successive groups must be consecutive integers.
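    The grouping rule can be checked with a short Python sketch, which makes two assumptions. First, a site's frequency is taken to be the number of strains carrying the substitution. Second, "consecutive integers" is read as "each group starts where the previous one ends."

    Each group is treated as the half-open interval (start, end], matching the rule that the start value is excluded. `parse_groups` and `count_sites_per_group` are hypothetical names, and the bounds and frequencies are made up.

```python
from __future__ import annotations


def parse_groups(text: str) -> list[tuple[int, int]]:
    """Parse tab-separated 'start<TAB>end' lines into (start, end] groups."""
    groups = []
    for line in text.strip().splitlines():
        start, end = (int(value) for value in line.split("\t"))
        if end <= start:
            raise ValueError(f"empty group: {line!r}")
        groups.append((start, end))
    # Assumed meaning of "consecutive": each group starts where the previous one ends.
    for (_, prev_end), (next_start, _) in zip(groups, groups[1:]):
        if next_start != prev_end:
            raise ValueError(f"groups are not consecutive at {prev_end} -> {next_start}")
    return groups


def count_sites_per_group(frequencies: list[int], groups: list[tuple[int, int]]) -> dict[tuple[int, int], int]:
    """Count mutation sites whose frequency f satisfies start < f <= end."""
    counts = {group: 0 for group in groups}
    for f in frequencies:
        for start, end in groups:
            if start < f <= end:
                counts[(start, end)] += 1
                break
    return counts


groups = parse_groups("0\t1\n1\t2\n2\t5\n5\t100")
site_frequencies = [1, 1, 1, 2, 3, 7, 1]  # made-up strain counts per mutation site
counts = count_sites_per_group(site_frequencies, groups)
print(counts)
```

    With these groups, "0 to 1" captures exactly the sites seen in a single strain, because 0 itself is excluded.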

    Then paste the groups into BioAider's text box and select the "Codon" radio button under "Datas type":

    When the run finishes, the results are written to the directory that contains the source file. The *_mutation site summary file gives an overview of the variation and the mutation hotspots.

    The *_substitution frequency distribution.png plot shows the number of mutation sites in each frequency group.

    Although ORF3a has many mutation sites, more than half of them appear in only a single strain. BioAider also provides the plot as vector graphics (*_substitution frequency distribution.pdf), which users can edit for publication.
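    The "more than half" observation is a simple ratio: the number of singleton sites divided by the number of all mutation sites. The Python sketch below assumes the site-to-strain mapping has already been read from BioAider's output. That parsing is not shown, and the site keys are made up.

```python
from __future__ import annotations


def singleton_share(site_strains: dict[tuple[int, str, str], list[str]]) -> float:
    """Fraction of mutation sites observed in exactly one strain."""
    if not site_strains:
        return 0.0
    singletons = sum(1 for strains in site_strains.values() if len(strains) == 1)
    return singletons / len(site_strains)


sites = {(2, "T", "C"): ["s3"], (5, "C", "T"): ["s1", "s2"], (9, "G", "A"): ["s4"]}
share = singleton_share(sites)
print(f"{share:.0%} of mutation sites occur in a single strain")
```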

    In addition, the detailed *_log.txt file lists the mutant strains corresponding to each variant site.

    Note that I generally do not recommend mutation analysis when the sequences are highly divergent (for example, from different families or even orders) and the alignment contains many gaps ("-"). Such sequences greatly increase the amount of computation. They are also inherently highly variable, so the analysis has little value.
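    The advice above is qualitative. Before running a large analysis, a quick screen like the Python sketch below can flag alignments that are gap-heavy or highly divergent. The 20% gap threshold, the uncorrected p-distance, and the `divergence_report` helper are all assumptions for illustration. The post does not say that BioAider performs this check.

```python
from __future__ import annotations

import math
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, ignoring columns where either sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return math.nan
    return sum(x != y for x, y in pairs) / len(pairs)


def divergence_report(aligned: dict[str, str], max_gap_fraction: float = 0.2) -> dict[str, float | bool]:
    """Summarize gap content and mean pairwise p-distance of an alignment."""
    seqs = list(aligned.values())
    length = len(seqs[0])
    gappy_fraction = sum(1 for column in zip(*seqs) if "-" in column) / length
    distances = [p_distance(a, b) for a, b in combinations(seqs, 2)]
    distances = [d for d in distances if not math.isnan(d)]
    mean_p = sum(distances) / len(distances) if distances else math.nan
    return {
        "gappy_column_fraction": gappy_fraction,
        "mean_p_distance": mean_p,
        "too_gappy": gappy_fraction > max_gap_fraction,  # hypothetical threshold
    }


alignment = {"a": "ATGC--GT", "b": "ATGCAAGT", "c": "ATGGAAGT"}
report = divergence_report(alignment)
print(report)
```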
