美文网首页RNA
小RNA预测前面步骤

小RNA预测前面步骤

作者: SNAKEIn | 来源:发表于2021-08-27 16:37 被阅读0次

    准备文件及软件

    ViennaRNA-2.4.17(重要)

    ncbi-blast-2.9.0+

    bowtie

    miRDP2-v1.1.4_pipeline.bash(重要)

    程序大致流程

    1.fq文件转化成fa文件(利用两个perl脚本)

    01-run_fq2fa.sh && fq2fa.pl
    perl fq2fa.pl 201_FRRN202416676-1a_clean.fq 201_FRRN202416676-1a.fa
    perl fq2fa.pl 202_FRRN202416677-1a_clean.fq 202_FRRN202416677-1a.fa
    perl fq2fa.pl 203_FRRN202416678-1a_clean.fq 203_FRRN202416678-1a.fa
    perl fq2fa.pl 211_FRRN202416700-1a_clean.fq 211_FRRN202416700-1a.fa
    perl fq2fa.pl 212_FRRN202416701-1a_clean.fq 212_FRRN202416701-1a.fa
    perl fq2fa.pl 213_FRRN202416702-1a_clean.fq 213_FRRN202416702-1a.fa
    perl fq2fa.pl 281_FRRN202416688-1a_clean.fq 281_FRRN202416688-1a.fa
    perl fq2fa.pl 282_FRRN202416689-1a_clean.fq 282_FRRN202416689-1a.fa
    perl fq2fa.pl 283_FRRN202416690-1a_clean.fq 283_FRRN202416690-1a.fa
    perl fq2fa.pl 402_FRRN202416679-1a_clean.fq 402_FRRN202416679-1a.fa
    perl fq2fa.pl 403_FRRN202416680-1a_clean.fq 403_FRRN202416680-1a.fa
    perl fq2fa.pl 404_FRRN202416681-1a_clean.fq 404_FRRN202416681-1a.fa
    perl fq2fa.pl 481_FRRN202416691-1a_clean.fq 481_FRRN202416691-1a.fa
    perl fq2fa.pl 482_FRRN202416692-1a_clean.fq 482_FRRN202416692-1a.fa
    perl fq2fa.pl 483_FRRN202416693-1a_clean.fq 483_FRRN202416693-1a.fa
    perl fq2fa.pl 601_FRRN202416682-1a_clean.fq 601_FRRN202416682-1a.fa
    perl fq2fa.pl 603_FRRN202416683-1a_clean.fq 603_FRRN202416683-1a.fa
    perl fq2fa.pl 604_FRRN202416684-1a_clean.fq 604_FRRN202416684-1a.fa
    perl fq2fa.pl 681_FRRN202416694-1a_clean.fq 681_FRRN202416694-1a.fa
    perl fq2fa.pl 682_FRRN202416695-1a_clean.fq 682_FRRN202416695-1a.fa
    perl fq2fa.pl 683_FRRN202416696-1a_clean.fq 683_FRRN202416696-1a.fa
    perl fq2fa.pl 801_FRRN202416685-1a_clean.fq 801_FRRN202416685-1a.fa
    perl fq2fa.pl 802_FRRN202416686-1a_clean.fq 802_FRRN202416686-1a.fa
    perl fq2fa.pl 803_FRRN202416687-1a_clean.fq 803_FRRN202416687-1a.fa
    perl fq2fa.pl 811_FRRN202416703-1a_clean.fq 811_FRRN202416703-1a.fa
    perl fq2fa.pl 812_FRRN202416704-1a_clean.fq 812_FRRN202416704-1a.fa
    perl fq2fa.pl 813_FRRN202416705-1a_clean.fq 813_FRRN202416705-1a.fa
    perl fq2fa.pl 881_FRRN202416697-1a_clean.fq 881_FRRN202416697-1a.fa
    perl fq2fa.pl 882_FRRN202416698-1a_clean.fq 882_FRRN202416698-1a.fa
    perl fq2fa.pl 883_FRRN202416699-1a_clean.fq 883_FRRN202416699-1a.fa
    #! /usr/bin/perl -w
    use strict;
    use warnings;
    
    if(@ARGV!=2)
    {
            print "\nperl $0 <fq> <output>\n";
            exit;
    }
    
    open OUT,">$ARGV[1]";
    if($ARGV[0]=~/\.gz$/)
    {
            open IN,"gzip -dc $ARGV[0] | " || die "Cannot open the file '$ARGV[0]'.\n";
    }
    else
    {
            open IN,"$ARGV[0]" || die "Cannot open the file '$ARGV[0]'.\n";
    }
    while(<IN>)
    {
            chomp;
            if($_=~/^\@/)
            {
                    my $line=$_;
                    $line=~s/^\@//;
                    my $seq=<IN>;
                    chomp $seq;
                    <IN>;
                    <IN>;
                    print OUT "\>$line\n$seq\n";
            }
    }
    close IN
    

    得到结果fa文件
    顺便利用脚本进行 collapse 一下 除去冗余序列
    perl collapse_reads_md.pl 201_FRRN210045013-1a_clean.fa 201 > 201_collapsed.fa
    perl collapse_reads_md.pl 202_FRRN210045014-1a_clean.fa 202 > 202_collapsed.fa
    perl collapse_reads_md.pl 203_FRRN210045015-1a_clean.fa 203 > 203_collapsed.fa
    perl collapse_reads_md.pl 211_FRRN210045019-1a_clean.fa 211 > 211_collapsed.fa
    perl collapse_reads_md.pl 212_FRRN210045020-1a_clean.fa 212 > 212_collapsed.fa
    perl collapse_reads_md.pl 213_FRRN210045021-1a_clean.fa 213 > 213_collapsed.fa
    perl collapse_reads_md.pl 281_FRRN210045016-1a_clean.fa 281 > 281_collapsed.fa
    perl collapse_reads_md.pl 282_FRRN210045017-1a_clean.fa 282 > 282_collapsed.fa
    perl collapse_reads_md.pl 283_FRRN210045018-1a_clean.fa 283 > 283_collapsed.fa
    perl collapse_reads_md.pl 801_FRRN210045022-1a_clean.fa 801 > 801_collapsed.fa
    perl collapse_reads_md.pl 802_FRRN210045023-1a_clean.fa 802 > 802_collapsed.fa
    perl collapse_reads_md.pl 803_FRRN210045024-1a_clean.fa 803 > 803_collapsed.fa
    perl collapse_reads_md.pl 811_FRRN210045028-1a_clean.fa 811 > 811_collapsed.fa
    perl collapse_reads_md.pl 812_FRRN210045029-1a_clean.fa 812 > 812_collapsed.fa
    perl collapse_reads_md.pl 813_FRRN210045030-1a_clean.fa 813 > 813_collapsed.fa
    perl collapse_reads_md.pl 881_FRRN210045025-1a_clean.fa 881 > 881_collapsed.fa
    perl collapse_reads_md.pl 882_FRRN210045026-1a_clean.fa 882 > 882_collapsed.fa
    perl collapse_reads_md.pl 883_FRRN210045027-1a_clean.fa 883 > 883_collapsed.fa

    2.过滤数据库rfam里面存在的snRNA,tRNA,rRNA(先建库,再blast-short比对)。

    进入rfam官网下载已经注释过的小RNA序列进行过滤(自行准备构建数据库的TXT文件)

    linux 中Rfam文件夹

    需要首先构建需要的数据库

    #!/bin/bash
    trna_no=`awk '{print $1}' rfam.anno.tRNA.txt|sed 's/$/&\.fa/g'`
    cat $trna_no|sed -n '/Brassica rapa/{p;n;p}' > rfam.Br.trna.fa
    snrna_no=`awk '{print $1}' rfam.anno.snRNA.txt|sed 's/$/&\.fa/g'`
    cat $snrna_no|sed -n '/Brassica rapa/{p;n;p}' > rfam.Br.snrna.fa
    rrna_no=`awk '{print $1}' rfam.anno.rRNA.txt|sed 's/$/&\.fa/g'`
    cat $rrna_no|sed -n '/Brassica rapa/{p;n;p}' > rfam.Br.rrna.fa
    

    利用数据库进行blast筛选

    #!/bin/bash
    cd /40t_1/zhaorz/Brap_mRNA_miRNA/2.miRNA_analysis/1.sRNAanno/all_fa/collapsed/
    all_collapsed=`ls *.fa`
    cd /40t_1/zhaorz/Brap_mRNA_miRNA/2.miRNA_analysis/1.sRNAanno/blastresult
    for each in $all_collapsed
            do
            mkdir /40t_1/zhaorz/Brap_mRNA_miRNA/2.miRNA_analysis/1.sRNAanno/blastresult/${each%_collapsed.fa}_blast_result
            cd /40t_1/zhaorz/Brap_mRNA_miRNA/2.miRNA_analysis/1.sRNAanno/blastresult/${each%_collapsed.fa}_blast_result
            blastn -task blastn-short -query /40t_1/zhaorz/Brap_mRNA_miRNA/2.miRNA_analysis/1.sRNAanno/all_fa/collapsed/$each -out ${each%.fa}_trna.blast -db /40t_1/zhaorz/Brap_mRNA_miRNA/2.miRNA_analysis/1.sRNAanno/database/rfam.Br.trna.fa -outfmt 6 -evalue 0.01 -num_threads 15 > trna.log 2>&1
            blastn -task blastn-short -query /40t_1/zhaorz/Brap_mRNA_miRNA/2.miRNA_analysis/1.sRNAanno/all_fa/collapsed/$each -out ${each%.fa}_rrna.blast -db /40t_1/zhaorz/Brap_mRNA_miRNA/2.miRNA_analysis/1.sRNAanno/database/rfam.Br.rrna.fa -outfmt 6 -evalue 0.01 -num_threads 15 > rrna.log 2>&1
            blastn -task blastn-short -query /40t_1/zhaorz/Brap_mRNA_miRNA/2.miRNA_analysis/1.sRNAanno/all_fa/collapsed/$each -out ${each%.fa}_snrna.blast -db /40t_1/zhaorz/Brap_mRNA_miRNA/2.miRNA_analysis/1.sRNAanno/database/rfam.Br.snrna.fa -outfmt 6 -evalue 0.01 -num_threads 15 > snrna.log 2>&1
            echo "blastn on $each against rfam trna/rna/snrna/ completed."
            cat *.blast | awk '{print $1}' | sort | uniq > annotated.txt
            sed -n '/^>/p' /40t_1/zhaorz/Brap_mRNA_miRNA/2.miRNA_analysis/1.sRNAanno/all_fa/collapsed/$each | sed 's/^>//g' | sort >  all_seq.txt
            sort all_seq.txt annotated.txt annotated.txt | uniq -u > unannotated.txt
            perl /40t_1/zhaorz/Brap_mRNA_miRNA/2.miRNA_analysis/1.sRNAanno/script/find_seq.pl /40t_1/zhaorz/Brap_mRNA_miRNA/2.miRNA_analysis/1.sRNAanno/all_fa/collapsed/$each unannotated.txt ../${each%.fa}_rfam_filtered.fa
    done
    

    脚本及得到结果展示

    drwxrwxr-x 2 zhaorz zhaorz     4096 Aug 27 11:07 202_blast_result
    -rw-rw-r-- 1 zhaorz zhaorz 61105346 Aug 27 11:07 202_collapsed_rfam_filtered.fa
    drwxrwxr-x 2 zhaorz zhaorz     4096 Aug 27 11:08 203_blast_result(文件夹)
    -rw-rw-r-- 1 zhaorz zhaorz 44977522 Aug 27 11:08 203_collapsed_rfam_filtered.fa
    drwxrwxr-x 2 zhaorz zhaorz     4096 Aug 27 11:08 211_blast_result
    -rw-rw-r-- 1 zhaorz zhaorz 74839342 Aug 27 11:08 211_collapsed_rfam_filtered.fa
    drwxrwxr-x 2 zhaorz zhaorz     4096 Aug 27 11:09 212_blast_result
    -rw-rw-r-- 1 zhaorz zhaorz 73569078 Aug 27 11:09 212_collapsed_rfam_filtered.fa
    drwxrwxr-x 2 zhaorz zhaorz     4096 Aug 27 11:09 213_blast_result
    -rw-rw-r-- 1 zhaorz zhaorz 38660943 Aug 27 11:09 213_collapsed_rfam_filtered.fa
    drwxrwxr-x 2 zhaorz zhaorz     4096 Aug 27 11:10 281_blast_result
    -rw-rw-r-- 1 zhaorz zhaorz 27353974 Aug 27 11:10 281_collapsed_rfam_filtered.fa
    drwxrwxr-x 2 zhaorz zhaorz     4096 Aug 27 11:10 282_blast_result
    -rw-rw-r-- 1 zhaorz zhaorz  2577866 Aug 27 11:10 282_collapsed_rfam_filtered.fa
    drwxrwxr-x 2 zhaorz zhaorz     4096 Aug 27 11:11 283_blast_result
    -rw-rw-r-- 1 zhaorz zhaorz 45590628 Aug 27 11:11 283_collapsed_rfam_filtered.fa
    drwxrwxr-x 2 zhaorz zhaorz     4096 Aug 27 11:11 801_blast_result
    -rw-rw-r-- 1 zhaorz zhaorz 80959125 Aug 27 11:11 801_collapsed_rfam_filtered.fa
    drwxrwxr-x 2 zhaorz zhaorz     4096 Aug 27 11:12 802_blast_result
    -rw-rw-r-- 1 zhaorz zhaorz 70634094 Aug 27 11:12 802_collapsed_rfam_filtered.fa
    drwxrwxr-x 2 zhaorz zhaorz     4096 Aug 27 11:12 803_blast_result
    -rw-rw-r-- 1 zhaorz zhaorz 73491769 Aug 27 11:13 803_collapsed_rfam_filtered.fa
    drwxrwxr-x 2 zhaorz zhaorz     4096 Aug 27 11:13 811_blast_result
    -rw-rw-r-- 1 zhaorz zhaorz 99104952 Aug 27 11:13 811_collapsed_rfam_filtered.fa
    drwxrwxr-x 2 zhaorz zhaorz     4096 Aug 27 11:14 812_blast_result
    -rw-rw-r-- 1 zhaorz zhaorz 72978623 Aug 27 11:14 812_collapsed_rfam_filtered.fa
    drwxrwxr-x 2 zhaorz zhaorz     4096 Aug 27 11:14 813_blast_result
    -rw-rw-r-- 1 zhaorz zhaorz 85759911 Aug 27 11:15 813_collapsed_rfam_filtered.fa
    drwxrwxr-x 2 zhaorz zhaorz     4096 Aug 27 11:15 881_blast_result
    -rw-rw-r-- 1 zhaorz zhaorz 25394610 Aug 27 11:15 881_collapsed_rfam_filtered.fa
    drwxrwxr-x 2 zhaorz zhaorz     4096 Aug 27 11:15 882_blast_result
    -rw-rw-r-- 1 zhaorz zhaorz 36754402 Aug 27 11:16 882_collapsed_rfam_filtered.fa
    drwxrwxr-x 2 zhaorz zhaorz     4096 Aug 27 11:16 883_blast_result
    -rw-rw-r-- 1 zhaorz zhaorz 59156802 Aug 27 11:16 883_collapsed_rfam_filtered.fa
    文件处理脚本
    
    

    3.bowtie过滤比对上的结果

    #!/bin/bash
    cd /40t_1/zhaorz/Brap_mRNA_miRNA/2.miRNA_analysis/1.sRNAanno/blastresult
    for each in `ls *.fa`
    do
            cd /40t_1/zhaorz/Brap_mRNA_miRNA/2.miRNA_analysis/1.sRNAanno/exon_filter
            bowtie -S -f -a -v 1 --best --strata -m 20 -p 10 --al ${each%.fa}_for_exon.fa --un ${each%.fa}_no_exom.fa /40t_1/zhaorz/Brapa/new_gff_cds_v1.5.fa /40t_1/zhaorz/Brap_mRNA_miRNA/2.miRNA_analysis/1.sRNAanno/blastresult/$each ${each%.fa}_exon_alignment.sam
            rm ${each%.fa}_exon_alignment.sam
    done
    

    4.对最终没有比对上的fa进行预测

    -rw-rw-r-- 1 zhaorz zhaorz 45580592 Aug 27 11:27 201_collapsed_rfam_filtered_no_exom.fa
    -rw-rw-r-- 1 zhaorz zhaorz 50716287 Aug 27 11:28 202_collapsed_rfam_filtered_no_exom.fa
    -rw-rw-r-- 1 zhaorz zhaorz 34495664 Aug 27 11:28 203_collapsed_rfam_filtered_no_exom.fa
    -rw-rw-r-- 1 zhaorz zhaorz 61030605 Aug 27 11:28 211_collapsed_rfam_filtered_no_exom.fa
    -rw-rw-r-- 1 zhaorz zhaorz 63974638 Aug 27 11:28 212_collapsed_rfam_filtered_no_exom.fa
    -rw-rw-r-- 1 zhaorz zhaorz 32602771 Aug 27 11:28 213_collapsed_rfam_filtered_no_exom.fa
    -rw-rw-r-- 1 zhaorz zhaorz 22284478 Aug 27 11:28 281_collapsed_rfam_filtered_no_exom.fa
    -rw-rw-r-- 1 zhaorz zhaorz  2206115 Aug 27 11:28 282_collapsed_rfam_filtered_no_exom.fa
    -rw-rw-r-- 1 zhaorz zhaorz 38551549 Aug 27 11:28 283_collapsed_rfam_filtered_no_exom.fa
    -rw-rw-r-- 1 zhaorz zhaorz 70757578 Aug 27 11:28 801_collapsed_rfam_filtered_no_exom.fa
    -rw-rw-r-- 1 zhaorz zhaorz 62661095 Aug 27 11:28 802_collapsed_rfam_filtered_no_exom.fa
    -rw-rw-r-- 1 zhaorz zhaorz 63474741 Aug 27 11:28 803_collapsed_rfam_filtered_no_exom.fa
    -rw-rw-r-- 1 zhaorz zhaorz 87078561 Aug 27 11:28 811_collapsed_rfam_filtered_no_exom.fa
    -rw-rw-r-- 1 zhaorz zhaorz 66189962 Aug 27 11:28 812_collapsed_rfam_filtered_no_exom.fa
    -rw-rw-r-- 1 zhaorz zhaorz 73387062 Aug 27 11:28 813_collapsed_rfam_filtered_no_exom.fa
    -rw-rw-r-- 1 zhaorz zhaorz 22931447 Aug 27 11:28 881_collapsed_rfam_filtered_no_exom.fa
    -rw-rw-r-- 1 zhaorz zhaorz 32011463 Aug 27 11:29 882_collapsed_rfam_filtered_no_exom.fa
    -rw-rw-r-- 1 zhaorz zhaorz 51799744 Aug 27 11:29 883_collapsed_rfam_filtered_no_exom.fa
    

    利用 cat *fa > final_fa(不确定可行性)
    得到一个fa文件进行下一步的预测

    nohup miRDP2-v1.1.4_pipeline.bash -g /40t_1/zhaorz/Brapa/Brapa_sequence_v1.5.fa -x /40t_1/zhaorz/Brapa/bt_v1.5_index/Brapa_sequence_v1.5.fa -p 10 -f -i final_fa  -o . &
    
    

    相关文章

      网友评论

        本文标题:小RNA预测前面步骤

        本文链接:https://www.haomeiwen.com/subject/nvmmiltx.html