Data Analysis 2: Splitting and Filtering .sra Data


Author: 王铄_d468 | Published: 2021-01-19 14:30

    1. Splitting

    fastq-dump --gzip --split-3 SRR6449842
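When several runs need to be split, the same command can be wrapped in a loop. The following is a minimal sketch, not the author's script: `split_sra` and `run` are hypothetical helper names, and the second accession is simply the single-end run used later in these notes. With `--split-3`, paired reads go to `ACC_1.fastq.gz` and `ACC_2.fastq.gz`, and reads without a mate go to `ACC.fastq.gz`. The sketch defaults to a dry run that only prints the commands.

```shell
# DRY_RUN=1 (the default in this sketch) prints commands instead of running them.
# Set DRY_RUN=0 to actually execute fastq-dump (requires sra-tools).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# split_sra is a hypothetical helper name, not part of sra-tools.
split_sra() {
    local acc
    for acc in "$@"; do
        # --gzip compresses the output; --split-3 separates mates into
        # ACC_1 / ACC_2 files and puts unpaired reads into ACC.fastq.gz.
        run fastq-dump --gzip --split-3 "$acc"
    done
}

split_sra SRR6449842 SRR2050895
```

Running it with `DRY_RUN=0` executes the commands one accession at a time.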

    2. Quality check with fastqc

    Reference:
    https://zhuanlan.zhihu.com/p/20731723

    fastqc -o [output directory] -t [number of threads] SRR2050895.fastq.gz
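fastqc does not create the directory given to `-o`, so it has to exist before the run. A small sketch of that step follows; the helper names `qc_reads` and `run`, the directory name `fastqc_raw`, and the thread count 4 are assumptions chosen for illustration. As written it only prints the commands.

```shell
# DRY_RUN=1 (the default in this sketch) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# qc_reads is a hypothetical helper: qc_reads OUTDIR THREADS FILE...
qc_reads() {
    local outdir="$1" threads="$2"
    shift 2
    # Create the output directory first; fastqc expects it to exist.
    run mkdir -p "$outdir"
    run fastqc -o "$outdir" -t "$threads" "$@"
}

qc_reads fastqc_raw 4 SRR2050895.fastq.gz
```

The same helper can be pointed at the filtered file later, with a different output directory, so that the reports before and after filtering stay separate.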

    3. Filtering

    3.1 Paired-end sequencing with unknown adapter sequences

    fastp --detect_adapter_for_pe -w 8 -i SRR6449842_1.fastq.gz -I SRR6449842_2.fastq.gz -o clean_SRR6449842_1.fastq.gz -O clean_SRR6449842_2.fastq.gz -j SRR6449842_report.json -h SRR6449842_report.html

    3.2 Single-end sequencing with unknown adapter sequences

    fastp  -w 16 -i SRR2050895.fastq.gz -o clean_SRR2050895.fastq.gz  -j SRR2050895_report.json -h SRR2050895_report.html

    The output is:

    Detecting adapter sequence for read1...
    GTGTAAGCATCTGGGTAGTCTGAGTAGCGTCGTGGTATTCCTGAAAGGCCCAGGAAATGT

    Read1 before filtering:
    total reads: 45230766
    total bases: 2236673891
    Q20 bases: 2219017453(99.2106%)
    Q30 bases: 2172754983(97.1422%)

    Read1 after filtering:
    total reads: 45099144
    total bases: 2229459599
    Q20 bases: 2211862439(99.2107%)
    Q30 bases: 2165728977(97.1414%)

    Filtering result:
    reads passed filter: 45099144
    reads failed due to low quality: 352
    reads failed due to too many N: 64
    reads failed due to too short: 131206
    reads with adapter trimmed: 206061
    bases trimmed due to adapters: 7694138

    Duplication rate (may be overestimated since this is SE data): 57.3777%

    JSON report: SRR2050895_report.json
    HTML report: SRR2050895_report.html

    fastp -w 16 -i SRR2050895.fastq.gz -o clean_SRR2050895.fastq.gz -j SRR2050895_report.json -h SRR2050895_report.html
    fastp v0.20.0, time used: 226 seconds
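As a quick sanity check on the log above, the reads that passed plus the reads that failed should add up to the input total. The sketch below copies the counts from the log by hand (the variable names are my own) and also works out the share of reads that survived filtering.

```shell
# Counts copied from the fastp log above.
total=45230766
passed=45099144
low_quality=352
too_many_n=64
too_short=131206

# Reads removed for any of the three reasons.
failed=$((low_quality + too_many_n + too_short))

# Passed + failed should equal the number of input reads.
if [ $((passed + failed)) -eq "$total" ]; then
    echo "counts are consistent"
else
    echo "counts do NOT add up" >&2
fi

# Percentage of input reads that passed the filter.
retained=$(awk -v p="$passed" -v t="$total" 'BEGIN { printf "%.4f", 100 * p / t }')
echo "failed reads: $failed"
echo "retained: ${retained}%"
```

The counts add up (131622 reads removed), and about 99.71% of the reads are kept, so filtering removed very little from this run.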

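    Running fastqc again on the filtered reads showed a problem with the GC content of the first 11 bases.

    Never mind for now. Today, January 20, 2021, I will first download the remaining .fq files.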
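Once the remaining files are downloaded, the two fastp commands from sections 3.1 and 3.2 can be chosen automatically by checking which files exist for an accession. This is only a sketch under assumptions: `clean_reads` and `run` are hypothetical helper names, the file naming follows the `--split-3` output used above, and the default of 8 threads is arbitrary. fastp detects adapters automatically for single-end data, while paired-end data needs `--detect_adapter_for_pe`, which is why only one branch carries that flag.

```shell
# DRY_RUN=1 (the default in this sketch) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# clean_reads is a hypothetical helper: clean_reads ACCESSION [THREADS]
clean_reads() {
    local acc="$1" threads="${2:-8}"
    if [ -f "${acc}_1.fastq.gz" ] && [ -f "${acc}_2.fastq.gz" ]; then
        # Paired-end: adapter detection must be requested explicitly.
        run fastp --detect_adapter_for_pe -w "$threads" \
            -i "${acc}_1.fastq.gz" -I "${acc}_2.fastq.gz" \
            -o "clean_${acc}_1.fastq.gz" -O "clean_${acc}_2.fastq.gz" \
            -j "${acc}_report.json" -h "${acc}_report.html"
    elif [ -f "${acc}.fastq.gz" ]; then
        # Single-end: fastp detects the adapter by default.
        run fastp -w "$threads" \
            -i "${acc}.fastq.gz" -o "clean_${acc}.fastq.gz" \
            -j "${acc}_report.json" -h "${acc}_report.html"
    else
        echo "no FASTQ files found for ${acc}" >&2
        return 1
    fi
}

# Process every accession passed to the script; report the ones that are missing.
for acc in "$@"; do
    clean_reads "$acc" 8 || echo "skipped ${acc}" >&2
done
```

Saved as a script, it could be called with a list of accessions, for example `SRR6449842 SRR2050895`, from the directory that holds the FASTQ files.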
