fastp -- 结果文件

作者: QXPLUS | 来源:发表于2022-05-13 09:06 被阅读0次

    以A01样本为例:

    1_rawdata/A01_R1.fq.gz 1_rawdata/A01_R2.fq.gz

    经过cutadapt后得到
    tempR1 = 'A01_cut_R1.fq' tempR2 = 'A01_cut_R2.fq'

    # fastp 
    # 输入文件: tempR1 = '1_rawdata/A01_cut_R1.fq' tempR2 = '1_rawdata/A01_cut_R2.fq'
    # 输出文件: out = '2_cleandata/A01*'
    
    fastp = """
    /software/fastp/fastp-master/fastp -i {1} -I {2} \
    -o {3}_clean_R1.fq -O {3}_clean_R2.fq \
    -l 50 -g -W 5 \
    -A -w 8 -5 \
    -j {3}_fastp.json -h {3}_fastp.html -s 10
    """
    tempR1 = 'A01_cut_R1.fq' 
    tempR2 = 'A01_cut_R2.fq'
    out = '2_cleandata/A01'
    comm2 = fastp.format(0, tempR1, tempR2, out)
    
    # '/software/fastp/fastp-master/fastp \
    # -i 1_rawdata/A01_cut_R1.fq \
    # -I 1_rawdata/A01_cut_R2.fq \
    # -o 2_cleandata/A01_clean_R1.fq \
    # -O #2_cleandata/A01_clean_R2.fq \
    # -l 50 -g -W 5 -A -w 8 -5 \
    # -j 2_cleandata/A01_fastp.json -h 2_cleandata/A01_fastp.html \
    # -s 10'
    popen(comm2).read()
    
    • fastp 执行结果
    >>> popen(comm2).read()
    Read1 before filtering:
    total reads: 109205
    total bases: 16271850
    Q20 bases: 15778246(96.9665%)
    Q30 bases: 15067666(92.5996%)
    
    Read2 before filtering:
    total reads: 109205
    total bases: 16271850
    Q20 bases: 15607406(95.9166%)
    Q30 bases: 14743647(90.6083%)
    
    Read1 after filtering:
    total reads: 105506
    total bases: 15715013
    Q20 bases: 15353976(97.7026%)
    Q30 bases: 14692223(93.4916%)
    
    Read2 aftering filtering:
    total reads: 105506
    total bases: 15712636
    Q20 bases: 15278227(97.2353%)
    Q30 bases: 14483473(92.1772%)
    
    Filtering result:
    reads passed filter: 211012
    reads failed due to low quality: 7120
    reads failed due to too many N: 32
    reads failed due to too short: 246
    
    Duplication rate: 0.669288%
    
    Insert size peak (evaluated by paired-end reads): 267
    
    JSON report: 2_cleandata/A01_fastp.json
    HTML report: 2_cleandata/A01_fastp.html
    
    /software/fastp/fastp-master/fastp -i 1_rawdata/A01_cut_R1.fq -I 1_rawdata/A01_cut_R2.fq -o 2_cleandata/A01_clean_R1.fq -O 2_cleandata/A01_clean_R2.fq -l 50 -g -W 5 -A -w 8 -5 -j 2_cleandata/A01_fastp.json -h 2_cleandata/A01_fastp.html -s 10 
    fastp v0.20.0, time used: 2 seconds
    
    • 结果文件
      由于在命令中有一个参数-s/--split 10,所以会得到split成10份后的处理结果。
    [Mb18@login TESTA01]$ ll -t 2_cleandata/
    total 76072
    -rw-r--r-- 1 Mb18 mb 3888374 May 12 14:54 0001.A01_clean_R1.fq
    -rw-r--r-- 1 Mb18 mb 3887372 May 12 14:54 0001.A01_clean_R2.fq
    -rw-r--r-- 1 Mb18 mb 3898490 May 12 14:54 0002.A01_clean_R1.fq
    -rw-r--r-- 1 Mb18 mb 3898254 May 12 14:54 0002.A01_clean_R2.fq
    -rw-r--r-- 1 Mb18 mb 3882417 May 12 14:54 0003.A01_clean_R1.fq
    -rw-r--r-- 1 Mb18 mb 3881973 May 12 14:54 0003.A01_clean_R2.fq
    -rw-r--r-- 1 Mb18 mb 3898367 May 12 14:54 0004.A01_clean_R1.fq
    -rw-r--r-- 1 Mb18 mb 3897963 May 12 14:54 0004.A01_clean_R2.fq
    -rw-r--r-- 1 Mb18 mb 3891595 May 12 14:54 0005.A01_clean_R1.fq
    -rw-r--r-- 1 Mb18 mb 3891717 May 12 14:54 0005.A01_clean_R2.fq
    -rw-r--r-- 1 Mb18 mb 3896737 May 12 14:54 0006.A01_clean_R1.fq
    -rw-r--r-- 1 Mb18 mb 3896309 May 12 14:54 0006.A01_clean_R2.fq
    -rw-r--r-- 1 Mb18 mb 3900485 May 12 14:54 0007.A01_clean_R1.fq
    -rw-r--r-- 1 Mb18 mb 3899737 May 12 14:54 0007.A01_clean_R2.fq
    -rw-r--r-- 1 Mb18 mb 3893412 May 12 14:54 0008.A01_clean_R1.fq
    -rw-r--r-- 1 Mb18 mb 3892842 May 12 14:54 0008.A01_clean_R2.fq
    -rw-r--r-- 1 Mb18 mb 1767806 May 12 14:54 0009.A01_clean_R1.fq
    -rw-r--r-- 1 Mb18 mb 1767612 May 12 14:54 0009.A01_clean_R2.fq
    -rw-r--r-- 1 Mb18 mb 5736744 May 12 14:54 0010.A01_clean_R1.fq
    -rw-r--r-- 1 Mb18 mb 5735894 May 12 14:54 0010.A01_clean_R2.fq
    -rw-r--r-- 1 Mb18 mb  474717 May 12 14:54 A01_fastp.html
    -rw-r--r-- 1 Mb18 mb  125054 May 12 14:54 A01_fastp.json
    
    
    • A01_fastp.json
    {
        "summary": {
            "before_filtering": {
                "total_reads":218410,
                "total_bases":32543700,
                "q20_bases":31385652,
                "q30_bases":29811313,
                "q20_rate":0.964416,
                "q30_rate":0.916039,
                "read1_mean_length":149,
                "read2_mean_length":149,
                "gc_content":0.499539
            },
            "after_filtering": {
                "total_reads":211012,
                "total_bases":31427649,
                "q20_bases":30632203,
                "q30_bases":29175696,
                "q20_rate":0.97469,
                "q30_rate":0.928345,
                "read1_mean_length":148,
                "read2_mean_length":148,
                "gc_content":0.496156
            }
        },
        "filtering_result": {
    ...
        },
        "duplication": {
    ...
        },
        "insert_size": {
    ...
        },
        "read1_before_filtering": {
    ...
        },
        "read1_after_filtering": {
    ...
        },
    ...
    
    • fastp.report.html


      A01_fastp.html
    before filtering after filtering
    • 对结果进行处理
    f = open("2_cleandata/A01_fastp.json").read()
    ss = eval(f)
    
    hq_r = ss["summary"]["after_filtering"]["total_reads"]
    hq_b = ss["summary"]["after_filtering"]["total_bases"]
    
    with open("2_cleandata/A01_summary2.xls", "w") as out2:
        out2.write("SampleID\tLib\tHQ Reads(%)\tHQ Data(bp)\tHQ Data(%)\n")
        out2.write("%s\tSPE\t%.2f\t%d\t%.2f\n" % ('A01', 1.0*hq_r/summary[0]*100, hq_b, 1.0*hq_b/summary[1]*100))
    
    
    • 统计结果
      统计过滤后的reads占比
    [login TESTA01]$ less -S 2_cleandata/A01_summary2.xls
    
    SampleID        Lib     HQ Reads(%)     HQ Data(bp)     HQ Data(%)
    A01     SPE     96.57   31427649        95.89
    
    
    • 最后,将拆开的10个结果合并,用于后续分析
    cat 2_cleandata/[0-9][0-9][0-9][0-9].${sample}_clean_R1.fq > 2_cleandata/${sample}_clean_R1.fq &
    cat 2_cleandata/[0-9][0-9][0-9][0-9].${sample}_clean_R2.fq > 2_cleandata/${sample}_clean_R2.fq &
    

    相关文章

      网友评论

        本文标题:fastp -- 结果文件

        本文链接:https://www.haomeiwen.com/subject/ykxeurtx.html