Usearch fastq_mergepairs command usage: collected information

Author: 代号北极能 | Published: 2019-06-28 16:33

    All of the following information comes from www.drive5.com. I use this page only as a notebook for my own learning, and I declare no commercial interest in it. Anyone who reads this document should refer to www.drive5.com.

    I ran into some problems when trying to merge my read data, so I collected the information shown below.


    The fastq_mergepairs command merges (assembles) paired-end reads to create consensus sequences and, optionally, consensus quality scores. This command has many features and options so I recommend spending some time browsing the documentation to get familiar with the capabilities of fastq_mergepairs and issues that arise in read merging.


    Basic usage

    The simplest way to use fastq_mergepairs is to specify the forward and reverse FASTQ filenames and an output FASTQ filename.

    usearch -fastq_mergepairs SampleA_R1.fastq -reverse SampleA_R2.fastq -fastqout merged.fq


    Automatic R2 filename

    If the -reverse option is omitted, the reverse FASTQ filename is constructed by replacing R1 with R2. The following command line is equivalent to the example above.

    usearch -fastq_mergepairs SampleA_R1.fastq -fastqout merged.fq
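
    The filename rule above can be sketched in a few lines of Python. This is only an illustration, not USEARCH's own code: the function name reverse_filename is hypothetical, and the choice to swap the last occurrence of R1 is an assumption, since the text does not say which occurrence is replaced when a name contains R1 more than once.

```python
def reverse_filename(forward: str) -> str:
    """Guess the reverse-read filename by swapping 'R1' for 'R2'.

    Assumption: only the LAST 'R1' in the name is replaced.
    """
    head, found, tail = forward.rpartition("R1")
    if not found:
        raise ValueError(f"no 'R1' in filename: {forward!r}")
    return head + "R2" + tail


print(reverse_filename("SampleA_R1.fastq"))  # SampleA_R2.fastq
```

    A check like this is handy before launching a long run, to confirm that the R2 file you expect really exists under the constructed name.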


    Merging multiple FASTQ file pairs in a single command

    You can specify two or more FASTQ filenames following -fastq_mergepairs. In the following example, SampleA and SampleB are both merged. The R2 filenames are constructed automatically as explained above, or can be given explicitly using the -reverse option.

    usearch -fastq_mergepairs SampleA_R1.fastq SampleB_R1.fastq -fastqout merged.fq

    usearch -fastq_mergepairs *_R1*.fastq -fastqout merged.fq

    (This is what I was using when I had 45 reads.)


    Adding sample identifiers to read labels

    If multiple samples are combined into a single file as shown in some of the above examples, then you lose track of which read came from which sample. This is addressed by adding a sample identifier to each read label. The simplest method is to use the -sample option, e.g.

    usearch -fastq_mergepairs SampleA_R1.fastq -fastqout merged.fq -sample SampleA

    The string sample=SampleA; will be added at the end of the read label.
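
    To make the effect on a label concrete, here is a minimal Python sketch. The helper add_sample and the example label are hypothetical, and the semicolon placed between the original label and the annotation is an assumption; the text above only states that sample=SampleA; is added at the end.

```python
def add_sample(label: str, sample: str) -> str:
    """Append a sample=...; annotation to a read label."""
    # Assumption: a ';' separates the original label from the annotation.
    if not label.endswith(";"):
        label += ";"
    return f"{label}sample={sample};"


print(add_sample("M01:1:1101:1234", "SampleA"))
# M01:1:1101:1234;sample=SampleA;
```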


    Getting the sample identifier from the FASTQ filename

    FASTQ filenames are often based on the sample identifier, e.g. SampleA_R1.fastq. If you specify -relabel @ then fastq_mergepairs gets the sample identifier from the FASTQ filename by truncating at the first underscore (_) or period (.). A period and the read number are added after the sample identifier to make the new read label, which replaces the original label. This differs from the -sample option, which adds the sample= annotation at the end of the label. The usearch_global command understands both of these methods for putting sample identifiers into read labels.

    usearch -fastq_mergepairs SampleA_R1.fastq -fastqout merged.fq -relabel @
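
    The relabeling rule can be sketched as follows. The function names are hypothetical, and stripping the directory part of the path before truncating is an assumption; the rule itself (cut at the first underscore or period, then add a period and the read number) is taken from the description above.

```python
import os
import re


def sample_from_filename(path: str) -> str:
    """Truncate the file name at the first '_' or '.'."""
    name = os.path.basename(path)  # assumption: directories are ignored
    return re.split(r"[_.]", name, maxsplit=1)[0]


def relabel(path: str, read_number: int) -> str:
    """Build the new read label: sample identifier, a period, the read number."""
    return f"{sample_from_filename(path)}.{read_number}"


print(relabel("SampleA_R1.fastq", 1))  # SampleA.1
```

    One practical consequence: a sample name that itself contains an underscore or period, such as Soil_A_R1.fastq, would be cut down to Soil under this rule, so such files may be better handled with -sample.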


    Merging multiple files with sample identifiers

    By using wildcards and the -relabel @ option you can merge multiple files and add sample identifiers to the read labels, for example:

    usearch -fastq_mergepairs *R1*.fastq -fastqout merged.fq -relabel @
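
    If you drive usearch from a script rather than typing the command, the same command line can be assembled as an argument list. The sketch below is a hypothetical wrapper, not part of USEARCH; it uses only the options shown above and assumes the usearch binary is on your PATH.

```python
import shlex


def build_mergepairs_cmd(forward_files, fastqout, relabel_at=True):
    """Assemble a fastq_mergepairs command as an argument list."""
    if not forward_files:
        raise ValueError("at least one forward FASTQ file is required")
    cmd = ["usearch", "-fastq_mergepairs", *forward_files, "-fastqout", fastqout]
    if relabel_at:
        # Take sample identifiers from the FASTQ filenames.
        cmd += ["-relabel", "@"]
    return cmd


cmd = build_mergepairs_cmd(["SampleA_R1.fastq", "SampleB_R1.fastq"], "merged.fq")
print(shlex.join(cmd))  # show the command before running it
```

    The list can then be passed to subprocess.run(cmd, check=True). Note that an argument list is not expanded by a shell, so a wildcard such as *R1*.fastq would first have to be expanded in Python, for example with sorted(glob.glob("*R1*.fastq")).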


    fastq_mergepairs options

    Input files

    -fastq_mergepairs  Forward FASTQ filename(s).

    -reverse  Reverse FASTQ filename(s). If not given, constructed by replacing R1 with R2.

    -interleaved  Forward and reverse reads are interleaved in the same file (sometimes produced by SRA fastq-dump).

    Output files

    -fastqout  FASTQ filename for merged reads.

    -fastaout  FASTA filename for merged reads.

    -fastqout_notmerged_fwd  FASTQ filename for forward reads which were not merged.

    -fastaout_notmerged_fwd  FASTA filename for forward reads which were not merged.

    -fastqout_notmerged_rev  FASTQ filename for reverse reads which were not merged.

    -fastaout_notmerged_rev  FASTA filename for reverse reads which were not merged.

    Reports

    -report  Filename for summary report. See "Reviewing a fastq_mergepairs report to check for problems".

    -tabbedout  Tabbed text file containing detailed information about merging process for each pair including reason for discarding.

    -alnout  Human-readable alignments. Useful for trouble-shooting.

    Merged read labels

    -relabel  Prefix string for output labels. The read number 1, 2, 3... is appended after the prefix.

    -relabel @  Relabel using a prefix string constructed from the FASTQ filename. The prefix will be understood as the sample identifier.

    -sample xxx  Append sample identifier to read label using sample=xxx; format. This is an alternative method for adding sample ids.

    -fastq_eeout  Add ee=xxx; annotation with the number of expected errors in the merged read.

    -label_suffix  Suffix to append to merged read label. Can be used e.g. to add sample=xxx; type of sample identifier annotations.
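
    For reference, the number of expected errors reported by -fastq_eeout is the sum of the per-base error probabilities, where a quality score Q corresponds to an error probability of 10^(-Q/10). The sketch below computes it from a FASTQ quality string. The function names are hypothetical, the Phred+33 encoding is an assumption, and the two-decimal formatting of the ee= value is an assumption as well.

```python
def expected_errors(qualities: str, offset: int = 33) -> float:
    """Sum the per-base error probabilities of a quality string."""
    # Assumption: Phred+33 encoding, so Q = ord(char) - 33.
    return sum(10 ** (-(ord(c) - offset) / 10) for c in qualities)


def ee_annotation(qualities: str) -> str:
    """Format the value as an ee=...; annotation (number format assumed)."""
    return f"ee={expected_errors(qualities):.2f};"


# '+' encodes Q10 (P = 0.1) and '5' encodes Q20 (P = 0.01).
print(ee_annotation("+5"))  # ee=0.11;
```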

    Filtering

    -fastq_maxdiffs  Maximum number of mismatches in the alignment. Default 5. Consider increasing if you have long overlaps.

    -fastq_pctid  Minimum %id of alignment. Default 90. Consider decreasing if you have long overlaps.

    -fastq_nostagger  Discard staggered pairs. Default is to trim overhangs (non-biological sequence).

    -fastq_minmergelen  Minimum length for the merged sequence. See "Filtering artifacts by setting a merge length range".

    -fastq_maxmergelen  Maximum length for the merged sequence.

    -fastq_minqual  Discard merged read if any merged Q score is less than the given value. (No minimum by default).

    -fastq_minovlen  Discard pair if alignment is shorter than given value. Default 16.
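
    To see how these thresholds interact, the following sketch applies them to one aligned pair. It is an illustration only and not USEARCH's implementation: the function name, the order of the checks, the reason strings, and the identity formula (matching positions divided by overlap length) are all assumptions. The -fastq_minqual and -fastq_nostagger filters are left out.

```python
from typing import Optional


def discard_reason(overlap_len: int, diffs: int, merged_len: int,
                   maxdiffs: int = 5, pctid: float = 90.0, minovlen: int = 16,
                   minmergelen: Optional[int] = None,
                   maxmergelen: Optional[int] = None) -> Optional[str]:
    """Return why a pair would be discarded, or None if it passes."""
    if overlap_len <= 0 or overlap_len < minovlen:
        return "overlap too short"
    if diffs > maxdiffs:
        return "too many diffs"
    # Assumption: %id is the share of matching positions in the overlap.
    identity = 100.0 * (overlap_len - diffs) / overlap_len
    if identity < pctid:
        return "identity too low"
    if minmergelen is not None and merged_len < minmergelen:
        return "merged too short"
    if maxmergelen is not None and merged_len > maxmergelen:
        return "merged too long"
    return None


# A 200-base overlap with 8 mismatches is 96% identical, yet it fails
# the default -fastq_maxdiffs 5.
print(discard_reason(overlap_len=200, diffs=8, merged_len=250))
```

    Under these assumptions, the example shows why the mismatch limit is worth raising for long overlaps: a fixed count of 5 becomes a much stricter requirement as the overlap grows.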

    Pre-processing of reads before alignment

    -fastq_trunctail  Truncate reads at the first Q score <= this value. Default 2.

    -fastq_minlen  Discard pair if either read is shorter than this, after truncating by -fastq_trunctail if applicable. Default 64.
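
    The two pre-processing steps can be sketched like this. The function names are hypothetical and Phred+33 quality encoding is an assumption; the rules themselves (cut at the first Q score at or below the threshold, then discard the pair if either read is shorter than the minimum) follow the option descriptions above.

```python
def truncate_tail(seq: str, qual: str, trunctail: int = 2, offset: int = 33):
    """Cut a read at the first base whose Q score is <= trunctail."""
    for i, c in enumerate(qual):
        if ord(c) - offset <= trunctail:  # assumption: Phred+33 encoding
            return seq[:i], qual[:i]
    return seq, qual


def keep_pair(fwd_len: int, rev_len: int, minlen: int = 64) -> bool:
    """A pair is kept only if neither read is shorter than minlen."""
    return fwd_len >= minlen and rev_len >= minlen


# '#' encodes Q2, so the read is cut just before it.
print(truncate_tail("ACGTAC", "III#II"))  # ('ACG', 'III')
```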

    Multi-threading

    -threads  Specifies the number of threads. Default 10, or the number of CPU cores, whichever is less.
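
    The default described above amounts to taking the smaller of 10 and the core count. A small sketch, with a hypothetical function name and assuming that Python's os.cpu_count() sees the same cores that usearch would:

```python
import os


def default_threads(cap: int = 10) -> int:
    """Smaller of the cap and the number of CPU cores (at least 1)."""
    return min(cap, os.cpu_count() or 1)


print(default_threads())
```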
