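Preparation for the downstream steps
In the working folder, create a file named primers.fasta that stores the primer (adapter) sequences: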
>primer_5p
AAGCAGTGGTATCAACGCAGAGTACATGGGG
>primer_3p
AAGCAGTGGTATCAACGCAGAGTAC
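A minimal sketch of creating this file from the shell, assuming bash and a writable working directory. The heredoc writes exactly the two records above, and the two checks are only a convenience, not part of the original workflow.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Write the two primer records exactly as listed above
cat > primers.fasta <<'EOF'
>primer_5p
AAGCAGTGGTATCAACGCAGAGTACATGGGG
>primer_3p
AAGCAGTGGTATCAACGCAGAGTAC
EOF

# Sanity check: sequence lines should contain only A, C, G and T
if grep -v '^>' primers.fasta | grep -q '[^ACGT]'; then
  echo "primers.fasta contains non-ACGT characters" >&2
  exit 1
fi

# Print each primer name with its length
awk '/^>/ { name = substr($0, 2); next } { print name "\t" length($0) }' primers.fasta
```

For the two primers above this prints primer_5p with length 31 and primer_3p with length 25. The non-ACGT check also catches stray Windows line endings (\r) if the file was edited on Windows.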
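Next, run lima, PacBio's primer-removal tool (not limma, the package used for differential expression analysis of microarray data).
lima ("Demultiplex barcoded samples") can clip primers and can also split samples that were multiplexed with different barcodes. The command used, followed by lima's full list of options: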
lima /home/Acer1/ZhaoJing/rice_pacbio/movie.ccs.bam /home/Acer1/ZhaoJing/rice_pacbio/primers.fasta /home/Acer1/ZhaoJing/rice_pacbio/demux.ccs.bam --isoseq --no-pbi
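The same command can be kept in a small bash script so the paths are defined once and the exact command line is logged. This is a sketch: the variable names (WORKDIR, CCS_BAM, PRIMERS, OUT_BAM) are hypothetical, the paths and options are copied from the command above, and lima is only run if it is found on PATH.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical variable names; paths copied from the command above
WORKDIR=/home/Acer1/ZhaoJing/rice_pacbio
CCS_BAM="$WORKDIR/movie.ccs.bam"   # input CCS reads
PRIMERS="$WORKDIR/primers.fasta"   # primer FASTA created earlier
OUT_BAM="$WORKDIR/demux.ccs.bam"   # output BAM

# Same options as above: IsoSeq mode, and no .pbi index (only needed for SMRT Link)
cmd=(lima "$CCS_BAM" "$PRIMERS" "$OUT_BAM" --isoseq --no-pbi)

# Log the exact command line before running it
printf '%s ' "${cmd[@]}"; printf '\n'

if command -v lima >/dev/null 2>&1; then
  # Fail early if an input file is missing or empty
  for f in "$CCS_BAM" "$PRIMERS"; do
    [ -s "$f" ] || { echo "missing or empty input: $f" >&2; exit 1; }
  done
  "${cmd[@]}"
else
  echo "lima not found on PATH; command not run" >&2
fi
```

Building the command as a bash array keeps each path a single argument even if it contains spaces, and the same array is both printed and executed, so the log always matches what actually ran.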
Usage: lima [options] INPUT BARCODES OUTPUT

Lima, Demultiplex Barcoded PacBio Data and Clip Barcodes

Library Design:
  -s,--same                       Only keep same barcodes in a pair in BAM
                                  output.
  -d,--different                  Only keep different barcodes in a pair in BAM
                                  output. Enforces --min-passes ≥ 1.

Input Limitations:
  -p,--per-read                   Do not tag per ZMW, but per read.
  -f,--score-full-pass            Only use subreads flanked by adapters for
                                  barcode identification.
  -n,--max-scored-barcode-pairs   Only use up to N barcode pair regions to find
                                  the barcode, 0 means use all. [0]
  -b,--max-scored-barcodes        Analyze at maximum the provided number of
                                  barcodes per ZMW; 0 means deactivated. [0]
  -a,--max-scored-adapters        Analyze at maximum the provided number of
                                  adapters per ZMW; 0 means deactivated. [0]
  -u,--min-passes                 Minimal number of full passes. [0]
  -l,--min-length                 Minimum sequence length after clipping. [50]
  -L,--max-input-length           Maximum input sequence length, 0 means
                                  deactivated. [0]
  -M,--bad-adapter-ratio          Maximum ratio of bad adapter. [0]
  -P,--shared-prefix              Barcodes may be substrings of others.

Barcode Region:
  -w,--window-size-mult           The candidate region size multiplier:
                                  barcode_length * multiplier. [1.5]
  -W,--window-size-bp             The candidate region size in bp. If set,
                                  --window-size-mult will be ignored. [0]
  -r,--min-ref-span               Minimum reference span relative to the barcode
                                  length. [0.5]
  -R,--min-scoring-regions        Minimum number of barcode regions with
                                  sufficient relative span to the barcode length.
                                  [1]

Score Filters:
  -m,--min-score                  Reads below the minimum barcode score are
                                  removed from downstream analysis. [0]
  -i,--min-end-score              Minimum end barcode score threshold is applied
                                  to the individual leading and trailing ends.
                                  [0]
  -x,--min-signal-increase        The minimal score difference, between first
                                  and combined, required to call a barcode pair
                                  different. [10]
  -y,--min-score-lead             The minimal score lead required to call a
                                  barcode pair significant. [10]

Index Sorting:
  -k,--keep-tag-idx-order         Keep identified order of barcode pair indices
                                  in BC tag; CCS only.
  -K,--keep-split-idx-order       Keep identified order of barcode pair indices
                                  in split BAM names; CCS only.

Aligner Configuration:
  --ccs                           CCS mode, use optimal alignment options -A 1
                                  -B 4 -D 3 -I 3 -X 4.
  -A,--match-score                Score for a sequence match. [4]
  -B,--mismatch-penalty           Penalty for a mismatch. [13]
  -D,--deletion-penalty           Deletions penalty. [7]
  -I,--insertion-penalty          Insertion penalty. [7]
  -X,--branch-penalty             Branch penalty. [4]

Output Restrictions:
  --split-bam                     Split BAM output by barcode pair.
  --split-bam-named               Split BAM output by resolved barcode pair name.
  --bam-handles                   Maximum number of open BAM files. [500]
  --dump-clips                    Dump clipped regions in a separate output file
                                  <prefix>.lima.clips
  --dump-removed                  Dump removed records to
                                  <prefix>.lima.removed.bam.
  --no-pbi                        Do not generate a PBI file that is needed for SMRTLink.
  --no-bam                        Do not generate BAM output.
  --no-reports                    Do not generate reports.

Single Side:
  -S,--single-side                Assign single side barcodes by score clustering.
  --scored-adapter-ratio          Minimum ratio of scored vs sequenced adapters. [0.25]

IsoSeq:
  --isoseq                        Activate specialized IsoSeq mode.

Advanced:
  --peek                          Demux the first N ZMWs and return the mean
                                  score; 0 means peeking deactivated. [0]
  --guess                         Try to guess the used barcodes, using the
                                  provided mean score threshold; 0 means guessing
                                  deactivated. [0]
  --guess-min-count               Minimum number of ZMWs observed to whitelist
                                  barcodes. [0]
  --peek-guess                    Try to infer the used barcodes subset, by
                                  peeking at the first 50,000 ZMWs, whitelisting
                                  barcode pairs with more than 10 counts and mean
                                  score ≥ 45.

Options:
  -h,--help                       Output this help.
  --version                       Output version info.
  -j,--num-threads                Number of threads to use, 0 means
                                  autodetection. [0]
  --emit-tool-contract            Emit tool contract.
  --resolved-tool-contract        Use args from resolved tool contract.

Arguments:
  input                           Source BAM or DATASET
  barcode                         FASTA or BARCODESET file
  output                          Output BAM or DATASET file
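The -w and -r defaults above are given relative to barcode length. As a rough illustration, assuming primers.fasta holds one sequence line per record (as created earlier), this awk sketch applies the two formulas from the help text, barcode_length * 1.5 and barcode_length * 0.5, to each primer. It reproduces only the arithmetic printed in the help, not how lima rounds or uses these values internally; the function name window_sizes is hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

WINDOW_MULT=1.5   # default of -w,--window-size-mult
MIN_REF_SPAN=0.5  # default of -r,--min-ref-span

# Demo input: write the two primers from above if primers.fasta is not present
if [ ! -f primers.fasta ]; then
  printf '>primer_5p\nAAGCAGTGGTATCAACGCAGAGTACATGGGG\n>primer_3p\nAAGCAGTGGTATCAACGCAGAGTAC\n' > primers.fasta
fi

# Apply the help-text formulas to each primer (assumes one sequence line per record)
window_sizes() {
  awk -v mult="$WINDOW_MULT" -v span="$MIN_REF_SPAN" '
    /^>/ { name = substr($0, 2); next }
    {
      len = length($0)
      printf "%s\tlength=%d\twindow=%.1f bp\tmin_span=%.1f bp\n", name, len, len * mult, len * span
    }' "$1"
}

window_sizes primers.fasta
```

For the two primers above this gives a 46.5 bp candidate window and a 15.5 bp minimum reference span for primer_5p (31 nt), and 37.5 bp and 12.5 bp for primer_3p (25 nt).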

With the default parameters the run is short: a 600 MB CCS reads BAM (ccs.reads.bam) finished in ten-odd minutes.
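To measure the run time on your own data, a hypothetical helper like run_timed below wraps any command with bash's built-in SECONDS counter (whole-second resolution). It is a sketch: the lima call reuses the default-parameter command from above and is skipped when lima is not installed.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a command, then report its wall-clock time on stderr.
# SECONDS is a bash built-in counter, so the resolution is whole seconds.
run_timed() {
  local start=$SECONDS
  "$@"
  LAST_ELAPSED=$(( SECONDS - start ))
  printf 'finished in %dm%02ds: %s\n' \
    $(( LAST_ELAPSED / 60 )) $(( LAST_ELAPSED % 60 )) "$*" >&2
}

# Time the default-parameter lima run from above, if lima is installed
if command -v lima >/dev/null 2>&1; then
  run_timed lima /home/Acer1/ZhaoJing/rice_pacbio/movie.ccs.bam \
    /home/Acer1/ZhaoJing/rice_pacbio/primers.fasta \
    /home/Acer1/ZhaoJing/rice_pacbio/demux.ccs.bam --isoseq --no-pbi
fi
```

The timing line goes to stderr so it does not mix with anything the wrapped command writes to stdout.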