2023-07-21 Using IsoSeq3 with PacBio sequencing data

Author: 麦冬花儿 | Source: published 2023-08-11 23:05

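IsoSeq3 is used here to analyze full-length transcript sequences from PacBio sequencing data, in preparation for more accurate gene annotation. The approach is expensive, however, and it cannot be used for expression-level analysis.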

mkdir -p /home/train/08.RNA-seq_analysis_by_trinity
cd /home/train/08.RNA-seq_analysis_by_trinity

# Full-length transcript sequence analysis of PacBio transcriptome sequencing data
# https://isoseq.how/clustering/schematic-workflow.html
mkdir -p /home/train/08.RNA-seq_analysis_by_trinity/IsoSeq
cd /home/train/08.RNA-seq_analysis_by_trinity/IsoSeq

# Convert subreads into CCS reads
PATH=/opt/biosoft/miniconda3_for_pbbioconda/bin/:$PATH
samtools view -h ~/00.incipient_data/data_for_gene_prediction_and_RNA-seq/m54086_170204_081430.subreads.bam | head -n 10000 | samtools view -b > m54086_170204_081430.subreads.bam
pbindex m54086_170204_081430.subreads.bam  # Build an index for the PacBio BAM file
# ccs needs the sequencing chemistry information in chemistry.xml to run properly; set up chemistry.xml before running ccs
# wget https://raw.githubusercontent.com/PacificBiosciences/pbcore/develop/pbcore/chemistry/resources/mapping.xml -O chemistry.xml
cp ~/00.incipient_data/data_for_genome_assembling/chemistry.xml .
perl -p -i -e 's/<SoftwareVersion>5.0/<SoftwareVersion>4.0/' chemistry.xml
export SMRT_CHEMISTRY_BUNDLE_DIR=$PWD  # Store the current directory in the environment variable
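
One detail worth checking at this step: the shell only treats `#` as the start of a comment when it begins a word, so a comment written directly against a value becomes part of that value. A minimal sketch of the difference, using a made-up path (`/tmp/demo` and the variable names are hypothetical and not part of the pipeline):

```shell
# Hypothetical directory used only for this illustration.
demo_dir=/tmp/demo

# '#' glued to the value: it is part of the word, not a comment.
glued=$demo_dir#note

# '#' after whitespace: everything from '#' onward is a comment.
spaced=$demo_dir #note

echo "glued:  $glued"    # prints "glued:  /tmp/demo#note"
echo "spaced: $spaced"   # prints "spaced: /tmp/demo"
```

Running `echo "$SMRT_CHEMISTRY_BUNDLE_DIR"` after the export is a quick way to confirm the variable holds only the directory path.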
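# Run ccs to convert subreads into CCS reads. By default --min-passes is 3, meaning at least 3 full-length subreads are required; --min-snr is 2.5, the minimum signal-to-noise ratio, read here as a subread accuracy of at least 2.5/3.5 = 71.4%. The command below also sets --min-rq 0.9, a minimum predicted read accuracy of 0.9.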
ccs --min-rq 0.9 m54086_170204_081430.subreads.bam m54086_170204_081430.ccs.bam
# Inspect the result with samtools
[train@MiWiFi-R3P-srv IsoSeq]$ samtools view m54086_170204_081430.ccs.bam | less
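
Besides paging through the records, the predicted accuracy of each CCS read can be tallied from the `rq:f:` tag carried by CCS BAM records. The sketch below is only an illustration: `count_rq_at_least` is a hypothetical helper, and it expects SAM text on standard input.

```shell
# Hypothetical helper: count SAM records whose rq:f: tag (predicted read
# accuracy) is at least the threshold given as the first argument.
count_rq_at_least() {
    awk -F '\t' -v min="$1" '
        /^@/ { next }                        # skip SAM header lines
        {
            for (i = 12; i <= NF; i++) {     # optional tags start at field 12
                if ($i ~ /^rq:f:/) {
                    if (substr($i, 6) + 0 >= min + 0) n++
                    break
                }
            }
        }
        END { print n + 0 }
    '
}
```

A possible use would be `samtools view m54086_170204_081430.ccs.bam | count_rq_at_least 0.99`, which shows how many of the reads kept at `--min-rq 0.9` would also pass a stricter cutoff.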

# Remove primer and barcode bases.
echo '>NEB_5p
GCAATGAAGTCGCAGGGTTGGG
>Clontech_5p
AAGCAGTGGTATCAACGCAGAGTACATGGGG
>NEB_Clontech_3p
GTACTCTGCGTTGATACCACTGCTT' > barcoded_primers.fasta
lima m54086_170204_081430.ccs.bam barcoded_primers.fasta m54086_170204_081430.lima.bam --isoseq --peek-guess
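
As a quick sanity check on the primer file, the sequences above show that `NEB_Clontech_3p` is the reverse complement of the first 25 bases of `Clontech_5p`. A small sketch that verifies this with standard tools (`revcomp` is a hypothetical helper name, and it only handles uppercase A/C/G/T):

```shell
# Hypothetical helper: reverse-complement an uppercase DNA string.
revcomp() {
    printf '%s\n' "$1" \
        | tr 'ACGT' 'TGCA' \
        | awk '{ out = ""
                 for (i = length($0); i >= 1; i--) out = out substr($0, i, 1)
                 print out }'
}

five_prime=AAGCAGTGGTATCAACGCAGAGTACATGGGG   # Clontech_5p
three_prime=GTACTCTGCGTTGATACCACTGCTT        # NEB_Clontech_3p

rc=$(revcomp "$three_prime")
case "$five_prime" in
    "$rc"*) echo "3' primer is the reverse complement of the 5' primer prefix" ;;
    *)      echo "primers are not reverse complements" ;;
esac
```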

# Count the reads remaining after filtering
[train@MiWiFi-R3P-srv IsoSeq]$ samtools view m54086_170204_081430.lima.Clontech_5p--NEB_Clontech_3p.bam | wc -l
656


# Remove poly(A) tails and concatemers to obtain FLNC reads
isoseq refine m54086_170204_081430*lima*.bam barcoded_primers.fasta m54086_170204_081430.flnc.bam --require-polya

# Cluster the FLNC reads to obtain full-length transcript sequences: hq.fasta.gz holds transcripts with predicted accuracy ≥ 0.99; lq.fasta.gz holds those with predicted accuracy < 0.99.
isoseq cluster m54086_170204_081430.flnc.bam clustered.bam --verbose --use-qvs
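
To see how many transcripts land in the high-quality and low-quality outputs, the FASTA headers can be counted directly. A minimal sketch, assuming the output names `clustered.hq.fasta.gz` and `clustered.lq.fasta.gz`; `count_fasta_records` is a hypothetical helper:

```shell
# Hypothetical helper: count the records in a gzip-compressed FASTA file.
count_fasta_records() {
    # grep -c exits non-zero when it counts 0 matches, so "|| true" keeps
    # the helper usable under "set -e".
    gzip -dc "$1" | grep -c '^>' || true
}

# Report the counts for whichever output files are present.
for f in clustered.hq.fasta.gz clustered.lq.fasta.gz; do
    if [ -f "$f" ]; then
        printf '%s\t%s\n' "$f" "$(count_fasta_records "$f")"
    fi
done
```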

[train@MiWiFi-R3P-srv IsoSeq]$ samtools view m54086_170204_081430.flnc.bam | wc -l
646
[train@MiWiFi-R3P-srv IsoSeq]$ samtools view clustered.bam | wc -l
41
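
The counts reported in this walkthrough (656 reads after lima, 646 FLNC reads, 41 clustered transcripts) can be turned into step-to-step ratios. A small sketch; `pct` is a hypothetical helper, and the numbers are simply the ones from the terminal output:

```shell
# Hypothetical helper: print part/total as a percentage with one decimal.
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN {
        if (total == 0) { print "n/a"; exit }
        printf "%.1f%%\n", part / total * 100
    }'
}

pct 646 656   # FLNC reads relative to the lima output: 98.5%
pct 41 646    # clustered transcripts relative to FLNC reads: 6.3%
```

Keep in mind that this is a subsample made from the first 10000 lines of the subreads BAM, so the ratios describe only this small test set.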
[train@MiWiFi-R3P-srv IsoSeq]$ ll -t
total 37784
-rw-r--r-- 1 train train     8588 Jul 21 14:36 clustered.cluster
-rw-r--r-- 1 train train    15327 Jul 21 14:36 clustered.hq.fasta.gz
-rw-r--r-- 1 train train     2752 Jul 21 14:36 clustered.lq.fasta.gz
-rw-r--r-- 1 train train      404 Jul 21 14:36 clustered.bam.pbi
-rw-r--r-- 1 train train    21560 Jul 21 14:36 clustered.bam
-rw-r--r-- 1 train train      381 Jul 21 14:36 clustered.hq.bam.pbi
-rw-r--r-- 1 train train    18707 Jul 21 14:36 clustered.hq.bam
-rw-r--r-- 1 train train      109 Jul 21 14:36 clustered.lq.bam.pbi
-rw-r--r-- 1 train train     3352 Jul 21 14:36 clustered.lq.bam
-rw-r--r-- 1 train train     1568 Jul 21 14:36 clustered.transcriptset.xml
-rw-r--r-- 1 train train     6505 Jul 21 14:36 clustered.cluster_report.csv
-rw-r--r-- 1 train train     8195 Jul 21 14:33 m54086_170204_081430.flnc.bam.pbi
-rw-r--r-- 1 train train  1509378 Jul 21 14:33 m54086_170204_081430.flnc.bam
-rw-r--r-- 1 train train     1611 Jul 21 14:33 m54086_170204_081430.flnc.consensusreadset.xml
-rw-r--r-- 1 train train      821 Jul 21 14:33 m54086_170204_081430.flnc.filter_summary.report.json
-rw-r--r-- 1 train train    50339 Jul 21 14:33 m54086_170204_081430.flnc.report.csv
-rw-r--r-- 1 train train     1990 Jul 21 14:29 m54086_170204_081430.lima.Clontech_5p--NEB_Clontech_3p.consensusreadset.xml
-rw-r--r-- 1 train train     3826 Jul 21 14:29 m54086_170204_081430.lima.consensusreadset.xml
-rw-r--r-- 1 train train      639 Jul 21 14:29 m54086_170204_081430.lima.json
-rw-r--r-- 1 train train      108 Jul 21 14:29 m54086_170204_081430.lima.lima.counts
-rw-r--r-- 1 train train      845 Jul 21 14:29 m54086_170204_081430.lima.lima.summary
-rw-r--r-- 1 train train     8342 Jul 21 14:29 m54086_170204_081430.lima.Clontech_5p--NEB_Clontech_3p.bam.pbi
-rw-r--r-- 1 train train  1564566 Jul 21 14:29 m54086_170204_081430.lima.Clontech_5p--NEB_Clontech_3p.bam
-rw-r--r-- 1 train train   101927 Jul 21 14:29 m54086_170204_081430.lima.lima.clips
-rw-r--r-- 1 train train   183452 Jul 21 14:29 m54086_170204_081430.lima.lima.report
-rw-r--r-- 1 train train      118 Jul 21 14:29 m54086_170204_081430.lima.lima.guess
-rw-r--r-- 1 train train      119 Jul 21 14:29 barcoded_primers.fasta
-rw-r--r-- 1 train train      862 Jul 21 14:17 m54086_170204_081430.ccs.ccs_report.txt
-rw-r--r-- 1 train train    61814 Jul 21 14:17 m54086_170204_081430.ccs.zmw_metrics.json.gz
-rw-r--r-- 1 train train     8030 Jul 21 14:17 m54086_170204_081430.ccs.bam.pbi
-rw-r--r-- 1 train train  1694358 Jul 21 14:17 m54086_170204_081430.ccs.bam
-rw-r--r-- 1 train train     8366 Jul 21 14:15 chemistry.xml
-rw-r--r-- 1 train train    89364 Jul 21 14:10 m54086_170204_081430.subreads.bam.pbi
-rw-r--r-- 1 train train 33228366 Jul 21 14:09 m54086_170204_081430.subreads.bam
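
Among the files listed above, `clustered.cluster_report.csv` can be summarized to see how many reads support each cluster. The sketch below assumes the report is a comma-separated file with one header line and the cluster id in the first column; that layout is an assumption to check against the actual file, and `reads_per_cluster` is a hypothetical helper.

```shell
# Hypothetical helper: count report rows per cluster id (first CSV column),
# skipping the header line, and list the largest clusters first.
reads_per_cluster() {
    awk -F ',' 'NR > 1 { n[$1]++ } END { for (c in n) print n[c] "\t" c }' "$1" \
        | sort -k1,1nr -k2,2
}
```

For example, `reads_per_cluster clustered.cluster_report.csv | head` would show the clusters with the most supporting reads.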

Article link: https://www.haomeiwen.com/subject/pchzudtx.html