immunarch — Fast and Seamless Exploration of Single-cell and Bulk T-cell/Antibody Immune Repertoires in R
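The first step of any data analysis is to understand your data. For R users, the next step is getting that data into the R environment. As mentioned before, immunarch supports almost every immune repertoire data format. Today we use 10x Genomics VDJ data as an example of how to load data.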
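10x Genomics offers several pipelines for single-cell and spatial views of biological systems, including single-cell immune profiling. The 10x Genomics Chromium Single Cell Immune Profiling solution can simultaneously profile:
- V(D)J transcripts and clonotypes of T cells and B cells.
- 5' gene expression.
- Cell surface protein and antigen specificity (Feature Barcode) at single-cell resolution, in the same set of cells.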
Their end-to-end workflow includes the familiar Cell Ranger software, which provides the following pipelines for immune profiling:
- cellranger mkfastq
- cellranger vdj
- cellranger count
Cell Ranger produces many output files. You should use the filtered contigs CSV file, because it contains the cell barcode information.
.
├── vdj_v1_mm_c57bl6_pbmc_t_filtered_contig_annotations.csv <-- This contains the count data we want!
├── vdj_v1_mm_c57bl6_pbmc_t_consensus_annotations.csv
├── vdj_v1_mm_c57bl6_pbmc_t_clonotypes.csv
├── vdj_v1_mm_c57bl6_pbmc_t_all_contig_annotations.csv
├── vdj_v1_mm_c57bl6_pbmc_t_matrix.h5
├── vdj_v1_mm_c57bl6_pbmc_t_bam.bam.bai
├── vdj_v1_mm_c57bl6_pbmc_t_molecule_info.h5
├── vdj_v1_mm_c57bl6_pbmc_t_raw_feature_bc_matrix.tar.gz
├── vdj_v1_mm_c57bl6_pbmc_t_analysis.tar.gz
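Sometimes you may want to pass only the filtered contig files to immunarch rather than the whole output folder. The loading log further down shows that repLoad also parses the all-contig and consensus files. The minimal base-R sketch below picks out those files by name. It builds a temporary stand-in folder so that it runs on its own, so `out_dir` is a placeholder for your real Cell Ranger output folder.

```r
# Stand-in output folder containing a few of the file names listed above.
# Replace out_dir with your own Cell Ranger output folder.
out_dir <- file.path(tempdir(), "cellranger_out")
dir.create(out_dir, showWarnings = FALSE)
invisible(file.create(file.path(out_dir, c(
  "vdj_v1_mm_c57bl6_pbmc_t_filtered_contig_annotations.csv",
  "vdj_v1_mm_c57bl6_pbmc_t_consensus_annotations.csv",
  "vdj_v1_mm_c57bl6_pbmc_t_clonotypes.csv",
  "vdj_v1_mm_c57bl6_pbmc_t_all_contig_annotations.csv"
))))

# Keep only files whose names end in filtered_contig_annotations.csv.
# The all_contig_annotations file is not matched because "filtered_" is required.
filtered <- list.files(out_dir, pattern = "filtered_contig_annotations\\.csv$",
                       full.names = TRUE)
basename(filtered)
```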
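Run the code below in your R environment to load the data into immunarch's format. You can run it on the entire folder containing the Cell Ranger output files, because repLoad skips unsupported file formats.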
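library(immunarch)
# Hypothetical path: replace with the folder that holds your Cell Ranger output files
file_path <- "path/to/cellranger_output"
immdata_10x <- repLoad(file_path)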
The real question is what should go inside file_path:
- The filtered contigs CSV for each sample, renamed after that sample (files in the same folder cannot share a name).
- metadata.txt (the sample grouping information).
metadata.txt looks like this:
Sample Sex Age Status
immunoseq_1 M 1 C
immunoseq_2 M 2 C
immunoseq_3 F 3 A
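You can write metadata.txt from R instead of by hand. The minimal base-R sketch below assumes a tab-separated file whose first column is named `Sample`. It also assumes the `Sample` values match the repertoire file names without their extension, which is the convention the immunarch documentation describes. The output folder `meta_dir` is a temporary stand-in.

```r
# Sample grouping table from above. Sample values must match the repertoire
# file names without extension (e.g. immunoseq_1.csv -> immunoseq_1).
meta <- data.frame(
  Sample = c("immunoseq_1", "immunoseq_2", "immunoseq_3"),
  Sex    = c("M", "M", "F"),
  Age    = c(1, 2, 3),
  Status = c("C", "C", "A"),
  stringsAsFactors = FALSE
)

# Stand-in folder. In practice, write into the same folder as the CSV files.
meta_dir <- file.path(tempdir(), "metadata_example")
dir.create(meta_dir, showWarnings = FALSE)

# Tab-separated, no quoting, no row names.
write.table(meta, file.path(meta_dir, "metadata.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)
readLines(file.path(meta_dir, "metadata.txt"))
```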
The folder will most likely look like this:
# For instance, suppose your folder has the following structure:
# >_ ls
# immunoseq_1.txt
# immunoseq_2.txt
# immunoseq_3.txt
# metadata.txt
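The renaming step can be scripted. The minimal base-R sketch below assumes a hypothetical layout with one Cell Ranger run per sample, each keeping its file at `<sample>/outs/filtered_contig_annotations.csv`. It copies every file into a single input folder under the sample's name. The toy files and the names `run_root` and `input_dir` are placeholders. The copies keep Cell Ranger's `.csv` extension, and metadata.txt would go into the same folder.

```r
# Hypothetical per-sample runs: <run_root>/<sample>/outs/filtered_contig_annotations.csv.
# Toy header-only files are created so the sketch runs on its own.
run_root <- file.path(tempdir(), "cellranger_runs")
samples <- c("immunoseq_1", "immunoseq_2", "immunoseq_3")
for (s in samples) {
  outs <- file.path(run_root, s, "outs")
  dir.create(outs, recursive = TRUE, showWarnings = FALSE)
  writeLines("barcode,contig_id,chain,cdr3",
             file.path(outs, "filtered_contig_annotations.csv"))
}

# Copy each sample's filtered contigs into one folder, renamed after the sample,
# so that no two files share the name filtered_contig_annotations.csv.
input_dir <- file.path(tempdir(), "immunarch_input")
dir.create(input_dir, showWarnings = FALSE)
for (s in samples) {
  file.copy(file.path(run_root, s, "outs", "filtered_contig_annotations.csv"),
            file.path(input_dir, paste0(s, ".csv")),
            overwrite = TRUE)
}
list.files(input_dir)
```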
If this is still unclear, look at the example files and build your own. Loading itself is easy:
> immdata_10x <- repLoad(file_path)
== Step 1/3: loading repertoire files... ==
Processing "/filepath/C57BL_mice_igenrichment" ...
-- Parsing "/filepath/vdj_v1_mm_c57bl6_pbmc_t_all_contig_annotations.csv" -- 10x (filt.contigs)
[!] Removed 2917 clonotypes with no nucleotide and amino acid CDR3 sequence.
-- Parsing "/filepath/vdj_v1_mm_c57bl6_pbmc_t_clonotypes.csv" -- unsupported format, skipping
-- Parsing "/filepath/vdj_v1_mm_c57bl6_pbmc_t_consensus_annotations.csv" -- 10x (consensus)
-- Parsing "/filepath/vdj_v1_mm_c57bl6_pbmc_t_filtered_contig_annotations.csv" -- 10x (filt.contigs)
[!] Removed 1198 clonotypes with no nucleotide and amino acid CDR3 sequence.
== Step 2/3: checking metadata files and merging... ==
Processing "<initial>" ...
-- Metadata file not found; creating a dummy metadata...
== Step 3/3: splitting data by barcodes and chains... ==
Done!
The data is now ready to use:
> immdata_10x
$data$vdj_v1_mm_c57bl6_splenocytes_t_consensus_annotations_TRA
# A tibble: 710 x 17
Clones Proportion CDR3.nt CDR3.aa V.name D.name J.name V.end D.start D.end J.start VJ.ins VD.ins DJ.ins chain ClonotypeID ConsensusID
<dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <int> <int> <int> <int> <int> <int> <int> <chr> <chr> <chr>
1 55 0.00414 TGTGCTATGGC… CAMATGG… TRAV13… None TRAJ56 NA NA NA NA NA NA NA TRA clonotype306 clonotype30…
2 55 0.00414 TGTGCAGCTAG… CAASGNT… TRAV7-4 None TRAJ27 NA NA NA NA NA NA NA TRA clonotype338 clonotype33…
3 53 0.00399 TGTGCAGCAAG… CAARDSG… TRAV14… None TRAJ11 NA NA NA NA NA NA NA TRA clonotype617 clonotype61…
4 45 0.00339 TGCGCAGTCAG… CAVSNNT… TRAV3-3 None TRAJ27 NA NA NA NA NA NA NA TRA clonotype435 clonotype43…
5 43 0.00324 TGTGCAGTCAG… CAVSNMG… TRAV7D… None TRAJ9 NA NA NA NA NA NA NA TRA clonotype401 clonotype40…
6 42 0.00316 TGTGCAGCAAG… CAASPNY… TRAV14… None TRAJ21 NA NA NA NA NA NA NA TRA clonotype5 clonotype5_…
7 37 0.00279 TGTGCAGTGAG… CAVSSGG… TRAV7D… None TRAJ6 NA NA NA NA NA NA NA TRA clonotype453 clonotype45…
8 35 0.00264 TGTGCAGCAAG… CAASATS… TRAV14… None TRAJ22 NA NA NA NA NA NA NA TRA clonotype809 clonotype80…
9 32 0.00241 TGTGCAGCAAG… CAASPNY… TRAV14… None TRAJ21 NA NA NA NA NA NA NA TRA clonotype150 clonotype15…
10 32 0.00241 TGTGCTCTGGG… CALGDEA… TRAV6-… None TRAJ30 NA NA NA NA NA NA NA TRA clonotype393 clonotype39…
# … with 700 more rows
$meta
Sample Chain Source
1 vdj_v1_mm_c57bl6_splenocytes_t_all_contig_annotations_Multi Multi vdj_v1_mm_c57bl6_splenocytes_t_all_contig_annotations
2 vdj_v1_mm_c57bl6_splenocytes_t_all_contig_annotations_TRA TRA vdj_v1_mm_c57bl6_splenocytes_t_all_contig_annotations
3 vdj_v1_mm_c57bl6_splenocytes_t_all_contig_annotations_TRB TRB vdj_v1_mm_c57bl6_splenocytes_t_all_contig_annotations
5 vdj_v1_mm_c57bl6_splenocytes_t_consensus_annotations_TRA TRA vdj_v1_mm_c57bl6_splenocytes_t_consensus_annotations
6 vdj_v1_mm_c57bl6_splenocytes_t_consensus_annotations_TRB TRB vdj_v1_mm_c57bl6_splenocytes_t_consensus_annotations
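The printed object is a list. `$data` holds one clonotype table per sample and chain, and `$meta` records each table's `Sample`, `Chain` and `Source`. The minimal sketch below keeps only the TRB tables by filtering `$meta` and indexing `$data` with the matching sample names. It runs on a small toy object with the same shape, so the sample names, CDR3s and clone counts are made up. With real data, `immdata_10x` would take the place of `immdata_toy`.

```r
# Toy stand-in mirroring the shape of the loaded object:
# $data is a named list of clonotype tables, $meta describes each table.
immdata_toy <- list(
  data = list(
    sample_TRA = data.frame(Clones = c(5, 3), CDR3.aa = c("CAAA", "CAVS"),
                            stringsAsFactors = FALSE),
    sample_TRB = data.frame(Clones = c(4, 2, 1), CDR3.aa = c("CASS", "CASR", "CASG"),
                            stringsAsFactors = FALSE)
  ),
  meta = data.frame(Sample = c("sample_TRA", "sample_TRB"),
                    Chain  = c("TRA", "TRB"),
                    Source = "sample",
                    stringsAsFactors = FALSE)
)

# Look up which samples are TRB in $meta, then index $data by those names.
trb_names <- immdata_toy$meta$Sample[immdata_toy$meta$Chain == "TRB"]
trb_data <- immdata_toy$data[trb_names]
sapply(trb_data, nrow)
```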
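Congratulations! Your data is now ready for exploration. Follow the immunarch tutorials to learn more about how to explore the dataset. One important caveat: some contig files lack a barcode column, which is the unique identifier of each cell.
Such files can still be used to analyze single-chain data (alpha-only or beta-only TCRs). To analyze paired-chain data and get the full benefit of single-cell technology, however, you should load the files that contain barcodes into immunarch.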
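You can check before loading whether a CSV carries a barcode column. The minimal base-R sketch below uses `has_barcode`, a hypothetical helper written here and not part of immunarch. The two toy files have trimmed headers. In Cell Ranger's output, the filtered contig annotations include a `barcode` column, while the consensus annotations do not.

```r
# Hypothetical helper: TRUE if the CSV header contains a "barcode" column.
has_barcode <- function(csv_path) {
  header <- names(read.csv(csv_path, nrows = 1, check.names = FALSE))
  "barcode" %in% header
}

# Toy files with trimmed headers, written to a temporary folder.
toy_dir <- file.path(tempdir(), "barcode_check")
dir.create(toy_dir, showWarnings = FALSE)
with_bc <- file.path(toy_dir, "toy_filtered_contig_annotations.csv")
without_bc <- file.path(toy_dir, "toy_consensus_annotations.csv")
writeLines(c("barcode,contig_id,chain,cdr3",
             "AAACCTGAGATCCGAG-1,AAACCTGAGATCCGAG-1_contig_1,TRA,CAVS"),
           with_bc)
writeLines(c("clonotype_id,consensus_id,chain,cdr3",
             "clonotype1,clonotype1_consensus_1,TRA,CAVS"),
           without_bc)

# Named logical vector: which file can support paired-chain (per-cell) analysis.
c(filtered = has_barcode(with_bc), consensus = has_barcode(without_bc))
```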