Keywords: Transposable Element; ERV (endogenous retrovirus); single-cell sequencing analysis; Seurat; scTE.
![](https://img.haomeiwen.com/i7192913/6327e64c5015290f.png)
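Background:
Use scTE to quantify TEs in 10x single-cell sequencing data, then import the result into Seurat for downstream analysis. scTE comes from the Jiekai lab and was published in Nature Communications in March 2021.
Transposable elements (TEs) make up the majority of a typical eukaryotic genome and contribute to cell heterogeneity in ways that are not yet understood. Single-cell sequencing is a powerful tool for exploring cells, but analysis is usually gene-centric and has not addressed TE expression.
Methods:
1. Install scTE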
# scTE works with python >=3.6.
$ git clone https://github.com/JiekaiLab/scTE.git ## run this from the directory where you want to download scTE
$ cd scTE
$ python setup.py install ## install scTE
# Building genome indices
$ scTE_build -g mm10 # Mouse
$ scTE_build -g hg38 # Human
2. Run scTE on the BAM file produced by the 10x pipeline.
$ scTE -i ../run_cellranger_count/run_count_YL002273_S2/outs/possorted_genome_bam.bam -o YL002272_S2 -x /home/ye.liu/yang-secondary/ye/biotools/scTE/mm10.exclusive.idx --hdf5 True -CB CR -UMI UB
--hdf5 True
The output is written in HDF5 format. To run the downstream analysis in Seurat, it must be converted to a Seurat object.
-CB
The cell barcode tag. Check whether the cell barcode tag in your BAM file is CR or CB: if it is CR, use -CB CR; if it is CB, use -CB CB.
Inspect an example BAM file; the fourth field from the end is CB:
$ samtools view test.bam
A00519:758:HTCCHDSXY:3:2535:21296:19774 16 chr1 14021 0 90M * 0 0 TGGATTTCTATCTCCCTGGCTTGGTGCCAGTTCCTCCAAGTCGATGGCACCTCCCTCCCTCTCAACCACTTGAGCAAACTCCAAGACATC ,FFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F:FFFFFFFFFFFFFFFFFFF:FFFFF NH:i:5 HI:i:1 AS:i:88 nM:i:0 RG:Z:SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K:0:1:HTCCHDSXY:3 RE:A:I xf:i:0 CR:Z:CTCCCTCCACTGCGAC CY:Z:FFFFFFFFFFFFFFFF CB:Z:CTCCCTCCACTGCGAC-1 UR:Z:AAGGCGTAGTAG UY:Z:FFFFFFFFFFFF UB:Z:AAGGCGTAGTAG
A00519:758:HTCCHDSXY:1:1355:17237:31720 0 chr1 14260 0 90M * 0 0 CTCCCTCTCATCCCAGAGAAACAGGTCAGCTGGGAGCTTCTGCCCCCACTGCCTAGGGACCAACAGGGGCAGGAGGCAGTCACTGACCCC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:5 HI:i:1 AS:i:88 nM:i:0 RG:Z:SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K:0:1:HTCCHDSXY:1 RE:A:I xf:i:0 CR:Z:TCGTCCACAGTATGAA CY:Z:FFFFFFFFFFFFFFFF CB:Z:TCGTCCACAGTATGAA-1 UR:Z:GACTTATTTTTT UY:Z:FFFFFFFFFFFF UB:Z:GACTTATTTTTT
A00519:758:HTCCHDSXY:3:2227:16703:32080 16 chr1 14411 1 90M * 0 0 TCAGTTCTTTATTGATTGGTGTGCCGTTTTCTCTGGAAGCCTCTTAAGAACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGGAG FFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:3 HI:i:1 AS:i:88 nM:i:0 RG:Z:SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K:0:1:HTCCHDSXY:3 RE:A:I xf:i:0 CR:Z:TTGAGTGGTTGTGGCC CY:Z:FFFFFFFFFFFFFFFF CB:Z:TTGAGTGGTTGTGGCC-1 UR:Z:TATAATGCTCAG UY:Z:FFFFFFFFFFFF UB:Z:TATAATGCTCAG
A00519:758:HTCCHDSXY:3:2563:23665:33802 16 chr1 14411 1 90M * 0 0 TCAGTTCTTTATTGATTGGTGTGCCGTTTTCTCTGGAAGCCTCTTAAGAACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGGAG FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:3 HI:i:1 AS:i:88 nM:i:0 RG:Z:SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K:0:1:HTCCHDSXY:3 RE:A:I xf:i:0 CR:Z:TGTTGAGAGGCAATGC CY:Z:FFFFFFFFFFFFFFFF CB:Z:TGTTGAGAGGCAATGC-1 UR:Z:ACGGGTGTGGAG UY:Z:FFFFFFFFFFFF UB:Z:ACGGGTGTGGAG
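Rather than reading the records by eye, the barcode and UMI tags can be tallied over the first few thousand alignments. The following is a minimal sketch, assuming samtools is on the PATH; the function name count_barcode_tags is made up for illustration and is not part of scTE or samtools. In Cell Ranger output, CR is the raw barcode as sequenced and CB is the error-corrected barcode (with the -1 suffix), so some reads may carry CR but not CB.

```shell
# count_barcode_tags: hypothetical helper, not part of scTE or samtools.
# Reads SAM records on stdin and reports how many carry each 10x tag.
count_barcode_tags() {
    awk -F'\t' '
        {
            # Optional tags start at field 12 of a SAM record.
            for (i = 12; i <= NF; i++) {
                tag = substr($i, 1, 5)
                if (tag == "CR:Z:" || tag == "CB:Z:" || tag == "UR:Z:" || tag == "UB:Z:")
                    n[substr($i, 1, 2)]++
            }
        }
        END {
            # Tags that never appeared print as 0.
            printf "reads=%d CR=%d CB=%d UR=%d UB=%d\n", NR, n["CR"], n["CB"], n["UR"], n["UB"]
        }'
}
```

It could be used as: samtools view test.bam | head -n 10000 | count_barcode_tags. If the CB count is well below the CR count, the two choices for -CB will not cover the same set of reads.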
3. Convert the hdf5 output to a Seurat object
Use the Convert() function from SeuratDisk for the conversion.
# R
library(SeuratDisk)
library(Seurat)
# Convert to an h5seurat file
Convert("../../../YL002272_S1.h5ad", dest = "h5seurat", overwrite = TRUE)
# Then load it into R
Seurat.obj <- LoadH5Seurat("../../../YL002272_S1.h5seurat")
Separate the genes and the TEs in the count matrix
# R
## load TE names
te = read.csv('../data/mm10.TEname.txt', sep = '\t', header = FALSE)
## split the features into genes (not in the TE list) and TEs (in the TE list)
Gene = subset(Seurat.obj, features = rownames(Seurat.obj)[!rownames(Seurat.obj) %in% te$V1])
TEs = subset(Seurat.obj, features = rownames(Seurat.obj)[rownames(Seurat.obj) %in% te$V1])
TEs
The usual Seurat analyses can then be run on these objects.
How to obtain the mm10.TEname.txt file
# hg38
$ wget -c http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/rmsk.txt.gz -O hg38.te.txt
$ zcat hg38.te.txt | grep -E 'LINE|SINE|LTR|DNA|Retroposon' | cut -f 11 | sort | uniq > hg38.TEname.txt
# mm10
$ wget -c http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/rmsk.txt.gz -O mm10.te.txt
$ zcat mm10.te.txt | grep -E 'LINE|SINE|LTR|DNA|Retroposon' | cut -f 11 | sort | uniq > mm10.TEname.txt
# if you need to know the family and class info for the TE names
$ zcat hg38.te.txt | grep -E 'LINE|SINE|LTR|DNA|Retroposon' | cut -f 11,12,13 | sort | uniq > hg38.TEnamefamilyclass.txt
$ zcat mm10.te.txt | grep -E 'LINE|SINE|LTR|DNA|Retroposon' | cut -f 11,12,13 | sort | uniq > mm10.TEnamefamilyclass.txt
### Note: check this page https://github.com/jphe/scTE/issues/3
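The grep in the commands above matches the class names anywhere on the line. A stricter variant, sketched here on the assumption that the file follows the UCSC rmsk table layout (repName, repClass and repFamily in columns 11, 12 and 13), tests only the repClass column. The function name te_names is hypothetical, and the result may differ slightly from the grep version, so compare the two lists before swapping one for the other.

```shell
# te_names: hypothetical helper, not part of scTE.
# Prints the unique repName values (column 11) of a gzip-compressed rmsk table
# whose repClass (column 12) starts with one of the TE class names.
te_names() {
    # Read via stdin so the file does not need a .gz suffix.
    gzip -dc < "$1" |
        awk -F'\t' '$12 ~ /^(LINE|SINE|LTR|DNA|Retroposon)/ { print $11 }' |
        sort -u
}
```

It could be used as: te_names mm10.te.txt > mm10.TEname.txt. Anchoring the pattern to the start of column 12 keeps entries such as "LTR?" or "DNA?" but ignores class names that happen to appear in other columns.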
References:
https://github.com/JiekaiLab/scTE
https://www.nature.com/articles/s41467-021-21808-x