单细胞转录组技术是来解释细胞状态的有力武器,在分群得到不同细胞类型的划分之前,有时候需要消除细胞周期的影响。毕竟同样的细胞类型也有着不同的状态,细胞周期就是状态之一。
在R中有scran以及seurat包均可推断以及回归掉细胞周期影响的函数。各方的引用几本来自这篇文献: Scialdone et al. (2015),感兴趣的同学可以学习一下。
在scran中已经附带了人和小鼠细胞周期相关的基因:
library(scater)
# Getting pre-trained marker sets
mm.pairs <- readRDS(system.file("exdata", "mouse_cycle_markers.rds", package="scran"))
head(mm.pairs$G1)
first second
1 ENSMUSG00000000001 ENSMUSG00000001785
2 ENSMUSG00000000001 ENSMUSG00000005470
3 ENSMUSG00000000001 ENSMUSG00000012443
4 ENSMUSG00000000001 ENSMUSG00000015120
5 ENSMUSG00000000001 ENSMUSG00000022033
6 ENSMUSG00000000001 ENSMUSG00000023015
hs.pairs <- readRDS(system.file("exdata", "human_cycle_markers.rds", package="scran"))
head(hs.pairs$G2M)
first second
1 ENSG00000100519 ENSG00000146457
2 ENSG00000100519 ENSG00000132646
3 ENSG00000100519 ENSG00000124171
4 ENSG00000123975 ENSG00000100519
5 ENSG00000126787 ENSG00000100519
6 ENSG00000161057 ENSG00000100519
推测细胞周期的状态(G1,S,G2M)是根据成对的基因表达变化来的。
而 seurat中是根据细胞周期基因来推断的:
Seurat::cc.genes.updated.2019
$s.genes
[1] "MCM5" "PCNA" "TYMS" "FEN1" "MCM7" "MCM4" "RRM1" "UNG" "GINS2" "MCM6" "CDCA7"
[12] "DTL" "PRIM1" "UHRF1" "CENPU" "HELLS" "RFC2" "POLR1B" "NASP" "RAD51AP1" "GMNN" "WDR76"
[23] "SLBP" "CCNE2" "UBR7" "POLD3" "MSH2" "ATAD2" "RAD51" "RRM2" "CDC45" "CDC6" "EXO1"
[34] "TIPIN" "DSCC1" "BLM" "CASP8AP2" "USP1" "CLSPN" "POLA1" "CHAF1B" "MRPL36" "E2F8"
$g2m.genes
[1] "HMGB2" "CDK1" "NUSAP1" "UBE2C" "BIRC5" "TPX2" "TOP2A" "NDC80" "CKS2" "NUF2" "CKS1B" "MKI67"
[13] "TMPO" "CENPF" "TACC3" "PIMREG" "SMC4" "CCNB2" "CKAP2L" "CKAP2" "AURKB" "BUB1" "KIF11" "ANP32E"
[25] "TUBB4B" "GTSE1" "KIF20B" "HJURP" "CDCA3" "JPT1" "CDC20" "TTK" "CDC25C" "KIF2C" "RANGAP1" "NCAPD2"
[37] "DLGAP5" "CDCA2" "CDCA8" "ECT2" "KIF23" "HMMR" "AURKA" "PSRC1" "ANLN" "LBR" "CKAP5" "CENPE"
[49] "CTCF" "NEK2" "G2E3" "GAS2L3" "CBX5" "CENPA"
那,今天和大家介绍一种类似scran的在python中实现的方法:pypairs
# -*- coding: utf-8 -*-
"""
Created on Sat Mar 28 22:45:55 2020
# https://pypairs.readthedocs.io/en/latest/welcome.html
@author: Administrator
"""
from pypairs import pairs, datasets
# Load samples from the oscope scRNA-Seq dataset with known cell cycle
training_data = datasets.leng15(mode='sorted')
training_data.obs
Out[155]:
category
G2_Exp1.059 G2M
G2_Exp1.069 G2M
G2_Exp1.075 G2M
G2_Exp1.063 G2M
G2_Exp1.029 G2M
...
G1_Exp1.063 G1
G1_Exp1.083 G1
G1_Exp1.030 G1
G1_Exp1.018 G1
G1_Exp1.046 G1
help(datasets.leng15)
Help on function leng15 in module pypairs.datasets:
leng15(mode: Union[str, NoneType] = 'all', gene_sub: Union[Iterable[int], NoneType] = None, sample_sub: Union[Iterable[int], NoneType] = None) -> anndata._core.anndata.AnnData
Single cell RNA-seq data of human hESCs to evaluate Oscope [Leng15]_
Total 213 H1 single cells and 247 H1-Fucci single cells were sequenced.
The 213 H1 cells were used to evaluate Oscope in identifying oscillatory genes.
The H1-Fucci cells were used to confirm the cell cycle gene cluster identified
by Oscope in the H1 hESCs.
Normalized expected counts are provided in GSE64016_H1andFUCCI_normalized_EC.csv.gz
Reference
---------
GEO-Dataset: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE64016
Parameters
----------
mode
sample selection mode:
- 'all' for all samples, default
- 'sorted' for all samples with known cell cycle (G2, S or G1)
- 'unsorted' for all samples with unknown cell cycle (H1)
gene_sub
Index based array of subsetted genes
sample_sub
Index based array of subsetted samples
Returns
-------
adata : :class:`~anndata.AnnData`
Annotated data matrix containing the normalized gene counts
获得细胞周期相关基因对,类似scran的sandbag:
# Run sandbag() to identify marker pairs
marker_pairs = pairs.sandbag(training_data, fraction=0.3)
marker_pairs['G2M'][1:10]
Out[166]:
[('KAT5', 'SYNJ2BP-COX16'),
('USF1', 'KIAA0040'),
('CAMK2N1', 'ST3GAL6'),
('CAMK2N1', 'TMEM108'),
('CAMK2N1', 'PAK3'),
('CAMK2N1', 'ZNF789'),
('CAMK2N1', 'NBPF11'),
('CAMK2N1', 'RHOBTB1'),
('CAMK2N1', 'KCNG3')]
读入我们的数据
import scanpy as sc
results_file = 'E:\learnscanpy\write\pbmc3k.h5ad'
sdata = sc.read_h5ad(results_file)
sdata
AnnData object with n_obs × n_vars = 2223 × 2208
obs: 'n_genes', 'percent_mito', 'n_counts', 'percent_mito1', 'leiden'
var: 'gene_ids', 'feature_types', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
uns: 'leiden', 'leiden_colors', 'leiden_sizes', 'neighbors', 'paga', 'pca', 'rank_genes_groups', 'umap'
obsm: 'X_pca', 'X_umap'
varm: 'PCs'
sdata.obs
n_genes percent_mito ... percent_mito1 leiden
AAACGCTGTGTGATGG-1 2348 0.099821 ... 998.205811 NewCluster3
AAACGCTTCTAGGCAT-1 2470 0.065564 ... 655.640869 Dendritic
AAAGAACCACCGTCTT-1 420 0.021944 ... 219.435730 NK
AAAGAACTCACCTGGG-1 2479 0.074067 ... 740.672119 B
AAAGAACTCGGAACTT-1 2318 0.079713 ... 797.133240 Dendritic
... ... ... ... ...
TTTGGTTTCCGGTAAT-1 2277 0.079324 ... 793.240723 CD8 T
TTTGTTGCAGGTACGA-1 1986 0.071226 ... 712.257751 CD14 Monocytes
TTTGTTGCAGTCTCTC-1 2485 0.055394 ... 553.935852 CD8 T
TTTGTTGTCCTTGGAA-1 1998 0.081037 ... 810.366882 Megakaryocytes
TTTGTTGTCGCACGAC-1 2468 0.070987 ... 709.868225 CD8 T
[2223 rows x 5 columns]
进行预测:
sresult = pairs.cyclone(sdata, marker_pairs)
print(sresult)
G2M S G1 max_class cc_prediction
AAACGCTGTGTGATGG-1 0.032 0.986 0.701 S G1
AAACGCTTCTAGGCAT-1 0.848 0.970 0.543 S G2M
AAAGAACCACCGTCTT-1 0.320 0.467 0.855 G1 G1
AAAGAACTCACCTGGG-1 0.190 0.846 0.288 S S
AAAGAACTCGGAACTT-1 0.358 0.723 0.871 G1 G1
... ... ... ... ...
TTTGGTTTCCGGTAAT-1 0.451 0.689 0.392 S S
TTTGTTGCAGGTACGA-1 0.680 0.410 0.684 G1 G1
TTTGTTGCAGTCTCTC-1 0.544 0.763 0.278 S G2M
TTTGTTGTCCTTGGAA-1 0.033 0.924 0.936 G1 G1
TTTGTTGTCGCACGAC-1 0.908 0.748 0.745 G2M G2M
[2223 rows x 5 columns]
在umap空间可视化一下:
sc.tl.umap(sdata)
sc.pl.umap(sdata,color = ['pypairs_cc_prediction','leiden'])
在scanpy中消除每种细胞周期得分的影响:
sc.pp.regress_out(sdata, ['n_counts', 'percent_mito','pypairs_G2M','pypairs_G1','pypairs_G1'])
sc.pp.scale(sdata, max_value=10)
sc.tl.pca(sdata, svd_solver='arpack')
sc.pp.neighbors(sdata, n_neighbors=10, n_pcs=40)
sc.tl.leiden(sdata)
sc.tl.umap(sdata)
sc.pl.umap(sdata,color = ['pypairs_cc_prediction','leiden'])
PyPairs - A python scRNA-Seq classifier
This is a python-reimplementation of the Pairs algorithm as described by A. Scialdone et. al. (2015). Original Paper available under: <https://doi.org/10.1016/j.ymeth.2015.06.021>
网友评论