loom文件生成后就是Seurat结果和loom文件的整合了
可以分为在R中Seurat提取分组及坐标信息,以及python中整合loom文本和Seurat的信息以及可视化。
高亮:这里主要是要注意loom文件中barcode样本名称前缀的和Seurat中Cell ID的前缀要吻合
参考:https://www.jianshu.com/p/70c19748ac1a
(一)提取Seruat中的信息
table(seurat_obj$celltype)
# 获得每个细胞的UMAP或TSNE坐标,使用 Embeddings函数
write.csv(Embeddings(seurat_obj, reduction = "umap"), file = "cell_embeddings.csv")
# 获取每个细胞的barcode
write.csv(Cells(seurat_obj), file = "cellID_obs.csv", row.names = FALSE)
# 提取每个细胞的cluster信息
write.csv(seurat_obj@meta.data[, 'cluster', drop = FALSE], file = "cell_clusters.csv")
# 提取每个细胞的celltype信息
write.csv(seurat_obj@meta.data[, 'celltype', drop = FALSE], file = "cell_celltype.csv")
# 获取celltype的颜色信息
hue_pal()(length(levels(seurat_obj$celltype)))
# 获取cluster的颜色信息
hue_pal()(length(levels(seurat_obj$cluster)))
(二)多个loom文件整合
import loompy
files=['sample_one','sample_two']
output_filename='combined.loom'
loompy.combine(files, output_filename, key="Accession")
(三) loom文件和Seurat合并
import scvelo as scv
import pandas as pd
import numpy as np
import os
loom_data = scv.read('combined.loom', cache=False)
#这里就是给loom文件中的barcode做替换,使其与Seurat中一致
loom_data.obs = loom_data.obs.rename(index = lambda x: x.replace(':', '_').replace('x', '-1').replace('*****','*****').replace('******','*****'))
loom_data.obs.head()
#读取seurat中的meta信息
meta_path = "/path/"
sample_obs = pd.read_csv(os.path.join(meta_path, "cellID_obs.csv"))
cell_umap= pd.read_csv(os.path.join(meta_path, "cell_embeddings.csv"), header=0, names=["CellID", "UMAP_1", "UMAP_2"])
cell_clusters = pd.read_csv(os.path.join(meta_path, "cell_clusters.csv"), header=0, names=["CellID", "cluster"])
cell_celltype = pd.read_csv(os.path.join(meta_path, "cell_celltype.csv"), header=0, names=["CellID", "celltype"])
#对细胞文件和RNA剪切速率文件取交集,保留关注的细胞类型
sample_one = loom_data[np.isin(loom_data.obs.index, sample_obs)]
sample_one.obs.head()
#构建umap坐标,cluster,celltype信息数据框
sample_one_index = pd.DataFrame(sample_one.obs.index)
sample_one_index = sample_one_index.rename(columns = {0:'CellID'})
umap_ordered = sample_one_index.merge(cell_umap, on = "CellID")
umap_ordered.head()
celltype_ordered = sample_one_index.merge(cell_celltype, on = "CellID")
celltype_ordered.head()
clusters_ordered = sample_one_index.merge(cell_clusters, on = "CellID")
clusters_ordered.head()
#将umap坐标与cluster信息加入sample_one
umap_ordered = umap_ordered.iloc[:,1:]
clusters_ordered = clusters_ordered.iloc[:,1:]
celltype_ordered = celltype_ordered.iloc[:,1:]
sample_one.obsm['X_umap'] = umap_ordered.values
sample_one.uns['clusters'] = clusters_ordered.values
sample_one.obs['celltype'] = celltype_ordered.values
adata = sample_one
# some gene labels are duplicated (Ensembl IDs are still unique!!)
adata.var_names_make_unique()
# save model to file
adata.write('bl_celltype_dynamicModel.h5ad', compression = 'gzip')
# read h5ad file
adata= scv.read('mbc_celltype_dynamicModel.h5ad')
#可视化
# Running RNA Velocity
scv.pp.filter_and_normalize(adata,min_shared_counts=30, n_top_genes=2000)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
scv.tl.velocity(adata, mode = "stochastic")
scv.tl.velocity_graph(adata)
scv.pl.velocity_embedding(adata, basis='X_umap', arrow_size=5)
ident_colours = ["#B2DF8A", "#FB9A99", "#33A02C", "#A6CEE3", "#1F78B4"]
scv.pl.velocity_embedding_stream(adata, basis='X_umap',color = "celltype", palette = ident_colours, size = 20,alpha =0.8, save="UMAP_stream.pdf", figsize=(7,5), legend_fontsize = 9, show=False, title='')
Python比R对于RNA速率分析的软件更兼容,虽然我python基础非常之薄弱,但是基本代码原理都是相同的,但之后还是好好学习python了。
网友评论