library(Seurat)
library(dplyr)  # provides %>%, group_by() and top_n(), used for the marker table
- Set up the Seurat object
pbmc_data <- Read10X(data.dir = "../data/pbmc3k/filtered_gene_bc_matrices/hg19/")
pbmc <- CreateSeuratObject(counts = pbmc_data)
- Standard pre-processing workflow
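cell filtering (quality control)
data normalization
data scaling (for dimensionality reduction)
detection of highly variable features
SCTransform() covers normalization, scaling and variable-feature detection in a single call, replacing NormalizeData(), ScaleData() and FindVariableFeatures(). The code below computes percent.mt and regresses it out, but it does not filter any cells.
# percentage of each cell's counts that come from mitochondrial genes (names starting with "MT-")
pbmc <- PercentageFeatureSet(pbmc, pattern = "^MT-", col.name = "percent.mt")
# normalize, scale and select variable features, regressing out percent.mt
pbmc <- SCTransform(pbmc, vars.to.regress = "percent.mt", verbose = FALSE)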
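The base-R sketch below shows what some of these steps compute, using a made-up 3-gene x 3-cell matrix. It covers three things:

* how percent.mt is defined (the share of a cell's counts from genes matching "^MT-");
* a cell filter on that metric;
* the LogNormalize scheme that NormalizeData() applies by default, which is the classical alternative to SCTransform().

The toy counts and the 10% cut-off are illustrative assumptions, not values from this analysis.

```r
# Toy counts matrix: genes in rows, cells in columns (made-up values)
counts <- matrix(
  c(1,  0, 6,    # MT-CO1 (mitochondrial)
    10, 2, 1,    # CD3E
    9, 18, 3),   # LYZ
  nrow = 3, byrow = TRUE,
  dimnames = list(c("MT-CO1", "CD3E", "LYZ"), c("cell1", "cell2", "cell3"))
)

# percent.mt: percentage of each cell's total counts from "^MT-" genes
mt_genes   <- grepl("^MT-", rownames(counts))
percent_mt <- colSums(counts[mt_genes, , drop = FALSE]) / colSums(counts) * 100
percent_mt  # cell1 = 5, cell2 = 0, cell3 = 60

# Cell filtering: drop cells above an (assumed) 10% mitochondrial cut-off
keep      <- percent_mt < 10
counts_qc <- counts[, keep, drop = FALSE]

# LogNormalize: divide by each cell's total, multiply by 10,000, then natural log1p()
log_norm <- log1p(sweep(counts_qc, 2, colSums(counts_qc), "/") * 1e4)
```

SCTransform() replaces the last step with a regularized negative binomial model. The percent.mt metric and any filtering on it work the same way in both workflows.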
- Dimensionality reduction, clustering and visualization
# These are now standard steps in the Seurat workflow for visualization and clustering
pbmc <- RunPCA(pbmc, verbose = FALSE)
ElbowPlot(pbmc, ndims = 35)
# umap.method = "umap-learn" needs the Python package; install it once with:
# reticulate::py_install(packages = 'umap-learn')
pbmc <- RunUMAP(pbmc, dims = 1:30, umap.method = "umap-learn")
pbmc <- FindNeighbors(pbmc, dims = 1:30, verbose = FALSE)
pbmc <- FindClusters(pbmc, verbose = FALSE, resolution = 0.5)
DimPlot(pbmc, reduction='umap', label = TRUE) + NoLegend()
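ElbowPlot() is what the choice of dims = 1:30 for RunUMAP() and FindNeighbors() rests on. As a rough illustration of what it shows, this base-R sketch runs prcomp() on simulated data and computes the standard deviation and variance share of each PC. The data has three hidden factors plus noise, so the first few PCs should stand out before the curve flattens.

Two assumptions apply:

* The simulated matrix stands in for the scaled expression data.
* Seurat's RunPCA() itself uses a truncated PCA (irlba) rather than prcomp().

```r
# Simulated cells x genes matrix: 3 hidden factors + Gaussian noise (assumption)
set.seed(42)
n_cells <- 200
n_genes <- 50
factors  <- matrix(rnorm(n_cells * 3), nrow = n_cells)
loadings <- matrix(rnorm(3 * n_genes), nrow = 3)
noise    <- matrix(rnorm(n_cells * n_genes, sd = 0.5), nrow = n_cells)
x <- factors %*% loadings + noise

pca <- prcomp(x, center = TRUE, scale. = TRUE)

# ElbowPlot() draws the standard deviation of each PC against its index;
# the "elbow" is where the curve flattens into noise.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(head(var_explained, 6), 3)
```

In real data the elbow is rarely this sharp. That is why a generous number of PCs (here 30) is often kept when SCTransform() is used.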
- Find marker genes
# find markers for every cluster compared to all remaining cells, report only the positive ones
pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
# top 2 markers per cluster (needs dplyr; the column is named avg_log2FC in Seurat >= 4)
pbmc.markers %>% group_by(cluster) %>% top_n(n = 2, wt = avg_logFC)
# visualization
FeaturePlot(pbmc, features = c("MS4A1", "GNLY", "CD3E", "CD14", "FCER1A", "FCGR3A", "LYZ", "PPBP",
                               "CD8A"))
- Question
Wouldn't it be better to use TPM as the input to Seurat?
https://github.com/satijalab/seurat/issues/668
Yep! We can simply log-transform the TPM values before performing dimensionality reduction.
Counts to TPM: get gene lengths from a GTF file with the GenomicFeatures library, then convert the counts to TPM.
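Here is a minimal base-R sketch of the counts-to-TPM conversion followed by the log transform. It assumes a genes x cells count matrix and a vector of gene lengths in base pairs, in the same row order. The toy counts and lengths below are made up.

In a real run the lengths would come from the GTF. One way to get them (an assumption, not shown here) is to build a TxDb with makeTxDbFromGFF() and sum the widths of each gene's reduced exons from exonsBy(txdb, by = "gene"). In recent Bioconductor releases makeTxDbFromGFF() lives in the txdbmaker package. Using the natural log of TPM + 1 is also an assumption; log2 is common too.

```r
# Convert a genes x cells count matrix to TPM.
# gene_length_bp: one length (bp) per gene, same order as rownames(counts).
counts_to_tpm <- function(counts, gene_length_bp) {
  stopifnot(nrow(counts) == length(gene_length_bp))
  # 1) reads per kilobase: divide each gene's counts by its length in kb
  rpk <- counts / (gene_length_bp / 1000)
  # 2) scale each cell (column) so its RPK values sum to one million
  sweep(rpk, 2, colSums(rpk), "/") * 1e6
}

# Toy data (made-up values): 3 genes x 2 cells, filled column by column
counts <- matrix(c(100, 200, 300,
                    50,   0, 150),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("cell1", "cell2")))
gene_length_bp <- c(1000, 2000, 500)

tpm     <- counts_to_tpm(counts, gene_length_bp)
log_tpm <- log1p(tpm)  # natural log(TPM + 1)
tpm[, "cell1"]  # geneA = 125000, geneB = 125000, geneC = 750000
```

Dividing by gene length first and only then rescaling each cell means every TPM column sums to the same total (one million). That is what makes TPM values comparable across cells, unlike RPKM/FPKM.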
References
https://satijalab.org/seurat/v3.1/pbmc3k_tutorial.html