美文网首页
Seurat(v4)官方教程 | Introduction to

Seurat(v4)官方教程 | Introduction to

作者: 生信小博士 | 来源:发表于2021-10-31 22:52 被阅读0次

    https://satijalab.org/seurat/articles/integration_introduction.html
    https://blog.csdn.net/zengwanqin/article/details/114969068

    Integration goals
    The following tutorial is designed to give you an overview of the kinds of comparative analyses on complex cell types that are possible using the Seurat integration procedure. Here, we address a few key goals:

    1Create an ‘integrated’ data assay for downstream analysis
    2Identify cell types that are present in both datasets
    3Obtain cell type markers that are conserved in both control and stimulated cells
    4Compare the datasets to find cell-type specific responses to stimulation

    1Setup the Seurat objects

    library(Seurat)
    library(SeuratData)
    library(patchwork)

    BiocManager::install('SeuratData')

    devtools::install_github('satijalab/seurat-data')

    install dataset

    InstallData("ifnb")

    load dataset

    LoadData("ifnb")

    ifnb
    str(ifnb)

    split the dataset into a list of two seurat objects (stim and CTRL)

    ifnb.list <- SplitObject(ifnb, split.by = "stim")

    normalize and identify variable features for each dataset independently

    ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
    x <- NormalizeData(x)
    x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
    })

    select features that are repeatedly variable across datasets for integration

    features <- SelectIntegrationFeatures(object.list = ifnb.list)

    2Perform integration

    We then identify anchors using the FindIntegrationAnchors() function, which takes a list of Seurat objects as input, and use these anchors to integrate the two datasets together with IntegrateData().

    immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, anchor.features = features)
    immune.anchors
    str(immune.anchors)

    this command creates an 'integrated' data assay

    immune.combined <- IntegrateData(anchorset = immune.anchors)

    3Perform an integrated analysis
    Now we can run a single integrated analysis on all cells!

    specify that we will perform downstream analysis on the corrected data note that the

    original unmodified data still resides in the 'RNA' assay

    DefaultAssay(immune.combined) <- "integrated"

    Run the standard workflow for visualization and clustering

    immune.combined <- ScaleData(immune.combined, verbose = FALSE)
    immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
    immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:30)
    immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:30)
    immune.combined <- FindClusters(immune.combined, resolution = 0.5)

    Visualization

    p1 <- DimPlot(immune.combined, reduction = "umap", group.by = "stim")
    p2 <- DimPlot(immune.combined, reduction = "umap", label = TRUE, repel = TRUE)
    p1 + p2

    To visualize the two conditions side-by-side, we can use the split.by argument to show each condition colored by cluster.

    DimPlot(immune.combined, reduction = "umap", split.by = "stim")

    4Identify conserved cell type markers
    To identify canonical cell type marker genes that are conserved across conditions, we provide the FindConservedMarkers() function. This function performs differential gene expression testing for each dataset/group and combines the p-values using meta-analysis methods from the MetaDE R package. For example, we can calculated the genes that are conserved markers irrespective of stimulation condition in cluster 6 (NK cells).

    For performing differential expression after integration, we switch back to the original

    data

    DefaultAssay(immune.combined) <- "RNA"

    '''
    BiocManager::install('multtest')
    install.packages('metap')
    library('multtest')
    '''

    nk.markers <- FindConservedMarkers(immune.combined, ident.1 = 6, grouping.var = "stim", verbose = FALSE)
    head(nk.markers)

    We can explore these marker genes for each cluster and use them to annotate our clusters as specific cell types.
    str(nk.markers)
    rownames(nk.markers)
    head(rownames(nk.markers))

    FeaturePlot(immune.combined, features = head(rownames(nk.markers),5), min.cutoff = "q9")
    FeaturePlot(immune.combined, features = c("CD3D", "SELL", "CREM", "CD8A", "GNLY", "CD79A", "FCGR3A",
    "CCL2", "PPBP"), min.cutoff = "q9")

    immune.combined <- RenameIdents(immune.combined, 0 = "CD14 Mono", 1 = "CD4 Naive T", 2 = "CD4 Memory T",
    3 = "CD16 Mono", 4 = "B", 5 = "CD8 T", 6 = "NK", 7 = "T activated", 8 = "DC", 9 = "B Activated",
    10 = "Mk", 11 = "pDC", 12 = "Eryth", 13 = "Mono/Mk Doublets", 14 = "HSPC")
    DimPlot(immune.combined, label = TRUE)

    The DotPlot() function with the split.by parameter can be useful for viewing conserved cell type markers across conditions, showing both the expression level and the percentage of cells in a cluster expressing any given gene. Here we plot 2-3 strong marker genes for each of our 14 clusters.

    Idents(immune.combined) <- factor(Idents(immune.combined), levels = c("HSPC", "Mono/Mk Doublets",
    "pDC", "Eryth", "Mk", "DC", "CD14 Mono", "CD16 Mono", "B Activated", "B", "CD8 T", "NK", "T activated",
    "CD4 Naive T", "CD4 Memory T"))
    markers.to.plot <- c("CD3D", "CREM", "HSPH1", "SELL", "GIMAP5", "CACYBP", "GNLY", "NKG7", "CCL5",
    "CD8A", "MS4A1", "CD79A", "MIR155HG", "NME1", "FCGR3A", "VMO1", "CCL2", "S100A9", "HLA-DQA1",
    "GPR183", "PPBP", "GNG11", "HBA2", "HBB", "TSPAN13", "IL3RA", "IGJ", "PRSS57")
    DotPlot(immune.combined, features = markers.to.plot, cols = c("blue", "red"), dot.scale = 8, split.by = "stim") +
    RotatedAxis()

    5Identify differential expressed genes across conditions
    Now that we’ve aligned the stimulated and control cells, we can start to do comparative analyses and look at the differences induced by stimulation. One way to look broadly at these changes is to plot the average expression of both the stimulated and control cells and look for genes that are visual outliers on a scatter plot. Here, we take the average expression of both the stimulated and control naive T cells and CD14 monocyte populations and generate the scatter plots, highlighting genes that exhibit dramatic responses to interferon stimulation.

    library(ggplot2)
    library(cowplot)
    theme_set(theme_cowplot())
    Idents(immune.combined)
    t.cells <- subset(immune.combined, idents = "CD4 Naive T")
    head(t.cells)
    tail(t.cells)
    str(t.cells)

    '''
    my_tcells<-immune.combined@assays$RNA[,orig.ident=c('IMMUNE_CTRL','IMMUNE_STIM')]
    head(my_tcells)
    colnames(my_tcells)
    '''
    Idents(t.cells) ##此时的level为 CD4 Naive T
    Idents(t.cells) <- "stim" ##此时的level为 CONTROL 和STIM

    t.cellsRNA t.cells@assaysRNA
    t.cells@assaysRNA@data avg.t.cells <- as.data.frame(log1p(AverageExpression(t.cells, verbose = FALSE)RNA))
    avg.t.cells

    avg.t.cells$gene <- rownames(avg.t.cells)
    avg.t.cells

    ----------

    cd14.mono <- subset(immune.combined, idents = "CD14 Mono")
    Idents(cd14.mono) <- "stim"
    avg.cd14.mono <- as.data.frame(log1p(AverageExpression(cd14.mono, verbose = FALSE)RNA)) avg.cd14.monogene <- rownames(avg.cd14.mono)
    avg.cd14.mono

    genes.to.label = c("ISG15", "LY6E", "IFI6", "ISG20", "MX1", "IFIT2", "IFIT1", "CXCL10", "CCL8")
    p1 <- ggplot(avg.t.cells, aes(CTRL, STIM)) + geom_point() + ggtitle("CD4 Naive T Cells")
    p1
    p1 <- LabelPoints(plot = p1, points = genes.to.label, repel = FALSE)
    p2 <- ggplot(avg.cd14.mono, aes(CTRL, STIM)) + geom_point() + ggtitle("CD14 Monocytes")
    p2 <- LabelPoints(plot = p2, points = genes.to.label, repel = TRUE)
    p1 + p2

    '''

    如何确定gene.to.label的基因?可以通过寻找std最大的方式吗?

    library(dplyr)
    avg.t.cells

    avg.t.cells %>%
    select(CTRL,STIM) %>% unlist () %>%
    sd()
    '''

    As you can see, many of the same genes are upregulated in both of these cell types and likely represent a conserved interferon response pathway.

    Because we are confident in having identified common cell types across condition, we can ask what genes change in different conditions for cells of the same type. First, we create a column in the meta.data slot to hold both the cell type and stimulation information and switch the current ident to that column. Then we use FindMarkers() to find the genes that are different between stimulated and control B cells. Notice that many of the top genes that show up here are the same as the ones we plotted earlier as core interferon response genes. Additionally, genes like CXCL10 which we saw were specific to monocyte and B cell interferon response show up as highly significant in this list as well.

    head(immune.combined@meta.data,10)
    immune.combinedcelltype.stim <- paste(Idents(immune.combined), immune.combinedstim, sep = "_")
    head(immune.combined@meta.data,10)

    immune.combined$celltype <- Idents(immune.combined)
    head(immune.combined@meta.data,10)
    Idents(immune.combined)
    Idents(immune.combined) <- "celltype.stim"
    Idents(immune.combined)
    b.interferon.response <- FindMarkers(immune.combined, ident.1 = "B_STIM", ident.2 = "B_CTRL", verbose = FALSE)
    head(b.interferon.response, n = 15)

    Another useful way to visualize these changes in gene expression is with the split.by option to the FeaturePlot() or VlnPlot() function. This will display FeaturePlots of the list of given genes, split by a grouping variable (stimulation condition here). Genes such as CD3D and GNLY are canonical cell type markers (for T cells and NK/CD8 T cells) that are virtually unaffected by interferon stimulation and display similar gene expression patterns in the control and stimulated group. IFI6 and ISG15, on the other hand, are core interferon response genes and are upregulated accordingly in all cell types. Finally, CD14 and CXCL10 are genes that show a cell type specific interferon response. CD14 expression decreases after stimulation in CD14 monocytes, which could lead to misclassification in a supervised analysis framework, underscoring the value of integrated analysis. CXCL10 shows a distinct upregulation in monocytes and B cells after interferon stimulation but not in other cell types.

    FeaturePlot(immune.combined, features = c("CD3D", "GNLY", "IFI6"), split.by = "stim", max.cutoff = 3,
    cols = c("grey", "red"))

    plots <- VlnPlot(immune.combined, features = c("LYZ", "ISG15", "CXCL10"), split.by = "stim", group.by = "celltype",split.plot = TRUE,
    pt.size = 0, combine = FALSE)
    wrap_plots(plots = plots, ncol = 1)

    https://satijalab.org/seurat/articles/integration_introduction.html
    https://blog.csdn.net/zengwanqin/article/details/114969068

    Integration goals
    The following tutorial is designed to give you an overview of the kinds of comparative analyses on complex cell types that are possible using the Seurat integration procedure. Here, we address a few key goals:

    1Create an ‘integrated’ data assay for downstream analysis
    2Identify cell types that are present in both datasets
    3Obtain cell type markers that are conserved in both control and stimulated cells
    4Compare the datasets to find cell-type specific responses to stimulation

    1Setup the Seurat objects

    library(Seurat)
    library(SeuratData)
    library(patchwork)

    BiocManager::install('SeuratData')

    devtools::install_github('satijalab/seurat-data')

    install dataset

    InstallData("ifnb")

    load dataset

    LoadData("ifnb")

    ifnb
    str(ifnb)

    split the dataset into a list of two seurat objects (stim and CTRL)

    ifnb.list <- SplitObject(ifnb, split.by = "stim")

    normalize and identify variable features for each dataset independently

    ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
    x <- NormalizeData(x)
    x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
    })

    select features that are repeatedly variable across datasets for integration

    features <- SelectIntegrationFeatures(object.list = ifnb.list)

    2Perform integration

    We then identify anchors using the FindIntegrationAnchors() function, which takes a list of Seurat objects as input, and use these anchors to integrate the two datasets together with IntegrateData().

    immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, anchor.features = features)
    immune.anchors
    str(immune.anchors)

    this command creates an 'integrated' data assay

    immune.combined <- IntegrateData(anchorset = immune.anchors)

    3Perform an integrated analysis
    Now we can run a single integrated analysis on all cells!

    specify that we will perform downstream analysis on the corrected data note that the

    original unmodified data still resides in the 'RNA' assay

    DefaultAssay(immune.combined) <- "integrated"

    Run the standard workflow for visualization and clustering

    immune.combined <- ScaleData(immune.combined, verbose = FALSE)
    immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
    immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:30)
    immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:30)
    immune.combined <- FindClusters(immune.combined, resolution = 0.5)

    Visualization

    p1 <- DimPlot(immune.combined, reduction = "umap", group.by = "stim")
    p2 <- DimPlot(immune.combined, reduction = "umap", label = TRUE, repel = TRUE)
    p1 + p2

    To visualize the two conditions side-by-side, we can use the split.by argument to show each condition colored by cluster.

    DimPlot(immune.combined, reduction = "umap", split.by = "stim")

    4Identify conserved cell type markers
    To identify canonical cell type marker genes that are conserved across conditions, we provide the FindConservedMarkers() function. This function performs differential gene expression testing for each dataset/group and combines the p-values using meta-analysis methods from the MetaDE R package. For example, we can calculated the genes that are conserved markers irrespective of stimulation condition in cluster 6 (NK cells).

    For performing differential expression after integration, we switch back to the original

    data

    DefaultAssay(immune.combined) <- "RNA"

    '''
    BiocManager::install('multtest')
    install.packages('metap')
    library('multtest')
    '''

    nk.markers <- FindConservedMarkers(immune.combined, ident.1 = 6, grouping.var = "stim", verbose = FALSE)
    head(nk.markers)

    We can explore these marker genes for each cluster and use them to annotate our clusters as specific cell types.
    str(nk.markers)
    rownames(nk.markers)
    head(rownames(nk.markers))

    FeaturePlot(immune.combined, features = head(rownames(nk.markers),5), min.cutoff = "q9")
    FeaturePlot(immune.combined, features = c("CD3D", "SELL", "CREM", "CD8A", "GNLY", "CD79A", "FCGR3A",
    "CCL2", "PPBP"), min.cutoff = "q9")

    immune.combined <- RenameIdents(immune.combined, 0 = "CD14 Mono", 1 = "CD4 Naive T", 2 = "CD4 Memory T",
    3 = "CD16 Mono", 4 = "B", 5 = "CD8 T", 6 = "NK", 7 = "T activated", 8 = "DC", 9 = "B Activated",
    10 = "Mk", 11 = "pDC", 12 = "Eryth", 13 = "Mono/Mk Doublets", 14 = "HSPC")
    DimPlot(immune.combined, label = TRUE)

    The DotPlot() function with the split.by parameter can be useful for viewing conserved cell type markers across conditions, showing both the expression level and the percentage of cells in a cluster expressing any given gene. Here we plot 2-3 strong marker genes for each of our 14 clusters.

    Idents(immune.combined) <- factor(Idents(immune.combined), levels = c("HSPC", "Mono/Mk Doublets",
    "pDC", "Eryth", "Mk", "DC", "CD14 Mono", "CD16 Mono", "B Activated", "B", "CD8 T", "NK", "T activated",
    "CD4 Naive T", "CD4 Memory T"))
    markers.to.plot <- c("CD3D", "CREM", "HSPH1", "SELL", "GIMAP5", "CACYBP", "GNLY", "NKG7", "CCL5",
    "CD8A", "MS4A1", "CD79A", "MIR155HG", "NME1", "FCGR3A", "VMO1", "CCL2", "S100A9", "HLA-DQA1",
    "GPR183", "PPBP", "GNG11", "HBA2", "HBB", "TSPAN13", "IL3RA", "IGJ", "PRSS57")
    DotPlot(immune.combined, features = markers.to.plot, cols = c("blue", "red"), dot.scale = 8, split.by = "stim") +
    RotatedAxis()

    5Identify differential expressed genes across conditions
    Now that we’ve aligned the stimulated and control cells, we can start to do comparative analyses and look at the differences induced by stimulation. One way to look broadly at these changes is to plot the average expression of both the stimulated and control cells and look for genes that are visual outliers on a scatter plot. Here, we take the average expression of both the stimulated and control naive T cells and CD14 monocyte populations and generate the scatter plots, highlighting genes that exhibit dramatic responses to interferon stimulation.

    library(ggplot2)
    library(cowplot)
    theme_set(theme_cowplot())
    Idents(immune.combined)
    t.cells <- subset(immune.combined, idents = "CD4 Naive T")
    head(t.cells)
    tail(t.cells)
    str(t.cells)

    '''
    my_tcells<-immune.combined@assays$RNA[,orig.ident=c('IMMUNE_CTRL','IMMUNE_STIM')]
    head(my_tcells)
    colnames(my_tcells)
    '''
    Idents(t.cells) ##此时的level为 CD4 Naive T
    Idents(t.cells) <- "stim" ##此时的level为 CONTROL 和STIM

    t.cellsRNA t.cells@assaysRNA
    t.cells@assaysRNA@data avg.t.cells <- as.data.frame(log1p(AverageExpression(t.cells, verbose = FALSE)RNA))
    avg.t.cells

    avg.t.cells$gene <- rownames(avg.t.cells)
    avg.t.cells

    ----------

    cd14.mono <- subset(immune.combined, idents = "CD14 Mono")
    Idents(cd14.mono) <- "stim"
    avg.cd14.mono <- as.data.frame(log1p(AverageExpression(cd14.mono, verbose = FALSE)RNA)) avg.cd14.monogene <- rownames(avg.cd14.mono)
    avg.cd14.mono

    genes.to.label = c("ISG15", "LY6E", "IFI6", "ISG20", "MX1", "IFIT2", "IFIT1", "CXCL10", "CCL8")
    p1 <- ggplot(avg.t.cells, aes(CTRL, STIM)) + geom_point() + ggtitle("CD4 Naive T Cells")
    p1
    p1 <- LabelPoints(plot = p1, points = genes.to.label, repel = FALSE)
    p2 <- ggplot(avg.cd14.mono, aes(CTRL, STIM)) + geom_point() + ggtitle("CD14 Monocytes")
    p2 <- LabelPoints(plot = p2, points = genes.to.label, repel = TRUE)
    p1 + p2

    '''

    如何确定gene.to.label的基因?可以通过寻找std最大的方式吗?

    library(dplyr)
    avg.t.cells

    avg.t.cells %>%
    select(CTRL,STIM) %>% unlist () %>%
    sd()
    '''

    As you can see, many of the same genes are upregulated in both of these cell types and likely represent a conserved interferon response pathway.

    Because we are confident in having identified common cell types across condition, we can ask what genes change in different conditions for cells of the same type. First, we create a column in the meta.data slot to hold both the cell type and stimulation information and switch the current ident to that column. Then we use FindMarkers() to find the genes that are different between stimulated and control B cells. Notice that many of the top genes that show up here are the same as the ones we plotted earlier as core interferon response genes. Additionally, genes like CXCL10 which we saw were specific to monocyte and B cell interferon response show up as highly significant in this list as well.

    head(immune.combined@meta.data,10)
    immune.combinedcelltype.stim <- paste(Idents(immune.combined), immune.combinedstim, sep = "_")
    head(immune.combined@meta.data,10)

    immune.combined$celltype <- Idents(immune.combined)
    head(immune.combined@meta.data,10)
    Idents(immune.combined)
    Idents(immune.combined) <- "celltype.stim"
    Idents(immune.combined)
    b.interferon.response <- FindMarkers(immune.combined, ident.1 = "B_STIM", ident.2 = "B_CTRL", verbose = FALSE)
    head(b.interferon.response, n = 15)

    Another useful way to visualize these changes in gene expression is with the split.by option to the FeaturePlot() or VlnPlot() function. This will display FeaturePlots of the list of given genes, split by a grouping variable (stimulation condition here). Genes such as CD3D and GNLY are canonical cell type markers (for T cells and NK/CD8 T cells) that are virtually unaffected by interferon stimulation and display similar gene expression patterns in the control and stimulated group. IFI6 and ISG15, on the other hand, are core interferon response genes and are upregulated accordingly in all cell types. Finally, CD14 and CXCL10 are genes that show a cell type specific interferon response. CD14 expression decreases after stimulation in CD14 monocytes, which could lead to misclassification in a supervised analysis framework, underscoring the value of integrated analysis. CXCL10 shows a distinct upregulation in monocytes and B cells after interferon stimulation but not in other cell types.

    FeaturePlot(immune.combined, features = c("CD3D", "GNLY", "IFI6"), split.by = "stim", max.cutoff = 3,
    cols = c("grey", "red"))

    plots <- VlnPlot(immune.combined, features = c("LYZ", "ISG15", "CXCL10"), split.by = "stim", group.by = "celltype",split.plot = TRUE,
    pt.size = 0, combine = FALSE)
    wrap_plots(plots = plots, ncol = 1)

    相关文章

      网友评论

          本文标题:Seurat(v4)官方教程 | Introduction to

          本文链接:https://www.haomeiwen.com/subject/ismaaltx.html