美文网首页
Nat Med.作者提供全文的绘图代码,对于学习作图很有帮助(一

Nat Med.作者提供全文的绘图代码,对于学习作图很有帮助(一

作者: 小杜的生信筆記 | 来源:发表于2024-09-22 15:18 被阅读0次

    本期教程

    获得本期教程全文代码:在订阅号后台回复关键词:20240923

    2022年教程总汇

    https://mp.weixin.qq.com/s/Lnl258WhbK2a8pRZFuIyVg

    2023年教程总汇

    https://mp.weixin.qq.com/s/wCTswNP8iHMNvu5GQauHdg

    引言

    今天分享的文章是2024发表在Nat Med.期刊中,来自上海交通大学医学院的文章,作者提供了全文的绘图代码,确实,对于学习绘图提供充分的素材。也是一个学习作图具重大意义的文章。但是,依旧是基于你有一定基础技能之上。

    文中并未给你提供作图原数据,因此,这些都是需要自己去摸索。

    文章

    Prospective observational study on biomarkers of response in pancreatic ductal adenocarcinoma

    文章图形亮点:

    1. 全文的图形并不复杂,因此,我们在制图中,并不需要追求那些很复杂的图形,除非,你很牛X,可以自己绘制。
    2. 图形中网络图。这个是一个相对值得我们学习的图形之一,WGNCA和DEG都可以做网路图。
    3. 提供你全部的绘图代码。
    

    文章图形

    < Figure 1 , Figure 2 , Figure 3 , Figure 4 , Figure 5 >

    Code

    1. Figure 1

    1.1 PCA

    library(scatterplot3d)
    library(tidyverse)
    library(openxlsx)
    library(factoextra) # Extract and Visualize the Results of Multivariate Data Analyses
    color.bin <- c("#00599F","#D01910")
    dir.create("./results/Figure1",recursive = T)
    #----------------------------------------------------------------------------------
    #  Step 1: Load Proteomics data
    #----------------------------------------------------------------------------------
    pro  <- readRDS("./data/proteomics/20230412_PDAC_PRO_exp.rds") # 4787 protein * 281 samples
    meta <- readRDS("./data/proteomics/20230412_PDAC_PRO_meta.rds")
    identical(rownames(meta),colnames(pro)) # check names
    #----------------------------------------------------------------------------------
    #  Step 2: PCA
    #----------------------------------------------------------------------------------
    res.pca.comp <- prcomp(pro, scale = F)
    #----------------------------------------------------------------------------------
    #  Step 3: Plot
    #----------------------------------------------------------------------------------
    plot.data <- as.data.frame(res.pca.comp$rotation[, 1:10])
    plot.data <- plot.data %>% 
                  mutate(ID=rownames(plot.data),
                         Type=meta$Type,
                         TypeColor=color.bin[as.numeric(as.factor(Type))])
    pdf("./results/Figure1/1B.PRO_PCA3d.pdf", width = 7, height = 7)
    scatterplot3d(x = plot.data$PC2, 
                  y = plot.data$PC1, 
                  z = plot.data$PC3,
                  color = plot.data$TypeColor,
                  pch = 16, cex.symbols = 1,
                  scale.y = 0.7, angle = 45,
                  xlab = "PC2", ylab = "PC1", zlab = "PC3",
                  main="3D Scatter Plot of proteomics",
                  col.axis = "#444444", col.grid = "#CCCCCC")
    legend("bottom", legend = levels(as.factor(meta$Type)),
          col =  color.bin,  pch = 16,
          inset = -0.15, xpd = TRUE, horiz = TRUE)
    dev.off()
    write.xlsx(plot.data, "./results/Figure1/1B.PRO_PCA3d.xlsx", overwrite = T)
    
    # variance percent of each PC
    p <- fviz_eig(res.pca.comp)
    var_explained <- get_eig(res.pca.comp)
    # var_explained <- res.pca.comp$sdev^2 / sum(res.pca.comp$sdev^2)
    ggsave("./results/Figure1/1B.PRO_PCA_percent.pdf",p,width = 5, height = 5)
    write.xlsx(var_explained, "./results/Figure1/1B.PRO_PCA_percant.xlsx",rowNames=T, overwrite = T)
    

    1.2 Volcano plot

    library(limma)
    library(tidyverse)
    library(openxlsx)
    library(ggpubr)
    library(ggthemes)
    #----------------------------------------------------------------------------------
    #  Step 1: Load Proteomics data
    #----------------------------------------------------------------------------------
    exp  <- readRDS("./data/proteomics/20230412_PDAC_PRO_exp.rds") # 4787 protein * 281 samples
    meta <- readRDS("./data/proteomics/20230412_PDAC_PRO_meta.rds")
    identical(rownames(meta),colnames(exp)) # check names
    #----------------------------------------------------------------------------------
    #  Step 2: limma
    #----------------------------------------------------------------------------------
    meta      <- meta %>% mutate(contrast=as.factor(Type)) 
    design    <- model.matrix(~ 0 + contrast , data = meta) # un-paired
    fit       <- lmFit(exp, design)
    contrast  <- makeContrasts( Tumor_Normal = contrastT - contrastN , levels = design)
    fits      <- contrasts.fit(fit, contrast)
    ebFit     <- eBayes(fits)
    limma.res <- topTable(ebFit, coef = "Tumor_Normal", adjust.method = 'fdr', number = Inf)
    ## result
    limma.res <- limma.res %>% filter(!is.na(adj.P.Val)) %>% 
                  mutate( logP = -log10(adj.P.Val) ) %>%
                  mutate( tag = "Tumor -vs- Normal")%>%
                  mutate( Gene = ID)
    # cutoff:  FC:1.5   adj.p:0.05
    limma.res <- limma.res %>% mutate(group=case_when(
                                      (adj.P.Val < 0.05 & logFC > 0.58) ~ "up",
                                      (adj.P.Val < 0.05 & logFC < -0.58) ~ "down",
                                      .default = "not sig"))
    table(limma.res$group) # UP:1213 ; DOWN:864 ; not:2710
    ## output
    write.xlsx( limma.res, "./results/Figure1/1C.PRO_Limma_fc1.5.xlsx", overwrite = T, rowNames = F) 
    #----------------------------------------------------------------------------------
    #  Step 3: Vasualization 
    #----------------------------------------------------------------------------------
    ## volcano
    limma.res <- limma.res %>% mutate(group=factor(group,levels = c("up","down","not sig")))
    my_label <- paste0( "FC:1.5 ; AdjP:0.05 ; ",
                        "Up:",table(limma.res$group)[1]," ; ",
                        "Down:",table(limma.res$group)[2])
    p <- ggscatter(limma.res,
                   x = "logFC", y = "logP",
                   color = "group", size = 2,
                   main = paste0("Tumor -vs- TAC"), # ***
                   xlab = "log2FoldChange", ylab = "-log10(adjusted P.value)",
                   palette = c("#D01910","#00599F","#CCCCCC"),
                   ylim = c(-1, 70),xlim=c(-8,8))+
      theme_base()+
      geom_hline(yintercept = -log10(0.05), linetype="dashed", color = "#222222") +
      geom_vline(xintercept = 0.58 , linetype="dashed", color = "#222222") +
      geom_vline(xintercept = -0.58, linetype="dashed", color = "#222222") +
      labs(subtitle = my_label)+
      theme(plot.background = element_blank())
    ggsave("./results/Figure1/1C.PRO_Limma_fc1.5.pdf", p, width = 10, height = 10)
    

    1.3 (d) Enrichment plot

    # data : enriched pathways table
    plot.data <- read.xlsx("./data/Extended Data Table 3.xlsx", sheet = 2, startRow = 2)
    plot.pathway <- c("GO:0006730~one-carbon metabolic process","GO:0006888~ER to Golgi vesicle-mediated transport","hsa00020:Citrate cycle (TCA cycle)","hsa00071:Fatty acid degradation","hsa00190:Oxidative phosphorylation","hsa00250:Alanine, aspartate and glutamate metabolism","hsa00260:Glycine, serine and threonine metabolism","hsa00280:Valine, leucine and isoleucine degradation","hsa00480:Glutathione metabolism","hsa00520:Amino sugar and nucleotide sugar metabolism","hsa00620:Pyruvate metabolism","hsa00630:Glyoxylate and dicarboxylate metabolism","hsa00640:Propanoate metabolism","hsa01100:Metabolic pathways","hsa01200:Carbon metabolism","hsa01212:Fatty acid metabolism","hsa01240:Biosynthesis of cofactors","hsa03010:Ribosome","hsa03060:Protein export","hsa04141:Protein processing in endoplasmic reticulum","hsa04972:Pancreatic secretion","GO:0001916~positive regulation of T cell mediated cytotoxicity","GO:0006096~glycolytic process","GO:0007165~signal transduction","GO:0007266~Rho protein signal transduction","GO:0045087~innate immune response","GO:0050778~positive regulation of immune response","GO:0050853~B cell receptor signaling pathway","GO:0050870~positive regulation of T cell activation","GO:0071346~cellular response to interferon-gamma","hsa04015:Rap1 signaling pathway","hsa04062:Chemokine signaling pathway","hsa04066:HIF-1 signaling pathway","hsa04151:PI3K-Akt signaling pathway","hsa04512:ECM-receptor interaction","hsa04610:Complement and coagulation cascades","hsa04621:NOD-like receptor signaling pathway","hsa04666:Fc gamma R-mediated phagocytosis")
    plot.data <- plot.data %>% filter(Term %in% plot.pathway) %>% 
                        mutate(LogFDR= -log10(FDR))
    # plot
    color.bin <- c("#00599F","#D01910")
    p <- ggscatter(plot.data, x = "LogFDR", y = "Fold.Enrichment", color = "Type",
                   main = "Enrichment of tumor/TAC protein",
                   size = "Ratio", shape = 16,
                   label = plot.data$Term, palette = color.bin) + theme_base()
    p <- p + scale_size(range = c(4,20)) + 
             scale_x_continuous(limit = c(-10, 40)) + 
             theme(plot.background = element_blank())
    ggsave(paste0("./results/Figure1/1D.PRO_deg.tn_enrich.pdf"), p, width = 11, height = 10)
    

    2. Figure 2

    2.1 (a) WGCNA

    library(WGCNA)
    dir.create("./results/Figure2/WGCNA",recursive = T)
    #----------------------------------------------------------------------------------
    #  Step 1: Loading expression data and set parameters
    #----------------------------------------------------------------------------------
    protein.nona.tumor <- readRDS("./data/proteomics/wgcna/20230412_PDAC_PRO_Tumor_exp.rds")
    corType = "pearson"
    corFnc = ifelse(corType=="spearman", cor, bicor)
    maxPOutliers = ifelse(corType=="pearson",1, 0.05)
    # input data normalization
    plot.mat <- t(protein.nona.tumor - rowMeans(protein.nona.tumor)) # row: samples | col: genes
    #----------------------------------------------------------------------------------
    #  Step 2: identification of outlier samples
    #----------------------------------------------------------------------------------
    sampleTree = hclust(dist(plot.mat), method = "ward.D")
    pdf(paste0("./results/Figure2/WGCNA/1.tree.pdf"), width = 20, height = 9)
    plot(sampleTree, main = "Sample clustering to detect outliers", sub="", xlab="")
    dev.off()
    
    #----------------------------------------------------------------------------------
    #  Step 3: analysis of network topology
    #----------------------------------------------------------------------------------
    # Choose a set of soft-thresholding powers
    powers = c(c(1:10), seq(from = 12, to = 30, by = 2))
    # Call the network topology analysis function
    sft = pickSoftThreshold(plot.mat, powerVector=powers, 
                            networkType="unsigned", verbose=5)
    # Plot the results:
    pdf(paste0("./results/Figure2/WGCNA/2.power.pdf"), width = 12, height = 8)
    par(mfrow = c(1,2))
    cex1 = 0.9
    plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
         xlab="Soft Threshold (power)",
         ylab="Scale Free Topology Model Fit, signed R^2",type="n",
         main = paste("Scale independence"))
    text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
         labels=powers,cex=cex1,col="red")
    abline(h=0.85,col="red")
    plot(sft$fitIndices[,1], sft$fitIndices[,5],
         xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",
         main = paste("Mean connectivity"))
    text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, 
         cex=cex1, col="red")
    dev.off()
    
    #----------------------------------------------------------------------------------
    #  Step 4: soft threshold : power
    #----------------------------------------------------------------------------------
    power = sft$powerEstimate
    power
    #----------------------------------------------------------------------------------
    #  Step 5: One-step network construction and module detection
    #----------------------------------------------------------------------------------
    net = blockwiseModules(plot.mat, power = power, maxBlockSize = ncol(plot.mat),
                           TOMType = "signed", minModuleSize = 20, 
                           reassignThreshold = 0, mergeCutHeight = 0.0001,
                           numericLabels = TRUE, pamRespectsDendro = FALSE,
                           saveTOMs=TRUE, corType = corType, 
                           maxPOutliers = 1, loadTOMs=TRUE,
                           randomSeed = 931, # seed
                           saveTOMFileBase = paste0("./results/Figure2/WGCNA/WGCNA.tom"),
                           verbose = 3, pearsonFallback = "none", deepSplit = 3 )
    # module:
    table(net$colors) # 0 corresponds to unclassified genes
    # Convert [number] labels to [colors] for plotting
    moduleLabels = net$colors
    moduleColors = labels2colors(moduleLabels)
    # plot
    pdf(paste0( "./results/Figure2/WGCNA/3.module.pdf"), width = 8, height = 6)
    plotDendroAndColors(net$dendrograms[[1]], moduleColors[net$blockGenes[[1]]],
                        "Module colors",
                        dendroLabels = FALSE, hang = 0.03,
                        addGuide = TRUE, guideHang = 0.05)
    dev.off()
    
    #----------------------------------------------------------------------------------
    #  Step 6: Vasualization
    #----------------------------------------------------------------------------------
    ## data
    node.data <- readRDS("./data/proteomics/wgcna/2D-node.data.rds")
    edge.data <- readRDS("./data/proteomics/wgcna/2D-edge.data.rds")
    color.module <- c("#CCCCCC","#ecb888","#af88bb","#a032cb","#efbed6","#fc496a","#b6d37f","#589336","#7fd68e","#52c465","#3372e0","#84d7f6","#5394c3","#6376b3","#7f6cd7","#c4ceff","#fc9d40","#5c95e0","#cd7560","#ff70e4", "#ff8738", "#ffcead","#1cbf8b", "#b76d38", "#1584ff", "#7f006d", "#ffd35f","#E66F73","#F57F20","#1DBB95","#9CB79F","#F0B8D2","#A0485E","#A0688E","#C7E1DF","#51B1DF","#6D97D7","#5D6193","#CEC3E0","#A9917E","#7C7D80","#F4E192","#ADD666")
    names(color.module) <- paste0("ME", seq(0,length(color.module)-1) %>% str_pad(width = 2,side = "left",pad = "0"))
    ## plot network 
    gg <- ggplot()
    gg <- gg + geom_segment(mapping = aes(x = from.x, y = from.y, xend = to.x, yend = to.y),
                            color = "#CCCCCC", size = 0.01, data = edge.data) # draw a straight line
    gg <- gg + geom_point(mapping = aes(x = pos.x, y = pos.y, color = Module), 
                          size = 2, data = node.data) # add point
    gg <- gg + scale_size(range = c(0, 6) * 2) # specifies the minimum and maximum size 
    gg <- gg + theme_void()
    gg <- gg + labs(x = "", y = "", title = paste0("PPI"))
    gg <- gg + scale_colour_manual(values = color.module)
    ggsave(paste0("./results/Figure2/2A-PRO-modules.png"), gg, width = 10, height = 8)
    

    2.2 (b) Significantly up/down-regulated proteins

    #----------------------------------------------------------------------------------------------
    #  Overlay the log2 fold change between tumor and TAC on PRO in the WGCNA network
    #----------------------------------------------------------------------------------------------
    ## data 
    node.data <- readRDS("./data/proteomics/wgcna/2D-node.data.rds") # DEG & module genes
    edge.data <- readRDS("./data/proteomics/wgcna/2D-edge.data.rds")
    ## plot
    gg <- ggplot()
    gg <- gg + geom_segment(mapping = aes(x = from.x, y = from.y, xend = to.x, yend = to.y),
                            color = "#CCCCCC", size = 0.01, data = edge.data)
    gg <- gg + geom_point(mapping = aes(x = pos.x, y = pos.y, color = proSig), size = 1,
                          data = node.data[which(node.data$proSig == "zz"), ])
    gg <- gg + geom_point(mapping = aes(x = pos.x, y = pos.y, color = proSig), size = 2.5,
                          data = node.data[which(node.data$proSig != "zz"), ])
    gg <- gg + scale_size(range = c(0, 6) * 2)
    gg <- gg + theme_void()
    gg <- gg + labs(x = "", y = "", title = paste0("color by proSig"))
    gg <- gg + scale_colour_manual(values = c("#00599F","#D01910","#AAAAAA"))
    ggsave(paste0("./results/Figure2/2B-proSig.png"), gg, width = 8.5, height = 8)
    

    2.3 (c) Modules enrichment score

    color.bin <- c("#00599F","#d80700")
    #----------------------------------------------------------------------------------
    #  Step 1: Load the RJ-cohort 1 Data
    #----------------------------------------------------------------------------------
    # Scores of the 32 modules in TACs and PDACs of RJ-cohort 1
    plot.data <- read.xlsx("./data/Extended Data Table 4.xlsx", sheet = 2, startRow = 2, rowNames = T)
    #                       ME01         ME02        ME03       ME04
    # RJ-01-0143-N_PRO 0.3474261 -0.114677437 -0.04291552 0.06708347
    # RJ-01-0768-N_PRO 0.2532539  0.009051615 -0.03585508 0.03486392
    # RJ-01-0697-N_PRO 0.2839048 -0.032966990 -0.02943711 0.05803330
    # RJ-01-0609-N_PRO 0.3206755 -0.087559353 -0.05749618 0.10310726
    plot.info <- NULL
    module.name <- colnames(plot.data)
    plot.stat <- data.frame(Module = module.name,
                            Wilcox.P = NA,
                            Wilcox.Padj = NA)
    #----------------------------------------------------------------------------------
    #  Step 2: wilcox.test
    #----------------------------------------------------------------------------------
    for (i in 1:ncol(plot.data)) {
      sub <- data.frame(Sample = rownames(plot.data),
                        Score = as.numeric(plot.data[, i]), 
                        ScoreScale = scale(as.numeric(plot.data[, i])),
                        Module = colnames(plot.data)[i],
                        Type = substr(rownames(plot.data), 12, 16))
      plot.stat$Wilcox.P[i] <- wilcox.test(sub$ScoreScale ~ sub$Type)$p.value
      plot.info <- rbind(plot.info, sub)
    }
    plot.info <- plot.info %>% mutate(Type=factor(as.character(Type), levels = c("N_PRO","T_PRO"))) 
    plot.stat <- plot.stat %>% mutate(Wilcox.Padj=p.adjust(Wilcox.P, method = "fdr"))
    #----------------------------------------------------------------------------------
    #  Step 3: plot
    #----------------------------------------------------------------------------------
    p <- ggbarplot(plot.info, x = "Module", y = "ScoreScale",
                   color = "Type", fill = "Type",
                   palette = color.bin, width = 0.5, size = 0,
                   add = c("mean_se"), add.params = list(width = 0.5),
                   order = module.name,
                   position = position_dodge(0.6),
                   xlab = "", ylab = "Module Score of Protein")
    p <- p + theme_base() 
    p <- p + geom_hline(yintercept = 0, color = "black")
    p <- p + theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
    p <- p + stat_compare_means(aes(group = Type, label = ..p.format..), size = 1, method = "wilcox.test", label.y = 1.2)  + theme(plot.background = element_blank())
    #----------------------------------------------------------------------------------
    #  Step 4: output
    #----------------------------------------------------------------------------------
    ggsave(paste0("./results/Figure2/Figure2c-1.pdf"), p, width = 12, height = 4)
    write.xlsx(plot.stat, paste0("./results/Figure2/Figure2c.padj-1.xlsx"))
    

    3. Figure 3

    3.1 (a) LASSO model for PDAC

    library(glmnet)
    library(openxlsx)
    library(survminer)
    library(survival)
    library(tidyverse)
    library(RColorBrewer)
    library(ComplexHeatmap)
    library(forestmodel)
    library(ROCR)
    dir.create("./results/Figure3/lasso",recursive = T)
    #----------------------------------------------------------------------------------
    #  Step 1: load data
    #----------------------------------------------------------------------------------
    ## PDAC Tumor proteomics
    pdac.pro  <- readRDS('./data/proteomics/20230412_PDAC_PRO_exp.rds') 
    pdac.pro  <- pdac.pro [,grep('_T_',colnames(pdac.pro))]   %>% t() %>% as.data.frame() # 191 tumor samples * 4787 protein
    ## WGCNA protein
    wgcna <- readRDS("./data/proteomics/wgcna/2D-node.data.rds")
    pdac.pro <- pdac.pro [,intersect(wgcna$Gene,colnames(pdac.pro))] #  191 samples * 3906 protein
    ## PDAC clinical data
    cli.pdac <- readRDS('./data/clinical/20230412_PDAC_191_patients_clinical.rds')
    rownames(cli.pdac) <- cli.pdac$ID
    ## Lasso data
    y.train <- data.frame(time=cli.pdac$Time_OS,
                          status=cli.pdac$Censoring_OS,
                          row.names = row.names(cli.pdac)) %>% 
               na.omit(y.train) %>% 
               as.matrix() 
    x.train <- pdac.pro[row.names(y.train),] %>% as.matrix() %>% na.omit()
    y.train <- y.train[row.names(x.train),]
    #----------------------------------------------------------------------------------
    #  Step 2: Fit the Lasso Regression Model
    #----------------------------------------------------------------------------------
    set.seed(1000) 
    fit <-glmnet(x=x.train,
                 y=y.train, 
                 family = "cox",  # cox : cox , binomial : logistic
                 alpha  = 1)      # 1 : lasso penalty, 0 : ridge penalty.
    pdf('./results/Figure3/lasso/0.Coefficients.pdf',width = 5,height = 4)
    plot(fit,xvar="lambda",label=F) 
    dev.off()
    # Cross-validation
    cvfit <- cv.glmnet(x=x.train,
                       y=y.train,
                       family = "cox",
                       type.measure="deviance",  
                       alpha  = 1,
                       nfolds = 10)
    png(paste0('./results/Figure3/lasso/S5A-Coeff.png'), width = 5,height = 5,res=1200,units="in")
    plot(fit,xvar="lambda",label=F) 
    abline(v = log(cvfit$lambda.min), col = "black",lty=2)
    text(-4,-4, labels = paste0("lambda.min:  ", round(cvfit$lambda.min,4)),col = "black",cex = 0.75)
    dev.off()
    
    png(paste0('./results/Figure3/lasso/S5B-cvfit.png'), width = 5,height = 5,res=1200,units="in")
    plot(cvfit)
    text(-3,250, labels = paste0("lambda.min:  ", round(cvfit$lambda.min,4)),col = "red",cex = 0.75)
    text(-3,200, labels = paste0("lambda.1se:  ", round(cvfit$lambda.1se,4)),col = "black",cex = 0.75)
    dev.off()
    
    #----------------------------------------------------------------------------------
    #  Step 3: Best Lasso model
    #----------------------------------------------------------------------------------
    lasso_best <- glmnet(x = x.train, 
                         y = y.train, 
                         alpha = 1,                    
                         lambda = cvfit$lambda.min,   
                         nfolds = 10, 
                         family = "cox")
    gene.coef  <- as.matrix(round(coef(lasso_best), 4))
    coef.lasso <- gene.coef[which(gene.coef[, 1] != 0), ]
    coef.lasso <- data.frame(genes=names(coef.lasso),
                       coefficient=as.numeric(coef.lasso)) %>% arrange(-coefficient)
    write.xlsx(coef.lasso, "./results/Figure3/lasso/2.coef_PDAC.Pro.xlsx", rowNames = F,colNames = T,overwrite = T)
    
    ## Lasso score 
    y <- as.data.frame(y.train) %>%
            mutate(LassoScore=predict(lasso_best,s=cvfit$lambda.min,newx=x.train)) %>% 
            mutate(LassoScore=as.numeric(LassoScore), ID=rownames(y.train))
    y <- y %>% mutate(LASSO.level=ifelse(LassoScore>median(LassoScore),"High" ,"Low")) %>%
               mutate(LASSO.level=factor(LASSO.level,levels = c( "Low","High"))) ## cutoff : median
    write.xlsx(y,paste0('./results/Figure3/lasso/3.Lasso_Score.xlsx'),rowNames = T,colNames = T,overwrite = T)
    #----------------------------------------------------------------------------------
    #  Step 4: Visualization
    #----------------------------------------------------------------------------------
    ## bar plot : gene cofficient 
    df.coef <- coef.lasso
    rownames(df.coef) <- df.coef$genes
    df.coef <- df.coef %>% transmute(gene=genes,cor=coefficient) %>% 
                    mutate(gene=paste0(gene," (",cor,")")) %>% 
                    arrange(cor) %>%  
                    mutate(gene=factor(gene,levels = gene)) %>% 
                    mutate(group=ifelse(cor>0,"A","B"))
    p1 <- ggplot(df.coef, aes(gene, cor, fill = group)) + 
      geom_bar(stat = 'identity') +
      ylab('coefficient') +
      xlab('') +
      guides(colour=F,fill=F) + 
      theme_bw(base_size = 12) + 
      scale_fill_manual(values = brewer.pal(9,'Set1')) +
      theme(panel.grid =element_blank(),
            axis.text = element_text(colour = 'black'),
            axis.text.x = element_text(size = 8)) +
      coord_flip()
    ggsave("./results/Figure3/5A-Lasso_cofficient_barplot.png",p1,width = 3,height = 4)
    
    ## dot plot : lasso score
    df.score <- y
    df.score <- df.score %>% mutate(samples=rownames(df.score))  %>% 
                    arrange(LassoScore) %>% 
                    mutate(samples=factor(samples,levels = samples))
    p2 <- ggscatter(df.score, x="samples", y="LassoScore", color = "LASSO.level",
                    palette = c("#0ac9c9","#ea358d"))+
      geom_hline(yintercept=median(df.score$LassoScore), linetype=2,color='black') +
      geom_vline(xintercept=median(1:nrow(df.score))+0.5, linetype=2,color='black') +
      annotate('text', 
               x=median(1:nrow(df.score))+15, 
               y=median(df.score$LassoScore)+0.5, 
               label=paste0('cutoff: ', round(median(df.score$LassoScore),4)),size=5, color='black') +
      ylab('PDAC Score') +
      xlab('') 
    ggsave("./results/Figure3/5A-Lasso_score_dotplot.png",p2,width = 8,height = 4)
    
    ## Heatmap : protein abundance
    heat.data <- t(x.train)
    heat.data <- heat.data[rev(rownames(df.coef)),row.names(df.score)]
    heat.data <- t(scale(t(heat.data)))
    png("./results/Figure3/5A-Lasso_gene_heatmap.png",width = 8,height = 4,res=1200,units="in")
    p3 <- Heatmap(heat.data, name = "Z-score",
                  col = colorRampPalette(c("#00599F","#1c7ec9","#FFFFFF",
                                            "#e53f39","#D01910"))(100),
                  cluster_rows = F,cluster_columns = F,show_column_names = F)
    draw(p3)
    dev.off()
    
    #----------------------------------------------------------------------------------
    #  Step 5: receiver operating characteristic (ROC)
    #----------------------------------------------------------------------------------
    # performance of model
    lasso.prob <- predict(cvfit, newx=x.train , family = "cox",
                          s=c(cvfit$lambda.min, cvfit$lambda.1se) )
    re <- cbind(as.data.frame(y.train),lasso.prob)  %>% as.data.frame()
    colnames(re) <- c('time','status','prob_min','prob_1se')
    # model: lambda.min
    pred_min <- prediction(predictions=re$prob_min, labels=re$status) # predictions: containing the prediction. labels: true class labels
    perf_min <- performance(pred_min,"tpr","fpr")
    auc_min  <- performance(pred_min,"auc")@y.values[[1]]
    # mocel: lambda.1se
    pred_1se <- prediction(re$prob_1se, re$status)
    perf_1se <- performance(pred_1se,"tpr","fpr")
    auc_1se  <- performance(pred_1se,"auc")@y.values[[1]]
    ## plot
    png(paste0('./results/Figure3/lasso/S5C-ROC.png'), width = 5,height = 5,res=1200,units="in")
    plot(perf_min,colorize=F, col="red") 
    plot(perf_1se,colorize=F, col="blue",add = T)
    lines(c(0,1),c(0,1), col = "gray", lty = 4 )
    text(0.7,0.3, labels = paste0("lambda.min: AUC=", round(auc_min,2)),col = "red")
    dev.off()
    
    #----------------------------------------------------------------------------------
    #  Step 6: Multivariate analysis
    #----------------------------------------------------------------------------------
    # data
    y_forest <- y %>% left_join(cli.pdac , by =c('ID'='ID'))
    y_forest <- y_forest %>% select(  c('time','status','LASSO.level',
                            'age','gender','CA125',
                            'CA19_9','CEA','Smoking',"Drinking",'AJCC_V8',
                            'Censoring_chemo'))
    y_forest <- y_forest %>% rename(LassoLevel=LASSO.level,
                                    Age=age,Gender=gender,
                                    CA199=CA19_9,
                                    Adjuvant_therapy=Censoring_chemo,
                                    AJCC_stage=AJCC_V8)
    y_forest <- y_forest %>% mutate(Gender=factor(Gender,levels = c("Female","Male")))
    # AJCC_stage
    table(y_forest$AJCC_stage)
    a <- y_forest$AJCC_stage
    b <- recode(a, Ia = "I", Ib = "I",IIa= "II", IIb= "II",III="III",IV="IV")
    y_forest$AJCC_stage <- factor(b,levels = c("I","II","III"))
    # coxph
    cfit <- coxph(Surv(time,status)~LassoLevel+Age+Gender+CA125+CA199+
                    CEA+Smoking+Drinking+AJCC_stage+Adjuvant_therapy,
                  data = y_forest)
    # Forest plots
    p <- forest_model(cfit,
          format_options = forest_model_format_options(colour = brewer.pal(9,'Set1')[2],                 shape = 16,text_size =  15,point_size = 2.5, banded = T),
          factor_separate_line = F )
    ggsave(paste0("./results/Figure3/S5D-multivariate_forest.png"), p,  width = 8, height = 6)
    

    3.2 (b) Kaplan-Meier curves in OS/DFS

    #----------------------------------------------------------------------------------
    #  Step 1: Loading data 
    #----------------------------------------------------------------------------------
    # (1) Sample information and clinical characteristics of the 191 PDAC patients (the RJ-cohort 1)
    rj1.cohort <- read.xlsx("./data/Extended Data Table 2.xlsx", startRow = 2)
    #           ID     Proteomic_ID        RNAseq_ID Age Gender      BMI DM Smoking
    # 1 RJ-01-0020 RJ-01-0020-T_PRO RJ-01-0020-T_RNA  61 Female 19.63000  1       0
    # 2 RJ-01-0038 RJ-01-0038-T_PRO RJ-01-0038-T_RNA  73   Male 22.46003  0       0
    # 3 RJ-01-0050 RJ-01-0050-T_PRO RJ-01-0050-T_RNA  69 Female 17.63085  0       0
    # 4 RJ-01-0069 RJ-01-0069-T_PRO RJ-01-0069-T_RNA  69 Female 17.48179  1       0
    # 5 RJ-01-0070 RJ-01-0070-T_PRO RJ-01-0070-T_RNA  65   Male 23.43750  0       0
    # 6 RJ-01-0074 RJ-01-0074-T_PRO RJ-01-0074-T_RNA  61   Male 23.66144  0       0
    rj1.cohort <- rj1.cohort %>%  mutate(LassoLevel=factor(as.character(LassoLevel), levels = c("Low","High")))
    color.bin.lasso <- c("#00599F","#d80700")
    #----------------------------------------------------------------------------------
    #  Step 2: OS survival stratified by lasso level
    #----------------------------------------------------------------------------------
    info <- summary(coxph(Surv(OS_month, OS_status) ~ LassoLevel, data = rj1.cohort))
    anno.text <- ""
    for (i in 1:nrow(info$conf.int)) {
      anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
    }
    anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(OS_month, OS_status) ~ LassoLevel, data = rj1.cohort)$pvalue, 4) )
    anno.text <- str_replace_all(anno.text, "LassoLevel", "")
    fit <- survfit(Surv(OS_month, OS_status) ~ LassoLevel, data = rj1.cohort)
    p1 <- ggsurvplot(fit, 
                     data = rj1.cohort,
                     xlab = 'Time (Months)',
                     pval = TRUE,
                     risk.table = TRUE, 
                     risk.table.height = 0.28,
                     conf.int.alpha = 0.05,
                     conf.int = TRUE, 
                     palette = color.bin.lasso,
                     axes.offset = TRUE,
                     break.time.by = 12,  xlim = c(0, 48),
                     title= paste0("OS LassoLevel \n", anno.text))
    #----------------------------------------------------------------------------------
    #  Step 2: DFS survival stratified by lasso level
    #----------------------------------------------------------------------------------
    info <- summary(coxph(Surv(DFS_month, DFS_status) ~ LassoLevel, data = rj1.cohort))
    anno.text <- ""
    for (i in 1:nrow(info$conf.int)) {
      anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
    }
    anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(DFS_month, DFS_status) ~ LassoLevel, data = rj1.cohort)$pvalue, 4) )
    anno.text <- str_replace_all(anno.text, "LassoLevel", "")
    fit <- survfit(Surv(DFS_month, DFS_status) ~ LassoLevel, data = rj1.cohort)
    p2 <- ggsurvplot(fit, 
                     data = rj1.cohort,
                     xlab = 'Time (Months)',
                     pval = TRUE,
                     risk.table = TRUE, 
                     risk.table.height = 0.28,
                     conf.int.alpha = 0.05,
                     conf.int = TRUE, 
                     palette = color.bin.lasso,
                     axes.offset = TRUE,
                     break.time.by = 12,  xlim = c(0, 48),
                     title= paste0("DFS LassoLevel \n", anno.text))
    ## output
    p <- arrange_ggsurvplots(list(p1, p2), ncol = 2, nrow = 1, print = FALSE)
    ggsave(paste0("./results/Figure3/Figure3b.pdf"), p, width = 16, height = 10)
    

    3.3 (c) Lasso proteins in module

    ## data
    node.data <- readRDS("./data/proteomics/wgcna/2D-node.data.rds")
    edge.data <- readRDS("./data/proteomics/wgcna/2D-edge.data.rds")
    ## lasso gene plot
    color.module <- c("#ecb888","#af88bb","#a032cb","#efbed6","#fc496a","#b6d37f","#589336","#7fd68e","#52c465","#3372e0","#84d7f6","#5394c3","#6376b3","#7f6cd7","#c4ceff","#fc9d40","#5c95e0","#cd7560","#ff70e4", "#ff8738", "#ffcead","#1cbf8b", "#b76d38", "#1584ff", "#7f006d", "#ffd35f","#E66F73","#F57F20","#1DBB95","#9CB79F","#F0B8D2","#A0485E","#A0688E","#C7E1DF","#51B1DF","#6D97D7","#5D6193","#CEC3E0","#A9917E","#7C7D80","#F4E192","#ADD666")
    gg <- ggplot()
    gg <- gg + geom_segment(mapping = aes(x = from.x, y = from.y, xend = to.x, yend = to.y), color = "#CCCCCC", size = 0.01, data = edge.data)
    gg <- gg + geom_point(mapping = aes(x = pos.x, y = pos.y, color = Module), size = 1, data = node.data)
    gg <- gg + geom_point(mapping = aes(x = pos.x, y = pos.y, color = Module), fill = "red", size = 5, shape = 21, data = node.data[which(node.data$ISlassoGene == "yes"), ])
    gg <- gg + geom_text(mapping = aes(x = pos.x, y = pos.y, label = Gene), size = 5, color = "black", data = node.data[which(node.data$ISlassoGene == "yes"), ])
    gg <- gg + scale_size(range = c(0, 6) * 2)
    gg <- gg + theme_void()
    gg <- gg + labs(x = "", y = "", title = paste0("color by ISlassoGene"))
    gg <- gg + scale_colour_manual(values = color.module)
    gg <- gg + theme(legend.position = "none")
    ggsave(paste0("./results/Figure3/3c-Lasso_gene_in_module.png"), gg, width = 8, height = 8.2)
    

    3.4 (d) LassoScore~ModuleScore

    library(ggcorrplot)
    #----------------------------------------------------------------------------------
    #  Step 1: Load the Data
    #----------------------------------------------------------------------------------
    # (1) Sample information and clinical characteristics of the 191 PDAC patients (the RJ-cohort 1)
    rj1.cohort <- read.xlsx("./data/Extended Data Table 2.xlsx", startRow = 2)
    #           ID     Proteomic_ID        RNAseq_ID Age Gender      BMI DM Smoking
    # 1 RJ-01-0020 RJ-01-0020-T_PRO RJ-01-0020-T_RNA  61 Female 19.63000  1       0
    # 2 RJ-01-0038 RJ-01-0038-T_PRO RJ-01-0038-T_RNA  73   Male 22.46003  0       0
    # 3 RJ-01-0050 RJ-01-0050-T_PRO RJ-01-0050-T_RNA  69 Female 17.63085  0       0
    # 4 RJ-01-0069 RJ-01-0069-T_PRO RJ-01-0069-T_RNA  69 Female 17.48179  1       0
    # 5 RJ-01-0070 RJ-01-0070-T_PRO RJ-01-0070-T_RNA  65   Male 23.43750  0       0
    # 6 RJ-01-0074 RJ-01-0074-T_PRO RJ-01-0074-T_RNA  61   Male 23.66144  0       0
    
    # (2) Scores of the 32 modules in TACs and PDACs of RJ-cohort 1. 
    module.ssgsea.pro.all <- read.xlsx("./data/Extended Data Table 4.xlsx", sheet = 2, startRow = 2, rowNames = T)
    module.ssgsea.pro.all <- t(module.ssgsea.pro.all)
    #      RJ-01-0143-N_PRO RJ-01-0768-N_PRO RJ-01-0697-N_PRO RJ-01-0609-N_PRO RJ-01-1055-N_PRO
    # ME01       0.34742611      0.253253916       0.28390477       0.32067554      0.325797851
    # ME02      -0.11467744      0.009051615      -0.03296699      -0.08755935     -0.096947986
    # ME03      -0.04291552     -0.035855079      -0.02943711      -0.05749618      0.016639009
    # ME04       0.06708347      0.034863920       0.05803330       0.10310726      0.007498371
    # ME05       0.19099018      0.100153070       0.14246697       0.16675071      0.127932887
    # ME06       0.03893163      0.240865770       0.14178269       0.07618613      0.038306853
    #----------------------------------------------------------------------------------
    #  Step 2:  Spearman correlation : lasso score ~ module score
    #----------------------------------------------------------------------------------
    plot.mat <- data.frame(t(module.ssgsea.pro.all[, rj1.cohort$Proteomic_ID]), LassoScore = rj1.cohort[, c("LassoScore")] )
    corr <- cor(plot.mat, use = "complete.obs", method = "spearman")
    p.mat <- cor_pmat(plot.mat, use = "complete.obs", method = "spearman")
    p.adj <- p.mat
    for (i in 1:ncol(p.adj)) {
      p.adj[, i] <- p.adjust(p.adj[, i], method = "fdr")
    }
    write.xlsx(list(Corr = corr, p.mat = p.mat, p.adj = p.adj), "./results/Figure3/Figure3d-1.xlsx", overwrite = T, rowNames = T)
    # plot
    pdf(paste0("./results/Figure3/Figure3d-1.pdf"), width = 10, height = 10)
    corrplot::corrplot(corr[, ], method = "ellipse",
                       col = colorRampPalette(c("#1B3361","#76C7FF","#FFFFFF","#FF987A","#6A0D28"))(200)) 
    dev.off()
    
    #----------------------------------------------------------------------------------
    #  Step 3:  Spearman correlation : lasso score ~ module score in Cao et al cohort
    #----------------------------------------------------------------------------------
    module.ssgsea.pro.cell <- readRDS("./data/Figure2/cao.module.ssgsea.pro.rds")
    cell.meta.data <- read.xlsx("./data/Figure2/cao.meta.data.xlsx")
    cell.meta.data$LassoLevel <- factor(as.character(cell.meta.data$LassoLevel), levels = c("Low","High"))
    plot.mat <- data.frame(t(module.ssgsea.pro.cell[, cell.meta.data$Proteomic_ID]), LassoScore = cell.meta.data[, c("LassoScore")] )
    # Spearman correlation
    corr <- cor(plot.mat, use = "complete.obs", method = "spearman")
    p.mat <- cor_pmat(plot.mat, use = "complete.obs", method = "spearman")
    p.adj <- p.mat
    for (i in 1:ncol(p.adj)) {
      p.adj[, i] <- p.adjust(p.adj[, i], method = "fdr")
    }
    write.xlsx(list(Corr = corr, p.mat = p.mat, p.adj = p.adj), "./results/Figure3/Figure3d-2.xlsx", overwrite = T, rowNames = T)
    # plot 
    pdf(paste0("./results/Figure3/Figure3d-2.pdf"),  width = 10, height = 10)
    corrplot::corrplot(corr[, ], method = "ellipse",
                       col = colorRampPalette(c("#1B3361","#76C7FF","#FFFFFF","#FF987A","#6A0D28"))(200)) 
    dev.off()
    

    3.5 (e) Cao et al validation

    ## data
    cell.meta.data <- read.xlsx("./data/Figure2/cao.meta.data.xlsx")
    color.bin.lasso <- c("#00599F","#d80700")
    cell.meta.data <- cell.meta.data %>% mutate(LassoLevel=factor(LassoLevel,level=c("Low","High")))
    ## coxph
    info <- summary(coxph(Surv(OS_month, OS_status) ~ LassoLevel, data = cell.meta.data))
    anno.text <- ""
    for (i in 1:nrow(info$conf.int)) {
      anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
    }
    anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(OS_month, OS_status) ~ LassoLevel, data = cell.meta.data)$pvalue, 4) )
    anno.text <- str_replace_all(anno.text, "LassoLevel", "")
    fit <- survfit(Surv(OS_month, OS_status) ~ LassoLevel, data = cell.meta.data)
    p1 <- ggsurvplot(fit, 
                     data = cell.meta.data,
                     xlab = 'Time (Months)',
                     pval = TRUE,
                     risk.table = TRUE, 
                     risk.table.height = 0.28,
                     conf.int.alpha = 0.05,
                     conf.int = TRUE, 
                     palette = color.bin.lasso,
                     axes.offset = TRUE,
                     break.time.by = 12,  xlim = c(0, 48),
                     title= paste0("OS LassoLevel \n", anno.text))
    p <- arrange_ggsurvplots(list(p1), ncol = 1, nrow = 1, print = FALSE)
    ggsave(paste0("./results/Figure3/Figure3e.pdf"), p, width = 8, height = 10)
    

    3.6 (f) RJ-cohort 2 validation

    #----------------------------------------------------------------------------------
    #  Step 1: Load the Data
    #----------------------------------------------------------------------------------
    # The LASSO score in RJ-cohort 2
    rj2.cohort <- read.xlsx("./data/Extended Data Table 5.xlsx", sheet = 3, startRow = 2)
    rj2.cohort <- rj2.cohort %>% mutate(LassoLevel=factor(as.character(LassoLevel), levels = c("Low","High")))
    color.bin.lasso <- c("#00599F","#d80700")
    #----------------------------------------------------------------------------------
    #  Step 2: OS survival stratified by lasso level
    #----------------------------------------------------------------------------------
    info <- summary(coxph(Surv(OS_month, OS_status) ~ LassoLevel, data = rj2.cohort))
    anno.text <- ""
    for (i in 1:nrow(info$conf.int)) {
      anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
    }
    anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(OS_month, OS_status) ~ LassoLevel, data = rj2.cohort)$pvalue, 4) )
    anno.text <- str_replace_all(anno.text, "LassoLevel", "")
    fit <- survfit(Surv(OS_month, OS_status) ~ LassoLevel, data = rj2.cohort)
    p1 <- ggsurvplot(fit, 
                     data = rj2.cohort,
                     xlab = 'Time (Months)',
                     pval = TRUE,
                     risk.table = TRUE, 
                     risk.table.height = 0.28,
                     conf.int.alpha = 0.05,
                     conf.int = TRUE, 
                     palette = color.bin.lasso,
                     axes.offset = TRUE,
                     break.time.by = 12,  xlim = c(0, 48),
                     title= paste0("OS LassoLevel \n", anno.text))
    #----------------------------------------------------------------------------------
    #  Step 3: DFS survival stratified by lasso level
    #----------------------------------------------------------------------------------
    info <- summary(coxph(Surv(DFS_month, DFS_status) ~ LassoLevel, data = rj2.cohort))
    anno.text <- ""
    for (i in 1:nrow(info$conf.int)) {
      anno.text <- paste0(anno.text, "\n", paste0(rownames(info$conf.int)[i], " HR=", round(info$conf.int[i, 1], 3), " CI=", round(info$conf.int[i, 3], 3), "-", round(info$conf.int[i, 4], 3), " P=", signif(info$coefficients[i, 5], 4) ))
    }
    anno.text <- paste0(anno.text, "\nKaplan-Meier P=", signif(survdiff(Surv(DFS_month, DFS_status) ~ LassoLevel, data = rj2.cohort)$pvalue, 4) )
    anno.text <- str_replace_all(anno.text, "LassoLevel", "")
    fit <- survfit(Surv(DFS_month, DFS_status) ~ LassoLevel, data = rj2.cohort)
    p2 <- ggsurvplot(fit, 
                     data = rj2.cohort,
                     xlab = 'Time (Months)',
                     pval = TRUE,
                     risk.table = TRUE, 
                     risk.table.height = 0.28,
                     conf.int.alpha = 0.05,
                     conf.int = TRUE, 
                     palette = color.bin.lasso,
                     axes.offset = TRUE,
                     break.time.by = 12,  xlim = c(0, 48),
                     title= paste0("DFS LassoLevelRNA \n", anno.text))
    # output
    p <- arrange_ggsurvplots(list(p1, p2), ncol = 2, nrow = 1, print = FALSE)
    ggsave(paste0("./results/Figure3/Figure3f.pdf"), p, width = 16, height = 10)
    

    相关文章

      网友评论

          本文标题:Nat Med.作者提供全文的绘图代码,对于学习作图很有帮助(一

          本文链接:https://www.haomeiwen.com/subject/ypxxrjtx.html