[10X Spatial Transcriptomics Visium] (3) Notes on Running the Complete Visium Pipeline


Author: Geekero | Published 2020-03-25 10:20

    My old account was banned for no apparent reason, so I am reposting this from a secondary account.

    More spatial transcriptomics posts:

    1. The new 10X Visium
    2. The legacy Spatial workflow

    Download the dataset

    https://support.10xgenomics.com/spatial-gene-expression/datasets
    I chose: Mouse Brain Section (Coronal)

    $ tar -xvf V1_Adult_Mouse_Brain_fastqs.tar
    $ ls
    V1_Adult_Mouse_Brain_S5_L001_I1_001.fastq.gz  V1_Adult_Mouse_Brain_S5_L001_R2_001.fastq.gz  V1_Adult_Mouse_Brain_S5_L002_R1_001.fastq.gz
    V1_Adult_Mouse_Brain_S5_L001_I2_001.fastq.gz  V1_Adult_Mouse_Brain_S5_L002_I1_001.fastq.gz  V1_Adult_Mouse_Brain_S5_L002_R2_001.fastq.gz
    V1_Adult_Mouse_Brain_S5_L001_R1_001.fastq.gz  V1_Adult_Mouse_Brain_S5_L002_I2_001.fastq.gz
    
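    • All of these are sequencing data from a single sample, spread over 2 lanes
    • Because the library is dual-indexed, each lane has 4 FASTQ files: I1, I2, R1 and R2
    • So there are 8 FASTQ files in total
      The corresponding layout is:


      [Figure]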
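The same lane/read layout can be checked from the file names alone. Below is a minimal Python sketch, assuming the bcl2fastq naming pattern `<sample>_S<n>_L<lane>_<read>_001.fastq.gz` that the files above follow; `group_fastqs` is a hypothetical helper, not part of Space Ranger.

```python
import re
from collections import defaultdict

# Assumed bcl2fastq-style name: <sample>_S<n>_L<lane>_<read>_001.fastq.gz
FASTQ_RE = re.compile(
    r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_(?P<read>[IR][12])_001\.fastq\.gz$"
)


def group_fastqs(filenames):
    """Map (sample, lane) -> set of read types (I1, I2, R1, R2)."""
    groups = defaultdict(set)
    for name in filenames:
        m = FASTQ_RE.match(name)
        if m:  # files that do not follow the pattern are skipped
            groups[(m["sample"], m["lane"])].add(m["read"])
    return dict(groups)


# The 8 file names from the listing above
files = [
    f"V1_Adult_Mouse_Brain_S5_L00{lane}_{read}_001.fastq.gz"
    for lane in (1, 2)
    for read in ("I1", "I2", "R1", "R2")
]
layout = group_fastqs(files)
for (sample, lane), reads in sorted(layout.items()):
    print(sample, "lane", lane, sorted(reads))

# Every lane of a dual-indexed run should carry all four read types
assert all(reads == {"I1", "I2", "R1", "R2"} for reads in layout.values())
```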

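    Run spaceranger count

    Here I go with automatic image alignment.
    Because the server has no internet access, I downloaded the slide file (.gpr) manually and passed it with --slidefile. Reference: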
    https://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/using/count

    $ spaceranger count --id=V1_Adult_Mouse_Brain \
                          --transcriptome=/share/nas1/Data/luohb/Visium/reference/refdata-cellranger-mm10-3.0.0/  \
                          --fastqs=/share/nas1/Data/luohb/Visium/test2/V1_Adult_Mouse_Brain_fastqs \
                          --sample=V1_Adult_Mouse_Brain \
                          --image=/share/nas1/Data/luohb/Visium/test2/V1_Adult_Mouse_Brain_image.tif \
                          --slide=V19L01-041 \
                          --area=C1 \
                          --slidefile=/share/nas1/Data/luohb/Visium/test2/V19L01-041.gpr \
                          --localcores=32   \
                          --localmem=128
    

    It finished without errors, but because the server was running several other large jobs at the same time, it took nearly 13 hours...


    [Figure]
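With runs this long, it may be worth checking the inputs before launching. The function below is a hypothetical pre-flight helper (not a Space Ranger feature): it only checks that the paths passed to the command above exist and that at least one FASTQ name starts with `<--sample>_S<n>_L<lane>_`, which is how the files above are named.

```python
import os
import re


def preflight(fastqs, sample, image, slidefile, transcriptome):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for flag, path in (("--image", image), ("--slidefile", slidefile)):
        if not os.path.isfile(path):
            problems.append(f"{flag}: file not found: {path}")
    if not os.path.isdir(transcriptome):
        problems.append(f"--transcriptome: directory not found: {transcriptome}")
    if not os.path.isdir(fastqs):
        problems.append(f"--fastqs: directory not found: {fastqs}")
    else:
        # Assumed naming scheme: <sample>_S<n>_L<lane>_... (as in the listing above)
        prefix = re.compile(rf"{re.escape(sample)}_S\d+_L\d{{3}}_")
        if not any(prefix.match(name) for name in os.listdir(fastqs)):
            problems.append(f"--sample: no file in {fastqs} matches {sample}_S<n>_L<lane>_")
    return problems


# Paths taken from the spaceranger command above
base = "/share/nas1/Data/luohb/Visium/test2"
for problem in preflight(
    fastqs=f"{base}/V1_Adult_Mouse_Brain_fastqs",
    sample="V1_Adult_Mouse_Brain",
    image=f"{base}/V1_Adult_Mouse_Brain_image.tif",
    slidefile=f"{base}/V19L01-041.gpr",
    transcriptome="/share/nas1/Data/luohb/Visium/reference/refdata-cellranger-mm10-3.0.0/",
):
    print(problem)
```

An empty output means every check passed; it says nothing about whether the image and slide actually match.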

    Inspect the output files

    $ ls
    _cmdline   _finalstate  _jobmode  _mrosource  _perf              _sitecheck              _tags       _uuid                         _vdrkill
    _filelist  _invocation  _log      outs        _perf._truncated_  SPATIAL_RNA_COUNTER_CS  _timestamp  V1_Adult_Mouse_Brain.mri.tgz  _versions
    
    $ cd outs/
    $ ls
    analysis       filtered_feature_bc_matrix     metrics_summary.csv  possorted_genome_bam.bam      raw_feature_bc_matrix     spatial
    cloupe.cloupe  filtered_feature_bc_matrix.h5  molecule_info.h5     possorted_genome_bam.bam.bai  raw_feature_bc_matrix.h5  web_summary.html
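The headline numbers in web_summary.html are, as far as I know, also written to metrics_summary.csv as one header row plus one value row. A minimal Python sketch for reading it into a dict; the two-column CSV in the example is made up for illustration, and `to_number` assumes values are formatted like `1,234` or `93.5%`, which is worth checking against your own file.

```python
import csv
import io


def read_metrics(handle):
    """Return the single data row of metrics_summary.csv as {metric name: raw string}."""
    rows = list(csv.DictReader(handle))
    if len(rows) != 1:
        raise ValueError(f"expected one data row, found {len(rows)}")
    return rows[0]


def to_number(text):
    """Turn '1,234' into 1234.0 and '93.5%' into 0.935; other strings raise ValueError."""
    text = text.strip()
    if text.endswith("%"):
        return float(text[:-1].replace(",", "")) / 100
    return float(text.replace(",", ""))


# Made-up example; for the real file use open("outs/metrics_summary.csv", newline="")
example = io.StringIO('Metric A,Metric B\n"1,234",93.5%\n')
metrics = read_metrics(example)
for name, raw in metrics.items():
    print(f"{name}: {raw} -> {to_number(raw)}")
```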
    
    
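    • View web_summary.html


      [Figure: web_summary.html]
      [Figure: web_summary.html]
    • The count pipeline also writes several CSV files with the results of its automated secondary analysis: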
    $cd analysis/
    $ls
    clustering  diffexp  pca  tsne  umap
    

    1. PCA (dimensionality reduction) results:

    $cd pca/10_components
    $ls
    components.csv  dispersion.csv  features_selected.csv  projection.csv  variance.csv
    

    Projection (coordinates of each spot on the first 10 PCs):

    $head -3 projection.csv 
    Barcode,PC-1,PC-2,PC-3,PC-4,PC-5,PC-6,PC-7,PC-8,PC-9,PC-10
    AAACAAGTATCTCCCA-1,-10.281241313083257,-24.67223115562252,-0.19850052930601336,-2.1734929997144388,6.630976878797487,-0.12128746693282366,6.040708059434257,4.657495740394594,16.344239212184327,6.523601903899456
    AAACAATCTACTAGCA-1,17.830458684877186,-27.53526668134934,15.877302377060623,9.74572143694312,-0.7208195934715782,-4.339470398396214,2.5444608437485288,-5.084679351848514,2.9247276185469495,-1.0731021612191327
    

    Components matrix (loading of each gene on each PC):

    $less -S components.csv
    PC,ENSMUSG00000051951,ENSMUSG00000089699,ENSMUSG00000025900,ENSMUSG00000025902,ENSMUSG00000033845,ENSMUSG00000025903,ENSMUSG00000104217,ENSMUSG00000033813,(truncated...)
    1,9.807402710059275e-05,-0.0007359419037463138,0.0018506647696503106,0.0019216677830155664,-0.009477278899046813,-0.005003056852125207,0.0,-0.008498306263180
    2,-0.0013017257339919546,0.0015759310908915448,0.0013809836795030965,0.0009513422156874659,0.007418499981929492,0.003222355732773671,0.0,0.00887178686827463,
    3,-0.001920230193482586,0.003378841598139873,-0.00012165106820253075,-0.00024897415838216264,-0.0031447165300072175,-0.007787586978438225,0.0,-0.003148852394
    (truncated...)
    

    Proportion of total variance explained by each PC:

    $head -3 variance.csv
    PC,Proportion.Variance.Explained
    1,0.030645967432188836
    2,0.015067575203691749
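Summing these proportions gives the cumulative variance explained: from the two rows shown, PC1 and PC2 together account for 0.0306 + 0.0151 ≈ 0.0457, i.e. about 4.6%. A minimal Python sketch that does this for the whole file (fed here with only the rows shown above):

```python
import csv
import io
from itertools import accumulate


def cumulative_variance(handle):
    """Read variance.csv and return [(PC number, cumulative proportion explained), ...]."""
    pcs, proportions = [], []
    for row in csv.DictReader(handle):
        pcs.append(int(row["PC"]))
        proportions.append(float(row["Proportion.Variance.Explained"]))
    return list(zip(pcs, accumulate(proportions)))


# The two rows shown above; for the real file use open("variance.csv", newline="")
shown = io.StringIO(
    "PC,Proportion.Variance.Explained\n"
    "1,0.030645967432188836\n"
    "2,0.015067575203691749\n"
)
for pc, cumulative in cumulative_variance(shown):
    print(f"PC1..PC{pc}: {cumulative:.4f}")
# PC1..PC1: 0.0306
# PC1..PC2: 0.0457
```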
    

    Normalized dispersion of each feature:

    $head -3 dispersion.csv
    Feature,Normalized.Dispersion
    ENSMUSG00000051951,0.261762717719762
    ENSMUSG00000089699,-1.5988672040435437
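To see which features vary most, dispersion.csv can be ranked directly. A minimal Python sketch; it is only a way to browse the file and is not necessarily the rule spaceranger uses to choose the genes in features_selected.csv.

```python
import csv
import heapq
import io
import math


def top_dispersed(handle, n=10):
    """Return the n features with the highest Normalized.Dispersion (NaN rows skipped)."""
    rows = (
        (row["Feature"], float(row["Normalized.Dispersion"]))
        for row in csv.DictReader(handle)
    )
    return heapq.nlargest(n, (r for r in rows if not math.isnan(r[1])), key=lambda r: r[1])


# Only the two rows shown above, so the ranking is trivial here
sample = io.StringIO(
    "Feature,Normalized.Dispersion\n"
    "ENSMUSG00000051951,0.261762717719762\n"
    "ENSMUSG00000089699,-1.5988672040435437\n"
)
for feature, disp in top_dispersed(sample, n=2):
    print(feature, round(disp, 3))
```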
    

    2. t-SNE results:

    $cd ../../tsne/2_components/
    $ls
    projection.csv
    
    $head -5 projection.csv 
    Barcode,TSNE-1,TSNE-2
    AAACAAGTATCTCCCA-1,-18.47081216664088,7.240054873818881
    AAACAATCTACTAGCA-1,-4.219964329936257,-9.182632464702484
    AAACACCAATAACTGC-1,14.744060324279337,13.360913482080413
    AAACAGAGCGACTCCT-1,-11.72411901642397,-7.924228663324808
    

    3. Clustering results:

    $cd ../../clustering/
    $ls
    graphclust          kmeans_2_clusters  kmeans_4_clusters  kmeans_6_clusters  kmeans_8_clusters
    kmeans_10_clusters  kmeans_3_clusters  kmeans_5_clusters  kmeans_7_clusters  kmeans_9_clusters
    

    For each clustering, spaceranger writes the cluster assignment of every spot.

    Let's open the 3-cluster k-means result:

    $cd kmeans_3_clusters
    $ls
    clusters.csv
    $head -5 clusters.csv 
    Barcode,Cluster
    AAACAAGTATCTCCCA-1,1
    AAACAATCTACTAGCA-1,3
    AAACACCAATAACTGC-1,2
    AAACAGAGCGACTCCT-1,1
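Because the t-SNE projection and the cluster assignments are both keyed by barcode, they can be joined directly, e.g. to color the t-SNE plot by cluster or to count spots per cluster. A minimal Python sketch, fed with only the first four rows shown above (in practice, open analysis/tsne/2_components/projection.csv and analysis/clustering/kmeans_3_clusters/clusters.csv instead):

```python
import csv
import io
from collections import Counter


def by_barcode(handle):
    """Index the rows of a spaceranger analysis CSV by its 'Barcode' column."""
    return {row["Barcode"]: row for row in csv.DictReader(handle)}


tsne = by_barcode(io.StringIO(
    "Barcode,TSNE-1,TSNE-2\n"
    "AAACAAGTATCTCCCA-1,-18.47081216664088,7.240054873818881\n"
    "AAACAATCTACTAGCA-1,-4.219964329936257,-9.182632464702484\n"
    "AAACACCAATAACTGC-1,14.744060324279337,13.360913482080413\n"
    "AAACAGAGCGACTCCT-1,-11.72411901642397,-7.924228663324808\n"
))
clusters = by_barcode(io.StringIO(
    "Barcode,Cluster\n"
    "AAACAAGTATCTCCCA-1,1\n"
    "AAACAATCTACTAGCA-1,3\n"
    "AAACACCAATAACTGC-1,2\n"
    "AAACAGAGCGACTCCT-1,1\n"
))

# Inner join on barcode: (TSNE-1, TSNE-2, cluster) for every spot present in both files
joined = {
    bc: (float(t["TSNE-1"]), float(t["TSNE-2"]), int(clusters[bc]["Cluster"]))
    for bc, t in tsne.items()
    if bc in clusters
}
spots_per_cluster = Counter(cluster for _, _, cluster in joined.values())
print(sorted(spots_per_cluster.items()))  # [(1, 2), (2, 1), (3, 1)]
```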
    

    4. Differential expression:

    $cd ../../diffexp/
    $ls
    graphclust          kmeans_2_clusters  kmeans_4_clusters  kmeans_6_clusters  kmeans_8_clusters
    kmeans_10_clusters  kmeans_3_clusters  kmeans_5_clusters  kmeans_7_clusters  kmeans_9_clusters
    

    This time let's look at a combined table (graph-based clustering, with one set of columns per cluster):

    $cd graphclust
    $ls
    differential_expression.csv
    $head -3 differential_expression.csv 
    Feature ID,Feature Name,Cluster 1 Mean Counts,Cluster 1 Log2 fold change,Cluster 1 Adjusted p value,Cluster 2 Mean Counts,Cluster 2 Log2 fold change,Cluster 2 Adjusted p value,Cluster 3 Mean Counts,Cluster 3 Log2 fold change,Cluster 3 Adjusted p value,Cluster 4 Mean Counts,Cluster 4 Log2 fold change,Cluster 4 Adjusted p value,Cluster 5 Mean Counts,Cluster 5 Log2 fold change,Cluster 5 Adjusted p value,Cluster 6 Mean Counts,Cluster 6 Log2 fold change,Cluster 6 Adjusted p value,Cluster 7 Mean Counts,Cluster 7 Log2 fold change,Cluster 7 Adjusted p value,Cluster 8 Mean Counts,Cluster 8 Log2 fold change,Cluster 8 Adjusted p value,Cluster 9 Mean Counts,Cluster 9 Log2 fold change,Cluster 9 Adjusted p value
    ENSMUSG00000051951,Xkr4,0.09115907843838432,0.15688013442205495,0.9130108472807676,0.08789156406190936,0.094226986457139,1.0,0.059424476860418934,-0.5579910544947899,0.4792687534164091,0.09747791035014447,0.270272692975412,0.7950049780312995,0.08717356987748102,0.14776402072440886,1.0,0.05406634025868632,-0.6310298603360582,0.7980928917515894,0.15030400022885756,0.9570457266970553,0.22931236900985477,0.0606581027791399,-0.4319057525382224,1.0,0.10761817731957228,0.4400508833584902,1.0
    ENSMUSG00000089699,Gm1992,0.0016574377897888059,1.3866145310996707,0.8220253607506287,0.0,0.423008752385563,1.0,0.0,0.22991150489664136,1.0,0.0033613072534532575,2.5793194965660433,0.5338242296758853,0.0,2.3542148981918345,1.0,0.003180372956393313,2.490599584065473,0.8676482778053517,0.0,1.5959470345290159,1.0,0.0,1.4568374963600368,1.0,0.0,2.146642828481177,1.0
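The wide layout (three columns per cluster) makes per-cluster marker lists easy to pull out. A minimal Python sketch; the cutoffs (adjusted p < 0.05, log2 fold change ≥ 1) are arbitrary choices of mine, and the table fed in is a made-up two-cluster example. Neither of the two real rows shown above would pass, since their smallest adjusted p value is about 0.23.

```python
import csv
import io


def markers(handle, cluster, alpha=0.05, min_log2fc=1.0):
    """Genes up-regulated in `cluster`: adjusted p < alpha and log2 fold change >= min_log2fc."""
    prefix = f"Cluster {cluster} "
    hits = []
    for row in csv.DictReader(handle):
        lfc = float(row[prefix + "Log2 fold change"])
        padj = float(row[prefix + "Adjusted p value"])
        if padj < alpha and lfc >= min_log2fc:
            hits.append((row["Feature Name"], lfc, padj))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical two-cluster table with the same column layout as differential_expression.csv
TOY = (
    "Feature ID,Feature Name,"
    "Cluster 1 Mean Counts,Cluster 1 Log2 fold change,Cluster 1 Adjusted p value,"
    "Cluster 2 Mean Counts,Cluster 2 Log2 fold change,Cluster 2 Adjusted p value\n"
    "ID_A,GeneA,5.0,2.5,0.001,0.5,-2.5,0.001\n"
    "ID_B,GeneB,1.0,0.2,0.8,1.1,0.3,0.7\n"
)
print(markers(io.StringIO(TOY), cluster=1))  # [('GeneA', 2.5, 0.001)]
print(markers(io.StringIO(TOY), cluster=2))  # []
```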
    

    5. Matrices: Feature-Barcode Matrices
    Each element of the matrix is the number of UMIs associated with a feature (row) and a barcode (column).
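matrix.mtx.gz uses the Matrix Market coordinate format: after the header, one line gives the number of rows, columns and non-zero entries, and each following line is a 1-based (row, column, value) triplet, so zero counts are simply not stored. The toy parser below is only a sketch with made-up numbers to make that explicit; for the real files, scipy.io.mmread (used in the Python loading code) does this for you.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Made-up 3 features x 2 barcodes example in Matrix Market coordinate format
TOY_MTX = """%%MatrixMarket matrix coordinate integer general
3 2 3
1 1 5
3 1 2
2 2 7
"""


def parse_mtx(text):
    """Parse a coordinate-format Matrix Market string into a sparse matrix."""
    lines = [line for line in text.splitlines() if line and not line.startswith("%")]
    n_rows, n_cols, _n_nonzero = map(int, lines[0].split())
    triplets = np.array([[int(x) for x in line.split()] for line in lines[1:]])
    # Convert 1-based Matrix Market indices to 0-based array indices
    rows, cols, counts = triplets[:, 0] - 1, triplets[:, 1] - 1, triplets[:, 2]
    return coo_matrix((counts, (rows, cols)), shape=(n_rows, n_cols))


umis = parse_mtx(TOY_MTX)
print(umis.toarray())
# [[5 0]
#  [0 7]
#  [2 0]]
```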

    $cd /share/nas1/Data/luohb/Visium/test2/V1_Adult_Mouse_Brain/outs
    $ls
    analysis       filtered_feature_bc_matrix     metrics_summary.csv  possorted_genome_bam.bam      raw_feature_bc_matrix     spatial
    cloupe.cloupe  filtered_feature_bc_matrix.h5  molecule_info.h5     possorted_genome_bam.bam.bai  raw_feature_bc_matrix.h5  web_summary.html
    $tree filtered_feature_bc_matrix
    filtered_feature_bc_matrix
    ├── barcodes.tsv.gz
    ├── features.tsv.gz
    └── matrix.mtx.gz
    0 directories, 3 files
    
    $tree raw_feature_bc_matrix
    raw_feature_bc_matrix
    ├── barcodes.tsv.gz
    ├── features.tsv.gz
    └── matrix.mtx.gz
    0 directories, 3 files
    
    $gzip -cd filtered_feature_bc_matrix/features.tsv.gz |head -3
    ENSMUSG00000051951  Xkr4    Gene Expression
    ENSMUSG00000089699  Gm1992  Gene Expression
    ENSMUSG00000102343  Gm37381 Gene Expression
    

    Where the columns are:

    • Column 1: feature ID (here an Ensembl gene ID)
    • Column 2: gene name
    • Column 3: feature type
    

    Loading the matrix into R

    library(Matrix)
    matrix_dir = "/share/nas1/Data/luohb/Visium/test2/V1_Adult_Mouse_Brain/outs/filtered_feature_bc_matrix/"
    barcode.path <- paste0(matrix_dir, "barcodes.tsv.gz")
    features.path <- paste0(matrix_dir, "features.tsv.gz")
    matrix.path <- paste0(matrix_dir, "matrix.mtx.gz")
    mat <- readMM(file = matrix.path)
    feature.names = read.delim(features.path, 
                               header = FALSE,
                               stringsAsFactors = FALSE)
    barcode.names = read.delim(barcode.path, 
                               header = FALSE,
                               stringsAsFactors = FALSE)
    colnames(mat) = barcode.names$V1
    rownames(mat) = feature.names$V1
    dim(mat)
    [1] 31053  2698
    

    Loading the matrix into Python

    import csv
    import gzip
    import os
    import scipy.io

    matrix_dir = "/share/nas1/Data/luohb/Visium/test2/V1_Adult_Mouse_Brain/outs/filtered_feature_bc_matrix"
    # scipy.io.mmread reads .gz files directly; rows = features, columns = barcodes
    mat = scipy.io.mmread(os.path.join(matrix_dir, "matrix.mtx.gz"))

    # Open the gzipped TSVs in text mode ("rt"): in Python 3, csv.reader needs str, not bytes
    features_path = os.path.join(matrix_dir, "features.tsv.gz")
    with gzip.open(features_path, "rt") as f:
        feature_rows = list(csv.reader(f, delimiter="\t"))  # read once, reuse for all columns
    feature_ids = [row[0] for row in feature_rows]
    gene_names = [row[1] for row in feature_rows]
    feature_types = [row[2] for row in feature_rows]

    barcodes_path = os.path.join(matrix_dir, "barcodes.tsv.gz")
    with gzip.open(barcodes_path, "rt") as f:
        barcodes = [row[0] for row in csv.reader(f, delimiter="\t")]
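With the matrix loaded as features × barcodes, simple per-spot summaries such as total UMIs and number of detected genes are one reduction away. A minimal sketch of my own (these are not spaceranger outputs); the toy matrix stands in for the real `mat` from scipy.io.mmread.

```python
import numpy as np
from scipy.sparse import coo_matrix


def per_spot_qc(mat):
    """For a features x barcodes UMI matrix, return (total UMIs, genes detected) per barcode."""
    csc = mat.tocsc()
    total_umis = np.asarray(csc.sum(axis=0)).ravel()
    genes_detected = np.asarray((csc > 0).sum(axis=0)).ravel()
    return total_umis, genes_detected


# Toy 3 features x 2 barcodes matrix; with real data pass the `mat` from scipy.io.mmread
toy = coo_matrix(np.array([[5, 0], [0, 7], [2, 0]]))
umis, genes = per_spot_qc(toy)
print(umis, genes)  # [7 7] [2 1]
```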
    

    6. Images

    $cd spatial/
    $ls
    aligned_fiducials.jpg  detected_tissue_image.jpg  scalefactors_json.json  tissue_hires_image.png  tissue_lowres_image.png  tissue_positions_list.csv
    

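    tissue_hires_image.png: the higher-resolution brightfield image


    [Figure: tissue_hires_image.png]

    tissue_lowres_image.png: the lower-resolution brightfield image


    [Figure: tissue_lowres_image.png]
    aligned_fiducials.jpg (same size as tissue_hires_image.png): used to verify that fiducial alignment succeeded
    [Figure: aligned_fiducials.jpg]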

    The corresponding pixel-coordinate scale factors are stored in scalefactors_json.json:

    $cat scalefactors_json.json
    {"spot_diameter_fullres": 89.44476048022638, "tissue_hires_scalef": 0.17011142, "fiducial_diameter_fullres": 144.48769000651953, "tissue_lowres_scalef": 0.05
    

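    PS: this step is somewhat like the ST_spot_detector step of the legacy workflow.

    Where:

    • tissue_hires_scalef: scale factor that converts pixel positions in the original full-resolution image to pixel positions in tissue_hires_image.png.
    • tissue_lowres_scalef: scale factor that converts pixel positions in the original full-resolution image to pixel positions in tissue_lowres_image.png.
    • fiducial_diameter_fullres: the number of pixels spanning the diameter of a fiducial spot in the original full-resolution image.
    • spot_diameter_fullres: the number of pixels spanning the diameter of a tissue spot in the original full-resolution image.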
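Since the scale factors are plain multipliers, converting a full-resolution pixel position to tissue_hires_image.png coordinates is a single multiplication, and the same factor applies to both axes. A worked Python sketch using the numbers shown above (tissue_lowres_scalef is cut off in that output, so it is not used) and the first spot of tissue_positions_list.csv:

```python
import json

# Values copied from scalefactors_json.json above
scalefactors = json.loads(
    '{"spot_diameter_fullres": 89.44476048022638, "tissue_hires_scalef": 0.17011142}'
)


def to_image_coords(px_a, px_b, scalef):
    """Scale a full-resolution pixel position; the same factor applies to both axes."""
    return px_a * scalef, px_b * scalef


hires = scalefactors["tissue_hires_scalef"]
# First spot in tissue_positions_list.csv: its 5th and 6th columns are 1252 and 1211
col5, col6 = to_image_coords(1252, 1211, hires)
spot_px = scalefactors["spot_diameter_fullres"] * hires
print(f"{col5:.1f}, {col6:.1f}; spot diameter ~{spot_px:.1f} px in the hires image")
# 213.0, 206.0; spot diameter ~15.2 px in the hires image
```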

    detected_tissue_image.jpg:


    [Figure: detected_tissue_image.jpg]

    tissue_positions_list.csv:

    $head -2 tissue_positions_list.csv
    ACGCCTGACACGCGCT-1,0,0,0,1252,1211
    TACCGATCCAACACTT-1,0,1,1,1372,1280
    

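    The columns are:

    • barcode: the sequence of the barcode associated with the spot.
    • in_tissue: binary, indicating whether the spot falls inside (1) or outside (0) the tissue.
    • array_row: the row coordinate of the spot in the array, from 0 to 77. The array has 78 rows.
    • array_col: the column coordinate of the spot in the array. To express the orange-crate arrangement of the spots, this index uses the even numbers 0 to 126 for even rows and the odd numbers 1 to 127 for odd rows. Note that each row (even or odd) has 64 spots.
    • pxl_col_in_fullres: the column pixel coordinate of the spot center in the full-resolution image.
    • pxl_row_in_fullres: the row pixel coordinate of the spot center in the full-resolution image.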
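As the `head -2` output shows, this file has no header row, so the column names have to be supplied. A minimal Python sketch that reads it with the names listed above and checks the layout described there (matching parity of array_row and array_col, and 78 × 64 = 4992 spots in a complete file); the input is just the two rows shown.

```python
import csv
import io

# Column order as listed above; the file itself has no header row
COLUMNS = ["barcode", "in_tissue", "array_row", "array_col",
           "pxl_col_in_fullres", "pxl_row_in_fullres"]
EXPECTED_SPOTS = 78 * 64  # 78 rows x 64 spots per row = 4992


def read_positions(handle):
    """Parse header-less tissue_positions_list.csv rows into dicts with int fields."""
    spots = []
    for fields in csv.reader(handle):
        spot = dict(zip(COLUMNS, fields))
        for key in COLUMNS[1:]:
            spot[key] = int(spot[key])
        spots.append(spot)
    return spots


spots = read_positions(io.StringIO(
    "ACGCCTGACACGCGCT-1,0,0,0,1252,1211\n"
    "TACCGATCCAACACTT-1,0,1,1,1372,1280\n"
))
# Orange-crate layout: even rows use even column indices, odd rows use odd ones
assert all(s["array_row"] % 2 == s["array_col"] % 2 for s in spots)
under_tissue = sum(s["in_tissue"] for s in spots)
print(f"{len(spots)} spots read (a full file has {EXPECTED_SPOTS}), {under_tissue} under tissue")
```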

    7. BAM: Barcoded BAM

    $cd outs/
    $samtools view possorted_genome_bam.bam |head -5
    A00984:21:HMKLFDMXX:2:2117:10357:1235   16  1   3000100 255 25M199730N72M23S    *   0   0   TTTTTTTTTTTTTTTTTTTTTTTTGCAAGAAAAAAAATCAGATAACCGAGGAAAATTATTCATTATGAAGTACTACTTTCCACTTCATTTCATCCCATGTACTCTGCGTTGATACCACTG    F:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF    NH:i:1  HI:i:1  AS:i:83 nM:i:1  RE:A:I  xf:i:0  ts:i:21 li:i:0  BC:Z:ACCAGACAAC QT:Z:FFFFFFFFFF CR:Z:GACGACGATCCGCGTT   CY:Z:FFFFFFFFFFFFFFFF   CB:Z:GACGACGATCCGCGTT-1 UR:Z:CCTGTTTGTTGT   UY:Z:FFFFFFFFFFFF   UB:Z:CCTGTTTGTTGT   RG:Z:V1_Adult_Mouse_Brain:0:1:HMKLFDMXX:2
    A00984:21:HMKLFDMXX:1:1306:5041:10034   16  1   3000100 255 25M199611N95M   *   0   0   TTTTTTTTTTTTTTTTTTTTTTTTGAAATGACCACAGTGTACTTTATTTAATGATTTTTGTACTTTGTGTTGCAATAAAATAAAAAAAAAATCTACAAAATTCAAATATATAAAATTTCA    FFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF    NH:i:1  HI:i:1  AS:i:108    nM:i:0  RE:A:I  xf:i:0  li:i:0  BC:Z:ACCAGACAAC QT:Z:FFFFFFFFFF CR:Z:TGGTCTGTTGGGCGTA   CY:Z:FFFFFFFFFFFFFFFF   CB:Z:TGGTCTGTTGGGCGTA-1 UR:Z:GTTACCCTATGT   UY:Z:FFFFFFFFFFFF   UB:Z:GTTACCCTATGT   RG:Z:V1_Adult_Mouse_Brain:0:1:HMKLFDMXX:1
    A00984:21:HMKLFDMXX:2:2345:21206:5087   16  1   3010019 255 98M22S  *   0   0   ATAGTGTCCCAGATTTCCTGGCTGTTTCTTGTTAGGATTTTTTTAGATTTAACATTTCTGTCATAGATTAATCTATTTTGCAGATGTAATCCCATGTACTCTGCGTTGATACCACTGCTT    F:FFFFFFFFFFF::FFF:FFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF:FFFFFF    NH:i:1  HI:i:1  AS:i:90 nM:i:3  RE:A:I  xf:i:0  ts:i:30 li:i:0  BC:Z:ACCAGACAAC QT:Z:FFFFFFFFFF CR:Z:ACGGTCACCGAGACCCY:Z:FFFFFFFFFFFFF,F:   CB:Z:ACGGTCACCGAGAACA-1 UR:Z:TCGATCTCGTAA   UY:Z:FFFFFFFFFFFF   UB:Z:TCGATCTCGTAA   RG:Z:V1_Adult_Mouse_Brain:0:1:HMKLFDMXX:2
    A00984:21:HMKLFDMXX:1:1164:15980:17738  16  1   3013014 255 17M186702N103M  *   0   0   TTTTTTTTTTTTTTTGTTTAAAATGACCACAGTGTACTTTATTTAATGATTTTTGTACTTTGTGTTGCAATAAAATAAAAAAAAAATCTACAAAATTCAAATATATAAAATTTCAAGTTT    FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF    NH:i:1  HI:i:1  AS:i:108    nM:i:0  RE:A:I  xf:i:0  li:i:0  BC:Z:ACCAGACAAC QT:Z:FFF,FFFFFF CR:Z:TCAAGGTTACTACACC   CY:Z:FFFFFFFFFFF:FFFF   CB:Z:TCAAGGTTACTACACC-1 UR:Z:CCGGGCAGTTAT   UY:Z:FFFFFFFFFFFF   UB:Z:CCGGGCAGTTAT   RG:Z:V1_Adult_Mouse_Brain:0:1:HMKLFDMXX:1
    A00984:21:HMKLFDMXX:1:1451:3477:33912   16  1   3013014 255 17M186702N103M  *   0   0   TTTTTTTTTTTTTTTGTTTAAAATGACCACAGTGTACTTTATTTAATGATTTTTGTACTTTGTGTTGCAATAAAATAAAAAAAAAATCTACAAAATTCAAATATATAAAATTTCAAGTTT    FFFFFFFFFFFFFFFF:FF:FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFF    NH:i:1  HI:i:1  AS:i:108    nM:i:0  RE:A:I  xf:i:0  li:i:0  BC:Z:ACCAGACAAC QT:Z:FFFFFFFFFF CR:Z:TCAAGGTTACTACACC   CY:Z:FFFFFFFFFFF:F,FF   CB:Z:TCAAGGTTACTACACC-1 UR:Z:CCGGGCAGTTAT   UY:Z:FFFFFFFFFFFF   UB:Z:CCGGGCAGTTAT   RG:Z:V1_Adult_Mouse_Brain:0:1:HMKLFDMXX:1
    

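    At first I thought the structure described in the official docs was missing: a spot barcode in the CB tag such as AGAATGGTCTGCAT-1, with a suffix made of a dash followed by a number. Looking again, it is there: every CB tag above ends in -1 (e.g. CB:Z:GACGACGATCCGCGTT-1); only the raw barcode in the CR tag has no suffix.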
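To check the CB/CR difference and the `-1` suffix programmatically, the tab-separated text printed by `samtools view` can be parsed with plain Python. A minimal sketch: it counts distinct UB UMIs per CB barcode and deliberately ignores gene assignment, so it is a simplification and not how spaceranger itself counts UMIs. The three input records are cut-down versions of records 1, 4 and 5 above (placeholder mandatory fields, only the CR/CB/UB tags kept); records 4 and 5 share CB and UB, so they count as one UMI.

```python
from collections import defaultdict


def parse_tags(sam_line):
    """Return the optional TAG:TYPE:VALUE fields of one SAM text line as {TAG: VALUE}."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # the first 11 columns are the mandatory SAM fields
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def umis_per_spot(sam_lines):
    """Count distinct UB values per CB; reads lacking either tag are skipped."""
    umis = defaultdict(set)
    for line in sam_lines:
        tags = parse_tags(line)
        if "CB" in tags and "UB" in tags:
            umis[tags["CB"]].add(tags["UB"])
    return {cb: len(ubs) for cb, ubs in umis.items()}


def toy_record(name, cr, cb, ub):
    """Build a minimal SAM line (placeholder mandatory fields) carrying CR/CB/UB tags."""
    mandatory = [name, "16", "1", "1", "255", "4M", "*", "0", "0", "ACGT", "FFFF"]
    return "\t".join(mandatory + [f"CR:Z:{cr}", f"CB:Z:{cb}", f"UB:Z:{ub}"])


records = [
    toy_record("r1", "GACGACGATCCGCGTT", "GACGACGATCCGCGTT-1", "CCTGTTTGTTGT"),
    toy_record("r4", "TCAAGGTTACTACACC", "TCAAGGTTACTACACC-1", "CCGGGCAGTTAT"),
    toy_record("r5", "TCAAGGTTACTACACC", "TCAAGGTTACTACACC-1", "CCGGGCAGTTAT"),
]
# With real data, pipe `samtools view possorted_genome_bam.bam` into a script passing sys.stdin here
for cb, n in sorted(umis_per_spot(records).items()):
    barcode, suffix = cb.rsplit("-", 1)  # CR lacks the suffix; CB ends in "-1"
    print(barcode, suffix, n)
```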

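    Downstream analysis in R

    At the time of writing there is no ready-made R package for 10X Visium spatial transcriptomics, so I had to rely on the R code from the official website.

    Official page: https://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/rkit

    Downstream analysis with Loupe Browser 4.0.0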

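    • Open cloupe.cloupe through Xftp
      [Figure: cloupe.cloupe opened in Loupe Browser]
    • View the t-SNE plot


      [Figure: t-SNE]
    • UMAP


      [Figure: UMAP]
    • Feature Plot


      [Figure: Feature Plot]

      The Feature Plot view lets you visualize the expression level of one or two genes at each spot. This view makes it easy to threshold groups of spots by the expression of one or two genes. Features (genes, in this case) can be entered in the text boxes at the top of the Y axis or at the right end of the X axis. These selectors also include a control for switching the axis scale between linear and logarithmic.


      [Figure: Feature Plot]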
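Outside Loupe, the same idea of thresholding spots on one or two genes can be sketched in a few lines. This illustrates the concept only and is not Loupe Browser's implementation; the spot names, counts and thresholds below are made up, and with log_scale=True the thresholds apply to log1p-transformed values.

```python
import math
from collections import Counter


def threshold_two_genes(expr_a, expr_b, thr_a, thr_b, log_scale=False):
    """Label each spot A+/A- and B+/B- by comparing two genes' expression to thresholds."""
    labels = {}
    for barcode, a in expr_a.items():
        b = expr_b.get(barcode, 0.0)
        if log_scale:
            a, b = math.log1p(a), math.log1p(b)
        labels[barcode] = ("A+" if a > thr_a else "A-") + ("B+" if b > thr_b else "B-")
    return labels


# Made-up UMI counts for two hypothetical genes in three spots
gene_a = {"spot1": 5, "spot2": 0, "spot3": 3}
gene_b = {"spot1": 0, "spot2": 4, "spot3": 2}
labels = threshold_two_genes(gene_a, gene_b, thr_a=1, thr_b=1)
print(labels)  # {'spot1': 'A+B-', 'spot2': 'A-B+', 'spot3': 'A+B+'}
print(Counter(labels.values()))
```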


    Article link: https://www.haomeiwen.com/subject/oshtuhtx.html