[Spatial Transcriptomics] Sepal: Identifying Genes with Spatial Patterns

Author: zyp1997 | Published 2022-03-16 21:45


I. What does Sepal do?

Original paper: sepal: identifying transcript profiles with spatial patterns by diffusion-based modeling

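Identify genes with spatial patterns in spatial transcriptomics data and rank them by the strength of the pattern.
Cluster the genes with spatial patterns (the top n genes of the ranking) into pattern families, so that genes in the same family share a similar spatial pattern and each family can then be interpreted biologically (biological processes).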

II. How Sepal works

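Other methods typically assume that the data follow a particular distribution and rely on hypothesis testing (e.g. Trendsceek, SpatialDE, SPARK). When the data do not match the assumed distribution, the results suffer.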

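Sepal takes a different approach. The paper treats the distribution of a gene's transcripts across the tissue as analogous to the diffusion of a substance: from Fick's second law and the gene's spatial expression data, a diffusion time can be computed for each gene. A longer diffusion time indicates a stronger spatial pattern, and a shorter one a more random distribution. Ranking genes by diffusion time therefore orders them from the strongest to the weakest spatial pattern.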

For the exact formulas and their explanation, see the original paper.
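
To make the intuition concrete, here is a toy sketch of the idea, not sepal's actual implementation. It steps Fick's second law, du/dt = D * laplacian(u), forward on a small square grid and reports how long the simulation takes to stop changing. The function name `diffusion_time`, the square grid, the zero-flux borders, and the stopping rule are all assumptions made for illustration; only the time step and diffusion rate mirror the idea behind the -dt and -dr options.

```python
import random

def diffusion_time(grid, dt=0.1, eps=1e-5, max_steps=100_000):
    """Simulated time until `grid` stops changing under diffusion.

    Explicit finite-difference scheme for du/dt = D * laplacian(u) with D = 1
    on a unit-spaced square grid with zero-flux borders. The stopping rule
    (largest per-step change below eps) is an assumption of this sketch.
    """
    rows, cols = len(grid), len(grid[0])
    u = [list(row) for row in grid]
    for step in range(1, max_steps + 1):
        new = [[0.0] * cols for _ in range(rows)]
        max_change = 0.0
        for i in range(rows):
            for j in range(cols):
                here = u[i][j]
                # A neighbour outside the grid mirrors the cell itself (zero flux).
                up = u[i - 1][j] if i > 0 else here
                down = u[i + 1][j] if i < rows - 1 else here
                left = u[i][j - 1] if j > 0 else here
                right = u[i][j + 1] if j < cols - 1 else here
                change = dt * (up + down + left + right - 4.0 * here)
                new[i][j] = here + change
                max_change = max(max_change, abs(change))
        u = new
        if max_change < eps:
            return step * dt
    return max_steps * dt

# Toy "gene" with a clear pattern: expressed in the left half of the grid only.
structured = [[1.0] * 5 + [0.0] * 5 for _ in range(10)]

# The same values shuffled across spots: no spatial pattern.
values = [v for row in structured for v in row]
random.Random(0).shuffle(values)
shuffled = [values[r * 10:(r + 1) * 10] for r in range(10)]

t_structured = diffusion_time(structured)
t_random = diffusion_time(shuffled)
print(t_structured, t_random)
```

The structured profile has to move material across half the grid before it flattens out, so it takes longer to converge than the shuffled one, where neighbouring spots cancel each other out quickly. That difference is what the ranking relies on.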

III. Running Sepal

Code from the original paper on GitHub (Python): https://github.com/almaan/sepal

1. Compute the diffusion-time table

sepal run -c counts.csv  -mo 10 -mc 10 -o . -ar 1k

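-c Input file in .csv, .tsv, or .h5ad (from scanpy) format. The matrix must be arranged as n_locations x n_genes; otherwise transpose it with -t (or --transpose).
-ar Array type of the spatial transcriptomics data: visium, 2k, or 1k (the help output also lists unstructured). visium is 10x Genomics data and 1k is ST data; I am not sure what 2k refers to.
-mo, -mc, -ks, etc. Gene filters: -mo is the minimum number of spots a gene must occur in, -mc is the minimum total count for a gene, and -ks keeps RP and MT profiles, which are otherwise excluded.
-o Output directory.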
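
As a rough illustration of what the gene filters do, the sketch below applies the same three criteria to a toy count table. It is not sepal's code: the helper name `keep_gene`, the inclusive thresholds, and the regular expression used to recognise RP and MT genes are assumptions, and the counts are made up.

```python
import re

# Assumed naming convention for ribosomal-protein and mitochondrial genes.
SPURIOUS = re.compile(r"^(RP[LS]|MT-)", re.IGNORECASE)

def keep_gene(name, counts, min_occurance=5, min_counts=20, keep_spurious=False):
    """Return True if a gene passes the -mo / -mc / -ks style filters."""
    n_spots = sum(1 for c in counts if c > 0)  # spots where the gene occurs
    if n_spots < min_occurance:
        return False
    if sum(counts) < min_counts:
        return False
    if not keep_spurious and SPURIOUS.match(name):
        return False
    return True

# Made-up counts over 10 spots, one list per gene.
toy_counts = {
    "Gene1": [3] * 10,          # broad expression, passes
    "Gene2": [0] * 9 + [50],    # present in a single spot only
    "mt-Co1": [5] * 10,         # mitochondrial, dropped unless keep_spurious
}

# Same thresholds as `-mo 10 -mc 10` in the command above.
kept = [g for g, c in toy_counts.items()
        if keep_gene(g, c, min_occurance=10, min_counts=10)]
print(kept)
```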

[Figure: 1.PNG]

The average column is the diffusion time, scaled to the interval [0, 1].
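
If you want to look at the ranking yourself rather than through sepal analyze, a few lines of Python are enough. The sketch below assumes the results file is tab-separated, has gene names in the first column, and contains a column called average; the gene names and values in the example are invented, and `top_genes` is a hypothetical helper, not part of sepal.

```python
import csv
import io

def top_genes(tsv_handle, n=20, column="average"):
    """Return the n genes with the largest value in `column`, highest first."""
    reader = csv.reader(tsv_handle, delimiter="\t")
    header = next(reader)
    col = header.index(column)
    rows = [(r[0], float(r[col])) for r in reader if r]
    rows.sort(key=lambda pair: pair[1], reverse=True)
    return rows[:n]

# Invented stand-in for a *-top-diffusion-times.tsv file.
example = "\taverage\nGeneA\t0.12\nGeneB\t1.0\nGeneC\t0.57\n"
top = top_genes(io.StringIO(example), n=2)
print(top)
```

For a real run, pass an open file handle instead of the `io.StringIO` object.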

2. Plot the top-ranked genes

sepal analyze -c counts.csv -r *-top-diffusion-times.tsv -ar 1k -o . inspect -ng 20 -nc 5

-r The .tsv file produced by sepal run
-ng Number of genes to plot
-nc Number of genes per row (columns in the plot)

3. Cluster the top-ranked genes into pattern families

sepal analyze -c ./counts.csv -r *-top-diffusion-times.tsv -ar 1k -o . family -ng 100 -nbg 100 -eps 0.85 --plot -nc 10

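-nbg Number of top-ranked genes used for the PCA (the basis genes)
-ng Number of top-ranked genes to cluster
-eps Threshold on the fraction of variance explained by the PCA. The number of families equals the number of PCs retained, so a larger -eps gives more families.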
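
The relationship between -eps and the number of families can be sketched as follows. This is only my reading of the option as described above, not sepal's code: it keeps the smallest number of principal components whose cumulative share of the variance reaches the threshold. The function name `n_components_for` and the variances in the example are made up.

```python
from itertools import accumulate

def n_components_for(explained_variance, eps):
    """Smallest number of leading PCs whose cumulative variance share >= eps."""
    total = sum(explained_variance)
    for k, cumulative in enumerate(accumulate(explained_variance), start=1):
        if cumulative / total >= eps:
            return k
    return len(explained_variance)

# Made-up variances of five PCs, largest first (shares 50%, 30%, 10%, 6%, 4%).
variances = [50, 30, 10, 6, 4]

print(n_components_for(variances, 0.85))   # threshold used in the command above
print(n_components_for(variances, 0.995))  # default threshold of `family`
```

With these numbers, the lower threshold keeps three components and the default keeps all five, which is why raising -eps tends to produce more families.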

4. Run enrichment analysis on each family

sepal analyze  -c counts.csv  -r *-top-diffusion-times.tsv  -ar 1k -o . fea -fl *-family-index.tsv

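-fl The file written by sepal analyze family, which records the family each gene belongs to.
-dbs Database(s) to query; the default is GO:BP. Per the help output, the organism is set with -or (default: hsapiens).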

IV. Full parameter reference

sepal run -h

                  .\ /.
                 < ~O~ >
┌─┐┌─┐┌─┐┌─┐┬     '/_\'
└─┐├┤ ├─┘├─┤│     \ | /
└─┘└─┘┴  ┴ ┴┴─┘    \|/
Version 1.0.0 |  see https://github.com/almaan/sepal
usage: sepal run [-h] -c COUNT_FILES [COUNT_FILES ...] -o OUT_DIR [-t]
                 [-mo MIN_OCCURANCE] [-mc MIN_COUNTS] [-mzp MAX_ZERO_FRACTION]
                 [-ks] [-dt TIME_STEP] [-eps THRESHOLD] [-dr DIFFUSION_RATE]
                 [-nw NUM_WORKERS] -ar {visium,2k,1k,unstructured} [-z]
                 [-ps PSEUDOCOUNT]

optional arguments:
  -h, --help            show this help message and exit
  -c COUNT_FILES [COUNT_FILES ...], --count_files COUNT_FILES [COUNT_FILES ...]
                        count files (default: None)
  -o OUT_DIR, --out_dir OUT_DIR
                        output directory (default: None)
  -t, --transpose       transpose count matrix (default: False)
  -mo MIN_OCCURANCE, --min_occurance MIN_OCCURANCE
                        minimum number of spot that gene has to occur within
                        (default: 5)
  -mc MIN_COUNTS, --min_counts MIN_COUNTS
                        minimum number of total counts for a gene (default:
                        20)
  -mzp MAX_ZERO_FRACTION, --max_zero_fraction MAX_ZERO_FRACTION
                        max fraction of spots with zero counts allowed for
                        gene (default: 1.0)
  -ks, --keep_spurious  include RP and MT profiles (default: False)
  -dt TIME_STEP, --time_step TIME_STEP
                        minimum number of total counts for a gene (default:
                        0.001)
  -eps THRESHOLD, --threshold THRESHOLD
                        threshold (eps) to use when assessing convergence
                        (default: 1e-08)
  -dr DIFFUSION_RATE, --diffusion_rate DIFFUSION_RATE
                        Diffusion rate (D) to use in simulations (default: 1)
  -nw NUM_WORKERS, --num_workers NUM_WORKERS
                        number of workers to use. If no number is provided,
                        the maximum number of available workers will be used.
                        (default: None)
  -ar {visium,2k,1k,unstructured}, --array {visium,2k,1k,unstructured}
                        array type (default: None)
  -z, --timeit          time analysis (default: False)
  -ps PSEUDOCOUNT, --pseudocount PSEUDOCOUNT
                        pseudocount in normalization (default: 2.0)

sepal analyze -h

usage: sepal analyze [-h] [-c COUNT_DATA] [-r RESULTS] -o OUT_DIR
                     [-ar {visium,2k,1k,unstructured}] [-tr] [-rt]
                     [-ss SIDE_SIZE] [-nc N_COLS] [-qs QUANTILE_SCALING]
                     [-st SPLIT_TITLE SPLIT_TITLE] [-ps PSEUDOCOUNT]
                     [-sig SIGMA]
                     {inspect,family,fea} ...

positional arguments:
  {inspect,family,fea}

optional arguments:
  -h, --help            show this help message and exit
  -c COUNT_DATA, --count_data COUNT_DATA
                        count files (default: None)
  -r RESULTS, --results RESULTS
                        output directory (default: None)
  -o OUT_DIR, --out_dir OUT_DIR
                        output directory (default: None)
  -ar {visium,2k,1k,unstructured}, --array {visium,2k,1k,unstructured}
                        array type (default: None)
  -tr, --transpose      transpose count matrix (default: False)
  -rt, --rotate
  -ss SIDE_SIZE, --side_size SIDE_SIZE
                        side length in plot (default: 350)
  -nc N_COLS, --n_cols N_COLS
                        number f columns in plot (default: 5)
  -qs QUANTILE_SCALING, --quantile_scaling QUANTILE_SCALING
                        quantile to use for quantile scaling (default: None)
  -st SPLIT_TITLE SPLIT_TITLE, --split_title SPLIT_TITLE SPLIT_TITLE
                        split title (default: None)
  -ps PSEUDOCOUNT, --pseudocount PSEUDOCOUNT
                        pseudocount in normalization (default: 2.0)
  -sig SIGMA, --sigma SIGMA
                        sensitivity for selection of top genes (default: 1.5)

sepal analyze inspect -h

usage: sepal analyze inspect [-h] [-sd STYLE_DICT] [-nc N_COLS] [-pv]
                             [-ng N_GENES]

optional arguments:
  -h, --help            show this help message and exit
  -sd STYLE_DICT, --style_dict STYLE_DICT
                        plot style as dict (default: None)
  -nc N_COLS, --n_cols N_COLS
                        number f columns in plot (default: 5)
  -pv, --pval           values are pvals (default: False)
  -ng N_GENES, --n_genes N_GENES
                        number of genes to visualize (default: None)

sepal analyze family -h

usage: sepal analyze family [-h] [-ng N_GENES] [-nbg N_BASE_GENES]
                            [-eps THRESHOLD] [-p] [-sd STYLE_DICT]
                            [-nc N_COLS]

optional arguments:
  -h, --help            show this help message and exit
  -ng N_GENES, --n_genes N_GENES
                        included genes (default: 100)
  -nbg N_BASE_GENES, --n_base_genes N_BASE_GENES
                        basis genes (default: None)
  -eps THRESHOLD, --threshold THRESHOLD
                        threshold in clustering (default: 0.995)
  -p, --plot            threshold in clustering (default: False)
  -sd STYLE_DICT, --style_dict STYLE_DICT
                        plot style as dict (default: None)
  -nc N_COLS, --n_cols N_COLS
                        number f columns in plot (default: 5)

sepal analyze fea -h

usage: sepal analyze fea [-h] -fl FAMILY_INDEX [-or ORGANISM]
                         [-dbs DATABASES [DATABASES ...]] [-ltx] [-md]
                         [-sa START_AT]

optional arguments:
  -h, --help            show this help message and exit
  -fl FAMILY_INDEX, --family_index FAMILY_INDEX
                        path to family indices (default: None)
  -or ORGANISM, --organism ORGANISM
                        organism to query against. See g:Profiler
                        documentation for supported organisms (default:
                        hsapiens)
  -dbs DATABASES [DATABASES ...], --databases DATABASES [DATABASES ...]
                        database to use in enrichment analysis (default:
                        ['GO:BP'])
  -ltx, --latex         save latex formatted table (default: False)
  -md, --markdown       save markdown formatted table (default: False)
  -sa START_AT, --start_at START_AT
                        start family enumeration at (default: 0)
