[Spatial Transcriptomics] Sepal: Identifying Genes with Spatial Patterns


Author: zyp1997 | Published 2022-03-16 21:45


Original paper: sepal: identifying transcript profiles with spatial patterns by diffusion-based modeling

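Sepal is used for two tasks:

  1. Identify genes with spatial patterns in spatial transcriptomics data and rank them by the strength of that pattern.

  2. Cluster the genes with spatial patterns (the top n genes of the ranking) into pattern families, so that genes in the same family share a common spatial pattern; each family can then be interpreted biologically, for example in terms of the biological processes it is enriched for.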







1. Compute the diffusion-time table

sepal run -c counts.csv  -mo 10 -mc 10 -o . -ar 1k
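  • -c: the input file can be .csv, .tsv or .h5ad (from scanpy), laid out as n_locations x n_genes; otherwise transpose it with -t (or --transpose).
  • -ar: the array type, one of visium, 2k or 1k (the help output below also lists unstructured). visium is 10x Genomics Visium data and 1k is ST (Spatial Transcriptomics) data; it is not clear what 2k refers to.
  • -mo, -mc, -ks and related options filter genes (minimum number of spots a gene must occur in, minimum total counts, whether to keep RP and MT genes).
  • -o: output directory.
  • In the output table, the average column is the diffusion time, scaled to the interval [0, 1].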
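The filtering options and the [0, 1] scaling can be illustrated with a minimal pure-Python sketch. It assumes the counts are a dict mapping each gene to its per-spot counts, that "RP and MT profiles" means genes whose names start with RPL, RPS or MT-, and that the scaling is plain min-max scaling; filter_genes and minmax_scale are hypothetical names, and sepal's exact rules may differ.

```python
"""Hypothetical sketch of sepal-style gene filtering and [0, 1] scaling.

Not sepal's implementation: the parameters mirror -mo, -mc, -mzp and -ks
from `sepal run -h`, but the exact rules here are assumptions.
"""
from typing import Dict, List

# Assumed name prefixes for ribosomal (RP) and mitochondrial (MT) genes.
SPURIOUS_PREFIXES = ("RPL", "RPS", "MT-")


def filter_genes(
    counts: Dict[str, List[int]],
    min_occurance: int = 5,          # -mo: detected (count > 0) in >= this many spots
    min_counts: int = 10,            # -mc: total counts over all spots
    max_zero_fraction: float = 1.0,  # -mzp: allowed fraction of zero-count spots
    keep_spurious: bool = False,     # -ks: keep RP/MT genes if True
) -> Dict[str, List[int]]:
    kept = {}
    for gene, values in counts.items():
        if not keep_spurious and gene.upper().startswith(SPURIOUS_PREFIXES):
            continue
        n_spots = len(values)
        n_nonzero = sum(1 for v in values if v > 0)
        if n_nonzero < min_occurance:
            continue
        if sum(values) < min_counts:
            continue
        if (n_spots - n_nonzero) / n_spots > max_zero_fraction:
            continue
        kept[gene] = values
    return kept


def minmax_scale(times: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw diffusion times to [0, 1]; the largest time maps to 1.0."""
    lo, hi = min(times.values()), max(times.values())
    span = hi - lo
    if span == 0:
        return {g: 0.0 for g in times}
    return {g: (t - lo) / span for g, t in times.items()}


if __name__ == "__main__":
    toy = {
        "GeneA": [0, 3, 5, 7, 2, 0],
        "GeneB": [0, 0, 1, 0, 0, 0],   # detected in one spot only
        "MT-CO1": [9, 9, 9, 9, 9, 9],  # dropped as an MT profile
    }
    print(sorted(filter_genes(toy, min_occurance=3, min_counts=5)))  # ['GeneA']
    print(minmax_scale({"GeneA": 40.0, "GeneC": 10.0, "GeneD": 25.0}))
    # {'GeneA': 1.0, 'GeneC': 0.0, 'GeneD': 0.5}
```

The defaults for min_occurance and max_zero_fraction follow `sepal run -h`; min_counts uses the value 10 from the command above because the help text cuts off its real default.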

2. Plot the top-ranked genes

sepal analyze -c counts.csv -r *-top-diffusion-times.tsv -ar 1k -o . inspect -ng 20 -nc 5
  • -r: the .tsv file produced by sepal run
  • -ng: number of genes to plot
  • -nc: number of genes per row (columns in the figure)
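What -ng and -nc control can be sketched as follows: pick the -ng genes with the largest scaled diffusion time, then lay them out with -nc panels per row. The sketch assumes the results TSV has gene names in its first column and a column named average; top_genes and panel_grid are hypothetical helpers, not part of sepal.

```python
"""Hypothetical helpers mirroring what `inspect -ng/-nc` controls.

Assumes the results TSV has gene names in its first column and a column
named `average` holding the scaled diffusion time.
"""
import csv
import io
import math
from typing import List, TextIO, Tuple


def top_genes(handle: TextIO, n_genes: int) -> List[Tuple[str, float]]:
    """Return the n_genes rows with the largest `average` value."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    avg_col = header.index("average")
    rows = [(row[0], float(row[avg_col])) for row in reader if row]
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:n_genes]


def panel_grid(n_genes: int, n_cols: int) -> Tuple[int, int]:
    """(rows, columns) of a figure that draws n_cols genes per row."""
    return math.ceil(n_genes / n_cols), min(n_cols, n_genes)


if __name__ == "__main__":
    toy = "\taverage\nGeneA\t0.42\nGeneB\t1.0\nGeneC\t0.0\n"
    print(top_genes(io.StringIO(toy), 2))  # [('GeneB', 1.0), ('GeneA', 0.42)]
    print(panel_grid(20, 5))               # (4, 5) for -ng 20 -nc 5
```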

3. Cluster the top-ranked genes into pattern families

sepal analyze -c ./counts.csv -r *-top-diffusion-times.tsv -ar 1k -o . family -ng 100 -nbg 100 -eps 0.85 --plot -nc 10
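  • -nbg: number of top genes used for the PCA
  • -ng: number of top genes to cluster
  • -eps: threshold on the cumulative explained-variance ratio of the PCA; the number of clusters equals the number of PCs retained, so a larger -eps gives more families.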
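The description of -eps above amounts to a simple rule: keep the smallest number of PCs whose cumulative share of the variance reaches eps, and use that count as the number of families. n_families is a hypothetical helper that takes the per-PC variances as given; it does not reproduce sepal's PCA or the clustering that assigns genes to families.

```python
"""Hypothetical sketch of the -eps rule for `sepal analyze family`.

Given the variance explained by each principal component, keep the
smallest number of PCs whose cumulative share reaches eps; that count is
read as the number of pattern families.
"""
from itertools import accumulate
from typing import Sequence


def n_families(explained_variance: Sequence[float], eps: float) -> int:
    ordered = sorted(explained_variance, reverse=True)
    total = sum(ordered)
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total >= eps:
            return k
    return len(ordered)


if __name__ == "__main__":
    variances = [5.0, 2.0, 1.5, 1.0, 0.5]  # made-up per-PC variances
    for eps in (0.6, 0.8, 0.99):
        print(eps, n_families(variances, eps))  # prints 0.6 2, 0.8 3, 0.99 5
```

Raising eps from 0.6 to 0.99 raises the count from 2 to 5, consistent with a larger -eps yielding more families.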

4. Run enrichment analysis on each family

sepal analyze  -c counts.csv  -r *-top-diffusion-times.tsv  -ar 1k -o . fea -fl *-family-index.tsv
  • -fl: the file written by sepal analyze family, which records the family each gene belongs to.
  • -dbs: the reference database(s); GO:BP is used by default.
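fea queries g:Profiler (see the -or help text below). Over-representation analysis of this kind rests on a one-sided hypergeometric test; the sketch computes that p-value for a single family and a single GO term with made-up numbers. enrichment_pvalue is a hypothetical helper for illustration only: it is not sepal's or g:Profiler's code, and it applies no multiple-testing correction, which real enrichment tools do.

```python
"""Illustrative over-representation test for one family and one GO term.

A one-sided hypergeometric test; not sepal's or g:Profiler's code, and
the numbers in the example are made up.
"""
from math import comb


def enrichment_pvalue(overlap: int, n_background: int,
                      n_term: int, n_family: int) -> float:
    """P(X >= overlap) with X ~ Hypergeometric(n_background, n_term, n_family).

    n_background: genes in the background set
    n_term:       background genes annotated with the term (e.g. a GO:BP term)
    n_family:     genes in the pattern family
    overlap:      family genes annotated with the term
    """
    upper = min(n_term, n_family)
    hits = sum(comb(n_term, i) * comb(n_background - n_term, n_family - i)
               for i in range(overlap, upper + 1))
    return hits / comb(n_background, n_family)


if __name__ == "__main__":
    # 5 of 50 family genes carry a term that covers 200 of 20,000 genes;
    # about 0.5 overlapping genes would be expected by chance.
    print(enrichment_pvalue(5, 20_000, 200, 50))
```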


  • sepal run -h
                  .\ /.
                 < ~O~ >
┌─┐┌─┐┌─┐┌─┐┬     '/_\'
└─┐├┤ ├─┘├─┤│     \ | /
└─┘└─┘┴  ┴ ┴┴─┘    \|/
Version 1.0.0 |  see https://github.com/almaan/sepal
usage: sepal run [-h] -c COUNT_FILES [COUNT_FILES ...] -o OUT_DIR [-t]
                 [-mo MIN_OCCURANCE] [-mc MIN_COUNTS] [-mzp MAX_ZERO_FRACTION]
                 [-ks] [-dt TIME_STEP] [-eps THRESHOLD] [-dr DIFFUSION_RATE]
                 [-nw NUM_WORKERS] -ar {visium,2k,1k,unstructured} [-z]
                 [-ps PSEUDOCOUNT]

optional arguments:
  -h, --help            show this help message and exit
  -c COUNT_FILES [COUNT_FILES ...], --count_files COUNT_FILES [COUNT_FILES ...]
                        count files (default: None)
  -o OUT_DIR, --out_dir OUT_DIR
                        output directory (default: None)
  -t, --transpose       transpose count matrix (default: False)
  -mo MIN_OCCURANCE, --min_occurance MIN_OCCURANCE
                        minimum number of spot that gene has to occur within
                        (default: 5)
  -mc MIN_COUNTS, --min_counts MIN_COUNTS
                        minimum number of total counts for a gene (default:
  -mzp MAX_ZERO_FRACTION, --max_zero_fraction MAX_ZERO_FRACTION
                        max fraction of spots with zero counts allowed for
                        gene (default: 1.0)
  -ks, --keep_spurious  include RP and MT profiles (default: False)
  -dt TIME_STEP, --time_step TIME_STEP
                        minimum number of total counts for a gene (default:
  -eps THRESHOLD, --threshold THRESHOLD
                        threshold (eps) to use when assessing convergence
                        (default: 1e-08)
  -dr DIFFUSION_RATE, --diffusion_rate DIFFUSION_RATE
                        Diffusion rate (D) to use in simulations (default: 1)
  -nw NUM_WORKERS, --num_workers NUM_WORKERS
                        number of workers to use. If no number is provided,
                        the maximum number of available workers will be used.
                        (default: None)
  -ar {visium,2k,1k,unstructured}, --array {visium,2k,1k,unstructured}
                        array type (default: None)
  -z, --timeit          time analysis (default: False)
  -ps PSEUDOCOUNT, --pseudocount PSEUDOCOUNT
                        pseudocount in normalization (default: 2.0)
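The -dt, -dr and -eps options reflect how sepal scores a gene: it simulates diffusion of the gene's normalized expression across the array and records how long the profile takes to converge, and genes with stronger spatial structure take longer. The pure-Python sketch below illustrates that idea under assumptions that are not taken from sepal's source: a 4-neighbour square grid (a Visium array would need hexagonal, six-neighbour adjacency), zero-flux boundaries, explicit Euler steps, and convergence declared when the Shannon entropy of the profile changes by less than eps between steps. The function names are hypothetical, and dt = 0.1 is an arbitrary choice because the help text truncates the real default.

```python
"""Hypothetical sketch of a diffusion-time score on a square lattice.

Assumptions (not taken from sepal's source): 4-neighbour grid, zero-flux
boundaries, explicit Euler steps, and convergence when the Shannon entropy
of the profile changes by less than eps per step.
"""
import math
from typing import List

Grid = List[List[float]]
OFFSETS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-neighbour adjacency


def entropy(grid: Grid) -> float:
    """Shannon entropy of the profile treated as a probability distribution."""
    total = sum(sum(row) for row in grid)
    h = 0.0
    for row in grid:
        for v in row:
            if v > 0:
                p = v / total
                h -= p * math.log(p)
    return h


def diffusion_step(grid: Grid, rate: float, dt: float) -> Grid:
    """One explicit Euler step: each spot moves toward its neighbours' values."""
    n_rows, n_cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(n_rows):
        for j in range(n_cols):
            flux = 0.0
            for di, dj in OFFSETS:
                ni, nj = i + di, j + dj
                # Positions outside the array are skipped: a zero-flux boundary.
                if 0 <= ni < n_rows and 0 <= nj < n_cols:
                    flux += grid[ni][nj] - grid[i][j]
            new[i][j] = grid[i][j] + rate * dt * flux
    return new


def diffusion_time(grid: Grid, rate: float = 1.0, dt: float = 0.1,
                   eps: float = 1e-8, max_steps: int = 100_000) -> float:
    """Simulated time until the entropy change per step falls below eps."""
    h_old = entropy(grid)
    for step in range(1, max_steps + 1):
        grid = diffusion_step(grid, rate, dt)
        h_new = entropy(grid)
        if abs(h_new - h_old) < eps:
            return step * dt
        h_old = h_new
    return max_steps * dt


if __name__ == "__main__":
    n = 6
    # Left half high: a smooth, large-scale spatial pattern.
    half = [[2.0 if j < n // 2 else 1.0 for j in range(n)] for _ in range(n)]
    # Checkerboard: high-frequency variation with no large-scale structure.
    checker = [[2.0 if (i + j) % 2 else 1.0 for j in range(n)] for i in range(n)]
    print(diffusion_time(half), diffusion_time(checker))
```

The half-and-half pattern should receive the longer diffusion time, since the checkerboard's alternating values even out within a few steps. Keeping rate * dt at or below 0.25 makes each update a non-negative weighted average of a spot and its neighbours on a 4-neighbour grid, so the explicit scheme stays stable.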
  • sepal analyze -h
usage: sepal analyze [-h] [-c COUNT_DATA] [-r RESULTS] -o OUT_DIR
                     [-ar {visium,2k,1k,unstructured}] [-tr] [-rt]
                     [-ss SIDE_SIZE] [-nc N_COLS] [-qs QUANTILE_SCALING]
                     [-st SPLIT_TITLE SPLIT_TITLE] [-ps PSEUDOCOUNT]
                     [-sig SIGMA]
                     {inspect,family,fea} ...

positional arguments:
  {inspect,family,fea}
optional arguments:
  -h, --help            show this help message and exit
  -c COUNT_DATA, --count_data COUNT_DATA
                        count files (default: None)
  -r RESULTS, --results RESULTS
                        output directory (default: None)
  -o OUT_DIR, --out_dir OUT_DIR
                        output directory (default: None)
  -ar {visium,2k,1k,unstructured}, --array {visium,2k,1k,unstructured}
                        array type (default: None)
  -tr, --transpose      transpose count matrix (default: False)
  -rt, --rotate
  -ss SIDE_SIZE, --side_size SIDE_SIZE
                        side length in plot (default: 350)
  -nc N_COLS, --n_cols N_COLS
                        number f columns in plot (default: 5)
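  -qs QUANTILE_SCALING, --quantile_scaling QUANTILE_SCALING
                        quantile to use for quantile scaling (default: None)
  -st SPLIT_TITLE SPLIT_TITLE, --split_title SPLIT_TITLE SPLIT_TITLE
                        split title (default: None)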
  -ps PSEUDOCOUNT, --pseudocount PSEUDOCOUNT
                        pseudocount in normalization (default: 2.0)
  -sig SIGMA, --sigma SIGMA
                        sensitivity for selection of top genes (default: 1.5)
  • sepal analyze inspect -h
usage: sepal analyze inspect [-h] [-sd STYLE_DICT] [-nc N_COLS] [-pv]
                             [-ng N_GENES]

optional arguments:
  -h, --help            show this help message and exit
  -sd STYLE_DICT, --style_dict STYLE_DICT
                        plot style as dict (default: None)
  -nc N_COLS, --n_cols N_COLS
                        number f columns in plot (default: 5)
  -pv, --pval           values are pvals (default: False)
  -ng N_GENES, --n_genes N_GENES
                        number of genes to visualize (default: None)
  • sepal analyze family -h
usage: sepal analyze family [-h] [-ng N_GENES] [-nbg N_BASE_GENES]
                            [-eps THRESHOLD] [-p] [-sd STYLE_DICT]
                            [-nc N_COLS]

optional arguments:
  -h, --help            show this help message and exit
  -ng N_GENES, --n_genes N_GENES
                        included genes (default: 100)
  -nbg N_BASE_GENES, --n_base_genes N_BASE_GENES
                        basis genes (default: None)
  -eps THRESHOLD, --threshold THRESHOLD
                        threshold in clustering (default: 0.995)
  -p, --plot            threshold in clustering (default: False)
  -sd STYLE_DICT, --style_dict STYLE_DICT
                        plot style as dict (default: None)
  -nc N_COLS, --n_cols N_COLS
                        number f columns in plot (default: 5)
  • sepal analyze fea -h
usage: sepal analyze fea [-h] -fl FAMILY_INDEX [-or ORGANISM]
                         [-dbs DATABASES [DATABASES ...]] [-ltx] [-md]
                         [-sa START_AT]

optional arguments:
  -h, --help            show this help message and exit
  -fl FAMILY_INDEX, --family_index FAMILY_INDEX
                        path to family indices (default: None)
  -or ORGANISM, --organism ORGANISM
                        organism to query against. See g:Profiler
                        documentation for supported organisms (default:
  -dbs DATABASES [DATABASES ...], --databases DATABASES [DATABASES ...]
                        database to use in enrichment analysis (default:
  -ltx, --latex         save latex formatted table (default: False)
  -md, --markdown       save markdown formatted table (default: False)
  -sa START_AT, --start_at START_AT
                        start family enumeration at (default: 0)



