美文网首页
单细胞转录组代谢分析-Compass2

单细胞转录组代谢分析-Compass2

作者: Yayamia | 来源:发表于2023-05-31 12:42 被阅读0次

接1
不知道简书出了什么bug,没法更新上一个文章

继续Debug

  • 又出错了
Traceback (most recent call last):
  File "/home/user/test/miniconda3/envs/compass/bin/compass", line 8, in <module>
    sys.exit(entry())
  File "/home/user/test/miniconda3/envs/compass/lib/python3.8/site-packages/compass/main.py", line 588, in entry
    penalties = eval_reaction_penalties(args['data'], args['model'],
  File "/home/user/test/miniconda3/envs/compass/lib/python3.8/site-packages/compass/compass/penalties.py", line 96, in eval_reaction_penalties
    reaction_penalties = eval_reaction_penalties_shared(
  File "/home/user/test/miniconda3/envs/compass/lib/python3.8/site-packages/compass/compass/penalties.py", line 158, in eval_reaction_penalties_shared
    for name, expression_data in expression.iteritems():
  File "/home/user/test/miniconda3/envs/compass/lib/python3.8/site-packages/pandas/core/generic.py", line 5989, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'iteritems'

感觉还是python版本的问题,把iteritems改成items

还有很多诸如此类的问题,得一点点改……服了

改完之后:

Processing 30 samples using 8 processes

我先弄了300个细胞,然后micropooling选的是10,num_process=8,如果有一个样本30分钟,大概需要的时间就可以估算一下了

成功后如图

其中:

  • micropooled告诉我们哪些microcluster对应哪些barcode
  • micropooled_data是基于microcluster的基因表达矩阵
  • reactions.tsv:最重要的文件,为microcluster和reaction的矩阵,BUT!!它reaction为代号,需要比对出是什么
    To get more context on what the reaction identifiers are, you can visit virtual metabolic human or the resources directory of Compass where there are several csv’s which include information on the reactions in Recon2.

在github中\compass\Resources\Recon2_export里面有一个rxn_md.csv的文件,里面注释了各个反应

The "_neg" or "_pos" suffixes indicate one direction of a bidirectional reaction. The "_pos" direction indicates positive flux through the reaction. For this example reaction it indicates water, oxygen, and putrescine are being consumed by this reaction to produce ammonium, hyrodgen peroxide, and 4-Aminobutanal.

在后续处理中,对于Micropool的数据,第一步是区分哪些Cluster是哪些亚群,以相应细胞占据的比例计算
https://yoseflab.github.io/Compass/notebooks/Demo-micropools.html

因为我直接分开亚群micropool的,所以就不用再区分了

在Jupyter notebook激活Compass虚拟环境

问题:
from compass_analysis import cohens_d, wilcoxon_test, get_reaction_consistencies, get_metareactions, labeled_reactions, amino_acid_metab
这一步报错ModuleNotFoundError: No module named 'compass_analysis'

我看Issue上有人提出这个问题,回复人指路的compass_analysis.py在branch里面
终于找到这个节点了
我试了一下,安装在Lib路径还是不行,不过那个python文件也不是很长,所以我直接复制源代码到了jupyter里面

compass_analysis.py源代码见下:

import pandas as pd
import numpy as np
from scipy.stats import wilcoxon, mannwhitneyu, ranksums
from statsmodels.stats.multitest import multipletests
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as hcluster
from scipy.spatial.distance import squareform

def cohens_d(x, y):
    pooled_std = np.sqrt(((len(x)-1) * np.var(x, ddof=1) 
                          + (len(y)-1) * np.var(y, ddof=1)) / 
                             (len(x) + len(y) - 2))
    return (np.mean(x) - np.mean(y)) / pooled_std
    

def wilcoxon_test(consistencies_matrix, group_A_cells, group_B_cells):
    """
        Performs an unpaired wilcoxon test (or mann-whitney U test) for each reaction between group_A and group_B
    """
    #per reaction/meta-reaction, conduct wilcoxon test between group_A and group_B
    group_A = consistencies_matrix.loc[:,group_A_cells]
    group_B = consistencies_matrix.loc[:,group_B_cells]
    results = pd.DataFrame(index = consistencies_matrix.index, columns = ['wilcox_stat', 'wilcox_pval', 'cohens_d'], dtype='float64')
    for rxn in consistencies_matrix.index:
        A, B = group_A.loc[rxn].to_numpy().ravel(), group_B.loc[rxn].to_numpy().ravel()
        stat, pval = mannwhitneyu(A, B, alternative='two-sided')
        c_d = cohens_d(A, B)
        results.loc[rxn, ['wilcox_stat', 'wilcox_pval', 'cohens_d']] = stat, pval, c_d
    results['adjusted_pval'] = np.array(multipletests(results['wilcox_pval'], method='fdr_bh')[1], dtype='float64')
    return results

def get_reaction_consistencies(compass_reaction_scores, min_range=1e-3):
    df = -np.log(compass_reaction_scores + 1)
    df = df[df.max(axis=1) - df.min(axis=1) >= min_range]
    df = df - df.min().min()
    return df

def get_metareactions(reactions, height=0.02):
    """
        Returns an array of metareaction labels for each reaction
        Index k in the returned array has the metareaction label for reaction k.
    """
    #pairwise_reaction_correlations = reactions.T.corr(method='spearman') #Pandas method here is orders of magnitude slower
    pairwise_reaction_correlations = np.corrcoef(reactions.rank(axis=1))
    #Unfortunately due to floating point issues, these matrices are not always perfectly symmetric and the diagonal may be slightly off from 1
    pairwise_reaction_correlations[np.arange(reactions.shape[0]), np.arange(reactions.shape[0])] = 1.0
    pairwise_reaction_correlations = (pairwise_reaction_correlations + pairwise_reaction_correlations.T)/2
    assert(np.all(pairwise_reaction_correlations == pairwise_reaction_correlations.T))

    Z = hcluster.complete(squareform(1 - pairwise_reaction_correlations))
    return hcluster.fcluster(Z, height, criterion='distance')

这个Branch里面也有reaction_metadata.csv文件

读入reaction文件和reaction_metadata.csv

The numbers in the reactions tsv correspond to penalties for each reaction per cell, so we take the negative log to get scores that are higher the more active the reaction is predicted to be. This also drops reactions that are close to constant and therefore not informative for the comparison.

get_reaction_consistencies之前 get_reaction_consistencies之后

之后

We use the unpaired Wilcoxon rank-sum test (equivlanet to the Mann–Whitney U test) to analyze which reactions are predicted to be more active in pathogenic Th17p cells compared to the non-pathogenic Th17n cells. Positive values indicate higher potential activity in Th17p.

就跟着往下做就好了

注意annotation是这种格式

有一个问题,其他都算出了了,但adjusted p是NaN
理解了一下P值校正

这里,笔者主要对FDR校正方法的原理进行论述。FDR校正方法是Benjamini和Hochberg于1995年提出了一种多重比较校正的方法。其实,FDR具体的算法也有多种,如Storey法(由Storey等人提出)、Benjamini-Hochberg法(简称BH法)等。其中BH法目前应用最广,这里主要介绍这种方法的基本原理。
基于BH法的FDR校正过程:
第一步:将我们单独统计得到的一系列的p=[p1,p2,…,pn]从大到小进行重新排序,计为P=[P1,P2,…,Pn];
第二步:按照以下公式计算每个P值所对应的校正前的FDR值,这里称之为Q值:Q = Pi* (n/r),Pi表示P中元素值,n是P值个数,r依次为n,n-1,…,1。
第三步:对Q进行校正,得到FDR值。对于计算出来的Q=[Q1,Q2,…,Qn],若某一个Qi值大于前一位Qi-1值,则把Qi的值赋值为Qi-1;反之则保留相应的Q值。最终得到Q值称之为校正后的FDR值。
第四步:按照重排序之前的顺序返回各个p值对应的校正后的FDR值。
例子:假设p=[0.01, 0.005, 0.03, 0.03, 0.02, 0.04, 0.05],计算相应的校正后的FDR值。
笔者按照上述步骤,自行编制相应的Matlab程序,计算过程和结果如下:
按照上述第一步步骤,计算得到P=[0.0500, 0.0400, 0.0300, 0.0300, 0.0200, 0.0100, 0.0050];
按照第二步中的方法,计算得到Q=[0.0500, 0.0467, 0.0420, 0.0525, 0.0467, 0.0350, 0.0350];
按照第三步:得到校正后的FDR值为:FDR=[ 0.0500, 0.0467, 0.0420, 0.0420, 0.0420, 0.0350, 0.0350];
最后,转换成原来的顺序:FDR=[0.0350, 0.0350, 0.0420, 0.0420, 0.0420, 0.0467, 0.0500].
对于本例来说,如果总体的显著性水平设置为0.05,那么从得到的最后的FDR值来说,这几个p值都具有显著性差异。

最后直接把results表格输出,根据这里直接手动计算了Q值

最后作图的时候发现labeled_reactions, amino_acid_metab这两个没有,但其实无所谓,可以自定义

相关文章

网友评论

      本文标题:单细胞转录组代谢分析-Compass2

      本文链接:https://www.haomeiwen.com/subject/xvqyedtx.html