Paired-sample difference analysis with Python

Author: needle_princess | Published 2018-08-09 14:06

    The use case is simple: given paired data, test whether the two groups differ.
    There are two steps:
    1. Test for normality

    from scipy import stats

    # Check normality with the Shapiro-Wilk test
    def norm_test(data, alpha=0.05):
        w, p = stats.shapiro(data)  # W statistic and p value
        # p >= alpha: normality is not rejected
        return bool(p >= alpha)
    
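A small variant of `norm_test` (the name `normality_check` is hypothetical) that also keeps the W statistic and the p value so they can be reported, demonstrated on a deliberately skewed sample. The significance level `alpha` is a parameter here and defaults to the 0.05 used above.

```python
import numpy as np
from scipy import stats


def normality_check(data, alpha=0.05):
    """Shapiro-Wilk test: return (is_normal, W, p).

    Same decision rule as norm_test: p >= alpha means normality is not
    rejected. W and p are returned as well so they can be reported.
    """
    w, p = stats.shapiro(data)
    return bool(p >= alpha), w, p


rng = np.random.default_rng(0)
skewed = rng.exponential(scale=1.0, size=500)  # strongly right-skewed sample
is_normal, w, p = normality_check(skewed)
print(is_normal, round(w, 3), p)
```

With 500 exponential draws the test rejects normality decisively. In that case the workflow below falls back to the Wilcoxon test.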

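    2. Based on the result of the normality test, choose either the paired-sample t-test or the Wilcoxon signed-rank test. The goal is to obtain the test statistic and the P value. For guidance on choosing a method, see https://segmentfault.com/a/1190000007626742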

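    from scipy.stats import ttest_rel, wilcoxon

    # data_b and data_p are the two paired samples (same length, same order)
    if norm_test(data_b) and norm_test(data_p):
        print('normal: paired t-test')
        t, p = ttest_rel(list(data_b), list(data_p))
    else:
        print('not normal: Wilcoxon signed-rank test')
        t, p = wilcoxon(list(data_b), list(data_p), zero_method='wilcox', correction=False)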
    
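The selection step can also be wrapped in one function. This is a sketch with one deliberate change from the code above. Strictly, the paired t-test assumes that the differences between the pairs are normally distributed, so this version (the name `paired_test` is hypothetical) runs Shapiro-Wilk on `before - after` instead of on each group. It returns the test name, the statistic and the P value. The Wilcoxon statistic here is scipy's unmodified return value.

```python
import numpy as np
from scipy import stats


def paired_test(before, after, alpha=0.05):
    """Choose a paired t-test or a Wilcoxon signed-rank test.

    Normality is checked on the paired differences, which is what the
    paired t-test assumes. Returns (test_name, statistic, p_value).
    """
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diff = before - after
    _, p_norm = stats.shapiro(diff)          # Shapiro-Wilk on the differences
    if p_norm >= alpha:
        stat, p = stats.ttest_rel(before, after)
        return "paired t-test", stat, p
    stat, p = stats.wilcoxon(before, after, zero_method="wilcox")
    return "wilcoxon", stat, p


rng = np.random.default_rng(1)
before = rng.normal(loc=10.0, scale=1.0, size=30)
after = before + rng.normal(loc=0.5, scale=1.0, size=30)
name, stat, p = paired_test(before, after)
print(name, stat, p)
```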

    There is one pitfall to watch out for here.

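    The wilcoxon function in scipy does not return the z statistic and the P value. It returns T, the smaller of the positive-rank and negative-rank sums, together with the P value. To get z, locate the source of wilcoxon at Lib\site-packages\scipy\stats\morestats.py, open the morestats file, and change the function's return value to z and the p value, as follows: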

    def wilcoxon(x, y=None, zero_method="wilcox", correction=False):
        """
        Calculate the Wilcoxon signed-rank test.
    
        The Wilcoxon signed-rank test tests the null hypothesis that two
        related paired samples come from the same distribution. In particular,
        it tests whether the distribution of the differences x - y is symmetric
        about zero. It is a non-parametric version of the paired T-test.
    
        Parameters
        ----------
        x : array_like
            The first set of measurements.
        y : array_like, optional
            The second set of measurements.  If `y` is not given, then the `x`
            array is considered to be the differences between the two sets of
            measurements.
        zero_method : string, {"pratt", "wilcox", "zsplit"}, optional
            "pratt":
                Pratt treatment: includes zero-differences in the ranking process
                (more conservative)
            "wilcox":
                Wilcox treatment: discards all zero-differences
            "zsplit":
                Zero rank split: just like Pratt, but splitting the zero rank
                between positive and negative ones
        correction : bool, optional
            If True, apply continuity correction by adjusting the Wilcoxon rank
            statistic by 0.5 towards the mean value when computing the
            z-statistic.  Default is False.
    
        Returns
        -------
        statistic : float
            The z-statistic from the normal approximation (modified: the
            original returns T, the smaller of the positive and negative
            rank sums).
        pvalue : float
            The two-sided p-value for the test.
    
        Notes
        -----
        Because the normal approximation is used for the calculations, the
        samples used should be large.  A typical rule is to require that
        n > 20.
    
        References
        ----------
        .. [1] http://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test
    
        """
    
        if zero_method not in ["wilcox", "pratt", "zsplit"]:
            raise ValueError("Zero method should be either 'wilcox' "
                             "or 'pratt' or 'zsplit'")
    
        if y is None:
            d = asarray(x)
        else:
            x, y = map(asarray, (x, y))
            if len(x) != len(y):
                raise ValueError('Unequal N in wilcoxon.  Aborting.')
            d = x - y
    
        if zero_method == "wilcox":
            # Keep all non-zero differences
            d = compress(np.not_equal(d, 0), d, axis=-1)
    
        count = len(d)
        if count < 10:
            warnings.warn("Warning: sample size too small for normal approximation.")
    
        r = stats.rankdata(abs(d))
        r_plus = np.sum((d > 0) * r, axis=0)
        r_minus = np.sum((d < 0) * r, axis=0)
    
        if zero_method == "zsplit":
            r_zero = np.sum((d == 0) * r, axis=0)
            r_plus += r_zero / 2.
            r_minus += r_zero / 2.
    
        T = min(r_plus, r_minus)
        mn = count * (count + 1.) * 0.25
        se = count * (count + 1.) * (2. * count + 1.)
    
        if zero_method == "pratt":
            r = r[d != 0]
    
        replist, repnum = find_repeats(r)
        if repnum.size != 0:
            # Correction for repeated elements.
            se -= 0.5 * (repnum * (repnum * repnum - 1)).sum()
    
        se = sqrt(se / 24)
        correction = 0.5 * int(bool(correction)) * np.sign(T - mn)
        z = (T - mn - correction) / se
        prob = 2. * distributions.norm.sf(abs(z))
        # Modified: return the z statistic instead of the original T
        return WilcoxonResult(z, prob)
    
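For reference, the patched function computes z with the normal approximation below. Here $R^+$ and $R^-$ are `r_plus` and `r_minus`, $n$ is `count`, and $t_j$ are the tie-group sizes in `repnum`:

```latex
\begin{aligned}
T &= \min(R^{+}, R^{-}), \qquad \mu_T = \frac{n(n+1)}{4},\\
\sigma_T &= \sqrt{\frac{n(n+1)(2n+1) - \tfrac{1}{2}\sum_j t_j\,(t_j^{2}-1)}{24}},\\
z &= \frac{T - \mu_T - c}{\sigma_T}, \qquad p = 2\bigl(1 - \Phi(|z|)\bigr).
\end{aligned}
```

The continuity term $c$ is $0.5\,\operatorname{sign}(T-\mu_T)$ when `correction=True` and $0$ otherwise. $\Phi$ is the standard normal CDF, so `distributions.norm.sf(abs(z))` equals $1-\Phi(|z|)$.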

    After this change, wilcoxon returns the z statistic and the P value, and the tool is ready to use.

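Editing files under site-packages works, but the change has drawbacks:

- It is lost whenever scipy is upgraded or reinstalled.
- It affects every program that uses that environment.
- The source of wilcoxon differs across SciPy releases, so the listing above may not match an installed version.

A sketch of an alternative, assuming only numpy and scipy.stats: a standalone function (the name `wilcoxon_z` is hypothetical) that repeats the same calculation in user code and returns z and p without touching the library.

```python
import numpy as np
from scipy import stats


def wilcoxon_z(x, y, zero_method="wilcox", correction=False):
    """Wilcoxon signed-rank test (normal approximation) returning (z, p).

    Follows the steps of the patched morestats.wilcoxon listing: rank |d|,
    take T = min(R+, R-), apply the tie correction to the variance, then
    convert T to a z statistic.
    """
    if zero_method not in ("wilcox", "pratt", "zsplit"):
        raise ValueError("zero_method must be 'wilcox', 'pratt' or 'zsplit'")
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same length")
    d = x - y
    if zero_method == "wilcox":
        d = d[d != 0]                      # discard zero differences

    n = len(d)
    r = stats.rankdata(np.abs(d))          # average ranks for ties
    r_plus = np.sum(r[d > 0])
    r_minus = np.sum(r[d < 0])
    if zero_method == "zsplit":
        r_zero = np.sum(r[d == 0])         # split zero ranks evenly
        r_plus += r_zero / 2.0
        r_minus += r_zero / 2.0

    T = min(r_plus, r_minus)
    mean_T = n * (n + 1.0) / 4.0
    var_num = n * (n + 1.0) * (2.0 * n + 1.0)

    if zero_method == "pratt":
        r = r[d != 0]                      # tie correction on non-zero ranks
    _, counts = np.unique(r, return_counts=True)
    t = counts[counts > 1]                 # sizes of tied groups
    var_num -= 0.5 * np.sum(t * (t * t - 1.0))
    se = np.sqrt(var_num / 24.0)

    c = 0.5 * np.sign(T - mean_T) if correction else 0.0
    z = (T - mean_T - c) / se
    p = 2.0 * stats.norm.sf(abs(z))
    return z, p


z, p = wilcoxon_z([12, 15, 11, 19, 14, 16, 13, 18, 17, 20],
                  [10, 14, 12, 15, 11, 13, 12, 14, 15, 16])
print(z, p)
```

Because this uses the normal approximation throughout, the note in the docstring above still applies: it is meant for reasonably large samples.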