Chi-Square & Cramér’s V


Author: 7f0a92cda77c | Published: 2021-07-05 15:00

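    Chi-square independence test: a hypothesis test for deciding whether two categorical factors are related or independent of each other.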

    Step1: Chi-Square Independence Test - What Is It?

    The chi-square independence test is a procedure for testing if two categorical variables are related in some population.

    Example: a scientist wants to know if education level and marital status are related for all people in some country. The first few records of the data look like this:

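    | Name     | Marital Status | Education              |
    |----------|----------------|------------------------|
    | Cameron  | Never Married  | PhD or higher          |
    | Benjamin | Married        | Middle school or lower |
    | Camden   | Divorced       | Bachelors              |
    | Brody    | Widowed        | Masters                |
    | Connor   | Married        | PhD or higher          |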

    Step2: Chi-Square Test - Observed Frequencies

    A good first step for these data is inspecting the contingency table of marital status by education. Such a table (shown below) displays the frequency distribution of marital status for each education category separately. So let's take a look at it.

    The table crosses 4 marital statuses with 5 education levels:
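    | Marital Status | Middle School or Lower | High School | Bachelor's | Master's | PhD or Higher | Total |
    |----------------|------------------------|-------------|------------|----------|---------------|-------|
    | Never Married  | 18                     | 36          | 21         | 9        | 6             | 90    |
    | Married        | 12                     | 36          | 45         | 36       | 21            | 150   |
    | Divorced       | 6                      | 9           | 9          | 3        | 3             | 30    |
    | Widowed        | 3                      | 9           | 9          | 6        | 3             | 30    |
    | Total          | 39                     | 90          | 84         | 54       | 33            | 300   |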
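
    The same table as column percentages (marital status within each education level):

    | Marital Status | Middle School or Lower | High School | Bachelor's | Master's | PhD or Higher | Total |
    |----------------|------------------------|-------------|------------|----------|---------------|-------|
    | Never Married  | 46%                    | 40%         | 25%        | 17%      | 18%           | 30%   |
    | Married        | 31%                    | 40%         | 54%        | 67%      | 64%           | 50%   |
    | Divorced       | 15%                    | 10%         | 11%        | 6%       | 9%            | 10%   |
    | Widowed        | 8%                     | 10%         | 11%        | 11%      | 9%            | 10%   |
    | Total (N)      | 39                     | 90          | 84         | 54       | 33            | 300   |
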
    In this sample, more highly educated respondents are married more often than less educated respondents.

    [Figure: Chi-Square Test - Stacked Bar Chart]
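
    The column percentages can be reproduced from the observed counts. The following is a minimal sketch in plain Python; the names `observed`, `col_totals`, and `col_pct` are illustrative choices, not part of the original tutorial.

```python
# Observed counts from the contingency table above.
# Keys are marital statuses; list positions follow the education levels:
# middle school or lower, high school, bachelor's, master's, PhD or higher.
observed = {
    "Never Married": [18, 36, 21, 9, 6],
    "Married": [12, 36, 45, 36, 21],
    "Divorced": [6, 9, 9, 3, 3],
    "Widowed": [3, 9, 9, 6, 3],
}

# Column totals: number of respondents per education level.
col_totals = [sum(col) for col in zip(*observed.values())]

# Share of each marital status within each education level, in whole percents.
col_pct = {
    status: [round(100 * n / total) for n, total in zip(counts, col_totals)]
    for status, counts in observed.items()
}

for status, pcts in col_pct.items():
    print(f"{status:<14}", " ".join(f"{p:>3}%" for p in pcts))
```

    Comparing the rows of `col_pct` across education levels is the numeric counterpart of reading the stacked bar chart.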

    Step3: Chi-Square Test - Null Hypothesis

    The null hypothesis for a chi-square independence test is that

    two categorical variables are independent in some population.

    Independence means that the relative frequencies of one variable are identical over all levels of some other variable.

    Step4: Expected Frequencies

    Expected frequencies are the frequencies we expect in our sample if the null hypothesis holds.

    These expected frequencies are calculated as
    e_{ij} = \frac{o_i \cdot o_j}{N}

    where

    e_{ij} is an expected frequency;

    o_i is a marginal column frequency;

    o_j is a marginal row frequency;

    N is the total sample size.

    So we now have two tables:

    1. a contingency table with the observed frequencies we found in our sample;
    2. a contingency table with the expected frequencies we should have found in our sample if the variables are really independent.

    The computed expected frequencies:

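    | Marital Status | Middle School or Lower | High School | Bachelor's | Master's | PhD or Higher | Total |
    |----------------|------------------------|-------------|------------|----------|---------------|-------|
    | Never Married  | 11.7                   | 27          | 25.2       | 16.2     | 9.9           | 90    |
    | Married        | 19.5                   | 45          | 42         | 27       | 16.5          | 150   |
    | Divorced       | 3.9                    | 9           | 8.4        | 5.4      | 3.3           | 30    |
    | Widowed        | 3.9                    | 9           | 8.4        | 5.4      | 3.3           | 30    |
    | Total          | 39                     | 90          | 84         | 54       | 33            | 300   |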
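
    The expected frequencies can be derived from the marginal totals alone. Here is a short sketch in plain Python, assuming the observed counts are stored as a list of rows; the variable names are illustrative.

```python
# Observed counts: one row per marital status, one column per education level.
observed = [
    [18, 36, 21, 9, 6],    # Never Married
    [12, 36, 45, 36, 21],  # Married
    [6, 9, 9, 3, 3],       # Divorced
    [3, 9, 9, 6, 3],       # Widowed
]

row_totals = [sum(row) for row in observed]        # marginal row frequencies
col_totals = [sum(col) for col in zip(*observed)]  # marginal column frequencies
n = sum(row_totals)                                # total sample size

# Expected frequency of a cell = (row total * column total) / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 1) for e in row])
```

    Because each expected value depends only on the margins, the expected table always has the same row and column totals as the observed table.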

    Step5: Residuals

    r_{ij} = o_{ij} - e_{ij}

    For our example, this results in (5 * 4 =) 20 residuals. Larger (absolute) residuals indicate a larger difference between our data and the null hypothesis. We basically combine all residuals into a single number by squaring each one, dividing it by its expected frequency, and adding up the results: the χ2 (pronounced “chi-square”) test statistic.
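
    A brief sketch of the residuals for the example data, in plain Python with illustrative variable names. It also shows why the raw residuals cannot simply be summed: they cancel out to zero, so each one has to be squared before it contributes to the test statistic.

```python
observed = [
    [18, 36, 21, 9, 6],    # Never Married
    [12, 36, 45, 36, 21],  # Married
    [6, 9, 9, 3, 3],       # Divorced
    [3, 9, 9, 6, 3],       # Widowed
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Residual of a cell = observed frequency - expected frequency.
residuals = [
    [o - e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

n_residuals = sum(len(row) for row in residuals)
residual_sum = sum(sum(row) for row in residuals)
largest = max(abs(r) for row in residuals for r in row)

print(n_residuals)             # number of cells in the table
print(round(residual_sum, 9))  # raw residuals cancel out
print(round(largest, 1))       # largest absolute residual
```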

    Step6: Test Statistic

    The chi-square test statistic is calculated as:
    \chi^2 = \sum \frac{(o_{ij} - e_{ij})^2}{e_{ij}}

    So for our data:
    \chi^2 = \frac{(18-11.7)^2}{11.7} + \frac{(36-27)^2}{27} + ... + \frac{(3-3.3)^2}{3.3} = 23.57

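    The contribution of each cell to the test statistic:

    | Marital Status | Middle School or Lower | High School | Bachelor's | Master's   | PhD or Higher | Total       |
    |----------------|------------------------|-------------|------------|------------|---------------|-------------|
    | Never Married  | 3.392307692            | 3           | 0.7        | 3.2        | 1.53636364    |             |
    | Married        | 2.884615385            | 1.8         | 0.21428571 | 3          | 1.22727273    |             |
    | Divorced       | 1.130769231            | 0           | 0.04285714 | 1.06666667 | 0.02727273    |             |
    | Widowed        | 0.207692308            | 0           | 0.04285714 | 0.06666667 | 0.02727273    |             |
    | Total          |                        |             |            |            |               | 23.56689977 |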
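
    The per-cell contributions and their total can be checked with a few lines of plain Python. This is a sketch with illustrative variable names, not the tutorial's own code.

```python
observed = [
    [18, 36, 21, 9, 6],    # Never Married
    [12, 36, 45, 36, 21],  # Married
    [6, 9, 9, 3, 3],       # Divorced
    [3, 9, 9, 6, 3],       # Widowed
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Contribution of each cell: squared residual divided by the expected frequency.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

# The test statistic is the sum of all cell contributions.
chi_square = sum(sum(row) for row in contributions)
print(round(chi_square, 2))
```

    If SciPy is installed, `scipy.stats.chi2_contingency(observed)` returns the same statistic together with the p-value, the degrees of freedom, and the expected frequencies.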

    Step7: Chi-Square Test Assumptions

    The assumptions for a chi-square independence test are:

    1. Independent observations.
    2. For a 2 by 2 table, all expected frequencies > 5. For a larger table, all expected frequencies > 1 and no more than 20% of all cells may have expected frequencies < 5.

    If these assumptions hold, our χ2 test statistic approximately follows a χ2 distribution. It's this distribution that tells us the probability of finding χ2 ≥ 23.57.
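
    The expected-frequency rule for larger tables can be checked mechanically. The sketch below, in plain Python with illustrative names, covers only that rule; independence of observations is a matter of study design and cannot be verified from the table itself.

```python
observed = [
    [18, 36, 21, 9, 6],    # Never Married
    [12, 36, 45, 36, 21],  # Married
    [6, 9, 9, 3, 3],       # Divorced
    [3, 9, 9, 6, 3],       # Widowed
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Flatten the expected frequencies into a single list of cells.
expected_cells = [r * c / n for r in row_totals for c in col_totals]

all_above_one = all(e > 1 for e in expected_cells)
small_cells = sum(1 for e in expected_cells if e < 5)
share_small = small_cells / len(expected_cells)

# Rule for tables larger than 2 by 2: every expected frequency > 1
# and at most 20% of the cells with an expected frequency < 5.
assumption_ok = all_above_one and share_small <= 0.20

print(small_cells, share_small, assumption_ok)
```

    For the example data, 4 of the 20 cells (the Divorced and Widowed rows in the lowest and highest education columns) have expected frequencies below 5, so the table sits exactly at the 20% limit.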

    Step8: Chi-Square Test - Degrees of Freedom

    We'll get the p-value we're after from the chi-square distribution if we give it 2 numbers:

    1. the \chi^2 value (23.57);
    2. the degrees of freedom (df).

    df = (i-1) \cdot (j-1) = (4-1) \cdot (5-1) = 12

    where i is the number of rows in our contingency table and
    j is the number of columns.

    The degrees of freedom is basically a number that determines the exact shape of our distribution. The figure below illustrates this point.

    [Figure: Chi-Square Distributions with Different DF]

    And with df = 12, the probability of finding χ2 ≥ 23.57 is approximately 0.023. This is our 1-tailed significance. It basically means there's a 0.023 (or 2.3%) chance of finding an association at least this strong in our sample if it is zero in our population.

    [Figure: Chi-Square Distribution with 1-Tailed P-Value]
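
    The degrees of freedom and the p-value can be sketched without a statistics library. The helper `chi2_sf_even_df` below is a hypothetical function written for this illustration: it uses a closed form for the upper tail of the chi-square distribution that holds only when df is even, which happens to be the case here (df = 12). For arbitrary df, a library routine such as SciPy's `scipy.stats.chi2.sf` would be the usual choice.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with even df.

    Uses exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!, a closed form that is
    valid only for even, positive degrees of freedom.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires an even, positive df")
    half = x / 2.0
    terms = (half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * sum(terms)


rows, cols = 4, 5                # marital statuses by education levels
df = (rows - 1) * (cols - 1)     # degrees of freedom
chi_square = 23.57               # rounded test statistic from the text

p_value = chi2_sf_even_df(chi_square, df)
print(df, round(p_value, 3))
```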

    **Since this is a small chance, we no longer believe our null hypothesis that our variables are independent in our population.**

    Now, keep in mind that our p-value of 0.023 only tells us that the association between our variables is probably not zero. It doesn't say anything about the strength of this association: the effect size.

    Cramér’s V - Formula

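    Cramér’s V is a number between 0 and 1 that indicates how strongly two categorical variables are associated.

    If we want to know whether 2 categorical variables are related, our first option is the chi-square independence test. A p-value close to zero means that our variables are very unlikely to be completely unassociated in some population. However, this does not mean that the variables are strongly associated. A weak association in a large sample may also result in p = 0.000.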

    A measure that does indicate the strength of the association is Cramér’s V, defined as:

    \phi_c = \sqrt{\frac{\chi^2}{N(\kappa-1)}}

    where

    \phi_c denotes Cramér’s V (\phi refers to the "phi coefficient", a special case of Cramér’s V for 2 by 2 tables);

    \chi^2 is the Pearson chi-square statistic from the test;

    N is the sample size involved in the test;

    \kappa is the smaller of the numbers of categories of the two variables.

    For the example above:

    \phi_c = \sqrt{\frac{\chi^2}{N(\kappa-1)}} = \sqrt{\frac{23.57}{300 \cdot (4-1)}} = 0.162

    In summary: the two variables are associated, but the association is fairly weak.
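
    Cramér’s V for the example can be computed directly from the observed table. The following is a minimal sketch in plain Python; the variable names are illustrative, and `k` plays the role of \kappa in the formula.

```python
import math

observed = [
    [18, 36, 21, 9, 6],    # Never Married
    [12, 36, 45, 36, 21],  # Married
    [6, 9, 9, 3, 3],       # Divorced
    [3, 9, 9, 6, 3],       # Widowed
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum(
    (o - r * c / n) ** 2 / (r * c / n)
    for row, r in zip(observed, row_totals)
    for o, c in zip(row, col_totals)
)

# k is the smaller of the number of rows and the number of columns.
k = min(len(observed), len(observed[0]))

cramers_v = math.sqrt(chi_square / (n * (k - 1)))
print(round(cramers_v, 3))
```

    Because the statistic is divided by N, a larger sample with the same percentages yields a smaller p-value but the same Cramér’s V, which is why V is reported alongside the test.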

    https://zhuanlan.zhihu.com/p/158156773

    Inserting formulas

    https://www.spss-tutorials.com/chi-square-independence-test/
