ANOVA in python with 2018 FIFA d

Author: 不连续小姐 | Published: 2018-12-21 04:37

    Data Science Day 21:

    Last time we used an independent t-test to compare the mean age of players at Real Madrid and Barcelona. Which statistical method should we use if we want to compare the mean age of players across three clubs: Barcelona, Real Madrid, and Juventus?

    [Image: kappilrinesh / Pixabay]

    [Image: RonnyK / Pixabay]

    [Image: RonnyK / Pixabay]

    Answer:

    We will use the ANOVA (analysis of variance) test, a special case of the general linear model (GLM), to compare the means of more than two groups.

    Null Hypothesis: Mean(A) = Mean(B) = Mean(C)

    ANOVA Assumptions:

    • Normality of the dependent variable within each group
    • Homogeneity of variance across groups
    • Independence of observations
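
The first two assumptions can be checked numerically before running ANOVA. The sketch below is one possible approach, assuming SciPy is available; it uses the Shapiro-Wilk test for normality and Levene's test for equal variances. The three samples are randomly generated stand-ins, not the FIFA data, and the names `group_a`, `group_b`, and `group_c` are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical ages for three groups (randomly generated, NOT the FIFA data)
rng = np.random.default_rng(0)
group_a = rng.normal(loc=25, scale=4, size=30)
group_b = rng.normal(loc=26, scale=4, size=30)
group_c = rng.normal(loc=27, scale=4, size=30)

# Normality: Shapiro-Wilk test per group
# (H0: the sample is drawn from a normal distribution)
for name, ages in [("A", group_a), ("B", group_b), ("C", group_c)]:
    w_stat, p_norm = stats.shapiro(ages)
    print(f"group {name}: Shapiro-Wilk W={w_stat:.3f}, p={p_norm:.3f}")

# Homogeneity of variance: Levene's test
# (H0: all groups have the same variance)
lev_stat, p_var = stats.levene(group_a, group_b, group_c)
print(f"Levene W={lev_stat:.3f}, p={p_var:.3f}")
```

A small p-value in either test suggests the corresponding assumption may not hold. Independence of observations cannot be tested this way; it depends on how the data were collected.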

    Example: Kaggle FIFA 2018 dataset

    Null Hypothesis: There is no difference in players' mean age among Real Madrid, Barcelona, and Juventus.

    H0: Age.mean(Real Madrid) = Age.mean(Barcelona) = Age.mean(Juventus)
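
To make the test of this hypothesis concrete, here is a minimal sketch of how the F-statistic is built: the variance between group means divided by the variance within groups. The ages are made-up numbers for illustration only, and the variable names are hypothetical. The result is compared with `scipy.stats.f_oneway` as a sanity check.

```python
import numpy as np
from scipy import stats

# Hypothetical ages for three groups (illustrative only, NOT the FIFA data)
groups = [
    np.array([24.0, 27.0, 30.0, 25.0]),
    np.array([22.0, 26.0, 23.0, 25.0]),
    np.array([29.0, 31.0, 28.0, 32.0]),
]

k = len(groups)                          # number of groups
n_total = sum(len(g) for g in groups)    # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: how far each group mean is from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f_manual, f_scipy)  # the two F values should agree
```

A large F means the group means differ by more than the within-group noise would explain, which is evidence against H0.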

    1. Dataset

      We select the variables Age and Club, keeping only players from Real Madrid, Barcelona, and Juventus.

    data2 = data1.loc[data1["club"].isin(["Real Madrid CF", "FC Barcelona", "Juventus"])]
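
The later steps use `data3`, `data4`, and `data5`, which the post never defines. The sketch below shows one plausible way to build them, assuming `data3` is Real Madrid, `data4` is Barcelona, and `data5` is Juventus; that mapping is inferred from the column labels in the KDE step, not stated by the author. The tiny `data1` frame here is a hypothetical stand-in for the Kaggle dataset.

```python
import pandas as pd

# Hypothetical miniature stand-in for data1 (illustrative rows only)
data1 = pd.DataFrame({
    "club": ["Real Madrid CF", "FC Barcelona", "Juventus", "Real Madrid CF", "Other FC"],
    "age": [30, 28, 33, 24, 21],
})

# Keep only the three clubs of interest
clubs = ["Real Madrid CF", "FC Barcelona", "Juventus"]
data2 = data1.loc[data1["club"].isin(clubs)]

# Assumed mapping of the per-club frames (inferred, not confirmed by the post)
data3 = data2[data2["club"] == "Real Madrid CF"]
data4 = data2[data2["club"] == "FC Barcelona"]
data5 = data2[data2["club"] == "Juventus"]

print(len(data3), len(data4), len(data5))
```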

    2. Histogram Plot

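    [Figure: overlaid histograms of player age for the three clubs]

    import matplotlib.pyplot as plt

    # Overlay one semi-transparent histogram per club.
    # Labels follow the column names used in the KDE step (MFC = Real Madrid).
    plt.hist(data3.age, bins="auto", color="c", edgecolor="k", alpha=0.5, label="MFC")
    plt.hist(data4.age, bins="auto", color="r", edgecolor="k", alpha=0.5, label="Barcelona")
    plt.hist(data5.age, bins="auto", color="y", edgecolor="k", alpha=0.5, label="Juventus")
    plt.xlabel("Age")
    plt.ylabel("Frequency")
    plt.title("Age Distribution in Barcelona vs MFC vs Juventus")
    plt.legend()
    plt.show()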
    

    3. KDE Density Plot

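    [Figure: KDE density plot of player age for the three clubs]

    import pandas as pd
    import matplotlib.pyplot as plt

    # One column per club (MFC = Real Madrid); plot a kernel density estimate for each
    df = pd.DataFrame({"mfc": data3.age, "barcelona": data4.age, "juventus": data5.age})
    ax = df.plot.kde()
    plt.title("Density Plot for Players' Age in Barcelona vs MFC vs Juventus")
    plt.show()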
    

    4. ANOVA Test

    from scipy import stats

    stats.f_oneway(data3.age, data4.age, data5.age)
    # F_onewayResult(statistic=4.8827728579356524, pvalue=0.010152460067260918)
    

    Outcome:

    The F-statistic is 4.88 and the p-value is 0.01. At the 0.05 significance level we reject the null hypothesis, so the players' mean age differs among MFC (Real Madrid), Barcelona, and Juventus. The histogram and density plots are consistent with this outcome. However, ANOVA does not tell us which groups differ from each other; for that we can run follow-up pairwise comparisons with the Bonferroni method.
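
The Bonferroni follow-up can be sketched as pairwise t-tests with the significance level divided by the number of comparisons. This is an illustrative outline only: the samples are made-up numbers, not the FIFA data, and the group names are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Hypothetical ages per group (illustrative only, NOT the FIFA data)
samples = {
    "group_a": np.array([24.0, 27.0, 30.0, 25.0, 26.0]),
    "group_b": np.array([22.0, 26.0, 23.0, 25.0, 24.0]),
    "group_c": np.array([29.0, 31.0, 28.0, 32.0, 30.0]),
}

pairs = list(combinations(samples, 2))   # every pair of groups
alpha = 0.05
alpha_adj = alpha / len(pairs)           # Bonferroni-adjusted threshold

results = {}
for a, b in pairs:
    t_stat, p_val = stats.ttest_ind(samples[a], samples[b])
    # A pair counts as significantly different only if p is below the adjusted threshold
    results[(a, b)] = (p_val, p_val < alpha_adj)
    print(f"{a} vs {b}: p={p_val:.4f}, significant={p_val < alpha_adj}")
```

With three groups there are three pairs, so each pairwise test is judged against 0.05 / 3 rather than 0.05. This keeps the overall chance of a false positive at or below 5%.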

    Bonus:

    I remember Song asking me: it is good to know what ANOVA is used for, but do you know which test generates ANOVA's p-value?

    I thought that, since ANOVA has a similar application to the t-test, the t-test generates the p-value.
    However, it is actually the F-test that generates ANOVA's p-value.
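
To see where the p-value comes from, the sketch below recomputes it as the upper-tail probability of the F distribution with (k - 1, N - k) degrees of freedom, and compares it with the value reported by `scipy.stats.f_oneway`. The samples are made-up numbers for illustration, and the variable names are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical ages for three groups (illustrative only, NOT the FIFA data)
g1 = np.array([24.0, 27.0, 30.0, 25.0])
g2 = np.array([22.0, 26.0, 23.0, 25.0])
g3 = np.array([29.0, 31.0, 28.0, 32.0])

f_stat, p_anova = stats.f_oneway(g1, g2, g3)

# Degrees of freedom: (number of groups - 1) and (total observations - number of groups)
df_between = 3 - 1
df_within = (len(g1) + len(g2) + len(g3)) - 3

# p-value = P(F >= f_stat) under the F distribution (survival function)
p_from_f = stats.f.sf(f_stat, df_between, df_within)

print(p_anova, p_from_f)  # the two p-values should agree
```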

    Later, little rain mentioned that the t-test and the F-test are interchangeable when there are only two groups, through the relation T^{2} = F.
    We will go over the relationship between the t-test and the F-test next time!
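
In the meantime, here is a quick numerical check of the relation for two groups, assuming the standard pooled-variance t-test (the default in `scipy.stats.ttest_ind`). The two samples are made-up numbers, not the FIFA data.

```python
import numpy as np
from scipy import stats

# Hypothetical ages for two groups (illustrative only, NOT the FIFA data)
g1 = np.array([24.0, 27.0, 30.0, 25.0, 26.0])
g2 = np.array([29.0, 31.0, 28.0, 32.0, 30.0])

# Independent two-sample t-test with pooled variance (equal_var=True is the default)
t_stat, p_t = stats.ttest_ind(g1, g2)

# One-way ANOVA on the same two groups
f_stat, p_f = stats.f_oneway(g1, g2)

print(t_stat ** 2, f_stat)  # squared t-statistic matches the F-statistic
print(p_t, p_f)             # and the p-values match as well
```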

    Happy Studying and Soccer game watching!
