Simple regression analysis with Python statsmodels

Author: 王叽叽的小心情 | Published 2020-02-25 11:13

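This post uses the statsmodels module in Python to run a basic linear regression (ordinary least squares) analysis. Reference documentation: http://www.statsmodels.org/dev/example_formulas.html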

Preparing the dataset takes three steps:

  1. Load the data, preferably as a DataFrame.
  2. Select the subset of columns you are interested in.
  3. Remove observations with missing values.
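
The three preparation steps can be sketched offline on a small hand-built table. The values and the `Unused` column below are hypothetical and are not the real Guerry data. The sketch shows why the column subset is taken before `dropna()`: a missing value in a column the model never uses should not cost an observation.

```python
import numpy as np
import pandas as pd

# Step 1: load the data as a DataFrame.
# Hypothetical toy values, made up for illustration only.
raw = pd.DataFrame({
    "Lottery": [41, 38, 66, 80, 79],
    "Literacy": [37, 51, 13, 46, 69],
    "Wealth": [73, 22, 61, 76, 83],
    "Region": ["E", "N", "C", None, "E"],     # one missing region
    "Unused": [1.0, np.nan, 3.0, 4.0, 5.0],   # a column the model will not use
})

# Step 2: keep only the columns of interest.
subset = raw[["Lottery", "Literacy", "Wealth", "Region"]]

# Step 3: drop every row with a missing value in any remaining column.
clean = subset.dropna()

print(len(raw), len(clean))
print(clean)
```

Only the row with the missing `Region` is dropped. The row whose `Unused` value is missing survives because that column was already excluded.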

Fitting the model also takes three steps:

  1. Specify the regression model.
  2. Fit the model.
  3. Print the results with res.summary().
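
As a minimal sketch of these three fitting steps that runs without a network connection, the example below fits the formula API to synthetic data. The column names `x`, `group` and `y` and the generating coefficients are assumptions chosen for illustration. It also shows that individual quantities can be read from the fitted result as attributes rather than parsed from the printed report.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (an assumption for illustration, not the Guerry data).
rng = np.random.default_rng(0)
n = 60
toy = pd.DataFrame({
    "x": rng.normal(size=n),
    "group": np.repeat(["a", "b", "c"], n // 3),
})
# Generating relationship: y = 1 + 2*x, plus 0.5 for group "b", plus small noise.
toy["y"] = (
    1.0
    + 2.0 * toy["x"]
    + 0.5 * (toy["group"] == "b")
    + rng.normal(scale=0.1, size=n)
)

model = smf.ols("y ~ x + group", data=toy)  # 1. specify the model
result = model.fit()                         # 2. fit it
print(result.summary())                      # 3. print the full report

# Individual quantities are also available as attributes.
print(result.params)     # estimated coefficients, indexed by term name
print(result.rsquared)   # coefficient of determination
```

Because `group` holds strings, the formula interface treats it as categorical and expands it into indicator terms, with the first level (`a`) serving as the reference.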
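
import statsmodels.api as sm
import statsmodels.formula.api as smf
import numpy as np
import pandas

# Load the Guerry dataset from the R package HistData (fetched online, so this needs network access)
df = sm.datasets.get_rdataset("Guerry", "HistData").data
# Keep the four columns of interest, then drop rows with missing values
df = df[['Lottery', 'Literacy', 'Wealth', 'Region']].dropna()
df.head()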
Out[7]: 
   Lottery  Literacy  Wealth Region
0       41        37      73      E
1       38        51      22      N
2       66        13      61      C
3       80        46      76      E
4       79        69      83      E

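# Specify the model with an R-style formula; Region holds strings, so it is treated as categorical
mod = smf.ols(formula='Lottery ~ Literacy + Wealth + Region', data=df)
# Fit the model by ordinary least squares
res = mod.fit()
# Print the regression report
print(res.summary())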
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                Lottery   R-squared:                       0.338
Model:                            OLS   Adj. R-squared:                  0.287
Method:                 Least Squares   F-statistic:                     6.636
Date:                Fri, 21 Feb 2020   Prob (F-statistic):           1.07e-05
Time:                        12:02:01   Log-Likelihood:                -375.30
No. Observations:                  85   AIC:                             764.6
Df Residuals:                      78   BIC:                             781.7
Df Model:                           6                                         
Covariance Type:            nonrobust                                         
===============================================================================
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
Intercept      38.6517      9.456      4.087      0.000      19.826      57.478
Region[T.E]   -15.4278      9.727     -1.586      0.117     -34.793       3.938
Region[T.N]   -10.0170      9.260     -1.082      0.283     -28.453       8.419
Region[T.S]    -4.5483      7.279     -0.625      0.534     -19.039       9.943
Region[T.W]   -10.0913      7.196     -1.402      0.165     -24.418       4.235
Literacy       -0.1858      0.210     -0.886      0.378      -0.603       0.232
Wealth          0.4515      0.103      4.390      0.000       0.247       0.656
==============================================================================
Omnibus:                        3.049   Durbin-Watson:                   1.785
Prob(Omnibus):                  0.218   Jarque-Bera (JB):                2.694
Skew:                          -0.340   Prob(JB):                        0.260
Kurtosis:                       2.454   Cond. No.                         371.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
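
Several entries in the report above are tied together by simple formulas, which gives a quick way to sanity-check a summary table. The sketch below recomputes a few of them from the printed numbers only. Region has five levels (C, E, N, S, W) with C as the reference level, so the model estimates seven parameters: the intercept, four Region indicators, Literacy and Wealth. The AIC and BIC expressions are the usual definitions based on the log-likelihood and the parameter count including the intercept. The results agree with the table up to the rounding of the printed inputs.

```python
import math

# Values copied from the printed summary table.
n_obs = 85
r_squared = 0.338
log_likelihood = -375.30
n_params = 7  # intercept + 4 Region indicators + Literacy + Wealth

df_model = n_params - 1        # regressors, not counting the intercept
df_resid = n_obs - n_params    # residual degrees of freedom

# Adjusted R-squared penalizes R-squared for the number of regressors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid

# Information criteria computed from the log-likelihood.
aic = 2 * n_params - 2 * log_likelihood
bic = n_params * math.log(n_obs) - 2 * log_likelihood

# Each t statistic is the coefficient divided by its standard error.
t_intercept = 38.6517 / 9.456

print(df_model, df_resid)
print(round(adj_r_squared, 3), round(aic, 1), round(bic, 1))
print(t_intercept)
```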
