Lidar variables selection in SAS

Author: 森森森sl | Published: 2017-08-17 10:19

    Date: Aug 16, 2017

    - All variables are log-transformed to improve R square.

    - In the SAS code, words in bold are SAS keywords.



    SAS PROC REG offers several model selection methods; I used STEPWISE and RSQUARE.

    STEPWISE is the most popular model selection method in PROC REG. The significance level for entry (SLE, the SLENTRY= option) and the significance level for staying (SLS, the SLSTAY= option) can be adjusted. I used the default settings.

    PROC REG DATA = PLOTlog;

    MODEL BIO_MG_HAN = Total_retu Elev_minim Elev_maxim Elev_mean Elev_mode Elev_stdde Elev_varia Elev_CV Elev_IQ Elev_kurto Elev_AAD Elev_MAD_m Elev_MAD_1 Elev_L1 Elev_L2 Elev_L_CV Elev_P01 Elev_P05 Elev_P10 Elev_P20 Elev_P25 Elev_P30 Elev_P40 Elev_P50 Elev_P60 Elev_P70 Elev_P75 Elev_P80 Elev_P90 Elev_P95 Elev_P99 Canopy_rel Elev_SQRT_ Elev_CURT_ 

    / SELECTION = STEPWISE;

    RUN;
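
    The SLE and SLS adjustment mentioned above can be sketched as follows. This is a minimal example, not the setting used in this analysis: the 0.05 thresholds, the shortened predictor list and the macro variable name are assumptions chosen for illustration. For STEPWISE, PROC REG defaults both SLENTRY= and SLSTAY= to 0.15.

```sas
/* Illustrative subset of the predictors; the macro variable name is hypothetical */
%LET PREDICTORS = Elev_mean Elev_mode Elev_P95 Total_retu Canopy_rel;

PROC REG DATA = PLOTlog;
/* Assumed thresholds: a variable enters at p < 0.05 and stays at p < 0.05 */
MODEL BIO_MG_HAN = &PREDICTORS
/ SELECTION = STEPWISE SLENTRY = 0.05 SLSTAY = 0.05;
RUN;
QUIT;
```

    Stricter thresholds generally admit fewer variables, which ties the selection more closely to the 0.05 significance level used when judging individual coefficients.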

    R square selection (RSQUARE) identifies the model with the largest R square for each number of variables considered. Because it searches over subsets of variables rather than adding or removing one variable at a time, it requires much more computing time than the other selection methods. One workaround is to split the candidate variables into subgroups, find the largest R square within each subgroup first, and then compare the subgroup winners. However, this only works for one-variable models (STOP=1 below limits the search to one-variable models).

    PROC REG DATA = PLOTlog;

    MODEL BIO_MG_HAN = Total_retu Elev_minim Elev_maxim Elev_mean Elev_mode Elev_stdde Elev_varia Elev_CV Elev_IQ Elev_kurto Elev_AAD Elev_MAD_m Elev_MAD_1 Elev_L1 Elev_L2 Elev_L_CV Elev_P01 Elev_P05 Elev_P10 Elev_P20 Elev_P25 Elev_P30 Elev_P40 Elev_P50 Elev_P60 Elev_P70 Elev_P75 Elev_P80 Elev_P90 Elev_P95 Elev_P99 Canopy_rel Elev_SQRT_ Elev_CURT_

    / SELECTION = RSQUARE STOP=1;

    RUN;


    Results from the stepwise selection:

    Summary of Stepwise Selection

    Results from R square selection:

    Summary of RSQUARE Selection

    Comparison of two models:

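    /* One-variable model chosen by RSQUARE selection */

    PROC REG DATA = PLOTlog OUTEST=OUT1;

    MODEL BIO_MG_HAN = Elev_mode / AIC BIC PRESS RSQUARE RMSE;

    RUN;

    PROC PRINT DATA = OUT1;

    RUN;

    /* Three-variable model chosen by STEPWISE selection; VIF checks multicollinearity */

    PROC REG DATA = PLOTlog OUTEST=OUT2;

    MODEL BIO_MG_HAN = Elev_mode Total_retu Elev_P95 / AIC BIC PRESS RSQUARE RMSE VIF;

    RUN;

    PROC PRINT DATA = OUT2;

    RUN;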

    Model comparison

    Model Selection Criteria

    1. Statistical tests on individual coefficients at a given significance level (0.05). It is desirable that every predictor variable kept in the model be significant.

    For the RSQUARE selection model, Elev_mode is significant.

    For the STEPWISE selection model, Elev_mode is significant, while Total_retu and Elev_P95 are not.

    RSQUARE selection model parameter estimates

    STEPWISE selection model parameter estimates

    2. Model coefficient of determination, R square. The larger, the better.

    R square never decreases as variables are added to the model, so on its own it favors larger models.

    RSQUARE R square: 0.7837

    STEPWISE R square: 0.8223

    3. Adjusted R square. The larger, the better.

    Unlike R square, adjusted R square does not always increase with the number of variables in the model. It adjusts for the degrees of freedom and so gives a quantity that is more comparable than R square across models with different numbers of parameters.

    RSQUARE adjusted R square: 0.7786

    STEPWISE adjusted R square: 0.8090
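
    The degrees-of-freedom adjustment can be written out explicitly. Here n is the number of observations and p the number of predictors, not counting the intercept. The post does not state the sample size; n = 44 below is an inferred assumption, used only because it reproduces both reported values to within rounding.

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}

% Assumed n = 44 (not stated in the post)
% RSQUARE model,  p = 1:  1 - (1 - 0.7837)\cdot\tfrac{43}{42} \approx 0.7786
% STEPWISE model, p = 3:  1 - (1 - 0.8223)\cdot\tfrac{43}{40} \approx 0.8090
```

    The factor (n - 1)/(n - p - 1) grows with p, so an extra variable raises adjusted R square only if it lowers 1 - R square by more than this penalty.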

    4. Mallows' Cp. It should be close to the number of coefficients (including the intercept).

    Not considered here. Mallows' Cp is calculated during the SELECTION process.
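
    Cp can be requested during selection with the CP option. A minimal sketch, assuming only the three variables from the comparison above as candidates; in practice the full candidate list would go in the MODEL statement, because Cp is measured against the full model given there. BEST = 3 is an arbitrary illustrative limit on how many models are listed.

```sas
PROC REG DATA = PLOTlog;
/* CP and ADJRSQ add Mallows' Cp and adjusted R square to the subset summary */
MODEL BIO_MG_HAN = Elev_mode Total_retu Elev_P95
/ SELECTION = RSQUARE CP ADJRSQ BEST = 3;
RUN;
QUIT;
```

    For the model that contains every candidate variable, Cp equals the number of coefficients by construction, so the criterion is only informative for the smaller subsets.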

    5. Predicted Residual Sum of Squares (PRESS). The smaller, the better.

    The PRESS statistic gives a good indication of the predictive power of the model and can be used in combination with RMSE. RMSE gets smaller as the model fits each data point more closely, but this can lead to overfitting, which yields a model that is neither representative nor predictive. PRESS guards against this by testing how well the model predicts each point in the dataset when that point is left out of the fit.

    RSQUARE PRESS: 2.13015

    STEPWISE PRESS: 1.79648
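
    To see where PRESS comes from, the leave-one-out residuals can be written out and summed by hand. This is a sketch for the one-variable model; the data set name PRESSRES and the variable name press_resid are hypothetical.

```sas
PROC REG DATA = PLOTlog;
MODEL BIO_MG_HAN = Elev_mode;
/* PRESS= stores each residual divided by (1 - leverage), */
/* which is the prediction error for that point when it is left out of the fit */
OUTPUT OUT = PRESSRES PRESS = press_resid;
RUN;
QUIT;

/* USS is the uncorrected sum of squares; applied to these residuals it gives PRESS */
PROC MEANS DATA = PRESSRES USS;
VAR press_resid;
RUN;
```

    The USS value should agree with the PRESS that PROC REG reports for the same model.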

    6. Model Selection Criteria Based on Information Theory, including AIC, AICC, BIC and SBC. The smaller, the better.

    AIC is not a test of the model in the sense of hypothesis testing; rather it is a test between models - a tool for model selection. Akaike's rule of thumb: two models are essentially indistinguishable if the difference of their AICs is less than 2.
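
    For least-squares regression, AIC and SBC are commonly written in the form below, which is the form usually given for PROC REG. Here n is the number of observations, p the number of estimated parameters including the intercept, and SSE the error sum of squares. Other software may add constants, so only differences between models fitted to the same data are meaningful.

```latex
\mathrm{AIC} = n \ln\!\left(\frac{\mathrm{SSE}}{n}\right) + 2p
\qquad
\mathrm{SBC} = n \ln\!\left(\frac{\mathrm{SSE}}{n}\right) + p \ln n
```

    Going from the one-variable model (p = 2) to the three-variable model (p = 4) adds 4 to the AIC penalty term, so the larger model has the lower AIC only if it reduces n ln(SSE/n) by more than 4.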

    7. Variance Inflation Factor (VIF)

    This is for multicollinearity detection and diagnostics. VIFs indicate which regression coefficients are adversely affected and to what extent. A common rule of thumb is that if any VIF exceeds 10, there is reason for at least some concern about multicollinearity in the data.

    The highest VIF in the STEPWISE selection model is 4.99, which is smaller than 10. 
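
    The threshold is easier to interpret through the definition of VIF, where R_j^2 is the R square obtained by regressing predictor j on the other predictors in the model:

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}

% VIF = 10    corresponds to  R_j^2 = 1 - 1/10 = 0.90
% VIF = 4.99  corresponds to  R_j^2 = 1 - 1/4.99 \approx 0.80
```

    So the most affected predictor in the STEPWISE model has about 80% of its variance explained by the other two predictors, which is below the 90% implied by the rule of thumb.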

    Summary

    Comparison Summary

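    The STEPWISE model is better: it has the larger R square and adjusted R square and the smaller PRESS, and its VIFs are all below 10.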
