Boosting

Author: Kevin不会创作 | Published on 2020-12-06 08:07

    Table of Contents

    • AdaBoost
    • Gradient Boosting
    • XGBoost
    • LightGBM
    • CatBoost

    AdaBoost

    • Explain the AdaBoost algorithm. (A minimal code sketch follows the steps below.)

      1. Initially, all observations are given equal weights.

      2. A model is built on a subset of data.

      3. Using this model, predictions are made on the whole dataset.

      4. Errors are calculated by comparing the predictions and actual values.

      5. While creating the next model, higher weights are given to the data points which were predicted incorrectly.

      6. Weights are determined from the error value: the higher the error, the more weight is assigned to the observation.

      7. This process is repeated until the error stops improving or the maximum number of estimators is reached.
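
      A minimal from-scratch sketch of the loop above, assuming binary labels in {-1, +1}, decision stumps as weak learners fit on the full re-weighted dataset (rather than a sampled subset), and an illustrative n_estimators; the learner weight and re-weighting rule follow the classic discrete AdaBoost formulas:

      ```python
      import numpy as np
      from sklearn.tree import DecisionTreeClassifier

      def adaboost_fit(X, y, n_estimators=50):
          # y: NumPy array of labels in {-1, +1}
          n = len(y)
          w = np.full(n, 1.0 / n)                          # step 1: equal weights
          stumps, alphas = [], []
          for _ in range(n_estimators):
              stump = DecisionTreeClassifier(max_depth=1)
              stump.fit(X, y, sample_weight=w)             # steps 2-3: fit and predict
              pred = stump.predict(X)
              err = np.sum(w * (pred != y)) / np.sum(w)    # step 4: weighted error
              if err == 0 or err >= 0.5:                   # step 7: stop if no useful learner
                  break
              alpha = 0.5 * np.log((1 - err) / err)        # learner weight from its error
              w *= np.exp(-alpha * y * pred)               # steps 5-6: upweight misclassified points
              w /= w.sum()
              stumps.append(stump)
              alphas.append(alpha)
          return stumps, alphas

      def adaboost_predict(X, stumps, alphas):
          # Weighted vote of all weak learners.
          scores = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
          return np.sign(scores)
      ```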

    • Disadvantages of AdaBoost.

      1. Because boosting learns progressively, it is important to ensure that you have quality data.

      2. AdaBoost is also extremely sensitive to noisy data and outliers, so if you plan to use it, it is highly recommended to remove them first.

      3. AdaBoost is also generally slower than XGBoost.

    Gradient Boosting

    • Explain the Gradient Boosting algorithm. (A minimal code sketch follows the steps below.)

      1. A model is built on a subset of data.
      2. Using this model, predictions are made on the whole dataset.
      3. Errors are calculated by comparing the predictions and actual values.
      4. A new model is created using the calculated errors as the target variable. The objective is to find the best splits that minimize the error.
      5. The predictions made by this new model are combined with the predictions of the previous models.
      6. New errors are calculated from the combined predictions and the actual values.
      7. This process is repeated until the error stops improving or the maximum number of estimators is reached.
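
      A minimal sketch of these steps for regression with squared-error loss, where each new tree is fit to the residuals (errors) of the current ensemble; learning_rate, n_estimators, and max_depth are illustrative values:

      ```python
      import numpy as np
      from sklearn.tree import DecisionTreeRegressor

      def gbm_fit(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
          init = y.mean()                                    # start from a constant prediction
          pred = np.full(len(y), init)
          trees = []
          for _ in range(n_estimators):
              residuals = y - pred                           # step 3: errors = actual - predicted
              tree = DecisionTreeRegressor(max_depth=max_depth)
              tree.fit(X, residuals)                         # step 4: new model targets the errors
              pred = pred + learning_rate * tree.predict(X)  # step 5: combine predictions
              trees.append(tree)
          return init, trees

      def gbm_predict(X, init, trees, learning_rate=0.1):
          # Sum the (shrunken) contributions of all trees on top of the initial prediction.
          return init + learning_rate * sum(t.predict(X) for t in trees)
      ```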

    XGBoost

    • What is XGBoost?

      XGBoost is a decision-tree-based ensemble Machine Learning algorithm that uses a gradient boosting framework.
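
      A short usage sketch with the xgboost scikit-learn wrapper; the dataset and hyper-parameter values are illustrative only:

      ```python
      from sklearn.datasets import load_breast_cancer
      from sklearn.model_selection import train_test_split
      from xgboost import XGBClassifier

      X, y = load_breast_cancer(return_X_y=True)
      X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

      # Gradient-boosted trees with a few common knobs exposed.
      model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
      model.fit(X_train, y_train)
      print("test accuracy:", model.score(X_test, y_test))
      ```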

    • How does XGBoost optimize the standard GBM algorithm? (A short parameter sketch follows these points.)

      • System Optimization

        1. Parallelization: XGBoost parallelizes the split-finding work within the otherwise sequential tree-building process.

        2. Tree Pruning: The stopping criterion for tree splitting within the GBM framework is greedy in nature and depends on the negative loss criterion at the point of the split. XGBoost instead grows trees up to the specified ‘max_depth’ and then prunes them backward, removing splits that yield no positive gain.

        3. Hardware Optimization: The algorithm has been designed to make efficient use of hardware resources. This is accomplished through cache awareness: internal buffers are allocated in each thread to store gradient statistics.

      • Algorithmic Enhancements

        1. Regularization: It penalizes more complex models through both LASSO (L1) and Ridge (L2) regularization to prevent overfitting.

        2. Sparsity Awareness: XGBoost naturally admits sparse input features by automatically learning the best default direction for missing values from the training loss, and it handles different types of sparsity patterns in the data more efficiently.

        3. Weighted Quantile Sketch: XGBoost employs the distributed weighted Quantile Sketch algorithm to effectively find the optimal split points among weighted datasets.

        4. Cross-validation: The algorithm comes with a built-in cross-validation method at each iteration, removing the need to explicitly program this search or to specify the exact number of boosting iterations in a single run.
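
      A sketch mapping some of these points onto xgboost parameters and the built-in cross-validation helper; the parameter values are illustrative, not recommendations:

      ```python
      import xgboost as xgb
      from sklearn.datasets import load_breast_cancer

      X, y = load_breast_cancer(return_X_y=True)
      dtrain = xgb.DMatrix(X, label=y)   # sparsity-aware internal data structure

      params = {
          "objective": "binary:logistic",
          "max_depth": 4,        # grow to max_depth, then prune splits backward
          "reg_alpha": 0.1,      # L1 (LASSO) regularization
          "reg_lambda": 1.0,     # L2 (Ridge) regularization
          "nthread": 4,          # parallelized split finding
      }

      # Built-in cross-validation evaluates every boosting round and can stop
      # early, so the exact number of rounds need not be fixed in advance.
      cv_results = xgb.cv(params, dtrain, num_boost_round=500, nfold=5,
                          metrics="logloss", early_stopping_rounds=20)
      print(cv_results.tail())
      ```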

    • References

      1. XGBoost Algorithm: Long May She Reign!

    LightGBM

    • What is LightGBM?

      LightGBM is a fast, distributed, high-performance gradient boosting framework based on decision tree algorithms, used for ranking, classification, and many other machine learning tasks.

    • What's the difference between LightGBM and XGBoost?

      LightGBM uses a novel technique called Gradient-based One-Side Sampling (GOSS) to filter the data instances used for finding a split value, while XGBoost uses a pre-sorted algorithm and a histogram-based algorithm to compute the best split. Here, instances mean observations/samples.
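
      A brief usage sketch with the LightGBM scikit-learn wrapper; parameter values are illustrative, and depending on the LightGBM version GOSS is enabled with boosting_type="goss" or data_sample_strategy="goss" (the default shown here is the plain histogram-based GBDT):

      ```python
      from sklearn.datasets import load_breast_cancer
      from sklearn.model_selection import train_test_split
      from lightgbm import LGBMClassifier

      X, y = load_breast_cancer(return_X_y=True)
      X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

      # Leaf-wise, histogram-based gradient boosting.
      model = LGBMClassifier(n_estimators=200, num_leaves=31, learning_rate=0.1)
      model.fit(X_train, y_train)
      print("test accuracy:", model.score(X_test, y_test))
      ```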

    CatBoost

    • What is CatBoost?

      The name CatBoost comes from the two words "Category" and "Boosting". It works well with many categories of data, such as audio, text, and image, as well as historical data. Unlike XGBoost and LightGBM, CatBoost does not require the dataset to be converted to any specific numeric format.
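
      A minimal sketch showing CatBoost consuming raw string categorical columns directly via cat_features, without one-hot or label encoding; the tiny dataset and parameter values are purely illustrative:

      ```python
      import pandas as pd
      from catboost import CatBoostClassifier

      df = pd.DataFrame({
          "color":  ["red", "blue", "blue", "green", "red", "green"],
          "size":   ["S", "M", "L", "M", "L", "S"],
          "price":  [10.0, 12.5, 9.0, 14.0, 11.0, 8.5],
          "bought": [1, 0, 1, 0, 1, 0],
      })

      X, y = df.drop(columns="bought"), df["bought"]
      model = CatBoostClassifier(iterations=100, verbose=False)
      model.fit(X, y, cat_features=["color", "size"])   # categorical columns used as-is
      print(model.predict(X))
      ```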

    • Advantages of CatBoost Library.

      1. Performance
      2. Handling Categorical features automatically
      3. Robust
      4. Easy-to-use
    • References

      1. CatBoost: A machine learning library to handle categorical (CAT) data automatically
