REF:
https://neptune.ai/blog/image-segmentation-tips-and-tricks-from-kaggle-competitions#ensembling-methods
[A Chinese version will follow]
1. Loss Functions
- Dice Coefficient because it works well with imbalanced data
- Weighted boundary loss whose aim is to reduce the distance between the predicted segmentation and the ground truth
- MultiLabelSoftMarginLoss that creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input and target
- Balanced cross-entropy (BCE) with logits loss, which weights the positive and negative examples by a certain coefficient
- Lovasz loss, which performs direct optimization of the mean intersection-over-union loss in neural networks based on the convex Lovasz extension of submodular losses
- FocalLoss + Lovasz obtained by summing the Focal and Lovasz losses
- Arc margin loss, which incorporates a margin in order to maximise face-class separability
- N-pairs loss, a metric-learning loss computed between y_true and y_pred that pulls embeddings of the same class together while pushing each anchor away from multiple negatives at once
- A combination of BCE and Dice loss functions
- LSEP, a pairwise ranking loss that is smooth everywhere and thus easier to optimize
- Center loss that simultaneously learns a center for deep features of each class and penalizes the distances between the deep features and their corresponding class centers
- Ring Loss that augments standard loss functions such as Softmax
- Hard triplet loss, which trains a network to embed features of the same class close together while maximizing the embedding distance between different classes
- 1 + BCE – Dice, i.e. the BCE loss plus one minus the Dice coefficient
- Binary cross-entropy – log(dice), i.e. the binary cross-entropy minus the log of the Dice coefficient
- Combinations of BCE, dice and focal
- Lovasz loss, which performs direct optimization of the mean intersection-over-union loss
- BCE + Dice, where the Dice loss is obtained from a smoothed Dice coefficient function
- Focal loss with gamma 2, an improvement over the standard cross-entropy criterion
- BCE + DICE + Focal – basically a summation of the three loss functions (see the sketch after this list)
- Active Contour Loss that incorporates the area and size information and integrates the information in a dense deep learning model
- 1024 * BCE(results, masks) + BCE(cls, cls_target), a custom combination of a heavily weighted segmentation BCE and a classification BCE
- Focal + kappa, where kappa is a loss function for multi-class classification of ordinal data in deep learning; here it is summed with the focal loss
- ArcFaceLoss — Additive Angular Margin Loss for Deep Face Recognition
- soft Dice trained on positives only – Soft Dice uses predicted probabilities
- 2.7 * BCE(pred_mask, gt_mask) + 0.9 * DICE(pred_mask, gt_mask) + 0.1 * BCE(pred_empty, gt_empty) which is a custom loss used by the Kaggler
- nn.SmoothL1Loss(), which creates a criterion that uses a squared term if the absolute element-wise error falls below 1 and an L1 term otherwise
- Use the mean squared error objective in scenarios where it seems to work better than the binary cross-entropy objective.
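Several of the items above combine BCE, Dice, and focal terms. Below is a minimal PyTorch sketch of such a combination; the smoothing constant, the equal weighting of the three terms, and gamma = 2 are illustrative assumptions, not values taken from any particular competition solution.

```python
# Minimal sketch of a "BCE + Dice + Focal" loss for binary segmentation.
# Smoothing constant, equal weights, and gamma=2 are assumptions.
import torch
import torch.nn.functional as F


def dice_loss(logits, targets, smooth=1.0):
    """Soft Dice loss computed from predicted probabilities."""
    probs = torch.sigmoid(logits)
    probs = probs.view(probs.size(0), -1)
    targets = targets.view(targets.size(0), -1)
    intersection = (probs * targets).sum(dim=1)
    dice = (2.0 * intersection + smooth) / (
        probs.sum(dim=1) + targets.sum(dim=1) + smooth
    )
    return 1.0 - dice.mean()


def focal_loss(logits, targets, gamma=2.0):
    """Focal loss with gamma=2: down-weights already well-classified pixels."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    pt = torch.exp(-bce)  # probability assigned to the true class
    return ((1.0 - pt) ** gamma * bce).mean()


def combined_loss(logits, targets):
    """Plain sum of BCE, Dice, and focal terms (equal weights assumed)."""
    bce = F.binary_cross_entropy_with_logits(logits, targets)
    return bce + dice_loss(logits, targets) + focal_loss(logits, targets)
```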
2. Training tips
- Try different learning rates
- Try different batch sizes
- Use SGD with momentum and manual learning-rate scheduling
- Too much augmentation will reduce the accuracy
- Train on image crops and predict on full images
- Use Keras's ReduceLROnPlateau() callback to reduce the learning rate
- Train without augmentation until the loss plateaus, then apply soft and hard augmentation for some epochs
- Freeze all layers except the last one and use 1000 images from Stage1 for tuning
- Make labels more balanced by developing a sampler (see the sampler sketch after this list)
- Use class-aware sampling
- Use dropout and augmentation while tuning the last layer
- Pseudo Labeling to improve score
- Use Adam, reducing the LR on plateau with patience 2–4 (see the scheduler sketch after this list)
- Use Cyclic LR with SGD
- Reduce the learning rate by a factor of two if validation loss does not improve for two consecutive epochs
- Repeat the worst batch out of 10 batches
- Train with default UNET
- Overlap tiles so that each edge pixel is covered twice
- Hyperparameter tuning: learning rate on training, non-maximum suppression and score threshold on inference
- Remove bounding boxes with low confidence scores
- Train different convolutional neural networks then build an ensemble
- Stop training when the F1 score is decreasing
- Differential learning rates with gradual reduction
- Train ANNs in a stacking way using 5 folds and 30 repeats
- Keep track of your experiments using Neptune.
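For the balanced-sampler and class-aware-sampling tips above, one common approach, assumed here rather than taken from the original solutions, is PyTorch's WeightedRandomSampler, which draws samples from rare classes more often:

```python
# Minimal sketch of class-balanced sampling with WeightedRandomSampler.
# `labels` is a hypothetical per-sample class-label array.
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

labels = np.array([0, 0, 0, 0, 1, 1, 2])        # toy, imbalanced labels
class_counts = np.bincount(labels)
sample_weights = 1.0 / class_counts[labels]      # rare classes get larger weights

sampler = WeightedRandomSampler(
    weights=torch.as_tensor(sample_weights, dtype=torch.double),
    num_samples=len(labels),
    replacement=True,
)

dataset = TensorDataset(torch.arange(len(labels)), torch.as_tensor(labels))
loader = DataLoader(dataset, batch_size=4, sampler=sampler)
print(next(iter(loader))[1])   # labels drawn with class-balanced probabilities
```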
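For the Adam-plus-plateau-scheduling tips, here is a minimal PyTorch sketch using ReduceLROnPlateau that halves the learning rate once the validation loss stalls; the placeholder model, the dummy training step, and the epoch count are assumptions for illustration only:

```python
# Minimal sketch: Adam with ReduceLROnPlateau, halving the LR when the
# monitored validation loss has stopped improving for a few epochs.
import torch

model = torch.nn.Linear(10, 1)                          # hypothetical placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=2       # patience 2 as in the tip above
)

for epoch in range(10):
    # --- dummy training step; replace with the real training loop ---
    x, y = torch.randn(8, 10), torch.randn(8, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    val_loss = loss.item()        # replace with the real validation loss
    scheduler.step(val_loss)      # halves the LR once the loss plateaus
```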
3. Evaluation and cross-validation
- Use a non-uniform split stratified by class
- Avoid overfitting by applying cross-validation while tuning the last layer
- 10-fold CV ensemble for classification
- Combination of 5 10-fold CV ensembles for detection
- Use scikit-learn's StratifiedKFold function (see the sketch after this list)
- 5-fold cross-validation
- Adversarial Validation & Weighting
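The scikit-learn StratifiedKFold mentioned above can be wired up roughly as follows; the toy feature matrix, the labels, and the choice of 5 splits are assumptions for illustration:

```python
# Minimal sketch of stratified 5-fold cross-validation with scikit-learn.
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.random.rand(20, 4)                  # toy features
y = np.array([0] * 15 + [1] * 5)           # imbalanced toy labels (75/25)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    X_train, X_val = X[train_idx], X[val_idx]
    y_train, y_val = y[train_idx], y[val_idx]
    # Each fold preserves the class ratio of `y`; fit one model per fold here.
    print(f"fold {fold}: validation class counts {np.bincount(y_val)}")
```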
4. Ensembling methods
- Use simple majority voting for the ensemble (see the sketch after this list)
- XGBoost on the max malignancy at 3 zoom levels, the z-location and the amount of strange tissue
- LightGBM for models with too many classes. This was done for raw data features only.
- CatBoost for a second-layer model
- Training with 7 features for the gradient boosting classifier
- Use ‘curriculum learning’ to speed up model training: models are first trained on simple samples and then progressively move to harder ones.
- Ensemble with ResNet50, InceptionV3, and InceptionResNetV2
- Ensemble method for object detection
- An ensemble of Mask RCNN, YOLOv3, and Faster RCNN architectures with a classification network (DenseNet-121)
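A minimal sketch of the simple majority-voting idea, applied here to binary segmentation masks; the per-pixel voting rule and the toy masks are assumptions for illustration:

```python
# Minimal sketch of majority voting over binary segmentation masks.
import numpy as np


def majority_vote(masks):
    """masks: list of binary arrays of identical shape, one per model."""
    stacked = np.stack(masks, axis=0)
    votes = stacked.sum(axis=0)
    # A pixel is foreground if more than half of the models say so.
    return (votes > len(masks) / 2).astype(np.uint8)


# toy example with three 2x2 model predictions
m1 = np.array([[1, 0], [1, 1]])
m2 = np.array([[1, 0], [0, 1]])
m3 = np.array([[0, 0], [1, 1]])
print(majority_vote([m1, m2, m3]))   # [[1 0] [1 1]]
```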
5. Post-processing
- Apply test-time augmentation: present an image to the model several times with different random transformations and average the predictions you get (see the sketch after this list)
- Equalize test prediction probabilities instead of only using predicted classes
- Apply geometric mean to the predictions
- Overlap tiles during inference so that each edge pixel is covered at least three times, because UNET tends to produce poor predictions around edge areas.
- Non-maximum suppression and bounding box shrinkage
- Watershed post-processing to separate touching objects in instance segmentation problems (see the sketch after this list).
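As a minimal sketch of test-time augmentation, the snippet below averages predictions over the original image and a horizontally flipped copy; restricting the augmentation set to a single flip, the sigmoid output, and the toy stand-in model are assumptions:

```python
# Minimal sketch of test-time augmentation with a horizontal flip:
# predict on the original and the flipped image, un-flip, and average.
import torch


def tta_predict(model, image):
    """image: tensor of shape (N, C, H, W); model returns per-pixel logits."""
    with torch.no_grad():
        pred = torch.sigmoid(model(image))
        flipped = torch.flip(image, dims=[3])               # horizontal flip
        pred_flipped = torch.sigmoid(model(flipped))
        pred_flipped = torch.flip(pred_flipped, dims=[3])   # undo the flip
    return (pred + pred_flipped) / 2.0


# toy usage with a 1x1 conv as a stand-in segmentation model
model = torch.nn.Conv2d(3, 1, kernel_size=1)
image = torch.randn(1, 3, 64, 64)
averaged = tta_predict(model, image)    # shape (1, 1, 64, 64)
```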
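The watershed post-processing tip can be sketched roughly as follows with SciPy and scikit-image, following the standard distance-transform-plus-markers recipe; the min_distance value and the toy mask are assumptions:

```python
# Minimal sketch of watershed post-processing to split touching objects.
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed


def watershed_split(binary_mask):
    """binary_mask: boolean array; returns an integer label image."""
    distance = ndi.distance_transform_edt(binary_mask)
    # Peaks of the distance transform act as one marker per object.
    coords = peak_local_max(distance, min_distance=5, labels=binary_mask)
    peak_mask = np.zeros(distance.shape, dtype=bool)
    peak_mask[tuple(coords.T)] = True
    markers, _ = ndi.label(peak_mask)
    return watershed(-distance, markers, mask=binary_mask)


# toy mask: two overlapping disks
yy, xx = np.mgrid[:80, :80]
mask = ((xx - 28) ** 2 + (yy - 40) ** 2 < 20 ** 2) | \
       ((xx - 52) ** 2 + (yy - 40) ** 2 < 20 ** 2)
labels = watershed_split(mask)
print(np.unique(labels))   # background 0 plus one label per detected object
```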