REF:
https://neptune.ai/blog/image-segmentation-tips-and-tricks-from-kaggle-competitions#ensembling-methods
[A Chinese version will follow]
1. Loss Functions
- Dice Coefficient because it works well with imbalanced data
- Weighted boundary loss whose aim is to reduce the distance between the predicted segmentation and the ground truth
- MultiLabelSoftMarginLoss that creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input and target
- Balanced cross-entropy (BCE) with logits loss, which weights the positive and negative examples by a certain coefficient
- Lovasz loss, which performs direct optimization of the mean intersection-over-union loss in neural networks based on the convex Lovasz extension of submodular losses
- FocalLoss + Lovasz obtained by summing the Focal and Lovasz losses
- Arc margin loss, which incorporates a margin in order to maximise face-class separability
- N-pairs loss, a metric-learning loss computed between y_true and y_pred that pulls embeddings of the same class together while pushing each anchor away from multiple negatives at once
- A combination of BCE and Dice loss functions
- LSEP, a pairwise ranking loss that is smooth everywhere and thus easier to optimize
- Center loss that simultaneously learns a center for deep features of each class and penalizes the distances between the deep features and their corresponding class centers
- Ring Loss that augments standard loss functions such as Softmax
- Hard triplet loss, which trains a network to embed features of the same class close together while maximizing the embedding distance between different classes
- 1 + BCE – Dice, i.e. the BCE loss plus one minus the Dice coefficient
- Binary cross-entropy – log(dice), i.e. the binary cross-entropy minus the log of the Dice coefficient
- Combinations of BCE, dice and focal
- Lovasz loss, which performs direct optimization of the mean intersection-over-union loss
- BCE + Dice, where the Dice loss is obtained from a smoothed Dice coefficient function
- Focal loss with gamma 2, an improvement over the standard cross-entropy criterion
- BCE + DICE + Focal – basically a summation of the three loss functions (see the sketch after this list)
- Active Contour Loss that incorporates the area and size information and integrates the information in a dense deep learning model
- 1024 * BCE(results, masks) + BCE(cls, cls_target), a custom combination of a heavily weighted segmentation BCE and a classification BCE
- Focal + kappa, where kappa is a loss function for multi-class classification of ordinal data in deep learning; here it is summed with the focal loss
- ArcFaceLoss — Additive Angular Margin Loss for Deep Face Recognition
- soft Dice trained on positives only – Soft Dice uses predicted probabilities
- 2.7 * BCE(pred_mask, gt_mask) + 0.9 * DICE(pred_mask, gt_mask) + 0.1 * BCE(pred_empty, gt_empty) which is a custom loss used by the Kaggler
- nn.SmoothL1Loss(), which creates a criterion that uses a squared term if the absolute element-wise error falls below 1 and an L1 term otherwise
- Use the mean squared error objective in scenarios where it seems to work better than the binary cross-entropy objective.
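Several of the items above combine BCE, Dice, and focal terms. Below is a minimal PyTorch sketch of such a combination; the smoothing constant, the equal weighting of the three terms, and gamma = 2 are illustrative assumptions, not values taken from any particular competition solution.

```python
# Minimal sketch of a "BCE + Dice + Focal" loss for binary segmentation.
# Smoothing constant, equal weights, and gamma=2 are assumptions.
import torch
import torch.nn.functional as F


def dice_loss(logits, targets, smooth=1.0):
    """Soft Dice loss computed from predicted probabilities."""
    probs = torch.sigmoid(logits)
    probs = probs.view(probs.size(0), -1)
    targets = targets.view(targets.size(0), -1)
    intersection = (probs * targets).sum(dim=1)
    dice = (2.0 * intersection + smooth) / (
        probs.sum(dim=1) + targets.sum(dim=1) + smooth
    )
    return 1.0 - dice.mean()


def focal_loss(logits, targets, gamma=2.0):
    """Focal loss with gamma=2: down-weights already well-classified pixels."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    pt = torch.exp(-bce)  # probability assigned to the true class
    return ((1.0 - pt) ** gamma * bce).mean()


def combined_loss(logits, targets):
    """Plain sum of BCE, Dice, and focal terms (equal weights assumed)."""
    bce = F.binary_cross_entropy_with_logits(logits, targets)
    return bce + dice_loss(logits, targets) + focal_loss(logits, targets)
```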
2. Training tips
- Try different learning rates
- Try different batch sizes
- Use SGD with momentum and manual learning-rate scheduling
- Too much augmentation will reduce the accuracy
- Train on image crops and predict on full images
- Use Keras's ReduceLROnPlateau() callback to reduce the learning rate
- Train without augmentation until the loss plateaus, then apply soft and hard augmentation for some epochs
- Freeze all layers except the last one and use 1000 images from Stage1 for tuning
- Make labels more balanced by developing a sampler (see the sampler sketch after this list)
- Use class-aware sampling
- Use dropout and augmentation while tuning the last layer
- Pseudo Labeling to improve score
- Use Adam, reducing the LR on plateau with patience 2–4 (see the scheduler sketch after this list)
- Use Cyclic LR with SGD
- Reduce the learning rate by a factor of two if validation loss does not improve for two consecutive epochs
- Repeat the worst batch out of 10 batches
- Train with default UNET
- Overlap tiles so that each edge pixel is covered twice
- Hyperparameter tuning: learning rate on training, non-maximum suppression and score threshold on inference
- Remove bounding boxes with low confidence scores
- Train different convolutional neural networks then build an ensemble
- Stop training when the F1 score is decreasing
- Differential learning rates with gradual reduction
- Train ANNs in a stacking way using 5 folds and 30 repeats
- Keep track of your experiments using Neptune.
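For the balanced-sampler and class-aware-sampling tips above, one common approach, assumed here rather than taken from the original solutions, is PyTorch's WeightedRandomSampler, which draws samples from rare classes more often:

```python
# Minimal sketch of class-balanced sampling with WeightedRandomSampler.
# `labels` is a hypothetical per-sample class-label array.
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

labels = np.array([0, 0, 0, 0, 1, 1, 2])        # toy, imbalanced labels
class_counts = np.bincount(labels)
sample_weights = 1.0 / class_counts[labels]      # rare classes get larger weights

sampler = WeightedRandomSampler(
    weights=torch.as_tensor(sample_weights, dtype=torch.double),
    num_samples=len(labels),
    replacement=True,
)

dataset = TensorDataset(torch.arange(len(labels)), torch.as_tensor(labels))
loader = DataLoader(dataset, batch_size=4, sampler=sampler)
print(next(iter(loader))[1])   # labels drawn with class-balanced probabilities
```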
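For the Adam-plus-plateau-scheduling tips, here is a minimal PyTorch sketch using ReduceLROnPlateau that halves the learning rate once the validation loss stalls; the placeholder model, the dummy training step, and the epoch count are assumptions for illustration only:

```python
# Minimal sketch: Adam with ReduceLROnPlateau, halving the LR when the
# monitored validation loss has stopped improving for a few epochs.
import torch

model = torch.nn.Linear(10, 1)                          # hypothetical placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=2       # patience 2 as in the tip above
)

for epoch in range(10):
    # --- dummy training step; replace with the real training loop ---
    x, y = torch.randn(8, 10), torch.randn(8, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    val_loss = loss.item()        # replace with the real validation loss
    scheduler.step(val_loss)      # halves the LR once the loss plateaus
```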
3. Evaluation and cross-validation
- Use a non-uniform split stratified by class
- Avoid overfitting by applying cross-validation while tuning the last layer
- 10-fold CV ensemble for classification
- Combination of 5 10-fold CV ensembles for detection
- Use scikit-learn's StratifiedKFold function (see the sketch after this list)
- 5-fold cross-validation
- Adversarial Validation & Weighting
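The scikit-learn StratifiedKFold mentioned above can be wired up roughly as follows; the toy feature matrix, the labels, and the choice of 5 splits are assumptions for illustration:

```python
# Minimal sketch of stratified 5-fold cross-validation with scikit-learn.
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.random.rand(20, 4)                  # toy features
y = np.array([0] * 15 + [1] * 5)           # imbalanced toy labels (75/25)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    X_train, X_val = X[train_idx], X[val_idx]
    y_train, y_val = y[train_idx], y[val_idx]
    # Each fold preserves the class ratio of `y`; fit one model per fold here.
    print(f"fold {fold}: validation class counts {np.bincount(y_val)}")
```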
4. Ensembling methods
- Use simple majority voting for the ensemble (see the sketch after this list)
- XGBoost on the max malignancy at 3 zoom levels, the z-location and the amount of strange tissue
- LightGBM for models with too many classes. This was done for raw data features only.
- CatBoost for a second-layer model
- Training with 7 features for the gradient boosting classifier
- Use ‘curriculum learning’ to speed up model training: models are first trained on simple samples and then progressively move to harder ones.
- Ensemble with ResNet50, InceptionV3, and InceptionResNetV2
- Ensemble method for object detection
- An ensemble of Mask RCNN, YOLOv3, and Faster RCNN architectures with a classification network (DenseNet-121)
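A minimal sketch of the simple majority-voting idea, applied here to binary segmentation masks; the per-pixel voting rule and the toy masks are assumptions for illustration:

```python
# Minimal sketch of majority voting over binary segmentation masks.
import numpy as np


def majority_vote(masks):
    """masks: list of binary arrays of identical shape, one per model."""
    stacked = np.stack(masks, axis=0)
    votes = stacked.sum(axis=0)
    # A pixel is foreground if more than half of the models say so.
    return (votes > len(masks) / 2).astype(np.uint8)


# toy example with three 2x2 model predictions
m1 = np.array([[1, 0], [1, 1]])
m2 = np.array([[1, 0], [0, 1]])
m3 = np.array([[0, 0], [1, 1]])
print(majority_vote([m1, m2, m3]))   # [[1 0] [1 1]]
```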
5. Post-processing
- Apply test-time augmentation: present an image to the model several times with different random transformations and average the predictions you get (see the sketch after this list)
- Equalize test prediction probabilities instead of only using predicted classes
- Apply geometric mean to the predictions
- Overlap tiles during inference so that each edge pixel is covered at least three times, because UNET tends to produce poor predictions around edge areas.
- Non-maximum suppression and bounding box shrinkage
- Watershed post-processing to separate touching objects in instance segmentation problems (see the sketch after this list).
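As a minimal sketch of test-time augmentation, the snippet below averages predictions over the original image and a horizontally flipped copy; restricting the augmentation set to a single flip, the sigmoid output, and the toy stand-in model are assumptions:

```python
# Minimal sketch of test-time augmentation with a horizontal flip:
# predict on the original and the flipped image, un-flip, and average.
import torch


def tta_predict(model, image):
    """image: tensor of shape (N, C, H, W); model returns per-pixel logits."""
    with torch.no_grad():
        pred = torch.sigmoid(model(image))
        flipped = torch.flip(image, dims=[3])               # horizontal flip
        pred_flipped = torch.sigmoid(model(flipped))
        pred_flipped = torch.flip(pred_flipped, dims=[3])   # undo the flip
    return (pred + pred_flipped) / 2.0


# toy usage with a 1x1 conv as a stand-in segmentation model
model = torch.nn.Conv2d(3, 1, kernel_size=1)
image = torch.randn(1, 3, 64, 64)
averaged = tta_predict(model, image)    # shape (1, 1, 64, 64)
```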
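The watershed post-processing tip can be sketched roughly as follows with SciPy and scikit-image, following the standard distance-transform-plus-markers recipe; the min_distance value and the toy mask are assumptions:

```python
# Minimal sketch of watershed post-processing to split touching objects.
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed


def watershed_split(binary_mask):
    """binary_mask: boolean array; returns an integer label image."""
    distance = ndi.distance_transform_edt(binary_mask)
    # Peaks of the distance transform act as one marker per object.
    coords = peak_local_max(distance, min_distance=5, labels=binary_mask)
    peak_mask = np.zeros(distance.shape, dtype=bool)
    peak_mask[tuple(coords.T)] = True
    markers, _ = ndi.label(peak_mask)
    return watershed(-distance, markers, mask=binary_mask)


# toy mask: two overlapping disks
yy, xx = np.mgrid[:80, :80]
mask = ((xx - 28) ** 2 + (yy - 40) ** 2 < 20 ** 2) | \
       ((xx - 52) ** 2 + (yy - 40) ** 2 < 20 ** 2)
labels = watershed_split(mask)
print(np.unique(labels))   # background 0 plus one label per detected object
```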