Model Training Comparison Results
  • Encoder and Weights
  • Model Architecture
  • Optimizers
  • Loss Functions
  • Conclusions
Model Encoder and Weights

I tested model encoder / encoder weights pairs with the following fixed choices: Architecture: Unet, Optimizer: Adam, LR: 1e-4, Loss Function: Dice Loss
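
The encoder names used below follow the conventions of segmentation_models_pytorch, so that library is assumed in the sketch that follows; it shows how a single encoder / weights trial could be set up under the fixed choices above. The build_trial helper and the binary-segmentation settings (in_channels=3, classes=1) are illustrative assumptions rather than the exact training code.

    import torch
    import segmentation_models_pytorch as smp

    def build_trial(encoder_name, encoder_weights):
        # Fixed choices: Unet architecture, Adam at LR 1e-4, Dice loss.
        model = smp.Unet(
            encoder_name=encoder_name,        # e.g. "mit_b5", "efficientnet-b7"
            encoder_weights=encoder_weights,  # e.g. "imagenet", "advprop"
            in_channels=3,
            classes=1,                        # binary segmentation assumed
        )
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
        loss_fn = smp.losses.DiceLoss(mode="binary")
        return model, optimizer, loss_fn

    # Sweep over the pairs compared below, keeping everything else fixed.
    # (The library registers the b8 encoder as "timm-efficientnet-b8".)
    pairs = [
        ("senet154", "imagenet"),
        ("efficientnet-b7", "imagenet"),
        ("efficientnet-b7", "advprop"),
        ("timm-efficientnet-b8", "advprop"),
        ("mit_b5", "imagenet"),
    ]
    trials = {f"{enc}/{w}": build_trial(enc, w) for enc, w in pairs}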

From inspection, the best performing model encoder / weights pairs were:

  • senet154 / imagenet - good results initially, but surpassed by other encoder/weight pairs with further training
  • efficientnet-b7 / advprop - a slower initial curve than efficientnet-b7 / imagenet, with initially worse accuracy and F1 scores, but it surpassed that pair given enough epochs
  • timm_efficientnet-b8 / advprop - some of the best results, but with a curve that slows; possibly with further epochs it would be surpassed by efficientnet-b7 / advprop
  • mit_b5 / imagenet - performed well across the board, but was surpassed at later epochs. Of particular note, the training and validation scores stayed consistently close.

Further observe:

  • resnet34 / imagenet, resnet152 / imagenet, resnest101_32x8d / imagenet - the validation scores for multiple metrics decrease with further epochs, suggestive of overfitting

The two best performers appeared to be timm_efficientnet-b8 / advprop and mit_b5 / imagenet. From this first pass, mit_b5 was the most consistent baseline, so I will use it to test the other model choices and perhaps revisit the encoder after further testing.

Model Architecture

I tested the model architectures with the following fixed choices: Model encoder / weights: mit_b5 / imagenet, Optimizer: Adam, LR: 1e-4, Loss Function: Dice Loss
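
A sketch of how that sweep might look, again assuming a recent segmentation_models_pytorch release that exposes UPerNet and Segformer alongside the classic decoders; the guard simply skips any architecture/encoder combination the installed version does not support:

    import segmentation_models_pytorch as smp

    # Architectures compared, all sharing the fixed mit_b5 / imagenet encoder.
    ARCHITECTURES = ["Unet", "FPN", "DeepLabV3", "DeepLabV3Plus", "UPerNet", "Segformer"]

    models = {}
    for arch in ARCHITECTURES:
        try:
            model_cls = getattr(smp, arch)  # e.g. smp.UPerNet
            models[arch] = model_cls(
                encoder_name="mit_b5",
                encoder_weights="imagenet",
                in_channels=3,
                classes=1,
            )
        except (AttributeError, ValueError) as err:
            print(f"Skipping {arch}: {err}")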

I observe:

  • The best performers were FPN (although a bit noisy), UPerNet, and Segformer.
  • DeepLabV3 outperformed DeepLabV3Plus slightly.

Overall, the best appears to be UPerNet; in particular, for the given choices its curves were generally smoother, with less variance.

Optimizers

I tested the optimizers with the following fixed choices: Model Architecture: UPerNet, Model encoder / weights: mit_b5 / imagenet, LR: 1e-4, Loss Function: Dice Loss
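
A minimal sketch of that sweep, assuming the stock torch.optim classes; each optimizer gets a fresh copy of the fixed UPerNet + mit_b5 / imagenet model and the same LR of 1e-4 (for Rprop the lr argument is interpreted as the initial step size):

    import torch

    OPTIMIZER_CLASSES = {
        "Adadelta": torch.optim.Adadelta,
        "ASGD": torch.optim.ASGD,
        "SGD": torch.optim.SGD,
        "Adam": torch.optim.Adam,
        "AdamW": torch.optim.AdamW,
        "Rprop": torch.optim.Rprop,
    }

    def make_optimizer(name, model):
        # All optimizers are constructed the same way at the fixed LR of 1e-4.
        return OPTIMIZER_CLASSES[name](model.parameters(), lr=1e-4)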

Observe:
  • Adadelta, ASGD, SGD - the validation curves showed minimal improvement and were noisy. I suspect further hyperparameter tuning would be necessary for these optimizers.
  • Adam, AdamW - Quite comparable curves and performance, with the edge to Adam.

The best optimizers appeared to be Adam and Rprop.

Loss Functions

I tested the loss functions with the following fixed choices: Model encoder / weights: mit_b5 / imagenet, Model Architecture: UPerNet, Optimizer: Adam, LR: 1e-4

The loss functions generally yielded similar results, without much differentiation. This may in part be because the other choices are already well optimized, making differences hard to distinguish at this scale.

As it stands, I think the combined loss function is a very reasonable choice, given its successful usage in image segmentation tasks.
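
The text does not pin down which combination is meant; a common pattern for binary segmentation is a weighted sum of Dice and BCE, sketched below as an illustration (the CombinedLoss class and the 0.5 / 0.5 weighting are assumptions, not necessarily the exact loss used here):

    import torch
    import segmentation_models_pytorch as smp

    class CombinedLoss(torch.nn.Module):
        # Illustrative weighted sum of Dice and BCE; both terms expect raw logits.
        def __init__(self, dice_weight=0.5, bce_weight=0.5):
            super().__init__()
            self.dice = smp.losses.DiceLoss(mode="binary")
            self.bce = smp.losses.SoftBCEWithLogitsLoss()
            self.dice_weight = dice_weight
            self.bce_weight = bce_weight

        def forward(self, logits, target):
            return (self.dice_weight * self.dice(logits, target)
                    + self.bce_weight * self.bce(logits, target))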