Competition Leaderboard: https://www.kaggle.com/c/siim-isic-melanoma-classification/leaderboard
Solution overview: https://www.kaggle.com/c/siim-isic-melanoma-classification/discussion/175633
Model training and predictions were performed in Kaggle TPU Kernels.
- Python 3.7.6 (Python packages are detailed separately in
requirements.txt
) - Kaggle's TPU v3-8 (8 cores)
- SIIM-ISIC Melanoma Classification
- ISIC 2019 TFRecords 256x256
- ISIC 2019 TFRecords 384x384
- ISIC 2019 TFRecords 512x512
- ISIC 2019 TFRecords 768x768
- Melanoma TFRecords 256x256
- Melanoma TFRecords 384x384
- Melanoma TFRecords 512x512
- Melanoma TFRecords 768x768
- melanoma_hairs
The source code for model training is published in this Kaggle notebook. It can also be found in "./kaggle_notebooks"
.
We trained 8 image models, as shown below:
Model | Data | Image size | Epochs | Hair augmentation |
---|---|---|---|---|
1 | 2020 | 256x256 | 13 | |
2 | 2020 | 384x384 | 15 | |
3 | 2020 | 512x512 | 15 | |
4 | 2020 | 768x768 | 15 | |
5 | 2019-2020 | 256x256 | 25 | X |
6 | 2019-2020 | 384x384 | 25 | X |
7 | 2019-2020 | 512x512 | 12 | X |
8 | 2019-2020 | 768x768 | 10 |
To reproduce our results, the Kaggle notebook must be forked and executed 8 times, one for each model, changing only the content of the first cell (input) each time.
For example, for model 1, the content of the input cell should be:
tfrec_shape = 256
comp_data = "2020"
For model 6, however, it should be:
tfrec_shape = 384
comp_data = "2019-2020"
Training time for a single model ranges from ~ 0.5 to ~ 3 hours (all models can be fitted within 3 hours Kaggle's TPU limit for a single session).
Our trained weights can be found here.
The source code for making predictions is published in this Kaggle notebook. It can also be found in "./kaggle_notebooks"
.
Again, to reproduce our results, the Kaggle notebook must be forked and executed 8 times, one for each model, changing only the content of the first cell (input) each time.
Our image-models submission files can be found in "./submissions/image_data"
.
For metadata, we used a simple baseline model proposed by @titericz, which can be found here.
Our metadata-model submission file can be found in "./submissions/metadata"
.
The script ensembling.py
ensembles the image-models submission files located in "./submissions/image_data"
whit a single metadata-model submission file located in "./submissions/metadata"
. The script allows you to choose whether to include external data (e.g. from other public notebooks) or not. External image-models submissions must be located in "./submissions/external_image_data"
. The resulting ensemble file is saved in the destination path: "./ensemble"
.
Shell commands, as run from the top level directory:
python ensembling.py [-h] [--include_external EXTERNAL] [--metadata_weight WEIGHT] [--ensemble_filename FILENAME]
Argument | Function | Default |
---|---|---|
-h, --help | Shows help message and exits | |
--include_external EXTERNAL | Whether to include external data (1) or not (0) | 0 |
--metadata_weight WEIGHT | Weight assigned to metadata (0-1) | 0.2 |
--ensemble_filename FILENAME | The desired name for the ensemble file | 'ensemble' |
Our winning solution, which scored 0.9484 private LB, 0.9620 public LB, was obtained by the following parameterization:
python ensembling.py --include_external 1 --metadata_weight 0.3 --ensemble_filename 'ensemble_external'
In this solution, we decided to ensemble two public notebooks to our image-models, in order to add some diversity (this one from @rajnishe and this one from @ajaykumar7778). The corresponding ensemble file can be found in "./ensemble/ensemble_external.csv"
.
However, a higher score can be obtained without using external data, with the following parameterization:
python ensembling.py --include_external 0 --metadata_weight 0.2 --ensemble_filename 'ensemble'
The last ensemble scored 0.9489 private LB, 0.9592 public LB, but was not selected as a final submission for judging. It can be found in "./ensemble/ensemble.csv"
.