severstal-steel-defect-detection's Introduction

Severstal is leading the charge in efficient steel mining and production. The company recently created the country’s largest industrial data lake, with petabytes of data that were previously discarded. Severstal is now looking to machine learning to improve automation, increase efficiency, and maintain high quality in their production.

In this competition, you’ll help engineers improve the algorithm by localizing and classifying surface defects on a steel sheet.

Competition Report

Results

Winning submission:

Public LB   Private LB
0.92124     0.90883

My best submission:

Public LB   Private LB
0.91817     0.91023

My chosen submission:

Public LB   Private LB
0.91844     0.90274

I chose my submission according to public LB score, and ended up rank 55/2436. Silly me!

Models

I used segmentation_models.pytorch (SMP) as a framework for all of my models. It's a really nice package and easy to extend, so I implemented a few of my own encoder and decoder modules.

I used an ensemble of models for my submissions, covered below.

Encoders

I ported EfficientNet to the above framework and had great results. I was hoping this would be a competitive advantage, but during the competition someone added an EfficientNet encoder to SMP and many others started using it. I used the b5 model for most of the competition, and found the smaller models didn't work as well.

I also ported InceptionV4 late in the competition and had pretty good results.

I ported a few other encoders as well, but they didn't yield good results.

I had good results using se_resnext50_32x4d too. I found that because it didn't consume as much memory as the efficientnet-b5, I could use larger batch and image sizes which led to improvements.

Decoders

I used Unet + FPN from SMP. I added Dropout to the Unet implementation.

I implemented Nested Unet such that it could use pretrained encoders, but it didn't yield good results.

Other

I ported DeepLabV3 to SMP but didn't get good results.

Scores

These are the highest (private) scoring single models of each architecture.

Encoder              Decoder   Public LB   Private LB
efficientnet-b5      FPN       0.91631     0.90110
efficientnet-b5      Unet      0.91665     0.89769
se_resnext50_32x4d   FPN       0.91744     0.90038
se_resnext50_32x4d   Unet      0.91685     0.89647
inceptionv4          FPN       0.91667     0.89149

Training

GPU

Early on I used a 2080Ti at home. For the final stretch I rented some Tesla V100s in the cloud. I found that being able to increase the batch size on the V100 (16GB) gave a significant improvement over the 2080Ti (11GB).

Loss

I used (0.6 * BCE) + (0.4 * (1 - Dice)).
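As a rough sketch, such a combined loss on raw logits could look like the class below (the repo's loss.py, which per the acknowledgements borrows from Catalyst, may differ in detail):

import torch
import torch.nn as nn

class BCEDiceLoss(nn.Module):
    """Illustrative sketch of 0.6 * BCE + 0.4 * (1 - Dice) on raw logits."""

    def __init__(self, bce_weight=0.6, dice_weight=0.4, eps=1e-7):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss()
        self.bce_weight = bce_weight
        self.dice_weight = dice_weight
        self.eps = eps

    def forward(self, logits, targets):
        bce = self.bce(logits, targets)
        probs = torch.sigmoid(logits)
        # soft Dice: 2*|X∩Y| / (|X| + |Y|), computed over the whole batch
        intersection = (probs * targets).sum()
        dice = (2 * intersection + self.eps) / (probs.sum() + targets.sum() + self.eps)
        return self.bce_weight * bce + self.dice_weight * (1 - dice)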

Targets

I treated this as 4-class classification (no background class). If a pixel was predicted to have two kinds of defects, the lower-confidence predictions were removed in post-processing.

Optimizer

  • RAdam (see the sketch below)
  • Encoder
    • learning rate 7e-5
    • weight decay: 3e-5
  • Decoders
    • learning rate 3e-3
    • weight decay: 3e-4
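A minimal sketch of these parameter groups using the RAdam that now ships with torch.optim (the original code predates this and used a standalone RAdam implementation, and it also split weight and bias parameters into separate groups, which is omitted here):

import torch
import segmentation_models_pytorch as smp

model = smp.FPN('se_resnext50_32x4d', classes=4)

# Separate parameter groups for encoder and decoder. Newer SMP versions also have
# a .segmentation_head, which would need its own group or inclusion with the decoder.
optimizer = torch.optim.RAdam([
    {'params': model.encoder.parameters(), 'lr': 7e-5, 'weight_decay': 3e-5},
    {'params': model.decoder.parameters(), 'lr': 3e-3, 'weight_decay': 3e-4},
])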

LR Schedule

Flat for 30 epochs, then cosine anneal over 220 epochs. Typically I stopped training around 150-200 epochs.
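One rough way to express this schedule with a stock LambdaLR (the repo's scheduler.py is the authoritative version; this attaches to the optimizer built above):

import math
from torch.optim.lr_scheduler import LambdaLR

def flat_then_cosine(epoch, flat_epochs=30, anneal_epochs=220):
    """Multiplier on the base LR: 1.0 for the first 30 epochs, then cosine anneal to 0."""
    if epoch < flat_epochs:
        return 1.0
    progress = min((epoch - flat_epochs) / anneal_epochs, 1.0)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda=flat_then_cosine)  # step once per epoch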

Image Sizes

256x384, 256x416, 256x448, 256x480

Larger image sizes gave better results, but so did larger batch sizes. The se_resnext50_32x4d encoders could use a batch size of 32-36, while the efficientnet-b5 encoders typically used a batch size of 16-20.

Grayscale Input

The images were provided as 3-channel duplicated grayscale. I modified the models to accept 1 channel input, by recycling pretrained weights. I did a bunch of testing around this as I was worried it might hurt convergence, but using 3-channel input didn't give better results.

I parameterised the recycling of the weights so I could train models using the R, G, or B pretrained weights for the first conv layer. My hope was that this would produce a more diverse model ensemble.
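The recycling can be sketched like this (recycle_first_conv is an illustrative helper, not the repo's API; the name of the encoder's first conv layer depends on the architecture):

import torch
import torch.nn as nn

def recycle_first_conv(conv, channel='r'):
    """Build a 1-channel conv from a pretrained 3-channel conv, keeping the
    weights of one chosen pretrained input channel ('r', 'g', or 'b')."""
    idx = {'r': 0, 'g': 1, 'b': 2}[channel]
    new_conv = nn.Conv2d(1, conv.out_channels, conv.kernel_size,
                         stride=conv.stride, padding=conv.padding,
                         bias=conv.bias is not None)
    with torch.no_grad():
        new_conv.weight.copy_(conv.weight[:, idx:idx + 1])  # (out, 1, kH, kW)
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv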

Augmentation

I used the following Albumentations:

# Imports for an albumentations version contemporary with the competition (~0.3/0.4);
# self.height and self.width come from the surrounding augmentation class.
from albumentations import (Compose, OneOf, CropNonEmptyMaskIfExists, RandomCrop,
                            CLAHE, GaussianBlur, IAASharpen, Flip, Normalize)
from albumentations.pytorch import ToTensor

Compose([
    OneOf([
        CropNonEmptyMaskIfExists(self.height, self.width),
        RandomCrop(self.height, self.width)
    ], p=1),
    OneOf([
        CLAHE(p=0.5),  # modified source to get this to work with grayscale
        GaussianBlur(3, p=0.3),
        IAASharpen(alpha=(0.2, 0.3), p=0.3),
    ], p=1),
    Flip(p=0.5),
    Normalize(mean=[0.3439], std=[0.0383]),
    ToTensor(),
])

I found the mean and std from the training images.

It would have been nice to experiment with more of these, but the long training times made it difficult. I found these augmentations worked better than simple crops/flips and stuck with them.

Validation

I used a random 20% of the training data for validation with each run.

Models were largely selected based on their Mean Dice Coefficient. Where a few models had similar performance I would look at the Dice Coefficient for the most common class and the loss.

High scoring models I trained had a Mean Dice Coefficient around 0.951 - 0.952. Here's an example validation score:

val_dice_0     : 0.9680132865905762
val_dice_1     : 0.9881579875946045
val_dice_2     : 0.8649587631225586
val_dice_3     : 0.9835753440856934
val_dice_mean  : 0.9511765241622925
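For reference, a sketch of a per-class Dice metric along these lines; the repo's metric.py may differ in details such as the handling of empty masks:

import torch

def dice_per_class(probs, targets, threshold=0.5, eps=1e-7):
    """probs, targets: (N, C, H, W) tensors. Returns a (C,) tensor of Dice scores."""
    preds = (probs > threshold).float()
    dims = (0, 2, 3)  # reduce over batch and spatial dimensions
    intersection = (preds * targets).sum(dims)
    denom = preds.sum(dims) + targets.sum(dims)
    return (2 * intersection + eps) / (denom + eps)

# the mean Dice reported above is dice_per_class(probs, targets).mean()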

Pseudo Labels

I used the ensemble outputs of models as pseudo labels, which gave a huge performance boost. I used a custom BatchSampler to undersample (sample rate ~60%) from the pseudo-labelled data, and fix the number of pseudo-labelled samples per batch (each batch would contain 12% pseudo-labelled samples).

Some other people had poor results with pseudo-labels. Perhaps the technique above helped mitigate whatever downsides they faced.
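The repo's sampling.py isn't reproduced here, but the idea can be sketched as a batch sampler that fixes the number of pseudo-labelled samples per batch while only drawing from part of the pseudo-labelled pool each epoch (class name and default numbers below are illustrative):

import random
from torch.utils.data import Sampler

class MixedBatchSampler(Sampler):
    """Yields batches containing a fixed count of pseudo-labelled samples,
    drawn from an undersampled pseudo-label pool."""

    def __init__(self, real_idxs, pseudo_idxs, batch_size=18, n_pseudo=2, sample_rate=0.6):
        self.real_idxs = list(real_idxs)
        self.pseudo_idxs = list(pseudo_idxs)
        self.batch_size = batch_size
        self.n_pseudo = n_pseudo          # fixed pseudo-labelled samples per batch
        self.sample_rate = sample_rate    # fraction of the pseudo pool used per epoch

    def __iter__(self):
        real = random.sample(self.real_idxs, len(self.real_idxs))
        pseudo = random.sample(self.pseudo_idxs,
                               int(len(self.pseudo_idxs) * self.sample_rate))
        n_real = self.batch_size - self.n_pseudo
        for b in range(len(self)):
            yield (real[b * n_real:(b + 1) * n_real]
                   + pseudo[b * self.n_pseudo:(b + 1) * self.n_pseudo])

    def __len__(self):
        n_real = self.batch_size - self.n_pseudo
        n_pseudo_pool = int(len(self.pseudo_idxs) * self.sample_rate)
        return min(len(self.real_idxs) // n_real, n_pseudo_pool // self.n_pseudo)

# usage: DataLoader(dataset, batch_sampler=MixedBatchSampler(real_idxs, pseudo_idxs))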

Mixed Precision

I tried for a long time to get this to work, in order to take advantage of the larger batch sizes it enables. However, no matter what I tried, I had worse convergence using it. Eventually I gave up.

It's possible I was doing something wrong - but I invested a lot of time into trying this, and from talking to others at work it seems like they've had similar issues.

Post Processing & Submission

TTA

Only flip along dim 3 (W). I found TTA wasn't very useful in this competition, and consumed valuable submission time.
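A sketch of that single TTA step, assuming NCHW batches and a model that outputs logits:

import torch

def predict_hflip_tta(model, images):
    """images: (N, C, H, W). Average sigmoid outputs over identity and a flip along W (dim 3)."""
    with torch.no_grad():
        probs = torch.sigmoid(model(images))
        flipped = torch.sigmoid(model(torch.flip(images, dims=[3])))
        probs += torch.flip(flipped, dims=[3])  # un-flip before averaging
    return probs / 2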

Prediction Thresholds

I used 0.5 for each class, i.e. if the output for a pixel was > 0.5, that pixel was positive for the defect.

I was worried that tweaking these would risk overfitting public LB.

Defect Pixel Thresholds

I used 600, 600, 1000, 2000. If an image had fewer than this number of defect pixels for a class, all predictions for that class were set to zero.

Small changes to these values had little effect on the predictions. I was reluctant to make large changes because of the risk I would overfit public LB.
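Combining the two thresholds, the per-image post-processing amounts to something like this (a sketch; postprocess is not a function from the repo):

import numpy as np

PIXEL_THRESHOLDS = [600, 600, 1000, 2000]  # minimum defect pixels per class

def postprocess(probs, prob_threshold=0.5, pixel_thresholds=PIXEL_THRESHOLDS):
    """probs: (4, H, W) array of per-class probabilities for one image."""
    masks = (probs > prob_threshold).astype(np.uint8)
    for c, min_pixels in enumerate(pixel_thresholds):
        if masks[c].sum() < min_pixels:
            masks[c] = 0  # too few positive pixels: drop the whole class prediction
    return masks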

Component Domination

Since my models were set up to predict 4 classes, I was using sigmoid rather than softmax on their outputs, which meant sometimes I got overlapping defect predictions. I had an idea to look at the size of each component, and have the larger components "dominate" (remove) smaller overlapping components. I got a tiny boost from this, but I think it may simply be because at that stage I didn't have another way of ensuring there was only 1 defect prediction at each pixel.

I stopped using this technique in favour of simply taking the highest defect prediction for each pixel.
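That simpler approach boils down to keeping each positive pixel only for its highest-probability class, e.g.:

import numpy as np

def resolve_overlaps(probs, masks):
    """probs, masks: (4, H, W). Keep a positive pixel only for the class with the
    highest probability there, so no pixel carries more than one defect class."""
    best = probs.argmax(axis=0)  # (H, W) index of the most confident class per pixel
    for c in range(masks.shape[0]):
        masks[c] = masks[c] * (best == c)
    return masks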

Dilation

I tried dilating the output prediction masks. Sometimes I got a small improvement, and sometimes got worse results so I stopped using it.

Ensemble Averaging

Here is where I made the mistake that cost me 1st place.

I had been using mean averaging (eg. train 5 models, take the mean prediction for each class for each pixel), and was struggling to break into the gold medal bracket. On the last day, I was reading the discussion forums and started comparing the defect distributions of my output with what others had probed to be the true defect distribution.

It looked like my models were overly conservative, as the number of defects I was detecting was lower than other people and much lower than the probed LB distribution. So, I started thinking about how I could increase the number of defect predictions. I had done some experimentation with pixel thresholds, and found that changing them didn't have much of an effect. I knew that the score was very sensitive to the prediction thresholds, so I was worried about fiddling with that and potentially overfitting to the public LB. Then, I had an idea:

I'd noticed that sometimes I would add new, high-performing models to my ensemble, and my LB score would decrease. I wondered whether this was because mean averaging let the majority of models wash out positive predictions too often. If we're detecting faults, maybe we should weight positive predictions more heavily than negative ones? I decided to try Root Mean Square (RMS) averaging, as this skews towards the higher values. For example:

input: [0.2 0.3 0.7]
Mean:  0.40
RMS:   0.45

input: [0.1 0.2 0.9]
Mean:  0.40
RMS:   0.54

input: [0.4 0.5 0.6]
Mean:  0.50
RMS:   0.51

input: [0.3 0.3 0.8]
Mean:  0.47
RMS:   0.52

input: [0.1 0.8 0.8]
Mean:  0.57
RMS:   0.66
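The figures above follow directly from the two formulas, e.g. for the second example:

import numpy as np

preds = np.array([0.1, 0.2, 0.9])      # one pixel's predictions from three models
mean = preds.mean()                    # 0.40
rms = np.sqrt((preds ** 2).mean())     # 0.54 -- pulled towards the higher values
print(round(mean, 2), round(rms, 2))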

This looks good. If one model's prediction is 0.9, and the others are 0.1 and 0.2, shouldn't we consider that a defect? (No, no we shouldn't. I was wrong.)

But when I tried it, I got a significant improvement on the LB! I went from 0.91809 to 0.91854, which was my best (public) score yet. Unknown to me, my private LB score had just dropped from 0.90876 (winning score) to 0.90259 (rank 55).

I'm pretty new to Kaggle, and while I'd heard about leaderboard "shakeup", I didn't know it could be this severe. I should have selected a 2nd submission from before I started using RMS to average the results - and if I'd picked any of the recent submissions, I would have taken 1st place.

Classification Model

Others on the discussion forums were advocating use of a two-step submission:

  1. Use a classifier to determine whether an image contains each type of fault anywhere.
  2. Ignore segmentation predictions for any class ruled out by the classifier.

The rationale was that false positives were very expensive, due to the way the Dice metric is calculated, so this step could reduce them.

I was pretty skeptical of this approach, and thought it would only be useful early in the competition while the precision of people's convolutional models was poor. But, as the competition progressed and I was struggling to climb the LB, I thought I'd better give it a go.

Since I'd spent so long tuning my fully convolutional segmentation ensemble, I was worried about allowing an "untuned" classifier to veto my segmentation predictions (and tuning it takes time). I decided on a strategy to use the classification prediction to amplify the defect pixel thresholds:

  1. When the classifier output is high (fault), we leave the pixel thresholds at their normal level.
  2. When the classifier output is low (no fault), we raise the pixel threshold by some factor.

The idea was that this would allow a false negative from the classifier to be overruled by a strong segmentation prediction.

def compute_threshold(t0, c_factor, classification_output):
    """
    t0 : numeric
        The original pixel threshold
    c_factor : numeric
        The amount a negative classification output will scale the pixel threshold.
    classification_output : numeric
        The output from a classifier in [0, 1]
    """
    return (t0 * c_factor) - (t0 * (c_factor - 1) * classification_output)
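For example, with the class-1 pixel threshold of 600 and c_factor = 5, the scaling looks like this:

compute_threshold(600, 5, 1.0)   # 600.0  -- confident fault: threshold unchanged
compute_threshold(600, 5, 0.5)   # 1800.0
compute_threshold(600, 5, 0.0)   # 3000.0 -- confident no fault: threshold 5x higher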

Here's an example illustrating how the threshold is scaled with different factors. I tried values 5, 10, and 20.

[Figure: pixel threshold vs. classifier output for c_factor = 5, 10, and 20]

Here's a table comparing the results of my submissions with a classifier against my previous ones. Note that I ran it twice with c_factor = 5, changing some weights in my ensemble between runs.

Config          Public LB   Private LB
No classifier   0.91817     0.90612
c_factor = 5    0.91817     0.91023
c_factor = 5    0.91832     0.90951
c_factor = 10   0.91782     0.90952
c_factor = 20   0.91763     0.90911

Judging by my public LB score, I got zero or tiny improvements using a classifier with c_factor = 5, and increasing it appeared to make the results much worse. Unknown to me, this was actually taking my private LB score from rank 11 to significantly better than rank 1! The first result, where my public LB score didn't increase at all, was actually the highest-scoring submission I made all competition. As far as I know, no one on the discussion board has reported scoring this high on any of their submissions.

I gave up on using a classifier after this, and for the rest of my submissions I used only fully convolutional models. I managed to get similar Private LB scores with a fully convolutional ensemble, but using a classifier may have improved this even further.

Final Ensemble

I used the following fully convolutional ensemble for my final submissions:

  • Unet
    • 2x se_resnext50_32x4d
    • 1x efficientnet-b5
  • FPN
    • 3x se_resnext50_32x4d
    • 1x efficientnet-b5
    • 1x inceptionv4

Averaging Technique   Public LB   Private LB
RMS                   0.91844     0.90274
Mean^                 0.91699     0.90975

^I re-ran my final submission with mean-averaging after the deadline to check its performance.

Submission Scores

Visualisation of scores in the final week of the competition:

[Figure: public and private LB scores of submissions in the final week]

The dip at the end is when I started using RMS averaging.

Submission Kernels

Here are some public kernels showing the scores. There's a lot of copy-pasted code because of the kernels-only requirement of this competition - no easy way around it!

  1. Private LB 0.91023 | Classification + Segmentation Ensemble
  2. Private LB 0.90975 | Fully Convolutional Segmentation Ensemble

Discussion

Improvements

Next time I would like to try:

  • Softmax w/ background class
  • Lovasz Loss
  • Inplace BatchNorm (potentially huge memory saving)

And of course, manually choose two submissions that are appropriately diverse.

Usage

Folder Structure

severstal-steel-defect-detection/
│
├── sever/
│    │
│    ├── cli.py - command line interface
│    ├── main.py - top level entry point to start training
│    │
│    ├── base/ - abstract base classes
│    │   ├── base_model.py - abstract base class for models
│    │   └── base_trainer.py - abstract base class for trainers
│    │
│    ├── data_loader/ - anything about data loading goes here
│    │   ├── augmentation.py
│    │   ├── data_loaders.py
│    │   ├── datasets.py
│    │   ├── process.py - pre/post processing, RLE conversion, etc
│    │   └── sampling.py - class balanced sampling, used for pseudo labels
│    │
│    ├── model/ - anything to do with nn.Modules, metrics, learning rates, etc
│    │   ├── loss.py
│    │   ├── metric.py
│    │   ├── model.py
│    │   ├── optimizer.py
│    │   └── scheduler.py
│    │
│    ├── trainer/ - training loop
│    │   └── trainer.py
│    │
│    └── utils/
│        .
│
├── logging.yml - logging configuration
├── data/ - training data goes here
├── experiments/ - configuration files for training
├── saved/ - checkpoints, logging, and tensorboard records will be saved here
└── tests/

Environment

Create and activate the Anaconda environment using:

$ conda env create --file environment.yml
$ conda activate sever

Note that the models used here are in a mirror/fork of SMP. If you want to use the same models, you'll need to clone it and install it into the conda environment using:

$ git clone [email protected]:khornlund/segmentation-models-pytorch.git
$ cd segmentation-models-pytorch/
$ git checkout efficietnet
$ pip install -e .

Note there are some slight differences between my EfficientNet implementation, and the one that is now in SMP upstream. The key difference is I modified the encoders to support a configurable number of input channels, so I could use 1 channel grayscale input.

Download

You can download the data using download.sh. Note this assumes you have your kaggle.json token set up to use the Kaggle API.

Training

Setup your desired configuration file, and point to it using:

$ sever train -c experiments/config.yml

Upload Trained Model

Checkpoints can be uploaded to Kaggle using:

$ sever upload -r <path-to-saved-run> -e <epoch-num>

The checkpoint file is inferred from the epoch number. You can select multiple epochs to upload, e.g.

$ sever upload -r saved/sever-unet-b5/1026-140000 -e 123 -e 234

Tensorboard Visualization

This project supports TensorBoard visualization (https://pytorch.org/docs/stable/tensorboard.html).

  1. Run training

    Set the tensorboard option in the config file to true.

  2. Open the TensorBoard server

    Run tensorboard --logdir saved/ at the project root; the server will open at http://localhost:6006.

Acknowledgments

This project uses the Cookiecutter PyTorch template.

Various code has been copied from GitHub or Kaggle. In general I noted in the docstring where I copied it from, but if I haven't referenced something properly, I apologise. I know that for a bunch of the loss functions I took code from Catalyst.

severstal-steel-defect-detection's Issues

Segmentation-Models-Pytorch package not found in environment

After installing segmentation-models-pytorch, the package is not found in the environment's site-packages. Even the sever package is not present. Installing segmentation-models-pytorch separately provides the package, but compatibility errors with other packages arise during installation.
How can we sort out this issue?

Training not started and read error

I have been experiencing problems while following the training procedure. Please suggest how to solve the following error.

(sever) D:\RCNN\ssdd>sever train -c experiments/fpn-b5.yml
4608 - Runner - INFO - Using random seed: 447676
4608 - Runner - DEBUG - Building model architecture
Load result: None
4608 - Runner - DEBUG - Using device 0 of [0]
4608 - Runner - DEBUG - Building optimizer and lr scheduler
4608 - Runner - INFO - Found 308 encoder weight params
4608 - Runner - INFO - Found 193 encoder bias params
4608 - Runner - INFO - Found 19 decoder weight params
4608 - Runner - INFO - Found 12 decoder bias params
4608 - Runner - DEBUG - Getting augmentations

demo--------->>>
4608 - Runner - DEBUG - Getting data_loader instance
path: data\raw\severstal-steel-defect-detection
file: train.csv
path: data\raw\severstal-steel-defect-detection
file: pseudo.csv
4608 - SamplerFactory - INFO - Creating type...
4608 - SamplerFactory - INFO - Sample population absolute class sizes: [5332 1801]
4608 - SamplerFactory - INFO - Sample population relative class sizes: [0.74751157 0.25248843]
4608 - SamplerFactory - INFO - Target batch class distribution [0.7846383 0.2153617] using alpha=-0.15
4608 - SamplerFactory - INFO - Rounded batch class distribution [0.77777778 0.22222222]
4608 - SamplerFactory - INFO - Expecting [14 4] samples of each class per batch, over 396 batches of size 18
4608 - SamplerFactory - INFO - Sampling rates: [1.03975994 0.87951138]

Ademo--------->>>
4608 - Runner - DEBUG - Getting loss and metric function handles
4608 - Runner - DEBUG - Initialising trainer
4608 - Trainer - INFO - Freezing encoder weights
4608 - Trainer - INFO - Starting training...
4608 - Trainer - INFO - Unfreezing encoder weights
Traceback (most recent call last):
  File "C:\Users\USER\anaconda3\envs\sever\Scripts\sever-script.py", line 33, in <module>
    sys.exit(load_entry_point('sever', 'console_scripts', 'sever')())
  File "C:\Users\USER\anaconda3\envs\sever\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\USER\anaconda3\envs\sever\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "C:\Users\USER\anaconda3\envs\sever\lib\site-packages\click\core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users\USER\anaconda3\envs\sever\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\USER\anaconda3\envs\sever\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "d:\rcnn\ssdd\sever\cli.py", line 37, in train
    Runner(config).train(resume)
  File "d:\rcnn\ssdd\sever\main.py", line 77, in train
    trainer.train()
  File "d:\rcnn\ssdd\sever\base\base_trainer.py", line 59, in train
    result = self._train_epoch(epoch)
  File "d:\rcnn\ssdd\sever\trainer\trainer.py", line 67, in _train_epoch
    for batch_idx, (data, target) in enumerate(self.data_loader):
  File "C:\Users\USER\anaconda3\envs\sever\lib\site-packages\torch\utils\data\dataloader.py", line 819, in __next__
    return self._process_data(data)
  File "C:\Users\USER\anaconda3\envs\sever\lib\site-packages\torch\utils\data\dataloader.py", line 846, in _process_data
    data.reraise()
  File "C:\Users\USER\anaconda3\envs\sever\lib\site-packages\torch\_utils.py", line 369, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "C:\Users\USER\anaconda3\envs\sever\lib\site-packages\torch\utils\data\_utils\worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Users\USER\anaconda3\envs\sever\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\USER\anaconda3\envs\sever\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "d:\rcnn\ssdd\sever\data_loader\datasets.py", line 56, in __getitem__
    augmented = self.transforms(image=img, mask=mask)
  File "d:\rcnn\ssdd\sever\data_loader\augmentation.py", line 61, in __call__
    return self.transform(*args, **kwargs)
  File "C:\Users\USER\anaconda3\envs\sever\lib\site-packages\albumentations\core\composition.py", line 176, in __call__
    data = t(force_apply=force_apply, **data)
  File "C:\Users\USER\anaconda3\envs\sever\lib\site-packages\albumentations\core\composition.py", line 223, in __call__
    data = t(force_apply=True, **data)
  File "C:\Users\USER\anaconda3\envs\sever\lib\site-packages\albumentations\core\transforms_interface.py", line 87, in __call__
    return self.apply_with_params(params, **kwargs)
  File "C:\Users\USER\anaconda3\envs\sever\lib\site-packages\albumentations\core\transforms_interface.py", line 94, in apply_with_params
    params = self.update_params(params, **kwargs)
  File "C:\Users\USER\anaconda3\envs\sever\lib\site-packages\albumentations\core\transforms_interface.py", line 142, in update_params
    params.update({"cols": kwargs["image"].shape[1], "rows": kwargs["image"].shape[0]})
AttributeError: 'NoneType' object has no attribute 'shape'

unexpected keyword argument

Hello,
when I run "sever train -c experiments/unet-b5.yml", there is an error:
TypeError: __init__() got an unexpected keyword argument 'dropout'
TypeError: __init__() got an unexpected keyword argument 'weight_std'
Then I deleted the keywords and it was able to run successfully. Why does this happen? Have you deleted some modules?

Resume training is not working

I successfully started training, but when I tried resuming from the last epoch, it started training again from epoch 0.

(steel) C:\Users\User\Documents\sdd>sever train --resume "C:\Users\User\Documents\sdd\saved\sever-FPN-efficientnet-b5-BCEDiceLoss-RAdam\1008_124332\checkpoints\checkpoint-epoch179.pth"
5740 - Runner - INFO - Using random seed: 447676
5740 - Runner - DEBUG - Building model architecture
Load result: None
5740 - Runner - DEBUG - Using device 0 of [0]
5740 - Runner - DEBUG - Building optimizer and lr scheduler
5740 - Runner - INFO - Found 308 encoder weight params
5740 - Runner - INFO - Found 193 encoder bias params
5740 - Runner - INFO - Found 19 decoder weight params
5740 - Runner - INFO - Found 12 decoder bias params
5740 - Runner - INFO - Loading checkpoint: C:\Users\User\Documents\sdd\saved\sever-FPN-efficientnet-b5-BCEDiceLoss-RAdam\1008_124332\checkpoints\checkpoint-epoch179.pth
5740 - Runner - INFO - Checkpoint "C:\Users\User\Documents\sdd\saved\sever-FPN-efficientnet-b5-BCEDiceLoss-RAdam\1008_124332\checkpoints\checkpoint-epoch179.pth" loaded
5740 - Runner - DEBUG - Getting augmentations
5740 - Runner - DEBUG - Getting data_loader instance
5740 - SamplerFactory - INFO - Creating type...
5740 - SamplerFactory - INFO - Sample population absolute class sizes: [5332 1801]
5740 - SamplerFactory - INFO - Sample population relative class sizes: [0.74751157 0.25248843]
5740 - SamplerFactory - INFO - Target batch class distribution [0.7846383 0.2153617] using alpha=-0.15
5740 - SamplerFactory - INFO - Rounded batch class distribution [0.75 0.25]
5740 - SamplerFactory - INFO - Expecting [6 2] samples of each class per batch, over 891 batches of size 8
5740 - SamplerFactory - INFO - Sampling rates: [1.00262566 0.98945031]
5740 - Runner - DEBUG - Getting loss and metric function handles
5740 - Runner - DEBUG - Initialising trainer
5740 - Trainer - INFO - Freezing encoder weights
5740 - Trainer - INFO - Starting training...
5740 - Trainer - INFO - Unfreezing encoder weights
C:\Users\User\Anaconda3\envs\steel\lib\site-packages\torch\nn\functional.py:1350: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
5740 - Trainer - DEBUG - Train Epoch: 0 [0/7128 (0%)] Loss: 0.049221
5740 - Trainer - DEBUG - Train Epoch: 0 [128/7128 (2%)] Loss: 0.051688

train

Hello, how can I train your model?
