Hi, Thanks for building this cool library. I've been trying to run s

Unable to reproduce PSPNet/FCN8 results about pytorch-segmentation HOT 23 CLOSED

aicaffeinelife commented on September 12, 2024

Unable to reproduce PSPNet/FCN8 results

from pytorch-segmentation.

Comments (23)

yassouali commented on September 12, 2024

Hi,

Yes, you're right, the results need to be higher. The last time I ran some experiments with PSPNet and a resnet50 backbone (the checkpoint provided), I got 82.5 mIoU (you can see it in the checkpoint). The config used was as follows:

{'name': 'PSPNet',
 'n_gpu': 2,
 'arch': {'type': 'PSPNet',
  'args': {'backbone': 'resnet50',
   'freeze_bn': False,
   'freeze_backbone': False}},
 'train_loader': {'type': 'VOC',
  'args': {'data_dir': '-----',
   'batch_size': 24,
   'base_size': 420,
   'crop_size': 380,
   'augment': True,
   'shuffle': True,
   'scale': True,
   'flip': True,
   'rotate': True,
   'blur': False,
   'split': 'train_aug',
   'num_workers': 8}},
 'val_loader': {'type': 'VOC',
  'args': {'data_dir': '------',
   'crop_size': 480,
   'batch_size': 24,
   'val': True,
   'split': 'val',
   'num_workers': 8}},
 'optimizer': {'type': 'SGD',
  'differential_lr': True,
  'args': {'lr': 0.01, 'weight_decay': 0.0001, 'momentum': 0.9}},
 'loss': 'CrossEntropyLoss2d',
 'ignore_index': 255,
 'lr_scheduler': {'type': 'Poly', 'args': {}},
 'trainer': {'epochs': 100,
  'save_dir': 'saved/',
  'save_period': 5,
  'monitor': 'max Mean_IoU',
  'early_stop': 10,
  'tensorboard': True,
  'log_dir': 'saved/runs',
  'log_per_iter': 20,
  'val': True,
  'val_per_epochs': 5}}

So yes, you can push the batch size a lot higher with 4 GPUs (I don't know which GPUs you have, but the config above was used on two P100s with 24 GB of GPU memory). You can test with a batch size of 32.

aicaffeinelife commented on September 12, 2024

Alright, I can try that and get back to you.

aicaffeinelife commented on September 12, 2024

So, I tried with a batch size of 24 (the idea was to keep going until at least 79% mIoU was obtained), but this is where I was at after 100 epochs:

Reading your config again, I see that you do not use sync_bn = True in it, could synchronized batchnorm be the issue?

yassouali commented on September 12, 2024

Actually yes, I added synch bn after those tests, can you try and see if that solves the issue? Thanks
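
For context on why synchronized batch norm could matter here: assuming the model is wrapped in `torch.nn.DataParallel` (an assumption; the thread does not say how the multi-GPU split is done), each batch is divided across the GPUs, and a regular BatchNorm layer computes its statistics from only the per-GPU share. A minimal sketch of that arithmetic, using the batch sizes and GPU counts mentioned in this thread (the helper name `per_gpu_batch` is hypothetical):

```python
def per_gpu_batch(batch_size: int, n_gpu: int) -> int:
    """Largest per-GPU chunk when a batch is split across n_gpu devices."""
    if n_gpu < 1:
        raise ValueError("n_gpu must be at least 1")
    # Ceiling division, so an uneven split reports the largest chunk.
    return -(-batch_size // n_gpu)


# (batch_size, n_gpu) pairs mentioned in this thread.
for batch_size, n_gpu in [(24, 2), (24, 4), (32, 4)]:
    share = per_gpu_batch(batch_size, n_gpu)
    print(f"batch={batch_size}, gpus={n_gpu} -> {share} samples per GPU")
```

Under that assumption, a batch of 24 gives 12 samples per device on two GPUs but only 6 on four, so unsynchronized BatchNorm statistics would be estimated from half as many samples in the 4-GPU run. Whether this explains the gap is not established in the thread.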

aicaffeinelife commented on September 12, 2024

Hey, thanks for the quick response. Let me try to turn that to False and get back.

aicaffeinelife commented on September 12, 2024

I tried doing what you suggested. However, it seems that it didn't help.

Here is a screenshot of the last two evaluations:

And here are the validation characteristics:

You can see that the val loss begins diverging after about 4k iterations. I've checked the learning rates, and both follow the poly decay. I also recall that the authors reported their results with a ResNet101. What do you think?
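
As a quick way to check a logged learning rate against the expected curve, here is a minimal sketch of the commonly used polynomial ("poly") decay, lr = base_lr * (1 - iter / max_iter) ** power. The exponent 0.9 is the value commonly used with DeepLab and PSPNet and is an assumption here; the repository's `Poly` scheduler may use a different exponent or add warmup. The total of 100,000 iterations is purely illustrative.

```python
def poly_lr(base_lr: float, cur_iter: int, max_iter: int, power: float = 0.9) -> float:
    """Polynomial decay: base_lr at iteration 0, falling to 0 at max_iter."""
    if max_iter <= 0:
        raise ValueError("max_iter must be positive")
    if not 0 <= cur_iter <= max_iter:
        raise ValueError("cur_iter must lie in [0, max_iter]")
    return base_lr * (1 - cur_iter / max_iter) ** power


base_lr = 0.01       # 'lr' from the configs in this thread
max_iter = 100_000   # hypothetical total number of training iterations
for it in (0, 4_000, 50_000, 100_000):
    print(it, f"{poly_lr(base_lr, it, max_iter):.6f}")
```

If `differential_lr` gives the parameter groups different base rates, each group would be expected to trace this same curve scaled by its own base rate.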

yassouali commented on September 12, 2024

The results certainly need to be higher. Do you start training from scratch? What learning rate did you use? Which GPUs? Can you post the last config you tried?

aicaffeinelife commented on September 12, 2024

Training is from scratch with lr = 0.01 and differential_lr = True. The GPUs are Volta V100s connected over NVLink. This is the config I tried:

{
    "name": "PSPNet",
    "n_gpu": 4,
    "use_synch_bn": false,

    "arch": {
        "type": "PSPNet",
        "args": {
            "backbone": "resnet50",
            "freeze_bn": false,
            "freeze_backbone": false
        }
    },

    "train_loader": {
        "type": "VOC",
        "args":{
            "data_dir": "/path/to/VOC/",
            "batch_size": 24,
            "base_size": 400,
            "crop_size": 380,
            "augment": true,
            "shuffle": true,
            "scale": true,
            "flip": true,
            "rotate": true,
            "blur": false,
            "split": "train_aug",
            "num_workers": 8
        }
    },

    "val_loader": {
        "type": "VOC",
        "args":{
            "data_dir": "/path/to/VOC",
            "batch_size": 24,
            "crop_size": 480,
            "val": true,
            "split": "val_aug",
            "num_workers": 4
        }
    },

    "optimizer": {
        "type": "SGD",
        "differential_lr": true,
        "args":{
            "lr": 0.01,
            "weight_decay": 1e-4,
            "momentum": 0.9
        }
    },

    "loss": "CrossEntropyLoss2d",
    "ignore_index": 255,
    "lr_scheduler": {
        "type": "Poly",
        "args": {}
    },

    "trainer": {
        "epochs": 120,
        "save_dir": "saved/",
        "save_period": 10,
        "monitor": "max Mean_IoU",
        "early_stop": 10,
        "tensorboard": true,
        "log_dir": "saved/runs",
        "log_per_iter": 20,

        "val": true,
        "val_per_epochs": 5
    }
}
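
Since the two runs differ in more than the batch-norm setting, it can help to diff the configs mechanically. The sketch below is a generic nested-dict comparison (the helper `diff_configs` is hypothetical and not part of pytorch-segmentation), applied to an excerpt of the maintainer's config and the one above. Only a few keys are copied here for brevity.

```python
def diff_configs(a: dict, b: dict, prefix: str = "") -> dict:
    """Return {dotted_key: (value_in_a, value_in_b)} for every differing leaf."""
    diffs = {}
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}{key}"
        va, vb = a.get(key), b.get(key)
        if isinstance(va, dict) and isinstance(vb, dict):
            # Recurse into nested sections such as 'train_loader' -> 'args'.
            diffs.update(diff_configs(va, vb, prefix=path + "."))
        elif va != vb:
            diffs[path] = (va, vb)
    return diffs


# Excerpts of the two configs posted in this thread.
maintainer = {
    "n_gpu": 2,
    "train_loader": {"args": {"batch_size": 24, "base_size": 420, "crop_size": 380}},
    "val_loader": {"args": {"split": "val", "num_workers": 8}},
    "trainer": {"epochs": 100, "save_period": 5},
}
reporter = {
    "n_gpu": 4,
    "train_loader": {"args": {"batch_size": 24, "base_size": 400, "crop_size": 380}},
    "val_loader": {"args": {"split": "val_aug", "num_workers": 4}},
    "trainer": {"epochs": 120, "save_period": 10},
}

for key, (old, new) in diff_configs(maintainer, reporter).items():
    print(f"{key}: {old!r} -> {new!r}")
```

In this excerpt the differences are the GPU count, `base_size`, the validation split (`val` versus `val_aug`), the number of validation workers, and the epoch and save settings. Whether any of them accounts for the gap in mIoU is not established in the thread.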

yassouali commented on September 12, 2024

I'll launch some experiments and I'll get back to you

aicaffeinelife commented on September 12, 2024

Any update @yassouali?

yassouali commented on September 12, 2024

I ran some tests using your config and got the same results. I'll try to find the problem ASAP.

aicaffeinelife commented on September 12, 2024

I'm running some experiments with some of my own intuitions and ideas. I'll get back to you with them and if I can reproduce the results, I'll share the config.

mingcv commented on September 12, 2024

Mine is worse than that with the following config, trained from scratch.
The GPU is an RTX 2080Ti, and I only have one device available.

{
  "name": "PSPNet",
  "n_gpu": 1,
  "arch": {
    "type": "PSPNet",
    "args": {
      "backbone": "resnet50",
      "freeze_bn": false,
      "freeze_backbone": false
    }
  },
  "train_loader": {
    "type": "VOC",
    "args": {
      "data_dir": "/home/huqiming/datasets",
      "batch_size": 8,
      "base_size": 420,
      "crop_size": 380,
      "augment": true,
      "shuffle": true,
      "scale": true,
      "flip": true,
      "rotate": true,
      "blur": false,
      "split": "trainval_aug",
      "num_workers": 8
    }
  },
  "val_loader": {
    "type": "VOC",
    "args": {
      "data_dir": "/home/huqiming/datasets",
      "crop_size": 480,
      "batch_size": 8,
      "val": true,
      "split": "val",
      "num_workers": 8
    }
  },
  "optimizer": {
    "type": "SGD",
    "differential_lr": true,
    "args": {
      "lr": 0.01,
      "weight_decay": 1e-4,
      "momentum": 0.9
    }
  },
  "loss": "CrossEntropyLoss2d",
  "ignore_index": 255,
  "lr_scheduler": {
    "type": "Poly",
    "args": {}
  },
  "trainer": {
    "val_on_start": false,
    "epochs": 200,
    "save_dir": "saved/",
    "save_period": 10,
    "monitor": "max Mean_IoU",
    "if_early_stop": false,
    "early_stop": 10,
    "tensorboard": true,
    "log_dir": "saved/runs",
    "log_per_iter": 20,
    "show_per_batch": 5,
    "val": true,
    "val_per_epochs": 1
  }
}

aicaffeinelife commented on September 12, 2024

Hey @huqiming513, try with a bigger batch size

mingcv commented on September 12, 2024

Hi @aicaffeinelife, I used the pretrained backbone and the results improved hugely. The config is the same as above, except that the training batch size is only 4. If you also used the pretrained backbone, the resnet50 backbone may be a better choice. :)

aicaffeinelife commented on September 12, 2024

Which pre-trained backbone? The one provided by official PyTorch?

aicaffeinelife commented on September 12, 2024

Okay, some interesting results: I got substantially higher results when I trained on the trainval set and evaluated on Pascal VOC. This was tested on only one GPU with a batch size of 4. I'm going to keep the thread open and post some other results.

Dongshengjiang commented on September 12, 2024

I got a similar accuracy of 0.71 Mean_IoU with the default config provided in the GitHub code. Could you please attach the config you used to get the highest accuracy? Another question is what the results are for UNet and the other networks. It would be nice to show them in your README.
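
For readers comparing the numbers in this thread (82.5 is quoted as a percentage, 0.71 as a fraction), here is a minimal sketch of how mean IoU is typically computed from predicted and ground-truth label maps, skipping pixels labeled with `ignore_index` (255 in the configs above). This is a generic pure-Python illustration, not the repository's metric code, and it averages only over classes that appear in the prediction or the ground truth, which is one of several conventions.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection-over-union over classes present in pred or target."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel: excluded from the metric
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            # A mismatch adds one pixel to the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Flattened toy label maps with 3 classes; the last pixel is ignored.
pred = [0, 0, 1, 1, 2, 2, 0]
target = [0, 1, 1, 1, 2, 0, 255]
print(round(mean_iou(pred, target, num_classes=3), 4))  # per-class IoU: 1/3, 2/3, 1/2
```

Differences in which classes are averaged, or in whether the metric is accumulated over the whole validation set or per batch, can shift the reported value, so numbers from different scripts are not always directly comparable.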

yassouali commented on September 12, 2024

Unfortunately right now I am quite busy. Hopefully soon I will retrain and post the results for the other model / datasets and the optimal config for each model. Until then, let's keep this issue open.

Dongshengjiang commented on September 12, 2024

Thanks.

yassouali commented on September 12, 2024

Hi, sorry, I still don't have time to investigate this (a lot of conference deadlines lately). If anyone finds a solution (or some errors in the code), please feel free to make a pull request to fix this.

Hopefully I'll have some time in the coming weeks.

iwyoo commented on September 12, 2024

@yassouali
Hi, is there any update on this issue? If PSPNet performance was reproduced well before, it seems that we can compare the old code commits. Is the performance table in the README reproduced by using this repository?

yassouali commented on September 12, 2024

Closed for now; it can be reopened later.
