Comments (23)

yassouali commented on September 12, 2024

Hi,

Yes, you're right, the results should be higher. The last time I ran some experiments with PSPNet and resnet50 (the checkpoint provided), I got 82.5 mIoU (you can see it in the checkpoint). The config used was as follows:

{'name': 'PSPNet',
 'n_gpu': 2,
 'arch': {'type': 'PSPNet',
  'args': {'backbone': 'resnet50',
   'freeze_bn': False,
   'freeze_backbone': False}},
 'train_loader': {'type': 'VOC',
  'args': {'data_dir': '-----',
   'batch_size': 24,
   'base_size': 420,
   'crop_size': 380,
   'augment': True,
   'shuffle': True,
   'scale': True,
   'flip': True,
   'rotate': True,
   'blur': False,
   'split': 'train_aug',
   'num_workers': 8}},
 'val_loader': {'type': 'VOC',
  'args': {'data_dir': '------',
   'crop_size': 480,
   'batch_size': 24,
   'val': True,
   'split': 'val',
   'num_workers': 8}},
 'optimizer': {'type': 'SGD',
  'differential_lr': True,
  'args': {'lr': 0.01, 'weight_decay': 0.0001, 'momentum': 0.9}},
 'loss': 'CrossEntropyLoss2d',
 'ignore_index': 255,
 'lr_scheduler': {'type': 'Poly', 'args': {}},
 'trainer': {'epochs': 100,
  'save_dir': 'saved/',
  'save_period': 5,
  'monitor': 'max Mean_IoU',
  'early_stop': 10,
  'tensorboard': True,
  'log_dir': 'saved/runs',
  'log_per_iter': 20,
  'val': True,
  'val_per_epochs': 5}}

So yeah, you can push the batch size a lot more with 4 GPUs (I don't know which ones you have, but the config above was used with two P100s with 24 GB of GPU RAM); you can test with a batch size of 32.

from pytorch-segmentation.

aicaffeinelife commented on September 12, 2024

Alright, I can try that and get back to you.

aicaffeinelife commented on September 12, 2024

So, I tried a batch size of 24 (the idea was to keep going until at least 79% mIoU was obtained), but this is where I was after 100 epochs:

[screenshot: pspnet_modified]

Reading your config again, I see that you do not use sync_bn = True in it. Could synchronized batchnorm be the issue?
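
Some context on why synchronized batch norm could matter here. Assuming the model is wrapped in `torch.nn.DataParallel` (an assumption about the training script, not something stated in this thread), each step's batch is split across the GPUs, and a plain BatchNorm layer computes its statistics on its own slice only. A minimal sketch of the resulting per-GPU batch sizes, where `per_gpu_batch` is a hypothetical helper:

```python
def per_gpu_batch(batch_size: int, n_gpu: int) -> int:
    """Samples each GPU sees per step when a batch is split evenly across devices."""
    if n_gpu < 1:
        raise ValueError("n_gpu must be at least 1")
    return batch_size // n_gpu


# (batch_size, n_gpu) pairs mentioned so far in this thread.
runs = {
    "reference config (2 GPUs)": (24, 2),
    "4-GPU attempt": (24, 4),
}

for name, (batch_size, n_gpu) in runs.items():
    print(f"{name}: {per_gpu_batch(batch_size, n_gpu)} samples per GPU per step")
```

With the same total batch size, doubling the number of GPUs halves the number of samples behind each unsynchronized batch-norm estimate, which is one plausible (unconfirmed) reason the two setups could behave differently.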

yassouali commented on September 12, 2024

Actually yes, I added synch bn after those tests. Can you try it and see if that solves the issue? Thanks

aicaffeinelife commented on September 12, 2024

Hey, thanks for the quick response. Let me try to turn that to False and get back.

aicaffeinelife commented on September 12, 2024

I tried doing what you suggested. However, it seems that didn't help.

Here is a screenshot of the last two evaluations:

[screenshot: pspnet_no_sbn]

And here are the validation characteristics:

[image: val_characterstics_psp]

You can see that the val loss begins diverging after about 4k iterations. I've checked the learning rate and both follow the poly decay. I also recall that the authors reported their results with a ResNet101. What do you think?
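
For reference, a minimal sketch of the poly decay schedule mentioned above. The exponent `power=0.9` is a common choice and is assumed here; the thread does not state which value the repository uses, and `max_iter` below is a made-up number. With `differential_lr` enabled, each parameter group would presumably follow the same curve scaled by its own base learning rate.

```python
def poly_lr(base_lr: float, cur_iter: int, max_iter: int, power: float = 0.9) -> float:
    """Polynomial decay: base_lr * (1 - cur_iter / max_iter) ** power."""
    if max_iter <= 0 or not 0 <= cur_iter <= max_iter:
        raise ValueError("need max_iter > 0 and 0 <= cur_iter <= max_iter")
    return base_lr * (1.0 - cur_iter / max_iter) ** power


max_iter = 10_000  # hypothetical total number of training iterations
for it in (0, 2_500, 5_000, 7_500, 10_000):
    print(f"iter {it:>6}: lr = {poly_lr(0.01, it, max_iter):.6f}")
```

Logging the value returned by a function like this next to the optimizer's actual learning rate is a quick way to confirm the scheduler is doing what the config says.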

yassouali commented on September 12, 2024

The results certainly need to be higher. Are you training from scratch? What learning rate did you use? Which GPUs? Can you post the last config you tried?

aicaffeinelife commented on September 12, 2024

Training is from scratch with lr = 0.01 and differential_lr = True. The GPUs are Volta V100s connected over NVLink. This is the config I tried:

{
    "name": "PSPNet",
    "n_gpu": 4,
    "use_synch_bn": false,

    "arch": {
        "type": "PSPNet",
        "args": {
            "backbone": "resnet50",
            "freeze_bn": false,
            "freeze_backbone": false
        }
    },

    "train_loader": {
        "type": "VOC",
        "args":{
            "data_dir": "/path/to/VOC/",
            "batch_size": 24,
            "base_size": 400,
            "crop_size": 380,
            "augment": true,
            "shuffle": true,
            "scale": true,
            "flip": true,
            "rotate": true,
            "blur": false,
            "split": "train_aug",
            "num_workers": 8
        }
    },

    "val_loader": {
        "type": "VOC",
        "args":{
            "data_dir": "/path/to/VOC",
            "batch_size": 24,
            "crop_size": 480,
            "val": true,
            "split": "val_aug",
            "num_workers": 4
        }
    },

    "optimizer": {
        "type": "SGD",
        "differential_lr": true,
        "args":{
            "lr": 0.01,
            "weight_decay": 1e-4,
            "momentum": 0.9
        }
    },

    "loss": "CrossEntropyLoss2d",
    "ignore_index": 255,
    "lr_scheduler": {
        "type": "Poly",
        "args": {}
    },

    "trainer": {
        "epochs": 120,
        "save_dir": "saved/",
        "save_period": 10,
        "monitor": "max Mean_IoU",
        "early_stop": 10,
        "tensorboard": true,
        "log_dir": "saved/runs",
        "log_per_iter": 20,

        "val": true,
        "val_per_epochs": 5
    }
}
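
Since two full configs have now been posted, it can help to diff them mechanically rather than by eye. The sketch below flattens nested dicts into dotted keys and reports every key whose value differs. `flatten`, `diff_configs`, and `MISSING` are hypothetical helpers, and the two dicts are hand-copied excerpts of the configs in this thread, not the complete files.

```python
MISSING = "<missing>"  # marker for a key present in only one config


def flatten(cfg: dict, prefix: str = "") -> dict:
    """Turn {'a': {'b': 1}} into {'a.b': 1}."""
    flat = {}
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(a: dict, b: dict) -> dict:
    """Map each differing dotted key to its (value_in_a, value_in_b) pair."""
    fa, fb = flatten(a), flatten(b)
    return {
        key: (fa.get(key, MISSING), fb.get(key, MISSING))
        for key in sorted(fa.keys() | fb.keys())
        if fa.get(key, MISSING) != fb.get(key, MISSING)
    }


# Excerpts only: the reference config (2 GPUs) and the 4-GPU attempt.
reference = {
    "n_gpu": 2,
    "train_loader": {"args": {"base_size": 420, "crop_size": 380}},
    "val_loader": {"args": {"split": "val", "num_workers": 8}},
    "trainer": {"epochs": 100, "save_period": 5},
}
attempt = {
    "n_gpu": 4,
    "use_synch_bn": False,
    "train_loader": {"args": {"base_size": 400, "crop_size": 380}},
    "val_loader": {"args": {"split": "val_aug", "num_workers": 4}},
    "trainer": {"epochs": 120, "save_period": 10},
}

for key, (ref_value, new_value) in diff_configs(reference, attempt).items():
    print(f"{key}: {ref_value!r} -> {new_value!r}")
```

On these excerpts the differences include `base_size` (420 vs 400) and the validation split (`val` vs `val_aug`), in addition to the GPU count. Whether either of those explains the gap is not established in this thread.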

yassouali commented on September 12, 2024

I'll launch some experiments and I'll get back to you

aicaffeinelife commented on September 12, 2024

Any update @yassouali?

yassouali commented on September 12, 2024

I ran some tests using your config and got the same results. I'll try to find the problem ASAP.

aicaffeinelife commented on September 12, 2024

I'm running some experiments with some of my own intuitions and ideas. I'll get back to you with them and if I can reproduce the results, I'll share the config.

mingcv commented on September 12, 2024

Mine is worse than that with the following config, trained from scratch.
The GPU is an RTX 2080Ti, and I only have one device available.

{
  "name": "PSPNet",
  "n_gpu": 1,
  "arch": {
    "type": "PSPNet",
    "args": {
      "backbone": "resnet50",
      "freeze_bn": false,
      "freeze_backbone": false
    }
  },
  "train_loader": {
    "type": "VOC",
    "args": {
      "data_dir": "/home/huqiming/datasets",
      "batch_size": 8,
      "base_size": 420,
      "crop_size": 380,
      "augment": true,
      "shuffle": true,
      "scale": true,
      "flip": true,
      "rotate": true,
      "blur": false,
      "split": "trainval_aug",
      "num_workers": 8
    }
  },
  "val_loader": {
    "type": "VOC",
    "args": {
      "data_dir": "/home/huqiming/datasets",
      "crop_size": 480,
      "batch_size": 8,
      "val": true,
      "split": "val",
      "num_workers": 8
    }
  },
  "optimizer": {
    "type": "SGD",
    "differential_lr": true,
    "args": {
      "lr": 0.01,
      "weight_decay": 1e-4,
      "momentum": 0.9
    }
  },
  "loss": "CrossEntropyLoss2d",
  "ignore_index": 255,
  "lr_scheduler": {
    "type": "Poly",
    "args": {}
  },
  "trainer": {
    "val_on_start": false,
    "epochs": 200,
    "save_dir": "saved/",
    "save_period": 10,
    "monitor": "max Mean_IoU",
    "if_early_stop": false,
    "early_stop": 10,
    "tensorboard": true,
    "log_dir": "saved/runs",
    "log_per_iter": 20,
    "show_per_batch": 5,
    "val": true,
    "val_per_epochs": 1
  }
}

[image]

aicaffeinelife commented on September 12, 2024

Hey @huqiming513, try with a bigger batch size

mingcv commented on September 12, 2024

Hi @aicaffeinelife, I used the pretrained backbone and the results improved hugely. The config is the same as above, but the training batch size is only 4. If you also use a pretrained backbone, the resnet50 backbone may be a better choice. :)

[image]

aicaffeinelife commented on September 12, 2024

Which pre-trained backbone? The one provided by official PyTorch?

aicaffeinelife commented on September 12, 2024

Okay, some interesting results: I got substantially higher results when I trained on the trainval set and evaluated on Pascal VOC. Tested on only one GPU with a batch size of 4. I'm gonna keep the thread open and post some other results.
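
One thing worth checking when training on a trainval split: in Pascal VOC, trainval is the union of train and val, so if the evaluation images come from val, the model may already have seen them during training and the score would be inflated. A minimal sketch for checking overlap between two split lists follows. `read_ids` and `split_overlap` are hypothetical helpers, the one-ID-per-line file layout is an assumption, and the IDs in the example are made up.

```python
from pathlib import Path


def read_ids(path: str) -> list:
    """Read one image ID per line, using the first whitespace-separated token."""
    lines = Path(path).read_text().splitlines()
    return [line.split()[0] for line in lines if line.strip()]


def split_overlap(train_ids, val_ids) -> list:
    """Return the sorted image IDs that appear in both splits."""
    return sorted(set(train_ids) & set(val_ids))


# Made-up IDs, for illustration only.
train_ids = ["img_001", "img_002", "img_003", "img_004"]
val_ids = ["img_003", "img_004", "img_005"]

shared = split_overlap(train_ids, val_ids)
print(f"{len(shared)} of {len(val_ids)} validation images also appear in training")
```

In practice the two lists would come from `read_ids` applied to the split files used by the train and val loaders.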

Dongshengjiang commented on September 12, 2024

I got a similar accuracy of 0.71 Mean_IoU with the default config provided in the GitHub code. Could you please attach the config you used to get the highest accuracy? Another question: what are the results with UNet and the other networks? It would be nice to show them in your README.
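
For readers comparing numbers across setups, here is a minimal sketch of how a mean IoU with an ignore label is typically computed over flattened label sequences. It is not taken from the repository: `mean_iou` is a hypothetical helper, and averaging only over classes that occur in the prediction or the target is an assumption. `ignore_index=255` matches the configs in this thread.

```python
def mean_iou(pred, target, num_classes: int, ignore_index: int = 255) -> float:
    """Mean intersection-over-union over classes present in pred or target."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # pixels labelled as "ignore" do not count at all
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Tiny hand-made example: 6 pixels, 3 classes, the last pixel is ignored.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
print(f"mIoU = {mean_iou(pred, target, num_classes=3):.4f}")
```

Details such as whether IoU is accumulated over the whole validation set or averaged per batch can shift the reported number, so they are worth confirming before comparing against published results.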

yassouali commented on September 12, 2024

Unfortunately I am quite busy right now. Hopefully I will soon retrain and post the results for the other models / datasets, along with the optimal config for each model. Until then, let's keep this issue open.

yassouali commented on September 12, 2024

Hi, sorry, I still don't have time to investigate this (a lot of conference deadlines lately). If anyone finds a solution (or some errors in the code), please feel free to make a pull request to fix this.

Hopefully I'll have some time in the coming weeks.

iwyoo commented on September 12, 2024

@yassouali
Hi, is there any update on this issue? If PSPNet performance was reproduced well before, it seems that we can compare the old code commits. Is the performance table in the README reproduced by using this repository?

yassouali commented on September 12, 2024

Closed for now; can be reopened later.
