
Comments (17)

VladislavAD commented on September 12, 2024

I made a custom data_loader and am now struggling to make it accept my data.

from base import BaseDataSet, BaseDataLoader
from utils import palette
import numpy as np
import os
import torch
import cv2
import psycopg2
from PIL import Image
from glob import glob
from torch.utils.data import Dataset
from torchvision import transforms
from utils.dbDataUtils import get_data


class DefectDataset(BaseDataSet):

    def __init__(self, train_shape = 10, db_name = '', **kwargs):
        self.num_classes = 1
        self.train_shape = train_shape
        self.db_name = db_name
        super(DefectDataset, self).__init__(**kwargs)

    def _set_files(self):
        self.train_set = get_data(512, train_db_name=self.db_name, data_shape=(self.train_shape, self.train_shape), train_type=1, classes=1)
        self.test_set = get_data(512, train_db_name=self.db_name, data_shape=(self.train_shape, self.train_shape), train_type=1, classes=1)

    def _load_data(self, index):
        image = self.train_set[0][index].astype(np.float)
        label = self.train_set[1][index].astype(np.float)
        return image, label, str(index)

    def __len__(self):
        return len(self.train_set[0])

class BoardDefects(BaseDataLoader):
    def __init__(self, data_dir, batch_size, split, train_shape=10, db_name='', crop_size=None, base_size=None, scale=True, num_workers=1, val=False,
                 shuffle=False, flip=False, rotate=False, blur=False, augment=False, val_split=None, return_id=False):

        self.MEAN = [0, 0, 0]
        self.STD = [255, 255, 255]

        kwargs = {
            'root': data_dir,
            'split': split,
            'mean': self.MEAN,
            'std': self.STD,
            'augment': augment,
            'crop_size': crop_size,
            'base_size': base_size,
            'scale': scale,
            'flip': flip,
            'blur': blur,
            'rotate': rotate,
            'return_id': return_id,
            'val': val,
            'train_shape': train_shape,
            'db_name': db_name
        }

        self.dataset = DefectDataset(**kwargs)
        super(BoardDefects, self).__init__(self.dataset, batch_size, shuffle, num_workers, val_split)

I found out that the other datasets use the int32 type for labels, but I'm using float and get the error ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 512, 1, 1])


yassouali commented on September 12, 2024

Hi,

I guess if the labels are of size (imwidth, imheight, classes), each vector is one-hot, so you'll need to convert them into a 2D array of size (imwidth, imheight), where each element is the class of the given pixel (the CE loss in PyTorch accepts labels of this form); taking the argmax works, I guess.

For the dataloader you created, it looks good, just make sure your labels are of size (imwidth, imheight). I don't think having float labels is problematic, given that they're converted into long tensors in the base dataset. One thing I need to mention is that you need to have self.STD = [1, 1, 1], given that the normalization is applied to the tensors in the range [0, 1].
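
For concreteness, here is a minimal sketch of what your _load_data could look like under that convention, assuming get_data yields one-hot label arrays of shape (H, W, classes) (the names follow your snippet above, where numpy is already imported):

    def _load_data(self, index):
        # Image stays channels-last (H, W, 3); BaseDataSet handles the tensor conversion.
        image = self.train_set[0][index].astype(np.float32)
        # One-hot (H, W, classes) -> (H, W) class indices, which is what the CE loss expects.
        one_hot = self.train_set[1][index]
        label = np.argmax(one_hot, axis=-1).astype(np.int32)
        return image, label, str(index)

Combined with self.MEAN = [0, 0, 0] and self.STD = [1, 1, 1], since the normalization runs on tensors that are already scaled to [0, 1].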

From the error you posted, it seems there is a problem with the size of the feature maps; here I see two issues:

  • One element in the batch dimension (torch.Size([1, 512, 1, 1])), so I guess you're using 1 image per batch, which can give an error when using batchnorm; try freezing the batchnorm layers in this case (see the sketch after this list).

  • The second is that your spatial dimensions are 1x1. I don't know which model or backbone you're using, but I think the images are quite small, given that after some pooling layers (or strided convs) no spatial dimensions are left.
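
Here is a minimal sketch of the batchnorm-freezing workaround mentioned in the first point (the repo also exposes a freeze_bn option for this; the helper below is just a hand-rolled equivalent):

import torch.nn as nn

def freeze_batchnorm(model):
    # Put every BatchNorm layer in eval mode so it uses its running statistics,
    # which avoids the "Expected more than 1 value per channel" error when the
    # batch size is 1. Call this after model.train(), since train() switches
    # the BN layers back to training mode.
    for module in model.modules():
        if isinstance(module, nn.BatchNorm2d):
            module.eval()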

I hope that answers your question.


VladislavAD commented on September 12, 2024

Thanks for the answer.

I guess if the labels are of size imwidth, imheight, classes, each vector is one hot, so you'll need to convert it into 2D array of size imwidth, imheight (the CE in pytorch accepts labels of this form)

Does this also apply to other loss functions or only to CrossEntropy (which I guess is what CE stands for)?

One thing I need to mention is that you need to have: self.STD = [1, 1, 1] given that the normalization is done to the tensors in range [0, 1].

My images are represented as raw bitmap data. I dug into the normalize code and found that each pixel value is divided by the passed STD, so to map [0, 255] into [0, 1] I pass the values [255, 255, 255]. Am I right?

The second is that your spatial dimensions are 1x1, I don't know which model or backbone you're using, but I think the images are quite small, given that after some pooling layers (or strided convs) no dimensions are left.

So my guess is that the model expects [batchsize, imwidth, imheight, 3] as input and [batchsize, imwidth, imheight] as output, but gets my [batchsize, imwidth, imheight, classes] and reads it from the other end, so the height of 512 gets swallowed by the classes dimension (which is 1 class for now); the oddest number here is that [1, 512, 1, 1]. I will try to pass an integer [imwidth, imheight] label. I copied the model and backbone from the provided config.json, so it is PSPNet with resnet50.


VladislavAD commented on September 12, 2024

I was able to "start" training with the corrections mentioned above, without errors, but it freezes before any training happens, with 100% CPU usage after memory allocation.

Edit: this is the last printed info:

Detected GPUs: 2 Requested: 1


  0%|                                                                                                      | 0/10 [00:00<?, ?it/s]


yassouali commented on September 12, 2024

For the normalization, we first define a transform, transforms.Normalize(mean, std), and then apply it after the image has been converted into a tensor (so values between 0 and 1): self.normalize(self.to_tensor(image)). So the values of STD must be normalized too.
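
To illustrate the order of operations, a standalone sketch (not the repo's exact code; the image is a random placeholder):

import numpy as np
from torchvision import transforms

to_tensor = transforms.ToTensor()    # uint8 HWC in [0, 255] -> float CHW in [0, 1]
normalize = transforms.Normalize(mean=[0, 0, 0], std=[1, 1, 1])

image = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)
tensor = normalize(to_tensor(image))  # ToTensor already rescaled, so std=[1, 1, 1] keeps it in [0, 1]
print(tensor.min().item(), tensor.max().item())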

In PyTorch, the inputs are represented as [B, C, H, W], i.e. the NCHW format and not NHWC as in TensorFlow, so [1, 512, 1, 1] refers to one image per batch, 512 channels and 1x1 spatial dimensions. As for the labels, the code is written with the idea in mind that all labels are of size [H, W], where each element is the class assigned to the pixel; as I said, if you have one-hots, just take the argmax over the last dimension and you're set.

If your training is frozen, I think there is some problem with your dataloader; try to print the labels and the image sizes, and make sure that your labels are not all zeros (in that case you'll be stuck in the preprocessing step in the _augment function).
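
Something like this quick sanity check would do (a sketch that reuses the BoardDefects loader you defined earlier; 'my_db' is just a placeholder):

import numpy as np

loader = BoardDefects(data_dir='', batch_size=1, split='train',
                      train_shape=10, db_name='my_db')
for i in range(3):
    image, label, _ = loader.dataset._load_data(i)
    # The labels should contain more than one unique value, not just zeros.
    print(i, image.shape, label.shape, np.unique(label))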


VladislavAD commented on September 12, 2024

I disabled augmentation with "augment": false in the config, made my images (3, 512, 512) (using reshape) and my labels (512, 512) (using argmax over the last channel), and now I get this error:

Traceback (most recent call last):
  File "train.py", line 61, in <module>
    main(config, args.resume)
  File "train.py", line 42, in main
    trainer.train()
  File "D:\xxx\pytorch_segmentation\base\base_trainer.py", line 98, in train
    results = self._train_epoch(epoch)
  File "D:\xxx\pytorch_segmentation\trainer.py", line 48, in _train_epoch
    for batch_idx, (data, target) in enumerate(tbar):
  File "C:\xxx\.conda\envs\xxx\lib\site-packages\tqdm\_tqdm.py", line 1017, in __iter__
    for obj in iterable:
  File "D:\xxx\pytorch_segmentation\base\base_dataloader.py", line 76, in __iter__
    self.preload()
  File "D:\xxx\pytorch_segmentation\base\base_dataloader.py", line 64, in preload
    self.next_input, self.next_target = next(self.loaditer)
  File "C:\xxx\.conda\envs\xxx\lib\site-packages\torch\utils\data\dataloader.py", line 582, in __next__
    return self._process_next_batch(batch)
  File "C:\xxx\.conda\envs\xxx\lib\site-packages\torch\utils\data\dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
TypeError: Traceback (most recent call last):
  File "C:\xxx\AppData\Roaming\Python\Python37\site-packages\PIL\Image.py", line 2533, in fromarray
    mode, rawmode = _fromarray_typemap[typekey]
KeyError: ((1, 1, 512), '|u1')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\xxx\.conda\envs\xxx\lib\site-packages\torch\utils\data\_utils\worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "C:\xxx\.conda\envs\xxx\lib\site-packages\torch\utils\data\_utils\worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "D:\xxx\pytorch_segmentation\base\base_dataset.py", line 141, in __getitem__
    image = Image.fromarray(np.uint8(image))
  File "C:\xxx\AppData\Roaming\Python\Python37\site-packages\PIL\Image.py", line 2535, in fromarray
    raise TypeError("Cannot handle this data type")
TypeError: Cannot handle this data type


yassouali commented on September 12, 2024

I guess I wasn't clear: for the images, the default channels-last shape you get when loading the image as a numpy array is correct, and the conversion to CHW is done automatically by PyTorch. So for the image there is no need to apply a reshape in _load_data; the reshaping is done in base_dataset when calling self.to_tensor(image).
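
That reshape is also what triggered the PIL error above: Image.fromarray only understands channels-last uint8 arrays, so a (3, 512, 512) array is read as a 3x512 image with 512 channels and rejected. A small demonstration:

import numpy as np
from PIL import Image

ok = np.zeros((512, 512, 3), dtype=np.uint8)    # H x W x C
bad = np.zeros((3, 512, 512), dtype=np.uint8)   # C x H x W

Image.fromarray(ok)        # fine, interpreted as an RGB image
try:
    Image.fromarray(bad)   # raises TypeError: Cannot handle this data type
except TypeError as err:
    print(err)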

If you still have problems, you can share your custom dataset & the dbDataUtils script that loads the data and I'll take a quick look.


VladislavAD commented on September 12, 2024

I finally made it train but the result is questionable.

TRAIN (1) | Loss: 0.000 | Acc 1.00 mIoU 1.00 | B 0.95 D 0.13 |: 100%|█████████████████████████████| 50/50 [00:47<00:00,  1.43it/s]


TRAIN (2) | Loss: 0.000 | Acc 1.00 mIoU 1.00 | B 0.67 D 0.13 |: 100%|█████████████████████████████| 50/50 [00:33<00:00,  1.44it/s]


TRAIN (3) | Loss: 0.000 | Acc 1.00 mIoU 1.00 | B 0.67 D 0.13 |: 100%|█████████████████████████████| 50/50 [00:33<00:00,  1.45it/s]


TRAIN (4) | Loss: 0.000 | Acc 1.00 mIoU 1.00 | B 0.67 D 0.13 |: 100%|█████████████████████████████| 50/50 [00:33<00:00,  1.44it/s]


TRAIN (5) | Loss: 0.000 | Acc 1.00 mIoU 1.00 | B 0.67 D 0.13 |: 100%|█████████████████████████████| 50/50 [00:33<00:00,  1.44it/s]

###### EVALUATION ######
EVAL (5) | Loss: 0.000, PixelAcc: 1.00, Mean IoU: 1.00 |: 100%|███████████████████████████████████| 50/50 [00:19<00:00,  2.87it/s]

         ## Info for epoch 5 ##
         val_loss       : 0.0
         Pixel_Accuracy : 1.0
         Mean_IoU       : 1.0
         Class_IoU      : {0: 1.0}

The loss, accuracy and mean IoU look like something is still wrong. What do B and D mean? Do I need to specify an "empty" class?


yassouali commented on September 12, 2024

Yeah, there are definitely some problems with your training data; maybe the images and/or labels are all zeros. Try to create a dataloader, fetch some examples with their labels, and visualize them to make sure the data loading works correctly.
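
Something along these lines would do (a rough sketch: it reuses the BoardDefects loader from your earlier snippet, 'my_db' is a placeholder, and it assumes __getitem__ returns (image, label) tensors):

import matplotlib.pyplot as plt

loader = BoardDefects(data_dir='', batch_size=1, split='train',
                      train_shape=10, db_name='my_db')
image, label = loader.dataset[0][:2]
plt.subplot(1, 2, 1); plt.imshow(image.permute(1, 2, 0).numpy()); plt.title('image')
plt.subplot(1, 2, 2); plt.imshow(label.numpy()); plt.title('label')
plt.show()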


VladislavAD commented on September 12, 2024

I think I finally made it:

###### EVALUATION ######
EVAL (80) | Loss: 0.007, PixelAcc: 1.00, Mean IoU: 0.74 |: 100%|██████████████████████████████████| 50/50 [00:19<00:00,  2.88it/s]

         ## Info for epoch 80 ##
         val_loss       : 0.00692
         Pixel_Accuracy : 0.998
         Mean_IoU       : 0.741
         Class_IoU      : {0: 0.998, 1: 0.485}

There were two problems with the data: argmax returned all zeros, and I needed to set up two classes instead of 1 (one for my single class and one for the not-my-class background, I guess).
The remaining questions are:

  • pixel accuracy is always around 1.0; under what conditions does this value actually show some difference?

  • how can I get all the possible metric/loss/model/backbone names to use in the config json? I tried several approaches, like the filenames, but had no luck. It would be good to have a full list of them.


VladislavAD commented on September 12, 2024

I tried to predict using the inference script and got lots of errors.
First of all, the assertion assert dataset_type in ['VOC', 'COCO', 'CityScapes', 'ADE20K'] fails with my custom dataset; that's obvious, so I commented it out. Next is an issue with cpu/cuda. I got this error:

Traceback (most recent call last):
  File "inference.py", line 166, in <module>
    main()
  File "inference.py", line 124, in main
    model.load_state_dict(checkpoint)
  File "C:\xxx\.conda\envs\xxx\lib\site-packages\torch\nn\modules\module.py", line 777, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DataParallel:
        Missing key(s) in state_dict: "module.initial.0.0.weight", "module.initial.0.1.weight", "module.initial.0.1.bias", "module.initial.0.1.running_mean", "module.initial.0.1.running_var", "module.initial.0.3.weight", "module.initial.0.4.weight", "module.initial.0.4.bias", "module.initial.0.4.running_mean", "module.initial.0.4.running_var", "module.initial.0.6.weight", "module.initial.1.weight", "module.initial.1.bias", "module.initial.1.running_mean", "module.initial.1.running_var", "module.layer1.0.conv1.weight", "module.layer1.0.bn1.weight", "module.layer1.0.bn1.bias", "module.layer1.0.bn1.running_mean", "module.layer1.0.bn1.running_var", "module.layer1.0.conv2.weight", "module.layer1.0.bn2.weight", "module.layer1.0.bn2.bias", "module.layer1.0.bn2.running_mean", "module.layer1.0.bn2.running_var", "module.layer1.0.conv3.weight", "module.layer1.0.bn3.weight", "module.layer1.0.bn3.bias", "module.layer1.0.bn3.running_mean", "module.layer1.0.bn3.running_var",
"module.layer1.0.downsample.0.weight", "module.layer1.0.downsample.1.weight", "module.layer1.0.downsample.1.bias", "module.layer1.0.downsample.1.running_mean", "module.layer1.0.downsample.1.running_var", "module.layer1.1.conv1.weight", "module.layer1.1.bn1.weight", "module.layer1.1.bn1.bias", "module.layer1.1.bn1.running_mean", "module.layer1.1.bn1.running_var", "module.layer1.1.conv2.weight", "module.layer1.1.bn2.weight", "module.layer1.1.bn2.bias", "module.layer1.1.bn2.running_mean", "module.layer1.1.bn2.running_var", "module.layer1.1.conv3.weight", "module.layer1.1.bn3.weight", "module.layer1.1.bn3.bias", "module.layer1.1.bn3.running_mean", "module.layer1.1.bn3.running_var", "module.layer1.2.conv1.weight", "module.layer1.2.bn1.weight", "module.layer1.2.bn1.bias", "module.layer1.2.bn1.running_mean", "module.layer1.2.bn1.running_var", "module.layer1.2.conv2.weight", "module.layer1.2.bn2.weight", "module.layer1.2.bn2.bias", "module.layer1.2.bn2.running_mean", "module.layer1.2.bn2.running_var", "module.layer1.2.conv3.weight", "module.layer1.2.bn3.weight", "module.layer1.2.bn3.bias", "module.layer1.2.bn3.running_mean", "module.layer1.2.bn3.running_var",
"module.layer2.0.conv1.weight", "module.layer2.0.bn1.weight", "module.layer2.0.bn1.bias", "module.layer2.0.bn1.running_mean", "module.layer2.0.bn1.running_var", "module.layer2.0.conv2.weight", "module.layer2.0.bn2.weight", "module.layer2.0.bn2.bias", "module.layer2.0.bn2.running_mean", "module.layer2.0.bn2.running_var", "module.layer2.0.conv3.weight", "module.layer2.0.bn3.weight", "module.layer2.0.bn3.bias", "module.layer2.0.bn3.running_mean", "module.layer2.0.bn3.running_var", "module.layer2.0.downsample.0.weight", "module.layer2.0.downsample.1.weight", "module.layer2.0.downsample.1.bias", "module.layer2.0.downsample.1.running_mean", "module.layer2.0.downsample.1.running_var", "module.layer2.1.conv1.weight", "module.layer2.1.bn1.weight", "module.layer2.1.bn1.bias", "module.layer2.1.bn1.running_mean", "module.layer2.1.bn1.running_var", "module.layer2.1.conv2.weight", "module.layer2.1.bn2.weight", "module.layer2.1.bn2.bias", "module.layer2.1.bn2.running_mean", "module.layer2.1.bn2.running_var", "module.layer2.1.conv3.weight", "module.layer2.1.bn3.weight", "module.layer2.1.bn3.bias", "module.layer2.1.bn3.running_mean", "module.layer2.1.bn3.running_var",
"module.layer2.2.conv1.weight", "module.layer2.2.bn1.weight", "module.layer2.2.bn1.bias", "module.layer2.2.bn1.running_mean", "module.layer2.2.bn1.running_var", "module.layer2.2.conv2.weight", "module.layer2.2.bn2.weight", "module.layer2.2.bn2.bias", "module.layer2.2.bn2.running_mean", "module.layer2.2.bn2.running_var", "module.layer2.2.conv3.weight", "module.layer2.2.bn3.weight", "module.layer2.2.bn3.bias", "module.layer2.2.bn3.running_mean", "module.layer2.2.bn3.running_var", "module.layer2.3.conv1.weight", "module.layer2.3.bn1.weight", "module.layer2.3.bn1.bias", "module.layer2.3.bn1.running_mean", "module.layer2.3.bn1.running_var", "module.layer2.3.conv2.weight", "module.layer2.3.bn2.weight", "module.layer2.3.bn2.bias", "module.layer2.3.bn2.running_mean", "module.layer2.3.bn2.running_var", "module.layer2.3.conv3.weight", "module.layer2.3.bn3.weight", "module.layer2.3.bn3.bias", "module.layer2.3.bn3.running_mean", "module.layer2.3.bn3.running_var", "module.layer3.0.conv1.weight", "module.layer3.0.bn1.weight", "module.layer3.0.bn1.bias", "module.layer3.0.bn1.running_mean", "module.layer3.0.bn1.running_var", "module.layer3.0.conv2.weight", "module.layer3.0.bn2.weight", "module.layer3.0.bn2.bias", "module.layer3.0.bn2.running_mean", "module.layer3.0.bn2.running_var", "module.layer3.0.conv3.weight", "module.layer3.0.bn3.weight", "module.layer3.0.bn3.bias", "module.layer3.0.bn3.running_mean", "module.layer3.0.bn3.running_var", "module.layer3.0.downsample.0.weight", "module.layer3.0.downsample.1.weight", "module.layer3.0.downsample.1.bias", "module.layer3.0.downsample.1.running_mean", "module.layer3.0.downsample.1.running_var", "module.layer3.1.conv1.weight", "module.layer3.1.bn1.weight", "module.layer3.1.bn1.bias", "module.layer3.1.bn1.running_mean", "module.layer3.1.bn1.running_var", "module.layer3.1.conv2.weight", "module.layer3.1.bn2.weight", "module.layer3.1.bn2.bias", "module.layer3.1.bn2.running_mean", "module.layer3.1.bn2.running_var", "module.layer3.1.conv3.weight", "module.layer3.1.bn3.weight", "module.layer3.1.bn3.bias", "module.layer3.1.bn3.running_mean", "module.layer3.1.bn3.running_var", "module.layer3.2.conv1.weight", "module.layer3.2.bn1.weight", "module.layer3.2.bn1.bias", "module.layer3.2.bn1.running_mean", "module.layer3.2.bn1.running_var", "module.layer3.2.conv2.weight", "module.layer3.2.bn2.weight", "module.layer3.2.bn2.bias", "module.layer3.2.bn2.running_mean", "module.layer3.2.bn2.running_var", "module.layer3.2.conv3.weight", "module.layer3.2.bn3.weight", "module.layer3.2.bn3.bias", "module.layer3.2.bn3.running_mean", "module.layer3.2.bn3.running_var", "module.layer3.3.conv1.weight", "module.layer3.3.bn1.weight", "module.layer3.3.bn1.bias", "module.layer3.3.bn1.running_mean", "module.layer3.3.bn1.running_var", "module.layer3.3.conv2.weight", "module.layer3.3.bn2.weight", "module.layer3.3.bn2.bias", "module.layer3.3.bn2.running_mean", "module.layer3.3.bn2.running_var", "module.layer3.3.conv3.weight", "module.layer3.3.bn3.weight", "module.layer3.3.bn3.bias", "module.layer3.3.bn3.running_mean", "module.layer3.3.bn3.running_var", "module.layer3.4.conv1.weight", "module.layer3.4.bn1.weight", "module.layer3.4.bn1.bias", "module.layer3.4.bn1.running_mean", "module.layer3.4.bn1.running_var", "module.layer3.4.conv2.weight", "module.layer3.4.bn2.weight", "module.layer3.4.bn2.bias", "module.layer3.4.bn2.running_mean", "module.layer3.4.bn2.running_var", "module.layer3.4.conv3.weight", "module.layer3.4.bn3.weight", "module.layer3.4.bn3.bias", "module.layer3.4.bn3.running_mean", 
"module.layer3.4.bn3.running_var", "module.layer3.5.conv1.weight", "module.layer3.5.bn1.weight", "module.layer3.5.bn1.bias", "module.layer3.5.bn1.running_mean", "module.layer3.5.bn1.running_var", "module.layer3.5.conv2.weight", "module.layer3.5.bn2.weight", "module.layer3.5.bn2.bias", "module.layer3.5.bn2.running_mean", "module.layer3.5.bn2.running_var", "module.layer3.5.conv3.weight", "module.layer3.5.bn3.weight", "module.layer3.5.bn3.bias", "module.layer3.5.bn3.running_mean", "module.layer3.5.bn3.running_var", "module.layer4.0.conv1.weight", "module.layer4.0.bn1.weight", "module.layer4.0.bn1.bias", "module.layer4.0.bn1.running_mean", "module.layer4.0.bn1.running_var", "module.layer4.0.conv2.weight", "module.layer4.0.bn2.weight", "module.layer4.0.bn2.bias", "module.layer4.0.bn2.running_mean", "module.layer4.0.bn2.running_var", "module.layer4.0.conv3.weight", "module.layer4.0.bn3.weight", "module.layer4.0.bn3.bias", "module.layer4.0.bn3.running_mean", "module.layer4.0.bn3.running_var", "module.layer4.0.downsample.0.weight", "module.layer4.0.downsample.1.weight", "module.layer4.0.downsample.1.bias", "module.layer4.0.downsample.1.running_mean", "module.layer4.0.downsample.1.running_var", "module.layer4.1.conv1.weight", "module.layer4.1.bn1.weight", "module.layer4.1.bn1.bias", "module.layer4.1.bn1.running_mean", "module.layer4.1.bn1.running_var", "module.layer4.1.conv2.weight", "module.layer4.1.bn2.weight", "module.layer4.1.bn2.bias", "module.layer4.1.bn2.running_mean", "module.layer4.1.bn2.running_var", "module.layer4.1.conv3.weight", "module.layer4.1.bn3.weight", "module.layer4.1.bn3.bias", "module.layer4.1.bn3.running_mean", "module.layer4.1.bn3.running_var", "module.layer4.2.conv1.weight", "module.layer4.2.bn1.weight", "module.layer4.2.bn1.bias", "module.layer4.2.bn1.running_mean", "module.layer4.2.bn1.running_var", "module.layer4.2.conv2.weight", "module.layer4.2.bn2.weight", "module.layer4.2.bn2.bias", "module.layer4.2.bn2.running_mean", "module.layer4.2.bn2.running_var", "module.layer4.2.conv3.weight", "module.layer4.2.bn3.weight", "module.layer4.2.bn3.bias", "module.layer4.2.bn3.running_mean", "module.layer4.2.bn3.running_var", "module.master_branch.0.stages.0.1.weight", "module.master_branch.0.stages.0.2.weight", "module.master_branch.0.stages.0.2.bias", "module.master_branch.0.stages.0.2.running_mean", "module.master_branch.0.stages.0.2.running_var", "module.master_branch.0.stages.1.1.weight", "module.master_branch.0.stages.1.2.weight", "module.master_branch.0.stages.1.2.bias", "module.master_branch.0.stages.1.2.running_mean", "module.master_branch.0.stages.1.2.running_var", "module.master_branch.0.stages.2.1.weight", "module.master_branch.0.stages.2.2.weight", "module.master_branch.0.stages.2.2.bias", "module.master_branch.0.stages.2.2.running_mean", "module.master_branch.0.stages.2.2.running_var", "module.master_branch.0.stages.3.1.weight", "module.master_branch.0.stages.3.2.weight", "module.master_branch.0.stages.3.2.bias", "module.master_branch.0.stages.3.2.running_mean", "module.master_branch.0.stages.3.2.running_var", "module.master_branch.0.bottleneck.0.weight", "module.master_branch.0.bottleneck.1.weight", "module.master_branch.0.bottleneck.1.bias", "module.master_branch.0.bottleneck.1.running_mean",
"module.master_branch.0.bottleneck.1.running_var", "module.master_branch.1.weight", "module.master_branch.1.bias", "module.auxiliary_branch.0.weight", "module.auxiliary_branch.1.weight", "module.auxiliary_branch.1.bias", "module.auxiliary_branch.1.running_mean", "module.auxiliary_branch.1.running_var", "module.auxiliary_branch.4.weight", "module.auxiliary_branch.4.bias".
        Unexpected key(s) in state_dict: "initial.0.0.weight", "initial.0.1.weight", "initial.0.1.bias", "initial.0.1.running_mean", "initial.0.1.running_var", "initial.0.1.num_batches_tracked",
"initial.0.3.weight", "initial.0.4.weight", "initial.0.4.bias", "initial.0.4.running_mean", "initial.0.4.running_var", "initial.0.4.num_batches_tracked", "initial.0.6.weight", "initial.1.weight", "initial.1.bias", "initial.1.running_mean", "initial.1.running_var", "initial.1.num_batches_tracked", "layer1.0.conv1.weight", "layer1.0.bn1.weight", "layer1.0.bn1.bias", "layer1.0.bn1.running_mean", "layer1.0.bn1.running_var", "layer1.0.bn1.num_batches_tracked", "layer1.0.conv2.weight", "layer1.0.bn2.weight", "layer1.0.bn2.bias", "layer1.0.bn2.running_mean", "layer1.0.bn2.running_var", "layer1.0.bn2.num_batches_tracked", "layer1.0.conv3.weight", "layer1.0.bn3.weight", "layer1.0.bn3.bias", "layer1.0.bn3.running_mean", "layer1.0.bn3.running_var", "layer1.0.bn3.num_batches_tracked", "layer1.0.downsample.0.weight", "layer1.0.downsample.1.weight", "layer1.0.downsample.1.bias", "layer1.0.downsample.1.running_mean", "layer1.0.downsample.1.running_var", "layer1.0.downsample.1.num_batches_tracked", "layer1.1.conv1.weight", "layer1.1.bn1.weight", "layer1.1.bn1.bias", "layer1.1.bn1.running_mean", "layer1.1.bn1.running_var", "layer1.1.bn1.num_batches_tracked", "layer1.1.conv2.weight", "layer1.1.bn2.weight", "layer1.1.bn2.bias", "layer1.1.bn2.running_mean", "layer1.1.bn2.running_var", "layer1.1.bn2.num_batches_tracked", "layer1.1.conv3.weight", "layer1.1.bn3.weight", "layer1.1.bn3.bias", "layer1.1.bn3.running_mean", "layer1.1.bn3.running_var", "layer1.1.bn3.num_batches_tracked", "layer1.2.conv1.weight", "layer1.2.bn1.weight", "layer1.2.bn1.bias", "layer1.2.bn1.running_mean", "layer1.2.bn1.running_var", "layer1.2.bn1.num_batches_tracked", "layer1.2.conv2.weight", "layer1.2.bn2.weight", "layer1.2.bn2.bias", "layer1.2.bn2.running_mean", "layer1.2.bn2.running_var", "layer1.2.bn2.num_batches_tracked", "layer1.2.conv3.weight", "layer1.2.bn3.weight", "layer1.2.bn3.bias", "layer1.2.bn3.running_mean", "layer1.2.bn3.running_var", "layer1.2.bn3.num_batches_tracked", "layer2.0.conv1.weight", "layer2.0.bn1.weight", "layer2.0.bn1.bias", "layer2.0.bn1.running_mean", "layer2.0.bn1.running_var", "layer2.0.bn1.num_batches_tracked", "layer2.0.conv2.weight", "layer2.0.bn2.weight", "layer2.0.bn2.bias", "layer2.0.bn2.running_mean", "layer2.0.bn2.running_var", "layer2.0.bn2.num_batches_tracked", "layer2.0.conv3.weight", "layer2.0.bn3.weight", "layer2.0.bn3.bias", "layer2.0.bn3.running_mean", "layer2.0.bn3.running_var", "layer2.0.bn3.num_batches_tracked", "layer2.0.downsample.0.weight", "layer2.0.downsample.1.weight", "layer2.0.downsample.1.bias", "layer2.0.downsample.1.running_mean", "layer2.0.downsample.1.running_var", "layer2.0.downsample.1.num_batches_tracked", "layer2.1.conv1.weight", "layer2.1.bn1.weight", "layer2.1.bn1.bias", "layer2.1.bn1.running_mean", "layer2.1.bn1.running_var", "layer2.1.bn1.num_batches_tracked", "layer2.1.conv2.weight", "layer2.1.bn2.weight", "layer2.1.bn2.bias", "layer2.1.bn2.running_mean", "layer2.1.bn2.running_var", "layer2.1.bn2.num_batches_tracked", "layer2.1.conv3.weight", "layer2.1.bn3.weight", "layer2.1.bn3.bias", "layer2.1.bn3.running_mean", "layer2.1.bn3.running_var",
"layer2.1.bn3.num_batches_tracked", "layer2.2.conv1.weight", "layer2.2.bn1.weight", "layer2.2.bn1.bias", "layer2.2.bn1.running_mean", "layer2.2.bn1.running_var", "layer2.2.bn1.num_batches_tracked", "layer2.2.conv2.weight", "layer2.2.bn2.weight", "layer2.2.bn2.bias", "layer2.2.bn2.running_mean", "layer2.2.bn2.running_var", "layer2.2.bn2.num_batches_tracked", "layer2.2.conv3.weight", "layer2.2.bn3.weight", "layer2.2.bn3.bias", "layer2.2.bn3.running_mean", "layer2.2.bn3.running_var", "layer2.2.bn3.num_batches_tracked", "layer2.3.conv1.weight", "layer2.3.bn1.weight", "layer2.3.bn1.bias", "layer2.3.bn1.running_mean", "layer2.3.bn1.running_var", "layer2.3.bn1.num_batches_tracked", "layer2.3.conv2.weight", "layer2.3.bn2.weight", "layer2.3.bn2.bias", "layer2.3.bn2.running_mean", "layer2.3.bn2.running_var", "layer2.3.bn2.num_batches_tracked", "layer2.3.conv3.weight", "layer2.3.bn3.weight", "layer2.3.bn3.bias", "layer2.3.bn3.running_mean", "layer2.3.bn3.running_var", "layer2.3.bn3.num_batches_tracked", "layer3.0.conv1.weight", "layer3.0.bn1.weight", "layer3.0.bn1.bias", "layer3.0.bn1.running_mean", "layer3.0.bn1.running_var", "layer3.0.bn1.num_batches_tracked",
"layer3.0.conv2.weight", "layer3.0.bn2.weight", "layer3.0.bn2.bias", "layer3.0.bn2.running_mean", "layer3.0.bn2.running_var", "layer3.0.bn2.num_batches_tracked", "layer3.0.conv3.weight", "layer3.0.bn3.weight", "layer3.0.bn3.bias", "layer3.0.bn3.running_mean", "layer3.0.bn3.running_var", "layer3.0.bn3.num_batches_tracked", "layer3.0.downsample.0.weight", "layer3.0.downsample.1.weight", "layer3.0.downsample.1.bias", "layer3.0.downsample.1.running_mean", "layer3.0.downsample.1.running_var", "layer3.0.downsample.1.num_batches_tracked", "layer3.1.conv1.weight", "layer3.1.bn1.weight", "layer3.1.bn1.bias", "layer3.1.bn1.running_mean", "layer3.1.bn1.running_var", "layer3.1.bn1.num_batches_tracked", "layer3.1.conv2.weight", "layer3.1.bn2.weight", "layer3.1.bn2.bias", "layer3.1.bn2.running_mean", "layer3.1.bn2.running_var", "layer3.1.bn2.num_batches_tracked", "layer3.1.conv3.weight", "layer3.1.bn3.weight", "layer3.1.bn3.bias", "layer3.1.bn3.running_mean", "layer3.1.bn3.running_var", "layer3.1.bn3.num_batches_tracked", "layer3.2.conv1.weight", "layer3.2.bn1.weight", "layer3.2.bn1.bias", "layer3.2.bn1.running_mean", "layer3.2.bn1.running_var", "layer3.2.bn1.num_batches_tracked", "layer3.2.conv2.weight", "layer3.2.bn2.weight", "layer3.2.bn2.bias", "layer3.2.bn2.running_mean", "layer3.2.bn2.running_var", "layer3.2.bn2.num_batches_tracked", "layer3.2.conv3.weight", "layer3.2.bn3.weight", "layer3.2.bn3.bias", "layer3.2.bn3.running_mean", "layer3.2.bn3.running_var", "layer3.2.bn3.num_batches_tracked", "layer3.3.conv1.weight", "layer3.3.bn1.weight", "layer3.3.bn1.bias", "layer3.3.bn1.running_mean", "layer3.3.bn1.running_var", "layer3.3.bn1.num_batches_tracked", "layer3.3.conv2.weight", "layer3.3.bn2.weight", "layer3.3.bn2.bias", "layer3.3.bn2.running_mean", "layer3.3.bn2.running_var", "layer3.3.bn2.num_batches_tracked", "layer3.3.conv3.weight", "layer3.3.bn3.weight", "layer3.3.bn3.bias", "layer3.3.bn3.running_mean", "layer3.3.bn3.running_var", "layer3.3.bn3.num_batches_tracked", "layer3.4.conv1.weight", "layer3.4.bn1.weight", "layer3.4.bn1.bias", "layer3.4.bn1.running_mean", "layer3.4.bn1.running_var", "layer3.4.bn1.num_batches_tracked", "layer3.4.conv2.weight", "layer3.4.bn2.weight", "layer3.4.bn2.bias", "layer3.4.bn2.running_mean", "layer3.4.bn2.running_var", "layer3.4.bn2.num_batches_tracked", "layer3.4.conv3.weight", "layer3.4.bn3.weight", "layer3.4.bn3.bias", "layer3.4.bn3.running_mean", "layer3.4.bn3.running_var", "layer3.4.bn3.num_batches_tracked", "layer3.5.conv1.weight", "layer3.5.bn1.weight", "layer3.5.bn1.bias", "layer3.5.bn1.running_mean", "layer3.5.bn1.running_var", "layer3.5.bn1.num_batches_tracked", "layer3.5.conv2.weight", "layer3.5.bn2.weight", "layer3.5.bn2.bias", "layer3.5.bn2.running_mean", "layer3.5.bn2.running_var", "layer3.5.bn2.num_batches_tracked", "layer3.5.conv3.weight", "layer3.5.bn3.weight", "layer3.5.bn3.bias", "layer3.5.bn3.running_mean", "layer3.5.bn3.running_var", "layer3.5.bn3.num_batches_tracked", "layer4.0.conv1.weight", "layer4.0.bn1.weight", "layer4.0.bn1.bias", "layer4.0.bn1.running_mean", "layer4.0.bn1.running_var", "layer4.0.bn1.num_batches_tracked", "layer4.0.conv2.weight", "layer4.0.bn2.weight", "layer4.0.bn2.bias", "layer4.0.bn2.running_mean", "layer4.0.bn2.running_var", "layer4.0.bn2.num_batches_tracked", "layer4.0.conv3.weight", "layer4.0.bn3.weight", "layer4.0.bn3.bias", "layer4.0.bn3.running_mean", "layer4.0.bn3.running_var", "layer4.0.bn3.num_batches_tracked", "layer4.0.downsample.0.weight", "layer4.0.downsample.1.weight", "layer4.0.downsample.1.bias", 
"layer4.0.downsample.1.running_mean", "layer4.0.downsample.1.running_var", "layer4.0.downsample.1.num_batches_tracked", "layer4.1.conv1.weight", "layer4.1.bn1.weight", "layer4.1.bn1.bias", "layer4.1.bn1.running_mean", "layer4.1.bn1.running_var", "layer4.1.bn1.num_batches_tracked", "layer4.1.conv2.weight", "layer4.1.bn2.weight", "layer4.1.bn2.bias", "layer4.1.bn2.running_mean", "layer4.1.bn2.running_var", "layer4.1.bn2.num_batches_tracked", "layer4.1.conv3.weight", "layer4.1.bn3.weight", "layer4.1.bn3.bias", "layer4.1.bn3.running_mean", "layer4.1.bn3.running_var", "layer4.1.bn3.num_batches_tracked", "layer4.2.conv1.weight", "layer4.2.bn1.weight", "layer4.2.bn1.bias", "layer4.2.bn1.running_mean", "layer4.2.bn1.running_var", "layer4.2.bn1.num_batches_tracked", "layer4.2.conv2.weight", "layer4.2.bn2.weight", "layer4.2.bn2.bias", "layer4.2.bn2.running_mean", "layer4.2.bn2.running_var", "layer4.2.bn2.num_batches_tracked", "layer4.2.conv3.weight", "layer4.2.bn3.weight", "layer4.2.bn3.bias", "layer4.2.bn3.running_mean", "layer4.2.bn3.running_var", "layer4.2.bn3.num_batches_tracked", "master_branch.0.stages.0.1.weight", "master_branch.0.stages.0.2.weight", "master_branch.0.stages.0.2.bias", "master_branch.0.stages.0.2.running_mean", "master_branch.0.stages.0.2.running_var", "master_branch.0.stages.0.2.num_batches_tracked", "master_branch.0.stages.1.1.weight", "master_branch.0.stages.1.2.weight", "master_branch.0.stages.1.2.bias", "master_branch.0.stages.1.2.running_mean", "master_branch.0.stages.1.2.running_var", "master_branch.0.stages.1.2.num_batches_tracked", "master_branch.0.stages.2.1.weight", "master_branch.0.stages.2.2.weight", "master_branch.0.stages.2.2.bias", "master_branch.0.stages.2.2.running_mean", "master_branch.0.stages.2.2.running_var", "master_branch.0.stages.2.2.num_batches_tracked", "master_branch.0.stages.3.1.weight", "master_branch.0.stages.3.2.weight", "master_branch.0.stages.3.2.bias", "master_branch.0.stages.3.2.running_mean", "master_branch.0.stages.3.2.running_var", "master_branch.0.stages.3.2.num_batches_tracked", "master_branch.0.bottleneck.0.weight",
"master_branch.0.bottleneck.1.weight", "master_branch.0.bottleneck.1.bias", "master_branch.0.bottleneck.1.running_mean", "master_branch.0.bottleneck.1.running_var", "master_branch.0.bottleneck.1.num_batches_tracked", "master_branch.1.weight", "master_branch.1.bias", "auxiliary_branch.0.weight", "auxiliary_branch.1.weight", "auxiliary_branch.1.bias", "auxiliary_branch.1.running_mean", "auxiliary_branch.1.running_var", "auxiliary_branch.1.num_batches_tracked", "auxiliary_branch.4.weight", "auxiliary_branch.4.bias".

I have two GPUs (a 1080 and a 750 Ti), but if I force it to use a single GPU this error remains.

P.S. The readme file mentions predict.py instead of inference.py.


yassouali commented on September 12, 2024
  • For B and D, these measure the elapsed time to create one batch of data (B) and the time needed to do one training iteration (D).
  • For the possible losses/models/backbones: these refer to the names of the classes. For the losses, in utils/losses.py: CrossEntropyLoss2d, DiceLoss, FocalLoss, ... For the models, you can find all the possible choices in models/__init__.py. For the backbones, if the model uses resnet, you can use any type of resnet (resnet18, resnet34, resnet50, resnet101, resnet152), and the same for the other backbones (these are taken either from models/resnet.py for resnet backbones or directly from torchvision.models for the other types like VGG).
  • For pixel accuracy, it is generally not a good measure for segmentation: it is easily dominated by the most common class (like the background) and can give misleading results. In your case you only have two classes, one of which is the background while the other is small in size compared to the bg, so if the model only predicts bg you will still get a good accuracy (e.g. if 99% of the pixels are background, predicting only background already yields 0.99 pixel accuracy). Better to focus on mIoU, and especially the IoU of the second class.
  • For the inference, I think your trained model is not of type nn.DataParallel (it was trained on a single GPU); in this case, can you comment out lines 115 and 116 and try again (a generic checkpoint-key workaround is also sketched below).
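
If editing the script is inconvenient, a generic workaround (not the repo's exact code) is to make the checkpoint keys match the model by stripping the "module." prefix that nn.DataParallel adds:

import torch
from collections import OrderedDict

# 'model' is assumed to be the network built from the config, as in inference.py,
# but without the nn.DataParallel wrapper; the checkpoint path is a placeholder.
checkpoint = torch.load('saved/best_model.pth', map_location='cpu')
state_dict = checkpoint['state_dict'] if 'state_dict' in checkpoint else checkpoint

cleaned = OrderedDict()
for key, value in state_dict.items():
    cleaned[key[len('module.'):] if key.startswith('module.') else key] = value

model.load_state_dict(cleaned)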


VladislavAD commented on September 12, 2024

Thanks for the explanation.
I commented out lines 115, 116, 117 and got this error:

  0%|                                                                        | 0/10 [00:00<?, ?it/s]Traceback (most recent call last):
  File "inference.py", line 166, in <module>
    main()
  File "inference.py", line 144, in main
    prediction = model(input).squeeze(0).cuda().numpy()
  File "C:\xxx\.conda\envs\xxx\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\xxx\pytorch_segmentation\models\pspnet.py", line 80, in forward
    x = self.initial(x)
  File "C:\xxx\.conda\envs\xxx\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\xxx\.conda\envs\xxx\lib\site-packages\torch\nn\modules\container.py", line 92, in forward
    input = module(input)
  File "C:\xxx\.conda\envs\xxx\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\xxx\.conda\envs\xxx\lib\site-packages\torch\nn\modules\container.py", line 92, in forward
    input = module(input)
  File "C:\xxx\.conda\envs\xxx\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\xxx\.conda\envs\xxx\lib\site-packages\torch\nn\modules\conv.py", line 338, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected object of backend CPU but got backend CUDA for argument #2 'weight'

I tried to replace cpu() with cuda() in lines 144 and 145:

            prediction = model(input).squeeze(0).cpu().numpy()
            prediction = F.softmax(torch.from_numpy(prediction), dim=0).argmax(0).cpu().numpy()

but that didn't help, the error was the same.


VladislavAD commented on September 12, 2024

Now I'm trying to train different models, starting with UNet. The problem is that trainer.py requires the "freeze_bn" argument on line 40 (if self.config['arch']['args']['freeze_bn']:), and if this argument is missing I get this error:

Traceback (most recent call last):
  File "train.py", line 61, in <module>
    main(config, args.resume)
  File "train.py", line 42, in main
    trainer.train()
  File "D:\darnostup\deep learning\pytorch_segmentation\base\base_trainer.py", line 98, in train
    results = self._train_epoch(epoch)
  File "D:\darnostup\deep learning\pytorch_segmentation\trainer.py", line 40, in _train_epoch
    if self.config['arch']['args']['freeze_bn']:
KeyError: 'freeze_bn'

but if I add it to config.json I get another error:

Traceback (most recent call last):
  File "train.py", line 61, in <module>
    main(config, args.resume)
  File "train.py", line 26, in main
    model = get_instance(models, 'arch', config, train_loader.dataset.num_classes)
  File "train.py", line 16, in get_instance
    return getattr(module, config[name]['type'])(*args, **config[name]['args'])
TypeError: __init__() got an unexpected keyword argument 'freeze_bn'

As I understand it, this parameter is not defined in the UNet class (and many others), but it is checked in the trainer.


yassouali commented on September 12, 2024

Hi, for the second error, you're right: I forgot to add them for the other models after I changed them for pspnet and deeplab, sorry for that. I just pushed the correct changes, so that if the passed args are not used in the model (like backbone for unet, given that unet does not use an external backbone) they will simply be ignored.
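
For the idea behind the change, a tiny self-contained illustration (TinyUNet is a made-up stand-in, not a class from the repo): a constructor that accepts **kwargs can silently drop config entries such as backbone or freeze_bn that the architecture doesn't use, instead of raising a TypeError.

import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self, num_classes, in_channels=3, freeze_bn=False, **_ignored):
        super().__init__()
        self.head = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, x):
        return self.head(x)

model = TinyUNet(num_classes=2, backbone='resnet50', freeze_bn=False)  # extra 'backbone' arg is ignored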

For inference, I see that you're using the normal mode (better to use -mo multiscale, which is the default behaviour); in the normal mode the image needs to be sent to the GPU before forwarding it through the model, which I forgot to do :). I also pushed the changes for inference.
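
Roughly, the missing step looks like this (a sketch with stand-in tensors, not the exact inference.py code):

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = torch.nn.Conv2d(3, 2, kernel_size=1).to(device)   # stand-in for the loaded PSPNet
model.eval()

image = torch.rand(1, 3, 512, 512)                         # stand-in for the preprocessed input
with torch.no_grad():
    prediction = model(image.to(device))                   # send the input to the same device as the model
    prediction = prediction.squeeze(0).argmax(0).cpu().numpy()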

Hope I didn't forget some other cases.


VladislavAD commented on September 12, 2024

I pulled the changes and was able to train UNet, thanks. The problem with inference still remains. There is a new issue dedicated to inference, so I'm moving the discussion there. #4


rurubaobao commented on September 12, 2024

Hello, I also have this error. How can I solve it?
  File "train.py", line 61, in <module>
    main(config, args.resume)
  File "train.py", line 42, in main
    trainer.train()
  File "/data/fr/segall/base/base_trainer.py", line 98, in train
    results = self._train_epoch(epoch)
  File "/data/fr/segall/trainer.py", line 56, in _train_epoch
    output = self.model(data)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/fr/segall/models/pspnet.py", line 86, in forward
    output = self.master_branch(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/fr/segall/models/pspnet.py", line 36, in forward
    align_corners=True) for stage in self.stages])
  File "/data/fr/segall/models/pspnet.py", line 36, in <listcomp>
    align_corners=True) for stage in self.stages])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/batchnorm.py", line 83, in forward
    exponential_average_factor, self.eps)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1693, in batch_norm
    raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 512, 1, 1])

