
light-weight-refinenet's Issues

Training error

Hi @DrSleep, I appreciate your work. When I directly run the script train/nyu.sh on three NVIDIA 1080 Ti cards, training gets blocked while evaluating the trained model after 16 epochs. I tried changing the value of num_workers in src/config.py from 16 to 1 and the problem still occurred, but when I set it to zero, training runs without getting blocked. Can you explain this? Thanks a lot.

About NYUD label

I see you ignore label '255' in your code, and the number of classes for NYUD is 40. So label '0' means 'wall' and '39' means 'otherprop', right? And the unlabelled class is '255'?
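
For what it's worth, here is my reading of that convention as a minimal sketch (mine, not taken from the repo): class ids run 0..39 and 255 marks unlabelled pixels, which the loss simply skips via ignore_index.

import torch
import torch.nn as nn

num_classes = 40                                      # labels 0..39 (0 = 'wall', 39 = 'otherprop', as described above)
criterion = nn.CrossEntropyLoss(ignore_index=255)     # 255 = unlabelled, excluded from the loss

logits = torch.randn(2, num_classes, 8, 8)            # dummy network output (N, C, H, W)
target = torch.randint(0, num_classes, (2, 8, 8))     # dummy ground truth, values 0..39
target[0, :2, :2] = 255                               # unlabelled pixels are simply ignored
loss = criterion(logits, target)
print(loss.item())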

Reproducing PASCAL VOC

Hi, I want to reproduce the results you report on PASCAL VOC by training on PASCAL. Could you guide me on the setup you used, i.e. what was your config.py? I am assuming the config.py you provide now is the one for NYUv2 training. Could you also add the loader for PASCAL VOC?

Thanks

RuntimeError: Error(s) in loading state_dict for DataParallel

I am sorry, I am new to this. I changed NUM_CLASSES = [40] * 3 in config.py to NUM_CLASSES = [2] * 3 because my train/val datasets need to be divided into 2 classes.

But when I run it, I get this error:

INFO:main: Loaded Segmenter 50, ImageNet-Pre-Trained=True, #PARAMS=27.31M
Traceback (most recent call last):
File "C:/Users/likun3/Documents/light-weight-refinenet-master/src/train.py", line 432, in
main()
File "C:/Users/likun3/Documents/light-weight-refinenet-master/src/train.py", line 367, in main
best_val, epoch_start = load_ckpt(args.ckpt_path, {'segmenter' : segmenter})
File "C:/Users/likun3/Documents/light-weight-refinenet-master/src/train.py", line 240, in load_ckpt
v.load_state_dict(ckpt[k])
File "D:\Soft\Anaconda_\envs\dp\lib\site-packages\torch\nn\modules\module.py", line 769, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DataParallel:
size mismatch for module.clf_conv.weight: copying a param with shape torch.Size([40, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 256, 3, 3]).
size mismatch for module.clf_conv.bias: copying a param with shape torch.Size([40]) from checkpoint, the shape in current model is torch.Size([2]).

Could someone help me, please?
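
One common workaround (a sketch only, not an official fix): restore everything from the 40-class checkpoint except the final classifier, whose shape now depends on NUM_CLASSES. The clf_conv filter name and the checkpoint path follow the error message above.

import torch
import torch.nn as nn
from models.resnet import rf_lw50

# Build the 2-class segmenter as train.py does (DataParallel-wrapped).
segmenter = nn.DataParallel(rf_lw50(2, pretrained=False))

ckpt = torch.load('ckpt/checkpoint.pth.tar', map_location='cpu')
# Drop the final classifier weights, whose shape differs (40 vs. 2 classes).
state = {k: v for k, v in ckpt['segmenter'].items() if 'clf_conv' not in k}
segmenter.load_state_dict(state, strict=False)   # clf_conv keeps its fresh 2-class init

Alternatively, pointing CKPT_PATH at a new location (or deleting the stale 40-class checkpoint) lets training start from the ImageNet weights instead of the old snapshot.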

Reproducing cityscapes results

Hi, thanks for your work. Can you share the batch size, the input resolution, and the crop resolution you used to achieve 72% with ResNet-101?

Update: I cannot get past 33% mIoU using your training script, so obviously something is wrong. Here are some details on the training.

This is my config file:

SHORTER_SIDE = [1024] * 3
CROP_SIZE = [769] * 3
NORMALISE_PARAMS = [1./255, # SCALE
                    np.array([0.290, 0.328, 0.286]).reshape((1, 1, 3)), # MEAN
                    np.array([0.182, 0.186, 0.184]).reshape((1, 1, 3))] # STD
BATCH_SIZE = [6] * 3
NUM_WORKERS = 0
NUM_CLASSES = [19] * 3
LOW_SCALE = [0.5] * 3
HIGH_SCALE = [2.0] * 3
IGNORE_LABEL = 255

# ENCODER PARAMETERS
ENC = '101'
ENC_PRETRAINED = True  # pre-trained on ImageNet or randomly initialised

# GENERAL
EVALUATE = False
FREEZE_BN = [True] * 3
NUM_SEGM_EPOCHS = [100] * 3
PRINT_EVERY = 10
RANDOM_SEED = 42
SNAPSHOT_DIR = './ckpt/'
CKPT_PATH = './ckpt/checkpoint.pth.tar'
VAL_EVERY = [1] * 3 # how often to record validation scores

# OPTIMISERS' PARAMETERS
LR_ENC = [5e-4, 2.5e-4, 1e-4]  # TO FREEZE, PUT 0
LR_DEC = [5e-3, 2.5e-3, 1e-3]
MOM_ENC = [0.9] * 3 # TO FREEZE, PUT 0
MOM_DEC = [0.9] * 3
WD_ENC = [1e-5] * 3 # TO FREEZE, PUT 0
WD_DEC = [1e-5] * 3
OPTIM_DEC = 'sgd'

The dataset loader has not been changed. Do you have any idea on how to solve the problem?

evaluation on NYU looks horrible

Hi Vladimir,

Thanks for your great work and generous release. Your tutorial to reproduce the results is very detailed, and I appreciate it a lot.

I am trying to use your code in my research project. First I re-trained the network on NYUD with the provided code; the IoU is 0.419, which is fine. Then I tried to visualize the predictions as you did in the notebook, but the result looks pretty bad, and I don't know why the predicted mask image is all green.

I tried evaluation with the provided weight file and with the file I trained myself, and the difference is huge, as shown below. The only difference between the two result figures is which weight file is loaded.

Do you have an idea why this could happen?

Thank you in advance!

[Screenshot comparing the two predictions: Screen Shot 2019-04-18 at 10.58.36 PM]

How to calculate the FLOPs of a given model?

Hi, in the paper you compare the FLOPs of different models, but I am confused about how to obtain the FLOPs of a given model. Do you use some script, or something else? Thank you very much.
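
There are several ways to do this; below is a minimal hook-based sketch (my own helper, not the script used for the paper) that counts multiply-accumulates for the Conv2d layers, which dominate the cost. The count_conv_macs name and the default input size are arbitrary choices of mine.

import torch
import torch.nn as nn

def count_conv_macs(model, input_size=(1, 3, 480, 640)):
    """Rough multiply-accumulate count for Conv2d layers via forward hooks (FLOPs ~ 2 * MACs)."""
    macs = []

    def hook(module, inputs, output):
        kh, kw = module.kernel_size
        # per output element: (in_channels / groups) * kh * kw multiply-accumulates
        macs.append(output.numel() * module.in_channels // module.groups * kh * kw)

    handles = [m.register_forward_hook(hook)
               for m in model.modules() if isinstance(m, nn.Conv2d)]
    with torch.no_grad():
        model(torch.randn(*input_size))
    for h in handles:
        h.remove()
    return sum(macs)

Usage would be something like macs = count_conv_macs(net.eval()); third-party libraries such as thop or ptflops give the same kind of estimate with less code and also cover the remaining layer types.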

How can I get the paper's supplementary material?

Hi!
I am reproducing the MobileNet-v2 performance on PASCAL VOC but am running into difficulties: the model's performance does not improve with training, and the performance on the val set is very bad (the IoU of every class except background is zero). Could you please give more detailed information about the training strategy?
Thanks

model load error


RuntimeError Traceback (most recent call last)
in ()
9 models = dict()
10 for key,fun in six.iteritems(model_inits):
---> 11 net = fun(n_classes, pretrained=True).eval()
12 if has_cuda:
13 net = net.cuda()

/home/lc/work/light-weight-refinenet/models/resnet.py in rf_lw50(num_classes, pretrained, **kwargs)
241 key = 'rf_lw' + bname
242 url = models_urls[bname]
--> 243 model.load_state_dict(maybe_download(key, url))
244 return model
245

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
719 if len(error_msgs) > 0:
720 raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 721 self.__class__.__name__, "\n\t".join(error_msgs)))
722
723 def parameters(self):

RuntimeError: Error(s) in loading state_dict for ResNetLW:
Unexpected key(s) in state_dict: "bn1.num_batches_tracked", "layer1.0.bn1.num_batches_tracked", "layer1.0.bn2.num_batches_tracked", "layer1.0.bn3.num_batches_tracked", "layer1.0.downsample.1.num_batches_tracked", "layer1.1.bn1.num_batches_tracked", "layer1.1.bn2.num_batches_tracked", "layer1.1.bn3.num_batches_tracked", "layer1.2.bn1.num_batches_tracked", "layer1.2.bn2.num_batches_tracked", "layer1.2.bn3.num_batches_tracked", "layer2.0.bn1.num_batches_tracked", "layer2.0.bn2.num_batches_tracked", "layer2.0.bn3.num_batches_tracked", "layer2.0.downsample.1.num_batches_tracked", "layer2.1.bn1.num_batches_tracked", "layer2.1.bn2.num_batches_tracked", "layer2.1.bn3.num_batches_tracked", "layer2.2.bn1.num_batches_tracked", "layer2.2.bn2.num_batches_tracked", "layer2.2.bn3.num_batches_tracked", "layer2.3.bn1.num_batches_tracked", "layer2.3.bn2.num_batches_tracked", "layer2.3.bn3.num_batches_tracked", "layer3.0.bn1.num_batches_tracked", "layer3.0.bn2.num_batches_tracked", "layer3.0.bn3.num_batches_tracked", "layer3.0.downsample.1.num_batches_tracked", "layer3.1.bn1.num_batches_tracked", "layer3.1.bn2.num_batches_tracked", "layer3.1.bn3.num_batches_tracked", "layer3.2.bn1.num_batches_tracked", "layer3.2.bn2.num_batches_tracked", "layer3.2.bn3.num_batches_tracked", "layer3.3.bn1.num_batches_tracked", "layer3.3.bn2.num_batches_tracked", "layer3.3.bn3.num_batches_tracked", "layer3.4.bn1.num_batches_tracked", "layer3.4.bn2.num_batches_tracked", "layer3.4.bn3.num_batches_tracked", "layer3.5.bn1.num_batches_tracked", "layer3.5.bn2.num_batches_tracked", "layer3.5.bn3.num_batches_tracked", "layer4.0.bn1.num_batches_tracked", "layer4.0.bn2.num_batches_tracked", "layer4.0.bn3.num_batches_tracked", "layer4.0.downsample.1.num_batches_tracked", "layer4.1.bn1.num_batches_tracked", "layer4.1.bn2.num_batches_tracked", "layer4.1.bn3.num_batches_tracked", "layer4.2.bn1.num_batches_tracked", "layer4.2.bn2.num_batches_tracked", "layer4.2.bn3.num_batches_tracked".
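
The num_batches_tracked buffers were added to BatchNorm in PyTorch 0.4.1, so a checkpoint saved with a newer PyTorch carries keys that an older install does not recognise. Upgrading PyTorch is the clean fix; a hedged sketch of the quick workaround inside rf_lw50 in models/resnet.py (using the maybe_download / key / url names visible in the traceback):

state = maybe_download(key, url)
# Drop the BatchNorm bookkeeping buffers this older PyTorch does not know about.
state = {k: v for k, v in state.items() if not k.endswith('num_batches_tracked')}
model.load_state_dict(state, strict=False)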

test code

Thanks for your great work!
Is there a test script to get the predicted segmentation result?

train dataset

How is the training dataset organized? The files you provide contain the cropped original images and segmentation maps without annotations.

What is the role of cmap?

What does cmap do here?

for mname, mnet in six.iteritems(models):
    print(mnet(img_inp).shape)
segm = cmap[segm.argmax(axis=2).astype(np.uint8)]

The output is torch.Size([1, 40, 117, 157]); are 117 and 157 the height and width of the picture?
I want to draw the border of each category, but I still don't understand the output of the model rf_lw50.
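
As far as I can tell (a sketch based on the notebook, not an authoritative answer): cmap is just a colour lookup table with one RGB row per class index, and the network output is a per-class score map at roughly a quarter of the input resolution, which is why you see 117x157. Assuming img, img_inp, mnet and cmap are the notebook's variables, visualising at the original image size would look like:

import cv2
import numpy as np

out = mnet(img_inp)                                    # torch.Size([1, 40, 117, 157])
segm = out[0].data.cpu().numpy().transpose(1, 2, 0)    # -> (117, 157, 40)
segm = cv2.resize(segm, (img.shape[1], img.shape[0]),  # back to the original (W, H)
                  interpolation=cv2.INTER_CUBIC)
pred = segm.argmax(axis=2).astype(np.uint8)            # per-pixel class id in 0..39
colour_mask = cmap[pred]                               # (H, W, 3) colour visualisation

To draw the border of one class, you could threshold pred == class_id into a binary mask and pass it to cv2.findContours / cv2.drawContours.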

What is the highest IoU achievable on PASCAL VOC with this repository?

Hello, very nice to find this repository. I think the code differs slightly from the paper. I also wrote a script to reproduce the paper, but I cannot reach the val mIoU of 80.3 and test mIoU of 82.0 reported by the author. So what is the best performance achievable with this repository?
Thank you for your answer.

Error when trying to run train.py

Thanks for your work! I want to run your code, but I get an error. Could you help me? Thanks very much!

/home/robot/fangyu_pytorch/bin/python /home/robot/PycharmProjects/light-weight-refinenet_train/src/train.py --enc 50
INFO:__main__: Loaded Segmenter 50, ImageNet-Pre-Trained=True, #PARAMS=27.40M
/home/robot/fangyu_pytorch/lib/python3.6/site-packages/torch/nn/modules/loss.py:206: UserWarning: NLLLoss2d has been deprecated. Please use NLLLoss instead as a drop-in replacement and see http://pytorch.org/docs/master/nn.html#torch.nn.NLLLoss for more details.
warnings.warn("NLLLoss2d has been deprecated. ")
INFO:__main__: Training Process Starts
INFO:__main__: Created train set = 795 examples, val set = 654 examples
INFO:__main__: Training Stage 0
INFO:__main__: Enc. parameter: module.conv1.weight
[... the full list of encoder (module.layer1-4.*) and decoder (module.*_dimred, module.clf_conv.*) parameter names is logged here ...]
THCudaCheck FAIL file=/pytorch/aten/src/THC/generated/../generic/THCTensorMathPointwise.cu line=266 error=59 : device-side assert triggered
Traceback (most recent call last):
File "/home/robot/PycharmProjects/light-weight-refinenet_train/src/train.py", line 422, in <module>
main()
File "/home/robot/PycharmProjects/light-weight-refinenet_train/src/train.py", line 406, in main
epoch_start, segm_crit, args.freeze_bn[task_idx])
File "/home/robot/PycharmProjects/light-weight-refinenet_train/src/train.py", line 279, in train_segmenter
optim_enc.step()
File "/home/robot/fangyu_pytorch/lib/python3.6/site-packages/torch/optim/sgd.py", line 93, in step
d_p.add_(weight_decay, p.data)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/generated/../generic/THCTensorMathPointwise.cu:266
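
A CUDA error 59 (device-side assert) during training is very often caused by a target label outside [0, num_classes-1] that is not the ignore value reaching the loss on the GPU. Before anything else it may be worth scanning the masks on the CPU; a hedged sketch, assuming trainset is the training dataset created in train.py and (if I read the loader right) each sample exposes the label under a 'mask' key:

import numpy as np

num_classes = 40          # whatever NUM_CLASSES[0] is in your config.py
offending = set()
for sample in trainset:   # the training dataset object built in train.py
    vals = np.unique(np.asarray(sample['mask']))
    offending |= {int(v) for v in vals if v != 255 and not (0 <= v < num_classes)}
print('label values outside the expected range:', sorted(offending))

Running the script once with CUDA_LAUNCH_BLOCKING=1 also makes the assert point at the actual offending operation instead of a later one.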

_pickle.UnpicklingError: unpickling stack underflow

Hello, I have run into a problem. When I try the "To start training on NYU" step, I get: "Connected to pydev debugger (build 182.3911.33)
Traceback (most recent call last):
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1664, in
main()
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1658, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1068, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/luo/Documents/code/light-weight-refinenet/src/train.py", line 435, in
main()
File "/Users/luo/Documents/code/light-weight-refinenet/src/train.py", line 364, in main
create_segmenter(args.enc, args.enc_pretrained, args.num_classes[0])
File "/Users/luo/Documents/code/light-weight-refinenet/src/train.py", line 136, in create_segmenter
return rf_lw50(num_classes, imagenet=pretrained)
File "/Users/luo/Documents/code/light-weight-refinenet/models/resnet.py", line 249, in rf_lw50
model.load_state_dict(maybe_download(key, url), strict=False)
File "/Users/luo/Documents/code/light-weight-refinenet/utils/helpers.py", line 22, in maybe_download
return torch.load(cached_file, map_location=map_location)
File "/Users/luo/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 358, in load
return _load(f, map_location, pickle_module)
File "/Users/luo/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 532, in _load
magic_number = pickle_module.load(f)
_pickle.UnpicklingError: unpickling stack underflow

Process finished with exit code 1
"
Can you help me solve this? Thanks.

can you share your training skills?

Thanks for your work, it is very exciting. I tried to train this network but only get 70 mIoU; could you share your training tips? Thanks very much!

Training gets stuck at each validation.

INFO:main: Val epoch: 28 [500/654] Mean IoU: 0.329
INFO:main: Val epoch: 28 [510/654] Mean IoU: 0.333
INFO:main: Val epoch: 28 [520/654] Mean IoU: 0.334
INFO:main: Val epoch: 28 [530/654] Mean IoU: 0.334
INFO:main: Val epoch: 28 [540/654] Mean IoU: 0.334
INFO:main: Val epoch: 28 [550/654] Mean IoU: 0.336
INFO:main: Val epoch: 28 [560/654] Mean IoU: 0.337
INFO:main: Val epoch: 28 [570/654] Mean IoU: 0.337
INFO:main: Val epoch: 28 [580/654] Mean IoU: 0.337
INFO:main: Val epoch: 28 [590/654] Mean IoU: 0.337
INFO:main: Val epoch: 28 [600/654] Mean IoU: 0.338
INFO:main: Val epoch: 28 [610/654] Mean IoU: 0.338
INFO:main: Val epoch: 28 [620/654] Mean IoU: 0.339
INFO:main: Val epoch: 28 [630/654] Mean IoU: 0.340
INFO:main: Val epoch: 28 [640/654] Mean IoU: 0.341
INFO:main: Val epoch: 28 [650/654] Mean IoU: 0.340
INFO:main: IoUs: [0.71555132 0.79084598 0.38018856 0.58270573 0.49350641 0.53325564
0.33868351 0.24003405 0.35049824 0.38725722 0.53552238 0.43335014
0.52524775 0.13147147 0.06738203 0.45353059 0.10944007 0.35642416
0.13853911 0.24521033 0.2119736 0.52864194 0.25071423 0.31138197
0.40138153 0.23264673 0.31995535 0.16649904 0.06017227 0.3065639
0.61248457 0.18471847 0.68977884 0.36473194 0.31989985 0.28686686
0.0092722 0.17552748 0.12222387 0.28322877]
INFO:main: Val epoch: 28 Mean IoU: 0.341
saving
INFO:main: New best value 0.3412, was 0.3211
saving done
starting *********

Can you share your cityscapes results and fps?

Hey! I am doing some experiments on semantic segmentation of street scenes. I found that you put the Cityscapes results in your paper's supplementary material. Can you share your Cityscapes results and FPS?

Could code work segmenting building footprints from aerials?

My real interest in semantic segmentation using RefineNet is to essentially replicate the workflow used by Microsoft to extract building footprints from aerials. Is the code you have posted suitable to be adapted to extract building footprints from aerials? I know that I would have to prepare aerial image and mask chips for the training and validation data that fit the code's input requirements, but are there other changes that would have to be made to this code or its inputs to make this work for the task of segmenting building footprints?

I realize this code does not account for several aspects of the problem I am trying to solve, such as the handling of the global coordinates of the aerial tiles, but I am hoping that this code could provide a starting point for training a RefineNet enhanced building footprint semantic segmentation model. If not, are you aware of any other sources of code that would be better suited to that task that you could share?

Trying to convert a .pth.tar checkpoint to an ONNX file, please help me!

amax@amax:/data/yh/light-weight-refinenet$ python make_onnx.py
Traceback (most recent call last):
File "make_onnx.py", line 17, in
torch.onnx.export(segmenter,dummy_input,"yh_refinenet.onnx",verbose=True)
File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/onnx/init.py", line 25, in export
return utils.export(*args, **kwargs)
File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/onnx/utils.py", line 131, in export
strip_doc_string=strip_doc_string)
File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/onnx/utils.py", line 363, in _export
_retain_param_name, do_constant_folding)
File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/onnx/utils.py", line 278, in _model_to_graph
_disable_torch_constant_prop=_disable_torch_constant_prop)
File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/onnx/utils.py", line 183, in _optimize_graph
torch._C._jit_pass_lower_all_tuples(graph)
RuntimeError: tuple appears in op that does not forward tuples (VisitNode at /opt/conda/conda-bld/pytorch_1556653099582/work/torch/csrc/jit/passes/lower_tuples.cpp:117)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fd313353dc5 in /home/amax/anaconda3/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: + 0xadf7d0 (0x7fd30e4777d0 in /home/amax/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #2: + 0xadfa34 (0x7fd30e477a34 in /home/amax/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #3: torch::jit::LowerAllTuples(std::shared_ptrtorch::jit::Graph&) + 0x13 (0x7fd30e477a73 in /home/amax/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #4: + 0x3f59a4 (0x7fd3426259a4 in /home/amax/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0x12ce4a (0x7fd34235ce4a in /home/amax/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

frame #36: __libc_start_main + 0xf0 (0x7fd349179830 in /lib/x86_64-linux-gnu/libc.so.6)

My Python script:

import torch
from models.resnet import rf_lw50
from torch.autograd import Variable

file = 'ckpt/checkpoint.pth.tar'

segmenter = rf_lw50(15, False)
segmenter = torch.nn.DataParallel(segmenter)

ckpt = torch.load(file)
segmenter.load_state_dict(ckpt['segmenter'])
segmenter.cuda()
segmenter.eval()

dummy_input = Variable(torch.randn(1,3,360,640)).cuda()

torch.onnx.export(segmenter,dummy_input,"yh_refinenet.onnx",verbose=True)
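
I can't tell from the trace which op trips the tuple-lowering pass, but one common export gotcha is tracing the DataParallel wrapper instead of the bare model. A hedged variant of your script that exports the underlying module on the CPU might be worth trying (the 15-class setting, checkpoint path, and output filename are taken from your script; the 'segmenter' key and the 'module.' prefix are what train.py's checkpoints appear to use):

import torch
from models.resnet import rf_lw50

segmenter = rf_lw50(15, pretrained=False)

ckpt = torch.load('ckpt/checkpoint.pth.tar', map_location='cpu')
# The checkpoint was saved from a DataParallel model, so strip the 'module.' prefix.
state = {k.replace('module.', '', 1): v for k, v in ckpt['segmenter'].items()}
segmenter.load_state_dict(state)
segmenter.eval()

dummy_input = torch.randn(1, 3, 360, 640)
torch.onnx.export(segmenter, dummy_input, 'yh_refinenet.onnx', verbose=True)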

PASCAL Person-Part dataset

I want to retrain on the PASCAL Person-Part dataset. Could you please send me the Person-Part dataset? Thank you.

The mobilenetV2 produced abnormal results

I adapted the create_segmenter function in train.py by adding a branch as follows:

 elif str(net) == 'mbv2':
        from models.mobilenet import mbv2
        return mbv2(num_classes, pretrained=pretrained)

I also adapted the other necessary code, such as the dataloader, and then ran the provided mbv2 model on NYU and Cityscapes. But I got very low mIoUs on both datasets. The results were similar to the following and stayed almost unchanged during training:

IoUs: [0.233812, 0, 0, 0, 0, 0, 0, ...] 
Mean IoU: 0.006 

It seems that the model classifies all pixels as background.

I also ran the provided ResNet-101 and its results were normal. Is there something wrong with the code in mobilenet.py?

Error when running python setup.py build_ext --build-lib=./src/

C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -ID:\Anaconda3.5\lib\site-packages\numpy\core\include -ID:\Anaconda3.5\include -ID:\Anaconda3.5\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\include" /TcF:\light-weight-refinenet-master/src/miou_utils.c /Fobuild\temp.win-amd64-3.6\Release\light-weight-refinenet-master/src/miou_utils.obj
miou_utils.c
d:\anaconda3.5\include\pyconfig.h(59): fatal error C1083: Cannot open include file: 'io.h': No such file or directory
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\bin\HostX86\x64\cl.exe' failed with exit status 2

About RuntimeError:CUDA out of memory

Hi,
Thanks for your wonderful work and detailed tutorial.
I am new here. When I try to retrain the model, I get a RuntimeError. I then set BATCH_SIZE in config.py to [1] * 3, but it still does not work. I wonder if you have ever met this problem?
Could you please help me?
Thanks in advance!

INFO:main: Train epoch: 0 [0/795] Avg. Loss: 3.751 Avg. Time: 1.046
Traceback (most recent call last):
File "src/train.py", line 425, in
main()
File "src/train.py", line 409, in main
args.freeze_bn[task_idx])
File "src/train.py", line 273, in train_segmenter
output = segmenter(input_var)
File "/home/txr/.virtualenvs/env_test/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/txr/.virtualenvs/env_test/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/txr/.virtualenvs/env_test/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/txr/SS/light-weight-refinenet/models/resnet.py", line 237, in forward
x1 = self.mflow_conv_g4_pool(x1)
File "/home/txr/.virtualenvs/env_test/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/txr/.virtualenvs/env_test/lib/python3.5/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/txr/.virtualenvs/env_test/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/txr/SS/light-weight-refinenet/utils/layer_factory.py", line 72, in forward
top = self.maxpool(top)
File "/home/txr/.virtualenvs/env_test/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/txr/.virtualenvs/env_test/lib/python3.5/site-packages/torch/nn/modules/pooling.py", line 146, in forward
self.return_indices)
File "/home/txr/.virtualenvs/env_test/lib/python3.5/site-packages/torch/_jit_internal.py", line 133, in fn
return if_false(*args, **kwargs)
File "/home/txr/.virtualenvs/env_test/lib/python3.5/site-packages/torch/nn/functional.py", line 494, in _max_pool2d
input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: CUDA out of memory. Tried to allocate 32.00 MiB (GPU 0; 1.96 GiB total capacity; 1.14 GiB already allocated; 20.06 MiB free; 41.52 MiB cached)

cpu run

Hi,
It is my pleasure to learn from your code, but I do not have a capable GPU. Could you tell me how to run the code on the CPU? I tried, but a runtime error occurred; does running on the CPU still need CUDA?
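
CUDA should not be required for inference; the pretrained models run on the CPU, just slowly. A minimal hedged sketch (input size is arbitrary):

import torch
from models.resnet import rf_lw50

net = rf_lw50(40, pretrained=True).eval()     # note: no .cuda() anywhere
with torch.no_grad():
    dummy = torch.randn(1, 3, 480, 640)       # stand-in for a preprocessed image
    out = net(dummy)                          # per-class scores at reduced resolution
print(out.shape)

The training script, as far as I can see, assumes a GPU (it wraps the model in DataParallel and calls .cuda()), so training on the CPU would need those calls edited out.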

About evaluation

Hi @DrSleep, I see there is no evaluation script to test the trained model; only the mean IoU is reported during training. Does that mean I don't need to test the model separately?

Beginning with epoch 1, the loss is equal to 0

Your examples/notebook work is wonderful, thank you.
Now I would like to use our own dataset. Training ran, but the loss was too small, like this:
Train epoch: 0 [0/512] Avg. Loss: 0.701 Avg. Time: 1.706 INFO:__main__: Train epoch: 0 [10/512] Avg. Loss: 0.589 Avg. Time: 0.434 INFO:__main__: Train epoch: 0 [20/512] Avg. Loss: 0.407 Avg. Time: 0.374
Finally, we got nothing but red in the prediction result.
Our dataset consists of medical images where the ground truth is white (255). We set NUM_CLASSES = 2. Is that right?
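
A guess at what is happening: IGNORE_LABEL is 255 in config.py, so if your foreground is stored as pixel value 255 it is treated as "unlabelled" and excluded from the loss, which then collapses towards zero. A hedged sketch of the usual remapping (remap_binary_mask is a hypothetical helper, applied to the mask before it reaches the loss):

import numpy as np

def remap_binary_mask(mask):
    """Map a binary medical mask (0 = background, 255 = foreground) to class ids {0, 1}."""
    mask = np.asarray(mask)
    out = np.zeros_like(mask, dtype=np.uint8)
    out[mask == 255] = 1      # foreground becomes class 1, so nothing is ignored
    return out

With that mapping, NUM_CLASSES = 2 should be right.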

How to train on Custom Dataset?

Bravo for nice work!

I am also interested in retraining person-part segmentation on a custom dataset; is there any roadmap for doing so?

Thanks

What is the minimum number of classes required by the code?

I tried running the model on my own dataset using images that have a value of 0 for buildings and 255 (ignore) for everything else, and the model reached perfect validation after 1 epoch if I set the number of classes to 1, and after 2 epochs if I set it to 2. Either way, I assume that is not how the model is supposed to work.

Do I need at least two classes that are not ignored for the model to work? Would the model work if I assigned a code of 0 for buildings and a code of 1 for everything else in the image? If that does not work and I have to properly classify a lot of objects in my images that will significantly reduce the number of images I will be able to use with the model, since I only have buildings classified at this point. I have 4000 images ready to go if I could just classify buildings and make everything else an ignored portion of the image or a single class, but I would only have 100 or less if I have to manually classify real objects in the images other than buildings. Also if I had a dataset with codes of 0, 1, and 255 (ignore) would the number of classes parameter in the configuration file need to be set to 2 classes or 3 classes?

Additionally, does the model work properly with 32-bit images, or does it require 24-bit images like those in the NYU dataset?

train error.

File "/home/.pyenv/versions/anaconda3-5.2.0/envs/pytorch/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/.pyenv/versions/anaconda3-5.2.0/envs/pytorch/lib/python3.6/site-packages/torch/autograd/init.py", line 90, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Could you help me?

Something unclear

In the validate function, the output of the network does not go through a softmax. Why? It is not clear to me.
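
Presumably because softmax is monotonic: the arg-max of the raw logits equals the arg-max of the softmax probabilities, and computing mean IoU only needs the arg-max. A quick check:

import torch

logits = torch.randn(1, 40, 8, 8)
assert torch.equal(logits.argmax(dim=1),
                   torch.softmax(logits, dim=1).argmax(dim=1))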

Training error in stage 2

Your work is great!
But when I train on my own data, I get an error in training stage 2:

`INFO:main: Val epoch: 219 Mean IoU: 1.000
INFO:main: Train epoch: 220 [0/44] Avg. Loss: 0.000 Avg. Time: 0.311
INFO:main: Train epoch: 220 [10/44] Avg. Loss: 0.000 Avg. Time: 0.267
INFO:main: Train epoch: 220 [20/44] Avg. Loss: 0.000 Avg. Time: 0.264
INFO:main: Train epoch: 220 [30/44] Avg. Loss: 0.000 Avg. Time: 0.261
INFO:main: Train epoch: 220 [40/44] Avg. Loss: 0.000 Avg. Time: 0.262
INFO:main: Train epoch: 221 [0/44] Avg. Loss: 0.000 Avg. Time: 0.296
INFO:main: Train epoch: 221 [10/44] Avg. Loss: 0.000 Avg. Time: 0.261
INFO:main: Train epoch: 221 [20/44] Avg. Loss: 0.000 Avg. Time: 0.259
INFO:main: Train epoch: 221 [30/44] Avg. Loss: 0.000 Avg. Time: 0.261
INFO:main: Train epoch: 221 [40/44] Avg. Loss: 0.000 Avg. Time: 0.261
INFO:main: Train epoch: 222 [0/44] Avg. Loss: 0.000 Avg. Time: 0.277
INFO:main: Train epoch: 222 [10/44] Avg. Loss: 0.000 Avg. Time: 0.263
INFO:main: Train epoch: 222 [20/44] Avg. Loss: 0.000 Avg. Time: 0.264
INFO:main: Train epoch: 222 [30/44] Avg. Loss: 0.000 Avg. Time: 0.263
INFO:main: Train epoch: 222 [40/44] Avg. Loss: 0.000 Avg. Time: 0.263
INFO:main: Train epoch: 223 [0/44] Avg. Loss: 0.000 Avg. Time: 0.303
INFO:main: Train epoch: 223 [10/44] Avg. Loss: 0.000 Avg. Time: 0.267
INFO:main: Train epoch: 223 [20/44] Avg. Loss: 0.000 Avg. Time: 0.262
INFO:main: Train epoch: 223 [30/44] Avg. Loss: 0.000 Avg. Time: 0.261
INFO:main: Train epoch: 223 [40/44] Avg. Loss: 0.000 Avg. Time: 0.262
INFO:main: Train epoch: 224 [0/44] Avg. Loss: 0.000 Avg. Time: 0.288
INFO:main: Train epoch: 224 [10/44] Avg. Loss: 0.000 Avg. Time: 0.259
INFO:main: Train epoch: 224 [20/44] Avg. Loss: 0.000 Avg. Time: 0.259
INFO:main: Train epoch: 224 [30/44] Avg. Loss: 0.000 Avg. Time: 0.259
INFO:main: Train epoch: 224 [40/44] Avg. Loss: 0.000 Avg. Time: 0.259
INFO:main: Val epoch: 224 [0/31] Mean IoU: 1.000
INFO:main: Val epoch: 224 [10/31] Mean IoU: 1.000
INFO:main: Val epoch: 224 [20/31] Mean IoU: 1.000
INFO:main: Val epoch: 224 [30/31] Mean IoU: 1.000
INFO:main: IoUs: [1. 1.]
INFO:main: Val epoch: 224 Mean IoU: 1.000

INFO:main: Train epoch: 225 [0/44] Avg. Loss: 0.000 Avg. Time: 0.316

Traceback (most recent call last):

File "/home/vetec-tf/program/light-weight-refinenet/src/train.py", line 429, in
main()

File "/home/vetec-tf/program/light-weight-refinenet/src/train.py", line 413, in main
args.freeze_bn[task_idx])

File "/home/vetec-tf/program/light-weight-refinenet/src/train.py", line 276, in train_segmenter
output = segmenter(input_var)

File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)

File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/data_parallel.py", line 123, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)

File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/data_parallel.py", line 133, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])

File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/parallel_apply.py", line 77, in parallel_apply
raise output
File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/parallel_apply.py", line 53, in _worker
output = module(*input, **kwargs)

File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)

File "/home/vetec-tf/program/light-weight-refinenet/models/resnet.py", line 203, in forward
l1 = self.layer1(x)

File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)

File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/container.py", line 91, in forward
input = module(input)

File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)

File "/home/vetec-tf/program/light-weight-refinenet/models/resnet.py", line 135, in forward
out += residual

RuntimeError: The expanded size of the tensor (1024) must match the existing size (256) at non-singleton dimension 1`

and stage 1 finished fine:
`INFO:main: Val epoch: 199 [0/31] Mean IoU: 1.000

INFO:main: Val epoch: 199 [10/31] Mean IoU: 1.000

INFO:main: Val epoch: 199 [20/31] Mean IoU: 1.000

INFO:main: Val epoch: 199 [30/31] Mean IoU: 1.000

INFO:main: IoUs: [1. 1.]

INFO:main: Val epoch: 199 Mean IoU: 1.000

INFO:main:Stage 1 finished, time spent 23.135min

INFO:main: Created train set = 265 examples, val set = 31 examples

INFO:main: Training Stage 2`

Can you help me? Thank you!

Confusion about task_idx

Hi! Great work with this repo!

I have a question about how you structured the training process. Why do the values in the config file come in lists of three?

I've seen that task_idx iterates until it reaches num_stages (which is set to the length of the num_classes list, i.e. 3), and I am not sure what it means or why it is used.

Thanks!
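
For what it's worth, my reading (not an authoritative answer): the length-3 lists provide one value per training stage, and task_idx selects the current stage's crop size, batch size, learning rates, number of epochs, and so on, which is how the multi-stage schedule from the paper is expressed in config.py. A toy illustration of the pattern (the values are just those from the config pasted earlier on this page):

NUM_SEGM_EPOCHS = [100] * 3
LR_ENC = [5e-4, 2.5e-4, 1e-4]
LR_DEC = [5e-3, 2.5e-3, 1e-3]

num_stages = len(NUM_SEGM_EPOCHS)          # = 3, matching len(NUM_CLASSES)
for task_idx in range(num_stages):
    print('stage', task_idx,
          '-> epochs:', NUM_SEGM_EPOCHS[task_idx],
          'enc lr:', LR_ENC[task_idx],
          'dec lr:', LR_DEC[task_idx])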

train my own data

Hi DrSleep,
I want to train a model on my own dataset with only 2 classes. When I changed the parameters in config.py, the following error occurred:

size mismatch for module.clf_conv.bias: copying a param with shape torch.Size([40]) from checkpoint, the shape in current model is torch.Size([2]).

In addition, my label images are binarized; could that be the reason for the error?

ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (2,2) and requested shape (3,2).

Hoping for your help, thanks very much.

please help me!!!!!!!!

./train/nyu.sh
INFO:main: Loaded Segmenter 50, ImageNet-Pre-Trained=True, #PARAMS=27.34M
/home/amax/anaconda3/lib/python3.6/site-packages/torch/nn/modules/loss.py:216: UserWarning: NLLLoss2d has been deprecated. Please use NLLLoss instead as a drop-in replacement and see https://pytorch.org/docs/master/nn.html#torch.nn.NLLLoss for more details.
warnings.warn("NLLLoss2d has been deprecated. "
INFO:main: Training Process Starts
INFO:main: Created train set = 7736 examples, val set = 48 examples
Traceback (most recent call last):
File "src/train.py", line 425, in
main()
File "src/train.py", line 388, in main
return validate(segmenter, val_loader, 0, num_classes=args.num_classes[task_idx])
File "src/train.py", line 317, in validate
output = segmenter(input_var)
File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/data/yh/light-weight-refinenet-bak-origin/models/resnet.py", line 222, in forward
x3 = x3 + x4
RuntimeError: The size of tensor a (40) must match the size of tensor b (32) at non-singleton dimension 3
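
I can't be certain from the trace alone, but a mismatch where two fused feature maps differ in spatial size (40 vs. 32 at dimension 3 here, at x3 = x3 + x4) usually means the input's height or width is not compatible with the network's downsampling, so the skip connections no longer line up. One hedged remedy is to pad images (and masks) up to a multiple of 32 before feeding them in; pad_to_multiple below is a hypothetical helper, not part of the repo:

import numpy as np

def pad_to_multiple(arr, multiple=32, value=0):
    """Zero-pad H and W (bottom/right) so every downsampling stage divides evenly."""
    h, w = arr.shape[:2]
    ph = (multiple - h % multiple) % multiple
    pw = (multiple - w % multiple) % multiple
    pad = [(0, ph), (0, pw)] + [(0, 0)] * (arr.ndim - 2)
    return np.pad(arr, pad, mode='constant', constant_values=value)

For the label mask you would pad with the ignore value (255) instead of 0 so the padded border does not affect the loss.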
