
pytorch-fcn's People

Contributors

lyken17, mbarnes1, wkentaro

pytorch-fcn's Issues

How to run the code

Hi, can you provide simple instructions on how to train, run inference, and get the dataset?

'Dropout' object has no attribute 'weight'

Hi,
Thanks so much for your help! I'm new to pytorch.
There is a new error again!
Traceback (most recent call last):
File "examples/voc/train_fcn32s.py", line 100, in <module>
main()
File "examples/voc/train_fcn32s.py", line 56, in main
model.copy_params_from_vgg16(vgg16, init_upscore=False)
File "/home/zheshiyige/Desktop/fully convolutional network/pytorch-fcn-master/torchfcn/models/fcn32s.py", line 117, in copy_params_from_vgg16
l2.weight.data = l1.weight.data.view(l2.weight.size())
File "/home/zheshiyige/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 238, in __getattr__
type(self).__name__, name))
AttributeError: 'Dropout' object has no attribute 'weight'

Thanks and Best Regards,

resuming from checkpoint error

When resuming from a saved checkpoint using the flag --resume and the path to .pth.tar file saved under the 'log' folder by the program, there seems to be an error during the optim step.

Traceback (most recent call last):
File "train_fcn32s.py", line 202, in <module>
main()
File "train_fcn32s.py", line 198, in main
trainer.train()
File "/home/arunirc/pytorch-fcn/lib/python2.7/site-packages/torchfcn/trainer.py", line 286, in train
self.train_epoch()
File "/home/arunirc/pytorch-fcn/lib/python2.7/site-packages/torchfcn/trainer.py", line 245, in train_epoch
self.optim.step()
File "/home/arunirc/pytorch-fcn/lib/python2.7/site-packages/torch/optim/sgd.py", line 87, in step
param_state = self.state[p]
KeyError: Parameter containing:
(0 ,0 ,.,.) =
-0.1587 0.0404 -0.2275
-0.1916 -0.1479 0.0287
-0.0271 -0.3107 -0.1193

Why do we need padding=100 for a filter of size 3?

As the title says, in torchfcn/models/fcn32s.py the first conv1 layer is defined as:

nn.Conv2d(3, 64, 3, padding=100),

Why do we need a padding of 100 instead of 1, which is what a filter size of 3 would suggest?

Thanks
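
The effect of the large padding can be checked with simple size arithmetic. The sketch below is not taken from the repository: `fcn32s_output_size` is a hypothetical helper, and the layer layout it encodes (five 2x2 stride-2 poolings in ceil mode, a 7x7 fc6 convolution without padding, and a final transposed convolution with kernel 64 and stride 32) is an assumption based on the usual VGG16-based FCN-32s design. Only the first layer, `nn.Conv2d(3, 64, 3, padding=100)`, is quoted in this thread.

```python
import math


def fcn32s_output_size(size, first_padding):
    """Side length after the final upsampling, or None if fc6 cannot be applied."""
    # conv1_1: 3x3 kernel with the padding under discussion.
    size = size + 2 * first_padding - 2
    # The remaining 3x3 convolutions use padding=1 and keep the size, so only
    # the five 2x2 stride-2 poolings (ceil mode assumed) change it.
    for _ in range(5):
        size = math.ceil(size / 2)
    # fc6 expressed as a 7x7 convolution without padding (assumed).
    size = size - 6
    if size < 1:
        return None
    # Transposed convolution with kernel 64 and stride 32 (assumed).
    return (size - 1) * 32 + 64


for side in (100, 500):
    print(side, fcn32s_output_size(side, 1), fcn32s_output_size(side, 100))
```

Under these assumptions, padding=1 leaves a 100-pixel input too small for the 7x7 layer and gives a 500-pixel input an output of 352, which is smaller than the input. With padding=100 the outputs are 160 and 544, so the score map is always at least as large as the input and can be cropped back to the input size.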

Not found fcn.utils.

Hi,
I cannot find the 'fcn.utils' module, which is used in 'trainer.py' for fcn.utils.label_accuracy_score and fcn.utils.visualize_segmentation.

About the Preprocessing of input images.

This project uses a different preprocessing method from the PyTorch pretrained models. The pretrained models use RGB values normalized to [0, 1], while in pytorch-fcn the images are BGR and are not scaled to [0, 1], only mean-centered. So how does it work?

See this on pytorch discuss:
All pretrained torchvision models have the same preprocessing, which is to normalize using the following mean/std values: https://github.com/pytorch/examples/blob/master/imagenet/main.py#L92-L93 (input is RGB format)
[ref: https://discuss.pytorch.org/t/how-to-preprocess-input-for-pre-trained-networks/683/2]
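
The two conventions can be compared on a single pixel. This is a minimal sketch rather than the project's code: the function names are hypothetical, the ImageNet mean/std are the torchvision values referred to above, and the BGR mean is the one commonly distributed with the Caffe VGG16 weights (an assumption, it is not quoted in this thread). Each set of pretrained weights expects the preprocessing it was trained with, so either convention works as long as it is paired with the matching weights.

```python
# torchvision convention: RGB, scaled to [0, 1], then mean/std normalized.
IMAGENET_MEAN_RGB = (0.485, 0.456, 0.406)
IMAGENET_STD_RGB = (0.229, 0.224, 0.225)
# Caffe VGG16 convention (assumed values): BGR, 0..255 range, mean subtracted.
CAFFE_MEAN_BGR = (104.00699, 116.66877, 122.67892)


def preprocess_torchvision(rgb_pixel):
    """RGB in 0..255 -> scaled to [0, 1], then normalized per channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb_pixel, IMAGENET_MEAN_RGB, IMAGENET_STD_RGB)
    )


def preprocess_caffe(rgb_pixel):
    """RGB in 0..255 -> reordered to BGR, mean subtracted, no rescaling."""
    bgr = rgb_pixel[::-1]
    return tuple(value - mean for value, mean in zip(bgr, CAFFE_MEAN_BGR))


pixel = (255, 128, 0)  # one RGB pixel
print(preprocess_torchvision(pixel))
print(preprocess_caffe(pixel))
```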

Nan error while training

Hello, I was training the network after setting lr=1e-4, which is the same as in the FCN paper.
Why does it raise a NaN error in the loss? (It is OK if I use the lr set by your original script.)

Can not work with pycaffe?

Hello! I am using your pytorch-fcn, which needs the dependency fcn. However, when I try to convert the Caffe model to a PyTorch one, importing both fcn and caffe kills the kernel. Importing just one of them, caffe or fcn, works. If I import fcn after caffe, or caffe after fcn, it does not work...

About VGG16 pretrained model

Hi @wkentaro, I see that you use the VGG16 pretrained model from Caffe. I wonder why you do not use the pretrained one provided by torchvision. Is the pretrained one from Caffe better?

Thanks.

Where can I get the pretrained vgg16 model

The error tells me that I don't have the vgg16-from-caffe pretrained model.

bg455@51d9f43122cd:~/projects/pytorch-fcn/examples/voc$ ./train_fcn32s.py ./config/001.yaml
Traceback (most recent call last):
File "./train_fcn32s.py", line 128, in <module>
main()
File "/home/bg455/.local/lib/python2.7/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/home/bg455/.local/lib/python2.7/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/bg455/.local/lib/python2.7/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/bg455/.local/lib/python2.7/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "./train_fcn32s.py", line 93, in main
vgg16 = torchfcn.models.VGG16(pretrained=True)
File "/usr/local/lib/python2.7/dist-packages/torchfcn-1.5.0-py2.7.egg/torchfcn/models/vgg.py", line 10, in VGG16
state_dict = torch.load(model_file)
File "/usr/local/lib/python2.7/dist-packages/torch/serialization.py", line 246, in load
f = open(f, 'rb')
IOError: [Errno 2] No such file or directory: '/home/wkentaro/data/models/torch/vgg16-from-caffe.pth'

BC Support

Hi WKentaro,

Thank you for the implementation! I am new to PyTorch and I will try this out for an upcoming project.

Does your implementation include Bottleneck (BC) blocks? It seems to improve training time significantly.

  • FC

speedtest

Hi,

Thanks for sharing the code. I wonder what kind of GPU you were using for the speed test. When I ran the speed test on a GTX 1080 (8 GB memory), the elapsed time for Chainer was 404.87 [s / 1000 evals] and for PyTorch it was 178.93 [s / 1000 evals]. Whenever I try to run it with the VOC dataset, I get an out-of-memory error, so I wonder if you could share your experiment environment, such as the GPU type, how long training runs, and how much memory fcn32s_pytorch takes.

Thank you!

A little bug in trainer.py

Hello, wkentaro. Do you find that the training loss will suddenly become larger when a new epoch starts? I've noticed that and think it is because of the incorrect use of the training and evaluating mode of the model.
In train_epoch(), line 171, when you finish validation, I think you should change the model back into training mode.

A bug in loss computation

Hi author

First, I would like to thank you for sharing. During training, I noticed that the loss (around 30000) is obviously larger than in the FCN experiment on Caffe (0.4 ~ 3.0). After reading your code, I found the cause. In your code trainer.py,

    loss = F.nll_loss(log_p, target, weight=weight, size_average=False)
    if size_average:
        loss /= mask.sum().data[0]

Note here mask is a torch.ByteTensor, which means the data is stored in a byte, value from 0~255 (8 bits). So when you sum it up, it will easily lead to overflow (because answer is also stored in torch.ByteTensor).

For example

test1 = Variable(torch.ones(1, 256,256).type('torch.ByteTensor'))
print(test1, test1.sum())

'''
Variable containing:
( 0 ,.,.) = 
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
     ...       ⋱       ...    
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
[torch.ByteTensor of size 1x256x256]
 Variable containing:
 0
[torch.ByteTensor of size 1]
'''

The correct way should be

test1 = Variable(torch.ones(1, 256,256).type('torch.ByteTensor'))
print(test1, test1.data.sum())

'''
Variable containing:
( 0 ,.,.) = 
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
     ...       ⋱       ...    
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
[torch.ByteTensor of size 1x256x256]
 65536
'''

So to compute the loss properly, you need to change loss /= mask.sum().data[0] to loss /= mask.data.sum().

I have created a pull request to fix this.
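
The wraparound itself is plain modular arithmetic: 256 x 256 ones add up to 65536, which is an exact multiple of 256, so an 8-bit accumulator ends at 0. The sketch below simulates this in pure Python; `uint8_sum` is a hypothetical helper for illustration and is not how PyTorch implements the sum.

```python
def uint8_sum(values):
    """Sum values the way an 8-bit unsigned accumulator would (wraps at 256)."""
    total = 0
    for v in values:
        total = (total + v) % 256
    return total


n_pixels = 256 * 256
mask = [1] * n_pixels

print(uint8_sum(mask))  # wrapped 8-bit sum
print(sum(mask))        # true number of valid pixels
```

Dividing the loss by the wrapped value instead of the true count inflates it by a large and image-dependent factor, which is consistent with the loss magnitudes reported above.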

Discussion about the loss function

Hi, I did an experiment about the loss function:

log_p = log_p.transpose(1, 2).transpose(2, 3).contiguous().view(-1, c)

I find that the code works both with and without view(-1, c). Why? In my opinion, the version without view(-1, c) is the correct one. Or are the two actually the same in memory? What do you think?

Inconsistent Tensor size while loading data using data loader.

I am training an OCR model using an RNN. I am supplying input data as word images of varying dimensions (since each word can be of different lengths), and the size of the class labels of each input is also not consistent, since each word can have a different number of characters.

   tensor_word_dataset = WordImagesDataset(images, truths, transform = ToTensor())
   dataset_loader = torch.utils.data.DataLoader(tensor_word_dataset,
                                            batch_size=16, shuffle=True,) 

This gives me the error:
RuntimeError: inconsistent tensor size at /py/conda-bld/pytorch_1493673470840/work/torch/lib/TH/generic/THTensorCopy.c:46

The sizes of the first 6 labels and images respectively are:

 torch.Size([2]) torch.Size([32, 41])
 torch.Size([7]) torch.Size([32, 95])
 torch.Size([2]) torch.Size([32, 38])
 torch.Size([2]) torch.Size([32, 53])
 torch.Size([2]) torch.Size([32, 49])
 torch.Size([6]) torch.Size([32, 55])

Any suggestions as to how I should fix it? I want to shuffle the data and send it in batches instead of supplying the samples one at a time.
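
Default batching stacks the samples of a batch into one tensor, which requires every sample to have the same shape. One common workaround is to pad each batch to its longest element and keep the original lengths. The sketch below shows only that padding step in pure Python; `pad_batch` is a hypothetical helper, and in PyTorch equivalent logic would be supplied through the `collate_fn` argument of `torch.utils.data.DataLoader`, with the same idea applied to the image width.

```python
def pad_batch(sequences, pad_value=0):
    """Right-pad variable-length sequences to a common length.

    Returns the padded sequences and the original lengths, so that the
    padding can be masked out or ignored later.
    """
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    padded = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]
    return padded, lengths


# Label sequences with lengths 2, 7 and 2, as in the sizes listed above.
labels = [[3, 7], [1, 4, 1, 5, 9, 2, 6], [2, 8]]
padded, lengths = pad_batch(labels)
print(padded)
print(lengths)
```

Keeping `lengths` next to the padded batch matters for sequence models, because the padding value should not contribute to the loss.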

AttributeError: 'module' object has no attribute 'utils'

Hi,
So sorry to bother you again!
Now, there is a new error
Traceback (most recent call last):
File "voc/train_fcn32s.py", line 100, in <module>
main()
File "voc/train_fcn32s.py", line 56, in main
torchfcn.utils.copy_params_vgg16_to_fcn32s(vgg16, model, init_upscore=False)
AttributeError: 'module' object has no attribute 'utils'

   But I found that I have installed torchfcn (1.4.1) and can import torchfcn.

Thanks and Best Regards,

`--out` option

I was trying to run training in the VOC example with the `--out` option for logging, but this parameter doesn't exist!
I noticed that it writes the log regardless of the parameter.

strange error unknown

Hi,
When I run examples/voc/train_fcn32s.py, there is a strange error:
File "train_fcn32s.py", line 46, in main
model = torchfcn.models.FCN32s(n_class=21, deconv=deconv)
TypeError: __init__() got an unexpected keyword argument 'deconv'
Could you tell me what is causing this error?
Thanks and Best Regards,

Gradients for Loss function

Hi,

I have a question for loss function,

    loss = F.nll_loss(log_p, target, weight=weight, size_average=False)
    if size_average:
        loss /= mask.sum().data[0]

Depending on whether 'size_average' is True or False, the scale of the gradients of the loss function is different.

This may be a problem, because the scale of the gradients affects how we should set the learning rate.

Is this the reason you use a very low learning rate, 1e-10?

Thanks!!
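
The relationship between the two settings can be made concrete: a loss summed over N pixels has a gradient N times larger than the same loss averaged over those pixels, so one SGD step with learning rate lr on the summed loss equals one step with learning rate lr * N on the averaged loss. The sketch below checks this on a toy squared-error loss; the function names and numbers are made up for illustration and are not taken from the trainer.

```python
def grad_sum_loss(w, targets):
    """Gradient of sum_i (w - t_i)^2 with respect to w."""
    return sum(2.0 * (w - t) for t in targets)


def grad_mean_loss(w, targets):
    """Gradient of the same loss averaged over the targets."""
    return grad_sum_loss(w, targets) / len(targets)


targets = [1.0, 2.0, 3.0, 6.0]
w = 0.0
n = len(targets)
lr_sum = 1e-3

step_sum = lr_sum * grad_sum_loss(w, targets)          # summed loss, small lr
step_mean = (lr_sum * n) * grad_mean_loss(w, targets)  # averaged loss, lr * N
print(step_sum, step_mean)
```

By the same arithmetic, for a hypothetical 500 x 375 image (187500 pixels), a learning rate of 1e-10 on the summed loss corresponds to roughly 1.9e-5 on the per-pixel average.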

Inconsistent Tensor Size

When I implement the VOC2012 dataset based on your code, I get the error "inconsistent tensor size". The dataset code is the same as yours, but I added some test code as below:

if __name__ == '__main__':
    dst = VOCDataSet("./data", is_transform=True)
    trainloader = data.DataLoader(dst, batch_size=4)
    for i, data in enumerate(trainloader):
        imgs, labels = data
        print(imgs[0].type())

Did you encounter this problem?

Pretrained network for fcn8s model

Hi, I tried to train the fcn8s model using pretrained weights from the vgg16 model, but there was no visible learning and the model output was all zero for the first ~10k iterations. However, the training loss converges when I use the fcn16s model to initialize the weights. I could not really understand the reason behind this. Can you help with an explanation?

About the offset of crop

Hi, I just want to ask how you decide the crop offset in the different models. If I want to train this model on different datasets with different image sizes, how do I calculate the precise crop offset?

load_state_dict error

Hi,
I recently ran examples/voc/train_fcn32s.py and encountered another error:
Traceback (most recent call last):
File "train_fcn32s.py", line 101, in <module>
main()
File "train_fcn32s.py", line 55, in main
vgg16.load_state_dict(torch.load(pth_file))
File "/home/zheshiyige/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 331, in load_state_dict
.format(name))
KeyError: 'unexpected key "classifier.1.weight" in state_dict'

I have downloaded /home/zheshiyige/data/models/torch/vgg16-00b39a1b.pth

Could you tell me how to fix this error?

Thanks and Best Regards,

FCN8s

Hi,

May I ask when you will add FCN8s or FCN16s?

Thanks!

How to install the module named 'fcn'?

Now after I ran python setup.py install, I got the following error:

import torchfcn
Traceback (most recent call last):

  File "<ipython-input-1-c08454750c97>", line 1, in <module>
    import torchfcn

  File "/home/zhang/anaconda2/lib/python2.7/site-packages/torchfcn-0.2-py2.7.egg/torchfcn/__init__.py", line 3, in <module>
    from trainer import Trainer  # NOQA

  File "/home/zhang/anaconda2/lib/python2.7/site-packages/torchfcn-0.2-py2.7.egg/torchfcn/trainer.py", line 6, in <module>
    import fcn

ImportError: No module named fcn

AttributeError: 'float' object has no attribute 'total_seconds'

Hi, wkentaro. Sorry to bother you!
I want to train fcn32s like this:

./train_fcn32s.py -g 0

Now I get an error, and I don't know how to fix it.

Traceback (most recent call last):
File "./train_fcn32s.py", line 164, in <module>
main()
File "./train_fcn32s.py", line 160, in main
trainer.train()
File "/home/wukuan/anaconda3/envs/env27/lib/python2.7/site-packages/torchfcn/trainer.py", line 222, in train
self.train_epoch()
File "/home/wukuan/anaconda3/envs/env27/lib/python2.7/site-packages/torchfcn/trainer.py", line 208, in train_epoch
elapsed_time = elapsed_time.total_seconds()
AttributeError: 'float' object has no attribute 'total_seconds'

I want to know some details about this error.
Thanks and Best Regards!

**Training**

code-block:: bash

./train_fcn32s.py config/001.yaml

Running this step, I get the following error:
Traceback (most recent call last):
File "./train_fcn32s.py", line 128, in <module>
main()
File "/home/xyz/anaconda2/lib/python2.7/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/home/xyz/anaconda2/lib/python2.7/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/xyz/anaconda2/lib/python2.7/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/xyz/anaconda2/lib/python2.7/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "./train_fcn32s.py", line 62, in main
cfg, out = load_config_file(config_file)
File "./train_fcn32s.py", line 32, in load_config_file
name += '_VCS-%s' % git_hash()
File "./train_fcn32s.py", line 21, in git_hash
hash = subprocess.check_output(shlex.split(cmd)).strip()
File "/home/xyz/anaconda2/lib/python2.7/subprocess.py", line 212, in check_output
process = Popen(stdout=PIPE, *popenargs, **kwargs)
File "/home/xyz/anaconda2/lib/python2.7/subprocess.py", line 390, in __init__
errread, errwrite)
File "/home/xyz/anaconda2/lib/python2.7/subprocess.py", line 1024, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory

Wishing for your reply!
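
An `OSError: [Errno 2]` raised from inside `subprocess` means that the executable being launched could not be found. In this traceback the failing call is in `git_hash`, so the likely cause is that `git` is not installed or not on the PATH. The sketch below is one defensive variant, written for Python 3: `safe_git_hash` is a hypothetical name, and the default command string is an assumption, since the actual `cmd` is not shown in the traceback.

```python
import shlex
import subprocess


def safe_git_hash(cmd='git log -n 1 --pretty="%h"', default="unknown"):
    """Return the short commit hash, or `default` when git is unavailable."""
    try:
        out = subprocess.check_output(shlex.split(cmd), stderr=subprocess.DEVNULL)
    except (OSError, subprocess.CalledProcessError):
        # OSError: the executable was not found.
        # CalledProcessError: git ran but failed, e.g. outside a repository.
        return default
    return out.decode("utf-8").strip() or default


print(safe_git_hash())
```

Falling back to a placeholder keeps the log directory name well-formed when the script is run from an unpacked archive rather than a git checkout.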

RuntimeWarning: invalid value encountered in divide during example training

In the first round of evaluation (with the pretrained VGG16 model) there is the following warning:

.../pytorch-fcn/examples/voc/torchfcn/utils.py:24: RuntimeWarning: invalid value encountered in divide
  acc_cls = np.diag(hist) / hist.sum(axis=1)                                    
.../pytorch-fcn/examples/voc/torchfcn/utils.py:26: RuntimeWarning: invalid value encountered in divide
  iu = np.diag(hist) / (hist.sum(axis=1) + hist.sum(axis=0) - np.diag(hist))

and the training loss becomes NaN. Is this expected?

Also I am not sure if it's related but AFTER the first round all the validation error becomes NaN.
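
NumPy emits this warning when an element-wise division evaluates 0 / 0, which happens in the two quoted lines for any class whose row (and column) of the confusion matrix is all zeros, for example a class that does not occur in the evaluated images. The pure-Python sketch below reproduces the per-class accuracy with that case handled explicitly; the helper names are hypothetical and the matrix is made-up data. It explains the warning only, and does not by itself account for the training loss becoming NaN.

```python
import math


def per_class_accuracy(hist):
    """hist[i][j] = number of pixels with true class i predicted as class j."""
    accs = []
    for i, row in enumerate(hist):
        total = sum(row)
        # A class with no ground-truth pixels would give 0 / 0, so it is
        # marked as NaN explicitly instead of dividing.
        accs.append(hist[i][i] / total if total > 0 else float("nan"))
    return accs


def nan_mean(values):
    """Mean over the entries that are not NaN."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else float("nan")


hist = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 0],  # class 2 never appears in the ground truth
]
accs = per_class_accuracy(hist)
print(accs)
print(nan_mean(accs))
```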

"import fcn" error

Hi, sorry to disturb you. I got an error when running voc/train_fcn32s.py: I can't import fcn. Do I need an additional package? Thanks.

slower benchmarks

I tested this out on an AWS p2 instance and I got a significantly slower benchmark. Can you confirm the hardware that you ran your benchmark on?

==> Benchmark: gpu=0, times=1000, dynamic_input=False
==> Testing FCN32s with PyTorch
Elapsed time: 245.73 [s / 1000 evals]
Hz: 4.07 [hz]

ImportError: No module named fcn

Hi, when I run the file "model_caffe_to_pytorch.py", it turns out that there is "no module named fcn". Thank you very much.
By the way, "export PYTHONAPTH=$(pwd)/python:$PYTHONPATH" should be "export PYTHONPATH=$(pwd)/python:$PYTHONPATH", I think.

ModuleNotFoundError: No module named 'v1'

$ ./train_fcn32s.py
Traceback (most recent call last):
	File "./train_fcn32s.py", line 9, in <module>
		import torchfcn
	File "/home/acgtyrant/Projects/pytorch-fcn/.env/lib/python3.6/site-packages/torchfcn-1.3-py3.6.egg/torchfcn/__init__.py", line 1, in <module>
		from . import datasets  # NOQA
	File "/home/acgtyrant/Projects/pytorch-fcn/.env/lib/python3.6/site-packages/torchfcn-1.3-py3.6.egg/torchfcn/datasets/__init__.py", line 1, in <module>
		from .apc import APC2016V1  # NOQA
	File "/home/acgtyrant/Projects/pytorch-fcn/.env/lib/python3.6/site-packages/torchfcn-1.3-py3.6.egg/torchfcn/datasets/apc/__init__.py", line 1, in <module>
		from v1 import APC2016V1  # NOQA
ModuleNotFoundError: No module named 'v1'
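
The failing line, `from v1 import APC2016V1`, is an implicit relative import. Python 2 resolved it against the containing package, but Python 3 treats it as an absolute import and looks for a top-level module named `v1`. The explicit form `from .v1 import APC2016V1` works on both. The sketch below builds a throwaway package in a temporary directory to show the explicit form importing cleanly; the package name `demo_apc_pkg` is made up for the demonstration.

```python
import importlib
import os
import sys
import tempfile

# Create a tiny package on disk: demo_apc_pkg/__init__.py and demo_apc_pkg/v1.py
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_apc_pkg")
os.makedirs(pkg_dir)

with open(os.path.join(pkg_dir, "v1.py"), "w") as f:
    f.write("class APC2016V1(object):\n    pass\n")

with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    # The leading dot makes the import relative to this package.
    f.write("from .v1 import APC2016V1  # NOQA\n")

sys.path.insert(0, root)
demo_apc = importlib.import_module("demo_apc_pkg")
print(demo_apc.APC2016V1.__name__)
```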

get error when executing the script "learning_curve.py"?

Hello. I found that when the training time exceeds 24 hours, the logger writes "1 day" to the time field of "log.csv". However, this causes an error when using the script "learning_curve.py". How can I fix this error? Thanks!
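
The "1 day" text matches the way Python formats a `datetime.timedelta` as a string once it reaches 24 hours, for example `1 day, 2:03:04`. Assuming the time column holds such strings (this is an assumption, the log format is not shown here), one way to read it back is to convert every value to seconds. `parse_elapsed` below is a hypothetical helper and not part of `learning_curve.py`.

```python
import re
from datetime import timedelta

# Matches str(timedelta): optional "N day(s), " prefix, then H:MM:SS[.ffffff]
_ELAPSED = re.compile(r"^(?:(-?\d+) days?, )?(\d+):(\d{2}):(\d{2})(?:\.(\d+))?$")


def parse_elapsed(text):
    """Convert a str(timedelta) value such as '1 day, 2:03:04.5' to seconds."""
    match = _ELAPSED.match(text.strip())
    if match is None:
        raise ValueError("unrecognized elapsed time: %r" % text)
    days, hours, minutes, seconds, fraction = match.groups()
    delta = timedelta(
        days=int(days or 0),
        hours=int(hours),
        minutes=int(minutes),
        seconds=int(seconds),
        # Pad or cut the fractional part to exactly six digits (microseconds).
        microseconds=int((fraction or "0").ljust(6, "0")[:6]),
    )
    return delta.total_seconds()


print(parse_elapsed("1 day, 0:00:01"))
print(parse_elapsed(str(timedelta(hours=5, minutes=3, seconds=2))))
```

Converting to plain seconds before plotting avoids any dependence on how the elapsed time happens to be formatted.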

Mini-batch and Multiple-gpu

Hi,

I want to use batch size more than 1.

So, I changed batch_size to 4.
e.g.)
train_loader = torch.utils.data.DataLoader(
torchfcn.datasets.SBDClassSeg(root, split='train', transform=True),
batch_size=4, shuffle=True, **kwargs)

But it raises an error, so I could not use it.

Could you please let me know how to modify it?

Also, is it possible to use multiple GPUs by modifying the code?

Thank you!

Unknown skimage warning

Hi,
Thanks a lot for your help! Now the code runs successfully on my computer. But there is a warning, and I'm not sure whether it is an important issue.

/home/zheshiyige/anaconda2/lib/python2.7/site-packages/fcn-5.8.1-py2.7.egg/fcn/utils.py:310: RuntimeWarning: invalid value encountered in true_divide
/home/zheshiyige/.local/lib/python2.7/site-packages/skimage/util/dtype.py:122: UserWarning: Possible precision loss when converting from float64 to uint8
.format(dtypeobj_in, dtypeobj_out))
/home/zheshiyige/.local/lib/python2.7/site-packages/skimage/transform/_warps.py:84: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
warn("The default mode, 'constant', will be changed to 'reflect' in "
Train epoch=0: 7%| | 611/8498 [02:25<31:02, 4.24it/s]/home/zheshiyige/anaconda2/lib/python2.7/site-packages/fcn-5.8.1-py2.7.egg/fcn/utils.py:312: RuntimeWarning: invalid value encountered in true_divide

Thanks and Best Regards,

The learning rate is too small

I found that when you trained the FCN8s model, the learning rate was very small (1e-14). I remember that the learning rate is set to 1e-4 in the original FCN paper. I am a little confused.
Can you give me an answer? Thank you in advance.

resume error

Hi, when I use resume to load parameters, it crashes with 'unexpected key "module.features.0.weight" in state_dict'. I don't know what is wrong with this. Thanks a lot!
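
Keys that start with `module.` usually come from a model that was wrapped in `torch.nn.DataParallel` when its `state_dict` was saved; loading them into an unwrapped model then fails because the names no longer match. Whether that is what happened here is an assumption. One workaround is to rename the keys before loading, as in the sketch below, where `strip_module_prefix` is a hypothetical helper and the dictionary values are placeholders rather than real tensors.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from each key."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


checkpoint = OrderedDict([
    ("module.features.0.weight", "w0"),  # placeholder values, not real tensors
    ("module.features.0.bias", "b0"),
])
print(list(strip_module_prefix(checkpoint)))
```

The cleaned dictionary can then be passed to the unwrapped model's `load_state_dict`. The alternative is to wrap the model in the same way before loading.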
