wkentaro / pytorch-fcn Goto Github PK

1.7K 28.0 481.0 32.75 MB

PyTorch Implementation of Fully Convolutional Networks. (Training code to reproduce the original result is available.)

License: MIT License

Python 100.00%

pytorch computer-vision deep-learning semantic-segmentation convolutional-networks fcn fcn8s

pytorch-fcn's Issues

Which version of the paper does this program follow?

Journal version or conference version?

How to run the code

Hi, could you provide simple instructions on how to train, run inference, and get the dataset?

'Dropout' object has no attribute 'weight'

Hi,
Thanks so much for your help! I'm new to pytorch.
There is a new error again!
Traceback (most recent call last):
  File "examples/voc/train_fcn32s.py", line 100, in <module>
    main()
  File "examples/voc/train_fcn32s.py", line 56, in main
    model.copy_params_from_vgg16(vgg16, init_upscore=False)
  File "/home/zheshiyige/Desktop/fully convolutional network/pytorch-fcn-master/torchfcn/models/fcn32s.py", line 117, in copy_params_from_vgg16
    l2.weight.data = l1.weight.data.view(l2.weight.size())
  File "/home/zheshiyige/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 238, in __getattr__
    type(self).__name__, name))
AttributeError: 'Dropout' object has no attribute 'weight'

Thanks and Best Regards,
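
One possible reading of this traceback is that layers are paired by position and one side of a pair is a `Dropout`, which owns no parameters. A framework-free sketch of a more defensive pairing is shown below. `pair_weighted_layers` and the stand-in classes are hypothetical and not part of torchfcn.

```python
class FakeLinear:
    """Stand-in for a layer with parameters (e.g. nn.Linear)."""
    def __init__(self, name):
        self.name = name
        self.weight = [0.0]  # placeholder for a weight tensor
        self.bias = [0.0]


class FakeDropout:
    """Stand-in for a parameter-free layer (e.g. nn.Dropout)."""


def pair_weighted_layers(src_layers, dst_layers):
    """Pair up only the layers that own a `weight`, in order."""
    src = [l for l in src_layers if hasattr(l, "weight")]
    dst = [l for l in dst_layers if hasattr(l, "weight")]
    if len(src) != len(dst):
        raise ValueError("layer counts differ: %d vs %d" % (len(src), len(dst)))
    return list(zip(src, dst))


# Source puts Dropout first; destination has no Dropout at all.
src = [FakeDropout(), FakeLinear("fc6"), FakeDropout(), FakeLinear("fc7")]
dst = [FakeLinear("fc6"), FakeLinear("fc7")]
pairs = pair_weighted_layers(src, dst)
print([(a.name, b.name) for a, b in pairs])
```

Filtering on the presence of `weight` makes the copy independent of where the parameter-free layers sit in the source model.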

resuming from checkpoint error

When resuming from a saved checkpoint using the --resume flag and the path to the .pth.tar file that the program saved under the 'log' folder, an error occurs during the optimizer step.

Traceback (most recent call last):
  File "train_fcn32s.py", line 202, in <module>
    main()
  File "train_fcn32s.py", line 198, in main
    trainer.train()
  File "/home/arunirc/pytorch-fcn/lib/python2.7/site-packages/torchfcn/trainer.py", line 286, in train
    self.train_epoch()
  File "/home/arunirc/pytorch-fcn/lib/python2.7/site-packages/torchfcn/trainer.py", line 245, in train_epoch
    self.optim.step()
  File "/home/arunirc/pytorch-fcn/lib/python2.7/site-packages/torch/optim/sgd.py", line 87, in step
    param_state = self.state[p]
KeyError: Parameter containing:
(0 ,0 ,.,.) =
-0.1587 0.0404 -0.2275
-0.1916 -0.1479 0.0287
-0.0271 -0.3107 -0.1193

Why do we need padding=100 for a filter of size 3?

As the title says, in torchfcn/models/fcn32s.py the first conv1 layer is defined as:

nn.Conv2d(3, 64, 3, padding=100),

Why do we need a padding of 100 instead of 1, which would match the filter size of 3?

Thanks
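
A possible explanation, assuming the layer layout of the original Caffe FCN-32s (five 2x2 max-pools with ceil rounding, a 7x7 `fc6` convolution without padding, and a 64x64 transposed convolution with stride 32; none of these are stated in this issue): the large padding keeps the feature map big enough for `fc6` on small inputs and makes the upsampled output at least as large as the input, so it can be cropped back. The arithmetic can be traced as follows.

```python
import math


def fcn32s_sizes(size, pad):
    """Trace one spatial dimension through an FCN-32s-like stack.

    Returns (size before fc6, size after fc6, size after upscore).
    Assumed layout: 3x3 convs (first with `pad`, the rest with pad 1),
    five ceil-mode 2x2 pools, 7x7 fc6 without padding, and a
    kernel-64 / stride-32 transposed convolution.
    """
    s = size + 2 * pad - 2          # first 3x3 conv with the given padding
    for _ in range(5):              # remaining 3x3/pad-1 convs keep the size
        s = math.ceil(s / 2)        # each pool halves it, rounding up
    before_fc6 = s
    after_fc6 = s - 6               # 7x7 conv, no padding
    upscored = (after_fc6 - 1) * 32 + 64 if after_fc6 > 0 else 0
    return before_fc6, after_fc6, upscored


for pad in (1, 100):
    for size in (64, 500):
        print(pad, size, fcn32s_sizes(size, pad))
```

Under these assumptions, a padding of 1 turns a 500-pixel side into a 352-pixel output, which is smaller than the input, and a 64-pixel side does not survive `fc6` at all. A padding of 100 yields 544 pixels for a 500-pixel side, which can be cropped to 500.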

Not found fcn.utils.

Hi,
I cannot find the 'fcn.utils' module, which 'trainer.py' uses for fcn.utils.label_accuracy_score and fcn.utils.visualize_segmentation.

Use seg11val.txt of original paper

About the Preprocessing of input images.

This project uses a different preprocessing method from the PyTorch pretrained models. The pretrained models use RGB values normalized to [0, 1], while in pytorch-fcn the images are BGR values that are not normalized to [0, 1], only mean-centered. So how does it work?

See this on pytorch discuss:
All pretrained torchvision models have the same preprocessing, which is to normalize using the following mean/std values: https://github.com/pytorch/examples/blob/master/imagenet/main.py#L92-L93 (input is RGB format)
[ref: https://discuss.pytorch.org/t/how-to-preprocess-input-for-pre-trained-networks/683/2]
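
The two conventions can be compared on a single pixel. The sketch below assumes the Caffe-style BGR mean commonly used with the original FCN models (104.00699, 116.66877, 122.67892) and the torchvision ImageNet mean and std. The function names are illustrative. Weights converted from Caffe expect the first form, which is presumably why this project does not follow the torchvision recipe.

```python
CAFFE_MEAN_BGR = (104.00699, 116.66877, 122.67892)   # assumed, 0-255 scale
TORCHVISION_MEAN_RGB = (0.485, 0.456, 0.406)          # 0-1 scale
TORCHVISION_STD_RGB = (0.229, 0.224, 0.225)


def caffe_style(rgb):
    """RGB in 0-255 -> BGR, mean-subtracted, still on the 0-255 scale."""
    bgr = rgb[::-1]
    return tuple(v - m for v, m in zip(bgr, CAFFE_MEAN_BGR))


def torchvision_style(rgb):
    """RGB in 0-255 -> RGB scaled to 0-1, then standardised per channel."""
    scaled = [v / 255.0 for v in rgb]
    return tuple((v - m) / s for v, m, s in
                 zip(scaled, TORCHVISION_MEAN_RGB, TORCHVISION_STD_RGB))


pixel = (255, 128, 0)  # an orange pixel, RGB
print(caffe_style(pixel))
print(torchvision_style(pixel))
```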

Nan error while training

Hello, I trained the network after changing the learning rate to lr=1e4, which I believe is the same as in the FCN paper.
Why does the loss become NaN? (It is fine if I use the lr set by your original script.)

an issue about train_data

hi man~ sorry for bothering u
I can not get the http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/semantic_contours/benchmark.tgz -O benchmark.tar for the benchmark data.
Could u please upload the files to your project?
thx!

Can not work with pycaffe?

Hello! I am using your pytorch-fcn, which needs the fcn dependency. However, when I try to convert the Caffe model to a PyTorch one and import both fcn and caffe, the kernel dies. Importing only one of them, caffe or fcn, works. Importing fcn after caffe, or caffe after fcn, does not work.

About VGG16 pretrained model

Hi @wkentaro, I see that you use the VGG16 pretrained model from Caffe. I wonder why you do not use the pretrained one provided by torchvision. Is the pretrained one from Caffe better?

Thanks.

how to provide the three arguments in train_fcn32s.py

Hi,
So sorry to bother you again! Could you tell me what's the meaning of the three arguments, out, resume and no-deconv? How to provide these three arguments to make it run properly?

Thanks and Best Regards,

Could you plz add more running instructions on /example/voc/train_fcn32s.py

As the title, thanks. It seems that

./train_fcn32s.py --out logs/fcs32s_sbd

does not work out with the current version.

Best,

Where can I get the pretrained vgg16 model

The error tells me that I don't have the vgg16-from-caffe pretrained model.

bg455@51d9f43122cd:~/projects/pytorch-fcn/examples/voc$ ./train_fcn32s.py ./config/001.yaml
Traceback (most recent call last):
  File "./train_fcn32s.py", line 128, in <module>
    main()
  File "/home/bg455/.local/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/bg455/.local/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/bg455/.local/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/bg455/.local/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "./train_fcn32s.py", line 93, in main
    vgg16 = torchfcn.models.VGG16(pretrained=True)
  File "/usr/local/lib/python2.7/dist-packages/torchfcn-1.5.0-py2.7.egg/torchfcn/models/vgg.py", line 10, in VGG16
    state_dict = torch.load(model_file)
  File "/usr/local/lib/python2.7/dist-packages/torch/serialization.py", line 246, in load
    f = open(f, 'rb')
IOError: [Errno 2] No such file or directory: '/home/wkentaro/data/models/torch/vgg16-from-caffe.pth'

BC Support

Hi WKentaro,

Thank you for the implementation! I am new to PyTorch and I will try this out for an upcoming project.

Does your implementation include Bottleneck (BC) blocks? It seems to improve training time significantly.

speedtest

Hi,

Thanks for sharing the code. What kind of GPU did you use for the speed test? When I ran the speed test on a GTX 1080 (8G memory), the elapsed time for Chainer was 404.87 [s / 1000 evals] and for PyTorch it was 178.93 [s / 1000 evals]. Whenever I try to run it with the VOC dataset, I get an out-of-memory error, so could you share your experiment environment, such as the GPU type, how long it runs, and how much memory fcn32s_pytorch takes?

Thank you!

A little bug in trainer.py

Hello, wkentaro. Do you find that the training loss will suddenly become larger when a new epoch starts? I've noticed that and think it is because of the incorrect use of the training and evaluating mode of the model.
In train_epoch(), line 171, when you finish validation, I think you should change the model back into training mode.
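
A defensive pattern for this kind of bug is to restore the previous mode once validation finishes. The sketch relies only on the `training` attribute and the `eval()` / `train(mode)` methods that `torch.nn.Module` provides. `eval_mode` and `DummyModel` are hypothetical names used for illustration.

```python
from contextlib import contextmanager


@contextmanager
def eval_mode(model):
    """Switch `model` to eval mode, then restore its previous mode."""
    was_training = model.training
    model.eval()
    try:
        yield model
    finally:
        model.train(was_training)


class DummyModel:
    """Minimal stand-in exposing the nn.Module mode interface."""
    def __init__(self):
        self.training = True

    def train(self, mode=True):
        self.training = mode
        return self

    def eval(self):
        return self.train(False)


model = DummyModel()
with eval_mode(model):
    inside = model.training      # validation would run here
print(inside, model.training)
```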

A bug in loss computation

Hi author

First, thank you for sharing this. During training, I noticed that the loss (around 30000) is much larger than in the FCN experiment on Caffe (0.4 ~ 3.0). After reading your code, I found the cause. In trainer.py:

    loss = F.nll_loss(log_p, target, weight=weight, size_average=False)
    if size_average:
        loss /= mask.sum().data[0]

Note here mask is a torch.ByteTensor, which means the data is stored in a byte, value from 0~255 (8 bits). So when you sum it up, it will easily lead to overflow (because answer is also stored in torch.ByteTensor).

For example

test1 = Variable(torch.ones(1, 256,256).type('torch.ByteTensor'))
print(test1, test1.sum())

'''
Variable containing:
( 0 ,.,.) = 
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
     ...       ⋱       ...    
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
[torch.ByteTensor of size 1x256x256]
 Variable containing:
 0
[torch.ByteTensor of size 1]
'''

The correct way should be

test1 = Variable(torch.ones(1, 256,256).type('torch.ByteTensor'))
print(test1, test1.data.sum())

'''
Variable containing:
( 0 ,.,.) = 
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
     ...       ⋱       ...    
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
[torch.ByteTensor of size 1x256x256]
 65536
'''

So to compute the loss properly, you need to change loss /= mask.sum().data[0] to loss /= mask.data.sum().

I have created a pull request to fix this.
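
The wrap-around described above can be reproduced without PyTorch. This sketch uses `ctypes.c_uint8` as a stand-in for an 8-bit accumulator. It illustrates the arithmetic only, not the behaviour of any particular PyTorch version.

```python
import ctypes


def sum_uint8(values):
    """Sum `values` in an 8-bit unsigned accumulator (wraps modulo 256)."""
    acc = ctypes.c_uint8(0)
    for v in values:
        acc = ctypes.c_uint8(acc.value + v)
    return acc.value


mask = [1] * (256 * 256)          # 65536 ones, like the all-ones mask above
print(sum_uint8(mask))            # 8-bit accumulator
print(sum(mask))                  # arbitrary-precision Python int
```

Because 65536 is an exact multiple of 256, the 8-bit sum lands on 0, which matches the output shown in the issue.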

Discussion about the loss function

Hi, I did an experiment about the loss function:

log_p = log_p.transpose(1, 2).transpose(2, 3).contiguous().view(-1, c)

I find that the code works both with and without view(-1, c). Why? In my opinion, the version without view(-1, c) is the correct one. Or are the two actually the same in memory? What do you think?

Inconsistent Tensor size while loading data using data loader.

I am training an OCR using RNN. I am supplying input data as word images of varying dimensions (since each word can be of different lengths) and the size of class labels of each input data is also not consistent. Since each word can have a different number of characters.

   tensor_word_dataset = WordImagesDataset(images, truths, transform = ToTensor())
   dataset_loader = torch.utils.data.DataLoader(tensor_word_dataset,
                                            batch_size=16, shuffle=True,)

This gives me the error:
RuntimeError: inconsistent tensor size at /py/conda-bld/pytorch_1493673470840/work/torch/lib/TH/generic/THTensorCopy.c:46

The image sizes of first 5 input labels and images respectively are:

 torch.Size([2]) torch.Size([32, 41])
 torch.Size([7]) torch.Size([32, 95])
 torch.Size([2]) torch.Size([32, 38])
 torch.Size([2]) torch.Size([32, 53])
 torch.Size([2]) torch.Size([32, 49])
 torch.Size([6]) torch.Size([32, 55])

Any suggestions as to how should I fix it. I want to shuffle the data and send it in batches instead of supplying them one at a time.
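
Batches can only be stacked when every sample has the same shape, so variable-width samples are usually padded (or grouped by size) in a custom `collate_fn` passed to `DataLoader`. The padding step is sketched below on nested lists so that it runs without PyTorch. `pad_batch` is a hypothetical helper and the fill value is an assumption.

```python
def pad_batch(images, fill=0):
    """Right-pad each 2-D image (list of rows) to the widest one in the batch.

    Returns the padded images and the original widths, which a sequence
    model typically needs in order to ignore the padding.
    """
    widths = [len(img[0]) for img in images]
    max_w = max(widths)
    padded = [[row + [fill] * (max_w - len(row)) for row in img]
              for img in images]
    return padded, widths


# Two tiny "word images" of height 2 and widths 3 and 5.
batch = [
    [[1, 1, 1], [1, 1, 1]],
    [[2, 2, 2, 2, 2], [2, 2, 2, 2, 2]],
]
padded, widths = pad_batch(batch)
print(widths)
print(padded[0])
```

The label sequences of different lengths would need the same treatment, with their original lengths kept alongside.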

AttributeError: 'module' object has no attribute 'utils'

Hi,
So sorry to bother you again!
Now, there is a new error
Traceback (most recent call last):
  File "voc/train_fcn32s.py", line 100, in <module>
    main()
  File "voc/train_fcn32s.py", line 56, in main
    torchfcn.utils.copy_params_vgg16_to_fcn32s(vgg16, model, init_upscore=False)
AttributeError: 'module' object has no attribute 'utils'

But I found that I have installed torchfcn (1.4.1) and can import torchfcn.

Thanks and Best Regards,

`--out` option

~~I was trying to run train in the VOC Example with log but this parameter doesn't exist!~~
Noticed that it outputs the log regardless of the parameter.

Questions about the parameters in the upscore layer

Hi @wkentaro, your code is running well on my own dataset, but there are some parts that I don't quite understand.
In fcn32s.py line 144 you set
h = h[:, :, 19:19 + x.size()[2], 19:19 + x.size()[3]].contiguous().
Why do you use 19 here? Could I just replace it with 20 or 18?
Thanks very much.
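
One way to account for the value 19, assuming the usual FCN-32s layout (a first 3x3 convolution with padding 100, later 3x3 convolutions with padding 1, five stride-2 pools, a 7x7 `fc6` without padding, and a kernel-64 / stride-32 upsampling layer; most of these are not stated in this issue), is to add up how far each layer shifts the output relative to the input:

```python
def fcn32s_crop_offset(first_pad=100, fc6_kernel=7, up_kernel=64, up_stride=32):
    """Offset, in input pixels, of the input inside the upsampled score map."""
    total_stride = 32                       # five 2x2 pools: 2 ** 5
    # A 3x3 conv needs padding 1 to stay aligned; anything beyond that shifts.
    shift_conv1 = first_pad - 1
    # fc6 has no padding, so it trims (k - 1) / 2 cells per side at 1/32 scale.
    shift_fc6 = (fc6_kernel - 1) // 2 * total_stride
    # The transposed conv extends the map by (kernel - stride) / 2 per side.
    shift_up = (up_kernel - up_stride) // 2
    return shift_conv1 - shift_fc6 + shift_up


print(fcn32s_crop_offset())
```

Under this accounting the offset follows from the architecture rather than from the image size, so 18 or 20 would shift the prediction by one pixel relative to the input.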

Support weight/bias-wise learning rate setting

As in https://github.com/shelhamer/fcn.berkeleyvision.org/blob/master/voc-fcn32s/net.py#L8
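
In PyTorch this kind of setting is usually expressed with optimizer parameter groups. The sketch below builds such groups from `(name, parameter)` pairs, as returned by `model.named_parameters()`. The multipliers (biases at twice the learning rate and without weight decay) are an assumption about what the linked Caffe definition specifies, the weight decay value is illustrative, and `make_param_groups` is a hypothetical helper.

```python
def make_param_groups(named_params, lr, weight_decay):
    """Split parameters into weight and bias groups with separate settings."""
    weights, biases = [], []
    for name, param in named_params:
        (biases if name.endswith("bias") else weights).append(param)
    return [
        {"params": weights, "lr": lr, "weight_decay": weight_decay},
        {"params": biases, "lr": 2 * lr, "weight_decay": 0.0},
    ]


# Strings stand in for tensors so the example runs without PyTorch.
named = [("conv1_1.weight", "w0"), ("conv1_1.bias", "b0"),
         ("score_fr.weight", "w1"), ("score_fr.bias", "b1")]
groups = make_param_groups(named, lr=1e-10, weight_decay=5e-4)
print(groups)
```

A list of this shape can be passed as the first argument to an optimizer such as `torch.optim.SGD`.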

strange error unknown

Hi,
When I run examples/voc/train_fcn32s.py, I get a strange error:
  File "train_fcn32s.py", line 46, in main
    model = torchfcn.models.FCN32s(n_class=21, deconv=deconv)
TypeError: __init__() got an unexpected keyword argument 'deconv'
Could you tell me what causes this error?
Thanks and Best Regards,

Gradients for Loss function

Hi,

I have a question for loss function,

    loss = F.nll_loss(log_p, target, weight=weight, size_average=False)
    if size_average:
        loss /= mask.sum().data[0]

Depending on whether 'size_average' is True or False, the scale of the gradients of the loss function is different.

This may be a problem because different scales of gradients may impact on how we set learning rate.

Is this the reason you use very low learning rate, 1e-10?

Thanks!!
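
The scale difference is easy to quantify: a summed loss has a gradient exactly N times larger than the averaged one, where N is the number of valid pixels. A toy check with a one-parameter squared-error loss follows. The model, data, and image size are illustrative assumptions.

```python
def grad_sum_and_mean(w, xs, ys):
    """Gradients w.r.t. w of the sum and the mean of (w * x - y) ** 2."""
    per_sample = [2 * (w * x - y) * x for x, y in zip(xs, ys)]
    g_sum = sum(per_sample)
    return g_sum, g_sum / len(per_sample)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 9.0]
g_sum, g_mean = grad_sum_and_mean(1.0, xs, ys)
print(g_sum, g_mean, g_sum / g_mean)

# For a 500 x 375 image with every pixel labelled, N = 187500, so a step
# with lr = 1e-10 on the summed loss matches lr = N * 1e-10 on the mean.
n_pixels = 500 * 375
print(n_pixels * 1e-10)
```

Seen this way, a very small learning rate on an unnormalised loss corresponds to a much larger effective rate on a per-pixel average.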

Inconsistent Tensor Size

When I implemented the VOC2012 dataset based on your code, I got the error "inconsistent tensor size". The dataset code is the same as yours, but I added some test code as below:

if __name__ == '__main__':
    dst = VOCDataSet("./data", is_transform=True)
    trainloader = data.DataLoader(dst, batch_size=4)
    for i, data in enumerate(trainloader):
        imgs, labels = data
        print(imgs[0].type())

Did you encounter this problem?

Pretrained network for fcn8s model

Hi, I tried to train the fcn8s model using pretrained weights from the vgg16 model, but there was no visible learning and the model output was all zero for the first ~10k iterations. However, the training loss converges when I use the fcn16s model to initialize the weights. I could not really understand the reason behind this. Can you help with an explanation?

About the offset of crop

Hi, I just want to ask how you decide the crop offset in different models. If I want to train this model on different datasets with different image size, how to calculate the precise crop offset?

load_state_dict error

Hi,
I recently ran examples/voc/train_fcn32s.py and encountered another error:
Traceback (most recent call last):
  File "train_fcn32s.py", line 101, in <module>
    main()
  File "train_fcn32s.py", line 55, in main
    vgg16.load_state_dict(torch.load(pth_file))
  File "/home/zheshiyige/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 331, in load_state_dict
    .format(name))
KeyError: 'unexpected key "classifier.1.weight" in state_dict'

I have downloaded /home/zheshiyige/data/models/torch/vgg16-00b39a1b.pth

Could you tell me how to fix this error?

Thanks and Best Regards,

FCN8s

Hi,

May I ask when you will add FCN8s or FCN16s?

Thanks!

How to install the module named 'fcn'?

Now after I ran python setup.py install, I got the following error:

import torchfcn
Traceback (most recent call last):

  File "<ipython-input-1-c08454750c97>", line 1, in <module>
    import torchfcn

  File "/home/zhang/anaconda2/lib/python2.7/site-packages/torchfcn-0.2-py2.7.egg/torchfcn/__init__.py", line 3, in <module>
    from trainer import Trainer  # NOQA

  File "/home/zhang/anaconda2/lib/python2.7/site-packages/torchfcn-0.2-py2.7.egg/torchfcn/trainer.py", line 6, in <module>
    import fcn

ImportError: No module named fcn

AttributeError: 'float' object has no attribute 'total_seconds'

Hi, wkentaro. Sorry to bother you!
I want to train fcn32s like this:

./train_fcn32s.py -g 0

Now I get an error and I don't know how to fix it.

Traceback (most recent call last):
  File "./train_fcn32s.py", line 164, in <module>
    main()
  File "./train_fcn32s.py", line 160, in main
    trainer.train()
  File "/home/wukuan/anaconda3/envs/env27/lib/python2.7/site-packages/torchfcn/trainer.py", line 222, in train
    self.train_epoch()
  File "/home/wukuan/anaconda3/envs/env27/lib/python2.7/site-packages/torchfcn/trainer.py", line 208, in train_epoch
    elapsed_time = elapsed_time.total_seconds()
AttributeError: 'float' object has no attribute 'total_seconds'

I want to know some details about this error.
Thanks and Best Regards!
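
The message indicates that `elapsed_time` is already a plain float at the point where `.total_seconds()` is called on it. Independent of the root cause, a tolerant conversion can be sketched as follows. `to_seconds` is a hypothetical helper, not part of torchfcn.

```python
import datetime


def to_seconds(elapsed):
    """Return elapsed time in seconds for a timedelta or a plain number."""
    if isinstance(elapsed, datetime.timedelta):
        return elapsed.total_seconds()
    return float(elapsed)


print(to_seconds(datetime.timedelta(minutes=1, seconds=30)))
print(to_seconds(12.5))
```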

A lot of memory consumed after import torchfcn

Why is a lot of memory (~11G) consumed after importing torchfcn? And because of this, I am not able to run the training demo due to out of memory. How could this happen?

Training

code-block:: bash

./train_fcn32s.py config/001.yaml

Running this step, I can see this error, as follows,
Traceback (most recent call last):
  File "./train_fcn32s.py", line 128, in <module>
    main()
  File "/home/xyz/anaconda2/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/xyz/anaconda2/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/xyz/anaconda2/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/xyz/anaconda2/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "./train_fcn32s.py", line 62, in main
    cfg, out = load_config_file(config_file)
  File "./train_fcn32s.py", line 32, in load_config_file
    name += '_VCS-%s' % git_hash()
  File "./train_fcn32s.py", line 21, in git_hash
    hash = subprocess.check_output(shlex.split(cmd)).strip()
  File "/home/xyz/anaconda2/lib/python2.7/subprocess.py", line 212, in check_output
    process = Popen(stdout=PIPE, *popenargs, **kwargs)
  File "/home/xyz/anaconda2/lib/python2.7/subprocess.py", line 390, in __init__
    errread, errwrite)
  File "/home/xyz/anaconda2/lib/python2.7/subprocess.py", line 1024, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

Wishing for your reply!
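
`OSError: [Errno 2]` raised from `subprocess` usually means the executable itself could not be found, here presumably `git`. A tolerant variant of such a helper is sketched below. The git command line and the fallback string are assumptions, since the original `cmd` is not shown in the traceback.

```python
import shlex
import subprocess


def git_hash(cmd="git log -n 1 --pretty=%h", default="unknown"):
    """Return the short commit hash, or `default` if git cannot be run."""
    try:
        out = subprocess.check_output(shlex.split(cmd),
                                      stderr=subprocess.DEVNULL)
    except (OSError, subprocess.CalledProcessError):
        # OSError: executable missing; CalledProcessError: command failed,
        # for example when run outside a git repository.
        return default
    return out.decode().strip() or default


print(git_hash(cmd="no-such-executable-for-this-example"))
```

With a fallback like this, the log directory name would still be generated when the script runs from an unpacked archive rather than a git checkout.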

Can only deal with batch size as 1?

RuntimeWarning: invalid value encountered in divide during example training

In the first round of evaluation (with pretrained VGG16 model) there is following error

.../pytorch-fcn/examples/voc/torchfcn/utils.py:24: RuntimeWarning: invalid value encountered in divide
  acc_cls = np.diag(hist) / hist.sum(axis=1)                                    
.../pytorch-fcn/examples/voc/torchfcn/utils.py:26: RuntimeWarning: invalid value encountered in divide
  iu = np.diag(hist) / (hist.sum(axis=1) + hist.sum(axis=0) - np.diag(hist))

and the training loss becomes NaN. Is this expected?

Also I am not sure if it's related but AFTER the first round all the validation error becomes NaN.
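
The warning is what NumPy prints for a 0/0 division, which happens for every class that appears in neither the labels nor the predictions. It is a separate matter from a NaN training loss. The per-class computation can be sketched in plain Python, using NaN for absent classes and skipping them in the mean (the role `np.nanmean` plays in NumPy). The confusion matrix below is made up.

```python
import math


def per_class_iou(hist):
    """IoU per class from a square confusion matrix (rows = ground truth)."""
    n = len(hist)
    ious = []
    for c in range(n):
        tp = hist[c][c]
        gt = sum(hist[c])                         # pixels labelled c
        pred = sum(hist[r][c] for r in range(n))  # pixels predicted as c
        union = gt + pred - tp
        ious.append(tp / union if union else math.nan)
    return ious


def nanmean(values):
    """Mean over the entries that are not NaN."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan


hist = [[8, 2, 0],
        [1, 9, 0],
        [0, 0, 0]]   # class 2 never occurs, so its union is 0
ious = per_class_iou(hist)
print(ious, nanmean(ious))
```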

"import fcn" error

Hi, sorry to disturb you. I got an error when running voc/train_fcn32s.py: I can't import fcn. Do I need an additional package? Thanks.

slower benchmarks

I tested this out on an AWS p2 instance and I got a significantly slower benchmark. Can you confirm the hardware that you ran your benchmark on?

==> Benchmark: gpu=0, times=1000, dynamic_input=False
==> Testing FCN32s with PyTorch
Elapsed time: 245.73 [s / 1000 evals]
Hz: 4.07 [hz]

ImportError: No module named fcn

Hi, when I run the file "model_caffe_to_pytorch.py", it turns out that there is "no module named fcn". Thank you very much.
By the way, "export PYTHONAPTH=$(pwd)/python:$PYTHONPATH" should be "export PYTHONPATH=$(pwd)/python:$PYTHONPATH", I think.

ModuleNotFoundError: No module named 'v1'

$ ./train_fcn32s.py
Traceback (most recent call last):
	File "./train_fcn32s.py", line 9, in <module>
		import torchfcn
	File "/home/acgtyrant/Projects/pytorch-fcn/.env/lib/python3.6/site-packages/torchfcn-1.3-py3.6.egg/torchfcn/__init__.py", line 1, in <module>
		from . import datasets  # NOQA
	File "/home/acgtyrant/Projects/pytorch-fcn/.env/lib/python3.6/site-packages/torchfcn-1.3-py3.6.egg/torchfcn/datasets/__init__.py", line 1, in <module>
		from .apc import APC2016V1  # NOQA
	File "/home/acgtyrant/Projects/pytorch-fcn/.env/lib/python3.6/site-packages/torchfcn-1.3-py3.6.egg/torchfcn/datasets/apc/__init__.py", line 1, in <module>
		from v1 import APC2016V1  # NOQA
ModuleNotFoundError: No module named 'v1'
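
Python 3 removed implicit relative imports, so `from v1 import ...` inside a package no longer finds the sibling module. The explicit form is `from .v1 import ...`. The sketch below builds a throwaway package on disk to show the explicit form working. The package and attribute names are made up.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.makedirs(pkg_dir)

# Sibling module inside the package.
with open(os.path.join(pkg_dir, "v1.py"), "w") as f:
    f.write("VALUE = 42\n")

# Explicit relative import: the leading dot means "this package".
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("from .v1 import VALUE  # NOQA\n")

sys.path.insert(0, root)
demo_pkg = importlib.import_module("demo_pkg")
print(demo_pkg.VALUE)
```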

Is there implementation of FCN16s and FCN8s

Hi,
I can only find fcn32s implementation in VOC example, is there fcn16s and fcn8s implementation?

Thanks and Best Regards,

get error when executing the script "learning_curve.py"?

Hello. I found that when the training time exceeds 24 hours, the logger writes "1 day" to the time field of "log.csv". However, this causes an error when using the script "learning_curve.py". How can I fix this error? Thanks!
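
`str()` of a `datetime.timedelta` switches to a form like `1 day, 2:00:03` once a full day has elapsed, which a parser expecting only `H:MM:SS` would reject. A tolerant parser might look like this. `parse_elapsed` is a hypothetical helper, and how `learning_curve.py` actually reads the column is not shown here.

```python
import datetime
import re

_PATTERN = re.compile(
    r"^(?:(?P<days>-?\d+) days?, )?"
    r"(?P<h>\d+):(?P<m>\d+):(?P<s>\d+(?:\.\d+)?)$"
)


def parse_elapsed(text):
    """Parse strings such as '2:03:04.5' or '1 day, 2:03:04' into seconds."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError("unrecognised elapsed time: %r" % text)
    delta = datetime.timedelta(
        days=int(match.group("days") or 0),
        hours=int(match.group("h")),
        minutes=int(match.group("m")),
        seconds=float(match.group("s")),
    )
    return delta.total_seconds()


print(parse_elapsed("23:59:59"))
print(parse_elapsed(str(datetime.timedelta(days=1, hours=2, seconds=3))))
```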

Mini-batch and Multiple-gpu

Hi,

I want to use batch size more than 1.

So, I changed batch_size to 4.
e.g.)
train_loader = torch.utils.data.DataLoader(
    torchfcn.datasets.SBDClassSeg(root, split='train', transform=True),
    batch_size=4, shuffle=True, **kwargs)

But, it raises an error so that I could not use it.

Could you please let me know how to modify it?

Also, is it possible to use multiple GPUs by modifying the code?

Thank you!

Support FCN32s copied from Caffe

Improve weight initialization

As in https://github.com/shelhamer/fcn.berkeleyvision.org/blob/master/voc-fcn32s/solve.py#L25
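
Assuming the linked line refers to the bilinear initialisation of the upsampling layers used in the original FCN code, the kernel can be sketched in plain Python as follows. A NumPy version would be shorter, and `bilinear_kernel` is an illustrative name.

```python
def bilinear_kernel(size):
    """2-D bilinear interpolation kernel of shape size x size."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    line = [1 - abs(i - center) / factor for i in range(size)]
    return [[a * b for b in line] for a in line]


for row in bilinear_kernel(4):
    print(row)
```

In such a scheme, each output channel of the transposed convolution would receive this kernel on its matching input channel and zeros elsewhere.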

Unknown skimage warning

Hi,
Thanks a lot for your help! Now the code runs successfully on my computer. But there is a warning, and I'm not sure whether it is an important issue.

/home/zheshiyige/anaconda2/lib/python2.7/site-packages/fcn-5.8.1-py2.7.egg/fcn/utils.py:310: RuntimeWarning: invalid value encountered in true_divide
/home/zheshiyige/.local/lib/python2.7/site-packages/skimage/util/dtype.py:122: UserWarning: Possible precision loss when converting from float64 to uint8
.format(dtypeobj_in, dtypeobj_out))
/home/zheshiyige/.local/lib/python2.7/site-packages/skimage/transform/_warps.py:84: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
warn("The default mode, 'constant', will be changed to 'reflect' in "
Train epoch=0: 7%| | 611/8498 [02:25<31:02, 4.24it/s]/home/zheshiyige/anaconda2/lib/python2.7/site-packages/fcn-5.8.1-py2.7.egg/fcn/utils.py:312: RuntimeWarning: invalid value encountered in true_divide

Thanks and Best Regards,

The learning rate is too small

I found that when you trained the FCN8s model, the learning rate was very small (1e-14). I remember that the learning rate is set to 1e-4 in the original FCN paper, so I am a little confused.
Could you explain this? Thank you in advance.

resume error

Hi, when I use resume to load parameters, it crashes with ''unexpected key "module.features.0.weight" in state_dict''. I don't know what is wrong here. Thanks a lot!
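
The `module.` prefix is what `torch.nn.DataParallel` adds to parameter names, so one plausible cause is a checkpoint saved from a wrapped model being loaded into an unwrapped one. Under that assumption the keys can be renamed before calling `load_state_dict`. `strip_module_prefix` is a hypothetical helper.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of `state_dict` with a leading `prefix` removed from keys."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Strings stand in for tensors so the example runs without PyTorch.
saved = OrderedDict([("module.features.0.weight", "w"),
                     ("module.features.0.bias", "b")])
print(list(strip_module_prefix(saved)))
```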
