
mtan's Introduction

MTAN - Multi-Task Attention Network

This repository contains the source code of Multi-Task Attention Network (MTAN) and baselines from the paper, End-to-End Multi-Task Learning with Attention, introduced by Shikun Liu, Edward Johns, and Andrew Davison.

See more results on our project page here.

Final Update - This repository will not be further updated. Check out our latest work: Auto-Lambda for more multi-task optimisation methods.

Experiments

Image-to-Image Predictions (One-to-Many)

Under the folder im2im_pred, we provide our proposed network along with all the baselines on the NYUv2 dataset presented in the paper. All models were written in PyTorch, and we have updated the implementation to PyTorch 1.5 in the latest commit.

Download our pre-processed NYUv2 dataset here, which we evaluated in the paper. We use the pre-computed ground-truth normals from here. The raw 13-class NYUv2 dataset can be downloaded directly from this repo, with the segmentation labels defined in this repo.

I am sorry that I am not able to provide the raw pre-processing code due to an unexpected computer crash.

Update - Jun 2019: I have now released the pre-processed CityScapes dataset with 2, 7, and 19-class semantic labels (see the paper for more details) and (inverse) depth labels. Download the [256x512, 2.42GB] version here and the [128x256, 651MB] version here.

Update - Oct 2019: For PyTorch 1.2 users: the mIoU evaluation method has now been updated to avoid the "zeros issue" when computing binary masks. Also, to run the code correctly, please move scheduler.step() after the call to optimizer.step(), e.g. one line before the final performance-printing step, to comply with the updated PyTorch requirements. See the official PyTorch documentation here for more details. [We have fixed this in the latest commit.]
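
As a minimal sketch of the intended ordering (the names below are illustrative stand-ins, not the repository's actual training loop):

```python
import torch

# Minimal sketch of the ordering PyTorch >= 1.1 expects: optimizer.step() first,
# then scheduler.step() once per epoch (here just before any performance print).
model = torch.nn.Linear(10, 1)                      # stand-in for the SegNet model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.5)

for epoch in range(200):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 10)).pow(2).mean()  # stand-in for the task losses
    loss.backward()
    optimizer.step()
    scheduler.step()                                # moved after optimizer.step()
```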

Update - May 2020: We have now provided our official MTAN-DeepLabv3 (ResNet-like) design to support more complicated and modern multi-task network backbones. Please check out im2im_pred/model_resnet_mtan for more details. This model can easily be plugged into any training template defined in im2im_pred.

Update - July 2020: We have further improved readability and updated all implementations in im2im_pred to comply with the latest PyTorch version (1.5). We fixed a bug to exclude undefined pixel predictions for a more accurate mean IoU computation in the semantic segmentation task. We also provide an additional option to apply data augmentation on NYUv2, to avoid over-fitting and achieve better performance.
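
For intuition, here is a rough sketch of the kind of paired augmentation typically used for NYUv2 multi-task training (random scale, crop, and flip applied consistently to all modalities); it is illustrative only and not necessarily the exact transform shipped in im2im_pred:

```python
import random
import torch
import torch.nn.functional as F

# Illustrative paired augmentation: image [3,H,W], semantic [H,W] (class ids),
# depth [H,W], normal [3,H,W]. Scale up, crop back to the original size, then flip.
def augment(image, semantic, depth, normal, scales=(1.0, 1.2, 1.5)):
    s = random.choice(scales)
    h, w = image.shape[-2:]
    sh, sw = int(h * s), int(w * s)
    image = F.interpolate(image[None], (sh, sw), mode='bilinear', align_corners=True)[0]
    semantic = F.interpolate(semantic[None, None].float(), (sh, sw), mode='nearest')[0, 0]
    depth = F.interpolate(depth[None, None], (sh, sw), mode='nearest')[0, 0] / s  # rescale metric depth
    normal = F.interpolate(normal[None], (sh, sw), mode='bilinear', align_corners=True)[0]
    top, left = random.randint(0, sh - h), random.randint(0, sw - w)
    image, semantic, depth, normal = (t[..., top:top + h, left:left + w]
                                      for t in (image, semantic, depth, normal))
    if random.random() > 0.5:          # horizontal flip
        image, semantic, depth, normal = (t.flip(-1) for t in (image, semantic, depth, normal))
        normal[0] = -normal[0]         # flip the x component of the normals
    return image, semantic, depth, normal
```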

Update - Nov 2020 [IMPORTANT!]: We have updated the mIoU and Pixel Accuracy formulas to be consistent with the standard benchmark from the official COCO segmentation scripts. The mIoU of all methods is now expected to improve by approximately 8%. The new formulas compute mIoU and Pixel Accuracy from the pixel predictions accumulated across all images, whereas the original formulas averaged per-image scores across all images.
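
For clarity, a minimal sketch of the accumulated style of computation (illustrative, not the repository's exact code): pixel counts are accumulated into a single confusion matrix over the whole test set before taking the ratios, rather than averaging per-image scores.

```python
import numpy as np

# Accumulate one confusion matrix over all test images, then compute mIoU and
# pixel accuracy from it (undefined pixels, label -1, are ignored).
def evaluate_segmentation(pred_label_pairs, num_classes):
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for pred, label in pred_label_pairs:  # integer class-id arrays per image
        valid = label >= 0
        idx = num_classes * label[valid].astype(int) + pred[valid].astype(int)
        conf += np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(conf)
    iou = tp / (conf.sum(axis=0) + conf.sum(axis=1) - tp + 1e-10)
    return iou.mean(), tp.sum() / conf.sum()  # (mIoU, pixel accuracy)
```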

All models (files), built on SegNet as proposed in the original paper, are described in the following table:

| File Name | Type | Flags | Comments |
|---|---|---|---|
| model_segnet_single.py | Single | task, dataroot | standard single-task learning |
| model_segnet_stan.py | Single | task, dataroot | our approach applied to one task |
| model_segnet_split.py | Multi | weight, dataroot, temp, type | multi-task learning baseline in which the shared network splits at the last layer (also known as hard-parameter sharing) |
| model_segnet_dense.py | Multi | weight, dataroot, temp | multi-task learning baseline in which each task has its own parameter space (also known as soft-parameter sharing) |
| model_segnet_cross.py | Multi | weight, dataroot, temp | our implementation of the Cross-Stitch Network |
| model_segnet_mtan.py | Multi | weight, dataroot, temp | our approach |

Each flag is described in the table below:

| Flag Name | Usage | Comments |
|---|---|---|
| task | pick one task to train: semantic (semantic segmentation, depth-wise cross-entropy loss), depth (depth estimation, L1-norm loss), or normal (normal prediction, cosine-similarity loss) | only available in single-task learning |
| dataroot | directory root for the NYUv2 dataset | just put it under the folder im2im_pred to avoid any concerns :D |
| weight | weighting options for multi-task learning: equal (direct summation of all task losses), DWA (our proposal; see the sketch after this table), uncert (our implementation of the Weight Uncertainty Method) | only available in multi-task learning |
| temp | hyper-parameter temperature in the DWA weighting option, which determines the softness of the task weighting | |
| type | different versions of the multi-task baseline split: standard, deep, wide | only available in the baseline split |
| apply_augmentation | toggle on to apply data augmentation in NYUv2 to avoid over-fitting | available in all training models |
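
For reference, here is a minimal sketch of the DWA weighting controlled by the temp flag, following the formula described in the paper (illustrative, not copied from the repository): each task weight is a softmax over the ratio of its last two epoch losses, scaled by the temperature T.

```python
import numpy as np

# lambda_k(t) = K * exp(w_k(t-1) / T) / sum_i exp(w_i(t-1) / T),
# with w_k(t-1) = L_k(t-1) / L_k(t-2); for the first two epochs all weights are 1.
def dwa_weights(avg_task_losses, epoch, temperature=2.0):
    """avg_task_losses: array [num_epochs, num_tasks] of average training losses."""
    num_tasks = avg_task_losses.shape[1]
    if epoch < 2:
        return np.ones(num_tasks)
    w = avg_task_losses[epoch - 1] / avg_task_losses[epoch - 2]
    e = np.exp(w / temperature)
    return num_tasks * e / e.sum()
```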

To run any model, cd im2im_pred/ and simply run python MODEL_NAME.py --FLAG_NAME 'FLAG_OPTION' (the default is training without augmentation). Toggle on the apply_augmentation flag to train with data augmentation: python MODEL_NAME.py --FLAG_NAME 'FLAG_OPTION' --apply_augmentation.

Please note that we did not apply any data augmentation in the original paper.

Benchmarking Multi-task Learning

Benchmarking multi-task learning is always tricky, since the performance measure and evaluation method differ for each task. In the original paper, I simply averaged the performance of each task over the last 10 epochs, assuming we do not have access to validation data.

For a more standardized and fair comparison, I would suggest researchers adopt the evaluation method defined in Section 5, Equation 4 of this paper, which computes the average relative task improvement over single-task learning.
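
As a small sketch of that metric as I read it (the per-task relative gain over single-task learning, sign-flipped for metrics where lower is better, averaged over tasks); treat it as an illustration rather than the reference implementation:

```python
# Average relative multi-task improvement over single-task learning, in percent.
def delta_mtl(mtl_scores, stl_scores, lower_is_better):
    deltas = [(-1.0 if lower else 1.0) * (m - b) / b
              for m, b, lower in zip(mtl_scores, stl_scores, lower_is_better)]
    return 100.0 * sum(deltas) / len(deltas)

# Example: mIoU (higher is better) and absolute depth error (lower is better).
print(delta_mtl([0.20, 0.55], [0.18, 0.60], [False, True]))  # positive = MTL better
```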

NYUv2 can easily be over-fitted due to its small sample size. In the July update, we provided an option to apply data augmentation to alleviate the over-fitting issue (thanks to Jialong's help). We highly recommend benchmarking the NYUv2 dataset with this data augmentation, to be consistent with other SOTA multi-task learning methods that use the same augmentation technique, such as PAD-Net and MTI-Net.

Visual Decathlon Challenge (Many-to-Many)

We also provide the source code for the Visual Decathlon Challenge, for which we build MTAN on top of the Wide Residual Network from the implementation here.

To run the code, please follow the steps below.

  1. Download the dataset and devkit from the official Visual Decathlon Challenge website here. Move the dataset folder decathlon-1.0-data under the folder visual_decathlon. Then move decathlon_mean_std.pickle into the dataset folder decathlon-1.0-data.

  2. Create a directory under the test folder for each dataset, and move all test files into that created folder. (This is to comply with the PyTorch dataloader format.)

  3. Install setup.py from the decathlon devkit under the code/coco/PythonAPI folder, and then move pycocotools and annotations from the devkit into the visual_decathlon folder.

  4. cd visual_decathlon and run python model_wrn_mtan.py --gpu [GPU_ID] --mode [eval or all] for training. eval evaluates on the validation set (normally for debugging or hyper-parameter tuning), and all trains on all data (normally for final evaluation or benchmarking).

  5. Run python model_wrn_eval.py --dataset 'imagenet' and --dataset 'notimagenet' (sequentially) to evaluate on ImageNet and the other datasets. Finally, run python coco_results.py to convert the results into the COCO format for online evaluation.

Other Notices

  1. The provided code is highly optimised for readability. If you find any unusual behaviour, please post an issue or directly contact my email below.
  2. Training the provided code will produce different performance (depending on the task) from the numbers reported in the paper for image-to-image prediction tasks, but the rankings stay the same. If you want to compare any models in the paper on image-to-image prediction tasks, please re-run them yourself with your preferred training strategy (learning rate, optimiser, etc.) and keep the training strategy consistent across models to ensure fairness. To compare results on the Visual Decathlon Challenge, you may directly borrow the results presented in the paper; for a fair comparison in your research, please build your multi-task network with the same backbone architecture.
  3. From my personal experience, designing a better architecture is usually more helpful (and easier) than finding a better task weighting in multi-task learning.

Citation

If you found this code/work useful in your own research, please consider citing the following:

@inproceedings{liu2019end,
  title={End-to-End Multi-task Learning with Attention},
  author={Liu, Shikun and Johns, Edward and Davison, Andrew J},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={1871--1880},
  year={2019}
}

Acknowledgement

We would like to thank Simon Vandenhende for his help with the MTAN-DeepLabv3 design, and Jialong Wu for his generous contributions to benchmarking MTAN-DeepLabv3 and implementing data augmentation for the NYUv2 dataset.

Contact

If you have any questions, please contact [email protected].


mtan's Issues

Training MTAN-DeepLabv3

Hi,

I'm trying to run your resnet_mtan.py to do two semantic label tasks. I used your utils.py and create_dataset.py.

From resnet_mtan.py I modified the # Task specific decoders part as follows:

        for i, t in enumerate(self.tasks):
            #out[i] = F.interpolate(self.decoders[i](a_4[i]), size=out_size, mode='bilinear', align_corners=True)
            if t == 'segmentation1':
                out[i] = F.log_softmax(out[i], dim=1)
            if t == 'segmentation2':
                out[i] = F.log_softmax(out[i], dim=1)
            #if t == 'normal':
            #    out[i] = out[i] / torch.norm(out[i], p=2, dim=1, keepdim=True)
        return out

But I get this error (AttributeError: 'int' object has no attribute 'log_softmax'):

Parameter Space: ABS: 72062479.0, REL: 2.8847
LOSS FORMAT: SEMANTIC-1_LOSS MEAN_IOU PIX_ACC | SEMANTIC-2_LOSS MEAN_IOU PIX_ACC
Standard training strategy without data augmentation.
torch.Size([512, 512])
Traceback (most recent call last):
  File "resnet_mtan.py", line 177, in <module>
    two_task_trainer(nyuv2_train_loader,
  File "/home/models/mtan/im2im_pred/utils.py", line 359, in two_task_trainer
    train_pred = two_task_model(train_data)
  File "/home/miniconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/miniconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/miniconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "resnet_mtan.py", line 112, in forward
    out[i] = F.log_softmax(out[i], dim=1)
  File "/home/miniconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/functional.py", line 1535, in log_softmax
    ret = input.log_softmax(dim)
AttributeError: 'int' object has no attribute 'log_softmax'

Could you help me to solve the error?

Also, I think the line out[i] = F.interpolate(self.decoders[i](a_4[i]), size=out_size, mode='bilinear', align_corners=True) aims to do depth prediction, but I'm not sure what the out_size variable is for, so I removed it for my tasks. Is that right?

Your help is greatly appreciated.
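
From the snippet above, the AttributeError most likely occurs because the decoder/interpolate assignment was commented out, so out[i] still holds the integer it was initialised with when F.log_softmax is called. A sketch of the loop with the assignment restored, reusing the names from the snippet (this is an interpretation, not the repository author's answer):

```python
# Sketch: keep the decoder + upsampling assignment so that out[i] is a tensor
# (not the integer placeholder it was initialised with) before log_softmax.
for i, t in enumerate(self.tasks):
    out[i] = F.interpolate(self.decoders[i](a_4[i]), size=out_size,
                           mode='bilinear', align_corners=True)
    if t in ('segmentation1', 'segmentation2'):
        out[i] = F.log_softmax(out[i], dim=1)
return out
```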

cityscapes depth

I noticed that the number of invalid depth pixels in a Cityscapes depth image you provided is more than three-quarters of the total number of pixels in the image. Is that the original number of invalid depth pixels, or did you do extra processing? In addition, during training the relative depth error is very large. Can you explain your preprocessing of the raw depth data in Cityscapes? Is it the same as the following:

disparity = (p-1)/256, for each pixel p with p > 0
depth = baseline * fx / disparity, where baseline=0.20, fx=2262
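
For reference, a literal sketch of the conversion described in the two lines above (using the baseline and focal length quoted in the question; this restates the question's formula, not necessarily the repository's preprocessing):

```python
import numpy as np

# Cityscapes disparity-to-depth conversion as written in the question:
# disparity = (p - 1) / 256 for pixels with p > 0, depth = baseline * fx / disparity.
def depth_from_raw_disparity(p, baseline=0.20, fx=2262.0):
    p = p.astype(np.float32)
    disparity = np.where(p > 0, (p - 1.0) / 256.0, 0.0)
    depth = np.zeros_like(disparity)
    valid = disparity > 0
    depth[valid] = baseline * fx / disparity[valid]
    return depth  # zero marks invalid pixels
```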

about uncert

Thank you for your excellent code!

In the README, there is an 'uncert' option for 'weight', but there are no details about 'uncert' in the code. Can you provide the relevant code? Thank you very much!
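
For reference, a common formulation of loss weighting via homoscedastic uncertainty (Kendall et al.), which is presumably what the uncert option refers to; this is a sketch under that assumption, not the repository's implementation:

```python
import torch

# Sketch: each task k gets a learnable log-variance logsigma_k, and the combined
# loss is (up to constant factors) sum_k exp(-logsigma_k) * L_k + logsigma_k,
# so the task weights are learned jointly with the network parameters.
logsigma = torch.nn.Parameter(torch.zeros(3))  # one entry per task; add to the optimizer

def uncert_loss(task_losses):
    return sum(torch.exp(-logsigma[i]) * task_losses[i] + logsigma[i]
               for i in range(len(task_losses)))
```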

Unstable evaluation results on NYUv2

Hello, sorry to bother you.
I ran the same code several times but got unstable evaluation results. Here are the numbers:

| Model (run) | mIoU | Pix Acc | Abs Err | Rel Err | Mean | Med | <11.25 | <22.5 | <30 |
|---|---|---|---|---|---|---|---|---|---|
| semantic (run 1) | 0.1693 | 0.548 | | | | | | | |
| semantic (run 2) | 0.178 | 0.5685 | | | | | | | |
| depth (run 1) | | | 0.6336 | 0.2739 | | | | | |
| depth (run 2) | | | 0.6398 | 0.2585 | | | | | |
| normal (run 1) | | | | | 31.8452 | 26.0211 | 0.2161 | 0.4419 | 0.5633 |
| normal (run 2) | | | | | 30.3995 | 23.9483 | 0.2406 | 0.4779 | 0.5975 |
| normal (run 3) | | | | | 29.788 | 23.3448 | 0.2453 | 0.4861 | 0.6058 |
| split (run 1) | 0.1858 | 0.5571 | 0.6323 | 0.288 | 32.4637 | 27.4159 | 0.2028 | 0.4184 | 0.5412 |
| split (run 2) | 0.164 | 0.5255 | 0.6479 | 0.3159 | 33.8992 | 28.8828 | 0.1806 | 0.395 | 0.5174 |

Details:

  1. Results above are the average of the last 10 epochs.
  2. I used most of default options, which means I ran:
python model_segnet_single.py --task [task] --dataroot nyuv2

and

python model_segnet_split.py --dataroot nyuv2
  3. I made some modifications to the code. There are mainly two:
    • move scheduler.step() according to README
    • change batch size from 2 to 8

UPDATE

Here are the results of the code without any modification (except the location of scheduler.step()) to make my claim more convincing:

| Model (run) | Abs Err | Rel Err |
|---|---|---|
| depth (run 1) | 0.6615 | 0.2859 |
| depth (run 2) | 0.7053 | 0.2996 |

Any pretrained weights?

Do you use any pretrained weights (e.g. VGG-16 on ImageNet) or do you just start from scratch? The performance gap, even for the Cross-Stitch baseline on NYUv2 using VGG-16, is very interesting, and I am trying to find out the reason. Thanks!

The para Temperature of DWA

Hello, I would like to ask what range of values for the DWA temperature in the joint multi-task loss is appropriate, and on what basis the value of 2 in the paper was chosen (the magnitude of the loss values?). Thanks in advance for your answer.

NYUv2 surface normals grey boundary

Hi,

I downloaded the precomputed NYUv2 surface normals ground truth as mentioned in the readme (nyu link). I see that there is a grey border surrounding the surface normal ground truth image as in one example below:

[example image 0000: surface-normal ground truth with a grey border]

Could you let me know whether you also observed this?

Some problems with the implementation of the task-specific attention networks.

Hello, I read your paper recently and thank you for sharing your codes.
I tried to trace the architecture mentioned in the paper, but I can't find the attention module.
Your code's format is different from other common PyTorch code.
In my understanding, you implement attention with 1x1 depth-wise convolutions and then merge the knowledge. This is different from attention in NLP, for example multi-head attention, which has key/query/value vectors.
Would you please help me, thank you very much.

Code modifications to Cityscapes dataset

Hi,
@lorenmt
Thanks for releasing the Cityscapes dataset. The NYUv2 setup is not directly compatible with the Cityscapes dataset since no surface-normal data has been uploaded for it. For these image-to-image methods, do I need to make major changes to the architecture (the CNN) given that surface normals are not available for Cityscapes?
Kindly suggest how to go about solving this problem.

Question about the architecture

Hello, I have a question about the architecture of the model. In Fig. 2, is the pooling in the encoder's attention module the same as in the corresponding VGG stage? And likewise for the sampling in the decoder part?
Thanks for your time!

preprocessing code

Hi, @lorenmt . Thanks for your code!
I'm following your work and have some questions. Since you only provide the pre-processed labels (.npy files) without any explanation, could you please provide your preprocessing code for the original datasets? I would like to know more details, such as the data augmentation and how the normals are computed.

how to save and evaluate the model in the im2im_pred folder?

First, Thanks for your amazing work!
When I run the programs in the im2im_pred folder, I don't know how to save and evaluate the model, and I don't know whether I've missed something about this aspect. I would appreciate it if you could answer me.
If there are some grammatical errors, please forgive me.

NYUv2 classes

Hi.
For the NYUv2 dataset, the labels I read from the original data have more than 13 categories, maybe dozens. The pre-processed labels you provide range from -1 to 12. I want to know how you mapped the original labels to the final 13 classes, or whether you discarded some classes. Also, do you know which class each final label number (e.g. -1, 0, 1) represents?
I will appreciate it if you can offer more helpful information about the NYUv2 dataset.

Cannot unpickle dict_mean_std due to not mentioning python version

Hi there, I have tried multiple versions of python to unpickle the dict_mean_std but I keep getting a KeyError: 10 due to trying to unpickle in a different python version. Would you be so kind to tell me either the python version you used to pickle this object or the actual mean and std values. Thanks in advance!

minimal NYU dataset

Hello,
I wonder if you provide a subset of the NYU dataset to enable people to run the code without downloading 8.5 GB.

Thanks

model_segnet_mtan prediction code

Hello and sorry to bother.
I am currently trying to train a multitask segnet model on NYUv2 dataset. I just have a few questions about it and hope you could help me.

  1. After running the code and training for long enough, where will the trained model be saved? The wide ResNet scripts have torch.save at the bottom, but the SegNet scripts don't include the same lines, and after the script finishes I can't find any saved model. The evaluation results on all 3 tasks are printed, but I just can't find the saved model.

  2. I also wanted to ask for your help with prediction code. Is there a way to give the trained model and one desired image to a prediction script and get 3 different image outputs: the semantic mask, the depth map, and the surface normals?

Best regards.

number of parameters

    def count_parameters(model):
        return sum(p.numel() for p in model.parameters() if p.requires_grad)

    print('Parameter Space: ABS: {:.1f}, REL: {:.4f}\n'.format(count_parameters(SegNet),
                                                               count_parameters(SegNet) / 24981069))

What is the meaning of 24981069? Why is the total number of parameters divided by 24981069, and what is the unit of the parameters?

Questions about decathlon challenge

Thank you for sharing the code. I'm puzzled about running python model_wrn_eval.py --dataset 'notimagenet'; I can't find the flag for this setting.

The Compute for Evaluation may not stable

Hello, sorry to bother you, but I found that different batch_size settings may lead to different evaluation results. For example, the following are metrics for SegNet-MTAN, where batch_size is used as the test batch size (RMSE is modified according to the equation).

| Batch size | RMSE | Abs Rel | mIoU | Pixel Acc | Mean | Med | <11.25 | Time |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.7974 | 0.2550 | 0.1993 | 0.5536 | 30.9889 | 27.0419 | 0.1986 | 49s |
| 2 | 0.8138 | 0.2545 | 0.1993 | 0.5536 | 30.9630 | 26.9051 | 0.1988 | 47s |
| 4 | 0.8241 | 0.2536 | 0.1993 | 0.5532 | 30.9538 | 26.8244 | 0.1988 | 47s |
| 8 | 0.8321 | 0.2526 | 0.1993 | 0.5531 | 30.9388 | 26.7398 | 0.1989 | 46s |

Different Single-task results from the paper?

Hello - thank you so much for uploading high-quality code along with your paper.

I'm interested in reproducing Table 3, starting with "One Task". I ran model_segnet_single.py with task=semantic and got mIoU scores that look a lot higher than in the paper (17.82 vs 15.10). Is this expected? Did something change in the codebase since the paper was submitted?

Thanks!

Here are the first 30 epochs of logs:
Epoch: 0000 | TRAIN: 1.9690 0.0748 0.3519 TEST: 1.7898 0.0909 0.3818
Epoch: 0001 | TRAIN: 1.7170 0.1067 0.4119 TEST: 1.7185 0.1073 0.4083
Epoch: 0002 | TRAIN: 1.6698 0.1123 0.4205 TEST: 1.6828 0.1334 0.4168
Epoch: 0003 | TRAIN: 1.6397 0.1254 0.4343 TEST: 1.6710 0.1163 0.4221
Epoch: 0004 | TRAIN: 1.6036 0.1329 0.4447 TEST: 1.6538 0.1332 0.4249
Epoch: 0005 | TRAIN: 1.5880 0.1350 0.4478 TEST: 1.6328 0.1306 0.4315
Epoch: 0006 | TRAIN: 1.5570 0.1429 0.4593 TEST: 1.6215 0.1559 0.4438
Epoch: 0007 | TRAIN: 1.5254 0.1507 0.4703 TEST: 1.6558 0.1374 0.4345
Epoch: 0008 | TRAIN: 1.5026 0.1545 0.4773 TEST: 1.5411 0.1455 0.4665
Epoch: 0009 | TRAIN: 1.4802 0.1562 0.4838 TEST: 1.5541 0.1525 0.4637
Epoch: 0010 | TRAIN: 1.4455 0.1616 0.4960 TEST: 1.5362 0.1441 0.4690
Epoch: 0011 | TRAIN: 1.4185 0.1616 0.5024 TEST: 1.5183 0.1539 0.4725
Epoch: 0012 | TRAIN: 1.3854 0.1654 0.5138 TEST: 1.5059 0.1523 0.4774
Epoch: 0013 | TRAIN: 1.3608 0.1668 0.5205 TEST: 1.4475 0.1524 0.4972
Epoch: 0014 | TRAIN: 1.3269 0.1698 0.5317 TEST: 1.4648 0.1649 0.5012
Epoch: 0015 | TRAIN: 1.2967 0.1731 0.5422 TEST: 1.4877 0.1551 0.4949
Epoch: 0016 | TRAIN: 1.2693 0.1778 0.5522 TEST: 1.4616 0.1596 0.4787
Epoch: 0017 | TRAIN: 1.2284 0.1843 0.5667 TEST: 1.4990 0.1772 0.5040
Epoch: 0018 | TRAIN: 1.1794 0.1926 0.5853 TEST: 1.4896 0.1568 0.4949
Epoch: 0019 | TRAIN: 1.1606 0.1974 0.5917 TEST: 1.4416 0.1698 0.5186
Epoch: 0020 | TRAIN: 1.0995 0.2084 0.6145 TEST: 1.4340 0.1686 0.5167
Epoch: 0021 | TRAIN: 1.0474 0.2168 0.6341 TEST: 1.4271 0.1649 0.5252
Epoch: 0022 | TRAIN: 1.0047 0.2268 0.6466 TEST: 1.4728 0.1714 0.5108
Epoch: 0023 | TRAIN: 0.9398 0.2376 0.6708 TEST: 1.4875 0.1721 0.5255
Epoch: 0024 | TRAIN: 0.8777 0.2489 0.6930 TEST: 1.5140 0.1708 0.5050
Epoch: 0025 | TRAIN: 0.8138 0.2587 0.7159 TEST: 1.5678 0.1716 0.5222
Epoch: 0026 | TRAIN: 0.7687 0.2687 0.7326 TEST: 1.5570 0.1745 0.5309
Epoch: 0027 | TRAIN: 0.6881 0.2884 0.7633 TEST: 1.5163 0.1762 0.5360
Epoch: 0028 | TRAIN: 0.6309 0.2993 0.7832 TEST: 1.7214 0.1782 0.5246
Epoch: 0029 | TRAIN: 0.5857 0.3097 0.8008 TEST: 1.7705 0.1710 0.5105
Epoch: 0030 | TRAIN: 0.5333 0.3222 0.8184 TEST: 1.7390 0.1679 0.5236

file structure

I wish you would show me your file (data) structure. I can't run this program since I don't know how to prepare the data.

Missing items in cityscapes/val/label_19

Hi there,
I downloaded the [256x512, 2.42GB] version of the Cityscapes dataset and found that there are only 490 files in val/label_19 but 500 in the other corresponding folders. I roughly checked the Dropbox folder and found that some files, such as 390.npy, are missing. Could you please help me with the missing files?
Thank you very much!

Conv init

Hi,
thanks for the code! It seems that the initialization for Conv is never used in model_wrn_mtan.py:

def conv_init(m):

Are MTAN modules sensitive to initializations in your experience? Was there a reason you chose Xavier uniform specifically?
Best

Why does mIoU decrease during training?

Hi, I ran your official MTAN-DeepLabv3 and found that mIoU decreases during training while pixel accuracy stays steady and the loss on the validation set increases consistently.

I also modified it into resnet_split.py, resnet_single.py, and resnet_cs.py, and the same trends can be found.

When training the MTL model, the depth and surface-normal estimation benchmarks improve normally.

I also ran model_segnet_*.py. The decrease in mIoU during training also exists but is much smaller and almost imperceptible. I think this is because of the low mIoU and accuracy with SegNet.

I'm new to NYUv2 and segmentation, and I am not sure whether this is caused by overfitting.

Some implementation details:

  • I use the same code as model_segnet_*.py for computing mIoU and pixel accuracy, except for a slight modification to accelerate training.

  • The optimizer and lr_scheduler are the same as in model_segnet_*.py.

  • For more details, my code can be found in resnet_mtan.py and train_utils.py.

Here are some records (mtan, split, single, and mixed, respectively):

[images: training curves for mtan, split, single, and mixed]

DWA

Hello, I would like to ask where DWA is defined and called in your multi-task loss function. Also, can a multi-task network use multiple optimizers?

Results of directly running `model_segnet_single.py`

Hi @lorenmt , thanks for releasing your code!

I am running the single-task script model_segnet_single.py and the final result does not seem to match what is reported in your paper. Here is my training log semantic-1022070532.log.

I just ran the script without modifying any parameter settings in the code. The best result is about 12%, while in the paper the best is 15%. Could you please check the parameter settings?

Thanks.

Training with my own dataset

Hi lorenmt!

Thank you for sharing your code. I would like to train model_segnet_mtan.py with my own dataset. I have prepared the data to have the same size (500x500) and the same format as in create_dataset.py. However, when I started training I got the following error:

Parameter Space: ABS: 44229076.0, REL: 1.7705
LOSS FORMAT: SEMANTIC_LOSS MEAN_IOU PIX_ACC | DEPTH_LOSS ABS_ERR REL_ERR | NORMAL_LOSS MEAN MED <11.25 <22.5 <30
Standard training strategy without data augmentation.
image: torch.Size([3, 500, 500])
semantic: torch.Size([500, 500])
depth: torch.Size([500, 500])
normal: torch.Size([500, 500])
Traceback (most recent call last):
  File "model_segnet_mtan.py", line 222, in <module>
    200)
  File "/home/models/mtan/im2im_pred/utils.py", line 158, in multi_task_trainer
    train_pred, logsigma = multi_task_model(train_data)
  File "/home/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "model_segnet_mtan.py", line 143, in forward
    g_upsampl[i] = self.up_sampling(g_decoder[i - 1][-1], indices[-i - 1])
  File "/home/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/miniconda3/lib/python3.7/site-packages/torch/nn/modules/pooling.py", line 356, in forward
    self.padding, output_size)
  File "/home/miniconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 598, in max_unpool2d
    return torch._C._nn.max_unpool2d(input, indices, output_size)
RuntimeError: Shape of input must match shape of indices

It seems the error is caused by the input shape. Do I need to resize the images, or can the model accept images of size 500x500? How can I modify the indices shape? Could you help me solve the error? I'm new to PyTorch.
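
One possible reading of that error (an assumption, not an official answer): the SegNet-style encoder max-pools the input several times, so max-unpool expects spatial sizes divisible by 32, and 500 is not. A small sketch of padding the input to the next multiple of 32 (the label maps would need the matching treatment):

```python
import torch
import torch.nn.functional as F

# Sketch: pad height/width up to a multiple of 32 before the forward pass
# (assumption: five 2x2 max-pool stages in the encoder, so max-unpool needs
# sizes divisible by 32; 500 is not). Targets need the same padding.
def pad_to_multiple(x, multiple=32):
    h, w = x.shape[-2:]
    pad_h = (multiple - h % multiple) % multiple
    pad_w = (multiple - w % multiple) % multiple
    return F.pad(x, (0, pad_w, 0, pad_h))  # pad right and bottom

padded = pad_to_multiple(torch.randn(1, 3, 500, 500))
print(padded.shape)  # torch.Size([1, 3, 512, 512])
```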

depth label

I noticed that the depth you use is in the range [-1, 1], and you set invalid depth values to 0 so that an ignore mask for normals and depth is used in your code. But on the NYUv2 dataset homepage, I can't find any mask.
I want to know why you do not use the original depth range of [0, 10], which is used by many other depth estimation methods, and how you obtain the invalid-depth mask.

About the instantiation of the model

How do I use the trained multi-task network model to perform semantic segmentation and depth estimation on a single image, as shown in Figure 4 of the paper?

NYU2_ dataset, npy file

I'm sorry but I can't understand the npy file, like this.
[screenshot of the .npy contents]
Where can I get this .npy file? Thank you.

about the backbone model

Hi, @lorenmt,

I read your code and found that on NYUv2, you use VGG as the backbone, but

  1. it is not implemented with the VGG models from torchvision; which exact VGG structure do you use?
  2. it is not initialized from a model pretrained on ImageNet. Why is that the case?

Thanks.

My own dataset

Good evening,

I would like to use your repo with my own dataset. Images are 256x256 pixels and I have 5 different ground-truth sets of the same size. Some are binary images, and the other tasks contain multiple classes.

Could you please guide me on what I need to change to train your proposed model?

Thank you very much for your time.

Avg cost function

Hello !
Can you please explain why you initialize the avg_cost array with 24 columns?
e.g.: avg_cost = np.zeros([total_epoch, 24], dtype=np.float32)

keras implementation

Thank you for sharing your code, impressive work.
Is there any implementation in Keras?

Some questions about visual_decathlon

Hi @lorenmt, thanks for sharing your high-quality code.

  1. I downloaded the dataset and devkit from the official Visual Decathlon Challenge website and ran python model_wrn_mtan.py for training, but the results seem much lower than yours. Is there something wrong with my code? Also, I find that visual_decathlon does not use the DWA module.

EPOCH: 0398 | DATASET: aircraft || TRAIN: 4.4714 0.0295 || TEST: 4.5622 0.0113
EPOCH: 0398 | DATASET: cifar100 || TRAIN: 3.9217 0.1044 || TEST: 4.4817 0.0232
EPOCH: 0398 | DATASET: daimlerpedcls || TRAIN: 0.5747 0.7077 || TEST: 0.8359 0.5173
EPOCH: 0398 | DATASET: dtd || TRAIN: 3.4813 0.1200 || TEST: 3.5718 0.0908
EPOCH: 0398 | DATASET: gtsrb || TRAIN: 2.4235 0.2809 || TEST: 3.5151 0.0662
EPOCH: 0398 | DATASET: omniglot || TRAIN: 6.6661 0.0077 || TEST: 8.3716 0.0012
EPOCH: 0398 | DATASET: svhn || TRAIN: 2.0999 0.2669 || TEST: 2.3221 0.0960
EPOCH: 0398 | DATASET: ucf101 || TRAIN: 4.2835 0.0416 || TEST: 4.2502 0.0405
EPOCH: 0398 | DATASET: vgg-flowers || TRAIN: 3.8070 0.1078 || TEST: 3.5612 0.1459

EPOCH: 0399 | DATASET: aircraft || TRAIN: 4.4398 0.0367 || TEST: 4.5639 0.0136
EPOCH: 0399 | DATASET: cifar100 || TRAIN: 3.7655 0.1223 || TEST: 4.4952 0.0246
EPOCH: 0399 | DATASET: daimlerpedcls || TRAIN: 0.5523 0.7139 || TEST: 0.8194 0.6353
EPOCH: 0399 | DATASET: dtd || TRAIN: 3.5056 0.1171 || TEST: 3.5246 0.1028
EPOCH: 0399 | DATASET: gtsrb || TRAIN: 2.3447 0.2975 || TEST: 3.5864 0.0551
EPOCH: 0399 | DATASET: omniglot || TRAIN: 6.9420 0.0035 || TEST: 7.3892 0.0014
EPOCH: 0399 | DATASET: svhn || TRAIN: 2.1091 0.2652 || TEST: 2.3105 0.1013
EPOCH: 0399 | DATASET: ucf101 || TRAIN: 4.3354 0.0334 || TEST: 4.2975 0.0337
EPOCH: 0399 | DATASET: vgg-flowers || TRAIN: 4.2414 0.0676 || TEST: 3.7590 0.0885

  2. When I run python coco_results.py, there is an error:
    Traceback (most recent call last):
    File "coco_results.py", line 7, in
    pickle_in = open("imagenet.pickle","rb")
    FileNotFoundError: [Errno 2] No such file or directory: 'imagenet.pickle'

I can't find the file 'imagenet.pickle'.

Cityscapes depth

Hi, @lorenmt. I downloaded your processed Cityscapes dataset and found that the values in those numpy arrays are >= 0 (probably most of them are < 0.5). When I load the official Cityscapes disparity, the values are also >= 0, but much larger (maybe ~30000). Would you tell me how you pre-process the original disparity data to get those numpy arrays? Thanks in advance.

Mean of the sum with only one element

Dear Shikun Liu,

First of all, thanks for sharing your code.

I was checking it out and noticed that it has this line for your final loss used by both the dwa and equal weighting schemes:

loss = torch.mean(sum(lambda_weight[i, index] * train_loss[i] for i in range(3)))

In this line of code, I believe the torch.mean is not doing anything and might be confusing, as its input is a scalar, i.e., the sum of lambda * train_loss is 1x1.

Honestly, I am not sure if dividing the total loss by 3 would make any difference, but just to clarify, was your intent to have something like:

torch.mean(torch.Tensor([lambda_weight[i, index] * train_loss[i] for i in range(3)]))

or
torch.sum(torch.Tensor([lambda_weight[i, index] * train_loss[i] for i in range(3)]))

Thanks,

Joao

