Comments (16)
Hello. I believe that is because your image size 500 is not divisible by 16, so when you downsampled the images, the pixels shifted a bit, the saved sampling indices would not match the size of upsampled features.
An easy fix is to resize the input images to the size that is divisible by 16, like 256 as an example.
Let me know whether that solves your issue.
from mtan.
Hi!
Thank you for your quick answer.
I've padded the images to 512 to be divisible by 16 and it seems that solves this issue. Thanks!
from mtan.
Hi again :),
My dataset is very large which caused the following error:
RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 5.79 GiB total capacity; 4.48 GiB already allocated; 10.31 MiB free; 25.81 MiB cached)
I see the model uses only 1 GPU. So, I'm trying to train the model with 2 GPUs by wrapping the model in nn.DataParallel
as follow:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.DataParallel(SegNet())
SegNet_MTAN = model.to(device)
But I got this error:
Standard training strategy without data augmentation.
Traceback (most recent call last):
File "model_segnet_mtan.py", line 230, in <module>
5)
File "/home/models/mtan/im2im_pred/utils.py", line 152, in multi_task_trainer
conf_mat = ConfMatrix(multi_task_model.class_nb)
File "/home/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 591, in __getattr__
type(self).__name__, name))
AttributeError: 'DataParallel' object has no attribute 'class_nb'
could you help me to solve it?
from mtan.
I think after you wrapped the model in a multi-gpu setting, the model class becomes model.module. So you need to access the class_nb by multi_task_model.module.class_nb.
I would suggest to check out pytorch documentation for these questions.
Best,
from mtan.
Thanks, that solves the error.
I'm using the model to do three semantic tasks with a different number of classes. So, instead of semantic, depth and normal, I have semantic label, semantic label_2 and semantic label_3. I've modified the model_segnet_mtan.py
and utils.py
files, but when I'm using the semantic label_2 I got nan
value:
Parameter Space: ABS: 44234471.0, REL: 1.7707
LOSS FORMAT: SEMANTIC-1_LOSS MEAN_IOU PIX_ACC | SEMANTIC-1_LOSS MEAN_IOU PIX_ACC | SEMANTIC-2_LOSS MEAN_IOU PIX_ACC
Standard training strategy without data augmentation.
Epoch: 0000 | TRAIN: 1.8811 0.0339 0.5995 | 1.5766 0.0374 0.7284 | 2.6665 0.0122 0.6375 ||TEST: 1.0568 0.0376 0.7893 | 1.0827 0.0376 0.7893 | 1.5529 nan 0.7953
Epoch: 0001 | TRAIN: 0.9069 0.0390 0.8198 | 0.9206 0.0390 0.8198 | 1.2247 0.0142 0.8263 ||TEST: 1.0012 0.0376 0.7893 | 1.0153 0.0376 0.7893 | 1.3278 nan 0.7953
Epoch: 0002 | TRAIN: 0.8436 0.0390 0.8198 | 0.8585 0.0390 0.8198 | 1.0681 0.0142 0.8263 ||TEST: 0.9553 0.0376 0.7893 | 0.9683 0.0376 0.7893 | 1.2469 nan 0.7953
Epoch: 0003 | TRAIN: 0.8302 0.0390 0.8198 | 0.8416 0.0390 0.8198 | 1.0364 0.0142 0.8263 ||TEST: 0.9744 0.0376 0.7891 | 0.9848 0.0377 0.7884 | 1.2454 nan 0.7953
Epoch: 0004 | TRAIN: 0.8391 0.0391 0.8198 | 0.8458 0.0392 0.8197 | 1.0312 0.0142 0.8263 ||TEST: 1.9153 0.0377 0.7890 | 1.6111 0.0386 0.7873 | 2.0755 nan 0.7953
While when I'm using semantic label_3 I got the following error:
Parameter Space: ABS: 44237786.0, REL: 1.7709
LOSS FORMAT: SEMANTIC-1_LOSS MEAN_IOU PIX_ACC | SEMANTIC-1_LOSS MEAN_IOU PIX_ACC | SEMANTIC-3_LOSS MEAN_IOU PIX_ACC
Standard training strategy without data augmentation.
Traceback (most recent call last):
File "model_segnet_mtan.py", line 230, in <module>
10)
File "/mtan/im2im_pred/utils.py", line 165, in multi_task_trainer
model_fit(train_pred[2], train_label3, 'semantic')]
File "/mtan/im2im_pred/utils.py", line 22, in model_fit
loss = F.nll_loss(x_pred, x_output, ignore_index=-1)
File "/home/.local/lib/python3.7/site-packages/torch/nn/functional.py", line 1873, in nll_loss
ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /pytorch/aten/src/THNN/generic/SpatialClassNLLCriterion.c:109
Could you give me some advice, thank you.
from mtan.
For 3 segmentation tasks, you probably need to make sure the prediction classes for each classifer head is correct... It looks like you probably predict a larger number of classes compared to a pre-defined ground-truth classes. And you need to define confusion matrices for each semantic class so the miou score would be computed correctly.
from mtan.
Thanks for your reply!
Can you clarify this part more "And you need to define confusion matrices for each semantic class so the miou score would be computed correctly."
This is how I modified utils.py
def multi_task_trainer(train_loader, test_loader, multi_task_model, device, optimizer, scheduler, opt, total_epoch=200):
train_batch = len(train_loader)
test_batch = len(test_loader)
T = opt.temp
avg_cost = np.zeros([total_epoch, 19], dtype=np.float32)
lambda_weight = np.ones([3, total_epoch])
for index in range(total_epoch):
cost = np.zeros(19, dtype=np.float32)
# apply Dynamic Weight Average
if opt.weight == 'dwa':
if index == 0 or index == 1:
lambda_weight[:, index] = 1.0
else:
w_1 = avg_cost[index - 1, 0] / avg_cost[index - 2, 0]
w_2 = avg_cost[index - 1, 3] / avg_cost[index - 2, 3]
w_3 = avg_cost[index - 1, 6] / avg_cost[index - 2, 6]
lambda_weight[0, index] = 3 * np.exp(w_1 / T) / (np.exp(w_1 / T) + np.exp(w_2 / T) + np.exp(w_3 / T))
lambda_weight[1, index] = 3 * np.exp(w_2 / T) / (np.exp(w_1 / T) + np.exp(w_2 / T) + np.exp(w_3 / T))
lambda_weight[2, index] = 3 * np.exp(w_3 / T) / (np.exp(w_1 / T) + np.exp(w_2 / T) + np.exp(w_3 / T))
# iteration for all batches
multi_task_model.train()
train_dataset = iter(train_loader)
conf_mat = ConfMatrix(multi_task_model.module.class_nb)
conf_mat2 = ConfMatrix(multi_task_model.module.class_nb2)
conf_mat3 = ConfMatrix(multi_task_model.module.class_nb3)
for k in range(train_batch):
train_data, train_label, train_label2, train_label3 = train_dataset.next()
train_data, train_label = train_data.to(device), train_label.long().to(device)
train_label2, train_label3 = train_label2.long().to(device), train_label3.long().to(device)
train_pred, logsigma = multi_task_model(train_data)
optimizer.zero_grad()
train_loss = [model_fit(train_pred[0], train_label, 'semantic'),
model_fit(train_pred[1], train_label2, 'semantic'),
model_fit(train_pred[2], train_label3, 'semantic')]
if opt.weight == 'equal' or opt.weight == 'dwa':
loss = sum([lambda_weight[i, index] * train_loss[i] for i in range(3)])
else:
loss = sum(1 / (2 * torch.exp(logsigma[i])) * train_loss[i] + logsigma[i] / 2 for i in range(3))
loss.backward()
optimizer.step()
# accumulate label prediction for every pixel in training images
conf_mat.update(train_pred[0].argmax(1).flatten(), train_label.flatten())
conf_mat2.update(train_pred[1].argmax(1).flatten(), train_label2.flatten())
conf_mat3.update(train_pred[2].argmax(1).flatten(), train_label3.flatten())
cost[0] = train_loss[0].item()
cost[3] = train_loss[1].item()
cost[6] = train_loss[2].item()
avg_cost[index, :9] += cost[:9] / train_batch
# compute mIoU and acc
avg_cost[index, 1:3] = conf_mat.get_metrics()
avg_cost[index, 4:6] = conf_mat2.get_metrics()
avg_cost[index, 7:9] = conf_mat3.get_metrics()
# evaluating test data
multi_task_model.eval()
conf_mat = ConfMatrix(multi_task_model.module.class_nb)
conf_mat2 = ConfMatrix(multi_task_model.module.class_nb2)
conf_mat3 = ConfMatrix(multi_task_model.module.class_nb3)
with torch.no_grad(): # operations inside don't track history
test_dataset = iter(test_loader)
for k in range(test_batch):
test_data, test_label, test_label2, test_label3 = test_dataset.next()
test_data, test_label = test_data.to(device), test_label.long().to(device)
test_label2, test_label3 = test_label2.long().to(device), test_label3.long().to(device)
test_pred, _ = multi_task_model(test_data)
test_loss = [model_fit(test_pred[0], test_label, 'semantic'),
model_fit(test_pred[1], test_label2, 'semantic'),
model_fit(test_pred[2], test_label3, 'semantic')]
conf_mat.update(test_pred[0].argmax(1).flatten(), test_label.flatten())
conf_mat2.update(test_pred[1].argmax(1).flatten(), test_label2.flatten())
conf_mat3.update(test_pred[2].argmax(1).flatten(), test_label3.flatten())
cost[9] = test_loss[0].item()
cost[12] = test_loss[1].item()
cost[15] = test_loss[2].item()
avg_cost[index, 9:] += cost[9:] / test_batch
# compute mIoU and acc
avg_cost[index, 10:12] = conf_mat.get_metrics()
avg_cost[index, 13:15] = conf_mat2.get_metrics()
avg_cost[index, 16:18] = conf_mat3.get_metrics()
scheduler.step()
print('Epoch: {:04d} | TRAIN: {:.4f} {:.4f} {:.4f} | {:.4f} {:.4f} {:.4f} | {:.4f} {:.4f} {:.4f} ||'
'TEST: {:.4f} {:.4f} {:.4f} | {:.4f} {:.4f} {:.4f} | {:.4f} {:.4f} {:.4f} '
.format(index, avg_cost[index, 0], avg_cost[index, 1], avg_cost[index, 2], avg_cost[index, 3],
avg_cost[index, 4], avg_cost[index, 5], avg_cost[index, 6], avg_cost[index, 7], avg_cost[index, 8],
avg_cost[index, 9], avg_cost[index, 10], avg_cost[index, 11], avg_cost[index, 12], avg_cost[index, 13],
avg_cost[index, 14], avg_cost[index, 15], avg_cost[index, 16], avg_cost[index, 17], avg_cost[index, 18]))
Is that what you meant or do you mean to define confusion matrices for each task?
from mtan.
Yes. This modification looks correct to me.
from mtan.
Hello, sorry to bother you.
I still get nan value, even when I'm using model_segnet_single.py.
Do you think it is related to the linear layer? the number of classes that I have is 58.
from mtan.
Maybe try to remove F.log_softmax in prediction and change the loss function to F.cross_entropy(pred, gt).
If you still have the issue, then it's definitely from your dataset or your implementation.
from mtan.
OK, I modify it like the following:
t1_pred = F.cross_entropy(self.pred_task1(atten_decoder[0][-1][-1]), GT)
t2_pred = F.cross_entropy(self.pred_task2(atten_decoder[1][-1][-1]), Gt)
t3_pred = F.cross_entropy(self.pred_task3(atten_decoder[2][-1][-1]), GT)
return [t1_pred, t2_pred, t3_pred], self.logsigma
Sorry, Could you let me know what should I pass to the GTs?
from mtan.
Sorry What I meant is, do:
t1_pred = self.pred_task1(atten_decoder[0][-1][-1])
t2_pred = self.pred_task2(atten_decoder[1][-1][-1])
...
and modify the loss function in utils.py:
Line 22 in 268c5c1
into F.cross_entropy(x_pred, x_output, ignore_index=-1)
-1 represents the index you wish to ignore. You probably need to modify that as well according to the configuration in your dataset.
from mtan.
Oh!!! That was a silly mistake! I thought that my dataset is 0 indexed but I found that it is 1 indexed!! So sorry to take your time. Thank you very much for the help!
from mtan.
Hi, sorry to bother you.
I would like to modify the mtan model to do two tasks instead of three. Depending on the stan model I modify the code as the following:
- reduce j from 3 to 2:
for j in range(2):
if j < 2:
self.encoder_att.append(nn.ModuleList([self.att_layer([filter[0], filter[0], filter[0]])]))
self.decoder_att.append(nn.ModuleList([self.att_layer([2 * filter[0], filter[0], filter[0]])]))
for i in range(4):
self.encoder_att[j].append(self.att_layer([2 * filter[i + 1], filter[i + 1], filter[i + 1]]))
self.decoder_att[j].append(self.att_layer([filter[i + 1] + filter[i], filter[i], filter[i]]))
- reduce i from 3 to 2:
# define task dependent attention module
for i in range(2):
for j in range(5):
if j == 0:
- logsigma
self.logsigma = nn.Parameter(torch.FloatTensor([-0.5, -0.5]))
- makes the mtan model returns only 3 values
return [t1_pred, t2_pred], self.logsigma
So, I'm wondering if I did that correctly, especially for steps 1 and 3.
also, why is the total parameters divided by 24981069?
Thanks!
from mtan.
# Multi-task Attention Network:
class MTANSegNet(nn.Module):
def __init__(self, tasks=['segmentation', 'depth', 'normal'],
out_channels={'segmentation': 13, 'depth': 1, 'normal': 3}):
super(MTANSegNet, self).__init__()
# initialise network parameters
filter = [64, 128, 256, 512, 512]
self.tasks = tasks
self.num_tasks = len(tasks)
# define encoder decoder layers
self.encoder_block = nn.ModuleList([self.conv_layer([3, filter[0]])])
self.decoder_block = nn.ModuleList([self.conv_layer([filter[0], filter[0]])])
for i in range(4):
self.encoder_block.append(self.conv_layer([filter[i], filter[i + 1]]))
self.decoder_block.append(self.conv_layer([filter[i + 1], filter[i]]))
# define convolution layer
self.conv_block_enc = nn.ModuleList([self.conv_layer([filter[0], filter[0]])])
self.conv_block_dec = nn.ModuleList([self.conv_layer([filter[0], filter[0]])])
for i in range(4):
if i == 0:
self.conv_block_enc.append(self.conv_layer([filter[i + 1], filter[i + 1]]))
self.conv_block_dec.append(self.conv_layer([filter[i], filter[i]]))
else:
self.conv_block_enc.append(nn.Sequential(self.conv_layer([filter[i + 1], filter[i + 1]]),
self.conv_layer([filter[i + 1], filter[i + 1]])))
self.conv_block_dec.append(nn.Sequential(self.conv_layer([filter[i], filter[i]]),
self.conv_layer([filter[i], filter[i]])))
# define task attention layers
self.encoder_att = nn.ModuleList([nn.ModuleList([self.att_layer([filter[0], filter[0], filter[0]])])])
self.decoder_att = nn.ModuleList([nn.ModuleList([self.att_layer([2 * filter[0], filter[0], filter[0]])])])
self.encoder_block_att = nn.ModuleList([self.conv_layer([filter[0], filter[1]])])
self.decoder_block_att = nn.ModuleList([self.conv_layer([filter[0], filter[0]])])
for j in range(self.num_tasks):
if j < (self.num_tasks - 1):
self.encoder_att.append(nn.ModuleList([self.att_layer([filter[0], filter[0], filter[0]])]))
self.decoder_att.append(nn.ModuleList([self.att_layer([2 * filter[0], filter[0], filter[0]])]))
for i in range(4):
self.encoder_att[j].append(self.att_layer([2 * filter[i + 1], filter[i + 1], filter[i + 1]]))
self.decoder_att[j].append(self.att_layer([filter[i + 1] + filter[i], filter[i], filter[i]]))
for i in range(4):
if i < 3:
self.encoder_block_att.append(self.conv_layer([filter[i + 1], filter[i + 2]]))
self.decoder_block_att.append(self.conv_layer([filter[i + 1], filter[i]]))
else:
self.encoder_block_att.append(self.conv_layer([filter[i + 1], filter[i + 1]]))
self.decoder_block_att.append(self.conv_layer([filter[i + 1], filter[i + 1]]))
self.pred_task = nn.ModuleList([self.conv_layer([filter[0], out_channels[t]], pred=True) for t in tasks])
# define pooling and unpooling functions
self.down_sampling = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
self.up_sampling = nn.MaxUnpool2d(kernel_size=2, stride=2)
def conv_layer(self, channel, pred=False):
if not pred:
conv_block = nn.Sequential(
nn.Conv2d(in_channels=channel[0], out_channels=channel[1], kernel_size=3, padding=1),
nn.BatchNorm2d(num_features=channel[1]),
nn.ReLU(inplace=True),
)
else:
conv_block = nn.Sequential(
nn.Conv2d(in_channels=channel[0], out_channels=channel[0], kernel_size=3, padding=1),
nn.Conv2d(in_channels=channel[0], out_channels=channel[1], kernel_size=1, padding=0),
)
return conv_block
def att_layer(self, channel):
att_block = nn.Sequential(
nn.Conv2d(in_channels=channel[0], out_channels=channel[1], kernel_size=1, padding=0),
nn.BatchNorm2d(channel[1]),
nn.ReLU(inplace=True),
nn.Conv2d(in_channels=channel[1], out_channels=channel[2], kernel_size=1, padding=0),
nn.BatchNorm2d(channel[2]),
nn.Sigmoid(),
)
return att_block
def forward(self, x):
g_encoder, g_decoder, g_maxpool, g_upsampl, indices = ([0] * 5 for _ in range(5))
for i in range(5):
g_encoder[i], g_decoder[-i - 1] = ([0] * 2 for _ in range(2))
# define attention list for tasks
atten_encoder, atten_decoder = ([0] * self.num_tasks for _ in range(2))
for i in range(self.num_tasks):
atten_encoder[i], atten_decoder[i] = ([0] * 5 for _ in range(2))
for i in range(self.num_tasks):
for j in range(5):
atten_encoder[i][j], atten_decoder[i][j] = ([0] * 3 for _ in range(2))
# define global shared network
for i in range(5):
if i == 0:
g_encoder[i][0] = self.encoder_block[i](x)
g_encoder[i][1] = self.conv_block_enc[i](g_encoder[i][0])
g_maxpool[i], indices[i] = self.down_sampling(g_encoder[i][1])
else:
g_encoder[i][0] = self.encoder_block[i](g_maxpool[i - 1])
g_encoder[i][1] = self.conv_block_enc[i](g_encoder[i][0])
g_maxpool[i], indices[i] = self.down_sampling(g_encoder[i][1])
for i in range(5):
if i == 0:
g_upsampl[i] = self.up_sampling(g_maxpool[-1], indices[-i - 1])
g_decoder[i][0] = self.decoder_block[-i - 1](g_upsampl[i])
g_decoder[i][1] = self.conv_block_dec[-i - 1](g_decoder[i][0])
else:
g_upsampl[i] = self.up_sampling(g_decoder[i - 1][-1], indices[-i - 1])
g_decoder[i][0] = self.decoder_block[-i - 1](g_upsampl[i])
g_decoder[i][1] = self.conv_block_dec[-i - 1](g_decoder[i][0])
# define task dependent attention module
for i in range(self.num_tasks):
for j in range(5):
if j == 0:
atten_encoder[i][j][0] = self.encoder_att[i][j](g_encoder[j][0])
atten_encoder[i][j][1] = (atten_encoder[i][j][0]) * g_encoder[j][1]
atten_encoder[i][j][2] = self.encoder_block_att[j](atten_encoder[i][j][1])
atten_encoder[i][j][2] = F.max_pool2d(atten_encoder[i][j][2], kernel_size=2, stride=2)
else:
atten_encoder[i][j][0] = self.encoder_att[i][j](torch.cat((g_encoder[j][0], atten_encoder[i][j - 1][2]), dim=1))
atten_encoder[i][j][1] = (atten_encoder[i][j][0]) * g_encoder[j][1]
atten_encoder[i][j][2] = self.encoder_block_att[j](atten_encoder[i][j][1])
atten_encoder[i][j][2] = F.max_pool2d(atten_encoder[i][j][2], kernel_size=2, stride=2)
for j in range(5):
if j == 0:
atten_decoder[i][j][0] = F.interpolate(atten_encoder[i][-1][-1], scale_factor=2, mode='bilinear', align_corners=True)
atten_decoder[i][j][0] = self.decoder_block_att[-j - 1](atten_decoder[i][j][0])
atten_decoder[i][j][1] = self.decoder_att[i][-j - 1](torch.cat((g_upsampl[j], atten_decoder[i][j][0]), dim=1))
atten_decoder[i][j][2] = (atten_decoder[i][j][1]) * g_decoder[j][-1]
else:
atten_decoder[i][j][0] = F.interpolate(atten_decoder[i][j - 1][2], scale_factor=2, mode='bilinear', align_corners=True)
atten_decoder[i][j][0] = self.decoder_block_att[-j - 1](atten_decoder[i][j][0])
atten_decoder[i][j][1] = self.decoder_att[i][-j - 1](torch.cat((g_upsampl[j], atten_decoder[i][j][0]), dim=1))
atten_decoder[i][j][2] = (atten_decoder[i][j][1]) * g_decoder[j][-1]
# define task prediction layers
out = [0 for _ in self.tasks]
for i, t in enumerate(self.tasks):
out[i] = self.pred_task[i](atten_decoder[i][-1][-1])
if t == 'normal':
out[i] = out[i] / torch.norm(out[i], p=2, dim=1, keepdim=True)
return out
You can follow this implementation which should be cleaner in terms of scaling tasks.
And 24981069 represents the parameter size of single-task learning. So it's easier to compare the parameter size relatively in different networks.
from mtan.
Thank you, your reply is much appreciated.
from mtan.
Related Issues (20)
- depth and segmentation label in cityscape HOT 1
- NYUv2 surface normals grey boundary HOT 1
- Missing items in cityscapes/val/label_19 HOT 2
- Definition of multi-task loss function HOT 5
- Training MTAN-DeepLabv3 HOT 10
- About the instantiation of the model HOT 1
- DWA HOT 1
- The para Temperature of DWA HOT 6
- How do you pre-process NYUv2 dataset? HOT 7
- about the uncertainty loss HOT 4
- about visulization of the result HOT 1
- Avg Cost accumulation HOT 2
- why dwa didn't consider common scale of gradient? HOT 8
- 关于Attention Module的设计 HOT 4
- results inconsistency HOT 19
- Question about the pre-processed cityscapes HOT 5
- DWA的loss梯度爆炸问题 HOT 2
- Question about the structure of the encoder_block_att. HOT 2
- Used Mtan on depth estimation and object detection HOT 5
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from mtan.