qianyu-dlut / mvanet

License: MIT License

Python 100.00%

mvanet's Issues

"Can I only initialize random training from scratch now? How can I fine-tune the model based on your pre-trained model?"

Arbitrary input size and einops error

First of all thanks a lot for pushing this repository 🙌.

I am having trouble processing inputs of arbitrary size: when processing an image of size [1, 3, 864, 1280], the model throws the following error:

einops.EinopsError:  Error while processing rearrange-reduction pattern "b c (hg h) (wg w) -> (hg wg b) c h w".
 Input tensor shape: torch.Size([1, 128, 27, 40]). Additional info: {'hg': 2, 'wg': 2}.
 Shape mismatch, can't divide axis of length 27 in chunks of 2

This seems to be caused by the following line:

MVANet/model/MVANet.py

Line 319 in ff270a6

 patched_glb = rearrange(glb, 'b c (hg h) (wg w) -> (hg wg b) c h w', hg=2, wg=2) 

I noticed that predict.py resizes all inputs to 1024x1024, I assume for exactly this reason. Is resizing inputs to a standard size the correct strategy here?
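As a rough illustration of the size constraint: the reported feature map (27 x 40) is the input (864 x 1280) downsampled by 32, and the rearrange pattern splits it into a 2 x 2 grid, so both feature dimensions must be even. That suggests input sides that are multiples of 64 would avoid this particular error. The helper names below are hypothetical, and the multiple of 64 is an assumption derived only from the shapes in the error message; other stages of the model may need a stricter multiple.

```python
def round_up_to_multiple(size: int, multiple: int = 64) -> int:
    """Return the smallest multiple of `multiple` that is >= size."""
    return (size + multiple - 1) // multiple * multiple


def padded_shape(height: int, width: int, multiple: int = 64):
    """Target (height, width) to pad or resize an input to.

    The default of 64 is an assumption: stride 32 (864 / 27) times the
    2 x 2 patch grid used by the failing rearrange call.
    """
    return (round_up_to_multiple(height, multiple),
            round_up_to_multiple(width, multiple))


print(padded_shape(864, 1280))  # (896, 1280)
```

With a height of 896 the same feature map would be 28 x 40, which divides evenly into two chunks per axis. Whether padding or resizing to such a size preserves accuracy is a question for the authors.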

Any plans to release pretrained models

Hey, are there any plans to release the pre-trained models?

Error in prediction

@qianyu-dlut

Can you please suggest why I am getting the result in the image below when I run predict.py?

inf_MCRM and MCRM weight names discrepancy

@qianyu-dlut thanks for this great work. I have one more question, this time about the MCRM module.

MCRM names its layers linear1 and linear2.

inf_MCRM names its layers linear3 and linear4.

In Model_80.pth, there are 4 different layers (linear1 / linear2 / linear3 / linear4):

dec_blk1.linear1.weight
dec_blk1.linear1.bias
dec_blk1.linear2.weight
dec_blk1.linear2.bias
dec_blk1.linear3.weight
dec_blk1.linear3.bias
dec_blk1.linear4.weight
dec_blk1.linear4.bias

and the values are different:

import torch

pretrained_dict = torch.load("./saved_model/MVANet/Model_80.pth", map_location='cuda')
print('dec_blk1.linear1.weight', torch.sum(pretrained_dict['dec_blk1.linear1.weight']))
print('dec_blk1.linear3.weight', torch.sum(pretrained_dict['dec_blk1.linear3.weight']))

Output:

dec_blk1.linear1.weight tensor(2.0187, device='cuda:0')
dec_blk1.linear3.weight tensor(-0.5632, device='cuda:0')

What is the difference between linear1 and linear3?
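To see at a glance which layers a checkpoint holds for each module, the parameter names can be grouped by their module prefix. This is only a minimal sketch: `group_layers` is a hypothetical helper, and the names are the ones listed above. In practice one would pass `pretrained_dict.keys()` from the loaded checkpoint instead.

```python
from collections import defaultdict


def group_layers(param_names):
    """Map each module prefix to the sorted layer names found under it.

    Assumes names of the form '<module>.<layer>.<weight|bias>'.
    """
    groups = defaultdict(set)
    for name in param_names:
        parts = name.split(".")
        if len(parts) < 3:
            continue  # not a '<module>.<layer>.<param>' style name
        module, layer = ".".join(parts[:-2]), parts[-2]
        groups[module].add(layer)
    return {module: sorted(layers) for module, layers in groups.items()}


names = [
    "dec_blk1.linear1.weight", "dec_blk1.linear1.bias",
    "dec_blk1.linear2.weight", "dec_blk1.linear2.bias",
    "dec_blk1.linear3.weight", "dec_blk1.linear3.bias",
    "dec_blk1.linear4.weight", "dec_blk1.linear4.bias",
]
print(group_layers(names))
# {'dec_blk1': ['linear1', 'linear2', 'linear3', 'linear4']}
```

A listing like this only shows that the checkpoint stores four separate layers per block. It does not say which of them a given module class reads at load time.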

Thanks for your help

Questions About 'MCRM' Module: Positional Encoding and Output Feature

Hello,

Thank you for your excellent work on this project!

While reviewing the code, I noticed a few discrepancies between the implementation and the manuscript's description, specifically in the "MCRM" module. According to the manuscript, the local feature should include positional encoding before applying the cross-attention mechanism. However, in the code, the local feature is directly used as the key and value for cross-attention without adding positional information.

MVANet/model/MVANet.py

Line 275 in ff270a6

loc_ = rearrange(loc, 'nl c h w -> nl (h w) 1 c')

Additionally, the manuscript states that the output of the "MCRM" module is derived from the element-wise sum of the "updated local feature" and the "global feature." In contrast, the code seems to compute the output feature using the "updated local feature" and the "local feature."

MVANet/model/MVANet.py

Line 283 in ff270a6

src = loc.view(4, c, -1).permute(2, 0, 1) + self.dropout1(outputs)

Could these be potential bugs in the implementation?
Thank you again for your impressive work! I look forward to your clarification.

Removing OpenMMLab dependencies

Any chance you'd be open to removing the openmmlab dependencies for this repository? It's a pretty hefty set of dependencies, and from what I can tell, the only thing all those libraries are used for is a logger class that is then used to log the timm library's load_checkpoint function.

I'd be happy to replace that logger with another without dependencies and submit a pull request if you're open to it. It would make using your project much simpler!
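For illustration, a dependency-free replacement could be built on Python's standard `logging` module alone. This is a sketch under assumptions: the function name `get_logger`, the logger name `"mvanet"`, and the message format are all hypothetical and not taken from the repository.

```python
import logging


def get_logger(name: str = "mvanet", level: int = logging.INFO) -> logging.Logger:
    """Return a console logger configured once, using only the standard library."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Attach a handler only on the first call so repeated calls do not
    # produce duplicated log lines.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = get_logger()
logger.info("loading checkpoint")
```

Any code that only calls methods such as `info` or `warning` on the logger should accept an object like this in place of the OpenMMLab one, though that would need checking against the actual call sites.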

Training error: ReLU variable has been modified by an in-place operation

Thanks for this work, it is a very interesting paper.

In-place error raised

I hit the following error while trying to run

python train.py

Result :

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.HalfTensor [256, 1, 256]], which is output 0 of ReluBackward0

After some investigation, I think the problem comes from self.activation = get_activation_fn('relu') combined with m.inplace = True.

I found a workaround by using gelu instead of relu, but I am still not sure why this piece of code is there:

        for m in self.modules():
            if isinstance(m, nn.ReLU) or isinstance(m, nn.Dropout):
                m.inplace = True

Full trace

❯ python train.py 
Generator Learning Rate: 1e-05
/home/piercus/miniconda3/envs/mvanet/lib/python3.7/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/home/piercus/miniconda3/envs/mvanet/lib/python3.7/site-packages/PIL/Image.py:3179: DecompressionBombWarning: Image size (101824320 pixels) exceeds limit of 89478485 pixels, could be decompression bomb DOS attack.
  DecompressionBombWarning,
/home/piercus/miniconda3/envs/mvanet/lib/python3.7/site-packages/PIL/Image.py:3179: DecompressionBombWarning: Image size (102717153 pixels) exceeds limit of 89478485 pixels, could be decompression bomb DOS attack.
  DecompressionBombWarning,
/home/piercus/miniconda3/envs/mvanet/lib/python3.7/site-packages/torch/utils/data/dataloader.py:557: UserWarning: This DataLoader will create 12 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  cpuset_checked))
/home/piercus/miniconda3/envs/mvanet/lib/python3.7/site-packages/torch/nn/_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
  warnings.warn(warning.format(ret))
Generator Learning Rate: 1e-05
/home/piercus/miniconda3/envs/mvanet/lib/python3.7/site-packages/torch/nn/functional.py:3734: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
  warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
/home/piercus/miniconda3/envs/mvanet/lib/python3.7/site-packages/PIL/Image.py:3179: DecompressionBombWarning: Image size (102521250 pixels) exceeds limit of 89478485 pixels, could be decompression bomb DOS attack.
  DecompressionBombWarning,
/home/piercus/miniconda3/envs/mvanet/lib/python3.7/site-packages/torch/autograd/__init__.py:199: UserWarning: Error detected in ReluBackward0. Traceback of forward call that caused the error:
  File "train.py", line 103, in <module>
    sideout5, sideout4, sideout3, sideout2, sideout1, final, glb5, glb4, glb3, glb2, glb1, tokenattmap4, tokenattmap3,tokenattmap2,tokenattmap1= generator.forward(images)
  File "/home/piercus/repos/mvanet/model/MVANet.py", line 412, in forward
    e5 = self.multifieldcrossatt(loc_e5, glb_e5)  # (4,128,16,16)
  File "/home/piercus/miniconda3/envs/mvanet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/piercus/repos/mvanet/model/MVANet.py", line 141, in forward
    activated = self.activation(linear1)
  File "/home/piercus/miniconda3/envs/mvanet/lib/python3.7/site-packages/torch/nn/functional.py", line 1457, in relu
    result = torch.relu(input)
  File "/home/piercus/miniconda3/envs/mvanet/lib/python3.7/site-packages/torch/fx/traceback.py", line 57, in format_stack
    return traceback.format_stack()
 (Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:114.)
  allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
  File "train.py", line 126, in <module>
    scaler.scale(loss).backward()
  File "/home/piercus/miniconda3/envs/mvanet/lib/python3.7/site-packages/torch/_tensor.py", line 489, in backward
    self, gradient, retain_graph, create_graph, inputs=inputs
  File "/home/piercus/miniconda3/envs/mvanet/lib/python3.7/site-packages/torch/autograd/__init__.py", line 199, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.HalfTensor [256, 1, 256]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

How to export to ONNX?

The repository below works fine. I would like a conversion method to be added here:
https://github.com/hpc203/MVANet-BiRefNet-onnxrun

Environment setup

Hi, thank you for sharing your code. I have tried to set up the environment with pip and conda without success. Could you write a guide on how to install all the dependencies? Thank you.

Feature Request: Add model to HuggingFace

Would be nice to have this hosted on HuggingFace, similar to how BiRefNet does it: https://huggingface.co/ZhengPeng7/BiRefNet

Error when using batch size > 1

I am getting errors from the line
loc_e5, glb_e5 = e5.split([4, 1], dim=0)
(https://github.com/qianyu-dlut/MVANet/blob/main/model/MVANet.py#L418)
when training with batch size > 1.

Here, e5 has a leading dimension of 5 * batch_size, so split([4, 1]), which only works when that dimension is exactly 5, fails for any batch size > 1.

When I dug deeper, I found that the batch index is mixed up elsewhere as well (for instance, in https://github.com/qianyu-dlut/MVANet/blob/main/model/MVANet.py#L38).

The exact error I got was:

Traceback (most recent call last):
File "./train.py", line 1117, in
sideout5, sideout4, sideout3, sideout2, sideout1, final, glb5, glb4, glb3, glb2, glb1, tokenattmap4, tokenattmap3, tokenattmap2, tokenattmap1 = generator.forward(
File "./train.py", line 882, in forward
loc_e5, glb_e5 = e5.split([4, 1], dim=0)
File "/lib/python3.10/site-packages/torch/_tensor.py", line 921, in split
return torch._VF.split_with_sizes(self, split_size, dim)
RuntimeError: split_with_sizes expects split_sizes to sum exactly to 10 (input tensor's size at dimension 0), but got split_sizes=[4, 1]

Is it possible to fix this while still retaining the exact architecture of the model (finetune on personal datasets starting from the pretrained 80th epoch)?
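One way to reason about the split is to derive its sizes from the leading dimension instead of hard-coding [4, 1]. The sketch below assumes that the four local patches of every image come first along dimension 0, followed by one global view per image. That ordering is an assumption, not something the traceback confirms, and `split_sizes` is a hypothetical helper.

```python
def split_sizes(total_rows: int, views_per_image: int = 5):
    """Sizes for splitting dimension 0 into local patches and global views.

    Assumes each image contributes 4 local patches plus 1 global view,
    with all local patches stored before all global views.
    """
    if total_rows % views_per_image != 0:
        raise ValueError(
            f"leading dimension {total_rows} is not a multiple of {views_per_image}"
        )
    batch_size = total_rows // views_per_image
    return [(views_per_image - 1) * batch_size, batch_size]


print(split_sizes(5))   # [4, 1]  (batch size 1, the current behaviour)
print(split_sizes(10))  # [8, 2]  (batch size 2, the case in the traceback)
```

The call site would then look like `e5.split(split_sizes(e5.shape[0]), dim=0)`. This addresses only the split itself: other places that assume batch size 1, such as `loc.view(4, c, -1)` quoted in an earlier issue, would need matching changes. None of this touches the layer definitions, so the pretrained weights should in principle still load.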
