swintransformer / ait

Python 100.00%

ait's Issues

Error(s) in loading state_dict for VQVAE

Thank you for your nice work!
However, after training the VQ-VAE on depth estimation, I tried to train the task solver on depth estimation and got the following error:

Error(s) in loading state_dict for VQVAE:
        Missing key(s) in state_dict: "encoder.0.weight", "encoder.0.bias", "encoder.2.weight", "encoder.2.bias", "encoder.4.weight", "encoder.4.bias", "encoder.6.weight", "encoder.6.bias", "encoder.8.weight", "encoder.8.bias", "encoder.10.net.0.weight", "encoder.10.net.0.bias", "encoder.10.net.2.weight", "encoder.10.net.2.bias", "encoder.10.net.4.weight", "encoder.10.net.4.bias", "encoder.11.net.0.weight", "encoder.11.net.0.bias", "encoder.11.net.2.weight", "encoder.11.net.2.bias", "encoder.11.net.4.weight", "encoder.11.net.4.bias", "encoder.12.weight", "encoder.12.bias", "decoder.0.weight", "decoder.0.bias", "decoder.2.net.0.weight", "decoder.2.net.0.bias", "decoder.2.net.2.weight", "decoder.2.net.2.bias", "decoder.2.net.4.weight", "decoder.2.net.4.bias", "decoder.3.net.0.weight", "decoder.3.net.0.bias", "decoder.3.net.2.weight", "decoder.3.net.2.bias", "decoder.3.net.4.weight", "decoder.3.net.4.bias", "decoder.4.weight", "decoder.4.bias", "decoder.6.weight", "decoder.6.bias", "decoder.8.weight", "decoder.8.bias", "decoder.10.weight", "decoder.10.bias", "decoder.12.weight", "decoder.12.bias", "decoder.14.weight", "decoder.14.bias", "_vq_vae._embedding", "_vq_vae._ema_cluster_size", "_vq_vae._ema_w".

How can I solve it? Thank you.
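When every expected key is reported missing, one possible cause is that the checkpoint stores its weights under different names, for example nested under a wrapper key or saved with a prefix. The sketch below is only a diagnostic aid under that assumption: `diagnose_keys`, the wrapper names `state_dict` and `model`, and the toy checkpoint are hypothetical and not taken from the AiT code.

```python
def diagnose_keys(expected_keys, checkpoint):
    """Compare the keys a model expects with the keys a checkpoint provides."""
    # Assumption: some training scripts nest the weights under a wrapper key.
    for wrapper in ("state_dict", "model"):
        if wrapper in checkpoint and isinstance(checkpoint[wrapper], dict):
            checkpoint = checkpoint[wrapper]
            break
    found = set(checkpoint)
    expected = set(expected_keys)
    missing = sorted(expected - found)
    unexpected = sorted(found - expected)
    # If an unexpected key ends with a missing key, the part in front of it
    # is a prefix that was probably added when the checkpoint was saved.
    prefix_hint = None
    for key in unexpected:
        for want in missing:
            if key.endswith("." + want):
                prefix_hint = key[: -len(want)]
                break
        if prefix_hint:
            break
    return missing, unexpected, prefix_hint


# Toy stand-ins for the real objects (hypothetical values).
expected = ["encoder.0.weight", "encoder.0.bias", "_vq_vae._embedding"]
ckpt = {
    "state_dict": {
        "module.encoder.0.weight": 0,
        "module.encoder.0.bias": 0,
        "module._vq_vae._embedding": 0,
    }
}
missing, unexpected, hint = diagnose_keys(expected, ckpt)
print("missing:", missing)
print("unexpected:", unexpected)
print("prefix hint:", hint)
```

With PyTorch, the two arguments would typically be `model.state_dict().keys()` and the object returned by `torch.load(path, map_location="cpu")`. If the file simply contains no weights for these layers, the mismatch lies elsewhere, for example in the path passed as the pretrained VQ-VAE.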

Single Image Inference

How can I run inference on my own set of images? What changes do I need to make to the data preprocessing? Do I need to change the val dict under data in AiT/ait/configs/swinv2b_480reso_depthonly.py?

RuntimeError: The size of tensor a (256) must match the size of tensor b (225) at non-singleton dimension 1

Dear author:
Thanks for your meaningful work. During inference, I encountered 'RuntimeError: The size of tensor a (256) must match the size of tensor b (225) at non-singleton dimension 1'.
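Both numbers in the message are perfect squares (256 = 16 x 16 and 225 = 15 x 15), which is what a mismatch between two square token grids would look like. The sketch below only illustrates that arithmetic. The total stride of 32 and the 480 and 512 input sizes are assumptions chosen to reproduce the two numbers, not values confirmed by the AiT code.

```python
import math


def grid_side(num_tokens):
    """Return the side length if num_tokens is a perfect square, else None."""
    side = math.isqrt(num_tokens)
    return side if side * side == num_tokens else None


def tokens_for(height, width, stride):
    """Number of feature-map positions for an input at a given total stride."""
    return (height // stride) * (width // stride)


print(grid_side(256), grid_side(225))

STRIDE = 32  # hypothetical total downsampling factor
print(tokens_for(480, 480, STRIDE))
print(tokens_for(512, 512, STRIDE))
```

If this is the cause, checking that the inference images are resized or cropped to the resolution the config was trained with would be a reasonable first step.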

Small typo

Hi, great work! I just noticed a small typo:
In the inference section of the README, the placeholder that should be <model_checkpoint> is written as <model_checkpiont>.

Swin-S and Swin-Ti weights

Thank you for releasing your code! Do you happen to have any pre-trained checkpoints for Swin-S and Swin-Ti, or even just the ImageNet-1k weights? The ImageNet-1k pre-trained weights would be preferable, as I can't find them released anywhere with matching sizes.

Thanks!

Some problem with visualizing the depth of pred and gt.

Thanks for your work. I have run into a problem visualizing the predicted and ground-truth depth. This is where I visualize them:

AiT/ait/code/model/depth/depth.py

Lines 157 to 159 in ca2c2d1

    for pred_d, depth_gt in results:
        pred_crop, gt_crop = cropping_img(pred_d, depth_gt)
        computed_result = eval_depth(pred_crop, gt_crop)

    for pred_d, depth_gt in results:
        # visualize pred_d here
        pred_crop, gt_crop = cropping_img(pred_d, depth_gt)
        # after reshaping, visualize pred_crop and gt_crop here
        computed_result = eval_depth(pred_crop, gt_crop)

This is the command I ran:
CUDA_VISIBLE_DEVICES=5,6,7 python -m torch.distributed.launch --nproc_per_node=3 code/train.py configs/swinv2b_480reso_depthonly.py --cfg-options model.task_heads.depth.vae_cfg.pretrained=vqvae_depth.pt --eval ait_joint_swinv2b.pth

However, pred_d, pred_crop and gt_crop all look very similar when visualized. Each of them looks like this picture [the picture is almost white].
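One possible explanation for an almost white image is that the depth values (in metres) are written out directly, so most of them exceed what the image writer treats as full brightness. That is an assumption about the cause, not something confirmed here. Under that assumption, a minimal sketch of min-max scaling before saving could look like this (`depth_to_uint8` is a hypothetical helper, written on nested lists so it runs without extra packages):

```python
def depth_to_uint8(depth, d_min=None, d_max=None):
    """Min-max scale a 2D depth map (nested lists) to integers in 0..255."""
    flat = [v for row in depth for v in row]
    lo = min(flat) if d_min is None else d_min
    hi = max(flat) if d_max is None else d_max
    span = (hi - lo) or 1.0  # avoid division by zero on a constant map
    return [
        [round(255 * (min(max(v, lo), hi) - lo) / span) for v in row]
        for row in depth
    ]


# Toy 2x2 depth map in metres (hypothetical values).
pred = [[1.0, 2.0], [4.0, 10.0]]
print(depth_to_uint8(pred))
```

Passing the same `d_min` and `d_max` for the prediction and the ground truth keeps the two visualizations on a common scale, which makes them directly comparable.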

Training time

Hi, interesting work! Can you share the approximate time to train the VQVAE and the task solver on both tasks? Thanks!

train/visualize on single GPU

Hello!
I am trying to run the evaluation on a single GPU, but I ran into a lot of errors.
I am new to this. Do you have code for running on a single GPU?
Best wishes

'PublicAccessNotPermitted' when download the checkpoints

Hi, thank you for the excellent work!
I ran into trouble downloading the checkpoints with wget: it fails with the error 'PublicAccessNotPermitted'. How can I download them properly, especially the pre-trained backbone models?
Thank you in advance!

Unable to evaluate the results

Hello,

I am trying to run these models to evaluate the results, however I am not able to do that due to errors at runtime.

The best "result" I could get was with this Dockerfile (at the root of the project):

FROM nvidia/cuda:11.4.3-cudnn8-devel-ubuntu18.04

ARG DEBIAN_FRONTEND=noninteractive
ENV TZ=Etc/UTC
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8

# Install system dependencies
RUN apt-get update && \
    apt-get install -y \
    git \
    wget \
    python3-pip \
    python3-dev \
    python3-opencv \
    python3-six

# Upgrade pip
RUN python3 -m pip install --upgrade pip

RUN pip3 install setuptools openmim

# Install PyTorch and torchvision
RUN pip3 install torch torchvision torchaudio -f https://download.pytorch.org/whl/cu111/torch_stable.html
RUN python3 -m pip install h5py albumentations tensorboardX gdown scipy

RUN python3 -m mim install mmcv

WORKDIR /

RUN wget http://horatio.cs.nyu.edu/mit/silberman/nyu_depth_v2/nyu_depth_v2_labeled.mat -O nyu_depth_v2_labeled.mat

RUN git clone https://github.com/vinvino02/GLPDepth.git --depth 1

RUN mv GLPDepth/code/utils/logging.py GLPDepth/code/utils/glp_depth_logging.py


# Set the working directory
WORKDIR /app


RUN python3 ../GLPDepth/code/utils/extract_official_train_test_set_from_mat.py ../nyu_depth_v2_labeled.mat ../GLPDepth/datasets/splits.mat ./data/nyu_depth_v2/official_splits/


# RUN ln -s data ait/data


COPY requirements.txt requirements.txt

RUN python3 -m pip install -r requirements.txt

COPY . .

RUN rm -rf .git

I built the image with:

sudo docker build -t mde . -f Dockerfile

And ran it with:

sudo docker run --name mde-test --gpus all --ipc=host -it --rm -v $(pwd):/app mde

Finally, I ran the evaluation command, for example:

cd ait
python3 -m torch.distributed.launch --nproc_per_node=1 code/train.py configs/swinv2b_480reso_parallel_depthonly.py  --cfg-options model.task_heads.depth.vae_cfg.pretrained=../models/vqvae_depth_2bp.pt --eval ../models/ait_depth_swinv2b_parallel.pth

This launches the inference process, but it eventually fails with an uninformative error:

eval task depth
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 654/654, 2.5 task/s, elapsed: 262s, ETA:     0s
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 34) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/run.py", line 713, in run
    )(*cmd_args)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
===================================================
code/train.py FAILED
---------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
---------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-08-26_03:01:18
  host      : f50427e7ad50
  rank      : 0 (local_rank: 0)
  exitcode  : -9 (pid: 34)
  error_file: <N/A>
  traceback : Signal 9 (SIGKILL) received by PID 34
===================================================
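The log reports exit code -9, and a negative exit code from a launcher conventionally means the child process was terminated by the signal with that number. A small sketch of decoding such codes with the standard `signal` module follows (`describe_exit` is a hypothetical helper). On Linux, SIGKILL is often sent by the kernel's out-of-memory killer, but that is only one possible cause and is not confirmed by this log.

```python
import signal


def describe_exit(exitcode):
    """Translate a subprocess-style exit code into a readable description."""
    if exitcode < 0:
        try:
            name = signal.Signals(-exitcode).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {-exitcode} ({name})"
    return f"exited with status {exitcode}"


print(describe_exit(-9))
print(describe_exit(1))
```

Since the failure appears right after all 654 samples were processed, watching host memory during the final result-gathering step would be one way to test the out-of-memory hypothesis.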

Are the authors able to provide the versions of all the software they are using? In particular:

Linux version and distribution
CUDA version
Python version
Package versions (some versions are missing from the requirements)
Any other relevant information about

Thanks.
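For anyone answering a request like this, the items in the list can be collected with a few standard-library calls. This is a minimal sketch that assumes Python 3.8 or newer (for `importlib.metadata`). `environment_report` and the package list are illustrative choices, not part of the repository.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages):
    """Collect OS, Python and installed package versions into a dict."""
    report = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


# Hypothetical selection of packages relevant to this project.
wanted = ["torch", "torchvision", "mmcv", "albumentations"]
for key, value in environment_report(wanted).items():
    print(f"{key}: {value}")
```

When PyTorch is installed, `torch.version.cuda` additionally reports the CUDA version the installed build was compiled against.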

There is a bug in dataset maybe. Might cause over-fitting maybe.

Thanks for sharing.

    transform = [
        A.Crop(x_min=41, y_min=0, x_max=601, y_max=480),
        A.HorizontalFlip(),
        A.RandomCrop(crop_size[0], crop_size[1]),
    ]

In dataset/nyudepthv2.py, I found that the image is first cropped to a fixed region (x from 41 to 601, y from 0 to 480), and a random crop is applied only after that.
Maybe the order of the albumentations transforms could be changed?
I am not sure.
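Albumentations applies the transforms of a list in the order given, so the random crop here always samples from inside the fixed region. The sketch below works out how much freedom the random crop still has. `crop_offsets` is a hypothetical helper, and `crop_size = (480, 480)` is an assumed (height, width) value, since the real one comes from the config.

```python
def crop_offsets(src_h, src_w, crop_h, crop_w):
    """Range of valid top-left offsets for a random crop inside a source image."""
    if crop_h > src_h or crop_w > src_w:
        raise ValueError("crop is larger than the source image")
    return (0, src_h - crop_h), (0, src_w - crop_w)


# Fixed crop from the snippet above: x in [41, 601), y in [0, 480).
fixed_h, fixed_w = 480 - 0, 601 - 41

crop_size = (480, 480)  # hypothetical (height, width)
y_range, x_range = crop_offsets(fixed_h, fixed_w, *crop_size)

print("fixed region (h, w):", (fixed_h, fixed_w))
print("random y offset range:", y_range)
print("random x offset range:", x_range)
```

Under these assumptions the fixed region is 480 x 560, so the random crop cannot move vertically but can still shift horizontally by up to 80 pixels. Whether that much variation is enough to limit over-fitting is a separate question.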

denorm twice in eval_coco.py

Hello! I found that /vae/utils/eval_coco.py denormalizes the reconstructed image twice, at line 45.

if hasattr(vae, 'get_codebook_indices'):
    code = vae.get_codebook_indices(mask)
    remask = vae.decode(code)[0, 0, :, :].cpu().numpy() * 0.5 + 0.5 # why denorm here?

In the decode method, the use_norm attribute is True, so decode already denormalizes the image, and the code then denormalizes it again after decoding.
I will investigate what effect this has on evaluation.
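To make the effect concrete, here is a small numeric sketch. It assumes that the denormalization inside decode uses the same `x * 0.5 + 0.5` mapping as the line quoted above. The issue only states that decode denormalizes, so the exact formula is an assumption.

```python
def denorm(x):
    """Map a value from the normalized range [-1, 1] to [0, 1]."""
    return x * 0.5 + 0.5


for x in (-1.0, 0.0, 1.0):
    once = denorm(x)
    twice = denorm(once)
    print(f"normalized={x:+.1f}  once={once:.2f}  twice={twice:.2f}")
```

Applied twice, the mapping squeezes the output into [0.5, 1], so a reconstructed mask could never fall below 0.5. If the evaluation thresholds the mask at 0.5 or lower, that would change the computed metrics.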
