croco's Introduction

CroCo + CroCo v2 / CroCo-Stereo / CroCo-Flow

[CroCo arXiv] [CroCo v2 arXiv] [project page and demo]

This repository contains the code for our CroCo model presented in our NeurIPS'22 paper CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion and its follow-up extension published at ICCV'23, Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow, referred to as CroCo v2:

@inproceedings{croco,
  title={{CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion}},
  author={Weinzaepfel, Philippe and Leroy, Vincent and Lucas, Thomas and Br\'egier, Romain and Cabon, Yohann and Arora, Vaibhav and Antsfeld, Leonid and Chidlovskii, Boris and Csurka, Gabriela and Revaud, J\'er\^ome},
  booktitle={{NeurIPS}},
  year={2022}
}

@inproceedings{croco_v2,
  title={{CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow}},
  author={Weinzaepfel, Philippe and Lucas, Thomas and Leroy, Vincent and Cabon, Yohann and Arora, Vaibhav and Br{\'e}gier, Romain and Csurka, Gabriela and Antsfeld, Leonid and Chidlovskii, Boris and Revaud, J{\'e}r{\^o}me}, 
  booktitle={ICCV},
  year={2023}
}

License

The code is distributed under the CC BY-NC-SA 4.0 License. See LICENSE for more information. Some components are based on code from MAE released under the CC BY-NC-SA 4.0 License and timm released under the Apache 2.0 License. Some components for stereo matching and optical flow are based on code from unimatch released under the MIT license.

Preparation

  1. Install dependencies on a machine with an NVIDIA GPU using e.g. conda. Note that habitat-sim is required only for the interactive demo and the synthetic pre-training data generation. If you don't plan to use it, you can skip the line that installs it and use a more recent Python version.
conda create -n croco python=3.7 cmake=3.14.0
conda activate croco
conda install habitat-sim headless -c conda-forge -c aihabitat
conda install pytorch torchvision -c pytorch
conda install notebook ipykernel matplotlib
conda install ipywidgets widgetsnbextension
conda install scikit-learn tqdm quaternion opencv # only for pretraining / habitat data generation
  2. Compile cuda kernels for RoPE

CroCo v2 relies on RoPE positional embeddings for which you need to compile some cuda kernels.

cd models/curope/
python setup.py build_ext --inplace
cd ../../

This can take a while, as we compile for all cuda architectures. Feel free to update line 9 of models/curope/setup.py to compile for specific architectures only. You might also need to set the environment variable CUDA_HOME in case you use a custom cuda installation.
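As a rough illustration of what restricting the build to specific architectures could look like, the sketch below builds nvcc -gencode flags for a chosen list of compute capabilities. The helper name gencode_flags is hypothetical and not part of the repository, and how the resulting list is wired into models/curope/setup.py is an assumption to check against that file.

```python
def gencode_flags(capabilities):
    """Build nvcc -gencode flags for the given compute capabilities.

    `capabilities` is an iterable of strings such as "7.0" or "8.0".
    Hypothetical helper for illustration; not part of the CroCo code base.
    """
    flags = []
    for cap in capabilities:
        major, minor = cap.split(".")
        arch = f"{major}{minor}"
        # One "-gencode arch=compute_XY,code=sm_XY" pair per architecture
        flags += ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]
    return flags


if __name__ == "__main__":
    # Example: target only V100 (7.0) and A100 (8.0) GPUs
    print(" ".join(gencode_flags(["7.0", "8.0"])))
```

Compiling for one or two architectures instead of all of them should shorten the build noticeably.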

In case you cannot compile the kernels, we also provide a slow pytorch version, which will be automatically loaded.
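For readers unfamiliar with RoPE, here is a minimal pure-Python sketch of a 1D rotary positional embedding. It is not the repository's implementation: the 2D variant applied to image patch positions is not reproduced, the function names are hypothetical, and the base frequency of 100 is an assumption. The point it illustrates is that, after rotation, the dot product between a query and a key depends only on their relative offset.

```python
import math


def rope_rotate(vec, position, base=100.0):
    """Apply a 1D rotary positional embedding to `vec` at `position`.

    Channel i of the first half is paired with channel i of the second half,
    and each pair is rotated by the angle position * base ** (-2 * i / d),
    where d = len(vec). The base value is an assumption.
    """
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even number of channels"
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        theta = position * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + half]
        out[i] = x * c - y * s
        out[i + half] = x * s + y * c
    return out


def dot(a, b):
    """Plain dot product of two equally sized lists."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.8, 0.5]
    k = [1.0, 0.4, -0.7, 0.2]
    # Both pairs of positions have the same relative offset (3),
    # so the two attention scores should match up to rounding.
    s1 = dot(rope_rotate(q, 5), rope_rotate(k, 2))
    s2 = dot(rope_rotate(q, 13), rope_rotate(k, 10))
    print(abs(s1 - s2) < 1e-9)
```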

  3. Download pre-trained model

We provide several pre-trained models:

modelname                         | pre-training data | pos. embed. | Encoder | Decoder
----------------------------------|-------------------|-------------|---------|--------
CroCo.pth                         | Habitat           | cosine      | ViT-B   | Small
CroCo_V2_ViTBase_SmallDecoder.pth | Habitat + real    | RoPE        | ViT-B   | Small
CroCo_V2_ViTBase_BaseDecoder.pth  | Habitat + real    | RoPE        | ViT-B   | Base
CroCo_V2_ViTLarge_BaseDecoder.pth | Habitat + real    | RoPE        | ViT-L   | Base

To download a specific model, e.g., the first one (CroCo.pth):

mkdir -p pretrained_models/
wget https://download.europe.naverlabs.com/ComputerVision/CroCo/CroCo.pth -P pretrained_models/
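The same download can be scripted in Python. The sketch below is only an illustration: the helper names are hypothetical, and it assumes that the other checkpoints are served under the same URL prefix as CroCo.pth, which the text above does not confirm.

```python
import urllib.request
from pathlib import Path

# URL prefix taken from the wget command for CroCo.pth.
# ASSUMPTION: the other checkpoints live under the same prefix.
BASE_URL = "https://download.europe.naverlabs.com/ComputerVision/CroCo/"

# Model names and properties copied from the list of pre-trained models.
MODELS = {
    "CroCo.pth": ("Habitat", "cosine", "ViT-B", "Small"),
    "CroCo_V2_ViTBase_SmallDecoder.pth": ("Habitat + real", "RoPE", "ViT-B", "Small"),
    "CroCo_V2_ViTBase_BaseDecoder.pth": ("Habitat + real", "RoPE", "ViT-B", "Base"),
    "CroCo_V2_ViTLarge_BaseDecoder.pth": ("Habitat + real", "RoPE", "ViT-L", "Base"),
}


def model_url(name):
    """Return the (assumed) download URL for a known model name."""
    if name not in MODELS:
        raise KeyError(f"unknown model: {name}")
    return BASE_URL + name


def download(name, target_dir="pretrained_models"):
    """Download a checkpoint into target_dir unless it is already there."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    dest = target / name
    if not dest.exists():
        urllib.request.urlretrieve(model_url(name), str(dest))
    return dest


if __name__ == "__main__":
    # Only print the URL here; call download("CroCo.pth") to fetch the file.
    print(model_url("CroCo.pth"))
```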

Reconstruction example

After downloading the CroCo_V2_ViTLarge_BaseDecoder pretrained model (or updating the corresponding line in demo.py), simply run:

python demo.py

Interactive demonstration of cross-view completion reconstruction on the Habitat simulator

First download the test scene from Habitat:

python -m habitat_sim.utils.datasets_download --uids habitat_test_scenes --data-path habitat-sim-data/

Then, run the Notebook demo interactive_demo.ipynb.

In this demo, you should be able to sample a random reference viewpoint from a Habitat test scene. Use the sliders to change the viewpoint and select a masked target view to reconstruct using CroCo.
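The masked target view mentioned above can be pictured as a grid of image patches of which most are hidden from the model. The following sketch draws such a random patch mask. It is an illustration only: the function name is hypothetical, and the 14 x 14 patch grid and the 90% masking ratio used in the example are assumptions that are not stated in this README.

```python
import random


def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Return a list of booleans, True where a patch is masked.

    Exactly round(num_patches * mask_ratio) patches are masked,
    chosen uniformly at random without replacement.
    """
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


if __name__ == "__main__":
    # ASSUMED setting: 14 x 14 patches and a 90% masking ratio
    mask = random_patch_mask(14 * 14, 0.9, seed=0)
    print(sum(mask), "masked patches out of", len(mask))
```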

Pre-training

CroCo

To pre-train CroCo, please first generate the pre-training data from the Habitat simulator, following the instructions in datasets/habitat_sim/README.MD and then run the following command:

torchrun --nproc_per_node=4 pretrain.py --output_dir ./output/pretraining/

Our CroCo pre-training was launched on a single server with 4 GPUs. The 400 pre-training epochs should take around 10 days on A100 GPUs or 15 days on V100 GPUs, but decent performance is obtained earlier in training. Note that, while the code contains the same learning-rate scaling rule as MAE for changes in the effective batch size, we did not test whether it is valid in our case. The first run can take a few minutes to start, as it parses all available pre-training pairs.
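To give a feel for how the learning rate may evolve over the 400 epochs, here is a sketch of the per-epoch schedule used in MAE (linear warmup followed by a half-cycle cosine decay). That CroCo follows exactly this schedule is an assumption, and the peak learning rate and the number of warmup epochs in the example are hypothetical values, not settings taken from this README.

```python
import math


def lr_at_epoch(epoch, peak_lr, warmup_epochs, total_epochs, min_lr=0.0):
    """MAE-style schedule: linear warmup, then half-cycle cosine decay."""
    if epoch < warmup_epochs:
        # Linear ramp from 0 to peak_lr during warmup
        return peak_lr * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + (peak_lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # HYPOTHETICAL values: peak_lr and warmup_epochs are not given in the text
    for epoch in (0, 20, 40, 220, 400):
        lr = lr_at_epoch(epoch, peak_lr=1.5e-4, warmup_epochs=40, total_epochs=400)
        print(epoch, lr)
```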

CroCo v2

For CroCo v2 pre-training, in addition to the generation of the pre-training data from the Habitat simulator above, please pre-extract the crops from the real datasets following the instructions in datasets/crops/README.MD. Then, run the following command for the largest model (ViT-L encoder, Base decoder):

torchrun --nproc_per_node=8 pretrain.py --model "CroCoNet(enc_embed_dim=1024, enc_depth=24, enc_num_heads=16, dec_embed_dim=768, dec_num_heads=12, dec_depth=12, pos_embed='RoPE100')" --dataset "habitat_release+ARKitScenes+MegaDepth+3DStreetView+IndoorVL" --warmup_epochs 12 --max_epoch 125 --epochs 250 --amp 0 --keep_freq 5 --output_dir ./output/pretraining_crocov2/

Our CroCo v2 pre-training was launched on a single server with 8 GPUs for the largest model, and on a single server with 4 GPUs for the smaller ones, keeping a batch size of 64 per GPU in all cases. The largest model should take around 12 days on A100 GPUs. Note that, while the code contains the same learning-rate scaling rule as MAE for changes in the effective batch size, we did not test whether it is valid in our case.
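The MAE scaling rule referred to above sets the actual learning rate to the base learning rate times the effective batch size divided by 256. A minimal sketch follows, using the batch size of 64 per GPU stated above. The function name is hypothetical, and the base learning rate of 1.5e-4 is an assumed value for illustration.

```python
def scaled_learning_rate(base_lr, batch_size_per_gpu, num_gpus, accum_iter=1):
    """MAE-style linear scaling: lr = base_lr * effective_batch_size / 256.

    Returns the scaled learning rate and the effective batch size.
    """
    effective_batch_size = batch_size_per_gpu * num_gpus * accum_iter
    return base_lr * effective_batch_size / 256, effective_batch_size


if __name__ == "__main__":
    # ASSUMED base learning rate; 64 per GPU on 4 or 8 GPUs as described above
    for gpus in (4, 8):
        lr, eff = scaled_learning_rate(1.5e-4, 64, gpus)
        print(f"{gpus} GPUs: effective batch size {eff}, lr {lr}")
```

Under this rule, going from 4 to 8 GPUs at a fixed per-GPU batch size doubles both the effective batch size and the learning rate.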

Stereo matching and Optical flow downstream tasks

For CroCo-Stereo and CroCo-Flow, please refer to stereoflow/README.MD.

croco's People

Contributors

jerome-revaud, rbregier, yocabon

croco's Issues

Stereo code release time

Thanks for your amazing work! One question: when will the code and models for CroCo-Stereo be released?

Tiling-based Inference

Hi,

I wonder why the inference results look so different between images I took and images from the dataset.
Bike (images: im0, im1, bike)
My images (images: grill, grill0, grill1)

The bike result looks so smooth, but I used the same settings for inference.

Best

About Submission to Spring.

Dear author,

Thank you for your interesting contribution. Do you mind sharing the code for submitting to Spring?

Many thanks

Availability of using my own images on CrocoFlow

Thank you for your CrocoStereo and CrocoFlow code.
I have read through the CrocoFlow code in order to run it on my own image data and see the results.
However, I am having difficulty doing so. Is there any description or inference model available?
Thanks.

pre-training details

Hi,
Could you please specify the meaning of "warmup learning rate=1e-6" in the pre-training stage? Does it mean that the learning rate starts from 1e-6 and linearly grows to 1.5e-4?

Additionally, for an image pair, the homography and color jittering augmentations should be applied to each image independently, right?

Thank you for your attention and time!

The submission of MPI-sintel Dataset

Hello,

I want to express my gratitude for sharing your code - it has been incredibly helpful.

I have a query regarding the submission process for the MPI-Sintel test dataset results. I attempted to register at http://sintel.is.tue.mpg.de/sessions, but encountered a technical issue. The website displayed the error message "ActiveRecord::StatementInvalid in UsersController#create SQLite3::SQLException: cannot rollback - no transaction is active: rollback transaction". This seems to prevent me from creating a new account.

Could you please advise on how to proceed with the submission, or if there's an alternative method available?

Thank you for your assistance and time.

The result

I have a question to ask. In the paper, the D1-all value for KITTI 2015 is 2.03. However, in the KITTI 2015 stereo benchmark, the D1-all value is 1.59. Was this improvement achieved by adjusting parameters or implementing other effective methods?
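For readers unfamiliar with the metric in this question, the sketch below computes a D1-style outlier rate as commonly defined for KITTI 2015: a pixel counts as an outlier when its disparity error exceeds both 3 px and 5% of the ground-truth disparity. This is an illustrative re-implementation under that assumption, not the official benchmark code nor the evaluation code of this repository. Treating non-positive ground truth as invalid is also an assumption.

```python
def d1_error(pred, gt, abs_thresh=3.0, rel_thresh=0.05):
    """Percentage of valid pixels whose disparity is an outlier.

    `pred` and `gt` are flat sequences of disparities in pixels.
    Pixels with gt <= 0 are treated as having no ground truth.
    """
    bad = 0
    valid = 0
    for p, g in zip(pred, gt):
        if g <= 0:
            continue  # no ground truth for this pixel
        valid += 1
        err = abs(p - g)
        if err > abs_thresh and err > rel_thresh * g:
            bad += 1
    return 100.0 * bad / valid if valid else 0.0


if __name__ == "__main__":
    pred = [10.0, 10.0, 10.0, 96.0]
    gt = [10.0, 14.0, 0.0, 100.0]
    # Pixel 2 is an outlier, pixel 3 is invalid, pixel 4 is within 5%
    print(d1_error(pred, gt))
```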

[CNN architecture support]

Out of curiosity, may I ask whether there is any possibility of making a CNN-based version of the CroCo self-supervised pipeline?

Question about .pth file setting

Hello, many thanks for your interesting paper.

May I ask you about the settings under which these two checkpoint files:
stereoflow_models/crocostereo_subtrain_vitb_smalldecoder.pth, stereoflow_models/crocostereo_subtrain_vitb_basedecoder.pth were trained?
I'm curious about the number of epochs, the dataset used, and whether a separate pretraining .pth file was used when creating the above .pth files.
Thank you.

Compile cuda kernels for RoPE Fail

Hello,
I have a problem compiling the cuda kernels for RoPE. This seems to be some kind of compatibility issue between ninja, PyTorch, cuda, etc. Can you specify the versions of ninja-build, cuda, torch, and python extensions you used while running Crocov2? Also, how much gpu memory is required to run Crocov2? Thank you.

(croco) youngkwan@deep16:~/croco/models/curope$ python setup.py build_ext --inplace
running build_ext
building 'curope' extension
Emitting ninja build file /home/youngkwan/croco/models/curope/build/temp.linux-x86_64-cpython-37/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] /usr/local/cuda/bin/nvcc -I/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/torch/include -I/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/torch/include/TH -I/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/youngkwan/anaconda3/envs/croco/include/python3.7m -c -c /home/youngkwan/croco/models/curope/kernels.cu -o /home/youngkwan/croco/models/curope/build/temp.linux-x86_64-cpython-37/kernels.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --ptxas-options=-v --use_fast_math -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_37,code=compute_37 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=curope -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
FAILED: /home/youngkwan/croco/models/curope/build/temp.linux-x86_64-cpython-37/kernels.o
/usr/local/cuda/bin/nvcc -I/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/torch/include -I/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/torch/include/TH -I/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/youngkwan/anaconda3/envs/croco/include/python3.7m -c -c /home/youngkwan/croco/models/curope/kernels.cu -o /home/youngkwan/croco/models/curope/build/temp.linux-x86_64-cpython-37/kernels.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --ptxas-options=-v --use_fast_math -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_37,code=compute_37 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=curope -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
In file included from /usr/local/cuda/include/cuda_runtime.h:83,
from <command-line>:
/usr/local/cuda/include/crt/host_config.h:138:2: error: #error -- unsupported GNU version! gcc versions later than 9 are not supported!
138 | #error -- unsupported GNU version! gcc versions later than 9 are not supported!
| ^~~~~
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1522, in _run_ninja_build
env=env)
File "/home/youngkwan/anaconda3/envs/croco/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "setup.py", line 33, in <module>
'build_ext': BuildExtension
File "/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/setuptools/__init__.py", line 87, in setup
return distutils.core.setup(**attrs)
File "/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
self.build_extensions()
File "/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 653, in build_extensions
build_ext.build_extensions(self)
File "/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 468, in build_extensions
self._build_extensions_serial()
File "/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 494, in _build_extensions_serial
self.build_extension(ext)
File "/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
_build_ext.build_extension(self, ext)
File "/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 556, in build_extension
depends=ext.depends,
File "/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 482, in unix_wrap_ninja_compile
with_cuda=with_cuda)
File "/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1238, in _write_ninja_file_and_compile_objects
error_prefix='Error compiling objects for extension')
File "/home/youngkwan/anaconda3/envs/croco/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1538, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension

Issue with building RoPE - CUDA MISMATCH

Hi,
I am getting this error on trying to build RoPE:

    raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
RuntimeError: 
The detected CUDA version (12.1) mismatches the version that was used to compile
PyTorch (11.7). Please make sure to use the same CUDA versions.

Any help on how I can fix this?

domain generalization of croco-stereo

Hello, thanks for sharing your code and checkpoints.
I am interested in the domain generalization ability of croco-stereo. I evaluated the pre-trained checkpoint "crocostereo.pth" on the KITTI datasets.
However, I got a D1 error rate of 14.00 on KITTI 2012 and 19.38 on KITTI 2015, which is worse than recent stereo networks.
Could you let me know whether you have evaluated the generalization ability and whether my inference process was improper?
This seems a little strange to me, as the checkpoint is trained on large-scale data and with popular data augmentation.

pre-training code

Dear author,

Thank you for your interesting contribution. Do you mind sharing the pre-training code, too? 

Many thanks
