Thank you for your answer!
Hi,
For pre-training, we indeed use 256x256 images (both for Habitat and for real image pairs), from which we extract 224x224 crops.
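Below is a minimal sketch of the 256x256 to 224x224 random-crop step, written in pure Python on a row-major image (a list of pixel rows). The function names are hypothetical and do not come from the CroCo codebase. The reply does not say whether both views of a pair share the same crop offset, so this sketch draws one offset per call.

```python
import random
from typing import List, Optional, Tuple

Image = List[List[float]]  # row-major: image[y][x]


def random_crop_offset(height: int, width: int, crop: int = 224,
                       rng: Optional[random.Random] = None) -> Tuple[int, int]:
    """Draw a uniformly random (top, left) offset for a crop x crop window."""
    if crop > height or crop > width:
        raise ValueError(f"crop {crop} does not fit in {height}x{width}")
    rng = rng or random.Random()
    # randint is inclusive on both ends, so the window always stays inside the image
    top = rng.randint(0, height - crop)
    left = rng.randint(0, width - crop)
    return top, left


def crop(image: Image, top: int, left: int, size: int = 224) -> Image:
    """Return the size x size window whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


if __name__ == "__main__":
    # Hypothetical 256x256 single-channel image; real training data would be RGB.
    img = [[float(y * 256 + x) for x in range(256)] for y in range(256)]
    top, left = random_crop_offset(256, 256, rng=random.Random(0))
    patch = crop(img, top, left)
    print(top, left, len(patch), len(patch[0]))
```

With a 256x256 input and a 224x224 window, the offset along each axis falls in [0, 32], so each crop keeps most of the original view while still adding some positional jitter.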
What we find most important for downstream tasks is to train and test at the same resolution, even if it differs from the pre-training resolution. This is why we use a tiling-based approach for stereo/flow at test time. While relative positional embeddings help, they are not enough to generalize to arbitrary resolutions at test time.
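The reply does not describe how tiles are combined, so the following is only a minimal sketch of tiling-based inference under stated assumptions. The input is split into overlapping tiles at the training resolution. A hypothetical `predict` callable, standing in for the stereo/flow network, is run on each tile, and overlapping predictions are averaged uniformly. The actual CroCo-Stereo/Flow code may weight overlaps differently. `tile_starts`, `tiled_predict` and the 32-pixel default overlap are illustrative choices. For stereo or flow, the same window would be cut from both images of the pair; a single-image map keeps the sketch short.

```python
import math
from typing import Callable, List

Image = List[List[float]]  # row-major: image[y][x]


def tile_starts(length: int, tile: int, min_overlap: int) -> List[int]:
    """Evenly spaced start offsets so that tiles of size `tile` cover [0, length)
    with at least `min_overlap` pixels of overlap between neighbouring tiles."""
    if length < tile:
        raise ValueError("image smaller than tile; pad it first")
    if not 0 <= min_overlap < tile:
        raise ValueError("need 0 <= min_overlap < tile")
    stride = tile - min_overlap
    n = math.ceil((length - tile) / stride) + 1
    if n == 1:
        return [0]
    step = (length - tile) / (n - 1)  # step <= stride, so overlap >= min_overlap
    return [round(i * step) for i in range(n)]


def tiled_predict(image: Image, tile_h: int, tile_w: int,
                  predict: Callable[[Image], Image], min_overlap: int = 32) -> Image:
    """Run `predict` on overlapping tiles and average predictions where tiles overlap."""
    h, w = len(image), len(image[0])
    acc = [[0.0] * w for _ in range(h)]
    cnt = [[0] * w for _ in range(h)]
    for top in tile_starts(h, tile_h, min_overlap):
        for left in tile_starts(w, tile_w, min_overlap):
            tile = [row[left:left + tile_w] for row in image[top:top + tile_h]]
            out = predict(tile)  # assumed to return a tile_h x tile_w map
            for dy in range(tile_h):
                for dx in range(tile_w):
                    acc[top + dy][left + dx] += out[dy][dx]
                    cnt[top + dy][left + dx] += 1
    # Consecutive starts differ by less than the tile size, so every pixel is covered.
    return [[a / c for a, c in zip(ar, cr)] for ar, cr in zip(acc, cnt)]
```

For example, a 500-pixel-wide input with 224-pixel tiles and at least 32 pixels of overlap gives tile starts at 0, 138 and 276. The network therefore only ever sees inputs at the resolution it was trained on.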
Overall, especially once real image pairs are included, the pre-training should be effective irrespective of the focal lengths or resolution of the downstream tasks. Pre-training at a higher resolution would likely be better, but it would be slow (DINOv2 actually pre-trained first at 224x224 before a second stage at a larger resolution, and a similar strategy could be used here if needed).
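To make the cost argument concrete, here is a back-of-the-envelope sketch. It assumes a ViT encoder with 16x16 patches; the patch size is an assumption here, not something stated in the reply. The token count grows with the square of the side length, and self-attention compares every pair of tokens. Doubling the side length from 224 to 448 therefore gives 4x the tokens and 16x the attention pairs per layer.

```python
from typing import Tuple


def num_patches(height: int, width: int, patch: int = 16) -> int:
    """Number of ViT tokens for an image split into non-overlapping patch x patch patches."""
    if height % patch or width % patch:
        raise ValueError("resolution must be a multiple of the patch size")
    return (height // patch) * (width // patch)


def relative_cost(res_a: int, res_b: int, patch: int = 16) -> Tuple[float, float]:
    """(token ratio, attention-pair ratio) of a square res_b input relative to res_a."""
    na = num_patches(res_a, res_a, patch)
    nb = num_patches(res_b, res_b, patch)
    return nb / na, (nb * nb) / (na * na)


if __name__ == "__main__":
    for res in (224, 448):
        print(res, num_patches(res, res))
    print(relative_cost(224, 448))
```

MLP and projection layers scale linearly with the token count, so the real slowdown lies somewhere between these two ratios. This is why a short second stage at higher resolution, as described for DINOv2, costs less than pre-training at high resolution from the start.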
Best
Philippe