
latentSplat

This is the code for latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction by Christopher Wewer, Kevin Raj, Eddy Ilg, Bernt Schiele, and Jan Eric Lenssen.

Check out the project website here.

(Teaser video: teaser.mp4)

Installation

To get started, create a conda environment:

conda env create -f environment.yml
conda activate latentsplat

or a virtual environment using Python 3.10+:

python3.10 -m venv latentsplat
source latentsplat/bin/activate
pip install -r requirements.txt

Please note that for training you need to download the pre-trained VAE-GAN checkpoints from LDM, as explained in the section Acquiring Pre-trained Checkpoints.

Troubleshooting

If you face unrealistic CUDA out-of-memory issues (likely caused by a mismatch between the GPU architectures used for kernel compilation and those used for training), try uninstalling the rasterizer and reinstalling it with the target architectures specified explicitly:

pip uninstall diff-gaussian-rasterization
TORCH_CUDA_ARCH_LIST="6.0 7.0 7.5 8.0 8.6+PTX" pip install git+https://github.com/Chrixtar/latent-gaussian-rasterization

Acquiring Datasets

Please move all dataset directories into a newly created datasets folder in the project root directory, or modify the root path in the dataset config files under config/dataset.

RealEstate10k

For experiments on RealEstate10k, we use the same dataset version and preprocessing into chunks as pixelSplat. Please refer to their codebase here for information about how to obtain the data.

CO3D

Simply download the hydrant and teddybear categories of CO3D from here, extract them into the created datasets folder (see above), and rename them to hydrant and teddybear, respectively, so that the layout matches the sketch below.
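
As a rough sketch, the resulting datasets folder should look roughly like this (the RealEstate10k directory name is an assumption taken from the pixelSplat convention; use whatever root your dataset config expects):

datasets/
├── re10k/        # RealEstate10k chunks, preprocessed as in pixelSplat (name is an assumption)
├── hydrant/      # CO3D hydrant category
└── teddybear/    # CO3D teddybear category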

Acquiring Pre-trained Checkpoints

We provide two sets of checkpoints as part of our releases here:

  1. Pre-trained autoencoders and discriminators from LDM, adapted for finetuning within latentSplat. They serve as a starting point for latentSplat training. Please download pretrained.zip and extract it in the project root directory if you want to train from scratch.

  2. Trained versions of latentSplat for RealEstate10k and CO3D hydrants and teddybears.

Running the Code

Training

The main entry point is src/main.py. Call it via:

python3 -m src.main +experiment=co3d_hydrant

This configuration requires a GPU with at least 40 GB of VRAM. To reduce memory usage, you can change the batch size as follows:

python3 -m src.main +experiment=co3d_hydrant data_loader.train.batch_size=1

Our code supports multi-GPU training. The batch size above is the per-GPU batch size, so the effective batch size scales with the number of GPUs (see the example below).
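
For example, to restrict training to two GPUs, you can limit the visible devices (CUDA_VISIBLE_DEVICES is a standard CUDA environment variable, not a flag specific to this codebase; this assumes the launcher uses all visible GPUs):

CUDA_VISIBLE_DEVICES=0,1 python3 -m src.main +experiment=co3d_hydrant data_loader.train.batch_size=1

With two GPUs and a per-GPU batch size of 1, the effective batch size is 2.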

Evaluation

To render frames from an existing checkpoint, run the following:

python3 -m src.main +experiment=co3d_hydrant mode=test dataset/view_sampler=evaluation dataset.view_sampler.index_path=assets/evaluation_index/co3d_hydrant_extra.json checkpointing.load=checkpoints/co3d_hydrant.ckpt

Ablations

You can run the ablations from the paper by using the corresponding experiment configurations. For example, to ablate the deterministic version:

python3 -m src.main +experiment=co3d_hydrant_det

Camera Conventions

Our extrinsics are OpenCV-style camera-to-world matrices. This means that +Z is the camera look vector, +X is the camera right vector, and -Y is the camera up vector. Our intrinsics are normalized, meaning that the first row is divided by image width, and the second row is divided by image height.
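
As a minimal illustration of the intrinsics convention (the numbers below are hypothetical and not taken from the codebase), converting a normalized intrinsics matrix back to pixel units only requires the image resolution:

import numpy as np

# Hypothetical normalized intrinsics: first row divided by width, second by height.
K_norm = np.array([
    [0.9, 0.0, 0.5],   # fx / W, skew / W, cx / W
    [0.0, 1.2, 0.5],   # 0,      fy / H,   cy / H
    [0.0, 0.0, 1.0],
])

W, H = 640, 360  # example image resolution in pixels

K_px = K_norm.copy()
K_px[0] *= W  # undo division by image width
K_px[1] *= H  # undo division by image height
# K_px now holds focal lengths and the principal point in pixel units.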

Figure Generation Code

We've included the scripts that generate the tables and figures in the paper. Note that since these are one-offs, they may need to be modified before they run.

BibTeX

@inproceedings{wewer24latentsplat,
    title     = {latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction},
    author    = {Wewer, Christopher and Raj, Kevin and Ilg, Eddy and Schiele, Bernt and Lenssen, Jan Eric},
    booktitle = {arXiv},
    year      = {2024},
}

Acknowledgements

This project was partially funded by the Saarland/Intel Joint Program on the Future of Graphics and Media. We also thank David Charatan and co-authors for the great pixelSplat codebase, on which our implementation is based.
