
openlrm's Introduction

OpenLRM: Open-Source Large Reconstruction Models

Code License Weight License LRM

HF Models HF Demo

News

  • [2024.03.13] Update training code and release OpenLRM v1.1.1.
  • [2024.03.08] We have released the core Blender script used to render Objaverse images.
  • [2024.03.05] The Hugging Face demo now uses the openlrm-mix-base-1.1 model by default. Please refer to the model card for details on the updated model architecture and training settings.
  • [2024.03.04] Version update v1.1. Released model weights trained on both Objaverse and MVImgNet. The codebase has been majorly refactored for better usability and extensibility. Please refer to v1.1.0 for details.
  • [2024.01.09] Updated all v1.0 models trained on Objaverse. Please refer to HF Models and overwrite previous model weights.
  • [2023.12.21] The Hugging Face demo is online. Have a try!
  • [2023.12.20] Released weights of the base and large models trained on Objaverse.
  • [2023.12.20] We released this project, OpenLRM, an open-source implementation of the paper LRM.

Setup

Installation

git clone https://github.com/3DTopia/OpenLRM.git
cd OpenLRM

Environment
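
A minimal setup sketch, assuming the repository ships a requirements.txt (see the Tips section below for the recommended PyTorch version):

    # Hypothetical commands; adjust to your CUDA setup and the repository's actual instructions.
    pip install torch==2.1.2         # PyTorch >= 2.1 is recommended (see Tips)
    pip install -r requirements.txt  # assumes a requirements.txt is provided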

Quick Start

Pretrained Models

  • Model weights are released on Hugging Face.
  • Weights will be downloaded automatically when you run the inference script for the first time.
  • Please be aware of the license before using the weights.

| Model | Training Data | Layers | Feat. Dim | Trip. Dim. | In. Res. | Link |
|---|---|---|---|---|---|---|
| openlrm-obj-small-1.1 | Objaverse | 12 | 512 | 32 | 224 | HF |
| openlrm-obj-base-1.1 | Objaverse | 12 | 768 | 48 | 336 | HF |
| openlrm-obj-large-1.1 | Objaverse | 16 | 1024 | 80 | 448 | HF |
| openlrm-mix-small-1.1 | Objaverse + MVImgNet | 12 | 512 | 32 | 224 | HF |
| openlrm-mix-base-1.1 | Objaverse + MVImgNet | 12 | 768 | 48 | 336 | HF |
| openlrm-mix-large-1.1 | Objaverse + MVImgNet | 16 | 1024 | 80 | 448 | HF |

Model cards with additional details can be found in model_card.md.

Prepare Images

  • We provide some sample inputs under assets/sample_input, so you can try them quickly.
  • Prepare RGBA images, or RGB images with a white background (using background removal tools such as Rembg or Clipdrop; a minimal Rembg sketch follows).
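
A minimal background-removal sketch using Rembg (this helper is not part of OpenLRM; the input and output paths are illustrative):

    # remove_bg.py -- hypothetical helper script, not part of the OpenLRM codebase
    from rembg import remove   # pip install rembg
    from PIL import Image

    image = Image.open("my_photo.jpg")               # illustrative input path
    rgba = remove(image)                             # returns an RGBA image with the background removed
    rgba.save("./assets/sample_input/my_photo.png")  # save as an RGBA PNG for inference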

Inference

  • Run the inference script to get 3D assets.

  • You may specify which form of output to generate by setting the flags EXPORT_VIDEO=true and EXPORT_MESH=true.

  • Please set INFER_CONFIG according to the model you want to use, e.g., infer-b.yaml for base models and infer-s.yaml for small models.

  • An example usage is as follows:

    # Example usage
    EXPORT_VIDEO=true
    EXPORT_MESH=true
    INFER_CONFIG="./configs/infer-b.yaml"
    MODEL_NAME="zxhezexin/openlrm-mix-base-1.1"
    IMAGE_INPUT="./assets/sample_input/owl.png"
    
    python -m openlrm.launch infer.lrm --infer $INFER_CONFIG model_name=$MODEL_NAME image_input=$IMAGE_INPUT export_video=$EXPORT_VIDEO export_mesh=$EXPORT_MESH
    

Tips

  • The recommended PyTorch version is >=2.1. Code is developed and tested under PyTorch 2.1.2.
  • If you encounter CUDA OOM issues, please try to reduce the frame_size in the inference configs.
  • You should be able to see UserWarning: xFormers is available if xFormers is actually working.

Training

Configuration

  • We provide a sample accelerate config file under configs/accelerate-train.yaml, which defaults to using 8 GPUs with bf16 mixed precision.
  • You may modify the configuration file to fit your own environment.

Data Preparation
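
Rendering training views relies on the released Blender script mentioned in the News section. A hedged example of rendering a single Objaverse object (the object path and options are illustrative, following the invocation shown in the issues below):

    python scripts/data/objaverse/blender_script.py -- --object_path=<YOUR_OBJECT>.glb --output_dir=renderings/ --num_images=32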

Run Training

  • A sample training config file is provided under configs/train-sample.yaml.

  • Please replace the data-related paths in the config file with your own paths and customize the training settings.

  • An example training usage is as follows:

    # Example usage
    ACC_CONFIG="./configs/accelerate-train.yaml"
    TRAIN_CONFIG="./configs/train-sample.yaml"
    
    accelerate launch --config_file $ACC_CONFIG -m openlrm.launch train.lrm --config $TRAIN_CONFIG
    

Inference on Trained Models

  • The inference pipeline is compatible with Hugging Face utilities for convenience.

  • You need to convert a training checkpoint into an inference model by running the following script.

    python scripts/convert_hf.py --config <YOUR_EXACT_TRAINING_CONFIG> convert.global_step=null
    
  • The converted model will be saved under exps/releases by default and can be used for inference following the inference guide above; an example invocation is sketched below.
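
An example invocation on a converted model, as a sketch only (the release path depends on your experiment name, run name, and checkpoint step):

    # Illustrative paths; pick the infer config that matches your trained model size (see Inference above)
    EXPORT_VIDEO=true
    EXPORT_MESH=true
    INFER_CONFIG="./configs/infer-s.yaml"
    MODEL_NAME="./exps/releases/lrm-objaverse/small-dummyrun/step_001000"
    IMAGE_INPUT="./assets/sample_input/owl.png"

    python -m openlrm.launch infer.lrm --infer $INFER_CONFIG model_name=$MODEL_NAME image_input=$IMAGE_INPUT export_video=$EXPORT_VIDEO export_mesh=$EXPORT_MESH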

Acknowledgement

  • We thank the authors of the original paper for their great work! Special thanks to Kai Zhang and Yicong Hong for assistance during the reproduction.
  • This project is supported by Shanghai AI Lab, which provided the computing resources.
  • This project is advised by Ziwei Liu and Jiaya Jia.

Citation

If you find this work useful for your research, please consider citing:

@article{hong2023lrm,
  title={Lrm: Large reconstruction model for single image to 3d},
  author={Hong, Yicong and Zhang, Kai and Gu, Jiuxiang and Bi, Sai and Zhou, Yang and Liu, Difan and Liu, Feng and Sunkavalli, Kalyan and Bui, Trung and Tan, Hao},
  journal={arXiv preprint arXiv:2311.04400},
  year={2023}
}
@misc{openlrm,
  title = {OpenLRM: Open-Source Large Reconstruction Models},
  author = {Zexin He and Tengfei Wang},
  year = {2023},
  howpublished = {\url{https://github.com/3DTopia/OpenLRM}},
}

License

openlrm's People

Contributors

tengfei-wang, zexinhe

openlrm's Issues

What resources to train from scratch

Hi,

Could you please provide some numbers around which GPUs, how many of them, and how many hours roughly were required to train these models?

Question about default source camera and the render camera setting.

I noticed that the triplane renders the same image as the input (under the input pose) only when the source camera follows the setting azimuth=270, elevation=0.
My question is: when I have an image under pose $T_0$ and I want to render an image under pose $T_1$, how should I adjust the render camera?
I've tried computing the reference pose $T_2 = T_d T_0^{-1}$, where $T_d$ is the default pose, and transforming $T_1$ with it, but this gives a completely wrong image.
In my setting, the camera distance is 1.5, the input azimuth is 2.3562 rad and the elevation is -0.1317 rad;
the target azimuth is 0 and the elevation is 0.52359878.
Both the input image (006) and the target image (000) are rendered with Blender.

How to load custom-trained model and inference?

Thank you for releasing the training code. I have trained a model and I am wondering how to load it at the inference stage.
Could you please give me some example scripts or advice?

The checkpoint structure:
exps/..../000100/
custom_checkpoint_0.pkl model.safetensors optimizer.bin random_states_0.pkl

Inference Issue

Hi,

I ran the inference command:

python -m openlrm.launch infer.lrm --infer $INFER_CONFIG model_name=$MODEL_NAME image_input=$IMAGE_INPUT export_video=$EXPORT_VIDEO export_mesh=$EXPORT_MESH

and I got the error shown in the stack trace below:

Traceback (most recent call last):
  File "/simplstor/ypatel/anaconda3/envs/lrm/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/simplstor/ypatel/anaconda3/envs/lrm/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/simplstor/ypatel/workspace/OpenLRM/openlrm/launch.py", line 18, in <module>
    from openlrm.runners import REGISTRY_RUNNERS
  File "/simplstor/ypatel/workspace/OpenLRM/openlrm/runners/__init__.py", line 20, in <module>
    from .train import *
  File "/simplstor/ypatel/workspace/OpenLRM/openlrm/runners/train/__init__.py", line 16, in <module>
    from .lrm import LRMTrainer
  File "/simplstor/ypatel/workspace/OpenLRM/openlrm/runners/train/lrm.py", line 24, in <module>
    from .base_trainer import Trainer
  File "/simplstor/ypatel/workspace/OpenLRM/openlrm/runners/train/base_trainer.py", line 54, in <module>
    class Trainer(Runner):
  File "/simplstor/ypatel/workspace/OpenLRM/openlrm/runners/train/base_trainer.py", line 215, in Trainer
    @control(synchronized=True)
TypeError: 'staticmethod' object is not callable

What could be the issue?

Error in exporting mesh and video when applying official examples

Similar to closed issue #1, but I did not find an answer there.

When I try the official example for exporting videos, the videos are blank:
python -m lrm.inferrer --model_name openlrm-base-obj-1.0 --source_image ./assets/sample_input/owl.png --export_video

The output videos show a white image with a faintly visible object rotating in the center; the object is very hard to make out against the white background. I have tried this on both a 2080 Ti and a 3090, and also used the openlrm-small-obj-1.0 model and other example images, but the results are similar.

When I try the official example for exporting a mesh, the vertices are empty:

python -m lrm.inferrer --model_name openlrm-base-obj-1.0 --source_image ./assets/sample_input/owl.png --export_mesh

lrm/models/rendering/synthesizer.py", line 189, in forward_points

for k in outs[0].keys()

IndexError: list index out of range

I looked at the result after executing forward_planes at line 165 of inferrer.py (screenshot: forwardplane), and the result after executing model.synthesizer at line 157 of inferrer.py (screenshot: image_rgb), in which the images_rgb values for each frame seem abnormal.

How can I fix this so that it performs like the official example?

Matrix multiplication: not supported between 'Matrix' and 'Vector' types

I tried blender_script.py and got the following logs:

Connected to pydev debugger (build 232.10227.11)
=================== CYCLES ===================
Data are loaded, start creating Blender stuff
glTF import finished in 0.18s
Failed to render ./dataset/input/bb7a8d3861f04ab68d0e09120ce11f3a.glb
Matrix multiplication: not supported between 'Matrix' and 'Vector' types

I find the problem comes from: if not ignore_matrix: coord = obj.matrix_world @ coord
My mathutils version is 3.3.0 and bpy is 4.0.0, with torch==2.1.2.

MVImgNet code

Could you please release the code for processing the MVImgNet dataset? I notice that you use it in your experiments, but the code for this part is commented out.

CUDA OOM in model v1.1; Not able to run v1.0

Hi,

thanks for open-sourcing the code.

After today's push, I hit CUDA OOM issues with all 6 v1.1 models on the new codebase. I have a 4090 with 24 GB, which still runs out of memory.

Meanwhile, I have trouble loading the v1.0 models with the current codebase. It shows json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 3 column 1 (char 44).

The v1.0 model works at the previous commit d4caebb.

I guess there is some problem with the codebase. Could you please have a look?

Thanks!

Seeking clarification on a code snippet in synthesizer.py

Hello 3DTopia,

I hope this message finds you well. I am a user of the OpenLRM repository and while exploring the code, I came across a specific line that I'm struggling to comprehend. The code snippet in question can be found at this link: link to the code snippet.

The line I'm referring to is:

rgb = torch.sigmoid(x[..., 1:]) * (1 + 2*0.001) - 0.001  # Utilizes sigmoid clamping from MipNeRF

I would greatly appreciate it if you could provide some clarification on its intended meaning and functionality. Specifically, I would like to understand how the sigmoid clamping from MipNeRF is being utilized to affect the RGB values.

Thank you for taking the time to help me understand this code better. I eagerly await your response.
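
For reference, a standalone sketch (not taken from the repository) of the value range this clamping produces:

    import torch

    eps = 0.001
    x = torch.tensor([-10.0, 0.0, 10.0])          # illustrative raw network outputs
    rgb = torch.sigmoid(x) * (1 + 2 * eps) - eps  # same form as the quoted line
    # sigmoid alone maps to the open interval (0, 1); scaling by (1 + 2*eps) and shifting
    # by -eps widens the range to (-eps, 1 + eps), so outputs of exactly 0 and 1 are
    # reachable without pushing the raw values toward +/- infinity.
    print(rgb)  # roughly tensor([-0.0010, 0.5000, 1.0010])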

Matrix multiplication: not supported between 'Matrix' and 'Vector' types

Hi, I'm trying to render one objaverse object with the provided blender script (scripts/data/objaverse/blender_script.py), but I'm getting the following error:

python scripts/data/objaverse/blender_script.py -- --object_path=304253851afd493d958fc8e256c189df.glb --output_dir=renderings/ --num_images=32
=================== CYCLES ===================
Data are loaded, start creating Blender stuff
glTF import finished in 0.00s
Failed to render 304253851afd493d958fc8e256c189df.glb
Matrix multiplication: not supported between 'Matrix' and 'Vector' types

The code is breaking here:

coord = obj.matrix_world @ coord

and I tried printing the type and shape of the two operands, and they seem ok:

<class 'Matrix'> <Matrix 4x4 ( 0.6480, 0.7616, -0.0000, -8.2287)
            (-0.7616, 0.6480, -0.0000, -4.0429)
            (-0.0000, 0.0000,  1.0000, 26.8229)
            ( 0.0000, 0.0000,  0.0000,  1.0000)>
<class 'Vector'> Vector((-12.560022354125977, -2.44964861869812, -26.822948455810547))

My environment is:

  • python 3.11.9
  • bpy 4.1.0
  • mathutils 3.3.0.

The weird thing is that this operation should actually be supported, and in fact a different script works in the same environment, namely:
https://github.com/allenai/objaverse-rendering/blob/970731404ae2dd091bb36150e04c4bd6ff59f0a0/scripts/blender_script.py

How the training set is rendered for Objaverse

Hi, amazing work, and it's nice to see an open-sourced version of LRM!

I wonder how the camera and object scale are defined when rendering the Objaverse dataset.
Based on your code, the camera is at a distance of 2, but can you confirm that?
How is the scene normalized?

Thanks

Objaverse + MVImgNet Models

Awesome work! I'm working on a project with OpenLRM and wondering when the Objaverse + MVImgNet weights may be released.

Fine tuning on a dataset

  1. Do we need to set the 'views' folder as the root path in the training config (the views folder contains the rgba and pose folders), or do we need to add the models to the directory as well?
  2. How do we configure the model for fine-tuning?

Pretrained models config files

Hi, could you provide the configuration files that you used to train the models available on Hugging Face? I noticed that the one available in the repo is for the small model, but I would like to try fine-tuning the base and large models.

Export mesh not working

I am trying the official example but the vertices are empty

python -m lrm.inferrer --model_name lrm-base-obj-v1 --source_image ./assets/sample_input/owl.png --export_mesh

lrm/models/rendering/synthesizer.py", line 189, in forward_points
for k in outs[0].keys()
IndexError: list index out of range

How to fix "Cannot find valid data directory for uid" error?

Everything runs in a VirtualBox Ubuntu Linux VM.

I have train_metadata.json and val_metadata.json, JSON files with links to my dataset of PNG files. When I try to start training the model, I get the "Cannot find valid data directory for uid" error. It seems the problem may be in the links or in the config provided below, but I'm not sure:

root_dirs:
    - ./DATA_set/
meta_path:
    train: ./DATA_set/train_metadata.json
    val: ./DATA_set/val_metadata.json

Maybe there are some naming rules or something related to image tracking?

The performance of released model

I did not find information about the evaluation of the released models. Are there any evaluations of the released models, under the same settings, that can be compared to the original paper?

Training with .obj files

Hi ZexinHe, sorry to bother you!

I have a question about training with .obj files. Is this possible? If so, could you give a hint on how to do it?

Thanks in advance!

Fixed focal

Thanks for your great work!
I noticed that you used a fixed focal length when rendering Objaverse, which can make the focal length of the input image inconsistent with the camera model. I'm curious why you didn't use a random focal length.

Not able to run blender_script.py and also how do I train on objaverse?

I'm not able to install the following two libraries mathutils and bpy:
For bpy it says:
ERROR: No matching distribution found for bpy
For mathutils it says:
ERROR: Failed building wheels for mathutils and
error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
My python version is 3.8.10

I have the Objaverse dataset downloaded with objects; my question is, do I have to run the blender_script.py file for each .glb object in the dataset?
Could you please provide instructions on how to properly prepare the data for training?

ValueError: math domain error

summary

  • error happens when training
  • tested on Runpod's A100 SXM 80GB x4 GPUs, 128 vCPU 1006 GB RAM
  • runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04

reproduction of the error

  1. installation of OpenLRM was successful

  2. data preparation using blender_script.py was successful, generating 100 sets of data, each containing rgba, pose, and intrinsics.npy.

  3. configuration of train-sample.yaml and accelerate-train.yaml is as follows:

        
    experiment:
        type: lrm
        seed: 42
        parent: lrm-objaverse
        child: small-dummyrun
    
    model:
        camera_embed_dim: 1024
        rendering_samples_per_ray: 96
        transformer_dim: 512
        transformer_layers: 12
        transformer_heads: 8
        triplane_low_res: 32
        triplane_high_res: 64
        triplane_dim: 32
        encoder_type: dinov2
        encoder_model_name: dinov2_vits14_reg
        encoder_feat_dim: 384
        encoder_freeze: false
    
    dataset:
        subsets:
            -   name: objaverse
                root_dirs:
                    - "/root/OpenLRM/views" # modified this value
                meta_path:
                    train: "/root/OpenLRM/train_uids.json" # modified this value
                    val: "/root/OpenLRM/val_uids.json" # modified this value
                sample_rate: 1.0
        sample_side_views: 3
        source_image_res: 224
        render_image:
            low: 64
            high: 192
            region: 64
        normalize_camera: true
        normed_dist_to_center: auto
        num_train_workers: 4
        num_val_workers: 2
        pin_mem: true
    
    train:
        mixed_precision: bf16  # REPLACE THIS BASED ON GPU TYPE
        find_unused_parameters: false
        loss:
            pixel_weight: 1.0
            perceptual_weight: 1.0
            tv_weight: 5e-4
        optim:
            lr: 4e-4
            weight_decay: 0.05
            beta1: 0.9
            beta2: 0.95
            clip_grad_norm: 1.0
        scheduler:
            type: cosine
            warmup_real_iters: 3000
        batch_size: 16  # REPLACE THIS (PER GPU)
        accum_steps: 1  # REPLACE THIS
        epochs: 60  # REPLACE THIS
        debug_global_steps: null
    
    val:
        batch_size: 4
        global_step_period: 1000
        debug_batches: null
    
    saver:
        auto_resume: true
        load_model: null
        checkpoint_root: ./exps/checkpoints
        checkpoint_global_steps: 1000
        checkpoint_keep_level: 5
    
    logger:
        stream_level: WARNING
        log_level: INFO
        log_root: ./exps/logs
        tracker_root: ./exps/trackers
        enable_profiler: false
        trackers:
            - tensorboard
        image_monitor:
            train_global_steps: 100
            samples_per_log: 4
    
    compile:
        suppress_errors: true
        print_specializations: true
        disable: true
    compute_environment: LOCAL_MACHINE
    debug: false
    distributed_type: MULTI_GPU
    downcast_bf16: 'no'
    gpu_ids: all
    machine_rank: 0
    main_training_function: main
    mixed_precision: bf16
    num_machines: 1
    num_processes: 4 # only modified this value
    rdzv_backend: static
    same_network: true
    tpu_env: []
    tpu_use_cluster: false
    tpu_use_sudo: false
    use_cpu: false
  4. the error message:

    [TRAIN STEP]loss=0.624, loss_pixel=0.0577, loss_perceptual=0.566, loss_tv=0.698, lr=8.13e-6: 100%|███████████████████████████████████████████████| 60/60 [04:55<00:00,  4.92s/it]Traceback (most recent call last):
      File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "/root/OpenLRM/openlrm/launch.py", line 36, in <module>
        main()
      File "/root/OpenLRM/openlrm/launch.py", line 32, in main
        runner.run()
      File "/root/OpenLRM/openlrm/runners/train/base_trainer.py", line 338, in run
        self.train()
      File "/root/OpenLRM/openlrm/runners/train/lrm.py", line 343, in train
        self.save_checkpoint()
      File "/root/OpenLRM/openlrm/runners/train/base_trainer.py", line 118, in wrapper
        result = accelerated_func(self, *args, **kwargs)
      File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 669, in _inner
        return PartialState().on_main_process(function)(*args, **kwargs)
      File "/root/OpenLRM/openlrm/runners/train/base_trainer.py", line 246, in save_checkpoint
        cur_order = ckpt_base ** math.floor(math.log(max_ckpt // ckpt_period, ckpt_base))
    ValueError: math domain error
    [2024-04-17 08:24:09,179] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 65932 closing signal SIGTERM
    [2024-04-17 08:24:09,183] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 65933 closing signal SIGTERM
    [2024-04-17 08:24:09,186] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 65934 closing signal SIGTERM
    [2024-04-17 08:24:09,301] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 65931) of binary: /usr/bin/python
    Traceback (most recent call last):
      File "/usr/local/bin/accelerate", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 46, in main
        args.func(args)
      File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 1066, in launch_command
        multi_gpu_launcher(args)
      File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 711, in multi_gpu_launcher
        distrib_run.run(args)
      File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 803, in run
        elastic_launch(
      File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 135, in __call__
        return launch_agent(self._config, self._entrypoint, list(args))
      File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
        raise ChildFailedError(
    torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
    ============================================================
    openlrm.launch FAILED
    ------------------------------------------------------------
    Failures:
      <NO_OTHER_FAILURES>
    ------------------------------------------------------------
    Root Cause (first observed failure):
    [0]:
      time      : 2024-04-17_08:24:09
      host      : dcf76dfb9908
      rank      : 0 (local_rank: 0)
      exitcode  : 1 (pid: 65931)
      error_file: <N/A>
      traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
    ============================================================
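
For reference, a standalone sketch (with hypothetical values) of how the quoted expression raises "ValueError: math domain error" when max_ckpt // ckpt_period evaluates to 0:

    import math

    ckpt_base, ckpt_period = 2, 1000   # hypothetical values for illustration
    max_ckpt = 60                      # e.g., fewer global steps than one checkpoint period

    # max_ckpt // ckpt_period == 0, and math.log(0, ckpt_base) is undefined,
    # so this line raises "ValueError: math domain error", matching the traceback above.
    cur_order = ckpt_base ** math.floor(math.log(max_ckpt // ckpt_period, ckpt_base))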

Possible to release rendered Objaverse images?

Thank you so much for this fantastic work!

Though I noticed the documentation for the training stage is not available yet, could I have the images rendered from the Objaverse dataset so that I can fine-tune the models myself? It seems that rendering the whole Objaverse dataset would take a long time.

How to inference custom-trained model

Hello OpenLRM Team,

Following the guidance provided in issue #24, I've managed to run python scripts/convert_hf.py --config <YOUR_EXACT_TRAINING_CONFIG> using the files from exps/checkpoints, such as custom_checkpoint_0.pkl and model.safetensors. This process successfully generated a config.json file and model.safetensors in the releases folder. However, I'm struggling to use these files for inference. Following the instructions in the README section on inference, I keep encountering 'key not found' errors. It seems I might be setting the properties incorrectly, but I can't figure out the right approach. Could you provide additional guidance on how to proceed with inference using trained models? Any assistance would be greatly appreciated.

What GPU configuration is needed for training?

The paper says: "We train LRM on 128 NVIDIA A100 (40G) GPUs with a batch size of 1024 (1024 different shapes per iteration) for 30 epochs, which takes about 3 days to complete. Each epoch contains one copy of the rendered Objaverse images and MVImgNet."
The allenai/objaverse dataset used in the paper is several terabytes.

I don't have that many A100s. Would it be feasible to train on a smaller amount of data with a single A100?

HF Demo - 3D model?

The HF demo seems to output only a video. Why not also expose the 3D model for download?

How to evaluate metrics

Hi @ZexinHe, I wanted to run some metrics on a custom-trained model; how do I do that? I tried using FID, but during inference I get a PLY file, for which I rendered images using Blender. The images don't have colour, so the FID value is very high. Can you suggest a way to save the object as a GLB file?

Computation Cost

Thanks for open-sourcing this wonderful work.

I'm curious about the computation cost for training your LRM on the Objaverse dataset.

For example, it takes 128 A100-40G GPUs for 3 days to complete training on the Objaverse + MVImgNet dataset in the original paper.

Inference result quality on trained model is not good

information

device info

  • tested on Runpod GPU instance
  • A100 SXM 80GB x3, 96 vCPU 750 GB RAM
  • runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04

data preparation

  • prepared 100 high-quality glb files
  • successfully generated a dataset with rgba, pose, and intrinsics.npy using blender_script.py
  • split them into two parts, training (80%) and evaluation (20%), and configured train_uids.json and val_uids.json

training

modified epochs (60 => 1000) and global_step_period (1000 => 100)
train-sample.yaml:

experiment:
    type: lrm
    seed: 42
    parent: lrm-objaverse
    child: small-dummyrun

model:
    camera_embed_dim: 1024
    rendering_samples_per_ray: 96
    transformer_dim: 512
    transformer_layers: 12
    transformer_heads: 8
    triplane_low_res: 32
    triplane_high_res: 64
    triplane_dim: 32
    encoder_type: dinov2
    encoder_model_name: dinov2_vits14_reg
    encoder_feat_dim: 384
    encoder_freeze: false

dataset:
    subsets:
        -   name: objaverse
            root_dirs:
                - "/root/OpenLRM/views"
            meta_path:
                train: "/root/OpenLRM/train_uids.json"
                val: "/root/OpenLRM/val_uids.json"
            sample_rate: 1.0
    sample_side_views: 3
    source_image_res: 224
    render_image:
        low: 64
        high: 192
        region: 64
    normalize_camera: true
    normed_dist_to_center: auto
    num_train_workers: 4
    num_val_workers: 2
    pin_mem: true

train:
    mixed_precision: bf16  # REPLACE THIS BASED ON GPU TYPE
    find_unused_parameters: false
    loss:
        pixel_weight: 1.0
        perceptual_weight: 1.0
        tv_weight: 5e-4
    optim:
        lr: 4e-4
        weight_decay: 0.05
        beta1: 0.9
        beta2: 0.95
        clip_grad_norm: 1.0
    scheduler:
        type: cosine
        warmup_real_iters: 3000
    batch_size: 16  # REPLACE THIS (PER GPU)
    accum_steps: 1  # REPLACE THIS
    epochs: 1000  # REPLACE THIS
    debug_global_steps: null

val:
    batch_size: 4
    global_step_period: 100
    debug_batches: null

saver:
    auto_resume: true
    load_model: null
    checkpoint_root: ./exps/checkpoints
    checkpoint_global_steps: 1000
    checkpoint_keep_level: 5

logger:
    stream_level: WARNING
    log_level: INFO
    log_root: ./exps/logs
    tracker_root: ./exps/trackers
    enable_profiler: false
    trackers:
        - tensorboard
    image_monitor:
        train_global_steps: 100
        samples_per_log: 4

compile:
    suppress_errors: true
    print_specializations: true
    disable: true

The training result is as follows:

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.09it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.19it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.05it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.23it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.11it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.12it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.13it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.14it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.27it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.03it/s]
[TRAIN STEP]loss=0.369, loss_pixel=0.033, loss_perceptual=0.335, loss_tv=3.03, lr=0.000133: 100%|█████████████████████████████████████████████████████| 1000/1000 [1:26:39<00:00,  5.78s/it]
root@b5f5ee77bf34:~/OpenLRM# 

converted the generated checkpoint into a Hugging Face-compatible model using the following command:

python scripts/convert_hf.py --config ./configs/train-sample.yaml  convert.global_step=null

successfully generated the model:

root@b5f5ee77bf34:~/OpenLRM# python scripts/convert_hf.py --config ./configs/train-sample.yaml  convert.global_step=null
/root/OpenLRM/./openlrm/models/encoders/dinov2/layers/swiglu_ffn.py:43: UserWarning: xFormers is available (SwiGLU)
  warnings.warn("xFormers is available (SwiGLU)")
/root/OpenLRM/./openlrm/models/encoders/dinov2/layers/attention.py:27: UserWarning: xFormers is available (Attention)
  warnings.warn("xFormers is available (Attention)")
/root/OpenLRM/./openlrm/models/encoders/dinov2/layers/block.py:39: UserWarning: xFormers is available (Block)
  warnings.warn("xFormers is available (Block)")
Downloading: "https://dl.fbaipublicfiles.com/dinov2/dinov2_vits14/dinov2_vits14_reg4_pretrain.pth" to /root/.cache/torch/hub/checkpoints/dinov2_vits14_reg4_pretrain.pth
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 84.2M/84.2M [00:00<00:00, 192MB/s]
Loading from exps/checkpoints/lrm-objaverse/small-dummyrun/000100/model.safetensors
Saving locally to exps/releases/lrm-objaverse/small-dummyrun/step_000100
root@b5f5ee77bf34:~/OpenLRM# 

inference

started inference on the trained model:

EXPORT_VIDEO=true
EXPORT_MESH=true
MODEL_NAME="./exps/releases/lrm-objaverse/small-dummyrun/step_001000"

python -m openlrm.launch infer.lrm --infer "./configs/infer-l.yaml" model_name=$MODEL_NAME image_input="./assets/sample_input/test.png" export_video=$EXPORT_VIDEO export_mesh=$EXPORT_MESH

The input image is one of the multi-view images from the data used for training, and the output PLY looks like this (screenshots omitted).

Why is the result quality so bad?
What should I do to improve the quality?
I'm new to AI and don't know how to properly produce checkpoint models by modifying the config files.
Should I increase the amount of data?
Should I train for more than 1000 epochs, or fewer?
I'm confused and don't know where to start.
If someone could give me advice, it would be a great help.
Thank you in advance :)

How can I solve this issue on the colab implementation?

Example usage

EXPORT_VIDEO=True
EXPORT_MESH=True
INFER_CONFIG="./configs/infer-b.yaml"
MODEL_NAME="zxhezexin/openlrm-mix-base-1.1"
IMAGE_INPUT="./assets/sample_input/owl.png"

!python -m openlrm.launch infer.lrm --infer $INFER_CONFIG model_name=$MODEL_NAME image_input=$IMAGE_INPUT export_video=$EXPORT_VIDEO export_mesh=$EXPORT_MESH

/usr/bin/python3: No module named openlrm

Details about the dataset

Thanks for open-sourcing this amazing work. I am curious about the details of the dataset used for training, i.e., the scale and camera radius used when rendering the Objaverse dataset. Also, what is the orientation of the camera coordinate system: is it x front, y right, and z up?

DINO encoder

Dear authors:

Thank you for your great work! I wonder why you used DINOv1 instead of DINOv2, which is more suitable for dense prediction tasks. Thank you!

question about inference

Hello! When I run python -m openlrm.launch infer.lrm --infer "./configs/infer-b.yaml" model_name="zxhezexin/openlrm-mix-base-1.1" image_input="./assets/sample_input/owl.png" export_video=true export_mesh=true as in the example, it fails with the following error message:
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/data1/ubu/OpenLRM/openlrm/launch.py", line 36, in <module>
    main()
  File "/data1/ubu/OpenLRM/openlrm/launch.py", line 31, in main
    with RunnerClass() as runner:
         ^^^^^^^^^^^^^
  File "/data1/ubu/OpenLRM/openlrm/runners/infer/lrm.py", line 121, in __init__
    self.model = self._build_model(self.cfg).to(self.device)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data1/ubu/OpenLRM/openlrm/runners/infer/lrm.py", line 126, in _build_model
    model = hf_model_cls.from_pretrained(cfg.model_name)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubu/anaconda3/envs/lrm/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/ubu/anaconda3/envs/lrm/lib/python3.11/site-packages/huggingface_hub/hub_mixin.py", line 277, in from_pretrained
    instance = cls._from_pretrained(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubu/anaconda3/envs/lrm/lib/python3.11/site-packages/huggingface_hub/hub_mixin.py", line 485, in _from_pretrained
    model = cls(**model_kwargs)
            ^^^^^^^^^^^^^^^^^^^
TypeError: wrap_model_hub.<locals>.HfModel.__init__() missing 1 required positional argument: 'config'

My environment: python=3.11, torch=2.1.2, cuda=11.8.
Is there anything wrong with my setup?

Release the scene level model

Dear Authors,

I greatly appreciate your efforts in open-sourcing this impressive model. I am interested in the upcoming release of the scene-level pretrained model trained on MVImgNet + Objaverse. Could you kindly share the expected release date? Additionally, I would like to ask whether there are any plans to train the model on Objaverse-XL in the future. Thank you for your attention to these queries.

About training dataset

Hi, could you provide your rendering code for Objaverse? We want to train OpenLRM ourselves.

Inference error

$ python -m lrm.inferrer --model_name openlrm-base-obj-1.0 --source_image ./assets/sample_input/owl.png --export_video

======== Loaded model from checkpoint ========
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/mnt/bn/dika/mlx/workspace/project/OpenLRM/lrm/inferrer.py", line 265, in <module>
    inferrer.infer(
  File "/mnt/bn/dika/mlx/workspace/project/OpenLRM/lrm/inferrer.py", line 210, in infer
    results = self.infer_single(
  File "/mnt/bn/dika/mlx/workspace/project/OpenLRM/lrm/inferrer.py", line 155, in infer_single
    planes = self.model.forward_planes(image, source_camera)
  File "/mnt/bn/dika/mlx/workspace/project/OpenLRM/lrm/models/generator.py", line 91, in forward_planes
    planes = self.transformer(image_feats, camera_embeddings)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/bn/dika/mlx/workspace/project/OpenLRM/lrm/models/transformer.py", line 126, in forward
    x = layer(x, image_feats, camera_embeddings)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/bn/dika/mlx/workspace/project/OpenLRM/lrm/models/transformer.py", line 79, in forward
    x = x + self.self_attn(before_sa, before_sa, before_sa)[0]
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/activation.py", line 1160, in forward
    return torch._native_multi_head_attention(
TypeError: _native_multi_head_attention(): argument 'qkv_bias' (position 7) must be Tensor, not NoneType

CUDA out of memory for inferring

I use a 3090 (24 GB) to run the inference process with openlrm-mix-large-1.1, but I get CUDA out of memory. I wonder how to make inference fit on a 24 GB 3090.

CUDA out of memory. Tried to allocate 16.88 GiB. GPU 0 has a total capacity of 23.68 GiB of which 1.70 GiB is free. Process 130544 has 2.31 GiB memory in use. Including non-PyTorch memory, this process has 19.49 GiB memory in use. Of the allocated memory 19.10 GiB is allocated by PyTorch, and 76.80 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Issue serving a model trained with the provided training code

I'm trying to run inference on a custom model, trained with the provided code, but there seems to be a problem with building the model:

self.model = self._build_model(self.cfg).to(self.device)

def _build_model(self, cfg):
    from openlrm.models import model_dict
    hf_model_cls = wrap_model_hub(model_dict[self.EXP_TYPE])
    model = hf_model_cls.from_pretrained(cfg.model_name)
    return model

(venv) root@bc700a1d6a6c:/workspace/OpenLRM# python -m openlrm.launch infer.lrm --infer=configs/infer-b.yaml model_name=exps/checkpoints/lrm-objaverse/overfitting-test/001000 image_input=test.png export_video=true export_mesh=true
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
config.json not found in /workspace/OpenLRM/exps/checkpoints/lrm-objaverse/overfitting-test/001000
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/OpenLRM/openlrm/launch.py", line 36, in <module>
    main()
  File "/workspace/OpenLRM/openlrm/launch.py", line 31, in main
    with RunnerClass() as runner:
  File "/workspace/OpenLRM/openlrm/runners/infer/lrm.py", line 121, in __init__
    self.model = self._build_model(self.cfg).to(self.device)
  File "/workspace/OpenLRM/openlrm/runners/infer/lrm.py", line 126, in _build_model
    model = hf_model_cls.from_pretrained(cfg.model_name)
  File "/workspace/OpenLRM/venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/workspace/OpenLRM/venv/lib/python3.10/site-packages/huggingface_hub/hub_mixin.py", line 277, in from_pretrained
    instance = cls._from_pretrained(
  File "/workspace/OpenLRM/venv/lib/python3.10/site-packages/huggingface_hub/hub_mixin.py", line 485, in _from_pretrained
    model = cls(**model_kwargs)
TypeError: wrap_model_hub.<locals>.HfModel.__init__() missing 1 required positional argument: 'config'

and the folder that is passed as model_name argument looks like this:

exps
|-- checkpoints
|   `-- lrm-objaverse
|       `-- overfitting-test
|           `-- 001000
|               |-- custom_checkpoint_0.pkl
|               |-- model.safetensors
|               |-- optimizer.bin
|               `-- random_states_0.pkl

which contains a file named model.safetensors as required by huggingface_hub when initialising from path.

From some tests, it seems that hf_model_cls.from_pretrained needs the "model" section of configs/train-sample.yaml passed as a dictionary:

model:
    camera_embed_dim: 1024
    rendering_samples_per_ray: 96
    transformer_dim: 512
    transformer_layers: 12
    transformer_heads: 8
    triplane_low_res: 32
    triplane_high_res: 64
    triplane_dim: 32
    encoder_type: dinov2
    encoder_model_name: dinov2_vits14_reg
    encoder_feat_dim: 384
    encoder_freeze: false

But even so, after passing this as a dictionary, the code breaks a bit further:

(venv) root@bc700a1d6a6c:/workspace/OpenLRM# python -m openlrm.launch infer.lrm --infer=configs/infer-b.yaml model_name=exps/checkpoints/lrm-objaverse/overfitting-test/001000 image_input=test.png export_video=true export_mesh=true
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
config.json not found in /workspace/OpenLRM/exps/checkpoints/lrm-objaverse/overfitting-test/001000
[2024-03-25 15:34:32,383] openlrm.models.modeling_lrm: [INFO] Using DINOv2 as the encoder
/workspace/OpenLRM/openlrm/models/encoders/dinov2/layers/swiglu_ffn.py:43: UserWarning: xFormers is available (SwiGLU)
  warnings.warn("xFormers is available (SwiGLU)")
/workspace/OpenLRM/openlrm/models/encoders/dinov2/layers/attention.py:27: UserWarning: xFormers is available (Attention)
  warnings.warn("xFormers is available (Attention)")
/workspace/OpenLRM/openlrm/models/encoders/dinov2/layers/block.py:39: UserWarning: xFormers is available (Block)
  warnings.warn("xFormers is available (Block)")
Loading weights from local directory
  0%|                                                                                                                                                                                                                              | 0/1 [00:00<?, ?it/s]/workspace/OpenLRM/openlrm/datasets/cam_utils.py:153: UserWarning: Using torch.cross without specifying the dim arg is deprecated.
Please either pass the dim explicitly or simply use torch.linalg.cross.
The default value of dim will change to agree with that of linalg.cross in a future release. (Triggered internally at ../aten/src/ATen/native/Cross.cpp:63.)
  x_axis = torch.cross(up_world, z_axis)
  0%|                                                                                                                                                                                                                              | 0/1 [00:14<?, ?it/s]
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/OpenLRM/openlrm/launch.py", line 36, in <module>
    main()
  File "/workspace/OpenLRM/openlrm/launch.py", line 32, in main
    runner.run()
  File "/workspace/OpenLRM/openlrm/runners/infer/base_inferrer.py", line 62, in run
    self.infer()
  File "/workspace/OpenLRM/openlrm/runners/infer/lrm.py", line 298, in infer
    self.infer_single(
  File "/workspace/OpenLRM/openlrm/runners/infer/lrm.py", line 258, in infer_single
    mesh = self.infer_mesh(planes, mesh_size=mesh_size, mesh_thres=mesh_thres, dump_mesh_path=dump_mesh_path)
  File "/workspace/OpenLRM/openlrm/runners/infer/lrm.py", line 221, in infer_mesh
    vtx_colors = self.model.synthesizer.forward_points(planes, vtx_tensor)['rgb'].squeeze(0).cpu().numpy()  # (0, 1)
  File "/workspace/OpenLRM/openlrm/models/rendering/synthesizer.py", line 206, in forward_points
    for k in outs[0].keys()
IndexError: list index out of range

Could anyone help here?
