Git Product home page Git Product logo

meshgpt-pytorch's Introduction

MeshGPT - Pytorch

Implementation of MeshGPT, SOTA Mesh generation using Attention, in Pytorch

Will also add text conditioning, for eventual text-to-3d asset

Please join Join us on Discord if you are interested in collaborating with others to replicate this work

Appreciation

  • StabilityAI, A16Z Open Source AI Grant Program, and 🤗 Huggingface for the generous sponsorships, as well as my other sponsors, for affording me the independence to open source current artificial intelligence research

  • Einops for making my life easy

  • Marcus for the initial code review (pointing out some missing derived features) as well as running the first successful end-to-end experiments

Install

$ pip install meshgpt-pytorch

Usage

import torch

from meshgpt_pytorch import (
    MeshAutoencoder,
    MeshTransformer
)

# autoencoder

autoencoder = MeshAutoencoder(
    dim = 512,
    encoder_depth = 6,
    decoder_depth = 6,
    num_discrete_coors = 128
)

# mock inputs

vertices = torch.randn((2, 121, 3))            # (batch, num vertices, coor (3))
faces = torch.randint(0, 121, (2, 64, 3))      # (batch, num faces, vertices (3))

# make sure faces are padded with `-1` for variable lengthed meshes

# forward in the faces

loss = autoencoder(
    vertices = vertices,
    faces = faces
)

loss.backward()

# after much training...
# you can pass in the raw face data above to train a transformer to model this sequence of face vertices

transformer = MeshTransformer(
    autoencoder,
    dim = 512,
    max_seq_len = 768
)

loss = transformer(
    vertices = vertices,
    faces = faces
)

loss.backward()

# after much training of transformer, you can now sample novel 3d assets

faces_coordinates, face_mask = transformer.generate()

# (batch, num faces, vertices (3), coordinates (3)), (batch, num faces)
# now post process for the generated 3d asset

For text-conditioned 3d shape synthesis, simply set condition_on_text = True on your MeshTransformer, and then pass in your list of descriptions as the texts keyword argument

ex.

transformer = MeshTransformer(
    autoencoder,
    dim = 512,
    max_seq_len = 768,
    condition_on_text = True
).cpu()


loss = transformer(
    vertices = vertices,
    faces = faces,
    texts = ['a high chair', 'a small teapot'],
)

loss.backward()

# after much training of transformer, you can now sample novel 3d assets conditioned on text

faces_coordinates, face_mask = transformer.generate(texts = ['a long table'])

If you want to tokenize meshes, for use in your multimodal transformer, simply invoke .tokenize on your autoencoder (or same method on autoencoder trainer instance for the exponentially smoothed model)

mesh_token_ids = autoencoder.tokenize(
    vertices = vertices,
    faces = faces
)

# (batch, num face vertices, residual quantized layer)

Todo

  • autoencoder

    • encoder sageconv with torch geometric
    • proper scatter mean accounting for padding for meaning the vertices and RVQ the vertices before gathering back for decoder
    • complete decoder and reconstruction loss + commitment loss
    • handle variable lengthed faces
    • add option to use residual LFQ, latest quantization development that scales code utilization
    • xcit linear attention in encoder and decoder
    • figure out how to auto-derive face_edges directly from faces and vertices
    • embed any derived values (area, angles, etc) from the vertices before sage convs
  • transformer

    • properly mask out eos logit during generation
    • make sure it trains
      • take care of sos token automatically
      • take care of eos token automatically if sequence length or mask is passed in
    • handle variable lengthed faces
      • on forwards
      • on generation, do all eos logic + substitute everything after eos with pad id
    • generation + cache kv
  • trainer wrapper with hf accelerate

    • autoencoder - take care of ema
    • transformer
  • text conditioning using own CFG library

    • complete preliminary text conditioning
    • make sure CFG library can support passing in arguments to the two separate calls when cond scaling (as well as aggregating their outputs)
  • hierarchical transformers (using the RQ transformer)

  • fix caching in simple gateloop layer in other repo

  • local attention

  • fix kv caching for two-staged hierarchical transformer - 7x faster now, and faster than original non-hierarchical transformer

  • fix caching for gateloop layers

  • allow for customization of model dimensions of fine vs coarse attention network

  • make transformer efficient

    • reversible networks
    • give mamba a test drive
  • speculative decoding option

Citations

@inproceedings{Siddiqui2023MeshGPTGT,
    title   = {MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers},
    author  = {Yawar Siddiqui and Antonio Alliegro and Alexey Artemov and Tatiana Tommasi and Daniele Sirigatti and Vladislav Rosov and Angela Dai and Matthias Nie{\ss}ner},
    year    = {2023},
    url     = {https://api.semanticscholar.org/CorpusID:265457242}
}
@inproceedings{dao2022flashattention,
    title   = {Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness},
    author  = {Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
    booktitle = {Advances in Neural Information Processing Systems},
    year    = {2022}
}
@inproceedings{Leviathan2022FastIF,
    title   = {Fast Inference from Transformers via Speculative Decoding},
    author  = {Yaniv Leviathan and Matan Kalman and Y. Matias},
    booktitle = {International Conference on Machine Learning},
    year    = {2022},
    url     = {https://api.semanticscholar.org/CorpusID:254096365}
}
@misc{yu2023language,
    title   = {Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation}, 
    author  = {Lijun Yu and José Lezama and Nitesh B. Gundavarapu and Luca Versari and Kihyuk Sohn and David Minnen and Yong Cheng and Agrim Gupta and Xiuye Gu and Alexander G. Hauptmann and Boqing Gong and Ming-Hsuan Yang and Irfan Essa and David A. Ross and Lu Jiang},
    year    = {2023},
    eprint  = {2310.05737},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}
@article{Lee2022AutoregressiveIG,
    title   = {Autoregressive Image Generation using Residual Quantization},
    author  = {Doyup Lee and Chiheon Kim and Saehoon Kim and Minsu Cho and Wook-Shin Han},
    journal = {2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year    = {2022},
    pages   = {11513-11522},
    url     = {https://api.semanticscholar.org/CorpusID:247244535}
}
@inproceedings{Katsch2023GateLoopFD,
    title   = {GateLoop: Fully Data-Controlled Linear Recurrence for Sequence Modeling},
    author  = {Tobias Katsch},
    year    = {2023},
    url     = {https://api.semanticscholar.org/CorpusID:265018962}
}

meshgpt-pytorch's People

Contributors

lucidrains avatar

Stargazers

Wang Wei avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.