openmotionlab / motiongpt

[NeurIPS 2023] MotionGPT: Human Motion as a Foreign Language, a unified motion-language generation model using LLMs

Home Page: https://motion-gpt.github.io

License: MIT License

Languages: Python 98.52%, CSS 0.97%, Shell 0.51%
Topics: 3d-generation, chatgpt, gpt, language-model, motion, motion-generation, motiongpt, multi-modal, text-driven, text-to-motion

motiongpt's People

Contributors

52penguin, baitian752, billl-jiang, chaiyuntian, chenfengye, eltociear, ntamotsu


motiongpt's Issues

Questions about testing results

Thank you for your great work! I have tried to reproduce the results but encountered some issues.

Following the instructions, I evaluated the provided checkpoint downloaded from Hugging Face.

I run the following commands:

python -m test --cfg configs/config_h3d_stage3.yaml --task t2m
python -m test --cfg configs/config_h3d_stage3.yaml --task m2t

The evaluation results are not consistent with the results reported in the paper. The attachments are the log and metrics.

t2m results:
[metrics screenshots attached]
log_2023-10-04-19-56-23_test.log

Would you happen to have any idea about what's wrong with the configuration?

Stage1 training crashes on eval - continued

Hi,
thanks for a very interesting paper and the supporting code.
I'm trying to run training, but it fails.
[error screenshot attached]

I tried to recreate the dataset a few times, but it didn't help.
The dataset preparation looks OK; here are the results:
[screenshot attached]

Questions on fps

Hi. Thank you for the great work. I am still confused about how fps affects model performance. I see that the motion dataset used for training is at 20 fps. Does this work well with lower-fps motion (say 15 fps) or higher-fps motion (say 60 fps)?
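
One practical workaround (not from the authors) is to resample the motion to the 20 fps the tokenizer was trained on before feeding it to the model. A minimal sketch with linear interpolation, assuming the motion is a (num_frames, num_features) array; the function name and the 263-dim feature layout are only illustrative:

    import numpy as np

    def resample_motion(motion: np.ndarray, src_fps: float, dst_fps: float = 20.0) -> np.ndarray:
        """Linearly resample a (num_frames, num_features) motion array to dst_fps."""
        num_src = motion.shape[0]
        duration = (num_src - 1) / src_fps            # clip length in seconds
        num_dst = int(round(duration * dst_fps)) + 1  # frame count at the target rate
        src_t = np.arange(num_src) / src_fps          # original timestamps
        dst_t = np.linspace(0.0, duration, num_dst)   # target timestamps
        # Interpolate every feature channel independently.
        return np.stack(
            [np.interp(dst_t, src_t, motion[:, c]) for c in range(motion.shape[1])],
            axis=1)

    # e.g. a 60 fps clip is reduced to roughly a third of its frames at 20 fps
    motion_60fps = np.random.randn(300, 263)
    motion_20fps = resample_motion(motion_60fps, src_fps=60.0)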

question on mT5 pretrain

def create_sentinel_ids(self, mask_indices):
    # From https://github.com/huggingface/transformers/blob/main/examples/flax/language-modeling/run_t5_mlm_flax.py
    # mask_indices: (batch, seq_len) 0/1 array marking the positions to be masked
    start_indices = mask_indices - np.roll(mask_indices, 1, axis=-1) * mask_indices
    start_indices[:, 0] = mask_indices[:, 0]
    # Number the masked spans 1, 2, ... at their start positions
    sentinel_ids = np.where(start_indices != 0,
                            np.cumsum(start_indices, axis=-1),
                            start_indices)
    # Span k is mapped to vocabulary id len(tokenizer) - k, i.e. tokens at the end of the tokenizer
    sentinel_ids = np.where(sentinel_ids != 0,
                            (len(self.tokenizer) - sentinel_ids), 0)
    # Non-start positions inside a masked span become -1
    sentinel_ids -= mask_indices - start_indices
    return sentinel_ids

In the code, you replace masked positions with sentinel_ids, which point to the last positions of the tokenizer. But before doing this, you had already added the motion tokens to the end of the tokenizer. Was this done on purpose?
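
To make the question concrete, here is a standalone toy run of that function (my own illustration, not the repo's training code), with len(self.tokenizer) replaced by a plain T5 tokenizer length of 32100:

    import numpy as np

    def sentinel_ids_for(mask_indices: np.ndarray, vocab_size: int) -> np.ndarray:
        """Standalone copy of create_sentinel_ids with len(tokenizer) replaced by vocab_size."""
        start_indices = mask_indices - np.roll(mask_indices, 1, axis=-1) * mask_indices
        start_indices[:, 0] = mask_indices[:, 0]
        sentinel_ids = np.where(start_indices != 0, np.cumsum(start_indices, axis=-1), start_indices)
        sentinel_ids = np.where(sentinel_ids != 0, vocab_size - sentinel_ids, 0)
        sentinel_ids -= mask_indices - start_indices
        return sentinel_ids

    mask = np.array([[0, 1, 1, 0, 1, 0]])           # two masked spans
    out = sentinel_ids_for(mask, vocab_size=32100)
    # out == [[0, 32099, -1, 0, 32098, 0]]
    # 32099 and 32098 are the last ids of the tokenizer. If 512 motion tokens are appended
    # after that, len(tokenizer) becomes 32612 and the sentinels would point at motion token
    # ids rather than T5's <extra_id_*> tokens, which is exactly the concern raised above.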

Help me run python app.py

Hi, I am running the Gradio demo using python app.py and am encountering this error.
Please help me figure out how to fix it.

Global seed set to 1234
Traceback (most recent call last):
  File "/Users/namhuiju/opt/anaconda3/envs/mgpt/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 261, in hf_raise_for_status
    response.raise_for_status()
  File "/Users/namhuiju/opt/anaconda3/envs/mgpt/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/deps/whisper-large-v2/resolve/main/preprocessor_config.json

AttributeError: 'Joints' object has no attribute 'joinst'

Hi Author,
I am new to human motion.
Hopefully you can give me some suggestions.
While rendering with custom prompts, I got this issue:

AttributeError: 'Joints' object has no attribute 'joinst'

    if jointstype == "mmm":
        self.kinematic_tree = mmm_kinematic_tree
        self.joints = mmm_joints
        self.joinst.append("")  # the error happens here
    elif jointstype == "humanml3d":
        self.kinematic_tree = humanml3d_kinematic_tree
        self.joints = humanml3d_joints

How should I solve this issue?
Thanks
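
Judging only from the snippet, 'joinst' looks like a typo for 'joints'; a guess at the intended line, not verified against the repo's actual code:

    if jointstype == "mmm":
        self.kinematic_tree = mmm_kinematic_tree
        self.joints = mmm_joints
        self.joints.append("")  # assuming 'joinst' was meant to be 'joints'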

ValueError when running demo.py and app.py

Hi there, I'm trying to get the demo up and running, but encounter the following error after following the provided instructions and adding any missing files.

Global seed set to 1234
Traceback (most recent call last):
  File "/home/msegado/MotionGPT/app.py", line 31, in
    datamodule = build_data(cfg, phase="test")
  File "/home/msegado/MotionGPT/mGPT/data/build_data.py", line 10, in build_data
    return instantiate_from_config(data_config)
  File "/home/msegado/MotionGPT/mGPT/config.py", line 42, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "/home/msegado/MotionGPT/mGPT/data/HumanML3D.py", line 76, in __init__
    self._sample_set = self.get_sample_set(overrides={"split": "test", "tiny": True})
  File "/home/msegado/MotionGPT/mGPT/data/__init__.py", line 20, in get_sample_set
    return self.DatasetEval(**sample_params)
  File "/home/msegado/MotionGPT/mGPT/data/humanml/dataset_t2m_eval.py", line 24, in __init__
    super().__init__(data_root, split, mean, std, max_motion_length,
  File "/home/msegado/MotionGPT/mGPT/data/humanml/dataset_t2m.py", line 152, in __init__
    name_list, length_list = zip(
ValueError: not enough values to unpack (expected 2, got 0)

I get the same error when running both "python demo.py --cfg ./configs/config_h3d_stage3.yaml --example ./demos/t2m.txt" and "python app.py"

Any suggestions? Thanks!

ValueError: not enough values to unpack (expected 2, got 0)

Hi, I get this issue even after unzipping the texts.zip file.
Traceback (most recent call last):
  File "/home/zzmarybloody/MotionGPT/demo.py", line 237, in
    main()
  File "/home/zzmarybloody/MotionGPT/demo.py", line 147, in main
    datamodule = build_data(cfg)
  File "/home/zzmarybloody/MotionGPT/mGPT/data/build_data.py", line 10, in build_data
    return instantiate_from_config(data_config)
  File "/home/zzmarybloody/MotionGPT/mGPT/config.py", line 42, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "/home/zzmarybloody/MotionGPT/mGPT/data/HumanML3D.py", line 76, in __init__
    self._sample_set = self.get_sample_set(overrides={"split": "test", "tiny": True})
  File "/home/zzmarybloody/MotionGPT/mGPT/data/__init__.py", line 20, in get_sample_set
    return self.DatasetEval(**sample_params)
  File "/home/zzmarybloody/MotionGPT/mGPT/data/humanml/dataset_t2m_eval.py", line 24, in __init__
    super().__init__(data_root, split, mean, std, max_motion_length,
  File "/home/zzmarybloody/MotionGPT/mGPT/data/humanml/dataset_t2m.py", line 165, in __init__
    name_list, length_list = zip(
ValueError: not enough values to unpack (expected 2, got 0)

The number of GPUs

Hi, I see that you use 8 GPUs in the main paper, but the appendix states 64 GPUs. So how many GPUs were used during training?

Help with motion tokens and motion files

Hi. I have a couple of questions regarding how motion tokens are fed in during inference and training. I have an array of SMPL parameters (pose, beta, etc.).

  • Do I have to convert it into a .ply file or a video like in the demo, and does it accept only that format? Can it take raw arrays or files in other formats? I don't have access to Blender, so I can't use it to generate these files.

  • Do the motion tokens have to live in a shared environment space? Meaning, if I have two different motion files for "a person running", do they have to map to the exact same motion tokens, or can they be translated a bit (i.e. different x/y coordinates)? (See the sketch below.)
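
On the second point, HumanML3D-style features are (as far as I can tell) already expressed relative to the root, so a global x/z offset should not change the tokens much. A rough sketch of that kind of normalization, assuming a (frames, 22, 3) joint array with the pelvis at index 0; names and shapes here are my own illustration:

    import numpy as np

    def root_relative(joints: np.ndarray) -> np.ndarray:
        """Remove global translation by subtracting the root's x/z trajectory."""
        out = joints.copy()
        root_xz = joints[:, 0:1, [0, 2]]   # per-frame root position on the ground plane
        out[..., [0, 2]] -= root_xz        # every frame now starts from the origin
        return out

    # Two copies of "a person running" that differ only by a world-space offset
    run_a = np.random.randn(200, 22, 3)
    run_b = run_a + np.array([3.0, 0.0, -1.5])   # shifted in x and z
    assert np.allclose(root_relative(run_a), root_relative(run_b))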

Can't find the paper

The arxiv link points to "Executing your Commands via Motion Diffusion in Latent Space".

Where the token to enter T5 comes from?

Thank you for this interesting work. While reading the paper, I got confused. In Figure 2, the tokens come from the VQ-VAE's codebook (the yellow tokens input to T5). But in Sec. 3.2, you say "we combine the original text vocabulary V_{t} with motion vocabulary V_{m}, which is order-preserving to our motion codebook Z." Does this mean that the tokens entering T5 do not come from the VQ-VAE codebook (i.e., V_{m} is different from Z)?
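
For what it's worth, "combining V_t with V_m" is usually implemented by appending one new text token per codebook entry, in codebook order, so V_m mirrors Z one-to-one while the embeddings for those tokens are learned inside the language model. A sketch with Hugging Face Transformers; the token names and the codebook size of 512 are assumptions, not the repo's exact code:

    from transformers import T5Tokenizer, T5ForConditionalGeneration

    tokenizer = T5Tokenizer.from_pretrained("t5-base")
    model = T5ForConditionalGeneration.from_pretrained("t5-base")

    codebook_size = 512                            # assumed size of the VQ-VAE codebook Z
    motion_tokens = [f"<motion_id_{k}>" for k in range(codebook_size)]
    tokenizer.add_tokens(motion_tokens)            # order-preserving: token k <-> codebook entry k
    model.resize_token_embeddings(len(tokenizer))  # new learnable embeddings for V_m

    # A tokenized motion is then just a string of these tokens, e.g. codebook indices [5, 17, 42] ->
    motion_string = "".join(f"<motion_id_{k}>" for k in [5, 17, 42])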

TypeError: Audio.__init__() got an unexpected keyword argument 'source'

Traceback (most recent call last):
  File "/root/autodl-tmp/MotionGPT-main/app.py", line 512, in
    aud = gr.Audio(source="microphone",
  File "/root/miniconda3/envs/mgpt/lib/python3.10/site-packages/gradio/component_meta.py", line 146, in wrapper
    return fn(self, **kwargs)
TypeError: Audio.__init__() got an unexpected keyword argument 'source'
Hello, I want to know how to solve this problem. Thanks!
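
This looks like a Gradio version mismatch rather than a MotionGPT bug: Gradio 4.x removed the source= keyword in favour of sources=. Two possible workarounds, not verified against the repo's pinned requirements:

    import gradio as gr

    # Option 1: adapt the call in app.py to the Gradio 4.x API
    aud = gr.Audio(sources=["microphone"])

    # Option 2: install a 3.x release where source="microphone" still exists,
    # e.g.  pip install "gradio<4"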

gradio app has error: "ValueError: Need to enable queue to use generator."

Hi Author,
While running app.py, I entered a prompt and got the following error.
Any suggestion is appreciated. Thanks!

Traceback (most recent call last):
  File "/opt/conda/envs/mgpt/lib/python3.10/site-packages/gradio/routes.py", line 508, in predict
    output = await route_utils.call_process_api(
  File "/opt/conda/envs/mgpt/lib/python3.10/site-packages/gradio/route_utils.py", line 219, in call_process_api
    output = await app.get_blocks().process_api(
  File "/opt/conda/envs/mgpt/lib/python3.10/site-packages/gradio/blocks.py", line 1437, in process_api
    result = await self.call_function(
  File "/opt/conda/envs/mgpt/lib/python3.10/site-packages/gradio/blocks.py", line 1117, in call_function
    raise ValueError("Need to enable queue to use generators.")
ValueError: Need to enable queue to use generators.
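
This error usually means the app streams outputs from a generator but the queue was never enabled. A hedged guess at the fix, assuming the Blocks object in app.py is called demo:

    # wherever app.py launches the interface:
    demo.queue().launch()   # enable the queue so generator outputs can stream
    # instead of plain demo.launch()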

smpl to mixamo rig

Hi,
can you explain how to retarget the SMPL output to a Mixamo rig?
I tried, but got strange animations.

demo is not ready

Hi Author,
Thanks for your excellent work.
It seems the demo is not ready; I ran into some issues, such as a missing parameter.
Hopefully it will be resolved soon. Thanks.

Questions about results reproduction

Hello, thank you for releasing this amazing work.
I would like to reproduce the results on the HumanML3D dataset and have some questions about it:

  • About the hyperparameters, in the paper you use the number of iterations, while the configs in the code use END_EPOCH. Do they mean the same thing for you, or is it that nb_iterations = END_EPOCH * nb_batches ?
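
A quick back-of-the-envelope of that relationship, with purely illustrative numbers (substitute your own dataset size and config values):

    import math

    num_train_samples = 23000   # size of the training split (illustrative)
    batch_size = 256            # TRAIN.BATCH_SIZE in the config (illustrative)
    end_epoch = 100             # END_EPOCH in the config (illustrative)

    nb_batches = math.ceil(num_train_samples / batch_size)   # iterations per epoch -> 90
    nb_iterations = end_epoch * nb_batches                    # total optimizer steps -> 9000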

I also have a question about real-time application of the method: is it possible to use it for motion captioning in real time? For example, while a motion is being played, the generated description stays roughly in sync with it, with no big lag between the two.

Why is T5 used instead of GPT?

It seems GPT-style models like LLaMA 2 are more popular,
but the paper still uses T5.
Compared to GPT, does using T5 have any special advantages?

Motion tokens

Hello. I read through the GitHub website and had a couple of questions:

  • How are you getting the motion tokens in the first place? What 3D model is being used, and does it know which joints are which?
  • How do you feed the tokens into GPT? I assume there are a lot of motion tokens, so how does this work given the limited context length? (See the rough numbers below.)
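
On the second question, a rough back-of-the-envelope with my own assumed numbers (the temporal downsampling factor of 4 is an assumption, not a quote from the paper):

    fps = 20                 # frame rate of the HumanML3D motions
    clip_seconds = 10        # a fairly long clip
    downsample = 4           # assumed temporal downsampling of the motion tokenizer

    num_frames = fps * clip_seconds               # 200 frames
    num_motion_tokens = num_frames // downsample  # ~50 motion tokens
    # ~50 motion tokens plus the text prompt fits comfortably within T5's usual 512-token limit.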

Stage1 training crashes on eval

Command line used:
python -m train --cfg configs/config_h3d_stage1.yaml --nodebug

Error:
torch._C._LinAlgError: linalg.svd: (Batch element 0): The algorithm failed to converge because the input matrix contained non-finite values.

[screenshot attached]

Running demo.py hits missing parameters (additional details for issue #17)

Thanks for the excellent work!
I have reviewed issue #17 and noticed that others have faced similar problems.
Issue description:
As mentioned there, running demo.py requires certain parameters like 'render' and 'frame_rate', but I couldn't locate them in the parameter table. I'm unsure how to resolve this and would appreciate your assistance. Thanks.
[screenshot attached]

FileNotFoundError: [Errno 2] No such file or directory:

python demo.py --cfg ./configs/config_h3d_stage3.yaml --example ./demos/t2m.txt

FileNotFoundError: [Errno 2] No such file or directory: 'deps/t2m/t2m\VQVAEV3_CB1024_CMT_H1024_NRES3\meta\mean.npy'

But I do have this file.
[screenshot attached]

So in my file path, the t2m folder is repeated three times.

I followed the Quick Start guide exactly; is it just me?

Train crashes when debugging in PyCharm

I'm trying to run train with the params "--cfg configs/config_h3d_stage1.yaml --nodebug" in PyCharm in order to debug why it's not working, but I'm getting "Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)".

What exactly is raw motion data?

I am reading through the paper, but I am confused about what you mean by raw motion data. It does not seem to be clarified anywhere. Is this full 3D meshes, joint keypoints, or something else?

Error when rendering as a 3D human ("slow" visualisation) in Gradio demo

Hi, I am running the Gradio demo using python app.py. It works fine with the "fast" visualisation mode (i.e. skeletal keypoints), but whenever I change to the "slow" visualisation mode to see the full human rendering, I encounter this peculiar error (it's a long error message, but this is the last line; I can provide the full message if needed):

raise NoSuchDisplayException(f'Cannot connect to "{name}"')
pyglet.canvas.xlib.NoSuchDisplayException: Cannot connect to "None"

I don't understand what is happening. I searched Stack Overflow; some posts said it was a VS Code issue when opening a new window, but here all rendering happens on the localhost server, so what's the problem?

If anyone has faced this issue and knows a workaround, please let me know. Thanks in advance!
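
A common workaround for this class of error on a headless server (assuming the "slow" mode renders through pyrender/pyglet, which the traceback suggests) is to force an offscreen OpenGL backend before any rendering module is imported, or to run the app under a virtual display:

    import os

    # Must be set before pyrender / OpenGL are imported anywhere in the process
    os.environ["PYOPENGL_PLATFORM"] = "egl"   # or "osmesa" if EGL is unavailable

    # Alternatively, launch under a virtual X display:
    #   xvfb-run -a python app.py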

VQ-VAE or VQ-VAE-2?

Great work! I would like to ask whether the technology used in the paper's motion tokenizer is VQ-VAE or VQ-VAE-2. It looks like VQ-VAE; why not VQ-VAE-2? In addition, I do not understand the conversion process from the codebook to motion tokens; can you explain it? Looking forward to your reply.
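
On the codebook-to-motion-token question, the usual VQ-VAE recipe (a generic sketch, not this repo's exact code) is to snap each encoder output to its nearest codebook entry and use that entry's index as the discrete motion token:

    import torch

    def quantize(z_e: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
        """z_e: (T, d) encoder features; codebook: (K, d). Returns (T,) codebook indices."""
        dists = torch.cdist(z_e, codebook)   # distance from every timestep to every entry
        return dists.argmin(dim=-1)          # nearest entry per timestep

    codebook = torch.randn(512, 256)         # K=512 entries of dimension 256 (illustrative)
    z_e = torch.randn(50, 256)               # 50 downsampled timesteps from the encoder
    motion_tokens = quantize(z_e, codebook)  # e.g. tensor([311, 42, ...]) -> "<motion_id_311><motion_id_42>..."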

How to finetune?

Do you have to re-train the whole model with extra data? Or is there an easy way to fine-tune?

Eagerly awaiting the release

Looking forward to your release! Can't wait to see your amazing work.

Missing t2m/VQVAEV3_CB1024_CMT_H1024_NRES3

Hi, thanks for releasing the easily understandable code 😸
While I can run the web UI app successfully, there are some artifacts, probably due to the missing files that I replaced.

--
Update:
Changing VQVAEV3_CB1024_CMT_H1024_NRES3 to Decomp_SP001_SM001_H512 works well for generating motion!
[result screenshot attached]

--
Example (using mean/std from the HumanML3D repo), prompt:
"Can you show me that a person does three straight jumping jacks?"
[result screenshot attached]

  • dis_data_root = pjoin(cfg.DATASET.HUMANML3D.MEAN_STD_PATH, 't2m', "VQVAEV3_CB1024_CMT_H1024_NRES3", "meta")
    • The provided data didn't include VQVAEV3_CB1024_CMT_H1024_NRES3 in deps/t2m/t2m
    • Replaced by Mean.npy and Std.npy in HumanML3D repo
  • configs/webui.yaml
    • Test.CHECKPOINTS: ...ckpt to ...tar
  • configs/lm/default.yaml
    • params.model_path: ../memData/deps/flan-t5-base to google/flan-t5-base
  • configs/assets.yaml
    • model.whisper_path: deps/whisper-large-v2 to openai/whisper-large-v2
  • HumanML dataset in datasets/humanml3d
    • Use a single 012314.npy from HumanML3D repo as dataset

Thanks again 😄

Motion detection to text

Wonderful work! I am wondering if the model can be used on real-life videos of human motions and actions to caption them into text.

[Query] Rendering locally in Blender

Hi, I usually run code on a remote server via SSH and was wondering if there is a way to render the outputs in my locally installed Blender. I was checking the rendering scripts in this repo and found that a path to Blender is required, but how can I point the remote server to my local machine? Can anyone please help me with this?

Thanks in advance!
