Comments (12)
@Yashaswini-Srirangarajan I hit the same issue and using scipy==1.11.1 solved my problem, although I'm not sure which version is mathematically more correct. See:
scipy/scipy#19415
mseitzer/pytorch-fid#103
from motiongpt.
At least a partial fix has come through at scipy/scipy#20212. We recommend trying again once SciPy 1.13.0 is released, to see whether the problems are gone.
@lucascolley, this fix works for me now :) thanks!!
fantastic - 1.13.0 should be out within the next few weeks
It was just released.
Hi @Yashaswini-Srirangarajan,
A lot of people have encountered this issue, including myself. The only fix was to change the 'test' split to 'val' in the config files. Check this for more details: #22 (comment)
However, this seems to be a strange error: even after manually checking the data for non-finite values, and even with a different dataset, it keeps resurfacing.
Asking @billl-jiang for any support with this issue and debugging.
Cheers.
UPDATE:
- Fixed this problem by checking all the .npy files for NaN values and other anomalies against their corresponding names in the .txt files (train, val and test).
- Once the faulty files are found, remove them from texts, new_joints and new_joint_vecs, and also from the .txt files.
- In the end, all your files and the name lists should point to the same number of samples.
- Finally, and most importantly, delete the 'tmp' folder created during the training runs every time you alter the data.
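The NaN check in the first step can be sketched as follows; this is a minimal example, assuming the standard HumanML3D layout (split .txt files alongside new_joints/ and new_joint_vecs/ folders of .npy files), so adjust the paths to your setup:

```python
import os
import numpy as np

def find_faulty_files(split_file, data_dirs=("new_joints", "new_joint_vecs")):
    """Return names listed in a split .txt whose .npy data are missing
    or contain non-finite values (NaN or inf)."""
    with open(split_file) as f:
        names = [line.strip() for line in f if line.strip()]
    faulty = []
    for name in names:
        for d in data_dirs:
            path = os.path.join(d, name + ".npy")
            if not os.path.isfile(path):
                faulty.append(name)  # a missing file counts as faulty too
                break
            data = np.load(path)
            if not np.isfinite(data).all():
                faulty.append(name)
                break
    return faulty

if __name__ == "__main__":
    for split in ("train.txt", "val.txt", "test.txt"):
        if os.path.isfile(split):
            print(split, find_faulty_files(split))
```

Any name this prints should be removed from texts, new_joints, new_joint_vecs and the split .txt files.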
@zybermonk Thanks for the input! How did you debug for NaNs? It looks like none of my files in new_joint_vecs and new_joints have NaNs. Am I missing a step when generating the HumanML3D dataset? Thanks a lot!
Tried this approach as well, but I seem to be getting another error, shown below. Have you faced this before? Thanks!
Trainable params: 267 M
Non-trainable params: 65.1 M
Total params: 332 M
Total estimated model params size (MB): 1.3 K
Sanity Checking ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2/2 0:00:02 • 0:00:00 1.64it/s 2024-01-30 16:40:28,994 Sanity checking ok.
/home/yasha/miniconda3/envs/motiongpt_env/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py:293: The number of training batches (1) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
2024-01-30 16:40:29,481 Training started
Epoch 0/999998 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1 0:00:00 • 0:00:00 0.00it/s
Traceback (most recent call last):
File "/home/yasha/miniconda3/envs/motiongpt_env/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/yasha/miniconda3/envs/motiongpt_env/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/yasha/workspace/mocap/MotionGPT/train.py", line 94, in <module>
main()
File "/home/yasha/workspace/mocap/MotionGPT/train.py", line 85, in main
trainer.fit(model, datamodule=datamodule)
File "/home/yasha/miniconda3/envs/motiongpt_env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
call._call_and_handle_interrupt(
File "/home/yasha/miniconda3/envs/motiongpt_env/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 44, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/yasha/miniconda3/envs/motiongpt_env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/home/yasha/miniconda3/envs/motiongpt_env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 989, in _run
results = self._run_stage()
File "/home/yasha/miniconda3/envs/motiongpt_env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1035, in _run_stage
self.fit_loop.run()
File "/home/yasha/miniconda3/envs/motiongpt_env/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 202, in run
self.advance()
File "/home/yasha/miniconda3/envs/motiongpt_env/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 359, in advance
self.epoch_loop.run(self._data_fetcher)
File "/home/yasha/miniconda3/envs/motiongpt_env/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 137, in run
self.on_advance_end(data_fetcher)
File "/home/yasha/miniconda3/envs/motiongpt_env/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 285, in on_advance_end
self.val_loop.run()
File "/home/yasha/miniconda3/envs/motiongpt_env/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 182, in _decorator
return loop_run(self, *args, **kwargs)
File "/home/yasha/miniconda3/envs/motiongpt_env/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 141, in run
return self.on_run_end()
File "/home/yasha/miniconda3/envs/motiongpt_env/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 253, in on_run_end
self._on_evaluation_epoch_end()
File "/home/yasha/miniconda3/envs/motiongpt_env/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 329, in _on_evaluation_epoch_end
call._call_lightning_module_hook(trainer, hook_name)
File "/home/yasha/miniconda3/envs/motiongpt_env/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 157, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "/home/yasha/workspace/mocap/MotionGPT/mGPT/models/base.py", line 54, in on_validation_epoch_end
dico.update(self.metrics_log_dict())
File "/home/yasha/workspace/mocap/MotionGPT/mGPT/models/base.py", line 114, in metrics_log_dict
metrics_dict = getattr(
File "/home/yasha/miniconda3/envs/motiongpt_env/lib/python3.10/site-packages/torchmetrics/metric.py", line 610, in wrapped_func
value = _squeeze_if_scalar(compute(*args, **kwargs))
File "/home/yasha/miniconda3/envs/motiongpt_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/yasha/workspace/mocap/MotionGPT/mGPT/metrics/t2m.py", line 195, in compute
metrics["FID"] = calculate_frechet_distance_np(gt_mu, gt_cov, mu, cov)
File "/home/yasha/workspace/mocap/MotionGPT/mGPT/metrics/utils.py", line 205, in calculate_frechet_distance_np
raise ValueError("Imaginary component {}".format(m))
ValueError: Imaginary component 1.836488313288817e+26
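For context, this error comes from a tolerance check after scipy.linalg.sqrtm inside the FID computation. Below is a sketch of the usual calculation, modeled on pytorch-fid's calculate_frechet_distance; MotionGPT's calculate_frechet_distance_np may differ in details. A near-singular covariance product makes sqrtm numerically unstable, which is where large imaginary components come from:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2, eps=1e-6):
    """Frechet distance between two Gaussians:
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2*sqrt(sigma1 @ sigma2))."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1.dot(sigma2), disp=False)
    if not np.isfinite(covmean).all():
        # A singular product makes sqrtm blow up; retry with a jittered diagonal.
        offset = np.eye(sigma1.shape[0]) * eps
        covmean = linalg.sqrtm((sigma1 + offset).dot(sigma2 + offset))
    if np.iscomplexobj(covmean):
        # Small imaginary parts are numerical noise; large ones raise the error
        # seen in the traceback above.
        if not np.allclose(np.diagonal(covmean).imag, 0, atol=1e-3):
            m = np.max(np.abs(covmean.imag))
            raise ValueError("Imaginary component {}".format(m))
        covmean = covmean.real
    return diff.dot(diff) + np.trace(sigma1) + np.trace(sigma2) - 2 * np.trace(covmean)
```

The linked scipy issues (scipy/scipy#19415, scipy/scipy#20212) concern a change in sqrtm's behavior after 1.11.1, which is why pinning the scipy version affects whether this check trips.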
Hi @Yashaswini-Srirangarajan, sorry for the late response.
When you build HumanML3D, by default there will be a few files that contain faulty data. You can first notice this during the data building process itself; for example, while using the 3rd notebook of HumanML3D you can see the following output -
Evidently, the .npy files with suffix 7975 contained NaN data when verified using np.isfinite() or similar.
Following this method, you need to verify all your .npy files in new_joints and new_joint_vecs against the file names in the train, test and val .txt files.
You will find that the following files also have faulty data, as encountered previously after using the 2nd notebook from HumanML3D.
The next step is to delete these files from the .npy folders, and to remove their names from the .txt files.
- Most importantly, as I previously mentioned, make sure you delete the tmp folder before running your code with the newly edited dataset.
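The deletion steps above could be scripted roughly like this; it is only a sketch, and purge_faulty and its default paths are hypothetical names for illustration, not part of the MotionGPT codebase:

```python
import os
import shutil

def purge_faulty(names, split_files=("train.txt", "val.txt", "test.txt"),
                 data_dirs=("new_joints", "new_joint_vecs"),
                 text_dir="texts", tmp_dir="tmp"):
    """Drop faulty sample names from every split list, delete their data
    files, and clear the cached tmp folder so stale samples cannot return."""
    names = set(names)
    for split in split_files:
        with open(split) as f:
            kept = [n for n in (line.strip() for line in f) if n and n not in names]
        with open(split, "w") as f:
            f.write("\n".join(kept) + "\n")
    for name in names:
        for d in data_dirs:
            path = os.path.join(d, name + ".npy")
            if os.path.isfile(path):
                os.remove(path)
        txt = os.path.join(text_dir, name + ".txt")
        if os.path.isfile(txt):
            os.remove(txt)
    # The caches in tmp/ were built from the old lists; delete them so they
    # are rebuilt on the next run.
    shutil.rmtree(tmp_dir, ignore_errors=True)
```

After running it, every split list and data folder should point to the same set of samples.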
I hit the same issue and using scipy==1.11.1 solved my problem, although I'm not sure which version is mathematically more correct
If anyone has any input on which version is more mathematically correct, that would be great.
Just adding to this question: changing these libraries also means finding a compatible numpy version.