
ham2pose's People

Contributors

rotem-shalev


ham2pose's Issues

Ham2Pose training OOM error (after trying five different ways to resolve it)

Hello, and thank you for sharing the code!

I am trying to reproduce the results by running train.py.

I have two RTX 4090 Ti GPUs, but training keeps failing with an out-of-memory error.

Here is what I have tried so far to resolve it:

  1. Using four A6000 GPUs to run the code (still gave an OOM error)
  2. Reducing the dataset size (100 training samples, 50 test samples)
  3. Reducing the number of trainable parameters (down to 51.9 K in total, by reducing the number of layers, etc.)
  4. Changing the PyTorch Lightning strategy from 'ddp' to 'fsdp'
  5. Reducing the batch size

Would it be possible to know how to resolve this error?
Below is the error message I receive:

Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2

distributed_backend=nccl
All distributed processes registered. Starting with 2 processes

/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:653: Checkpoint directory /home/XX/Desktop/project/asl/Ham2Pose/models/ham2pose exists and is not empty.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1]

# | Name                       | Type               | Params
0 | embedding                  | Embedding          | 6.8 K
1 | step_embedding             | Embedding          | 160
2 | pos_positional_embeddings  | Embedding          | 6.4 K
3 | text_positional_embeddings | Embedding          | 6.4 K
4 | pose_projection            | Linear             | 8.8 K
5 | text_encoder               | TransformerEncoder | 21.0 K
6 | pose_encoder               | TransformerEncoder | 2.4 K
7 | step_encoder               | Sequential         | 0
8 | seq_length                 | Linear             | 0
9 | pose_diff_projection       | Sequential         | 0

51.9 K Trainable params
0 Non-trainable params
51.9 K Total params
0.208 Total estimated model params size (MB)

/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py:298: The number of training batches (2) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
Epoch 0: 0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/XX/Desktop/project/asl/Ham2Pose/train.py", line 140, in
trainer.fit(model, train_dataloaders=train_loader)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
call._call_and_handle_interrupt(
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch
return function(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 987, in _run
results = self._run_stage()
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1033, in _run_stage
self.fit_loop.run()
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 205, in run
self.advance()
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 363, in advance
self.epoch_loop.run(self._data_fetcher)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 140, in run
self.advance(data_fetcher)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 250, in advance
batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 183, in run
closure()
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 144, in call
self._result = self.closure(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 129, in closure
step_output = self._step_fn()
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 318, in _training_step
training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 309, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 390, in training_step
return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 642, in call
wrapper_output = wrapper_module(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 839, in forward
output = self._fsdp_wrapped_module(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 635, in wrapped_forward
out = method(*_args, **_kwargs)
File "/home/XX/Desktop/project/asl/Ham2Pose/model.py", line 232, in training_step
return self.step(batch, *unused_args, phase="train")
File "/home/XX/Desktop/project/asl/Ham2Pose/model.py", line 260, in step
pred, step_size = self.refinement_step(i, pose_sequence, text_encoding)
File "/home/XX/Desktop/project/asl/Ham2Pose/model.py", line 167, in refinement_step
change_pred = self.refine_pose_sequence(pose_sequence, text_encoding, step_encoding)
File "/home/XX/Desktop/project/asl/Ham2Pose/model.py", line 219, in refine_pose_sequence
pose_encoding = self.encode_pose(pose_sequence, text_encoding, step_encoding)
File "/home/XX/Desktop/project/asl/Ham2Pose/model.py", line 205, in encode_pose
pose_encoding = self.__get_text_pose_encoder()(
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/transformer.py", line 387, in forward
output = mod(output, src_mask=mask, is_causal=is_causal, src_key_padding_mask=src_key_padding_mask_for_layers)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/transformer.py", line 707, in forward
x = self.norm1(x + self._sa_block(x, src_mask, src_key_padding_mask, is_causal=is_causal))
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/transformer.py", line 715, in _sa_block
x = self.self_attn(x, x, x,
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/activation.py", line 1241, in forward
attn_output, attn_output_weights = F.multi_head_attention_forward(
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/functional.py", line 5300, in multi_head_attention_forward
q, k, v = _in_projection_packed(query, key, value, in_proj_weight, in_proj_bias)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/functional.py", line 4826, in _in_projection_packed
proj = proj.unflatten(-1, (3, E)).unsqueeze(0).transpose(0, -2).squeeze(-2).contiguous()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 23.64 GiB of which 3.50 MiB is free. Process 73846 has 1.22 GiB memory in use. Including non-PyTorch memory, this process has 21.12 GiB memory in use. Process 160281 has 1.28 GiB memory in use. Of the allocated memory 47.37 MiB is allocated by PyTorch, and 4.63 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
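
For anyone hitting the same wall: the allocator hint at the end of the message can be tried directly, together with a smaller per-step batch plus gradient accumulation. A minimal sketch, assuming a standard Lightning setup (the environment variable must be set before CUDA is initialized, and the Trainer arguments here are illustrative, not the repo's actual configuration):

    import os
    # Reduce allocator fragmentation, as suggested by the error message.
    # Must be set before the first CUDA allocation.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

    import pytorch_lightning as pl

    trainer = pl.Trainer(
        accelerator="gpu",
        devices=2,
        strategy="ddp",
        # Pair with a 4x smaller DataLoader batch size: per-step memory
        # drops while the effective batch size stays the same.
        accumulate_grad_batches=4,
    )

It is also worth checking with nvidia-smi whether the other processes listed in the message (73846 and 160281) are stale runs still holding GPU memory.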

Dataset arrangements

Hi,

Thank you for your amazing work!

I would like to know more about the dataset layout you used in the paper. You mention four datasets in the paper, but I only found two in the code.

Also, will there be a guideline on how to organize the datasets for training your method (e.g., how to download them, the data format, and the expected paths)?

Many thanks,

HamNoSys text to model input

Hi authors,
thank you for sharing the code base.

How can I convert HamNoSys text, say (hamextfingerl,hampalml,hamchest,hammoveur,hamplus,hamsymmlr), to the model input, which is a UTF-8 encoding like
(\xee\x83\xa9\xee\x80\x80\xee\x80\x8d\xee\x80\xa6\xee\x80\xba\xee\x81\xaa\xee\x83\xa0\xee\x83\x90\xee\x81\xa9\xee\x83\xa1\xee\x82\x89\xee\x83\x86)?
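
For what it's worth, the target bytes are just the UTF-8 encoding of HamNoSys glyphs, which live in Unicode's Private Use Area (U+E000 upwards). A rough sketch of the conversion, where the name-to-codepoint table is purely illustrative; the real mapping comes from the HamNoSys Unicode font tables:

    # Illustrative codepoints only; substitute the official HamNoSys table.
    HAMNOSYS_CODEPOINTS = {
        "hamextfingerl": 0xE0E9,
        "hampalml": 0xE000,
        # ... one entry per HamNoSys symbol name
    }

    def hamnosys_to_utf8(names):
        """Map each symbol name to one Private Use Area character, then encode."""
        text = "".join(chr(HAMNOSYS_CODEPOINTS[n]) for n in names)
        return text.encode("utf-8")

    print(hamnosys_to_utf8(["hamextfingerl", "hampalml"]))
    # b'\xee\x83\xa9\xee\x80\x80' with the illustrative codepoints above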

How to get the text description from HamNoSys?

[image: a HamNoSys sequence shown alongside its English text description]

Hi,

I would like to know how you extracted the text description shown in the figure, e.g. "Two flat hands with fingers closed, rotated towards each other, touching, then symmetrically moving diagonally downwards". Thank you very much!

Best

Tokenizer issue

Hi! I am getting this error after installing the dependencies:
RuntimeError: Failed to import transformers.models.auto because of the following error (look up to see its traceback): No module named 'tokenizers.pre_tokenizers'

Is anyone else facing this issue?
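
This error usually points at a mismatch between the installed transformers and tokenizers packages (the compatible pair depends on the repo's pinned requirements). A quick diagnostic sketch:

    # `tokenizers.pre_tokenizers` exists in all reasonably recent releases,
    # so a failure on the last import confirms a version mismatch rather
    # than a bug in the repo's code.
    import transformers
    import tokenizers
    print("transformers:", transformers.__version__)
    print("tokenizers:", tokenizers.__version__)
    import tokenizers.pre_tokenizers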

Saving Ham2Pose outputs as JSON files

Hi authors,
Thank you for sharing your remarkable code base. I am trying to get the COCO keypoints in JSON format (similar to OpenPose outputs), but I am finding it hard to extract them because the outputs go through the pose_format library. Can you guide me on getting a JSON file for every frame?
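
One possible starting point, assuming the predictions are saved as .pose files readable by the pose_format library (the input path, the choice of person 0, and the x/y axis selection are all assumptions):

    import json
    import numpy as np
    from pose_format import Pose

    with open("example.pose", "rb") as f:  # placeholder path
        pose = Pose.read(f.read())

    data = np.asarray(pose.body.data.data)   # (frames, people, keypoints, dims)
    conf = np.asarray(pose.body.confidence)  # (frames, people, keypoints)

    for i in range(data.shape[0]):
        # OpenPose-style flat [x1, y1, c1, x2, y2, c2, ...] list for person 0.
        kpts = np.concatenate([data[i, 0, :, :2], conf[i, 0, :, None]], axis=-1)
        with open(f"frame_{i:06d}.json", "w") as out:
            json.dump({"people": [{"pose_keypoints_2d": kpts.flatten().tolist()}]}, out)

Note that the keypoint ordering follows pose.header.components, which is not the COCO/OpenPose ordering, so a reindexing step will likely still be needed.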
