
ham2pose's People

Contributors

rotem-shalev


ham2pose's Issues

Ham2Pose training OOM error (after trying five different ways to resolve it)

Hello, and thank you for sharing the code!

I am trying to reproduce the results by running train.py.

I have two RTX 4090 Ti GPUs, but training keeps failing with an out-of-memory error.

Here is what I have tried so far to resolve it:

  1. Using four A6000 GPUs to run the code (still gave an OOM error)
  2. Reducing the dataset size (100 training samples, 50 test samples)
  3. Reducing the number of trainable parameters (down to 51.9 K in total, by reducing the number of layers, etc.)
  4. Changing the PyTorch Lightning strategy from 'ddp' to 'fsdp'
  5. Reducing the batch size

Would it be possible to know how to resolve this error?
Below is the error message I receive:

Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2

distributed_backend=nccl
All distributed processes registered. Starting with 2 processes

/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:653: Checkpoint directory /home/XX/Desktop/project/asl/Ham2Pose/models/ham2pose exists and is not empty.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1]

# | Name                       | Type               | Params
0 | embedding                  | Embedding          | 6.8 K
1 | step_embedding             | Embedding          | 160
2 | pos_positional_embeddings  | Embedding          | 6.4 K
3 | text_positional_embeddings | Embedding          | 6.4 K
4 | pose_projection            | Linear             | 8.8 K
5 | text_encoder               | TransformerEncoder | 21.0 K
6 | pose_encoder               | TransformerEncoder | 2.4 K
7 | step_encoder               | Sequential         | 0
8 | seq_length                 | Linear             | 0
9 | pose_diff_projection       | Sequential         | 0

51.9 K Trainable params
0 Non-trainable params
51.9 K Total params
0.208 Total estimated model params size (MB)

/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py:298: The number of training batches (2) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
Epoch 0: 0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/XX/Desktop/project/asl/Ham2Pose/train.py", line 140, in
trainer.fit(model, train_dataloaders=train_loader)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
call._call_and_handle_interrupt(
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch
return function(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 987, in _run
results = self._run_stage()
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1033, in _run_stage
self.fit_loop.run()
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 205, in run
self.advance()
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 363, in advance
self.epoch_loop.run(self._data_fetcher)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 140, in run
self.advance(data_fetcher)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 250, in advance
batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 183, in run
closure()
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 144, in call
self._result = self.closure(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 129, in closure
step_output = self._step_fn()
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 318, in _training_step
training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 309, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 390, in training_step
return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 642, in call
wrapper_output = wrapper_module(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 839, in forward
output = self._fsdp_wrapped_module(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 635, in wrapped_forward
out = method(*_args, **_kwargs)
File "/home/XX/Desktop/project/asl/Ham2Pose/model.py", line 232, in training_step
return self.step(batch, *unused_args, phase="train")
File "/home/XX/Desktop/project/asl/Ham2Pose/model.py", line 260, in step
pred, step_size = self.refinement_step(i, pose_sequence, text_encoding)
File "/home/XX/Desktop/project/asl/Ham2Pose/model.py", line 167, in refinement_step
change_pred = self.refine_pose_sequence(pose_sequence, text_encoding, step_encoding)
File "/home/XX/Desktop/project/asl/Ham2Pose/model.py", line 219, in refine_pose_sequence
pose_encoding = self.encode_pose(pose_sequence, text_encoding, step_encoding)
File "/home/XX/Desktop/project/asl/Ham2Pose/model.py", line 205, in encode_pose
pose_encoding = self.__get_text_pose_encoder()(
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/transformer.py", line 387, in forward
output = mod(output, src_mask=mask, is_causal=is_causal, src_key_padding_mask=src_key_padding_mask_for_layers)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/transformer.py", line 707, in forward
x = self.norm1(x + self._sa_block(x, src_mask, src_key_padding_mask, is_causal=is_causal))
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/transformer.py", line 715, in _sa_block
x = self.self_attn(x, x, x,
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/modules/activation.py", line 1241, in forward
attn_output, attn_output_weights = F.multi_head_attention_forward(
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/functional.py", line 5300, in multi_head_attention_forward
q, k, v = _in_projection_packed(query, key, value, in_proj_weight, in_proj_bias)
File "/home/XX/anaconda3/envs/Ham2Pose_38/lib/python3.8/site-packages/torch/nn/functional.py", line 4826, in _in_projection_packed
proj = proj.unflatten(-1, (3, E)).unsqueeze(0).transpose(0, -2).squeeze(-2).contiguous()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 23.64 GiB of which 3.50 MiB is free. Process 73846 has 1.22 GiB memory in use. Including non-PyTorch memory, this process has 21.12 GiB memory in use. Process 160281 has 1.28 GiB memory in use. Of the allocated memory 47.37 MiB is allocated by PyTorch, and 4.63 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
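
For anyone hitting the same wall: the allocator hint at the end of the message can be tried directly, together with a smaller per-step batch plus gradient accumulation. A minimal sketch, assuming a standard Lightning setup (the environment variable must be set before CUDA is initialized, and the Trainer arguments here are illustrative, not the repo's actual configuration):

    import os
    # Reduce allocator fragmentation, as suggested by the error message.
    # Must be set before the first CUDA allocation.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

    import pytorch_lightning as pl

    trainer = pl.Trainer(
        accelerator="gpu",
        devices=2,
        strategy="ddp",
        # Pair with a 4x smaller DataLoader batch size: per-step memory
        # drops while the effective batch size stays the same.
        accumulate_grad_batches=4,
    )

It is also worth checking with nvidia-smi whether the other processes listed in the message (73846 and 160281) are stale runs still holding GPU memory.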

Dataset arrangements

Hi,

Thank you for your amazing work!

I would like to know more about the dataset layout you used in the paper. You mention four datasets in the paper, but I only found two in the code.

Also, will there be a guideline on how to organize the datasets for training your method (e.g., how to download them, the data format, and the expected paths)?

Many thanks,

HamNoSys text to model input

Hi authors,
thank you for sharing the code base.

How can I convert HamNoSys text, say (hamextfingerl,hampalml,hamchest,hammoveur,hamplus,hamsymmlr), to the model input, which is a UTF-8 encoding like
(\xee\x83\xa9\xee\x80\x80\xee\x80\x8d\xee\x80\xa6\xee\x80\xba\xee\x81\xaa\xee\x83\xa0\xee\x83\x90\xee\x81\xa9\xee\x83\xa1\xee\x82\x89\xee\x83\x86)?
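
For what it's worth, the target bytes are just the UTF-8 encoding of HamNoSys glyphs, which live in Unicode's Private Use Area (U+E000 upwards). A rough sketch of the conversion, where the name-to-codepoint table is purely illustrative; the real mapping comes from the HamNoSys Unicode font tables:

    # Illustrative codepoints only; substitute the official HamNoSys table.
    HAMNOSYS_CODEPOINTS = {
        "hamextfingerl": 0xE0E9,
        "hampalml": 0xE000,
        # ... one entry per HamNoSys symbol name
    }

    def hamnosys_to_utf8(names):
        """Map each symbol name to one Private Use Area character, then encode."""
        text = "".join(chr(HAMNOSYS_CODEPOINTS[n]) for n in names)
        return text.encode("utf-8")

    print(hamnosys_to_utf8(["hamextfingerl", "hampalml"]))
    # b'\xee\x83\xa9\xee\x80\x80' with the illustrative codepoints above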

How to get the text description from HamNoSys?

[image: a HamNoSys sequence shown alongside its English text description]

Hi,

I would like to know how you extracted the text description shown in the figure, e.g. "Two flat hands with fingers closed, rotated towards each other, touching, then symmetrically moving diagonally downwards". Thank you very much!

Best

Tokenizer issue

Hi! I am getting this error after installing the dependencies:
RuntimeError: Failed to import transformers.models.auto because of the following error (look up to see its traceback): No module named 'tokenizers.pre_tokenizers'

Is anyone else facing this issue?
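
This error usually points at a mismatch between the installed transformers and tokenizers packages (the compatible pair depends on the repo's pinned requirements). A quick diagnostic sketch:

    # `tokenizers.pre_tokenizers` exists in all reasonably recent releases,
    # so a failure on the last import confirms a version mismatch rather
    # than a bug in the repo's code.
    import transformers
    import tokenizers
    print("transformers:", transformers.__version__)
    print("tokenizers:", tokenizers.__version__)
    import tokenizers.pre_tokenizers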

Saving Ham2Pose outputs as JSON files

Hi authors,
Thank you for sharing your remarkable code base. I am trying to get the COCO keypoints in JSON format (similar to OpenPose outputs), but I am finding it hard to extract them because the outputs go through the pose_format library. Can you guide me on getting a JSON file for every frame?
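
One possible starting point, assuming the predictions are saved as .pose files readable by the pose_format library (the input path, the choice of person 0, and the x/y axis selection are all assumptions):

    import json
    import numpy as np
    from pose_format import Pose

    with open("example.pose", "rb") as f:  # placeholder path
        pose = Pose.read(f.read())

    data = np.asarray(pose.body.data.data)   # (frames, people, keypoints, dims)
    conf = np.asarray(pose.body.confidence)  # (frames, people, keypoints)

    for i in range(data.shape[0]):
        # OpenPose-style flat [x1, y1, c1, x2, y2, c2, ...] list for person 0.
        kpts = np.concatenate([data[i, 0, :, :2], conf[i, 0, :, None]], axis=-1)
        with open(f"frame_{i:06d}.json", "w") as out:
            json.dump({"people": [{"pose_keypoints_2d": kpts.flatten().tolist()}]}, out)

Note that the keypoint ordering follows pose.header.components, which is not the COCO/OpenPose ordering, so a reindexing step will likely still be needed.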
