m2t-segmentation's Introduction

Description

Official implementation of our paper on synchronized motion-to-text generation:

  • In this project, we introduce synchronized captioning of 3D human motion.

  • This work presents first experiments on progressive text generation synchronized with the timing of motion actions.

KIT-ML: [example GIFs of synchronized captioning]
HumanML3D: [example GIFs of synchronized captioning]

If you find this code useful in your work, please cite:

@article{radouane23motion2language,
   author={Radouane, Karim and Tchechmedjiev, Andon and Lagarde, Julien and Ranwez, Sylvie},
   title={Motion2language, unsupervised learning of synchronized semantic motion segmentation},
   journal={Neural Computing and Applications},
   ISSN={1433-3058},
   url={http://dx.doi.org/10.1007/s00521-023-09227-z},
   DOI={10.1007/s00521-023-09227-z},
   publisher={Springer Science and Business Media LLC},
   year={2023},
   month=dec}

Installation

conda env create -f environment.yaml
conda activate raypy310
python -m spacy download en-core-web-sm

Preprocess dataset

  • Original KIT version

You can find the pre-processed version of this dataset here: Pre-processed KIT-ML. Then set the paths to these files in the path_txt and path_motion variables of evaluate_m2L.py.
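
For example, the relevant variables inside evaluate_m2L.py might look like the following (the paths are hypothetical placeholders; point them at wherever you stored the downloaded files):

    # In src/evaluate_m2L.py -- illustrative placeholder paths, not the repository defaults
    path_txt = "/data/kit-ml/texts.csv"        # pre-processed text annotations
    path_motion = "/data/kit-ml/motions.npy"   # pre-processed motion data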

If you use the KIT Motion-Language dataset, please cite the following paper: Original KIT

  • Augmented KIT-ML version and HumanML3D

You can download both datasets by following the steps in this repository: Augmented datasets.

In this case, you need to perform the following pre-processing steps for text correction and motion normalization.

  1. Run datasets/build_data.py to build the dataset, setting the absolute path of the selected dataset in each case.
  2. Step 1 saves a numpy file to the specified path. Set this path in the path variable of the dataset class (HumanML3D: datasets/h3d_m2t_dataset_.py, KIT-ML: datasets/kit_m2t_dataset.py) and run that file.
  3. Step 2 generates the sentences_corrections.csv file. The absolute path of this file defines path_txt, and the numpy file generated at step 1 defines path_motion.
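
Assuming the scripts are run from the repository root, the sequence is roughly as follows (the exact invocation may differ, since the paths are set inside the files as described above):

    python datasets/build_data.py        # step 1: writes the numpy motion file
    python datasets/kit_m2t_dataset.py   # steps 2-3: builds the dataset and writes sentences_corrections.csv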

Download Pretrained Models

  • Original KIT version

Models with different configurations are available here: Models Original KIT

  • Augmented KIT-ML version and HumanML3D

Models for both datasets are available here: Models

For each Python script below, all available options can be displayed by running python name_script.py --help

Training

As described in the Motion2Language paper, the attention mechanisms were used as follows:

Soft attention: a model employing a GRU encoder, experimented with both Cartesian and joint-angle input types.

Local attention: a GRU encoder-based model, designed specifically for Cartesian input.

Local recurrent attention: models with various encoder types, all tailored for Cartesian input.
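
For intuition, the following is a minimal sketch of additive (Bahdanau-style) soft attention over encoder states; it illustrates the mechanism, not the repository's exact implementation:

    import torch
    import torch.nn as nn

    class SoftAttention(nn.Module):
        """Additive attention: score(s, h_t) = v^T tanh(W_enc h_t + W_dec s)."""
        def __init__(self, enc_dim, dec_dim, attn_dim):
            super().__init__()
            self.W_enc = nn.Linear(enc_dim, attn_dim, bias=False)
            self.W_dec = nn.Linear(dec_dim, attn_dim, bias=False)
            self.v = nn.Linear(attn_dim, 1, bias=False)

        def forward(self, enc_states, dec_state):
            # enc_states: (batch, T, enc_dim), dec_state: (batch, dec_dim)
            scores = self.v(torch.tanh(self.W_enc(enc_states) + self.W_dec(dec_state).unsqueeze(1)))
            weights = torch.softmax(scores.squeeze(-1), dim=1)          # (batch, T) attention over frames
            context = (weights.unsqueeze(-1) * enc_states).sum(dim=1)   # (batch, enc_dim) context vector
            return context, weights

    # Toy usage: 8 motion frames of 64-d encoder states, 32-d decoder state
    attn = SoftAttention(enc_dim=64, dec_dim=32, attn_dim=48)
    context, weights = attn(torch.randn(2, 8, 64), torch.randn(2, 32))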

python tune_train.py --input_type INPUT_TYPE --encoder_type ENCODER_TYPE --attention_type ATTENTION_TYPE 

Or simply, you can set a configuration path that specifies the model to train and the hyperparameters to experiment with. Additional values can be modified by updating the configuration file of the selected model.

python tune_train.py  --config CONFIG_PATH 
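
The YAML configs can also be inspected programmatically; the key names below are assumptions for illustration, and the actual schema is defined by the files shipped in configs/:

    import yaml  # PyYAML

    with open("./configs/BiGRU.yaml") as f:
        cfg = yaml.safe_load(f)
    # Hypothetical keys, for illustration only; check the shipped configs for the real schema.
    print(cfg.get("encoder_type"), cfg.get("attention_type"), cfg.get("input_type"))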

Examples

Using joint angles / soft attention

python tune_train.py --input_type angles --encoder_type BiGRU --attention_type soft

or

python tune_train.py --config ../configs/BiGRU.yaml

Using Cartesian coordinates / local recurrent attention

python tune_train.py  --config ./configs/MLP.yaml

python tune_train.py  --config ./configs_h3D/MLP_tune.yaml --dataset_name h3D

python tune_train.py  --config ./configs_kit_aug/MLP_train.yaml --dataset_name kit

Run evaluation

General

python evaluate_m2L.py --path PATH --input_type INPUT_TYPE --encoder_type ENCODER_TYPE --attention_type ATTENTION_TYPE --D D --mask MASK --subset SUBSET

The D and mask arguments should be specified only in local attention mode; the defaults are D=5 and mask=True (see the sketch at the end of this subsection).

Or using a config file:

python src/evaluate_m2L.py --config ./configs/soft/GRU.yaml
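
As a rough illustration of what D controls: a Luong-style local attention restricts (and Gaussian-weights) attention to a window of half-width D around a predicted source position. Below is a minimal numpy sketch, assuming this interpretation of D and mask; it is not the repository's exact implementation:

    import numpy as np

    def local_window_weights(scores, p_t, D=5, mask=True):
        """Restrict raw attention scores (shape (T,)) to [p_t - D, p_t + D] and renormalize."""
        positions = np.arange(len(scores))
        if mask:
            scores = np.where(np.abs(positions - p_t) <= D, scores, -np.inf)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        # Gaussian emphasis toward the window center, as in Luong et al. (2015)
        weights *= np.exp(-((positions - p_t) ** 2) / (2 * (D / 2) ** 2))
        return weights / weights.sum()

    w = local_window_weights(np.random.randn(20), p_t=10)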

BLEU score: corpus level

To obtain a corpus-level BLEU score, the batch size should equal the size of the evaluation subset. If memory is insufficient, you can instead use the CSV file of output predictions: set its path in bleu_from_csv.py and run that script to compute the NLP metric scores.
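
If you compute the scores from saved predictions, a corpus-level BLEU computation with NLTK can look like the minimal sketch below; the file and column names are assumptions, so adapt them to the actual CSV written by the evaluation script:

    import csv
    from nltk.translate.bleu_score import corpus_bleu

    references, hypotheses = [], []
    with open("predictions.csv") as f:                     # hypothetical file name
        for row in csv.DictReader(f):
            references.append([row["reference"].split()])  # assumed column names
            hypotheses.append(row["prediction"].split())

    print("Corpus BLEU-4:", corpus_bleu(references, hypotheses))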

python src/evaluate_m2L.py --config  CONFIG_PATH

Config path format: ./configs/{attention_type}/{encoder_type}_{input_type}.yaml

The input_type is set to "cartesian" by default.
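
In Python terms, the path pattern corresponds to:

    attention_type, encoder_type, input_type = "soft", "GRU", "angles"
    config_path = f"./configs/{attention_type}/{encoder_type}_{input_type}.yaml"
    # -> ./configs/soft/GRU_angles.yaml
    # For the default cartesian input, the suffix appears to be dropped (e.g. ./configs/soft/GRU.yaml).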

Different values of D:

python src/evaluate_m2L.py --config ./configs/local_rec/MLP_D=9.yaml

Augmented KIT and HumanML3D

In this case, the dataset_name argument should be specified; the default is kit2016.

  1. KIT-ML Augmented dataset
  • MLP: python src/evaluate_m2L.py --config ./configs_kit_aug/MLP.yaml --dataset_name kit
  • deep-MLP: python src/evaluate_m2L.py --config ./configs_kit_aug/deep-MLP.yaml --dataset_name kit
  2. HumanML3D
  • MLP: python src/evaluate_m2L.py --config ./configs_h3D/MLP.yaml --dataset_name h3D

Visualizations

1. Human Pose Animation

Generate skeleton animations with synchronized text (best run with a model based on local recurrent attention for cleaner plots; local attention can also be used to visually compare the synchronization performance of both).

  • Run with the default model
python visualizations/poses2concepts.py --n_map NUMBER_ATTENTION_MAP --n_gifs NUMBER_3D_ANIMATIONS --save_results DIRECTORY_SAVE_PLOTS

or using a config file

  • Examples
python visualizations/poses2concepts.py --config ./configs/local_rec/MLP.yaml --n_map 1 --n_gifs 105 --save_results ./gifs_map_orig

python visualizations/poses2concepts.py --config ./configs_kit_aug/MLP.yaml --dataset_name kit --n_map 5 --n_gifs 50 --save_results ./gifs_map_kit22_

python visualizations/poses2concepts.py --config ./configs_h3D/MLP.yaml --dataset_name h3D --n_map 1 --n_gifs 100 --save_results ./gifs_map_h3D_b_1

2. Frozen in Time

To visualize frozen motion for analyzing the perception of motion-language synchronization, use froze_motion.py. More details will be added later.

Beam search

Beam search can be enabled by passing the beam size via the --beam_size argument.

python src/evaluate_m2L.py --config ./configs/local_rec/deep-MLP.yaml --beam_size 1

BEAM_SIZE: 1 (the default) performs greedy search; values >1 perform beam search.
This script will print the BLEU-4 score for each beam and write beam predictions to the file result_beam_size_{BEAM_SIZE}_.txt
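
For reference, here is a generic beam search sketch over a step function returning next-token log-probabilities; it illustrates the decoding strategy, not the repository's exact implementation:

    import math

    def beam_search(step_fn, bos, eos, beam_size=3, max_len=20):
        """Generic beam search. step_fn(prefix) -> {token: log_prob}."""
        beams = [([bos], 0.0)]  # (sequence, cumulative log-probability)
        for _ in range(max_len):
            candidates = []
            for seq, score in beams:
                if seq[-1] == eos:  # finished hypotheses are carried over unchanged
                    candidates.append((seq, score))
                    continue
                for tok, lp in step_fn(seq).items():
                    candidates.append((seq + [tok], score + lp))
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
            if all(seq[-1] == eos for seq, _ in beams):
                break
        return beams

    # Toy step function over a vocabulary {eos=0, 1, 2} with fixed probabilities
    def toy_step(prefix):
        return {0: math.log(0.2), 1: math.log(0.5), 2: math.log(0.3)}

    for seq, score in beam_search(toy_step, bos=-1, eos=0, beam_size=2, max_len=5):
        print(seq, round(score, 3))

With beam_size=1 this reduces to greedy search, matching the default noted above.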

Segmentation scores

  • Only for the original KIT-ML dataset

The main scripts to run for separate segmentation results:

  • Evaluate segmentation results of one model: python src/seg_eval.py --config CONFIG_PATH

  • Compare segmentation of multiple models

python src/segmentation_eval.py

This script generates segmentation score curves and saves them as figures in the working directory.

Examples

  • MLP: python src/seg_eval.py --config ./configs/local_rec/MLP.yaml
  • Deep-MLP: python src/seg_eval.py --config ./configs/local_rec/deep-MLP.yaml

Main paper results

Soft attention:

  • GRU-Angles: python src/evaluate_m2L.py --config ./configs/soft/GRU_angles.yaml
  • BiGRU-Angles: python src/evaluate_m2L.py --config ./configs/soft/BiGRU_angles.yaml

Local attention:

  • GRU-Cartesian: python src/evaluate_m2L.py --config ./configs/local/GRU.yaml
  • BiGRU-Cartesian: python src/evaluate_m2L.py --config ./configs/local/BiGRU.yaml

Local recurrent attention:

  • GRU-Cartesian: python src/evaluate_m2L.py --config ./configs/local_rec/GRU.yaml
  • BiGRU-Cartesian: python src/evaluate_m2L.py --config ./configs/local_rec/BiGRU.yaml
  • MLP: python src/evaluate_m2L.py --config ./configs/local_rec/MLP.yaml
  • deep-MLP: python src/evaluate_m2L.py --config ./configs/local_rec/deep-MLP.yaml

Notebook

An interactive notebook demonstrating all functionalities of this project will be available soon.

License

This project is under the MIT license.


m2t-segmentation's Issues

Vocab size mismatch

I processed the data as described in the README and get the following error when I run:

 python visualizations/poses2concepts.py --config ./configs_kit_aug/MLP.yaml --dataset_name kit --n_map 5 --n_gifs 50 --save_results ./gifs_map_kit22_

ERROR MESSAGE:

    2024-05-01 18:12:11,523 [INFO] Building dataset ... 
    2024-05-01 18:12:11,823 [INFO] CORRECT TOKENS False
    2024-05-01 18:12:12,073 [INFO] Building vocabulary with minimum frequency : 3
    2024-05-01 18:12:12,422 [INFO] Std/Mean Normalization
    2024-05-01 18:12:13,426 [INFO] Convert token to numerical values
    2024-05-01 18:12:13,767 [INFO] Using official split
    2024-05-01 18:12:13,853 [INFO] Number  of samples VAL: 470 TRAIN: 7451, TEST: 1376 
    2024-05-01 18:12:16,133 [INFO] VOCAB SIZE  = 1709 
    2024-05-01 18:12:17,974 [INFO] Scaling 1.0
    2024-05-01 18:12:17,981 [INFO] Applying local_recurrent attention
    Traceback (most recent call last):
      File "/home/dhegde/projects/motion-pipe/M2T-Segmentation/visualizations/poses2concepts.py", line 180, in <module>
        loaded_model, train_data_loader, test_data_loader,val_data_loader = load_model_config(device=device,args=args)
      File "/home/dhegde/projects/motion-pipe/M2T-Segmentation/src/evaluate_m2L.py", line 76, in load_model_config
        loaded_model.load_state_dict(new_dict)
      File "/home/dhegde/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
        raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
    RuntimeError: Error(s) in loading state_dict for seq2seq:
            size mismatch for dec.embedding.weight: copying a param with shape torch.Size([878, 64]) from checkpoint, the shape in current model is torch.Size([1709, 64]).
            size mismatch for dec.fc.weight: copying a param with shape torch.Size([878, 192]) from checkpoint, the shape in current model is torch.Size([1709, 192]).
            size mismatch for dec.fc.bias: copying a param with shape torch.Size([878]) from checkpoint, the shape in current model is torch.Size([1709]).

Error in data pre-processing

When I run step 3 to generate sentences_corrections.csv, I get the following error:

    [INFO] CORRECT TOKENS True
    [INFO] INITIAL CORRECTIONS AND LOWER CASING ...
    [INFO] Replacing ...
    [... repeated "[INFO] Replacing ..." lines omitted; the distinct correction lines were: ...]
    [INFO] Text to int, five --> 5 
    [INFO] Text to int, one --> 1 
    [INFO] Token index 15 corrected perso ----> person 
    [INFO] Text to int, one --> 1 
    [INFO] Text to int, five --> 5 
    [INFO] Token index 47 corrected  ----> i 
    [INFO] EMPTY TOKEN [[ ]]
    [INFO] token   not found
    [INFO] Token index 61 corrected foward ----> forward 
    [INFO] Text to int, four --> 4 
    [INFO] Token index 69 corrected sim ----> him 
    [INFO] Token index 71 corrected  ----> i 
    [INFO] Text to int, one --> 1 
    [INFO] Text to int, three --> 3 
    [INFO] Text to int, five --> 5 
    [INFO] Text to int, two --> 2 
    [INFO] Text to int, five --> 5 
    [INFO] Token index 116 corrected counter- ----> counter 
    [INFO] Token index 117 corrected  ----> i 
    [INFO] Text to int, three --> 3 
    [INFO] Text to int, two --> 2 
    [INFO] Token index 134 corrected staight ----> straight 
    [INFO] Token index 134 corrected soemthing ----> something 
    [INFO] Token index 136 corrected ’s ----> is 
    [INFO] Token index 151 corrected ... ----> None 
    Traceback (most recent call last):
      File "/home/dhegde/projects/motion-pipe/M2T-Segmentation/datasets/kit_m2t_dataset.py", line 203, in <module>
        data = dataset_class(path, filter_data=True,min_freq=3)
      File "/home/dhegde/projects/motion-pipe/M2T-Segmentation/datasets/kit_m2t_dataset.py", line 46, in __init__
        self.lang = vocabulary(self.sentences, correct_tokens=correct_tokens, ask_user=False)
      File "/home/dhegde/projects/motion-pipe/M2T-Segmentation/datasets/vocabulary.py", line 94, in __init__
        self.token_correction(ask_user)
      File "/home/dhegde/projects/motion-pipe/M2T-Segmentation/datasets/vocabulary.py", line 158, in token_correction
        desc = " ".join(tokens)
    TypeError: sequence item 4: expected str instance, NoneType found
