m2t-segmentation's Introduction

Description

Official implementation of our paper on synchronized motion-to-text generation:

  • In this project, we introduce synchronized captioning of 3D human motion.

  • This work presents first experiments on progressive text generation synchronized with the timing of motion actions.

KIT-ML: [example GIFs of synchronized captioning]
HumanML3D: [example GIFs of synchronized captioning]

If you find this code useful in your work, please cite:

@article{radouane23motion2language,
   author={Radouane, Karim and Tchechmedjiev, Andon and Lagarde, Julien and Ranwez, Sylvie},
   title={Motion2language, unsupervised learning of synchronized semantic motion segmentation},
   journal={Neural Computing and Applications},
   ISSN={1433-3058},
   url={http://dx.doi.org/10.1007/s00521-023-09227-z},
   DOI={10.1007/s00521-023-09227-z},
   publisher={Springer Science and Business Media LLC},
   year={2023},
   month=dec}

Installation

conda env create -f environment.yaml
conda activate raypy310
python -m spacy download en-core-web-sm

Preprocess dataset

  • Original KIT version

You can find the pre-processed version of this dataset here: Pre-processed KIT-ML. Then set the paths to these files in the path_txt and path_motion variables of evaluate_m2L.py.
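
For example, the relevant variables inside evaluate_m2L.py might look like the following (the paths are hypothetical placeholders; point them at wherever you stored the downloaded files):

    # In src/evaluate_m2L.py -- illustrative placeholder paths, not the repository defaults
    path_txt = "/data/kit-ml/texts.csv"        # pre-processed text annotations
    path_motion = "/data/kit-ml/motions.npy"   # pre-processed motion data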

If you use the KIT Motion-Language dataset, please cite the following paper: Original KIT

  • Augmented KIT-ML version and HumanML3D

You can download both datasets by following the steps in this repository: Augmented datasets.

In this case, you need to perform the following pre-processing steps for text correction and motion normalization.

  1. Run datasets/build_data.py to build the dataset, setting the absolute path of the selected dataset in each case.
  2. Step 1 saves a numpy file to the specified path. Set this path in the path variable of the dataset class (HumanML3D: datasets/h3d_m2t_dataset_.py, KIT-ML: datasets/kit_m2t_dataset.py) and run that file.
  3. Step 2 generates the sentences_corrections.csv file. The absolute path of this file defines path_txt, and the numpy file generated at step 1 defines path_motion.
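
Assuming the scripts are run from the repository root, the sequence is roughly as follows (the exact invocation may differ, since the paths are set inside the files as described above):

    python datasets/build_data.py        # step 1: writes the numpy motion file
    python datasets/kit_m2t_dataset.py   # steps 2-3: builds the dataset and writes sentences_corrections.csv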

Download Pretrained Models

  • Original KIT version

Models with different configurations are available here: Models Original KIT

  • Augmented KIT-ML version and HumanML3D

Models for both datasets are available here: Models

For each Python script below, all available options can be displayed by running python name_script.py --help

Training

As described in the Motion2Language paper, the attention mechanisms were used as follows:

Soft attention: a model employing a GRU encoder, experimented with both Cartesian and joint-angle input types.

Local attention: a GRU encoder-based model, designed specifically for Cartesian input.

Local recurrent attention: models with various encoder types, all tailored for Cartesian input.
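
For intuition, the following is a minimal sketch of additive (Bahdanau-style) soft attention over encoder states; it illustrates the mechanism, not the repository's exact implementation:

    import torch
    import torch.nn as nn

    class SoftAttention(nn.Module):
        """Additive attention: score(s, h_t) = v^T tanh(W_enc h_t + W_dec s)."""
        def __init__(self, enc_dim, dec_dim, attn_dim):
            super().__init__()
            self.W_enc = nn.Linear(enc_dim, attn_dim, bias=False)
            self.W_dec = nn.Linear(dec_dim, attn_dim, bias=False)
            self.v = nn.Linear(attn_dim, 1, bias=False)

        def forward(self, enc_states, dec_state):
            # enc_states: (batch, T, enc_dim), dec_state: (batch, dec_dim)
            scores = self.v(torch.tanh(self.W_enc(enc_states) + self.W_dec(dec_state).unsqueeze(1)))
            weights = torch.softmax(scores.squeeze(-1), dim=1)          # (batch, T) attention over frames
            context = (weights.unsqueeze(-1) * enc_states).sum(dim=1)   # (batch, enc_dim) context vector
            return context, weights

    # Toy usage: 8 motion frames of 64-d encoder states, 32-d decoder state
    attn = SoftAttention(enc_dim=64, dec_dim=32, attn_dim=48)
    context, weights = attn(torch.randn(2, 8, 64), torch.randn(2, 32))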

python tune_train.py --input_type INPUT_TYPE --encoder_type ENCODER_TYPE --attention_type ATTENTION_TYPE 

Or simply, you can set a configuration path that specifies the model to train and the hyperparameters to experiment with. Additional values can be modified by updating the configuration file of the selected model.

python tune_train.py  --config CONFIG_PATH 
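
The YAML configs can also be inspected programmatically; the key names below are assumptions for illustration, and the actual schema is defined by the files shipped in configs/:

    import yaml  # PyYAML

    with open("./configs/BiGRU.yaml") as f:
        cfg = yaml.safe_load(f)
    # Hypothetical keys, for illustration only; check the shipped configs for the real schema.
    print(cfg.get("encoder_type"), cfg.get("attention_type"), cfg.get("input_type"))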

Examples

Using joint angles / soft attention

python tune_train.py --input_type angles --encoder_type BiGRU --attention_type soft

or

python tune_train.py --config ../configs/BiGRU.yaml

Using Cartesian coordinates / local recurrent attention

python tune_train.py  --config ./configs/MLP.yaml

python tune_train.py  --config ./configs_h3D/MLP_tune.yaml --dataset_name h3D

python tune_train.py  --config ./configs_kit_aug/MLP_train.yaml --dataset_name kit

Run evaluation

General

python evaluate_m2L.py --path PATH --input_type INPUT_TYPE --encoder_type ENCODER_TYPE --attention_type ATTENTION_TYPE --D D --mask MASK --subset SUBSET

The D and mask arguments should be specified only in local attention mode; the defaults are D=5 and mask=True (see the sketch at the end of this subsection).

Or using a config file:

python src/evaluate_m2L.py --config ./configs/soft/GRU.yaml
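
As a rough illustration of what D controls: a Luong-style local attention restricts (and Gaussian-weights) attention to a window of half-width D around a predicted source position. Below is a minimal numpy sketch, assuming this interpretation of D and mask; it is not the repository's exact implementation:

    import numpy as np

    def local_window_weights(scores, p_t, D=5, mask=True):
        """Restrict raw attention scores (shape (T,)) to [p_t - D, p_t + D] and renormalize."""
        positions = np.arange(len(scores))
        if mask:
            scores = np.where(np.abs(positions - p_t) <= D, scores, -np.inf)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        # Gaussian emphasis toward the window center, as in Luong et al. (2015)
        weights *= np.exp(-((positions - p_t) ** 2) / (2 * (D / 2) ** 2))
        return weights / weights.sum()

    w = local_window_weights(np.random.randn(20), p_t=10)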

BLEU score: corpus level

To obtain a corpus-level BLEU score, the batch size should equal the size of the evaluation subset. If memory is insufficient, you can instead use the CSV file of output predictions: set its path in bleu_from_csv.py and run that script to compute the NLP metric scores.
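
If you compute the scores from saved predictions, a corpus-level BLEU computation with NLTK can look like the minimal sketch below; the file and column names are assumptions, so adapt them to the actual CSV written by the evaluation script:

    import csv
    from nltk.translate.bleu_score import corpus_bleu

    references, hypotheses = [], []
    with open("predictions.csv") as f:                     # hypothetical file name
        for row in csv.DictReader(f):
            references.append([row["reference"].split()])  # assumed column names
            hypotheses.append(row["prediction"].split())

    print("Corpus BLEU-4:", corpus_bleu(references, hypotheses))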

python src/evaluate_m2L.py --config  CONFIG_PATH

Config path format: ./configs/{attention_type}/{encoder_type}_{input_type}.yaml

The input_type is set to "cartesian" by default.
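
In Python terms, the path pattern corresponds to:

    attention_type, encoder_type, input_type = "soft", "GRU", "angles"
    config_path = f"./configs/{attention_type}/{encoder_type}_{input_type}.yaml"
    # -> ./configs/soft/GRU_angles.yaml
    # For the default cartesian input, the suffix appears to be dropped (e.g. ./configs/soft/GRU.yaml).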

Different values of D:

python src/evaluate_m2L.py --config ./configs/local_rec/MLP_D=9.yaml

Augmented KIT and HumanML3D

In this case, the dataset_name argument should be specified; the default is kit2016.

  1. KIT-ML Augmented dataset
  • MLP: python src/evaluate_m2L.py --config ./configs_kit_aug/MLP.yaml --dataset_name kit
  • deep-MLP: python src/evaluate_m2L.py --config ./configs_kit_aug/deep-MLP.yaml --dataset_name kit
  2. HumanML3D
  • MLP: python src/evaluate_m2L.py --config ./configs_h3D/MLP.yaml --dataset_name h3D

Visualizations

1. Human Pose Animation

Generate skeleton animations with synchronized text (best run with a model based on local recurrent attention for cleaner plots; local attention can also be used to visually compare the synchronization performance of both).

  • Run with the default model
python visualizations/poses2concepts.py --n_map NUMBER_ATTENTION_MAP --n_gifs NUMBER_3D_ANIMATIONS --save_results DIRECTORY_SAVE_PLOTS

or using a config file

  • Examples
python visualizations/poses2concepts.py --config ./configs/local_rec/MLP.yaml --n_map 1 --n_gifs 105 --save_results ./gifs_map_orig

python visualizations/poses2concepts.py --config ./configs_kit_aug/MLP.yaml --dataset_name kit --n_map 5 --n_gifs 50 --save_results ./gifs_map_kit22_

python visualizations/poses2concepts.py --config ./configs_h3D/MLP.yaml --dataset_name h3D --n_map 1 --n_gifs 100 --save_results ./gifs_map_h3D_b_1

2. Frozen in Time

To visualize frozen motion for analyzing the perception of motion-language synchronization, use froze_motion.py. More details will be added later.

Beam search

Beam search can be enabled by passing the beam size via the --beam_size argument.

python src/evaluate_m2L.py --config ./configs/local_rec/deep-MLP.yaml --beam_size 1

BEAM_SIZE: 1 (the default) performs greedy search; values >1 perform beam search.
This script will print the BLEU-4 score for each beam and write beam predictions to the file result_beam_size_{BEAM_SIZE}_.txt
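
For reference, here is a generic beam search sketch over a step function returning next-token log-probabilities; it illustrates the decoding strategy, not the repository's exact implementation:

    import math

    def beam_search(step_fn, bos, eos, beam_size=3, max_len=20):
        """Generic beam search. step_fn(prefix) -> {token: log_prob}."""
        beams = [([bos], 0.0)]  # (sequence, cumulative log-probability)
        for _ in range(max_len):
            candidates = []
            for seq, score in beams:
                if seq[-1] == eos:  # finished hypotheses are carried over unchanged
                    candidates.append((seq, score))
                    continue
                for tok, lp in step_fn(seq).items():
                    candidates.append((seq + [tok], score + lp))
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
            if all(seq[-1] == eos for seq, _ in beams):
                break
        return beams

    # Toy step function over a vocabulary {eos=0, 1, 2} with fixed probabilities
    def toy_step(prefix):
        return {0: math.log(0.2), 1: math.log(0.5), 2: math.log(0.3)}

    for seq, score in beam_search(toy_step, bos=-1, eos=0, beam_size=2, max_len=5):
        print(seq, round(score, 3))

With beam_size=1 this reduces to greedy search, matching the default noted above.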

Segmentation scores

  • Only for the original KIT-ML dataset

The main scripts to run for separate segmentation results:

  • Evaluate segmentation results of one model: python src/seg_eval.py --config CONFIG_PATH

  • Compare segmentation of multiple models

python src/segmentation_eval.py

This script generates segmentation score curves and saves them as figures in the working directory.

Examples

  • MLP: python src/seg_eval.py --config ./configs/local_rec/MLP.yaml
  • Deep-MLP: python src/seg_eval.py --config ./configs/local_rec/deep-MLP.yaml

Main paper results

Soft attention:

  • GRU-Angles: python src/evaluate_m2L.py --config ./configs/soft/GRU_angles.yaml
  • BiGRU-Angles: python src/evaluate_m2L.py --config ./configs/soft/BiGRU_angles.yaml

Local attention:

  • GRU-Cartesian: python src/evaluate_m2L.py --config ./configs/local/GRU.yaml
  • BiGRU-Cartesian: python src/evaluate_m2L.py --config ./configs/local/BiGRU.yaml

Local recurrent attention:

  • GRU-Cartesian: python src/evaluate_m2L.py --config ./configs/local_rec/GRU.yaml
  • BiGRU-Cartesian: python src/evaluate_m2L.py --config ./configs/local_rec/BiGRU.yaml
  • MLP: python src/evaluate_m2L.py --config ./configs/local_rec/MLP.yaml
  • deep-MLP: python src/evaluate_m2L.py --config ./configs/local_rec/deep-MLP.yaml

Notebook

An interactive notebook demonstrating all functionalities of this project will be available soon.

License

This project is under the MIT license.


m2t-segmentation's Issues

Vocab size mismatch

I processed the data as described in the README and get the following error when I run:

 python visualizations/poses2concepts.py --config ./configs_kit_aug/MLP.yaml --dataset_name kit --n_map 5 --n_gifs 50 --save_results ./gifs_map_kit22_

ERROR MESSAGE:

    2024-05-01 18:12:11,523 [INFO] Building dataset ... 
    2024-05-01 18:12:11,823 [INFO] CORRECT TOKENS False
    2024-05-01 18:12:12,073 [INFO] Building vocabulary with minimum frequency : 3
    2024-05-01 18:12:12,422 [INFO] Std/Mean Normalization
    2024-05-01 18:12:13,426 [INFO] Convert token to numerical values
    2024-05-01 18:12:13,767 [INFO] Using official split
    2024-05-01 18:12:13,853 [INFO] Number  of samples VAL: 470 TRAIN: 7451, TEST: 1376 
    2024-05-01 18:12:16,133 [INFO] VOCAB SIZE  = 1709 
    2024-05-01 18:12:17,974 [INFO] Scaling 1.0
    2024-05-01 18:12:17,981 [INFO] Applying local_recurrent attention
    Traceback (most recent call last):
      File "/home/dhegde/projects/motion-pipe/M2T-Segmentation/visualizations/poses2concepts.py", line 180, in <module>
        loaded_model, train_data_loader, test_data_loader,val_data_loader = load_model_config(device=device,args=args)
      File "/home/dhegde/projects/motion-pipe/M2T-Segmentation/src/evaluate_m2L.py", line 76, in load_model_config
        loaded_model.load_state_dict(new_dict)
      File "/home/dhegde/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
        raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
    RuntimeError: Error(s) in loading state_dict for seq2seq:
            size mismatch for dec.embedding.weight: copying a param with shape torch.Size([878, 64]) from checkpoint, the shape in current model is torch.Size([1709, 64]).
            size mismatch for dec.fc.weight: copying a param with shape torch.Size([878, 192]) from checkpoint, the shape in current model is torch.Size([1709, 192]).
            size mismatch for dec.fc.bias: copying a param with shape torch.Size([878]) from checkpoint, the shape in current model is torch.Size([1709]).

Error in data pre-processing

When I run step 3 to generate sentences_corrections.csv, I get the following error:

    [INFO] CORRECT TOKENS True
    [INFO] INITIAL CORRECTIONS AND LOWER CASING ...
    [INFO] Replacing ...
    [... repeated "[INFO] Replacing ..." lines omitted; the distinct correction lines were: ...]
    [INFO] Text to int, five --> 5 
    [INFO] Text to int, one --> 1 
    [INFO] Token index 15 corrected perso ----> person 
    [INFO] Text to int, one --> 1 
    [INFO] Text to int, five --> 5 
    [INFO] Token index 47 corrected  ----> i 
    [INFO] EMPTY TOKEN [[ ]]
    [INFO] token   not found
    [INFO] Token index 61 corrected foward ----> forward 
    [INFO] Text to int, four --> 4 
    [INFO] Token index 69 corrected sim ----> him 
    [INFO] Token index 71 corrected  ----> i 
    [INFO] Text to int, one --> 1 
    [INFO] Text to int, three --> 3 
    [INFO] Text to int, five --> 5 
    [INFO] Text to int, two --> 2 
    [INFO] Text to int, five --> 5 
    [INFO] Token index 116 corrected counter- ----> counter 
    [INFO] Token index 117 corrected  ----> i 
    [INFO] Text to int, three --> 3 
    [INFO] Text to int, two --> 2 
    [INFO] Token index 134 corrected staight ----> straight 
    [INFO] Token index 134 corrected soemthing ----> something 
    [INFO] Token index 136 corrected ’s ----> is 
    [INFO] Token index 151 corrected ... ----> None 
    Traceback (most recent call last):
      File "/home/dhegde/projects/motion-pipe/M2T-Segmentation/datasets/kit_m2t_dataset.py", line 203, in <module>
        data = dataset_class(path, filter_data=True,min_freq=3)
      File "/home/dhegde/projects/motion-pipe/M2T-Segmentation/datasets/kit_m2t_dataset.py", line 46, in __init__
        self.lang = vocabulary(self.sentences, correct_tokens=correct_tokens, ask_user=False)
      File "/home/dhegde/projects/motion-pipe/M2T-Segmentation/datasets/vocabulary.py", line 94, in __init__
        self.token_correction(ask_user)
      File "/home/dhegde/projects/motion-pipe/M2T-Segmentation/datasets/vocabulary.py", line 158, in token_correction
        desc = " ".join(tokens)
    TypeError: sequence item 4: expected str instance, NoneType found
