Submission of Group 1 for 'Erweiterungsmodul Computerlinguistik' @ LMU Munich 2024
This term project on Sign Language Translation is a submission for the 'Erweiterungsmodul Computerlinguistik' course at the Ludwig-Maximilians-Universität in Munich. The course is taught by Özge Alacam and Beiduo Chen. The authors perform Sign Language Sense Disambiguation on newly created and augmented datasets.
This project investigates whether disambiguation with a transformer-based model can be improved by enriching the available sign language data with cut-outs of specific body parts. It builds on the prior research by Camgöz et al. (2020).
All dependencies required for the project can be installed with `pip install -r requirements.txt`. To download the raw data for our project, `cd` into the `data` folder and run `python ./download_raw_data.py`. This will download the RWTH-PHOENIX-Weather Database and unpack all files under `data/raw_data`. The extracted body parts are part of the downloaded files; if you want to rerun the body part extraction manually on the dataset, run `python ./bodypart_extraction.py`.
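For reference, the unpacking step that `download_raw_data.py` performs can be sketched as follows. This is a minimal illustration, not the script itself; the function name and the `data/raw_data` target directory are assumptions for this sketch.

```python
import tarfile
from pathlib import Path


def unpack_archive(archive_path: str, target_dir: str = "data/raw_data") -> list:
    """Extract a downloaded archive into the raw-data folder.

    Returns the list of member names that were extracted, so callers
    can verify the expected files arrived.
    """
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path) as tar:
        tar.extractall(target_dir)
        return [member.name for member in tar.getmembers()]
```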
To create the datasets for our experiments, run the preprocessing with `python ./run_preprocessing.py`. This creates datasets that combine the original images of the RWTH-PHOENIX-Weather Database with the extracted body parts. You can configure the preprocessing steps via the `preprocess_config.yaml` file inside the `data` folder:
- `gpt_subs`: set to `True` to fill placeholders in the natural language translations with GPT substitutions (requires access to the OpenAI API, so you need to set an OpenAI API key in your environment variables)
- `gpt_full`: set to `True` to replace the natural language translations entirely with GPT-generated text (also requires an OpenAI API key in your environment variables)
- `augmented`: set to `True` to create extra training data via data augmentation (image flipping, greyscaling)
- `bleu`: set to `True` to align the preprocessed gloss with the original gloss using the BLEU score (instead of word-set overlaps)
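Putting the options together, a `preprocess_config.yaml` might look like this. The key names follow the option list above; the exact file layout is an assumption for illustration.

```yaml
# Sketch of data/preprocess_config.yaml (layout assumed for illustration)
gpt_subs: True    # requires OPENAI_API_KEY in the environment
gpt_full: False
augmented: True
bleu: True
```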
If you do not want to run the preprocessing from scratch, you can download our best performing datasets (BLEU alignment, with GPT substitutions and augmented data) using `python ./download_preprocessed_datasets.py`.
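To illustrate the difference between the two alignment strategies, the following self-contained sketch scores candidate glosses either by word-set overlap or by a simple BLEU-1-style clipped unigram precision. It is a simplified stand-in; the project's actual alignment code may differ.

```python
from collections import Counter


def word_set_overlap(candidate: str, reference: str) -> float:
    """Fraction of unique reference words that also occur in the candidate."""
    cand, ref = set(candidate.split()), set(reference.split())
    return len(cand & ref) / len(ref) if ref else 0.0


def bleu1_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision, the 1-gram component of BLEU."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    clipped = sum(min(count, ref[word]) for word, count in cand.items())
    return clipped / sum(cand.values()) if cand else 0.0


def best_alignment(candidates, reference, use_bleu=True):
    """Pick the candidate gloss that best matches the reference gloss."""
    score = bleu1_precision if use_bleu else word_set_overlap
    return max(candidates, key=lambda c: score(c, reference))
```

Unlike plain word-set overlap, the clipped precision penalizes candidates that repeat reference words, which is why BLEU-based alignment can discriminate between candidates that a set-based overlap scores identically.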
Furthermore, we tried fine-tuning the original transformer (as reported in Camgöz et al. (2020)) on our custom datasets. For this, we pretrained the original transformer for 70 epochs. You can find checkpoints under: https://drive.google.com/file/d/11YX0lTdkRF09xdT9UzuZ42zTvMyldR1I/view?usp=sharing
The transformer can be trained with `python -m signjoey train CONFIG`. As configuration files, we provide two options:
- `configs/generate_config.py`: script that creates a config file for the desired dataset; edit the parameters within the script to adjust hyperparameters and the paths to the train, test, and dev files.
- `configs/baseline.yaml.example`: standard configuration by Camgöz et al. (2020).
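As orientation for what `generate_config.py` produces, a training config typically groups dataset paths and training hyperparameters. The section and key names below are a sketch loosely following the `baseline.yaml.example` conventions; the paths and values are placeholders to be adjusted.

```yaml
# Sketch of a generated training config (keys and values illustrative only)
name: hands_and_whole_data
data:
    train: data/preprocessed/hands_and_whole.train   # placeholder paths - adjust
    dev: data/preprocessed/hands_and_whole.dev
    test: data/preprocessed/hands_and_whole.test
training:
    batch_size: 32
    epochs: 70
```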
The transformer can be fine-tuned with `python -m signjoey fine_tune <your-config> --ckpt <your-checkpoint>`. You can download a checkpoint of the pretrained original transformer under: https://drive.google.com/file/d/11YX0lTdkRF09xdT9UzuZ42zTvMyldR1I/view?usp=sharing
The links to all of the datasets can be found in the file `data/links.md`. We list only the best performing datasets here:
| Dataset | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | ROUGE | CHRF |
|---|---|---|---|---|---|---|
| Baseline | 68.60 | 66.37 | 65.48 | 64.93 | 70.60 | 71.98 |
| HandsAndWholeData | 76.38 | 74.85 | 74.20 | 73.81 | 77.25 | 80.10 |
| MouthAndWholeData | 68.65 | 66.79 | 66.12 | 65.80 | 68.58 | 72.24 |
| HandsMouthAndWholeData | 73.27 | 71.14 | 70.25 | 69.76 | 74.06 | 75.73 |
We want to thank our supervisors Özge Alacam and Beiduo Chen for their support and advice during the development of the project.
This project was finished on August 2nd, 2024.