Reproducibility study of the paper "Text Summarization with Pretrained Encoders" by Yang Liu and Mirella Lapata. In contrast to the original implementation, we use the Julia language with Flux.jl and Transformers.jl to build the model.
The study is conducted as part of the Data Mining lecture at Martin Luther University Halle-Wittenberg.
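For orientation: the extractive variant of Liu and Lapata's model scores sentence representations produced by a pretrained BERT encoder. The sketch below illustrates that idea, assuming the pre-1.0 Transformers.jl API (Transformers.Pretrain and Transformers.Basic); the actual model is defined in the notebook, and the sigmoid head here is only a stand-in for it.

    using Flux
    using Transformers
    using Transformers.Basic
    using Transformers.Pretrain

    ENV["DATADEPS_ALWAYS_ACCEPT"] = true  # auto-accept the BERT weight download

    # Load a pretrained BERT encoder with its tokenizer and wordpiece model.
    bert_model, wordpiece, tokenizer = pretrain"bert-uncased_L-12_H-768_A-12"
    vocab = Vocabulary(wordpiece)

    # Tokenize a toy input; BERTSUM wraps each sentence in [CLS] ... [SEP].
    tokens = ["[CLS]"; wordpiece(tokenizer("a tiny example sentence")); "[SEP]"]
    sample = (tok = vocab(tokens), segment = fill(1, length(tokens)))

    # Encode: embedding layer, then the transformer stack (768 × sequence length).
    features = sample |> bert_model.embed |> bert_model.transformers

    # Score each position with a sigmoid head; extractive summarization keeps the
    # scores at the [CLS] positions as per-sentence summary probabilities.
    classifier = Dense(768, 1, σ)
    scores = classifier(features)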
- Install Julia.
- Set up PyCall.jl to use a Conda environment (sketched below).
  julia --project=./ ./src/setup_python.jl
- Start the Pluto notebook (also sketched below).
  julia --project=./ ./src/start_notebook.jl
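The setup script itself is not reproduced here; making PyCall.jl use a Conda environment conventionally comes down to the standard PyCall build mechanism, roughly as follows (a sketch, not necessarily the exact contents of src/setup_python.jl):

    using Pkg

    # An empty PYTHON setting makes PyCall fall back to its own Miniconda
    # distribution, which is managed through Conda.jl.
    ENV["PYTHON"] = ""
    Pkg.build("PyCall")

    # Python packages can then be added to that environment, e.g.:
    using Conda
    Conda.add("numpy")  # "numpy" is only an illustrative package name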
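Likewise, starting Pluto from src/start_notebook.jl presumably amounts to little more than the following; the host binding is an assumption carried over from the Docker port mapping used further below:

    using Pluto

    # Bind to all interfaces so the notebook is also reachable when the
    # process runs inside a container (cf. the Docker section below).
    Pluto.run(host="0.0.0.0", port=1234)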
To train the model, tick the corresponding checkbox in the notebook. Alternatively, run the training script, which just runs the training from the notebook:
julia --project=./ ./src/training.jl
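Schematically, the training follows the usual Flux.jl pattern. The sketch below uses randomly generated stand-in features and labels rather than the actual data pipeline from the notebook:

    using Flux
    using Flux.Losses: binarycrossentropy

    # Stand-in data: 10 "documents", each with 5 sentence vectors (768-dim,
    # as produced by BERT) and binary labels marking summary sentences.
    data = [(randn(Float32, 768, 5), rand(Bool, 5)) for _ in 1:10]

    classifier = Dense(768, 1, σ)  # sigmoid scorer over sentence vectors
    opt = ADAM(2e-3)
    ps = Flux.params(classifier)

    for (x, y) in data
        gs = gradient(ps) do
            binarycrossentropy(vec(classifier(x)), y)
        end
        Flux.Optimise.update!(opt, ps, gs)
    end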
If needed (e.g., on a GPU cluster), you can install Julia locally:
./install-julia.sh
You can then run Julia with ./julia.
- Install Docker.
- Build a Docker image of this project.
  docker build -t text-summarization-reproducability .
- Start the Pluto notebook in a container.
  docker run -p 1234:1234 -it text-summarization-reproducability
Note that Julia runs rather slowly inside Docker.
To keep code quality high, all commits are atomic; this can be checked with the following command:
git --no-pager log --all --graph --no-color --date=short --pretty='format:%h%d (%s, %ad)'
This project is MIT licensed, so you can use the code for whatever you want as long as you mention this repository.