Run ./normalize.rb to normalize the scripts in /raw and output a tarball.
This normalization/tokenization was a preprocessing step on the GoT data so that it could be plugged into the TensorFlow LSTM example I built here.
Inserting the tokens made it easier to standardize structure that was common to all the scripts but denoted differently from one script to the next.
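
As a rough illustration of the idea (not the actual contents of normalize.rb), the sketch below maps two different ways of writing a scene heading, and one way of writing a speaker line, onto shared tokens. The token names (<SCENE>, <SPEAKER>), the method names, and the input conventions are all assumptions made up for this example; the real script may use different tokens and handle other cases.

```ruby
# Hypothetical token names; the real normalize.rb may use different ones.
SCENE_TOKEN   = "<SCENE>"
SPEAKER_TOKEN = "<SPEAKER>"

# Assumed input conventions (not taken from the actual scripts):
#   scene headings:  "INT. WINTERFELL"  or  "[Winterfell courtyard]"
#   speaker lines:   "Tyrion: I drink and I know things."
def normalize_line(line)
  text = line.strip
  case text
  when /\A(?:INT\.|EXT\.)\s*(.+)\z/i
    # Screenplay-style heading
    "#{SCENE_TOKEN} #{$1.strip}"
  when /\A\[(.+)\]\z/
    # Bracketed heading
    "#{SCENE_TOKEN} #{$1.strip}"
  when /\A([A-Z][A-Za-z .'-]*?):\s*(.*)\z/
    # "Name: dialogue" becomes a speaker token, upper-cased name, then dialogue
    "#{SPEAKER_TOKEN} #{$1.strip.upcase} #{$2}".strip
  else
    # Anything else passes through unchanged
    text
  end
end

# Normalize a whole script, dropping blank lines.
def normalize_script(raw)
  raw.each_line.map { |l| normalize_line(l) }.reject(&:empty?).join("\n")
end
```

With these assumed rules, "INT. WINTERFELL" and "[WINTERFELL]" both come out as the same "<SCENE> WINTERFELL" line, which is the kind of uniform structure a downstream model can pick up on.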