Java code for the EMNLP paper: A Log-Linear Model for Unsupervised Text Normalization
The training file has the following format, where each nonstandard token carries the suffix :nonstd:
<s> standardToken1 standardToken2 ... nonstandardToken1:nonstd ... </s>
For example,
<s> hello , nice 2:nonstd meet u:nonstd ! </s>
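As an illustration of the format above, here is a minimal sketch of how one such line could be read in Java. The class TrainingLineParser, its Token type, and the parse method are hypothetical names chosen for this example and are not part of the released code. The sketch assumes tokens are separated by whitespace and that the <s> and </s> boundary markers can be skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the released code.
class TrainingLineParser {
    // Suffix that marks a nonstandard token in the training file.
    static final String NONSTD_SUFFIX = ":nonstd";

    // A token together with a flag saying whether it was marked nonstandard.
    static final class Token {
        final String text;
        final boolean nonstandard;

        Token(String text, boolean nonstandard) {
            this.text = text;
            this.nonstandard = nonstandard;
        }
    }

    // Splits a line on whitespace (an assumption), drops the <s> and </s>
    // markers, and strips the :nonstd suffix from marked tokens.
    static List<Token> parse(String line) {
        List<Token> tokens = new ArrayList<>();
        for (String raw : line.trim().split("\\s+")) {
            if (raw.isEmpty() || raw.equals("<s>") || raw.equals("</s>")) {
                continue;
            }
            boolean nonstd = raw.endsWith(NONSTD_SUFFIX)
                    && raw.length() > NONSTD_SUFFIX.length();
            String text = nonstd
                    ? raw.substring(0, raw.length() - NONSTD_SUFFIX.length())
                    : raw;
            tokens.add(new Token(text, nonstd));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String example = "<s> hello , nice 2:nonstd meet u:nonstd ! </s>";
        for (Token t : parse(example)) {
            System.out.println(t.text + "\t" + (t.nonstandard ? "nonstd" : "std"));
        }
    }
}
```

Run on the example line, this sketch yields seven tokens, of which "2" and "u" are flagged as nonstandard.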
SGD training code will be provided later.