# Bingo Bango Bongo
"""
- Tokenize the text into sentences and words.
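A minimal sketch of the tokenization step, assuming the raw corpus lives in a string named text and that NLTK is available; both are illustrative choices, not fixed requirements:

    import nltk

    # Split the raw corpus into sentences, then each sentence into word tokens.
    # Requires the "punkt" tokenizer models (nltk.download("punkt")).
    sentences = nltk.sent_tokenize(text.lower())
    tokenized_sentences = [nltk.word_tokenize(sent) for sent in sentences]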
- Remove infrequent words; anything outside the vocabulary gets replaced with UNKNOWN_TOKEN.
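One way to do this, continuing the sketch above: keep only the most frequent words (the vocabulary size of 8000 below is an assumed hyperparameter) and map every other word to the unknown token:

    vocabulary_size = 8000          # assumed value; free hyperparameter
    unknown_token = "UNKNOWN_TOKEN"

    # Count word frequencies across the whole corpus and keep the top words.
    word_freq = nltk.FreqDist(w for sent in tokenized_sentences for w in sent)
    vocab = set(w for w, _ in word_freq.most_common(vocabulary_size - 1))

    # Replace every out-of-vocabulary word with the unknown token.
    tokenized_sentences = [
        [w if w in vocab else unknown_token for w in sent]
        for sent in tokenized_sentences
    ]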
- Prepend a special START token and append an END token to every sentence --> the model learns which words tend to start / end sentences.
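Wrapping each sentence with the boundary tokens (the token names are illustrative):

    sentence_start_token = "SENTENCE_START"
    sentence_end_token = "SENTENCE_END"

    # Mark sentence boundaries so the model can learn them like any other word.
    tokenized_sentences = [
        [sentence_start_token] + sent + [sentence_end_token]
        for sent in tokenized_sentences
    ]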
- Build the training data matrices --> learn the index of each word in the vocabulary (word_to_index and index_to_word lookups).
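A sketch of the two lookups; the ordering is arbitrary as long as it is used consistently:

    # Every token still present (vocabulary words, UNKNOWN_TOKEN, boundary tokens).
    index_to_word = sorted(set(w for sent in tokenized_sentences for w in sent))
    word_to_index = {w: i for i, w in enumerate(index_to_word)}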
- The input to the model is a sequence of words: a matrix x of one-hot vectors, each with a single 1 at the word's vocabulary index. Words not in the vocabulary have already been mapped to UNKNOWN_TOKEN, so every token has an index.
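Because x_t is one-hot, multiplying U by x_t just selects the column U[:, x_t], so it is enough to store each sentence as a sequence of indices. A sketch of the training arrays (input = sentence minus the last token, target = sentence shifted by one):

    import numpy as np

    # Each example predicts the next word, so targets are inputs shifted by one.
    X_train = [[word_to_index[w] for w in sent[:-1]] for sent in tokenized_sentences]
    y_train = [[word_to_index[w] for w in sent[1:]] for sent in tokenized_sentences]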
- Initialize the RNN model: instance variables and parameters (U, V, W) --> will implement a Theano version later...
  --> For the activation tanh(U x_t + W s_{t-1}), it is best to initialize our parameter weights uniformly from [-1/sqrt(n), 1/sqrt(n)], where n is the number of incoming connections from the previous layer.
  hidden_dim => the size of our hidden layer (our choice); word_dim => the size of our vocabulary.
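A numpy sketch of the initialization; the class name RNNNumpy is an assumption (the Theano version comes later). n is word_dim for U and hidden_dim for V and W:

    import numpy as np

    class RNNNumpy:
        def __init__(self, word_dim, hidden_dim=100):
            self.word_dim = word_dim      # vocabulary size
            self.hidden_dim = hidden_dim  # hidden layer size (our choice)
            # Uniform init in [-1/sqrt(n), 1/sqrt(n)], n = incoming connections.
            self.U = np.random.uniform(-1.0 / np.sqrt(word_dim),
                                        1.0 / np.sqrt(word_dim),
                                        (hidden_dim, word_dim))
            self.V = np.random.uniform(-1.0 / np.sqrt(hidden_dim),
                                        1.0 / np.sqrt(hidden_dim),
                                        (word_dim, hidden_dim))
            self.W = np.random.uniform(-1.0 / np.sqrt(hidden_dim),
                                        1.0 / np.sqrt(hidden_dim),
                                        (hidden_dim, hidden_dim))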
- Forward Propagation
  --> Predicting word probabilities: s_t = tanh(U x_t + W s_{t-1}) and o_t = softmax(V s_t).
  --> We save all hidden states in s because we need them again later (for BPTT). The initial hidden state is initialized to 0.
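A sketch of forward propagation for one sentence x (a list of word indices), attached to the RNNNumpy class from above:

    def softmax(z):
        # Numerically stable softmax over a vector.
        e = np.exp(z - np.max(z))
        return e / e.sum()

    def forward_propagation(self, x):
        T = len(x)
        # One extra all-zero row: s[-1] serves as the initial hidden state.
        s = np.zeros((T + 1, self.hidden_dim))
        o = np.zeros((T, self.word_dim))
        for t in range(T):
            # U @ one_hot(x[t]) is just the column U[:, x[t]].
            s[t] = np.tanh(self.U[:, x[t]] + self.W.dot(s[t - 1]))
            o[t] = softmax(self.V.dot(s[t]))
        return o, s

    RNNNumpy.forward_propagation = forward_propagation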
- Prediction
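One simple realization of prediction: take the highest-probability word at each time step (a sketch, not the only possible choice):

    def predict(self, x):
        # Greedy prediction: the index of the most probable word at each step.
        o, s = self.forward_propagation(x)
        return np.argmax(o, axis=1)

    RNNNumpy.predict = predict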
- Loss function
  --> Calculate the loss to measure the errors the model makes.
  --> Find the parameters U, V, and W that minimize the loss function for our training data.
  --> (Cross-Entropy Loss): if we have N training examples (words in our text) and C classes (the size of our vocabulary),
      then the loss with respect to our predictions o and the true labels y is given by:
      L(y, o) = -(1/N) * sum_{n in N} y_n * log(o_n)
  Implemented with calculate_total_loss (sketched below).
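A sketch of the loss; since y_n is one-hot, only the probability the model assigned to the correct word contributes to each term:

    def calculate_total_loss(self, X, Y):
        loss = 0.0
        for x, y in zip(X, Y):
            o, s = self.forward_propagation(x)
            # Probabilities assigned to the correct next words.
            correct_word_predictions = o[np.arange(len(y)), y]
            loss += -np.sum(np.log(correct_word_predictions))
        return loss

    def calculate_loss(self, X, Y):
        # Normalize by N, the total number of training words.
        N = sum(len(y) for y in Y)
        return self.calculate_total_loss(X, Y) / N

    RNNNumpy.calculate_total_loss = calculate_total_loss
    RNNNumpy.calculate_loss = calculate_loss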
- Back Propagation Through Time (BPTT)
  --> Compute the gradients of the loss with respect to U, V, and W by propagating the errors backwards through the unrolled network.
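A minimal, untruncated BPTT sketch (practical versions often truncate the inner loop to a few steps back): push the softmax error through V, then backwards through time through W and U:

    def bptt(self, x, y):
        T = len(y)
        o, s = self.forward_propagation(x)
        dLdU = np.zeros(self.U.shape)
        dLdV = np.zeros(self.V.shape)
        dLdW = np.zeros(self.W.shape)
        # Gradient of cross-entropy w.r.t. V s_t is (o_t - y_t).
        delta_o = o.copy()
        delta_o[np.arange(T), y] -= 1.0
        for t in reversed(range(T)):
            dLdV += np.outer(delta_o[t], s[t])
            # Error at the hidden state, through the tanh nonlinearity.
            delta_t = self.V.T.dot(delta_o[t]) * (1 - s[t] ** 2)
            for step in reversed(range(t + 1)):
                dLdW += np.outer(delta_t, s[step - 1])
                dLdU[:, x[step]] += delta_t
                # Carry the error one step further back in time.
                delta_t = self.W.T.dot(delta_t) * (1 - s[step - 1] ** 2)
        return dLdU, dLdV, dLdW

    RNNNumpy.bptt = bptt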
"""