Hello there, this is my GitHub repository where I push my assignments for the natural language processing online bootcamp hosted by Cupoy. The tables below show the topic(s) of each task. The tasks were originally designed to be completed daily, but as a student, I have to say that keeping up with the homework every day is beyond my capacity. So I work on it whenever I can (mostly during winter and summer vacations), and I also try to review previously completed notebooks/tasks when I'm not working on new assignments.

Task # | Task | Note |
---|---|---|
Day 1 | Python string operation | Changed filename formatting |
Day 2 | Python string operation | Changed filename formatting |
Day 3 | Regular expression (Regex) | Changed filename formatting |
Day 4 | Regular expression (Regex) in Python | |
Day 5 | Word segmentation - introduction (Markov Model, HMM, Viterbi algorithm) | Added comparison with solution |
Day 6 | Word segmentation with jieba | |
Day 7 | Word segmentation with Ckiptagger | |
Day 8 | N-gram (bigram counts, bigram probability) | |
Day 9 | N-gram (basic language model / next word prediction using n-gram) | |
Day 10 | Part-of-speech tagging - introduction | |
Day 11 | Part-of-speech tagging using jieba | |
Day 12 | Bag-of-words - introduction | |
Day 13 | Stemming and lemmatization - introduction | |
Day 14 | Text preprocessing (regex, text segmentation, stop words & stemming) | |
Day 15 | Term Frequency - Inverse Document Frequency (TF-IDF) | |
Day 16 | Word embedding & SVD (Singular value decomposition) | Added comments |
Day 17 | Word embedding, SVD, KNN, PPMI, TF-IDF & Co-occurrence matrix | |
Day 18 | Individual research - LDA/PCA/Supervised & unsupervised learning | Working on an individual research project for a mandatory course |
Day 19 | K-nearest neighbors algorithm practice with sklearn | |
Day 20 | K-nearest neighbors algorithm practice with sklearn | |
Day 21 | Naive Bayes - individual research assignment | |
Day 22 | Naive Bayes (handcrafted) | Added comments |
Day 23 | Naive Bayes (with scikit-learn) | |
Day 24 | Decision tree (Information gain) | 2021/08/28 Added comments |
Day 25 | Bias-variance tradeoff | |
Day 26 | Ensemble learning - Blending vs. Stacking | |
Day 27 | Implementation of random forest and decision tree | |
Day 28 | Tree-based models using scikit-learn | |
Day 29 | Final project 1 n-gram based word recommendation system (Part 1) | |
Day 30 | Final project 1 n-gram based word recommendation system (Part 2) | Interpolation/back-off smoothing |
Day 31 | Final project 2 News classifier (Part 1) | POS, BOW, Cosine similarity |
Day 32 | Final project 2 News classifier (Part 2) | TF-IDF and PCA |
Day 33 | Final project 2 News classifier (Part 3) | PPMI and SVD |
Day 34 | Final project 3 Spam filter (Part 1) | Comparison of different classifiers |
Day 35 | Final project 3 Spam filter (Part 2) | Implementation of filter |
Day 36 | Final project 4 Sentiment analysis | |
Day 37 | Final project 5 Latent sentiment analysis | |
Day 38 | Final project 6 Trigram application (Article spinner) | Added non-probabilistic replacement and 5-gram |
Day 39 | Final project 7 Rule-based chatbot (Single round) | |
Day 40-42 | Final project 8 Rule-based chatbot (Multiple-round) | Google Dialogflow and Line Bot integration |

Task # | Task | Note |
---|---|---|
Day 1 | Google Colab setup | |
Day 2 | Tensor operation / PyTorch | |
Day 3 | PyTorch autograd / differentiation / backpropagation | Added comments |
Day 4 | PyTorch data loading | Added comments |
Day 5 | PyTorch data loading | Added comments |
Day 6 | PyTorch natural language data loading (using torchtext) | Added comments |
Day 7 | PyTorch neural network model building | Added comments |
Day 8 | PyTorch - model modification, register_forward/backward_hook, and weight initialization | |
Day 9 | PyTorch - model building | Added notes on cross-entropy loss |
Day 10 | PyTorch - model training | |
Day 11 | (Individual research/reading assignment) Introduction to word2vec | |
Day 12 | (Individual research/reading assignment) Introduction to CBOW and skipgram | |
Day 13 | Implementing CBOW and skipgram with Python | |
Day 14 | (Individual research/reading assignment) Introduction to accelerating word2vec | |
Day 15 | Implementing accelerated word2vec using subsampling/Training a skipgram model | |
Day 16 | Introduction to gensim natural language processing toolkit | |
Day 17 | Using GloVe model with gensim | |
Day 18 | Introduction to Recurrent Neural Network (RNN) | |