The key takeaways from this section include:
- NLP has become increasingly popular over the past few years, and NLP researchers have produced many valuable insights
- The Natural Language Toolkit (NLTK) is one of the most popular Python libraries for NLP
- Regular expressions (regex) are an important part of NLP and can be used for pattern matching and filtering
- Regular expressions can become confusing, so use the provided cheat sheet the first few times you work with regex
- It is strongly recommended that you spend some time with regex tester websites so you can see how changing your pattern affects the results as you work toward a correct answer
- Feature engineering is essential when working with text data, and it helps you understand the dynamics of your text
- Common feature engineering techniques include removing stop words, stemming, lemmatization, and generating n-grams
- When diving deeper into grammar and linguistics, context-free grammars and part-of-speech tagging are important
- In this context, parse trees can help computers deal with ambiguous words by making the grammatical structure of a sentence explicit
- How you clean and preprocess your data will have a major effect on the conclusions you'll be able to draw in your NLP classification problems
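
As a quick recap of how several of these ideas fit together, here is a minimal sketch that uses only Python's built-in `re` module: a regex tokenizer, stop word removal, and bigram generation. The function names (`tokenize`, `remove_stop_words`, `ngrams`) and the tiny `STOP_WORDS` set are illustrative assumptions, not NLTK's API; in practice you would likely use NLTK's tokenizers and its much fuller stop word list.

```python
import re

# Tiny illustrative stop word set (an assumption for this sketch;
# NLTK provides a far more complete list).
STOP_WORDS = {"the", "a", "an", "is", "on", "and", "of", "to", "in"}


def tokenize(text):
    """Lowercase the text and extract runs of letters/apostrophes with a regex."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Keep only the tokens that are not in the stop word set."""
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "The cat sat on the mat, and the dog barked."
tokens = tokenize(sentence)
content_tokens = remove_stop_words(tokens)
bigrams = ngrams(content_tokens, 2)

print(content_tokens)  # ['cat', 'sat', 'mat', 'dog', 'barked']
print(bigrams)
```

Notice that the order of the steps matters: because stop words are removed before the bigrams are built, `('mat', 'dog')` shows up as a bigram even though those words were not adjacent in the original sentence. This is one small example of how preprocessing choices shape the features your model ends up seeing.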