Sentiment prediction for Virgin America and United Airlines tweet data using NLP. The data was obtained from ZS for a HackerEarth data scientist challenge.
Predict_Sentiments_of_Tweets contains the main code.
- Tweets are cleaned of punctuation, @ mentions, and link addresses.
- Text processing uses BeautifulSoup from bs4 and WordPunctTokenizer from nltk.tokenize.
- Classification of tweet sentiment (positive, neutral, negative).
- TfidfVectorizer with or without stop words, and an n-gram range of 1 to 2.
- RandomForestClassifier, LinearSVC, MultinomialNB, LogisticRegression
- 5-fold Cross-validation
- LinearSVC with bigrams and no stop words on cleaned tweets (accuracy: 74.48%).
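
The steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code in Predict_Sentiments_of_Tweets: the function names `clean_tweet` and `compare_models`, the regular expressions, and the classifier settings (`max_iter`, `n_estimators`, `random_state`) are all assumptions made for the example. It also assumes that "bigrams" means `ngram_range=(1, 2)` and that "no stop words" maps to `stop_words=None`; the notebook may define these differently.

```python
import re

from bs4 import BeautifulSoup
from nltk.tokenize import WordPunctTokenizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical patterns; the original cleaning rules may differ.
MENTION_RE = re.compile(r"@\w+")
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-zA-Z]+")

tokenizer = WordPunctTokenizer()


def clean_tweet(text):
    """Return a lowercased tweet without HTML entities, mentions, links, or punctuation."""
    # Decode HTML entities such as &amp; using the standard-library parser.
    text = BeautifulSoup(text, "html.parser").get_text()
    text = MENTION_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    # Replace every run of non-letter characters with a space, then lowercase.
    text = NON_LETTER_RE.sub(" ", text).lower()
    # Tokenize and re-join to collapse repeated whitespace.
    return " ".join(tokenizer.tokenize(text))


def compare_models(texts, labels, stop_words=None, ngram_range=(1, 2), folds=5):
    """Return the mean cross-validated accuracy of each classifier on TF-IDF features."""
    # Assumed hyperparameters, chosen only so the sketch runs reproducibly.
    candidates = {
        "RandomForestClassifier": RandomForestClassifier(n_estimators=100, random_state=0),
        "LinearSVC": LinearSVC(),
        "MultinomialNB": MultinomialNB(),
        "LogisticRegression": LogisticRegression(max_iter=1000),
    }
    results = {}
    for name, classifier in candidates.items():
        # Vectorizer and classifier share a pipeline so TF-IDF is fit per training fold.
        model = make_pipeline(
            TfidfVectorizer(stop_words=stop_words, ngram_range=ngram_range),
            classifier,
        )
        scores = cross_val_score(model, texts, labels, cv=folds, scoring="accuracy")
        results[name] = scores.mean()
    return results
```

Calling `compare_models([clean_tweet(t) for t in tweets], sentiments)` on a list of raw tweets and their labels returns a dictionary of mean accuracies, one per classifier. Passing `stop_words="english"` switches on scikit-learn's built-in English stop-word list, and `ngram_range=(1, 1)` restricts the features to unigrams.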