jonbeibeibei / ml_sentiment_analysis

Developing automated systems for analyzing sentiment information associated with social media data

Python 100.00%

ml_sentiment_analysis's Issues

Estimating transition parameters

Write a function that estimates the transition parameters from the training set using MLE (maximum likelihood estimation):

$$q(y_i \mid y_{i-1}) = \frac{\mathrm{Count}(y_{i-1}, y_i)}{\mathrm{Count}(y_{i-1})}$$

Please make sure the following special cases are also considered: $q(\mathrm{STOP} \mid y_n)$ and $q(y_1 \mid \mathrm{START})$.
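A minimal Python sketch of this estimator, assuming each training sentence has already been reduced to its list of tags. The function name `estimate_transitions` and the `START`/`STOP` padding symbols are illustrative choices, not taken from the repository. Padding every sequence lets the two special cases fall out of the same counting loop as the ordinary transitions.

```python
from collections import Counter


def estimate_transitions(tag_sequences):
    """MLE transition parameters q(cur | prev) = Count(prev, cur) / Count(prev).

    Hypothetical helper: tag_sequences is a list of tag lists, one per sentence.
    Each sequence is padded with START and STOP so that q(y_1 | START) and
    q(STOP | y_n) are estimated together with the ordinary transitions.
    """
    pair_counts = Counter()
    prev_counts = Counter()
    for tags in tag_sequences:
        padded = ["START"] + list(tags) + ["STOP"]
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            # Every occurrence of prev is the source of exactly one transition.
            prev_counts[prev] += 1
    return {(prev, cur): n / prev_counts[prev] for (prev, cur), n in pair_counts.items()}


q = estimate_transitions([["O", "B-positive"], ["O", "O"]])
print(q[("START", "O")], q[("O", "STOP")])  # 1.0 0.3333333333333333
```

Transitions never seen in training are simply absent from the returned dict, so a caller should treat a missing key as probability 0.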

Adding #UNK# to training set

One problem with estimating the emission parameters is that some words that appear in the test set do not appear in the training set. One simple way to handle this is as follows. First, replace the words that appear fewer than k times in the training set with a special token #UNK# before training. This produces a “modified training set”, which we then use to train our model.

During the testing phase, if a word does not appear in the “modified training set”, we replace that word with #UNK# as well.

Set k to 3 and incorporate this fix into your function for computing the emission parameters.
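The two-step #UNK# scheme can be sketched as follows, assuming sentences are stored as lists of `(word, tag)` pairs. `replace_rare_words` and `map_test_words` are hypothetical names used only for illustration.

```python
from collections import Counter

UNK = "#UNK#"


def replace_rare_words(sentences, k=3):
    """Build the "modified training set": words seen fewer than k times become #UNK#.

    Hypothetical helper: sentences is a list of sentences, each a list of
    (word, tag) pairs. Returns the modified sentences and their vocabulary.
    """
    counts = Counter(word for sent in sentences for word, _ in sent)
    modified = [[(word if counts[word] >= k else UNK, tag) for word, tag in sent]
                for sent in sentences]
    vocab = {word for sent in modified for word, _ in sent}
    return modified, vocab


def map_test_words(words, vocab):
    """At test time, replace any word absent from the modified training set with #UNK#."""
    return [w if w in vocab else UNK for w in words]


train = [[("good", "B-positive"), ("food", "O")],
         [("good", "B-positive"), ("bad", "B-negative")],
         [("good", "O")]]
modified, vocab = replace_rare_words(train, k=3)
print(map_test_words(["good", "great"], vocab))  # ['good', '#UNK#']
```

Because the emission estimator then runs on `modified` unchanged, #UNK# receives ordinary e(#UNK# | y) values learned from the rare words' tags.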

Parser

Read training data into a usable class
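One possible shape for such a class, assuming the training file holds one `word tag` pair per line with a blank line between sentences (this layout is an assumption; the issue does not state it). `TaggedCorpus`, `from_lines` and `tag_sequences` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedCorpus:
    """Hypothetical container: sentences is a list of [(word, tag), ...] lists."""
    sentences: list = field(default_factory=list)

    @classmethod
    def from_lines(cls, lines):
        """Parse 'word tag' lines; a blank line ends the current sentence."""
        corpus, current = cls(), []
        for raw in lines:
            line = raw.strip()
            if not line:
                if current:
                    corpus.sentences.append(current)
                    current = []
                continue
            # The tag is taken to be the last whitespace-separated field.
            word, tag = line.rsplit(None, 1)
            current.append((word, tag))
        if current:  # the file may not end with a blank line
            corpus.sentences.append(current)
        return corpus

    @property
    def tag_sequences(self):
        """Tags only, one list per sentence (the input transition estimation needs)."""
        return [[tag for _, tag in sent] for sent in self.sentences]


corpus = TaggedCorpus.from_lines(["Great B-positive\n", "food O\n", "\n", "Bad B-negative\n"])
print(corpus.tag_sequences)  # [['B-positive', 'O'], ['B-negative']]
```

Since `from_lines` accepts any iterable of strings, a real file can be passed directly inside `with open(path, encoding="utf-8") as f:`.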

Function for emission parameters

Point 1 of part 2:

Write a function that estimates the emission parameters from the training set using MLE (maximum likelihood estimation):
$$e(x \mid y) = \frac{\mathrm{Count}(y \to x)}{\mathrm{Count}(y)}$$
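A minimal sketch of the emission estimator, assuming the same list-of-`(word, tag)`-pairs representation. `estimate_emissions` is a hypothetical name; keying the result by `(word, tag)` mirrors the order of e(x | y).

```python
from collections import Counter


def estimate_emissions(sentences):
    """MLE emission parameters e(x | y) = Count(y -> x) / Count(y).

    Hypothetical helper: sentences is a list of [(word, tag), ...] lists
    (ideally the #UNK#-modified training set). Returns {(word, tag): e(word | tag)}.
    """
    emit_counts = Counter()
    tag_counts = Counter()
    for sent in sentences:
        for word, tag in sent:
            emit_counts[(word, tag)] += 1
            tag_counts[tag] += 1
    return {(word, tag): n / tag_counts[tag] for (word, tag), n in emit_counts.items()}


e = estimate_emissions([[("good", "B-positive"), ("food", "O")], [("good", "O")]])
print(e[("good", "B-positive")], e[("good", "O")])  # 1.0 0.5
```

For each tag y, the values e(x | y) over all words x sum to 1, which is a quick sanity check on the counts.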

Get work done

Sentiment analysis

Implement a simple sentiment analysis system that produces the tag
$$y^* = \arg\max_y e(x \mid y)$$
for each word x in the sequence.

For all four datasets (EN, FR, CN, and SG), learn these parameters with train, and evaluate your system on the development set dev.in of each dataset. Write your output to dev.p2.out for each of the four datasets. Compare your outputs with the gold-standard outputs in dev.out and report the precision, recall and F scores of this baseline system for each dataset.
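A sketch of the baseline tagger, assuming emissions are stored as a dict keyed by `(word, tag)` and estimated on the #UNK#-modified training set. `baseline_tag` is a hypothetical name, and the fallback tag used when a word has no emission entry at all is an assumption rather than part of the task.

```python
UNK = "#UNK#"


def baseline_tag(words, emissions, vocab, default="O"):
    """Tag each word independently with y* = argmax_y e(x | y).

    Hypothetical helper: emissions maps (word, tag) -> e(word | tag) estimated
    on the #UNK#-modified training set, and vocab is that set's vocabulary.
    Falling back to `default` when a word has no emission entry at all
    (e.g. #UNK# never occurred in training) is an assumption.
    """
    # Group candidate tags by word so each lookup only scans that word's tags.
    by_word = {}
    for (word, tag), p in emissions.items():
        by_word.setdefault(word, {})[tag] = p
    tags = []
    for w in words:
        x = w if w in vocab else UNK
        candidates = by_word.get(x)
        tags.append(max(candidates, key=candidates.get) if candidates else default)
    return tags


emissions = {("good", "B-positive"): 0.5, ("good", "O"): 0.1,
             ("#UNK#", "O"): 0.3, ("#UNK#", "B-negative"): 0.2}
print(baseline_tag(["good", "meh"], emissions, {"good", "#UNK#"}))  # ['B-positive', 'O']
```

Because each word is tagged in isolation, this baseline ignores the transition parameters entirely.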

The precision score is defined as follows:
Precision = Total number of correctly predicted entities / Total number of predicted entities

The recall score is defined as follows:
Recall = Total number of correctly predicted entities / Total number of gold entities

where a gold entity is a true entity annotated in the reference output file, and a predicted entity is regarded as correct if and only if it exactly matches a gold entity (i.e., both their boundaries and sentiment are exactly the same).

Finally, the F score is defined as follows:
F = 2 / (1/Precision + 1/Recall)

You can use the evaluation script shared with you to calculate these scores. However, you are strongly encouraged to understand how the scores are calculated.

Note: in some cases, you might have an output sequence that consists of a transition from O to I-negative (rather than B-negative). For example, “O I-negative I-negative O”. In this case, the second and third words should be regarded as one entity with negative sentiment.
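The sketch below mirrors these definitions, assuming tags of the form `B-<sentiment>`, `I-<sentiment>` and `O`. It is not the shared evaluation script; `extract_entities` and `precision_recall_f` are hypothetical names. Each entity becomes a `(sentence, start, end, sentiment)` tuple, so "exact match" reduces to set intersection. Following the note, an `I-` tag that does not continue an open entity of the same sentiment starts a new one; treating a sentiment switch such as `B-positive I-negative` the same way is an assumption.

```python
def extract_entities(tags):
    """Return {(start, end, sentiment)} spans from a B-/I-/O tag sequence.

    An I- tag that does not continue an open entity of the same sentiment
    starts a new entity, so "O I-negative I-negative O" yields one negative
    entity covering positions 1-2.
    """
    entities = set()
    start, sentiment = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel O closes a trailing entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and start is not None and label == sentiment
        if start is not None and not continues:
            entities.add((start, i - 1, sentiment))
            start, sentiment = None, None
        if prefix in ("B", "I") and not continues:
            start, sentiment = i, label
    return entities


def precision_recall_f(gold_sentences, pred_sentences):
    """Exact-match precision, recall and F over parallel lists of tag sequences."""
    gold, pred = set(), set()
    for k, (g, p) in enumerate(zip(gold_sentences, pred_sentences)):
        gold |= {(k,) + span for span in extract_entities(g)}
        pred |= {(k,) + span for span in extract_entities(p)}
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    # F is undefined when either score is 0; reporting 0.0 is a common convention.
    f = 2 / (1 / precision + 1 / recall) if precision and recall else 0.0
    return precision, recall, f


gold = [["O", "B-positive", "I-positive", "O"], ["B-negative", "O"]]
pred = [["O", "B-positive", "I-positive", "O"], ["O", "I-negative"]]
print(precision_recall_f(gold, pred))  # (0.5, 0.5, 0.5)
```

In this example the second predicted entity has the right sentiment but the wrong boundaries, so it counts as incorrect under the exact-match rule.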

Implement the Viterbi algorithm

Using the estimated transition and emission parameters, implement the Viterbi algorithm to compute the following (for a sentence with n words):

$$y_1^*, \ldots, y_n^* = \arg\max_{y_1, \ldots, y_n} p(x_1, \ldots, x_n, y_1, \ldots, y_n)$$

For all datasets, learn the model parameters with train. Run the Viterbi algorithm on the development set dev.in using the learned models, and write your output to dev.p3.out for each of the four datasets. Report the precision, recall and F scores of all systems.

Note: if you encounter numerical underflow, think of a way to address it in your implementation.
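A minimal log-space sketch, assuming the parameters are stored as dicts: `q` keyed by `(prev_tag, tag)` and including `START` and `STOP`, and `e` keyed by `(word, tag)`. The name `viterbi` and this interface are hypothetical. Summing log-probabilities instead of multiplying probabilities is one standard way to avoid the underflow mentioned in the note.

```python
import math


def viterbi(words, tags, q, e):
    """Most likely tag sequence y_1..y_n for words x_1..x_n under a first-order HMM.

    Hypothetical interface: tags lists the non-special states; q maps
    (prev_tag, tag) -> probability including START and STOP; e maps
    (word, tag) -> probability, with words already mapped to #UNK# as needed.
    Missing entries count as probability 0. Scores are log-probabilities
    to avoid numerical underflow on long sentences.
    """
    if not words:
        return []

    def log(p):
        return math.log(p) if p > 0 else -math.inf

    n = len(words)
    score = [{} for _ in range(n)]  # score[i][t]: best log-prob of a path ending in t at i
    back = [{} for _ in range(n)]   # back[i][t]: predecessor tag on that best path
    for t in tags:
        score[0][t] = log(q.get(("START", t), 0)) + log(e.get((words[0], t), 0))
    for i in range(1, n):
        for t in tags:
            prev = max(tags, key=lambda s: score[i - 1][s] + log(q.get((s, t), 0)))
            score[i][t] = (score[i - 1][prev] + log(q.get((prev, t), 0))
                           + log(e.get((words[i], t), 0)))
            back[i][t] = prev
    # The final step includes the transition into STOP.
    last = max(tags, key=lambda s: score[n - 1][s] + log(q.get((s, "STOP"), 0)))
    path = [last]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


tags = ["O", "B-positive"]
q = {("START", "O"): 0.6, ("START", "B-positive"): 0.4, ("O", "O"): 0.5,
     ("O", "B-positive"): 0.3, ("O", "STOP"): 0.2, ("B-positive", "O"): 0.7,
     ("B-positive", "STOP"): 0.3}
e = {("good", "B-positive"): 0.8, ("good", "O"): 0.1,
     ("food", "O"): 0.5, ("food", "B-positive"): 0.1}
print(viterbi(["good", "food"], tags, q, e))  # ['B-positive', 'O']
```

Each position compares every predecessor tag for every current tag, so decoding costs O(n·T²) for T tags, versus Tⁿ sequences for brute-force enumeration.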
