Yelp2018SentimentAnalysis

Insights:

Stripping punctuation seems to increase performance by 1%, which suggests that better word tokenizing would help. (To try: a Punkt tokenizer from NLTK trained on the text, instead of the default tokenizer.)
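The preprocessing described in these notes (no punctuation, no stopwords, a fixed number of words per review) could be sketched roughly as follows. This is only an illustration: the `preprocess` name, the lowercasing step, the tiny stopword set, and the whitespace split are all assumptions, and the whitespace split is a stand-in for whatever tokenizer the project actually used, not NLTK's Punkt.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

# Tiny illustrative stopword set (assumption; the original list is not shown).
STOPWORDS = {"the", "a", "an", "and", "is", "was", "it", "to", "of"}


def preprocess(review, max_words=70, stopwords=STOPWORDS):
    """Lowercase, strip punctuation, drop stopwords, truncate to max_words."""
    cleaned = review.lower().translate(_PUNCT_TABLE)
    # Whitespace split as a simple stand-in for a real word tokenizer.
    tokens = [tok for tok in cleaned.split() if tok not in stopwords]
    return tokens[:max_words]


print(preprocess("The food was great, and the service... amazing!"))
# ['food', 'great', 'service', 'amazing']
```

Note that deleting punctuation outright also merges contractions ("don't" becomes "dont"), which is one reason a proper tokenizer may behave differently.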

Class balancing lowers accuracy by 5%.
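The notes do not say how the classes were balanced. One common approach is random undersampling, sketched below under that assumption; the `undersample` helper and its arguments are hypothetical and not taken from the project.

```python
import random
from collections import defaultdict


def undersample(samples, labels, seed=0):
    """Downsample every class to the size of the rarest class (assumed method)."""
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    # Size of the smallest class decides how many samples each class keeps.
    target = min(len(items) for items in by_label.values())

    rng = random.Random(seed)
    balanced = []
    for label, items in by_label.items():
        for sample in rng.sample(items, target):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced


reviews = ["review %d" % i for i in range(8)]
stars = [5, 5, 5, 5, 5, 5, 1, 1]
balanced = undersample(reviews, stars)
print(len(balanced))  # 4
```

A balanced training set no longer matches a skewed evaluation distribution, which is one possible reason for a drop like the one noted above.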

Ranking:

  1. 65.8% accuracy

70 words, no punctuation, no stopwords, FB vectors, CNN+LSTM

Epoch 200+ - 27s - loss: 0.4310 - percentageCorrect: 65.9728 - val_loss: 0.4347 - val_percentageCorrect: 65.7453


    Layer (type)                    Output Shape        Param #
    embedding_1 (Embedding)         (None, 70, 300)     57183900
    dropout_1 (Dropout)             (None, 70, 300)     0
    conv1d_1 (Conv1D)               (None, 66, 32)      48032
    max_pooling1d_1 (MaxPooling1D)  (None, 13, 32)      0
    lstm_1 (LSTM)                   (None, 70)          28840
    dense_1 (Dense)                 (None, 200)         14200
    dropout_2 (Dropout)             (None, 200)         0
    dense_2 (Dense)                 (None, 200)         40200
    dense_3 (Dense)                 (None, 1)           201

    Total params: 57,315,373
    Trainable params: 57,315,373
    Non-trainable params: 0
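The parameter counts in the summary can be checked by hand with the standard formulas for each layer type. The sketch below does that arithmetic; the vocabulary size (190613), the convolution kernel size (5), and the 'valid' padding are not stated in the notes and are inferred here from the reported shapes and counts, so treat them as assumptions.

```python
def embedding_params(vocab_size, dim):
    # One dim-sized vector per vocabulary entry.
    return vocab_size * dim


def conv1d_params(kernel_size, in_channels, filters):
    # Weights per filter plus one bias per filter.
    return kernel_size * in_channels * filters + filters


def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights and a bias.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    return input_dim * units + units


counts = {
    "embedding_1": embedding_params(190613, 300),  # inferred: 57183900 / 300
    "conv1d_1": conv1d_params(5, 300, 32),         # inferred: 70 - 5 + 1 = 66
    "lstm_1": lstm_params(32, 70),
    "dense_1": dense_params(70, 200),
    "dense_2": dense_params(200, 200),
    "dense_3": dense_params(200, 1),
}
total = sum(counts.values())
print(total)  # 57315373
```

Almost all of the parameters sit in the embedding layer, and the summary lists them as trainable.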


Predicting: Stars, Funny, Useful, Cool

Ranking:

  1. 66.2% accuracy

80 words, no punctuation, no stopwords, FB vectors, CNN

Epoch 123/200 - 12s - loss: 0.6433 - percentageCorrect: 66.8893 - val_loss: 0.6942 - val_percentageCorrect: 66.2160

    from keras.models import Sequential
    from keras.layers import (Conv1D, Dense, Dropout, Flatten,
                              MaxPooling1D, SpatialDropout1D)

    # embedding_layer and percentageCorrect are defined elsewhere in the project.
    model = Sequential()
    model.add(embedding_layer)
    model.add(Conv1D(filters=32, kernel_size=10, padding='same', activation='relu'))
    model.add(MaxPooling1D(pool_size=2))
    # NOTE: the SpatialDropout1D calls below are not passed to model.add(),
    # so the layers are never attached to the model and have no effect.
    SpatialDropout1D(0.5)
    model.add(Conv1D(filters=16, kernel_size=10, padding='same', activation='relu'))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Conv1D(filters=8, kernel_size=10, padding='same', activation='relu'))
    model.add(MaxPooling1D(pool_size=2))
    SpatialDropout1D(0.5)
    model.add(Conv1D(filters=8, kernel_size=10, padding='same', activation='relu'))
    model.add(MaxPooling1D(pool_size=2))
    SpatialDropout1D(0.5)
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.8))
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.8))
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.8))
    model.add(Dense(128, activation='relu'))
    # Four outputs: stars, funny, useful, cool.
    model.add(Dense(4))
    model.compile(loss='mean_absolute_error', optimizer='adam', metrics=[percentageCorrect])
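To see what the dense layers actually receive, the sequence length can be traced through the four convolution and pooling blocks. The sketch below assumes an input length of 80 (from "80 words") and the Keras defaults for MaxPooling1D (strides equal to the pool size, 'valid' padding); the helper names are made up for illustration.

```python
def same_conv_len(length):
    # padding='same' with stride 1 keeps the sequence length unchanged.
    return length


def pool_len(length, pool_size=2):
    # Assumes strides == pool_size and 'valid' padding (the Keras defaults).
    return (length - pool_size) // pool_size + 1


length = 80  # assumed input length, from "80 words"
filters_per_block = [32, 16, 8, 8]

shapes = []
for filters in filters_per_block:
    length = pool_len(same_conv_len(length))
    shapes.append((length, filters))

flattened = shapes[-1][0] * shapes[-1][1]
print(shapes)     # [(40, 32), (20, 16), (10, 8), (5, 8)]
print(flattened)  # 40
```

Under these assumptions Flatten produces only 40 features, which then feed the first Dense(512) layer.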


  2. 65.34% accuracy

80 words, punctuation kept, no stopwords, FB vectors, CNN

Epoch 87/200 - 11s - loss: 0.6843 - percentageCorrect: 65.4592 - val_loss: 0.6861 - val_percentageCorrect: 65.3440

Model: same layers and compile settings as the model listed for the 66.2% entry above.
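Both entries report a custom percentageCorrect metric whose definition is not shown in these notes. Since the models are trained as regressors with mean absolute error, one plausible reading is "the share of predictions that match the target after rounding to the nearest integer". The plain-Python sketch below implements that reading only; `percentage_correct` is a hypothetical name and may differ from the project's actual metric.

```python
def percentage_correct(y_true, y_pred):
    """Percentage of predictions equal to the target after rounding.

    Assumed definition, not taken from the project source.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        return 0.0
    hits = sum(1 for t, p in zip(y_true, y_pred) if round(p) == round(t))
    return 100.0 * hits / len(y_true)


stars_true = [5, 3, 1, 4]
stars_pred = [4.6, 3.4, 2.2, 4.1]
print(percentage_correct(stars_true, stars_pred))  # 75.0
```

A Keras version would do the same comparison on tensors, averaged over the batch.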


