You need to download the dataset and some tools:

- Download the Yelp dataset.
- Convert the following files from JSON to CSV format using `json_to_csv_converter.py`:
  - `yelp_academic_dataset_review.json`
  - `yelp_academic_dataset_user.json`
- Download LIBLINEAR. After downloading `liblinear`, you can refer to its Installation instructions to install it. It is suggested that you put `liblinear` under the directory `SocializedWordEmbeddings`.
- Download Stanford CoreNLP. Only `stanford-corenlp.jar` is required. `SocializedWordEmbeddings/preprocess/Split_NN.jar` and `SocializedWordEmbeddings/preprocess/Split_PPL.jar` need to reference `stanford-corenlp.jar`. It is suggested that after getting `stanford-corenlp.jar` you put it under the directory `SocializedWordEmbeddings/resources`; otherwise, you should modify the default `Class-Path` in `Split_NN.jar` and `Split_PPL.jar`.
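The Yelp dumps are JSON-lines files (one JSON object per line). As a rough sketch of what `json_to_csv_converter.py` does — assuming flat records and ignoring the nested-field handling the real script may perform — the conversion looks like this:

```python
import csv
import io
import json

def json_lines_to_csv(json_lines, out_file):
    """Convert JSON-lines records to CSV, using the sorted keys of the
    first record as the header row."""
    records = [json.loads(line) for line in json_lines if line.strip()]
    fieldnames = sorted(records[0].keys())
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    for rec in records:
        writer.writerow(rec)

# Tiny illustrative input; real input is e.g. yelp_academic_dataset_user.json.
lines = ['{"user_id": "u1", "stars": 5}', '{"user_id": "u2", "stars": 3}']
buf = io.StringIO()
json_lines_to_csv(lines, buf)
print(buf.getvalue())
```

This is only an illustration of the format change; use the converter script shipped with the Yelp dataset for the real files.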
`cd SocializedWordEmbeddings/preprocess`

Modify `./run.py` by specifying `--input` (the path to the Yelp dataset), then run `python run.py`.
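The `--input` flag follows standard argparse conventions; a hypothetical sketch of how `preprocess/run.py` might declare it (only the flag name comes from this README, the rest is an assumption):

```python
import argparse

# Hypothetical declaration of the --input option; only the flag name
# is taken from the README.
parser = argparse.ArgumentParser(description="Preprocess the Yelp dataset")
parser.add_argument("--input", required=True, help="Path to the Yelp dataset")

# Normally parse_args() reads sys.argv; a list is passed here for illustration.
args = parser.parse_args(["--input", "/data/yelp"])
print(args.input)
```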
`cd SocializedWordEmbeddings/train`

You may modify the following arguments in `./run.py`:

- `--para_lambda` The trade-off parameter between the log-likelihood and the regularization term
- `--para_r` The constraint on the L2-norm of the user vector
- `--yelp_round` The round number of the Yelp data, e.g. {8, 9}

Then run `python run.py`.
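An L2-norm constraint like `--para_r` is typically enforced by projecting each user vector back onto the ball of radius `r` after an update; a small numpy sketch under that assumption (the training code's actual update rule may differ):

```python
import numpy as np

def project_to_l2_ball(u, r):
    """If u lies outside the L2 ball of radius r, scale it back onto
    the ball's surface; otherwise return it unchanged."""
    norm = np.linalg.norm(u)
    if norm > r:
        u = u * (r / norm)
    return u

u = np.array([3.0, 4.0])           # L2 norm 5
print(project_to_l2_ball(u, 1.0))  # rescaled to norm 1
```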
`cd SocializedWordEmbeddings/sentiment`

You may modify the following arguments in `./run.py`:

- `--para_lambda` The trade-off parameter between the log-likelihood and the regularization term
- `--para_r` The constraint on the L2-norm of the user vector
- `--yelp_round` The round number of the Yelp data, e.g. {8, 9}

Then run `python run.py`.
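The sentiment step uses LIBLINEAR, which reads one example per line in `label index:value ...` format with 1-based, ascending feature indices. A minimal sketch of writing that format (the labels and feature values here are made up for illustration):

```python
def to_liblinear_line(label, features):
    """Format one example as 'label idx:val ...', with feature indices
    sorted ascending as LIBLINEAR requires."""
    parts = [str(label)]
    for idx in sorted(features):
        parts.append("%d:%g" % (idx, features[idx]))
    return " ".join(parts)

# e.g. a positive example with features 1 and 3 set
print(to_liblinear_line(1, {3: 0.5, 1: 2.0}))
```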
`cd SocializedWordEmbeddings/perplexity`

You may modify the following arguments in `./run.py`:

- `--para_lambda` The trade-off parameter between the log-likelihood and the regularization term
- `--para_r` The constraint on the L2-norm of the user vector
- `--yelp_round` The round number of the Yelp data, e.g. {8, 9}

Then run `python run.py`.
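Perplexity is the exponentiated average negative log-probability per word. How this repository computes it is not shown here, but the standard definition can be sketched as:

```python
import math

def perplexity(log_probs):
    """Perplexity = exp(-(1/N) * sum of per-word natural-log probabilities).
    Lower is better: the model is less 'surprised' by the text."""
    return math.exp(-sum(log_probs) / len(log_probs))

# A uniform model over a 10-word vocabulary has perplexity 10.
print(perplexity([math.log(0.1)] * 4))
```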
We thank Tao Lei, as our code is developed based on his.
You can reproduce our results under different settings (Table 5 in the paper) by modifying `SocializedWordEmbeddings/attention/run.sh`:

[1] Add user and word embeddings by specifying `--user_embs` and `--embedding`.

[2] Add train/dev/test files by specifying `--train`, `--dev`, and `--test`, respectively.

[3] The three settings of our experiments can be achieved by specifying `--user_atten` and `--user_atten_base`:

- `--user_atten 0` for "Without attention".
- `--user_atten 1 --user_atten_base 1` for "Trained attention".
- `--user_atten 1 --user_atten_base 0` for "Fixed user vector as attention".
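The settings differ in how attention weights over the word states are produced from the user vector. A numpy sketch of the "fixed user vector as attention" idea — weights as a softmax of the dot products between a user vector `u` and each word state — with shapes and scoring function assumed, not taken from the repository's Theano code:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def user_attention_pool(H, u):
    """Pool word states H (n_words x dim) into a single vector, weighting
    each state by the softmax of its dot product with user vector u."""
    scores = H.dot(u)            # (n_words,) relevance of each word to the user
    weights = softmax(scores)    # attention distribution over words
    return weights.dot(H)        # (dim,) weighted average of word states

H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
u = np.array([0.5, -0.5])
print(user_attention_pool(H, u))
```

In the "trained attention" setting the scoring function itself would be learned rather than fixed to this dot product.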
- Python 2.7
- Theano >= 0.7
- Numpy
- Gensim
- PrettyTable