buptldy / sentiment-analysis

Chinese Shopping Reviews sentiment analysis

Home Page: http://buptldy.github.io/2016/07/20/2016-07-20-sentiment%20analysis/

Python 100.00%

sentiment-analysis's Introduction

#Requirements

Unix/Linux system
Python 2.7
Python packages to install: keras, sklearn, gensim, jieba, h5py, numpy, pandas

sudo pip install -r requirements.txt

Usage

Sentiment classification with the SVM classifier:

python predict.py svm 这个手机质量很好，我很喜欢，不错不错

python predict.py svm 这书的印刷质量的太差了吧，还有缺页，以后再也不买了

Sentiment classification with the LSTM:

python predict.py lstm 酒店的环境不错，空间挺大的

python predict.py lstm 电脑散热太差，偶尔还蓝屏，送货也太慢了吧

#Programs

code/Sentiment_lstm.py trains and predicts with word2vec and an LSTM
code/Sentiment_svm.py trains and predicts with word2vec and an SVM
predict.py calls Sentiment_lstm.py and Sentiment_svm.py to make predictions
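
The layout above suggests that predict.py is a thin command-line dispatcher. A minimal sketch of that idea follows; it is not the repository's code. The name `lstm_predict` appears in the tracebacks further down, while `svm_predict`, `PREDICTORS`, `main`, and the exact argument check are assumptions made for illustration, and both predictors are stubs.

```python
def svm_predict(sentence):
    # Stub standing in for the SVM pipeline (hypothetical name).
    return "svm:" + sentence


def lstm_predict(sentence):
    # Stub standing in for the LSTM pipeline.
    return "lstm:" + sentence


# Map the first command-line argument to a predictor (assumed design).
PREDICTORS = {"svm": svm_predict, "lstm": lstm_predict}


def main(argv):
    # Expect exactly: predict.py <model> <sentence>
    if len(argv) != 3:
        raise SystemExit("usage: python predict.py [svm|lstm] <sentence>")
    model_name, sentence = argv[1], argv[2]
    if model_name not in PREDICTORS:
        raise SystemExit("unknown model: " + model_name)
    return PREDICTORS[model_name](sentence)
```

With a check like this, a sentence that contains spaces has to be quoted on the shell command line, otherwise it arrives as several arguments and the length check fails.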

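#Data

./data/ raw data folder
- data/neg.xls raw negative samples
- data/pos.xls raw positive samples
./svm_data/ SVM data folder
- ./svm_data/*.npy processed training and test data
- ./svm_data/svm_model/ stores the trained SVM model
- ./svm_data/w2v_model/ stores the trained word2vec model
./lstm_data/ LSTM data folder
- ./lstm_data/Word2vec_model.pkl stores the trained word2vec model
- ./lstm_data/lstm.yml stores the structure of the trained network
- ./lstm_data/lstm.h5 stores the weights learned by the network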

#Detailed introduction

Implementing sentiment analysis of shopping reviews

sentiment-analysis's Issues

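dtype error when using the LSTM

With Python 2.7 I get the error below; the rest of the environment is configured as required. Switching to Python 3.5 produces a similar dtype problem.
File "predict.py", line 23, in <module>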
lstm_predict(sentence)
File "code/Sentiment_lstm.py", line 179, in lstm_predict
model = model_from_yaml(yaml_string)
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/keras/models.py", line 200, in model_from_yaml
return layer_from_config(config, custom_objects=custom_objects)
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/keras/utils/layer_utils.py", line 40, in layer_from_config
return layer_class.from_config(config['config'])
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/keras/models.py", line 1080, in from_config
model.add(layer)
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/keras/models.py", line 327, in add
output_tensor = layer(self.outputs[0])
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/keras/engine/topology.py", line 543, in __call__
self.build(input_shapes[0])
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/keras/layers/recurrent.py", line 763, in build
self.W = K.concatenate([self.W_i, self.W_f, self.W_c, self.W_o])
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 1222, in concatenate
return tf.concat(axis, [to_dense(x) for x in tensors])
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1043, in concat
dtype=dtypes.int32).get_shape(
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 676, in convert_to_tensor
as_ref=False)
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 741, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 113, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 102, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 374, in make_tensor_proto
_AssertCompatible(values, dtype)
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 302, in _AssertCompatible
(dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int32, got <tf.Variable 'lstm_1_W_i:0' shape=(100, 50) dtype=float32_ref> of type 'Variable' instead.

Can't fork

I want to pull your code for my graduation project, but it fails.

Multi-class classification

Has the author considered multi-class classification?

error

Tensor conversion requested dtype int32 for Tensor with dtype float32: 'Tensor("embedding_1/random_uniform:0", shape=(8310, 100), dtype=float32

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Prefix dict has been built succesfully.
Traceback (most recent call last):
File "predict.py", line 23, in <module>
lstm_predict(sentence)
File "code/Sentiment_lstm.py", line 187, in lstm_predict
data=input_transform(string)
File "code/Sentiment_lstm.py", line 173, in input_transform
model=gensim.models.Word2Vec.load_word2vec_format('lstm_data/Word2vec_model.pkl', binary = True, unicode_errors='ignore')
File "/anaconda3/lib/python3.6/site-packages/gensim/models/word2vec.py", line 1172, in load_word2vec_format
header = utils.to_unicode(fin.readline(), encoding=encoding)
File "/anaconda3/lib/python3.6/site-packages/gensim/utils.py", line 217, in any2unicode
return unicode(text, encoding, errors=errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Tried "model=gensim.models.Word2Vec.load_word2vec_format('lstm_data/Word2vec_model.pkl', unicode_errors='ignore')", still same error.
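
Byte 0x80 at position 0 is how Python pickles written with protocol 2 or later begin, whereas a word2vec-format file starts with a plain-text header line. One plausible reading of this error is therefore that `Word2vec_model.pkl` is a pickled model, in which case gensim's `Word2Vec.load` would be the matching loader rather than `load_word2vec_format`. That has not been verified against this repository's file. The sketch below only illustrates the first-byte check on throwaway files; the helper name `looks_like_pickle` is hypothetical.

```python
import os
import pickle
import tempfile


def looks_like_pickle(path):
    # Pickle protocol 2+ streams start with the PROTO opcode, byte 0x80.
    with open(path, "rb") as handle:
        return handle.read(1) == b"\x80"


# Demonstration with throwaway files, not the repository's model.
tmp_dir = tempfile.mkdtemp()

pickled_path = os.path.join(tmp_dir, "model.pkl")
with open(pickled_path, "wb") as handle:
    pickle.dump({"word": [0.1, 0.2]}, handle, protocol=2)

text_path = os.path.join(tmp_dir, "vectors.txt")
with open(text_path, "wb") as handle:
    # word2vec text format: a "<vocab size> <vector size>" header line first.
    handle.write(b"1 2\nword 0.1 0.2\n")

print(looks_like_pickle(pickled_path))  # True
print(looks_like_pickle(text_path))     # False
```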

maxlen and input_length

Could you explain what these two parameters mean? Thanks.
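
As general Keras background: `maxlen` is the argument of `pad_sequences` that fixes how many tokens every review is cut or padded to, and `input_length` tells the `Embedding` layer how long each input sequence is, so the two are normally set to the same number. How this repository sets them is not shown here. The pure-Python sketch below imitates the default behaviour of `pad_sequences` (truncate and pad at the front, fill with 0); `pad_to_maxlen` is a hypothetical helper, not a Keras function.

```python
def pad_to_maxlen(sequence, maxlen, value=0):
    if maxlen <= 0:
        raise ValueError("maxlen must be positive")
    # Keep only the last `maxlen` items (truncation at the front).
    trimmed = list(sequence)[-maxlen:]
    # Left-pad with `value` until the length is exactly `maxlen`.
    return [value] * (maxlen - len(trimmed)) + trimmed


print(pad_to_maxlen([7, 8, 9], 5))           # [0, 0, 7, 8, 9]
print(pad_to_maxlen([1, 2, 3, 4, 5, 6], 4))  # [3, 4, 5, 6]
```

Every row then has length `maxlen`, which is the fixed sequence length an `Embedding` layer declared with the same `input_length` expects.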

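Which version of TensorFlow is used?

I installed tensorflow (1.1.0), and calling the lstm_predict(string) function raises the following error. My rough guess is that it is a TensorFlow version problem.

Runtime error: "参数长度错误！" ("wrong argument length")

I ran the provided examples, and execution stops at line 14 with the error '参数长度错误！'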

Great article, but my own LSTM results seem to be wrong

SVM is ok:

D:\program\Sentiment-Analysis-master>python predict.py svm 酒店的环境不错，空间挺大的
Using Theano backend.
Building prefix dict from the default dictionary ...
Loading model from cache c:\users\v-\appdata\local\temp\jieba.cache
Loading model cost 0.314 seconds.
Prefix dict has been built succesfully.
酒店的环境不错，空间挺大的 positive

D:\program\Sentiment-Analysis-master>python predict.py svm 电脑散热太差，偶尔还蓝屏，送货也太慢了吧
Using Theano backend.
Building prefix dict from the default dictionary ...
Loading model from cache c:\users\v-\appdata\local\temp\jieba.cache
Loading model cost 0.317 seconds.
Prefix dict has been built succesfully.
电脑散热太差，偶尔还蓝屏，送货也太慢了吧 negative

The LSTM gives the wrong result:

D:\program\Sentiment-Analysis-master>python predict.py lstm 电脑散热太差，偶尔还蓝屏，送货也太慢了吧
Using Theano backend.
loading model......
loading weights......
Building prefix dict from the default dictionary ...
Loading model from cache c:\users\v-\appdata\local\temp\jieba.cache
Loading model cost 0.319 seconds.
Prefix dict has been built succesfully.
1/1 [==============================] - 0s
电脑散热太差，偶尔还蓝屏，送货也太慢了吧 positive

D:\program\Sentiment-Analysis-master>python predict.py lstm 酒店的环境不错，空间挺大的
Using Theano backend.
loading model......
loading weights......
Building prefix dict from the default dictionary ...
Loading model from cache c:\users\v-\appdata\local\temp\jieba.cache
Loading model cost 0.324 seconds.
Prefix dict has been built succesfully.
1/1 [==============================] - 0s
酒店的环境不错，空间挺大的 negative

TensorFlow version problem

Hi,
I installed TensorFlow 1.0 as the Keras backend but got the error 'TypeError: Expected int32, got...'. After I switched to TensorFlow 0.12, the program ran fine. I suggest adding the TensorFlow version to the README, because requirements.txt contains no TensorFlow version information.
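
This report is consistent with a known API change: TensorFlow 1.0 switched `tf.concat` from `(axis, values)` to `(values, axis)`, while the Keras release in the tracebacks on this page still calls `tf.concat(axis, [...])`, so the list of weights lands where an int32 axis is expected. Staying on TensorFlow 0.12, as reported here, or moving to a Keras release that supports TensorFlow 1.x are the usual ways out. The sketch below merely encodes that version rule; `concat_argument_order` is a hypothetical helper.

```python
def concat_argument_order(tf_version):
    # Compare only the major version number, e.g. "0.12.1" -> 0.
    major = int(tf_version.split(".")[0])
    return "values, axis" if major >= 1 else "axis, values"


print(concat_argument_order("0.12.1"))  # axis, values
print(concat_argument_order("1.1.0"))   # values, axis
```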

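Runtime problem

Hello, sorry to trouble you.
During training, each epoch shows only a single step that ran for 0 seconds, and no test score is printed:
1/1 [==============================] - 0s
What is going on here? It looks as if the training stage never ran.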

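Error when using the LSTM model for sentiment classification

Hi,
I got an error when running your demo and do not know how to fix it.
The error message is as follows: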
AssertionError: Can't store in size_t for the bytes requested 50 * 4
Apply node that caused the error: GpuAllocEmpty(Elemwise{Composite{(Switch(LT(maximum(i0, i1), i2), (maximum(i0, i1) + i3), (maximum(i0, i1) - i3)) + i3)}}.0, Elemwise{Composite{((((i0 * Switch(EQ(i1, i2), i3, i1) * Switch(EQ(i4, i2), i3, i4)) // i5) * i6) // i7)}}[(0, 0)].0, TensorConstant{50})
Toposort index: 63
Inputs types: [TensorType(int64, scalar), TensorType(int64, scalar), TensorType(int64, scalar)]
Inputs shapes: [(), (), ()]
Inputs strides: [(), (), ()]
Inputs values: [array(100), array(-42), array(50)]
Outputs clients: [[GpuIncSubtensor{InplaceSet;:int64:}(GpuAllocEmpty.0, GpuFromHost.0, Constant{1})]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

A problem

What is wrong when I get 'ValueError: You must specify either total_examples or total_words, for proper alpha and progress calculations. The usual value is total_examples=model.corpus_count'? Thank you.

No module named cross_validation

What is going on here?
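
The likely cause is that `sklearn.cross_validation` was deprecated in scikit-learn 0.18 and removed in 0.20, with its functions moved to `sklearn.model_selection`. A common workaround is to try the new module first and fall back to the old one. The sketch below shows that pattern with a generic helper; `import_first` is a hypothetical name, and the demonstration uses standard-library modules so that it runs without scikit-learn installed.

```python
import importlib


def import_first(candidates):
    # Return the first module in `candidates` that can be imported.
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of these modules could be imported: " + ", ".join(candidates))


# Runnable demonstration with standard-library names.
module = import_first(["no_such_module_for_demo", "json"])
print(module.__name__)  # json
```

In this project the call would presumably look like `import_first(["sklearn.model_selection", "sklearn.cross_validation"]).train_test_split`, since `train_test_split` exists under both module names.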

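Which versions of the dependencies are required?

Hello,
Running the code produces a DeprecationWarning, as well as an Error inside the word2vec.py module.

I suspect that the installed dependency versions do not match the way the code uses them.
Could you list the version of each dependency?
Thanks!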
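
When reporting a mismatch like this, it helps to include the versions that are actually installed. A minimal sketch, assuming Python 3.8 or later (the README targets Python 2.7, where `pip freeze` gives the same information); `installed_versions` is a hypothetical helper and the package list is taken from the README.

```python
from importlib import metadata


def installed_versions(distributions):
    # Map each distribution name to its installed version, or None if absent.
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names as published on PyPI; scikit-learn is imported as `sklearn`.
packages = ["keras", "scikit-learn", "gensim", "jieba", "h5py", "numpy", "pandas"]
for name, version in installed_versions(packages).items():
    print(name, version)
```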

ValueError: Shapes (4, 100, 50) and () are incompatible when running the original program

Traceback (most recent call last):
File "Sentiment_lstm.py", line 204, in <module>
lstm_predict(string)
File "Sentiment_lstm.py", line 178, in lstm_predict
model = model_from_yaml(yaml_string)
File "/home/user/anaconda3/envs/gu_py27/lib/python2.7/site-packages/keras/models.py", line 187, in model_from_yaml
return layer_from_config(config, custom_objects=custom_objects)
File "/home/user/anaconda3/envs/gu_py27/lib/python2.7/site-packages/keras/utils/layer_utils.py", line 36, in layer_from_config
return layer_class.from_config(config['config'])
File "/home/user/anaconda3/envs/gu_py27/lib/python2.7/site-packages/keras/models.py", line 1036, in from_config
model.add(layer)
File "/home/user/anaconda3/envs/gu_py27/lib/python2.7/site-packages/keras/models.py", line 312, in add
output_tensor = layer(self.outputs[0])
File "/home/user/anaconda3/envs/gu_py27/lib/python2.7/site-packages/keras/engine/topology.py", line 487, in __call__
self.build(input_shapes[0])
File "/home/user/anaconda3/envs/gu_py27/lib/python2.7/site-packages/keras/layers/recurrent.py", line 710, in build
self.W = K.concatenate([self.W_i, self.W_f, self.W_c, self.W_o])
File "/home/user/anaconda3/envs/gu_py27/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 718, in concatenate
return tf.concat(axis, [to_dense(x) for x in tensors])
File "/home/user/anaconda3/envs/gu_py27/lib/python2.7/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "/home/user/anaconda3/envs/gu_py27/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1254, in concat
tensor_shape.scalar())
File "/home/user/anaconda3/envs/gu_py27/lib/python2.7/site-packages/tensorflow/python/framework/tensor_shape.py", line 1023, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (4, 100, 50) and () are incompatible

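How was the training data obtained? Thanks

As in the title.

Error at runtime

Hello, I ran the commented-out part of the main function and got this error inside the function get_train_vecs.
get_train_vecs(x_train,x_test) # compute word vectors and save them as train_vecs.npy, test_vecs.npy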
File "code/Sentiment_svm.py", line 55, in get_train_vecs
imdb_w2v.train(x_train)
File "/data/home/guanggao/anaconda2/lib/python2.7/site-packages/gensim/models/word2vec.py", line 813, in train
raise ValueError("You must specify either total_examples or total_words, for proper alpha and progress calculations. The usual value is total_examples=model.corpus_count.")
ValueError: You must specify either total_examples or total_words, for proper alpha and progress calculations. The usual value is total_examples=model.corpus_count.
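
The message itself names the fix: this gensim release requires `train` to be told how much data to expect. Assuming `build_vocab` has already been run on the same sentences (which sets `corpus_count`), the call at line 55 could be adapted along the lines below. `train_with_counts` is a hypothetical wrapper, and the fallback from `epochs` to `iter` (and then to 5) is an assumption meant to cover the attribute rename between gensim versions; check the documentation of the installed version.

```python
def train_with_counts(model, sentences):
    # The epoch-count attribute is `epochs` in newer gensim and `iter` in older releases.
    epochs = getattr(model, "epochs", None) or getattr(model, "iter", None) or 5
    # total_examples tells gensim how many sentences make up one pass.
    model.train(sentences, total_examples=model.corpus_count, epochs=epochs)
    return model
```

In `get_train_vecs` this would replace `imdb_w2v.train(x_train)` with `train_with_counts(imdb_w2v, x_train)`.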

Why are '我是' and '你是' classified as positive?

This needs some thought.

Sentence length has an impact on sentiment classification.

report

Using TensorFlow backend.
D:\Program Files (x86)\Python\lib\site-packages\gensim\utils.py:862: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
  warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
load model from disk...
load weights from disk......
2017-10-19 16:02:40.450367: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-10-19 16:02:40.450667: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-19 16:02:40.450933: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-19 16:02:40.451196: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-19 16:02:40.451488: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-19 16:02:40.451763: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-19 16:02:40.452028: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-19 16:02:40.452314: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-10-19 16:02:41.074373: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:887] Found device 0 with properties: 
name: GeForce GTX 960M
major: 5 minor: 0 memoryClockRate (GHz) 1.176
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.65GiB
2017-10-19 16:02:41.074690: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:908] DMA: 0 
2017-10-19 16:02:41.074841: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:918] 0:   Y 
2017-10-19 16:02:41.075007: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0)
rebuild model from disk......
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\KIRIST~1\AppData\Local\Temp\jieba.cache
['你', '是', '傻', '逼', '。']
Loading model cost 1.078 seconds.
Prefix dict has been built succesfully.
1/1 [==============================] - 0s
[[1]]

svm

[0]

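Runtime problem

I installed CentOS 7 in a virtual machine. After sudo pip install -r requirements.txt had successfully installed the packages listed in the file, I ran python predict.py svm 这个手机质量很好，我很喜欢，不错不错, and the output said that tensorflow was missing.
Installing TensorFlow 1.0 solved the problem. I hope the author can update the requirements.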

Error when training the model

Traceback (most recent call last):
File "/home/wj/malware/env/word2vec_test/Malware/code/malware_lstm.py", line 178, in <module>
train()
File "/home/wj/malware/env/word2vec_test/Malware/code/malware_lstm.py", line 174, in train
train_lstm(n_symbols, embedding_weights, x_train, y_train, x_test, y_test)
File "/home/wj/malware/env/word2vec_test/Malware/code/malware_lstm.py", line 137, in train_lstm
model.add(LSTM(output_dim=50, activation='sigmoid', inner_activation='hard_sigmoid'))
File "/home/wj/malware/env/local/lib/python2.7/site-packages/keras/models.py", line 312, in add
output_tensor = layer(self.outputs[0])
File "/home/wj/malware/env/local/lib/python2.7/site-packages/keras/engine/topology.py", line 487, in __call__
self.build(input_shapes[0])
File "/home/wj/malware/env/local/lib/python2.7/site-packages/keras/layers/recurrent.py", line 710, in build
self.W = K.concatenate([self.W_i, self.W_f, self.W_c, self.W_o])
File "/home/wj/malware/env/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 718, in concatenate
return tf.concat(axis, [to_dense(x) for x in tensors])
File "/home/wj/malware/env/local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1029, in concat
dtype=dtypes.int32).get_shape(
File "/home/wj/malware/env/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 639, in convert_to_tensor
as_ref=False)
File "/home/wj/malware/env/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 704, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/wj/malware/env/local/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 113, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/home/wj/malware/env/local/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 102, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/home/wj/malware/env/local/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 370, in make_tensor_proto
_AssertCompatible(values, dtype)
File "/home/wj/malware/env/local/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 302, in _AssertCompatible
(dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int32, got <tf.Variable 'lstm_1_W_i:0' shape=(50, 50) dtype=float32_ref> of type 'Variable' instead.

Where is this going wrong? I cannot find a solution.

Why is the LSTM result wrong?

python predict.py lstm 这书的印刷质量的太差了吧，还有缺页，以后再也不买了
/usr/local/lib/python2.7/dist-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
Using TensorFlow backend.
loading model......
loading weights......
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.271 seconds.
Prefix dict has been built succesfully.
1/1 [==============================] - 0s
这书的印刷质量的太差了吧，还有缺页，以后再也不买了 positive