sentiment-analysis's Introduction

#Requirements

  • Unix/Linux system
  • Python 2.7
  • Install the Python packages: keras, sklearn, gensim, jieba, h5py, numpy, pandas
sudo pip install -r requirements.txt

Usage

Sentiment classification with the SVM classifier:

python predict.py svm 这个手机质量很好,我很喜欢,不错不错

python predict.py svm 这书的印刷质量的太差了吧,还有缺页,以后再也不买了

Sentiment classification with the LSTM:

python predict.py lstm 酒店的环境不错,空间挺大的
python predict.py lstm 电脑散热太差,偶尔还蓝屏,送货也太慢了吧

#Programs

  • code/Sentiment_lstm.py trains and predicts with word2vec and an LSTM

  • code/Sentiment_svm.py trains and predicts with word2vec and an SVM

  • predict.py calls Sentiment_lstm.py and Sentiment_svm.py to make predictions
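
The dispatch that predict.py performs can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the `dispatch` helper and the predictor mapping are hypothetical, and only the command-line shape (`python predict.py svm|lstm <sentence>`) is taken from the usage section.

```python
def dispatch(argv, predictors):
    """Pick a predictor by name and run it on the sentence.

    argv mirrors sys.argv: [script, model_name, sentence].
    predictors maps a model name to a callable that takes the sentence.
    Both names are hypothetical and used only for this sketch.
    """
    if len(argv) != 3 or argv[1] not in predictors:
        names = "|".join(sorted(predictors))
        raise SystemExit("usage: python predict.py %s <sentence>" % names)
    return predictors[argv[1]](argv[2])
```

In a script this would be called as `dispatch(sys.argv, {"svm": ..., "lstm": ...})`, with the two values being whatever prediction functions the modules under code/ expose.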

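#Data

  • ./data/ raw data folder

    • data/neg.xls raw negative samples
    • data/pos.xls raw positive samples
  • ./svm_data/ SVM data folder

    • ./svm_data/*.npy processed training and test data
    • ./svm_data/svm_model/ stores the trained SVM model
    • ./svm_data/w2v_model/ stores the trained word2vec model
  • ./lstm_data/ LSTM data folder

    • ./lstm_data/Word2vec_model.pkl stores the trained word2vec model
    • ./lstm_data/lstm.yml stores the network architecture
    • ./lstm_data/lstm.h5 stores the trained network weights

#Details

Implementation of sentiment analysis for shopping reviews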

sentiment-analysis's People

Contributors

buptldy

sentiment-analysis's Issues

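dtype error when using the LSTM

I am using Python 2.7 and get the error below. The rest of the environment is configured as required. If I switch to Python 3.5, I get a similar dtype error.
File "predict.py", line 23, in <module>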
lstm_predict(sentence)
File "code/Sentiment_lstm.py", line 179, in lstm_predict
model = model_from_yaml(yaml_string)
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/keras/models.py", line 200, in model_from_yaml
return layer_from_config(config, custom_objects=custom_objects)
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/keras/utils/layer_utils.py", line 40, in layer_from_config
return layer_class.from_config(config['config'])
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/keras/models.py", line 1080, in from_config
model.add(layer)
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/keras/models.py", line 327, in add
output_tensor = layer(self.outputs[0])
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/keras/engine/topology.py", line 543, in __call__
self.build(input_shapes[0])
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/keras/layers/recurrent.py", line 763, in build
self.W = K.concatenate([self.W_i, self.W_f, self.W_c, self.W_o])
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 1222, in concatenate
return tf.concat(axis, [to_dense(x) for x in tensors])
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1043, in concat
dtype=dtypes.int32).get_shape(
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 676, in convert_to_tensor
as_ref=False)
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 741, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 113, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 102, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 374, in make_tensor_proto
_AssertCompatible(values, dtype)
File "/Users/gt/Downloads/enter/envs/py27/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 302, in _AssertCompatible
(dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int32, got <tf.Variable 'lstm_1_W_i:0' shape=(100, 50) dtype=float32_ref> of type 'Variable' instead.

Can't fork

I want to pull your code for my graduation project, but it fails.

error

Tensor conversion requested dtype int32 for Tensor with dtype float32: 'Tensor("embedding_1/random_uniform:0", shape=(8310, 100), dtype=float32

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Prefix dict has been built succesfully.
Traceback (most recent call last):
File "predict.py", line 23, in <module>
lstm_predict(sentence)
File "code/Sentiment_lstm.py", line 187, in lstm_predict
data=input_transform(string)
File "code/Sentiment_lstm.py", line 173, in input_transform
model=gensim.models.Word2Vec.load_word2vec_format('lstm_data/Word2vec_model.pkl', binary = True, unicode_errors='ignore')
File "/anaconda3/lib/python3.6/site-packages/gensim/models/word2vec.py", line 1172, in load_word2vec_format
header = utils.to_unicode(fin.readline(), encoding=encoding)
File "/anaconda3/lib/python3.6/site-packages/gensim/utils.py", line 217, in any2unicode
return unicode(text, encoding, errors=errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Tried "model=gensim.models.Word2Vec.load_word2vec_format('lstm_data/Word2vec_model.pkl', unicode_errors='ignore')", still same error.

Great article, but the results I get from the LSTM seem to be wrong

SVM is ok:

D:\program\Sentiment-Analysis-master>python predict.py svm 酒店的环境不错,空间挺大的
Using Theano backend.
Building prefix dict from the default dictionary ...
Loading model from cache c:\users\v-\appdata\local\temp\jieba.cache
Loading model cost 0.314 seconds.
Prefix dict has been built succesfully.
酒店的环境不错,空间挺大的 positive

D:\program\Sentiment-Analysis-master>python predict.py svm 电脑散热太差,偶尔还蓝屏,送货也太慢了吧
Using Theano backend.
Building prefix dict from the default dictionary ...
Loading model from cache c:\users\v-\appdata\local\temp\jieba.cache
Loading model cost 0.317 seconds.
Prefix dict has been built succesfully.
电脑散热太差,偶尔还蓝屏,送货也太慢了吧 negative

The LSTM gives the wrong result:

D:\program\Sentiment-Analysis-master>python predict.py lstm 电脑散热太差,偶尔还蓝屏,送货也太慢了吧
Using Theano backend.
loading model......
loading weights......
Building prefix dict from the default dictionary ...
Loading model from cache c:\users\v-\appdata\local\temp\jieba.cache
Loading model cost 0.319 seconds.
Prefix dict has been built succesfully.
1/1 [==============================] - 0s
电脑散热太差,偶尔还蓝屏,送货也太慢了吧 positive

D:\program\Sentiment-Analysis-master>python predict.py lstm 酒店的环境不错,空间挺大的
Using Theano backend.
loading model......
loading weights......
Building prefix dict from the default dictionary ...
Loading model from cache c:\users\v-\appdata\local\temp\jieba.cache
Loading model cost 0.324 seconds.
Prefix dict has been built succesfully.
1/1 [==============================] - 0s
酒店的环境不错,空间挺大的 negative

TensorFlow version problem

Hi:
I installed TensorFlow 1.0 as the Keras backend, but got the error 'TypeError: Expected int32, got...'. After I changed TensorFlow to 0.12, the program ran well. I suggest adding the TensorFlow version to the README, because requirements.txt contains no TensorFlow version information.
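
Going by this report (0.12 works, 1.0 raises the TypeError), a start-up guard along these lines could fail early with a clearer message. This is only a sketch: `parse_major_minor` and `check_tensorflow_version` are hypothetical helpers, and the 1.0 cut-off comes solely from this issue, not from any statement by the author.

```python
def parse_major_minor(version):
    """Return (major, minor) as integers from a string such as '0.12.1'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def check_tensorflow_version(version):
    """Raise if the version is 1.0 or later (assumption based on the issue report)."""
    if parse_major_minor(version) >= (1, 0):
        raise RuntimeError(
            "TensorFlow %s detected; this code was reported to work with 0.12"
            % version
        )
    return version
```

It would be called with the installed version string, for example `check_tensorflow_version(tensorflow.__version__)`.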

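Problem running the code

Hello, sorry to trouble you.
During training, each epoch shows only one step and runs for 0 seconds, and no test score is printed.
1/1 [==============================] - 0s
What is going on here? It looks as if the training stage never ran.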

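Error when using the LSTM model for sentiment classification

Hi,
I got an error when running your demo and don't know how to fix it.
The error message is as follows: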
AssertionError: Can't store in size_t for the bytes requested 50 * 4
Apply node that caused the error: GpuAllocEmpty(Elemwise{Composite{(Switch(LT(maximum(i0, i1), i2), (maximum(i0, i1) + i3), (maximum(i0, i1) - i3)) + i3)}}.0, Elemwise{Composite{((((i0 * Switch(EQ(i1, i2), i3, i1) * Switch(EQ(i4, i2), i3, i4)) // i5) * i6) // i7)}}[(0, 0)].0, TensorConstant{50})
Toposort index: 63
Inputs types: [TensorType(int64, scalar), TensorType(int64, scalar), TensorType(int64, scalar)]
Inputs shapes: [(), (), ()]
Inputs strides: [(), (), ()]
Inputs values: [array(100), array(-42), array(50)]
Outputs clients: [[GpuIncSubtensor{InplaceSet;:int64:}(GpuAllocEmpty.0, GpuFromHost.0, Constant{1})]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

some problem

What is wrong with this 'ValueError: You must specify either total_examples or total_words, for proper alpha and progress calculations. The usual value is total_examples=model.corpus_count' problem? Thank you.

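Which versions of the dependencies are required?

Hello,
When running the code I get a DeprecationWarning, as well as an Error from inside the word2vec.py module.

I suspect the installed dependency versions do not match the way the code uses them.
Could you list the version of each dependency?
Thanks!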
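
When reporting a mismatch like this, it helps to list the versions that are actually installed. The sketch below does that with the standard library's `importlib.metadata`, which needs Python 3.8 or later. The project itself targets Python 2.7, where `pip freeze` gives the same information. `installed_versions` and `PACKAGES` are hypothetical names, and the package list is copied from the requirements section.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names as published on PyPI; sklearn is distributed as scikit-learn.
PACKAGES = ["keras", "scikit-learn", "gensim", "jieba", "h5py", "numpy", "pandas"]
```

Calling `installed_versions(PACKAGES)` returns a dictionary that can be pasted into an issue report.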

ValueError: Shapes (4, 100, 50) and () are incompatible when running the original code

Traceback (most recent call last):
File "Sentiment_lstm.py", line 204, in <module>
lstm_predict(string)
File "Sentiment_lstm.py", line 178, in lstm_predict
model = model_from_yaml(yaml_string)
File "/home/user/anaconda3/envs/gu_py27/lib/python2.7/site-packages/keras/models.py", line 187, in model_from_yaml
return layer_from_config(config, custom_objects=custom_objects)
File "/home/user/anaconda3/envs/gu_py27/lib/python2.7/site-packages/keras/utils/layer_utils.py", line 36, in layer_from_config
return layer_class.from_config(config['config'])
File "/home/user/anaconda3/envs/gu_py27/lib/python2.7/site-packages/keras/models.py", line 1036, in from_config
model.add(layer)
File "/home/user/anaconda3/envs/gu_py27/lib/python2.7/site-packages/keras/models.py", line 312, in add
output_tensor = layer(self.outputs[0])
File "/home/user/anaconda3/envs/gu_py27/lib/python2.7/site-packages/keras/engine/topology.py", line 487, in __call__
self.build(input_shapes[0])
File "/home/user/anaconda3/envs/gu_py27/lib/python2.7/site-packages/keras/layers/recurrent.py", line 710, in build
self.W = K.concatenate([self.W_i, self.W_f, self.W_c, self.W_o])
File "/home/user/anaconda3/envs/gu_py27/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 718, in concatenate
return tf.concat(axis, [to_dense(x) for x in tensors])
File "/home/user/anaconda3/envs/gu_py27/lib/python2.7/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "/home/user/anaconda3/envs/gu_py27/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1254, in concat
tensor_shape.scalar())
File "/home/user/anaconda3/envs/gu_py27/lib/python2.7/site-packages/tensorflow/python/framework/tensor_shape.py", line 1023, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (4, 100, 50) and () are incompatible

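Error when running

Hello, I ran the commented-out part of the main function and got this error inside the function get_train_vecs.
get_train_vecs(x_train,x_test) # compute the word vectors and save them as train_vecs.npy, test_vecs.npy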
File "code/Sentiment_svm.py", line 55, in get_train_vecs
imdb_w2v.train(x_train)
File "/data/home/guanggao/anaconda2/lib/python2.7/site-packages/gensim/models/word2vec.py", line 813, in train
raise ValueError("You must specify either total_examples or total_words, for proper alpha and progress calculations. The usual value is total_examples=model.corpus_count.")
ValueError: You must specify either total_examples or total_words, for proper alpha and progress calculations. The usual value is total_examples=model.corpus_count.
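
The error message itself names the usual fix: pass `total_examples=model.corpus_count` to `train`. Newer gensim releases also expect an explicit `epochs` argument. A hedged sketch of the call follows, written against any Word2Vec-like object so it does not depend on one gensim version. `train_word2vec` is a hypothetical helper, and falling back from `model.epochs` to the older `model.iter` attribute is an assumption about the installed release.

```python
def train_word2vec(model, sentences):
    """Train a Word2Vec-like model, passing the counts that train() asks for.

    Assumes model.build_vocab(sentences) has already been called, since
    that is what sets model.corpus_count.
    """
    # Newer gensim exposes `epochs`; older releases used `iter`. Which one is
    # present depends on the installed version (assumption), so try both and
    # fall back to 5 passes.
    epochs = getattr(model, "epochs", None) or getattr(model, "iter", 5)
    model.train(sentences, total_examples=model.corpus_count, epochs=epochs)
    return model
```

In code/Sentiment_svm.py, the failing call `imdb_w2v.train(x_train)` would then become something like `train_word2vec(imdb_w2v, x_train)`.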

Why are '我是**' and '你是**' classified as positive?

This needs some thought.

Sentence length has an impact on sentiment classification.

report

Using TensorFlow backend.
D:\Program Files (x86)\Python\lib\site-packages\gensim\utils.py:862: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
  warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
load model from disk...
load weights from disk......
2017-10-19 16:02:40.450367: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-10-19 16:02:40.450667: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-19 16:02:40.450933: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-19 16:02:40.451196: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-19 16:02:40.451488: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-19 16:02:40.451763: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-19 16:02:40.452028: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-19 16:02:40.452314: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-10-19 16:02:41.074373: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:887] Found device 0 with properties: 
name: GeForce GTX 960M
major: 5 minor: 0 memoryClockRate (GHz) 1.176
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.65GiB
2017-10-19 16:02:41.074690: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:908] DMA: 0 
2017-10-19 16:02:41.074841: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:918] 0:   Y 
2017-10-19 16:02:41.075007: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0)
rebuild model from disk......
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\KIRIST~1\AppData\Local\Temp\jieba.cache
['你', '是', '傻', '逼', '。']
Loading model cost 1.078 seconds.
Prefix dict has been built succesfully.
1/1 [==============================] - 0s
[[1]]

svm

[0]

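Problem running the code

I installed CentOS 7 in a virtual machine. After sudo pip install -r requirements.txt had installed the packages listed in the file, I ran python predict.py svm 这个手机质量很好,我很喜欢,不错不错 and it reported that tensorflow was missing.
Installing tensorflow 1.0 solved the problem. I hope the author can update requirements.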

Error when training the model

Traceback (most recent call last):
File "/home/wj/malware/env/word2vec_test/Malware/code/malware_lstm.py", line 178, in <module>
train()
File "/home/wj/malware/env/word2vec_test/Malware/code/malware_lstm.py", line 174, in train
train_lstm(n_symbols, embedding_weights, x_train, y_train, x_test, y_test)
File "/home/wj/malware/env/word2vec_test/Malware/code/malware_lstm.py", line 137, in train_lstm
model.add(LSTM(output_dim=50, activation='sigmoid', inner_activation='hard_sigmoid'))
File "/home/wj/malware/env/local/lib/python2.7/site-packages/keras/models.py", line 312, in add
output_tensor = layer(self.outputs[0])
File "/home/wj/malware/env/local/lib/python2.7/site-packages/keras/engine/topology.py", line 487, in __call__
self.build(input_shapes[0])
File "/home/wj/malware/env/local/lib/python2.7/site-packages/keras/layers/recurrent.py", line 710, in build
self.W = K.concatenate([self.W_i, self.W_f, self.W_c, self.W_o])
File "/home/wj/malware/env/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 718, in concatenate
return tf.concat(axis, [to_dense(x) for x in tensors])
File "/home/wj/malware/env/local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1029, in concat
dtype=dtypes.int32).get_shape(
File "/home/wj/malware/env/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 639, in convert_to_tensor
as_ref=False)
File "/home/wj/malware/env/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 704, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/wj/malware/env/local/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 113, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/home/wj/malware/env/local/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 102, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/home/wj/malware/env/local/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 370, in make_tensor_proto
_AssertCompatible(values, dtype)
File "/home/wj/malware/env/local/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 302, in _AssertCompatible
(dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int32, got <tf.Variable 'lstm_1_W_i:0' shape=(50, 50) dtype=float32_ref> of type 'Variable' instead.

Where is this going wrong? I can't find a solution.

Why is the LSTM result wrong?

python predict.py lstm 这书的印刷质量的太差了吧,还有缺页,以后再也不买了
/usr/local/lib/python2.7/dist-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
Using TensorFlow backend.
loading model......
loading weights......
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.271 seconds.
Prefix dict has been built succesfully.
1/1 [==============================] - 0s
这书的印刷质量的太差了吧,还有缺页,以后再也不买了 positive
