zyymax / text-similarity
Compute the similarity of Chinese texts using TF feature vectors and simhash fingerprints
Hello, is this idea your own design, or is it an implementation of a paper?
In tokens.py, it says 'str' object has no attribute 'decode'
text-similarity-master\src\tokens.py", line 20, in __init__
self.stopword_set.add(line.strip().decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'
Can you help me with that? Thanks very much.
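This error is what Python 3 raises when `.decode()` is called on a `str`, since text strings there are already Unicode; the repository appears to target Python 2, where `str` holds bytes. A minimal sketch of a version-tolerant stopword loader follows. The names `to_text` and `load_stopwords` are hypothetical and not part of the repository:

```python
import io


def to_text(line):
    """Strip a line and return it as text on both Python 2 and Python 3."""
    line = line.strip()
    # On Python 2, str is bytes and needs decoding; on Python 3, str is
    # already text and only real bytes objects are decoded.
    if isinstance(line, bytes):
        return line.decode('utf-8')
    return line


def load_stopwords(path):
    """Hypothetical helper: read a UTF-8 stopword file into a set."""
    # io.open accepts the encoding argument on both Python 2 and Python 3,
    # so each line arrives already decoded.
    with io.open(path, encoding='utf-8') as f:
        return set(to_text(line) for line in f if line.strip())
```

In `tokens.py`, the failing call could then be replaced by something like `self.stopword_set.add(to_text(line))`, assuming the rest of the code handles Unicode strings.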
Traceback (most recent call last):
File "src/simhash_imp.py", line 191, in
feature_vec = [(int(item.split(':')[0]),float(item.split(':')[1])) for item in feature_vec]
ValueError: could not convert string to float: (0,
"['0:(0,', '1)', '1:(1,', '1)', '2:(2,', '1)', '3:(15,', '1)', '4:(18,', '1)', '5:(19,', '1)']"
$ cat data/temp/all.feat
0:(0, 1) 1:(1, 1) 2:(2, 1) 3:(15, 1) 4:(18, 1) 5:(19, 1)
0:(0, 1) 1:(3, 1) 2:(5, 1) 3:(9, 1) 4:(16, 1) 5:(17, 1)
0:(1, 1) 1:(6, 1) 2:(7, 1) 3:(13, 1) 4:(14, 1) 5:(20, 1)
0:(0, 2) 1:(2, 1) 2:(4, 1) 3:(11, 1) 4:(12, 1)
0:(8, 1) 1:(10, 1) 2:(21, 1)
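The parser at line 191 expects whitespace-separated `index:weight` items, but each item in this `all.feat` is written as `position:(index, count)`, so splitting on whitespace cuts every tuple in half. A rough sketch of a parser that accepts either layout is below. It assumes that the tuple's first element is the feature index and the second is its weight, which the issue does not confirm; `parse_feature_line` is a hypothetical name:

```python
import re

# Matches items such as "3:(15, 1)". Assumption: the tuple holds
# (feature index, weight) and the leading number is only a position.
_TUPLE_ITEM = re.compile(r'\d+:\((\d+),\s*([0-9.]+)\)')


def parse_feature_line(line):
    """Parse one line of a feature file into (index, weight) pairs."""
    pairs = _TUPLE_ITEM.findall(line)
    if pairs:
        return [(int(idx), float(weight)) for idx, weight in pairs]
    # Fall back to the plain "index:weight" layout the original code expects.
    result = []
    for item in line.split():
        idx, weight = item.split(':')
        result.append((int(idx), float(weight)))
    return result
```

Regenerating `all.feat` so that it is written in the plain `index:weight` layout would be the cleaner fix; the tolerant parser only works around a file that already exists.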
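Hi, I would like to compute the similarity of some texts for a paper, but after reading your instructions I still don't understand what format the .ori file is.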
I couldn't find webcontent_filter in the code.
Was it left out?
Are you on Python 3.x? I am on Python 2.7.
python src/isSimilar.py data/1.ori data/key.ori data/stopwords.txt data/word.dict -c 0.8
This raises an error:
src/features.py", line 20, in compute
feature[self.word_dict[token]] += 1
KeyError: u'\u62a5\u8003'
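The `KeyError` means the token `u'\u62a5\u8003'` is not in the word dictionary loaded from `data/word.dict`, so the lookup `self.word_dict[token]` fails. One possible workaround, sketched below, is to skip out-of-vocabulary tokens when counting term frequencies. The function `compute_tf` is hypothetical, and it assumes `word_dict` maps each token to an integer feature index; rebuilding the dictionary from the input texts may be the more appropriate fix:

```python
def compute_tf(tokens, word_dict):
    """Count term frequencies, ignoring tokens missing from word_dict."""
    feature = {}
    for token in tokens:
        idx = word_dict.get(token)
        if idx is None:
            # Out-of-vocabulary token: skip it instead of raising KeyError.
            continue
        feature[idx] = feature.get(idx, 0) + 1
    return feature
```

Skipping unknown tokens silently changes the feature vector, so similarity scores for texts with many out-of-vocabulary words should be read with caution.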