
iunderstand / swe


SWE Toolkit. Learning Semantic Word Embeddings based on Ordinal Knowledge Constraints. A general framework to incorporate semantic knowledge into the popular data-driven learning process of word vectors. Applications including word similarity, sentence completion, etc. ACL-2015, Beijing, China

License: Apache License 2.0

Makefile 0.32% C 85.14% C++ 4.51% Perl 9.86% Shell 0.02% Batchfile 0.01% Prolog 0.15%


swe's Issues

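Possible bug in SWE_Train.c? Segmentation fault

Addendum: training word vectors with word2vec works fine.

Hello, how do I train word vectors? Suppose I want to train SWE+Synon-Anton. I tried the following steps:

Machine configuration: Ubuntu 14.04, 128 GB RAM

  1. Provide a Wikipedia corpus, train.txt
  2. Split semantics/SWE.EN.KnowDB.WordNet-Book.Synon-Anton into sem.train.txt and sem.valid.txt
  3. Run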
    ./SWE_Train -train train.txt -output vec.txt -size 200 -window 5 -sample 1e-4 -negative 5 -hs 0 -binary 0 -cbow 0 -iter 3 -sem-train sem.train.txt -sem-valid sem.valid.txt -sem-coeff 0.1 -sem-hinge 0.0 -sem-addtime 0 -weight-decay 0 -delta-left 1 -delta-right 1
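
For step 2, one possible way to produce the two files is sketched below. It is only an illustration, not the authors' procedure: it assumes the knowledge file holds one constraint per line (not confirmed in this thread), and the function names `split_constraints` and `split_file` are hypothetical. The default hold-out size of 1000 simply mirrors the "CV set InEquation Nums: 1000" line in the log for this issue; the paper does not state a ratio.

```python
import random


def split_constraints(lines, n_valid=1000, seed=0):
    """Shuffle constraint lines and hold out n_valid of them for validation.

    Assumption: each element of `lines` is one complete constraint.
    """
    if not 0 < n_valid < len(lines):
        raise ValueError("n_valid must be between 1 and len(lines) - 1")
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    return shuffled[n_valid:], shuffled[:n_valid]


def split_file(src, train_path, valid_path, n_valid=1000, seed=0):
    """Read src, drop blank lines, and write the train and valid files."""
    with open(src, encoding="utf-8") as f:
        # Normalise line endings so a missing final newline cannot merge lines.
        lines = [ln.rstrip("\n") + "\n" for ln in f if ln.strip()]
    train, valid = split_constraints(lines, n_valid, seed)
    with open(train_path, "w", encoding="utf-8") as f:
        f.writelines(train)
    with open(valid_path, "w", encoding="utf-8") as f:
        f.writelines(valid)
    return len(train), len(valid)
```

Calling `split_file("semantics/SWE.EN.KnowDB.WordNet-Book.Synon-Anton", "sem.train.txt", "sem.valid.txt")` would then write the two files named in step 2.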

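Problem:
After the corpus is read, a Segmentation fault occurs!

Also, what ratio should be used to split the WordNet data into train and valid? The paper does not mention it.

The log is as follows: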
Semantic Word Embedding (SWE) ToolkitTrain Setting embedding size: 200
Train Setting window size: 5
Train Setting sample value: 0.000100
Train Setting negative num: 5
Running Threads: 12
Iteration Times: 3
SemWE Qsem train file: ../semantics/SWE.EN.KnowDB.WordNet-Book.Synon-Anton.train
SemWE Qsem valid file: ../semantics/SWE.EN.KnowDB.WordNet-Book.Synon-Anton.valid
SemWE Add Time(/%): 0.000000
SemWE Weight Decay: 0.000000
SemWE Inter Coeff: 0.100000
SemWE Norm Hinge Margin: 0.000000
SemWE Inequation Delta Left: 1
SemWE Inequation Delta Right: 1
Training Starting @time: Fri Feb 24 19:21:08 2017

Starting training using file wikicorpus.1b
Vocab size: 218317
Words in train file: 123353508
Load Training Word Knowledge from file ../semantics/SWE.EN.KnowDB.WordNet-Book.Synon-Anton.train
--- InEquation Nums: 424732
--- Finish reading the Knowledge Database
Load CV Test Word Knowledge from file ../semantics/SWE.EN.KnowDB.WordNet-Book.Synon-Anton.valid
--- CV set InEquation Nums: 1000
./run.sh: line 5: 25479 Segmentation fault (core dumped) ./SWE_Train -train ${TRAIN_FILE} -output vec.bin -size 200 -window 5 -sample 1e-4 -negative 5 -hs 0 -binary 1 -cbow 0 -iter 3 -sem-train ${SEW_FILE} -sem-valid ${SEW_CV_FILE} -sem-coeff 0.1 -sem-hinge 0.0 -sem-addtime 0 -weight-decay 0 -delta-left 1 -delta-right 1

How were the Synon-Anton and Hyper-Hypon sets extracted from an ontology such as WordNet?

The idea is excellent!
1. Could you give some pointers on how the Synon-Anton and Hyper-Hypon sets (e.g. "../SWE.EN.KnowDB.WordNet-Book.Synon-Anton") were extracted from an ontology such as WordNet?
According to the Semantic Category Rule in your document, "A semantic category may be defined as a synset in WordNet, a hypernym in a semantic hierarchy, or an entity category in knowledge graphs." Do the synsets used come directly from WordNet, or did you extract them yourselves?
2. Beyond the example "similarity(Mallet, Hammer) > similarity(Mallet, Tool)", do all words in the knowledge graph need to be considered? E.g. similarity(Mallet, Hammer) > similarity(Mallet, WordFaraway)
Thanks for sharing!
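
To make question 2 concrete, here is a minimal sketch of what checking one ordinal constraint of the form similarity(A, B) > similarity(A, C) could look like. Everything in it is an assumption made for illustration: the two-dimensional vectors are invented toy values, cosine similarity is just one possible similarity measure, and the helper names are hypothetical. It is not taken from SWE_Train.c and does not model options such as `-sem-hinge`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def hinge_penalty(vectors, constraint):
    """Penalty for a constraint (a, b, c) meaning sim(a, b) > sim(a, c).

    The penalty is zero when the inequality holds and grows with the violation.
    """
    a, b, c = constraint
    return max(0.0, cosine(vectors[a], vectors[c]) - cosine(vectors[a], vectors[b]))


def satisfy_rate(vectors, constraints):
    """Fraction of constraints whose inequality holds strictly."""
    satisfied = sum(
        1
        for a, b, c in constraints
        if cosine(vectors[a], vectors[b]) > cosine(vectors[a], vectors[c])
    )
    return satisfied / len(constraints)


# Toy, hand-made vectors (hypothetical values, not trained embeddings).
toy_vectors = {
    "mallet": [1.0, 0.1],
    "hammer": [0.9, 0.2],
    "tool": [0.5, 0.5],
    "wordfaraway": [-1.0, 0.3],
}
toy_constraints = [
    ("mallet", "hammer", "tool"),
    ("mallet", "hammer", "wordfaraway"),
]
rate = satisfy_rate(toy_vectors, toy_constraints)
```

Under this reading, adding a constraint for every distant word such as WordFaraway multiplies the number of inequalities; whether the released knowledge files do so is exactly what the question asks.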

Segmentation Fault for SWE_Train

When I run SWE_Train with the following command:

./SWE/bin/SWE_Train -debug 2 -size 100 -train ./corpora/corpus.txt -read-vocab ./corpora/vocabulary.txt -cbow 0 -hs 0 -alpha 0.025 -window 5 -sample 0.0001 -negative 5 -threads 1 -output ./word_embed.txt -sem-coeff 0.005 -sem-addtime 0 -sem-hinge 0.0 -weight-decay 0.0 -sem-train ./semantics/knowledge_constraints.train -sem-valid ./semantics/knowledge_constraints.valid -iter 2

The ./semantics/knowledge_constraints.train and ./semantics/knowledge_constraints.valid files are copies of SemWE.EN.KnowDB.COM1.inTEXT8.train and SemWE.EN.KnowDB.COM1.inTEXT8.valid from the semantics/TEXT8 directory.

The output I got is:

Semantic Word Embedding (SWE) ToolkitTrain Setting embedding size: 100
Train Setting window size: 5
Train Setting sample value: 0.000100
Train Setting negative num: 5
Running Threads: 1
Iteration Times: 2
SemWE Qsem train file: ./semantics/knowledge_constraints.train
SemWE Qsem valid file: ./semantics/knowledge_constraints.valid
SemWE Add Time(/%): 0.000000
SemWE Weight Decay: 0.000000
SemWE Inter Coeff: 0.005000
SemWE Norm Hinge Margin: 0.000000
SemWE Inequation Delta Left: 1
SemWE Inequation Delta Right: 1

Training Starting @time: Sat Nov 11 12:04:57 2017

Starting training using file ./corpora/corpus.txt
Vocab size: 47091
Words in train file: 9614559

Load Training Word Knowledge from file ./semantics/knowledge_constraints.train
--- InEquation Nums: 324817
--- Finish reading the Knowledge Database
Load CV Test Word Knowledge from file ./semantics/knowledge_constraints.valid
--- CV set InEquation Nums: 2999
--- Finish reading the CV Knowledge Database
--- Alpha: 0.025000 Progress: 0.00% WordCount: 0 Train_Qsem: inf Train_SatisfyRate: 0.0000 Valid_Qsem: inf Valid_SatisfyRate: 0.0000
Segmentation fault

Did I use SWE_Train incorrectly? By the way, I suggest you provide documentation explaining how to use SWE_Train.

Thanks for your help
