This is the code base for our IJCAI 2016 paper.
It requires the following tools:
- cmake (~2.8)
- git (~1.8)
- g++ (~4.6 for C++11 features; 4.8 was used for the paper)
- boost (~1.57)
Run the following commands to compile:
./configure
make
You should find the following executable files:
bin/
└── experimental
    ├── hc_depparser_cstep : Ranker for C-step
    └── hc_depparser_hstep_arcstandard : Parser for H-step
The H-step input format is similar to the CoNLL-X format and contains 8 (or more) columns. Words in a sentence are numbered from 1, and 0 corresponds to the pseudo node.
NOTE: Please fill the 5th column with gold-standard POS tags. They are used when computing the loss, because punctuation tokens (those whose gold POS tag is one of `` '' : , .) are ignored in our loss computation.
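As an illustration of the note above, here is a minimal Python sketch (not part of this code base) that reads CoNLL-X-like input and drops the tokens whose 5th column is a punctuation tag. It assumes tab-separated columns and blank lines between sentences; the function names and the sample sentence (including its relation labels) are made up for the example, and the actual loss computation in the parser may differ.

```python
# Gold POS tags treated as punctuation, taken from the note above.
PUNCT_TAGS = {"``", "''", ":", ",", "."}


def read_conll(lines):
    """Yield sentences as lists of column lists (assumes tab-separated columns)."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence (assumption).
            if sentence:
                yield sentence
                sentence = []
            continue
        columns = line.split("\t")
        if len(columns) < 8:
            raise ValueError("expected at least 8 columns: %r" % line)
        sentence.append(columns)
    if sentence:
        yield sentence


def scored_tokens(sentence):
    """Keep the tokens whose 5th column (index 4) is not a punctuation tag."""
    return [columns for columns in sentence if columns[4] not in PUNCT_TAGS]


# Hypothetical three-word sentence; the relation labels are placeholders.
SAMPLE = "\n".join([
    "\t".join(["1", "Hello", "_", "UH", "UH", "_", "0", "ROOT"]),
    "\t".join(["2", "world", "_", "NN", "NN", "_", "1", "DEP"]),
    "\t".join(["3", ".", "_", ".", ".", "_", "1", "P"]),
    "",
])

sentences = list(read_conll(SAMPLE.splitlines()))
kept = scored_tokens(sentences[0])
print([columns[1] for columns in kept])
```

If the 5th column held predicted tags instead of gold tags, a mistagged punctuation token would be counted in the loss, which is why the note asks for gold tags.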
After training the H-step model, you can use
./bin/experimental/hc_depparser_hstep_arcstandard prepare
to generate the training/testing data for the C-step. C-step instances are separated by empty space. Each instance has a header like
#id forms postags oracle-hstep-score output1-hstep-score output2-hstep-score ...
Below the header, each row gives the dependency relations of one word in the oracle tree and in each candidate tree. The dependency head and relation are separated by /.
Here is an example of the C-step input:
#id forms postags 3.51423e+09 3.54676e+09 3.54634e+09 ...
1 Influential JJ 2/NMOD 2/NMOD 2/NMOD ...
2 members NNS 10/SUB 10/SUB 10/SUB ...
3 of IN 2/NMOD 2/NMOD 2/NMOD ...
4 the DT 9/NMOD 9/NMOD 9/NMOD ...
5 House NNP 9/NMOD 9/NMOD 9/VMOD ...
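To make the layout concrete, the following Python sketch parses one C-step instance into scores and per-tree (head, relation) pairs. It is an illustration only and not the reader used by hc_depparser_cstep: it assumes whitespace-separated columns, and the names CStepInstance and parse_instance are hypothetical. The sample reuses the rows above with the elided "..." columns left out.

```python
from typing import List, NamedTuple, Tuple


class CStepInstance(NamedTuple):
    scores: List[float]                 # oracle score first, then candidate scores
    forms: List[str]
    postags: List[str]
    trees: List[List[Tuple[int, str]]]  # trees[k][i] = (head, relation) of word i+1 in tree k


def parse_instance(block: str) -> CStepInstance:
    """Parse one instance: a header line followed by one row per word."""
    lines = [line for line in block.splitlines() if line.strip()]
    header = lines[0].split()
    if header[:3] != ["#id", "forms", "postags"]:
        raise ValueError("unexpected header: %r" % lines[0])
    scores = [float(value) for value in header[3:]]
    forms, postags = [], []
    trees = [[] for _ in scores]
    for line in lines[1:]:
        columns = line.split()
        arcs = columns[3:]
        if len(arcs) != len(scores):
            raise ValueError("row does not match header: %r" % line)
        forms.append(columns[1])
        postags.append(columns[2])
        for tree, arc in zip(trees, arcs):
            # Head and relation are separated by "/".
            head, relation = arc.split("/", 1)
            tree.append((int(head), relation))
    return CStepInstance(scores, forms, postags, trees)


EXAMPLE = """\
#id forms postags 3.51423e+09 3.54676e+09 3.54634e+09
1 Influential JJ 2/NMOD 2/NMOD 2/NMOD
2 members NNS 10/SUB 10/SUB 10/SUB
3 of IN 2/NMOD 2/NMOD 2/NMOD
4 the DT 9/NMOD 9/NMOD 9/NMOD
5 House NNP 9/NMOD 9/NMOD 9/VMOD
"""

instance = parse_instance(EXAMPLE)
# The oracle and the second candidate disagree on the relation of "House".
print(instance.trees[0][4], instance.trees[2][4])
```

The first tree column lines up with the oracle score in the header, and each following column with the corresponding candidate score.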
- Split your dependency training data by 20-way jackknifing. For the ith fold, name the two parts train.fold$i.conll.train and train.fold$i.conll.test.
- Train the model for the ith fold:
  ./bin/experimental/hc_depparser_hstep_arcstandard learn --train train.fold$i.conll.train --devel devel.conll --model model.hstep.$i --algorithm pa
- Prepare the C-step input for the ith fold:
  ./bin/experimental/hc_depparser_hstep_arcstandard prepare --input train.fold$i.conll.test --output train.cstep.$i --model model.hstep.$i
- Merge the train.cstep.$i files into train.cstep to generate the C-step training data.
- Train the overall model:
  ./bin/experimental/hc_depparser_hstep_arcstandard learn --train train.conll --devel devel.conll --model model.hstep --algorithm pa
- Prepare the development input for the C-step:
  ./bin/experimental/hc_depparser_hstep_arcstandard prepare --input devel.conll --output devel.cstep --model model.hstep
- Prepare the test input for the C-step:
  ./bin/experimental/hc_depparser_hstep_arcstandard prepare --input test.conll --output test.cstep --model model.hstep
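The jackknifing step is not automated by the commands above. One possible way to produce the fold files is sketched below in Python. This is an assumption-laden sketch: sentences are taken to be separated by blank lines, folds are numbered from 0, and sentences are assigned to folds round-robin; the helper names are hypothetical, and any other split that keeps train and test disjoint within a fold would serve equally well.

```python
def read_sentences(path):
    """Read a CoNLL-style file as a list of sentence blocks (blank-line separated)."""
    with open(path, encoding="utf-8") as handle:
        blocks = handle.read().split("\n\n")
    return [block.strip("\n") for block in blocks if block.strip()]


def jackknife(sentences, n_folds=20):
    """Return (train, test) pairs; fold i tests on sentences j with j % n_folds == i."""
    folds = []
    for i in range(n_folds):
        test = [s for j, s in enumerate(sentences) if j % n_folds == i]
        train = [s for j, s in enumerate(sentences) if j % n_folds != i]
        folds.append((train, test))
    return folds


def write_folds(sentences, prefix="train", n_folds=20):
    """Write train.fold$i.conll.train and train.fold$i.conll.test for every fold."""
    for i, (train, test) in enumerate(jackknife(sentences, n_folds)):
        for suffix, part in (("train", train), ("test", test)):
            name = "%s.fold%d.conll.%s" % (prefix, i, suffix)
            with open(name, "w", encoding="utf-8") as out:
                out.write("".join(sentence + "\n\n" for sentence in part))
```

Calling write_folds(read_sentences("train.conll")) would then create the 40 files expected by the per-fold learn and prepare commands. Every sentence lands in exactly one test part, so the merged train.cstep covers the whole training set.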
- Train the C-step model:
  ./bin/experimental/hc_depparser_cstep learn --train train.cstep --devel devel.cstep --model model.cstep --script "./script/dependency/evaluate.sh en ./devel.conll "
- Test the C-step model:
  ./bin/experimental/hc_depparser_cstep test --input test.cstep --model model.cstep --script "./script/dependency/evaluate.sh chen2014en ./devel.conll "
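The order of the commands can be summarized with a small dry-run generator. The Python sketch below only builds and prints the command lines listed above; it does not run them and does not merge the per-fold outputs. Numbering the folds from 0 is an assumption, and pipeline_commands is a hypothetical helper rather than a script shipped with this code base.

```python
import shlex

HSTEP = "./bin/experimental/hc_depparser_hstep_arcstandard"
CSTEP = "./bin/experimental/hc_depparser_cstep"


def pipeline_commands(n_folds=20):
    """Return the command lines above as argument lists, in execution order."""
    commands = []
    for i in range(n_folds):  # assumption: folds are numbered 0 .. n_folds - 1
        commands.append([HSTEP, "learn", "--train", "train.fold%d.conll.train" % i,
                         "--devel", "devel.conll", "--model", "model.hstep.%d" % i,
                         "--algorithm", "pa"])
        commands.append([HSTEP, "prepare", "--input", "train.fold%d.conll.test" % i,
                         "--output", "train.cstep.%d" % i,
                         "--model", "model.hstep.%d" % i])
    # The train.cstep.$i files must be merged into train.cstep before the C-step.
    commands.append([HSTEP, "learn", "--train", "train.conll", "--devel", "devel.conll",
                     "--model", "model.hstep", "--algorithm", "pa"])
    for name in ("devel", "test"):
        commands.append([HSTEP, "prepare", "--input", name + ".conll",
                         "--output", name + ".cstep", "--model", "model.hstep"])
    commands.append([CSTEP, "learn", "--train", "train.cstep", "--devel", "devel.cstep",
                     "--model", "model.cstep",
                     "--script", "./script/dependency/evaluate.sh en ./devel.conll "])
    commands.append([CSTEP, "test", "--input", "test.cstep", "--model", "model.cstep",
                     "--script", "./script/dependency/evaluate.sh chen2014en ./devel.conll "])
    return commands


if __name__ == "__main__":
    for command in pipeline_commands():
        print(" ".join(shlex.quote(argument) for argument in command))
```

To execute instead of print, each list could be passed to subprocess.run(command, check=True) after the fold files exist.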
Because loss computation and evaluation for Chinese dependency parsing differ from those for English, for Chinese experiments please set the language option in the H-step, the C-step, and the evaluation script.
The provided parameters include:
- the H-step: --neg-sample, the negative sample selection strategy: baseline, best, or worst.
- the C-step: --ranker, the ranking strategy: fine or coars.
- the H-step and C-step: language, to specify the language.
Use the --help option of either executable to get more help, or write to Yijia Liu [email protected].