
treelstm.pytorch's Introduction

Tree-Structured Long Short-Term Memory Networks

This is a PyTorch implementation of Tree-LSTM as described in the paper Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks by Kai Sheng Tai, Richard Socher, and Christopher Manning. On the semantic similarity task using the SICK dataset, this implementation reaches:

  • Pearson's coefficient: 0.8492 and MSE: 0.2842 using hyperparameters --lr 0.010 --wd 0.0001 --optim adagrad --batchsize 25
  • Pearson's coefficient: 0.8674 and MSE: 0.2536 using hyperparameters --lr 0.025 --wd 0.0001 --optim adagrad --batchsize 25 --freeze_embed
  • Pearson's coefficient: 0.8676 and MSE: 0.2532 are the numbers reported in the original paper.
  • Known differences include the way the gradients are accumulated (normalized by batchsize or not).

Requirements

  • Python (tested on 3.6.5, should work on >=2.7)
  • Java >= 8 (for Stanford CoreNLP utilities)
  • Other dependencies are in requirements.txt. Note: this currently works with PyTorch 0.4.0; switch to the pytorch-v0.3.1 branch if you want to use PyTorch 0.3.1.

Usage

Before delving into how to run the code, here is a quick overview of the contents:

  • Use the script fetch_and_preprocess.sh to download the SICK dataset, the Stanford Parser and Stanford POS Tagger, and the GloVe word vectors (Common Crawl 840 -- warning: this is a 2GB download!), and to preprocess the data, i.e. generate dependency parses using the Stanford Neural Network Dependency Parser.
  • main.py does the actual heavy lifting of training the model and testing it on the SICK dataset. For a list of all command-line arguments, have a look at config.py.
    • The first run caches the GloVe embeddings for words in the SICK vocabulary; later runs read only this cache.
    • Logs and model checkpoints are saved to the checkpoints/ directory with the name specified by the command line argument --expname.

Next, here are the different ways to run the code to train a Tree-LSTM model.

Local Python Environment

If you have a working Python3 environment, simply run the following sequence of steps:

- bash fetch_and_preprocess.sh
- pip install -r requirements.txt
- python main.py

Pure Docker Environment

If you want to use a Docker container, simply follow these steps:

- docker build -t treelstm .
- docker run -it treelstm bash
- bash fetch_and_preprocess.sh
- python main.py

Local Filesystem + Docker Environment

If you want to use a Docker container, but want to persist data and checkpoints in your local filesystem, simply follow these steps:

- bash fetch_and_preprocess.sh
- docker build -t treelstm .
- docker run -it --mount type=bind,source="$(pwd)",target="/root/treelstm.pytorch" treelstm bash
- python main.py

NOTE: Setting the environment variable OMP_NUM_THREADS=1 usually gives a speedup on the CPU. Use it like OMP_NUM_THREADS=1 python main.py. To run on a GPU, set CUDA_VISIBLE_DEVICES instead. Usually, CUDA does not give much speedup here, since we are operating at a batch size of 1.

Notes

  • (Apr 02, 2018) Added Dockerfile
  • (Apr 02, 2018) Now works on PyTorch 0.3.1 and Python 3.6, removed dependency on Python 2.7
  • (Nov 28, 2017) Added frozen embeddings, closed gap to paper.
  • (Nov 08, 2017) Refactored model to get 1.5x - 2x speedup.
  • (Oct 23, 2017) Now works with PyTorch 0.2.0.
  • (May 04, 2017) Added support for sparse tensors. Using the --sparse argument will enable sparse gradient updates for nn.Embedding, potentially reducing memory usage.
    • There are a couple of caveats, however, viz. weight decay will not work in conjunction with sparsity, and results from the original paper might not be reproduced using sparse embeddings.

Acknowledgements

Shout-out to Kai Sheng Tai for the original LuaTorch implementation, and to the PyTorch team for the fun library.

Contact

Riddhiman Dasgupta

This is my first PyTorch based implementation, and might contain bugs. Please let me know if you find any!

License

MIT

treelstm.pytorch's People

Contributors

dasguptar, huangshenno1, jizg, soumith, vinhdv


treelstm.pytorch's Issues

Does current TreeLSTM support batch size?

It seems batch size is still not supported in the code? In the forward function of ChildSumTreeLSTM, it seems that only a single tree can be processed per forward pass.

 def forward(self, tree, inputs):
    for idx in range(tree.num_children):
        self.forward(tree.children[idx], inputs)

    if tree.num_children == 0:
        child_c = inputs[0].detach().new(1, self.mem_dim).fill_(0.).requires_grad_()
        child_h = inputs[0].detach().new(1, self.mem_dim).fill_(0.).requires_grad_()
    else:
        child_c, child_h = zip(* map(lambda x: x.state, tree.children))
        child_c, child_h = torch.cat(child_c, dim=0), torch.cat(child_h, dim=0)

    tree.state = self.node_forward(inputs[tree.idx], child_c, child_h)
    return tree.state

IndexError: index 54 is out of bounds for dimension 0 with size 54

tree.state = self.node_forward(inputs[tree.idx], child_c, child_h)

len(inputs) == 54
tree.idx == 54

tree.idx = idx - 1

More information:

inputs[tree.idx] tensor([[ 3.7410e-02,  5.7619e-02,  3.3822e-01,  ..., -3.5774e-02,
         -7.8579e-02,  1.0644e-02],
        [-2.5287e-02, -2.5835e-01, -7.5715e-02,  ...,  1.2864e-01,
          1.3856e-01,  3.3581e-01],
        [-5.4430e-02, -1.6442e-01, -6.7605e-02,  ...,  1.7388e-01,
         -3.9886e-01, -1.3006e-02],
        ...,
        [-2.5433e-02, -8.0709e-02,  6.2163e-01,  ...,  2.7345e-01,
         -5.6782e-02,  1.8956e-01],
        [-2.4587e-01,  8.9087e-03, -1.5240e-03,  ..., -3.2474e-01,
          1.1630e-02, -1.3252e-01],
        [ 4.9405e-04, -3.5795e-01, -2.2226e-01,  ..., -9.1428e-02,
          2.2649e-01, -2.0806e-01]], device='cuda:0',
       grad_fn=<EmbeddingBackward>)
Traceback (most recent call last):
  File "main.py", line 185, in <module>
    main()
  File "main.py", line 155, in main
    train_loss = trainer.train(train_dataset)
  File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/trainer.py", line 29, in train
    output = self.model(linput, rtree, rinput)
  File "/home/qingdujun/Applications/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 90, in forward
    rstate, rhidden = self.childsumtreelstm(rtree, rinputs)
  File "/home/qingdujun/Applications/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 38, in forward
    self.forward(tree.children[idx], inputs)
  File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 38, in forward
    self.forward(tree.children[idx], inputs)
  File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 38, in forward
    self.forward(tree.children[idx], inputs)
  [Previous line repeated 10 more times]
  File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 48, in forward
    tree.state = self.node_forward(inputs[tree.idx], child_c, child_h)
IndexError: index 54 is out of bounds for dimension 0 with size 54

map_label_to_target should init zero tensor

Your map_label_to_target for the SICK dataset initializes an uninitialized (random) tensor:

def map_label_to_target(label,num_classes):
    target = torch.Tensor(1,num_classes) # this is not zero tensor
    ceil = int(math.ceil(label))
    floor = int(math.floor(label))
    if ceil==floor:
        target[0][floor-1] = 1
    else:
        target[0][floor-1] = ceil - label
        target[0][ceil-1] = label - floor
    return target

However, in the original treelstm, the author initializes a zero tensor:

local targets = torch.zeros(batch_size, self.num_classes)
for j = 1, batch_size do
  local sim = dataset.labels[indices[i + j - 1]] * (self.num_classes - 1) + 1
  local ceil, floor = math.ceil(sim), math.floor(sim)
  if ceil == floor then
    targets[{j, floor}] = 1
  else
    targets[{j, floor}] = ceil - sim
    targets[{j, ceil}] = sim - floor
  end
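A minimal fix, as a sketch that mirrors the Lua behaviour above, is to allocate the target with torch.zeros so entries that are not explicitly set stay zero:

    import math
    import torch

    def map_label_to_target(label, num_classes):
        target = torch.zeros(1, num_classes)  # zero-initialized, unlike torch.Tensor(1, num_classes)
        ceil, floor = int(math.ceil(label)), int(math.floor(label))
        if ceil == floor:
            target[0][floor - 1] = 1
        else:
            target[0][floor - 1] = ceil - label
            target[0][ceil - 1] = label - floor
        return target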

Any plans to support constituency trees?

Constituency trees showed slightly lower performance in the paper, but some people (myself included) may still want to use them, believing phrase structures are better suited for certain purposes :)

Why zero out embeddings for special words if they are absent in vocab

Hi,

I noticed that in main.py, you zero out the embeddings for special words if they are absent in vocabulary:

# zero out the embeddings for padding and other special words if they are absent in vocab
for idx, item in enumerate([Constants.PAD_WORD, Constants.UNK_WORD, Constants.BOS_WORD, Constants.EOS_WORD]):
    emb[idx].zero_()

Is there any reason for doing so? Why not use random normal vectors?

Thanks.
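For illustration only, here is a hypothetical sketch of the alternative the question suggests (random-normal rows for the special tokens instead of zeros), mirroring the quoted loop; the 0.05 standard deviation is an arbitrary choice, not something from the repository:

    # hypothetical alternative: small random-normal vectors for PAD/UNK/BOS/EOS
    for idx, item in enumerate([Constants.PAD_WORD, Constants.UNK_WORD,
                                Constants.BOS_WORD, Constants.EOS_WORD]):
        emb[idx].normal_(0, 0.05)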

Docker image is broken!

OS: macOS Mojave
Docker Edition: Version 18.03.1-ce-mac65 (24312)
Channel: stable

I tried to build the Docker image in order to run the library without depending on my Mac setup. The image is actually broken because the links and procedures for fetching the dependencies have not been updated:

    Step 10/11 : RUN ["/bin/bash", "-c", "pip install -r requirements.txt"]
     ---> Running in 90832d3e48fe
    torch-0.4.0-cp36-cp36m-linux_x86_64.whl is not a supported wheel on this platform.
    You are using pip version 10.0.1, however version 18.1 is available.
    You should consider upgrading via the 'pip install --upgrade pip' command.
    The command '/bin/bash -c pip install -r requirements.txt' returned a non-zero code: 1

I fixed the issue by changing the Docker build file:
1- Remove this line:
RUN ["/bin/bash", "-c", "pip install -r requirements.txt"]

2- Run the container in the interactive mode:
docker run -it [IMAGE-NAME]

3- Install python-3.5, pip3, and their dependencies manually, then run main.py.

This is how I did it to make sure I was installing the right things; however, the best solution would be to make all of these changes at the image-build level.

Checkpoint saving may not be appropriate.

In your code:

        if best < test_pearson:
            best = test_pearson
            checkpoint = {
                'model': trainer.model.state_dict(), 
                'optim': trainer.optimizer,
                'pearson': test_pearson, 'mse': test_mse,
                'args': args, 'epoch': epoch
                }
            logger.debug('==> New optimum found, checkpointing everything now...')
            torch.save(checkpoint, '%s.pt' % os.path.join(args.save, args.expname))

test_pearson is used instead of dev_pearson, but test_pearson should not be used to choose your best model.
I got a test result (Pearson: 0.8616, MSE: 0.2626) using the checkpoint with the highest dev_pearson score.
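A sketch of the suggested change, keeping the names from the quoted snippet; dev_pearson is assumed to be computed the same way as test_pearson, but on the dev split:

    # select the checkpoint on the dev metric, report test numbers separately
    if best < dev_pearson:
        best = dev_pearson
        checkpoint = {
            'model': trainer.model.state_dict(),
            'optim': trainer.optimizer,
            'pearson': test_pearson, 'mse': test_mse,
            'args': args, 'epoch': epoch
            }
        logger.debug('==> New optimum on dev found, checkpointing everything now...')
        torch.save(checkpoint, '%s.pt' % os.path.join(args.save, args.expname))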

How to run it on GPU?

I ran pip install -r req....

but couldn't run it with python main.py --cuda

with the traceback:
AssertionError: Torch not compiled with CUDA enabled

Matrix problem

  File ".../treelstm.pytorch/model.py", line 36, in node_forward
    u = F.tanh(self.ux(inputs)+self.uh(child_h_sum))
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/linear.py", line 54, in forward
    return self._backend.Linear.apply(input, self.weight, self.bias)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/linear.py", line 12, in forward
    output.addmm_(0, 1, input, weight.t())
RuntimeError: matrix and matrix expected at 

I think you may be missing an unsqueeze operation?

        i = F.sigmoid(self.ix(inputs)+self.ih(child_h_sum.unsqueeze(0)))
        o = F.sigmoid(self.ox(inputs)+self.oh(child_h_sum.unsqueeze(0)))
        u = F.tanh(self.ux(inputs)+self.uh(child_h_sum.unsqueeze(0)))

Two differences from the original implementation

I got the same result as you, ~0.846 Pearson score. After checking the original implementation, I found two differences.

  • In your trainer.py file,
def train(self, dataset):
        self.model.train()
        self.optimizer.zero_grad()
        loss, k = 0.0, 0
        indices = torch.randperm(len(dataset))
        for idx in tqdm(range(len(dataset)),desc='Training epoch '+str(self.epoch+1)+''):
            ltree,lsent,rtree,rsent,label = dataset[indices[idx]]
            linput, rinput = Var(lsent), Var(rsent)
            target = Var(map_label_to_target(label,dataset.num_classes))
            if self.args.cuda:
                linput, rinput = linput.cuda(), rinput.cuda()
                target = target.cuda()
            output = self.model(ltree,linput,rtree,rinput)
            err = self.criterion(output, target)
            loss += err.data[0]
            err.backward()           # <------------
            k += 1
            if k%self.args.batchsize==0:
                self.optimizer.step()
                self.optimizer.zero_grad()
        self.epoch += 1
        return loss/len(dataset)

You call .backward() for each sample in the mini-batch, and then perform a single update step with self.optimizer.step(). Since the backward() function accumulates gradients automatically, it seems you need to average both the losses and the gradients over the mini-batch. So I think the arrow line above should be changed to

(err/self.args.batchsize).backward()
  • The original implementation does not really update the embeddings. It does not include the embedding parameters in the model; all model parameters are optimized with Adagrad, while the embedding parameters are updated directly with gradient * learning_rate, but that learning rate is set to 0 (see the sketch below).
    Furthermore, I did some simple calculations. There are more than 700,000 embedding parameters versus 286,505 other model parameters. Considering the training set contains only 4,500 examples, it is too small to fine-tune the embeddings.

After making the two modifications above, I can get a 0.854 Pearson score and 0.274 MSE with Adagrad (learning_rate=0.05).
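A sketch combining the two changes, using the names from the quoted trainer code; the embedding attribute path is an assumption based on other snippets on this page, and the repository's later --freeze_embed flag addresses the second point:

    # 1) average the loss over the virtual mini-batch before backpropagating,
    #    so the accumulated gradients are normalized by the batch size
    (err / self.args.batchsize).backward()

    # 2) keep the embedding weights fixed, as in the original LuaTorch code
    #    (attribute path assumed; adapt it to the actual model definition)
    self.model.childsumtreelstm.emb.weight.requires_grad = False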

Cannot find packages

lib\CollapseUnaryTransformer.java:3: error: package edu.stanford.nlp.ling does not exist
import edu.stanford.nlp.ling.Label;
^
lib\CollapseUnaryTransformer.java:4: error: package edu.stanford.nlp.trees does not exist
import edu.stanford.nlp.trees.Tree;
^
lib\CollapseUnaryTransformer.java:5: error: package edu.stanford.nlp.trees does not exist
import edu.stanford.nlp.trees.TreeTransformer;
^
lib\CollapseUnaryTransformer.java:6: error: package edu.stanford.nlp.util does not exist
import edu.stanford.nlp.util.Generics;

...
What can I do?

Trying to understand cparents.txt in Constituency parsing

I have downloaded the SICK data and obtained the dependency and constituency parses with the fetch_and_preprocess.sh script.

I am now trying to understand what information is generated in the cparents.txt file.
This is an example:

a.txt -> Two dogs are fighting
a.cparents.txt -> 5 5 7 7 6 0 6

If I am not mistaken, from cparents.txt I should be able to build the parse tree. Is that right? And what would the tree for this example look like?

Thanks in advance for any help.
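Not the author, but here is a minimal sketch of how such a parent-pointer line can be turned back into a tree, assuming entry i (1-indexed) gives the parent of node i and 0 marks the root (the convention used for the parent files in this repository, as far as I can tell):

    def read_parents(line):
        """Rebuild child lists from a parent-pointer line such as '5 5 7 7 6 0 6'."""
        parents = [int(p) for p in line.split()]
        children, root = {}, None
        for node, parent in enumerate(parents, start=1):  # nodes are 1-indexed
            if parent == 0:
                root = node
            else:
                children.setdefault(parent, []).append(node)
        return root, children

    # For "5 5 7 7 6 0 6": root is node 6, children == {5: [1, 2], 7: [3, 4], 6: [5, 7]};
    # nodes 1-4 correspond to the tokens "Two dogs are fighting", 5-7 are internal nodes.
    print(read_parents("5 5 7 7 6 0 6"))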

classpath error

my current environment:
windows 10
python 3.6
pytorch 0.4
IDE pycharm
I try to run preprocess-sick.py and get an error that the class cannot be found or loaded. I then tried copying the Java command into a Windows cmd window, and the same error was raised.
The line of code that builds the command:

    cmd = ('java -cp %s DependencyParse -tokpath %s -parentpath %s -relpath %s %s < %s'
           % (cp, tokpath, parentpath, relpath, tokenize_flag, filepath))
    os.system(cmd)

How to make it work with dynamic batching?

This implementation can only process one sample at a time, so performance is limited and GPU utilization is low. Is there any possibility of making the TreeLSTM support dynamic batching so that the GPU can be fully utilized?

ChildSumTreeLSTM: the fx and fh linear layers are declared but not used

Lines 21 and 22:

self.fx = nn.Linear(self.in_dim,self.mem_dim)
self.fh = nn.Linear(self.mem_dim,self.mem_dim)

But they are never used.

I think you intended to use them in lines 38 and 39 (perhaps a typo of ix for fx):

fx = F.torch.unsqueeze(self.ix(inputs),1)
f = F.torch.cat([self.ih(child_hi)+fx for child_hi in child_h], 0)
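For what it's worth, a sketch of what those lines might look like with fx and fh substituted in, assuming the rest of the forward pass stays the same (names mirror the quoted snippet):

    fx = F.torch.unsqueeze(self.fx(inputs), 1)
    f = F.torch.cat([self.fh(child_hi) + fx for child_hi in child_h], 0)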

download

Why can't I access nlp.stanford.edu? Could you send me a copy? Thank you.

Sizes do not match

When I run
python main.py
I get the following error message:

Namespace(batchsize=25, cuda=True, data='data/sick/', epochs=15, expname='test',
glove='data/glove/', hidden_dim=50, input_dim=150, lr=0.01, mem_dim=75, num_classes=5, optim='adagrad', save='checkpoints/', seed=123, sparse=False, wd=0.0001)
==> SICK vocabulary size : 2412
==> Size of train data : 4500
==> Size of dev data : 500
==> Size of test data : 4927
Traceback (most recent call last):
File "main.py", line 157, in
main()
File "main.py", line 126, in main
model.childsumtreelstm.emb.state_dict()['weight'].copy_(emb)

RuntimeError: sizes do not match at /py/conda-bld/pytorch_1493676237139/work/torch/lib/THC/THCTensorCopy.cu:31

The platform is Arch Linux with CUDA 8.0.

I would appreciate any reply.

How can I get the parsing in the same format for sentences in German

Hi,

I am trying to use this model to parse sentences in German with the dependency parser that is used in this code.

So, the DependencyParse.java file has the following lines:

public static final String TAGGER_MODEL = "standford-tagger/models/english-left3words-distsim.tagger";
public static final String PARSER_MODEL = "edu/standford/nlp/models/nndep/english_SD.gz";

Is it enough to change those lines in order to specify a German tagger and parser?

Thanks in advance for any help,

Error while Compiling

OS: Ubuntu 18.04, Java 11.0.3
Running (as part of fetch_and_preprocess.sh):
javac -cp $CLASSPATH lib/*.java -Xlint:unchecked

lib/CollapseUnaryTransformer.java:17: error: error while writing CollapseUnaryTransformer: /home/eduard_ergenzinger/treelstm.pytorch/lib/CollapseUnaryTransformer.class
public class CollapseUnaryTransformer implements TreeTransformer {
       ^
lib/ConstituencyParse.java:58: warning: [unchecked] unchecked call to PTBTokenizer(Reader,LexedTokenFactory<T>,String) as a member of the raw type PTBTokenizer
      PTBTokenizer<Word> tokenizer = new PTBTokenizer(new StringReader(line), new WordTokenFactory(), "");
                                     ^
  where T is a type-variable:
    T extends HasWord declared in class PTBTokenizer
lib/ConstituencyParse.java:58: warning: [unchecked] unchecked conversion
      PTBTokenizer<Word> tokenizer = new PTBTokenizer(new StringReader(line), new WordTokenFactory(), "");
                                     ^
  required: PTBTokenizer<Word>
  found:    PTBTokenizer
lib/DependencyParse.java:57: warning: [unchecked] unchecked call to PTBTokenizer(Reader,LexedTokenFactory<T>,String) as a member of the raw type PTBTokenizer
        PTBTokenizer<Word> tokenizer = new PTBTokenizer(
                                       ^
  where T is a type-variable:
    T extends HasWord declared in class PTBTokenizer
lib/DependencyParse.java:57: warning: [unchecked] unchecked conversion
        PTBTokenizer<Word> tokenizer = new PTBTokenizer(
                                       ^
  required: PTBTokenizer<Word>
  found:    PTBTokenizer
1 error
4 warnings

Nodes' hidden representations?

Hello, not an issue, but what's the easiest way to extract the learned hidden embeddings for each node in a ChildSum tree? New to PyTorch, so forgive my ignorance.

Thanks!
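Not the author, but based on the ChildSumTreeLSTM forward pass quoted earlier on this page, each node stores its (c, h) pair in tree.state after a forward call, so a simple recursive traversal can collect the per-node hidden vectors. A sketch, with attribute names following the quoted code:

    def collect_hidden_states(tree, out=None):
        """Gather the hidden vector h of every node after the model has been run on the tree."""
        if out is None:
            out = []
        for idx in range(tree.num_children):
            collect_hidden_states(tree.children[idx], out)
        c, h = tree.state  # set by ChildSumTreeLSTM.forward
        out.append(h.detach())
        return out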

lr is different from the original paper

I noticed that the lr in the code is 0.01, while the paper uses 0.05 with Adagrad. I tried training the model with 0.05, but the training loss doesn't decrease at all, maybe due to the high lr?

Why did you set lr to 0.01? And since the lr is different, maybe there is a bug in the code?

Can the

I ran the sentiment model successfully. My GPUs are two 1080 Tis, and I get about 14% utilization on GPU 0. Is there a way to run it on multiple GPUs? I implemented a model in TensorFlow Fold, but it seems that it can't support multi-GPU.

Why move output to cpu?

I noticed that in the test function of the trainer module you write

output = output.data.squeeze().cpu()

Why do you move the output to the CPU?

By the way, in SimilarityTreeLSTM in the model module:

output = self.similarity(lstate, rstate)

Why not use

output = self.similarity(lhidden, rhidden)
