
bidaf's Introduction

Junki Ohmura (jojonki)

  • Linkedin
  • Google Scholar
  • Twitter
  • Blog (Japanese)
  • Please feel free to contact me. junki dot ohmura at gmail.
  • Career
    • Waseda University (master's degree in computer science)
    • NTT: Research Intern. (Peer-to-Peer communication)
    • Bizreach: Software Engineer Intern.
    • Sony Interactive Entertainment: Software Engineer (PlayStation 4's voice user interface).
    • LTI, Carnegie Mellon University: Research on dialog systems, supervised by Maxine Eskenazi.
    • Sony: NLP Researcher (current).


bidaf's Issues

Very small gradients causing no weight updates in your model

Thanks for your code.
It helps me understand BiDAF in detail.

However, I found that the model's performance does not improve: the metric is the same every epoch.
Looking closer, I found that the gradients applied by the optimizer are very small, on the order of 10^-3 to 10^-8.
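
For reference, this is roughly how I inspected the gradient magnitudes after a backward pass (a minimal sketch; model stands in for the network instance in the repo's training loop):

# After loss.backward(), print each parameter's gradient norm to see
# whether any of them is large enough to actually move the weights.
for name, param in model.named_parameters():
    if param.grad is not None:
        print(name, param.grad.norm())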

I can't find what's wrong, and your code is otherwise easy to follow.
What might be the problem?

Core dumped during training

I followed the steps in the guide, downloaded and unzipped the GloVe vectors as described, and ran the preprocessing without any problem.

But when I start training, I immediately get a core dump, even when training on an otherwise idle GPU:

~/BiDAF$ CUDA_VISIBLE_DEVICES=1 python3 main.py
Segmentation fault (core dumped)
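
In case it helps narrow things down, this is the kind of minimal check I would run on the same GPU to see whether PyTorch can use it at all (just a sketch, unrelated to the repo's code):

import torch

# Check that PyTorch sees the GPU and can run a trivial kernel on it
# before involving the preprocessing and the model.
print(torch.cuda.is_available(), torch.cuda.device_count())
x = torch.randn(8, 8).cuda()
print(torch.mm(x, x).sum())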

Why "Runtime error" like this?

When I run "python main.py --resume ./checkpoints/Epoch-12.model --test 1", I get:
----Test---
0%| | 0/528 [00:00<?, ?it/s]
Traceback (most recent call last):
File "main.py", line 234, in
test(model, test_data)
File "main.py", line 186, in test
p1, p2 = model(c, cc, q, cq)
File "/Users/liuyancen/anaconda/envs/python36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in call
result = self.forward(*input, **kwargs)
File "/Users/liuyancen/Desktop/NLP/BiDAF/layers/attention_net.py", line 50, in forward
embd_context = self.build_contextual_embd(ctx_c, ctx_w) # (N, T, 2d)
File "/Users/liuyancen/Desktop/NLP/BiDAF/layers/attention_net.py", line 31, in build_contextual_embd
char_embd = self.char_embd_net(x_c) # (N, seq_len, embd_size)
File "/Users/liuyancen/anaconda/envs/python36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in call
result = self.forward(*input, **kwargs)
File "/Users/liuyancen/Desktop/NLP/BiDAF/layers/char_embedding.py", line 35, in forward
x = [F.relu(conv(x)) for conv in self.conv] # (N, Cout, seq_len, c_embd_size-filter_w+1). stride == 1
File "/Users/liuyancen/Desktop/NLP/BiDAF/layers/char_embedding.py", line 35, in
x = [F.relu(conv(x)) for conv in self.conv] # (N, Cout, seq_len, c_embd_size-filter_w+1). stride == 1
File "/Users/liuyancen/anaconda/envs/python36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in call
result = self.forward(*input, **kwargs)
File "/Users/liuyancen/anaconda/envs/python36/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 237, in forward
self.padding, self.dilation, self.groups)
File "/Users/liuyancen/anaconda/envs/python36/lib/python3.6/site-packages/torch/nn/functional.py", line 40, in conv2d
return f(input, weight, bias)
RuntimeError: Given input size: (1x161x1x8). Calculated output size: (100x161x-3x-15). Output size is too small at /Users/soumith/code/builder/wheel/pytorch-src/torch/lib/THNN/generic/VolumetricConvolutionMM.c:65
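
For reference, with stride 1 and no padding a convolution's output size along each dimension is input - kernel + 1, so the negative sizes above mean the input (1 x 8 in those dimensions) is smaller than the filter. A tiny calculation (the kernel sizes 5 and 24 are only worked backwards from the error message, not taken from the repo):

# Output size of a convolution with stride 1 and no padding:
#   out = in - kernel + 1
def conv_out(in_size, kernel, stride=1, padding=0):
    return (in_size + 2 * padding - kernel) // stride + 1

# Working backwards from "input 1 x 8 -> output -3 x -15":
print(conv_out(1, 5), conv_out(8, 24))   # -3 -15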

How can I fix it?

Running on multiple GPUs

Thanks for the code!

I am trying to run the training on 8 GPUs, but it seems the code only uses one of them. Do I need to make some changes?
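
In case it is relevant, this is the kind of change I was expecting to need (a sketch of the standard PyTorch data-parallel wrapper; I have not checked whether the rest of the training loop works with it):

import torch.nn as nn

# Wrap the model so each forward pass splits the batch across all visible GPUs.
# "model" stands for the network instance built in main.py.
model = nn.DataParallel(model)
model = model.cuda()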

cuda out of memory

Thanks for your code, it helps me a lot. I am trying to write my own version, but I have run into a problem.
When I rewrite the loss function as follows:
import torch
import torch.nn as nn
from torch.autograd import Variable

class Custom_Loss(nn.Module):
    def __init__(self):
        super(Custom_Loss, self).__init__()

    def loss_function(self, data, labels):
        # Accumulate the negative log-probability of the gold index for each example.
        loss = Variable(torch.zeros(1))
        for d, l in zip(data, labels):
            loss -= torch.log(d[l]).cpu()
        loss /= data.size(0)
        return loss

    def forward(self, p1, p2, S, E):
        """
        N for batch and T for length of context

        :param p1: A tensor (N, T) with the probability of choosing each word as the answer start
        :param p2: A tensor (N, T) with the probability of choosing each word as the answer end
        :param S: A tensor with each query's start position
        :param E: A tensor with each query's end position
        :return: Loss of the BiDAF model
        """
        l1 = self.loss_function(p1, S)
        l2 = self.loss_function(p2, E)
        loss = l1 + l2
        return loss

I then get the error "cuda out of memory". I checked my code and could not find the reason. Can you help me?
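
For comparison, this is the vectorized form I am considering switching to (just a sketch; I have not verified that it avoids the out-of-memory error):

import torch

def loss_function(data, labels):
    # data: (N, T) probabilities; labels: (N,) gold indices.
    # Pick the probability at each gold index and average the negative log.
    picked = data.gather(1, labels.unsqueeze(1)).squeeze(1)   # (N,)
    return -torch.log(picked).mean()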

Do the train data and test data use different dictionaries?

Firstly, thanks for sharing. I am a little confused about this code. The train data and the test data are preprocessed separately, and each has its own shared_json file. When training or testing, the data are loaded as follows:

train_json, train_shared_json = load_processed_json('./dataset/data_train.json', './dataset/shared_train.json')
test_json, test_shared_json = load_processed_json('./dataset/data_test.json', './dataset/shared_test.json')
train_data = DataSet(train_json, train_shared_json)
test_data = DataSet(test_json, test_shared_json)

and the word-to-index mappings are built as:

w2i_train, c2i_train = train_data.get_word_index()
w2i_test, c2i_test = test_data.get_word_index()

This confuses me a lot: it looks like different dictionaries are used to map words to indices for train and test?
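
What I expected was a single vocabulary built over both splits, roughly like this (the merge below is my own sketch, not code from the repo):

# Merge the two mappings so a word gets the same index at train and test time.
w2i = dict(w2i_train)
for w in w2i_test:
    if w not in w2i:
        w2i[w] = len(w2i)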

Getting an error on Windows 10

Hi, I'm trying to test this on Windows 10, and I get the following error when I execute "python main.py":


n_train 87599
n_test 10570
ctx_maxlen 3706
vocab_size_w: 75807
vocab_size_c: 212
ctx_sent_maxlen: 866
query_sent_maxlen: 60
Traceback (most recent call last):
File "main.py", line 73, in
glove_embd_w = torch.from_numpy(load_glove_weights('./dataset', args.w_embd_size, len(w2i), w2i)).type(torch.FloatTensor)
File "C:\Users\myuserName\BiDAF\process_data.py", line 109, in load_glove_weights
for line in f:
File "C:\Users\myuserName\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2776: character maps to <undefined>
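
If it matters, line 109 of process_data.py seems to read the GloVe file with the platform default encoding, which is cp1252 on my machine; the variant I would try is an explicit encoding (just my guess, and glove_path below is only illustrative):

# Open the GloVe text file with an explicit UTF-8 encoding instead of the
# Windows default cp1252, which cannot decode some bytes in the file.
with open(glove_path, encoding='utf-8') as f:
    for line in f:
        values = line.split()   # token followed by its embedding values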

Thanks

Different operations in the attention layer compared with the official code

Hi @jojonki,
Your code is very clear, so it was very helpful.

I found an operation that differs from the official code.

  • your code
# Context2Query
        c2q = torch.bmm(F.softmax(S, dim=-1), embd_query) # (N, T, 2d) = bmm( (N, T, J), (N, J, 2d) )
  • official code

u_logits == S

# Context2Query
u_a = softsel(u_aug, u_logits)  # [N, M, JX, d]

I checked the softsel function in the official code.

That operation contains an element-wise multiplication, whereas the operation in your code is torch.bmm
(batch matrix-matrix multiplication). (The same applies to the Query2Context part.)
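
To make the comparison concrete, this is how I read the two formulations in PyTorch (my own paraphrase of softsel, not code from either repo; here S is (N, T, J) and embd_query is (N, J, 2d)):

import torch
import torch.nn.functional as F

# Formulation in this repo: batched matrix product of the attention weights
# with the query embeddings.
c2q_bmm = torch.bmm(F.softmax(S, dim=-1), embd_query)             # (N, T, 2d)

# My reading of the official softsel: broadcast an element-wise multiply,
# then sum over the query-length axis J.
a = F.softmax(S, dim=-1).unsqueeze(-1)                             # (N, T, J, 1)
c2q_softsel = (a * embd_query.unsqueeze(1)).sum(dim=2)             # (N, T, 2d)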

I think this operation may be the reason for issue #1.
