
bidaf's Introduction

Junki Ohmura (jojonki)

  • Linkedin
  • Google Scholar
  • Twitter
  • Blog (Japanese)
  • Please feel free to contact me. junki dot ohmura at gmail.
  • Career
    • Waseda University (master's degree in computer science)
    • NTT: Research Intern. (Peer-to-Peer communication)
    • Bizreach: Software Engineer Intern.
    • Sony Interactive Entertainment: Software Engineer (PlayStation 4's voice user interface).
    • LTI, Carnegie Mellon University: Research on dialog systems, supervised by Maxine Eskenazi.
    • Sony: NLP Researcher (current).


bidaf's Issues

Very small gradients causing no weight updates in your model

Thanks for your code.
It helps me understand BiDAF in detail.

However, I found that the model's performance does not improve: the metric is the same every epoch.
Looking closer, I found that the gradients applied by the optimizer are very small, on the order of 10^-3 to 10^-8.
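
For reference, this is roughly how I inspected the gradient magnitudes after a backward pass (a minimal sketch; model stands in for the network instance in the repo's training loop):

# After loss.backward(), print each parameter's gradient norm to see
# whether any of them is large enough to actually move the weights.
for name, param in model.named_parameters():
    if param.grad is not None:
        print(name, param.grad.norm())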

I can't find what's wrong, and your code is otherwise easy to follow.
What might be the problem?

Core dumped during training

I followed the steps in the guide, downloaded and unzipped the GloVe vectors as described, and ran the preprocessing without any problem.

But when I start training, I immediately get a core dump, even when training on an otherwise idle GPU:

~/BiDAF$ CUDA_VISIBLE_DEVICES=1 python3 main.py
Segmentation fault (core dumped)
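
In case it helps narrow things down, this is the kind of minimal check I would run on the same GPU to see whether PyTorch can use it at all (just a sketch, unrelated to the repo's code):

import torch

# Check that PyTorch sees the GPU and can run a trivial kernel on it
# before involving the preprocessing and the model.
print(torch.cuda.is_available(), torch.cuda.device_count())
x = torch.randn(8, 8).cuda()
print(torch.mm(x, x).sum())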

Why "Runtime error" like this?

When I run "python main.py --resume ./checkpoints/Epoch-12.model --test 1", I get:
----Test---
0%| | 0/528 [00:00<?, ?it/s]
Traceback (most recent call last):
File "main.py", line 234, in
test(model, test_data)
File "main.py", line 186, in test
p1, p2 = model(c, cc, q, cq)
File "/Users/liuyancen/anaconda/envs/python36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in call
result = self.forward(*input, **kwargs)
File "/Users/liuyancen/Desktop/NLP/BiDAF/layers/attention_net.py", line 50, in forward
embd_context = self.build_contextual_embd(ctx_c, ctx_w) # (N, T, 2d)
File "/Users/liuyancen/Desktop/NLP/BiDAF/layers/attention_net.py", line 31, in build_contextual_embd
char_embd = self.char_embd_net(x_c) # (N, seq_len, embd_size)
File "/Users/liuyancen/anaconda/envs/python36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in call
result = self.forward(*input, **kwargs)
File "/Users/liuyancen/Desktop/NLP/BiDAF/layers/char_embedding.py", line 35, in forward
x = [F.relu(conv(x)) for conv in self.conv] # (N, Cout, seq_len, c_embd_size-filter_w+1). stride == 1
File "/Users/liuyancen/Desktop/NLP/BiDAF/layers/char_embedding.py", line 35, in
x = [F.relu(conv(x)) for conv in self.conv] # (N, Cout, seq_len, c_embd_size-filter_w+1). stride == 1
File "/Users/liuyancen/anaconda/envs/python36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in call
result = self.forward(*input, **kwargs)
File "/Users/liuyancen/anaconda/envs/python36/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 237, in forward
self.padding, self.dilation, self.groups)
File "/Users/liuyancen/anaconda/envs/python36/lib/python3.6/site-packages/torch/nn/functional.py", line 40, in conv2d
return f(input, weight, bias)
RuntimeError: Given input size: (1x161x1x8). Calculated output size: (100x161x-3x-15). Output size is too small at /Users/soumith/code/builder/wheel/pytorch-src/torch/lib/THNN/generic/VolumetricConvolutionMM.c:65
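
For reference, with stride 1 and no padding a convolution's output size along each dimension is input - kernel + 1, so the negative sizes above mean the input (1 x 8 in those dimensions) is smaller than the filter. A tiny calculation (the kernel sizes 5 and 24 are only worked backwards from the error message, not taken from the repo):

# Output size of a convolution with stride 1 and no padding:
#   out = in - kernel + 1
def conv_out(in_size, kernel, stride=1, padding=0):
    return (in_size + 2 * padding - kernel) // stride + 1

# Working backwards from "input 1 x 8 -> output -3 x -15":
print(conv_out(1, 5), conv_out(8, 24))   # -3 -15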

How can I fix it?

Running on multiple GPUs

Thanks for the code!

I am trying to run the training on 8 GPUs, but it seems the code only uses one of them. Do I need to make some changes?
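
In case it is relevant, this is the kind of change I was expecting to need (a sketch of the standard PyTorch data-parallel wrapper; I have not checked whether the rest of the training loop works with it):

import torch.nn as nn

# Wrap the model so each forward pass splits the batch across all visible GPUs.
# "model" stands for the network instance built in main.py.
model = nn.DataParallel(model)
model = model.cuda()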

cuda out of memory

Thanks for your code, it helps me a lot. I am trying to write my own version, but I have run into a problem.
When I rewrite the loss function as follows:
import torch
import torch.nn as nn
from torch.autograd import Variable

class Custom_Loss(nn.Module):
    def __init__(self):
        super(Custom_Loss, self).__init__()

    def loss_function(self, data, labels):
        # Accumulate the negative log-probability of the gold index for each example.
        loss = Variable(torch.zeros(1))
        for d, l in zip(data, labels):
            loss -= torch.log(d[l]).cpu()
        loss /= data.size(0)
        return loss

    def forward(self, p1, p2, S, E):
        """
        N for batch and T for length of context

        :param p1: A tensor (N, T) with the probability of choosing each word as the answer start
        :param p2: A tensor (N, T) with the probability of choosing each word as the answer end
        :param S: A tensor with each query's start position
        :param E: A tensor with each query's end position
        :return: Loss of the BiDAF model
        """
        l1 = self.loss_function(p1, S)
        l2 = self.loss_function(p2, E)
        loss = l1 + l2
        return loss

I then get the error "cuda out of memory". I checked my code and could not find the reason. Can you help me?
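
For comparison, this is the vectorized form I am considering switching to (just a sketch; I have not verified that it avoids the out-of-memory error):

import torch

def loss_function(data, labels):
    # data: (N, T) probabilities; labels: (N,) gold indices.
    # Pick the probability at each gold index and average the negative log.
    picked = data.gather(1, labels.unsqueeze(1)).squeeze(1)   # (N,)
    return -torch.log(picked).mean()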

Do the train data and test data use different dictionaries?

Firstly, thanks for sharing. I am a little confused about this code. The train data and the test data are preprocessed separately, and each has its own shared_json file. When training or testing, the data are loaded as follows:

train_json, train_shared_json = load_processed_json('./dataset/data_train.json', './dataset/shared_train.json')
test_json, test_shared_json = load_processed_json('./dataset/data_test.json', './dataset/shared_test.json')
train_data = DataSet(train_json, train_shared_json)
test_data = DataSet(test_json, test_shared_json)

and the word-to-index mappings are built as:

w2i_train, c2i_train = train_data.get_word_index()
w2i_test, c2i_test = test_data.get_word_index()

This confuses me a lot: it looks like different dictionaries are used to map words to indices for train and test?
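
What I expected was a single vocabulary built over both splits, roughly like this (the merge below is my own sketch, not code from the repo):

# Merge the two mappings so a word gets the same index at train and test time.
w2i = dict(w2i_train)
for w in w2i_test:
    if w not in w2i:
        w2i[w] = len(w2i)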

Getting an error on Windows 10

Hi, I'm trying to test this on Windows 10, and I get the following error when I execute "python main.py":


n_train 87599
n_test 10570
ctx_maxlen 3706
vocab_size_w: 75807
vocab_size_c: 212
ctx_sent_maxlen: 866
query_sent_maxlen: 60
Traceback (most recent call last):
File "main.py", line 73, in
glove_embd_w = torch.from_numpy(load_glove_weights('./dataset', args.w_embd_size, len(w2i), w2i)).type(torch.FloatTensor)
File "C:\Users\myuserName\BiDAF\process_data.py", line 109, in load_glove_weights
for line in f:
File "C:\Users\myuserName\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2776: character maps to <undefined>
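
If it matters, line 109 of process_data.py seems to read the GloVe file with the platform default encoding, which is cp1252 on my machine; the variant I would try is an explicit encoding (just my guess, and glove_path below is only illustrative):

# Open the GloVe text file with an explicit UTF-8 encoding instead of the
# Windows default cp1252, which cannot decode some bytes in the file.
with open(glove_path, encoding='utf-8') as f:
    for line in f:
        values = line.split()   # token followed by its embedding values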

Thanks

Different operations in the attention layer compared with the official code

Hi @jojonki,
Your code is very clear, so it was very helpful.

I found an operation that differs from the official code.

  • your code
# Context2Query
        c2q = torch.bmm(F.softmax(S, dim=-1), embd_query) # (N, T, 2d) = bmm( (N, T, J), (N, J, 2d) )
  • official code

u_logits == S

# Context2Query
u_a = softsel(u_aug, u_logits)  # [N, M, JX, d]

I checked the softsel function in the official code.

That operation contains an element-wise multiplication, whereas the operation in your code is torch.bmm
(batch matrix-matrix multiplication). (The same applies to the Query2Context part.)
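
To make the comparison concrete, this is how I read the two formulations in PyTorch (my own paraphrase of softsel, not code from either repo; here S is (N, T, J) and embd_query is (N, J, 2d)):

import torch
import torch.nn.functional as F

# Formulation in this repo: batched matrix product of the attention weights
# with the query embeddings.
c2q_bmm = torch.bmm(F.softmax(S, dim=-1), embd_query)             # (N, T, 2d)

# My reading of the official softsel: broadcast an element-wise multiply,
# then sum over the query-length axis J.
a = F.softmax(S, dim=-1).unsqueeze(-1)                             # (N, T, J, 1)
c2q_softsel = (a * embd_query.unsqueeze(1)).sum(dim=2)             # (N, T, 2d)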

I think this operation may be the reason for issue #1.
