allenai / bilm-tf

Tensorflow implementation of contextualized word representations from bi-directional language models

License: Apache License 2.0

Python 99.03% Shell 0.14% Dockerfile 0.84%

bilm-tf's Issues

Trainable parameters in TF Hub release

I had a few questions about the set of trainable parameters in the TF Hub releases of ELMo. The initial release mentions that the LSTM cell parameters are trainable (and this is what I expected, fine-tuning on the downstream task's supervised labels). However, I recently came across this paper which mentioned that the LSTM parameters in ELMo are fixed, and it also seems to be the case in the current release of ELMo on TF Hub.

Were the LSTM parameters kept fixed during the experiments described in the paper? (and the only fine-tuning done ignored the supervised labels, and used the training set of the downstream task for language modelling?)
Did you notice significant performance drops keeping the LSTM cell parameters trainable during the downstream task?

(n_sentences, 3, max_sentence_length, 1024)

After running inference with the batch, the return biLM embeddings are a numpy array with shape (n_sentences, 3, max_sentence_length, 1024), after removing the special begin/end tokens.
I assume that "(n_sentences, 3, max_sentence_length, 1024)" should be "(n_sentences, max_sentence_length, 1024)"?
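A minimal NumPy sketch of what the extra axis looks like, assuming the array really has the four-dimensional shape quoted above. The random array is a stand-in for real biLM output, and the reading of axis 1 as "token layer followed by two LSTM layers" is my understanding of the model, not something confirmed in this thread.

```python
import numpy as np

# Stand-in for real biLM output: 2 sentences, 3 layers, 5 tokens, 1024 dims.
n_sentences, n_layers, max_sentence_length, dim = 2, 3, 5, 1024
embeddings = np.random.rand(n_sentences, n_layers, max_sentence_length, dim)

# Axis 1 indexes the biLM layers (assumed: token layer first, then the LSTM layers).
token_layer = embeddings[:, 0]
top_lstm_layer = embeddings[:, -1]

# Selecting a single layer drops the layer axis and leaves one vector per token.
print(embeddings.shape, token_layer.shape, top_lstm_layer.shape)
```

Under that reading, the three-dimensional shape only appears after one layer is selected or the layers are combined.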

Training does not end

I have been training ELMo on a 30k-sentence dataset for the last 24 hours and it still has not finished. The training perplexity has been stuck at 2.12 for a while. I am also not sure what the output log means. I am getting something like this:

Batch 85900, train_perplexity=2.1179967
Total time: 87088.18443918228
Loading data from: ./data/elmo/small/data/part2.txt
Loaded 1014 sentences.
Finished loading
Loading data from: ./data/elmo/small/data/part1.txt
Loaded 29000 sentences.
Finished loading
Loading data from: ./data/elmo/small/data/part1.txt
Loaded 29000 sentences.
Finished loading

When is the training going to stop? Do I need to terminate it myself?

Also, I have 2 files, part1.txt (for training) and part2.txt (for validation). I am not sure if ELMo is actually using part1 for training and part2 for validation. How can I ensure that?
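One possible explanation, offered as an assumption rather than a diagnosis: if the trainer derives its stopping point from a configured token count and epoch count (options such as n_train_tokens and n_epochs in bin/train_elmo.py, as far as I recall) and not from the files it actually reads, a token count left over from a much larger corpus makes a small corpus loop for a very long time. The sketch below only illustrates that arithmetic; the function name and all numbers are hypothetical.

```python
def total_training_batches(n_train_tokens, n_epochs, batch_size, unroll_steps, n_gpus):
    """Batches processed before training stops, under the assumption stated above."""
    n_batches_per_epoch = n_train_tokens // (batch_size * unroll_steps * n_gpus)
    return n_epochs * n_batches_per_epoch


# Hypothetical settings; only the token counts differ between the two calls.
settings = dict(n_epochs=10, batch_size=128, unroll_steps=20, n_gpus=1)

# Token count roughly matching a 30k-sentence corpus (assumed ~25 tokens per sentence).
small_corpus = total_training_batches(n_train_tokens=750_000, **settings)

# Token count of a corpus about a thousand times larger, left unchanged by mistake.
stale_value = total_training_batches(n_train_tokens=768_648_884, **settings)

print(small_corpus, stale_value)
```

If the second situation applies, the same files being reloaded over and over in the log is what one would expect to see.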

Problem with handling encoding failure

I noticed that the method _convert_word_to_char_ids found in bilm/data.py can't handle encoding errors under certain conditions. The problem is in the code chunk below:

        word_encoded = word.encode('utf-8', 'ignore')[:(self.max_word_length-2)]
        code[0] = self.bow_char
        for k, chr_id in enumerate(word_encoded, start=1):
            code[k] = chr_id
        code[k + 1] = self.eow_char

As you can see, if a token consists of a single character that fails to encode, then the word_encoded variable is an empty byte string. The enumerate for-loop then exits without ever assigning the k variable, and therefore the last line fails with the following error:

UnboundLocalError: local variable 'k' referenced before assignment

This can be handled with an exception, which could flag the failed token and print a warning. Since I haven't gone deep into the specifics of the library, I am not sure if this is a proper solution, so I thought I might as well bring this to your attention.

EDIT:

Another thing I have noticed is that empty files in the training data folder cause the training to fail once they are processed, meaning the training could go on for days only to fail on an empty file. To save users the trouble, it would be very kind of you to warn them that empty files cause a problem, or maybe add some logic to safely skip them.
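For the empty-token case, one way to avoid the UnboundLocalError is to index the end-of-word marker by the length of the encoded word instead of by the loop variable. This is a standalone sketch, not the library's code: the function is a free function instead of a method, and the marker values 258 and 259 are assumptions (only the padding value 260 is quoted elsewhere on this page).

```python
import numpy as np

MAX_WORD_LENGTH = 50
BOW_CHAR = 258  # assumed begin-of-word id
EOW_CHAR = 259  # assumed end-of-word id
PAD_CHAR = 260  # padding id


def convert_word_to_char_ids(word, max_word_length=MAX_WORD_LENGTH):
    """Encode a word as byte ids, tolerating words that encode to nothing."""
    code = np.full(max_word_length, PAD_CHAR, dtype=np.int32)
    word_encoded = word.encode('utf-8', 'ignore')[:max_word_length - 2]
    code[0] = BOW_CHAR
    for k, chr_id in enumerate(word_encoded, start=1):
        code[k] = chr_id
    # len(word_encoded) + 1 equals k + 1 after a non-empty loop, and 1 for an empty word.
    code[len(word_encoded) + 1] = EOW_CHAR
    return code


# A lone surrogate cannot be encoded as UTF-8, so 'ignore' leaves an empty byte string.
print(convert_word_to_char_ids('\ud800')[:3])
print(convert_word_to_char_ids('ab')[:5])
```

Whether an empty token should be encoded silently like this or flagged with a warning is a design choice for the maintainers.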

InvalidArgumentError (see above for traceback): Sampler's range is too small.

Hello, I encountered a problem:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Sampler's range is too small.
[[Node: lm/sampled_softmax_loss_1/LogUniformCandidateSampler = LogUniformCandidateSampler[num_sampled=8192, num_true=1, range_max=6603, seed=0, seed2=0, unique=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](lm/Reshape_6, ^lm/dropout_1/mul)]]
I want to train on words only, without characters, so I changed the code to use "load_vocab(args.vocab_file)" and removed the CNN entry from the options dict. I don't know why this error occurs.
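The error message itself shows num_sampled=8192 against range_max=6603 with unique=true, that is, more distinct negative samples are requested than there are classes to draw from. A hedged sketch of a guard follows. The helper name is hypothetical, and I am assuming the sample count comes from the n_negative_samples_batch option and the range from n_tokens_vocab in the training options.

```python
def clamp_negative_samples(n_negative_samples, n_tokens_vocab):
    """Return a sample count that a unique (without replacement) sampler can satisfy."""
    # A sampler that must return distinct classes cannot return more than exist.
    return min(n_negative_samples, n_tokens_vocab)


# Values taken from the error message above.
options = {'n_negative_samples_batch': 8192, 'n_tokens_vocab': 6603}
options['n_negative_samples_batch'] = clamp_negative_samples(
    options['n_negative_samples_batch'], options['n_tokens_vocab'])

print(options['n_negative_samples_batch'])
```

With a vocabulary this small, a sample count well below the vocabulary size is probably more sensible than the clamped maximum, since sampling every class removes the benefit of a sampled softmax.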

L2 Norm

Sorry to bother you again!
In the paper, you said that an L2 norm is added while training the model, but I couldn't find it in the training code (training.py). Could you tell me where the L2 norm is added in your training code?

How to look up the tokens based on the word representation?

Hi,
I'm trying to use ELMo to replace the original embedding-lookup word representation in my model. I see that you have a TensorFlow model at https://tfhub.dev/google/elmo/2. But based on the example, I can only generate word representations, and there is no API for mapping a representation back to words. Could you help me with that?
Thanks

How to provide heldout files to ELMo during training

It says that to train ELMo on a new dataset, we need to provide a set of heldout files. But how? Using the training command, we can only provide the train files.

The perplexity for bi-direction LM

In the paper, you said that the perplexity of the forward and backward LM is 39.6. Did you use the gold words when validating the biLM?

Whether there is a memory leak in the training code

I ran the training code, but memory usage keeps increasing until it runs out. My computer has 128G of memory and the training files occupy only 1.3G on disk.

loss calculation

Why is the loss multiplied by "unroll_steps" here?

bilm-tf/bilm/training.py, line 705 in 81a4b54:

loss * options['unroll_steps'],

Wasn't this already taken into account when the loss was originally calculated here?

bilm-tf/bilm/training.py, line 427 in 81a4b54:

def _build_loss(self, lstm_outputs):

The reason I ask is that the second dimension of the next token ids equals the unroll steps, as defined here:

bilm-tf/bilm/training.py, line 441 in 81a4b54:

def _get_next_token_placeholders(suffix):

NER performance with Ontonotes and number-related ELMo embeddings

Thanks a lot for this work and making it available!

I used ELMo contextualized embeddings in my Keras framework (DeLFT) and I could reproduce the excellent results for the CoNLL 2003 NER task, actually slightly better than what you reported in your NAACL 2018 paper (92.47 averaged over 10 training runs, using the 5.5B ELMo model, warm-up, and concatenation with GloVe embeddings in a Lample 2016 BiLSTM-CRF architecture).

However, when using ELMo embeddings with the OntoNotes CoNLL-2012 NER dataset, I see a large drop of -5.0 points in f-score compared to GloVe only. The drop is the same whether using ELMo only or ELMo embeddings concatenated with GloVe.

Here is the evaluation with Glove without ELMo:

Evaluation on test set:
        f1 (micro): 86.17
                 precision    recall  f1-score   support

       QUANTITY     0.7321    0.7810    0.7558       105
          EVENT     0.6275    0.5079    0.5614        63
           NORP     0.9193    0.9215    0.9204       841
       CARDINAL     0.8294    0.7487    0.7870       935
        ORDINAL     0.7982    0.9128    0.8517       195
            ORG     0.8451    0.8635    0.8542      1795
       LANGUAGE     0.7059    0.5455    0.6154        22
           TIME     0.6000    0.5943    0.5972       212
        PRODUCT     0.7333    0.5789    0.6471        76
            FAC     0.6630    0.4519    0.5374       135
           DATE     0.8015    0.8571    0.8284      1602
          MONEY     0.8714    0.8631    0.8672       314
            LAW     0.6786    0.4750    0.5588        40
        PERCENT     0.8808    0.8682    0.8745       349
    WORK_OF_ART     0.6480    0.4880    0.5567       166
            LOC     0.7500    0.7709    0.7603       179
            GPE     0.9494    0.9388    0.9441      2240
         PERSON     0.9038    0.9306    0.9170      1988

    avg / total     0.8618    0.8615    0.8617     11257

And here are the results with ELMo:

Evaluation on test set:
	f1 (micro): 79.62
             precision    recall  f1-score   support

WORK_OF_ART     0.5510    0.6506    0.5967       166
    PRODUCT     0.6582    0.6842    0.6710        76
      MONEY     0.8116    0.8503    0.8305       314
        FAC     0.7130    0.5704    0.6337       135
   LANGUAGE     0.7778    0.6364    0.7000        22
   QUANTITY     0.1361    0.8000    0.2327       105
       TIME     0.6370    0.4387    0.5196       212
        GPE     0.9535    0.9437    0.9486      2240
      EVENT     0.6316    0.7619    0.6906        63
    PERCENT     0.8499    0.8596    0.8547       349
        ORG     0.9003    0.8758    0.8879      1795
        LOC     0.7611    0.7654    0.7632       179
     PERSON     0.9297    0.9452    0.9374      1988
    ORDINAL     0.8148    0.1128    0.1982       195
        LAW     0.5405    0.5000    0.5195        40
       NORP     0.9191    0.9322    0.9256       841
   CARDINAL     0.8512    0.1102    0.1951       935
       DATE     0.8537    0.5137    0.6415      1602

avg / total     0.8423    0.7548    0.7962     11257

I see that the drop is always for named-entity classes that are somehow related to numbers (ORDINAL -65, CARDINAL -58, QUANTITY -53, DATE -18, etc.), while the recognition of all the other classes actually improves with ELMo.

I am wondering what could cause this behavior (apart from an implementation error on my side). Did you observe something similar?
Are you using any special normalization of numbers on the corpus before training the biLM?
I am using the default tokenization of OntoNotes/CoNLL-2012; should I maybe use a different tokenization?

Is there a way to put Elmo as a Keras layer and integrate it into a Keras model?

I used my own corpus to train ELMo with the code provided here. I wonder if there is a way to wrap ELMo as a Keras layer and integrate it into a Keras model. If yes, could you please provide an example similar to usage_character.py? Thank you very much.

the hyperparameter is not clear

I didn't find where to adjust the dimension of the input or output.

Using ELMo in SQuAD with spaCy tokenizing way

Hi, I have a problem using ELMo with spaCy.

I use spaCy to preprocess the text data, and without ELMo the result looks fine. However, when I use the model with both spaCy and ELMo, I get a very bad result, 0.08. Many NaN and inf values show up in the TensorFlow debugger. If I use NLTK and ELMo, the result is what I expect.

I think maybe I am doing something wrong when using ELMo. However, when I read the ELMo source code, I didn't see any relationship between ELMo and the tokenizer (NLTK, spaCy). I used the pre-trained ELMo data for SQuAD. I've been plagued by this problem for a long time, and I really want to know if there is something I missed. Is it necessary to train new ELMo data when I switch to spaCy?

How to fine tune

In the paper "Deep contextualized word representations" there was a supplemental section about fine tuning biLM.

I would like to know how to do it, specifically:

How do I load a model pre-trained on one data set and train it further, for example for 1 epoch, on a different data set?

I guess the restart_ckpt_file argument can be used, but I don't know how to use it.
Thanks in advance!

why is the embeddings shape (n_sentences, 3, max_sentence_length, 1024)

Can you please make it clearer why the second dimension in the embeddings is 3?

not an issue - question regarding trained models

Is it possible to hack the models to generate a question from an arbitrary given text?
Are you aware of any research here?

Release

Looks like the project hasn't been released. Is that correct?

Thank you

The config needs to be clearer!

I tried a batch of 160 sentences and it broke. I found that in model.py the max batch size is frozen to 128.

ImportError: No module named bilm.training

With the training command in the README, I got this error.

Is the small model used by the example codes trained on some corpus? Or is it some dummy model with randomly generated weights?

I am thinking of using Elmo in my project, but the model provided in the readme is too large for our application. Thanks!

OOM when allocating tensor for large vocabulary

Hi!
When I try to run bin/run_test.py on GPUs, I get:

....
018-06-18 12:27:48.890298: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:                                                                                                             [35/1118]
Limit:                 15922230068
InUse:                 15585278208
MaxInUse:              15922230016
NumAllocs:                     376
MaxAllocSize:          15173454848

2018-06-18 12:27:48.890313: W tensorflow/core/common_runtime/bfc_allocator.cc:277] *****************************************************************************xxxxxxxxxxxxxxxxxxxxxxx
2018-06-18 12:27:48.890342: W tensorflow/core/framework/op_kernel.cc:1158] Resource exhausted: OOM when allocating tensor with shape[512,5555540]
Traceback (most recent call last):
  File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call
    return fn(*args)
  File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
    status, run_metadata)
  File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[512,5555540]
         [[Node: lm/transpose = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](lm/softmax/W/read/_111, lm/transpose/sub_1)]]
         [[Node: lm/mul_8/_141 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_nam
e="edge_565_lm/mul_8", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "bin/run_test.py", line 42, in <module>
    main(args)
  File "bin/run_test.py", line 29, in main
    test(options, ckpt_file, data, batch_size=args.batch_size)
  File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/bilm-0.1-py3.6.egg/bilm/training.py", line 1024, in test
    feed_dict=feed_dict
  File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[512,5555540]
         [[Node: lm/transpose = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](lm/softmax/W/read/_111, lm/transpose/sub_1)]]
         [[Node: lm/mul_8/_141 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_nam
e="edge_565_lm/mul_8", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op 'lm/transpose', defined at:
  File "bin/run_test.py", line 42, in <module>
    main(args)
  File "bin/run_test.py", line 29, in main
    test(options, ckpt_file, data, batch_size=args.batch_size)
  File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/bilm-0.1-py3.6.egg/bilm/training.py", line 970, in test
    model = LanguageModel(test_options, False)
  File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/bilm-0.1-py3.6.egg/bilm/training.py", line 71, in __init__
    self._build()
  File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/bilm-0.1-py3.6.egg/bilm/training.py", line 425, in _build
    self._build_loss(lstm_outputs)
  File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/bilm-0.1-py3.6.egg/bilm/training.py", line 507, in _build_loss
    tf.transpose(self.softmax_W)
  File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1278, in transpose
    ret = gen_array_ops.transpose(a, perm, name=name)
  File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3658, in transpose
  result = _op_def_lib.apply_op("Transpose", x=x, perm=perm, name=name)
File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
  op_def=op_def)
File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
  original_op=self._default_original_op, op_def=op_def)
File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
  self._traceback = _extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[512,5555540]
       [[Node: lm/transpose = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](lm/softmax/W/read/_111, lm/transpose/sub_1)]]
       [[Node: lm/mul_8/_141 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_nam
e="edge_565_lm/mul_8", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

I have 5555540 tokens in my vocabulary.
It runs on the CPU (export CUDA_VISIBLE_DEVICES=""), but it is too slow. I cannot change the size of the vocabulary.
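A quick back-of-the-envelope check, using only numbers from the log above (the shape [512, 5555540], the DT_FLOAT element type, and the allocator limit), suggests why the transpose cannot fit: the op needs an output tensor as large as its input. The helper name is hypothetical.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor; 4 bytes per element corresponds to float32."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * bytes_per_element


softmax_w = tensor_bytes((512, 5555540))  # one copy of the softmax weight matrix
gpu_limit = 15922230068                   # "Limit" reported by the allocator

print(softmax_w / 2**30)         # size of one copy in GiB
print(2 * softmax_w > gpu_limit)  # original plus transposed copy versus the limit
```

One copy is a little over 10 GiB, so the original plus its transposed copy exceeds the reported limit before any activations are counted. This is only arithmetic on the logged values, not a statement about how the allocator actually laid out memory.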

Tensorflow version down-compatibility

Constrained by the computational resource available, we have to work with TF v1.0 (the manager of supercomputer cluster wants to keep it that way for the benefit of other users). I was wondering if there's any way we could still be able to use ELMo with v1.0.

Thanks!

Generate ELMO from Glove

I have a model that performs a sentiment analysis task and uses GloVe as its word embedding. At the beginning, I load the GloVe file glove.xxxB.yyyd.txt (xxx = tokens, yyy = dimension). Now I need to load the equivalent ELMo file instead. In other words, I need a one-to-one mapping between GloVe and ELMo; is that possible? And if so, what is the output dimension of ELMo?

Value of "n_characters" in char embedding

In train_model.py, "n_characters" is defined as 261. However, in the pretrained models' configs, n_characters is set to 262. Any particular reason?

Test model : https://raw.githubusercontent.com/allenai/bilm-tf/master/tests/fixtures/model/options.json
Pretrained model : https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_options.json

Both models have n_characters=262

Moreover, while reading a pre-trained model, we increase the size by one to add padding

bilm-tf/bilm/model.py

Line 220 in 81a4b54

# Have added a special 0 index for padding not present

But we already have a special char for padding

bilm-tf/bilm/data.py

Line 120 in 81a4b54

self.pad_char = 260 # <padding>

Invalid shape initializing char_embed.

Hi!
After saving a checkpoint I tried to load weights.hdf5 and got this error:
...
shape.as_list(), dtype=dtype, partition_info=partition_info)
File "/some_path/bilm-tf/bilm/model.py", line 238, in ret
varname_in_file, shape, weights.shape)
ValueError: Invalid shape initializing char_embed, got [261, 16], expected (262, 16)

Can anybody help me?
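The mismatch in the message (261 requested, 262 stored) looks like the training versus inference difference in n_characters raised in the previous issue. As far as I understand the README, the options file used for inference should set n_characters to 262 because a padding index is added to the dumped weights. The sketch below patches an options dict on that assumption; the helper name is hypothetical.

```python
import copy


def options_for_inference(train_options):
    """Return a copy of the training options with n_characters bumped for inference."""
    options = copy.deepcopy(train_options)
    char_cnn = options.get('char_cnn')
    # Assumption: 261 is the training-time value and 262 the inference-time value.
    if char_cnn is not None and char_cnn.get('n_characters') == 261:
        char_cnn['n_characters'] = 262
    return options


train_options = {'char_cnn': {'n_characters': 261, 'embedding': {'dim': 16}}}
inference_options = options_for_inference(train_options)
print(inference_options['char_cnn']['n_characters'])
```

The patched dict would then be written to the options.json that sits next to weights.hdf5.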

max_batch_size issue

Hi,

While trying to create embeddings for questions as shown in usage_token.py, I get an error from the tensor because its size differs from max_batch_size.

How can I handle that case? Have you encountered such a problem in the project?
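One generic workaround, independent of the library internals, is to feed the sentences in chunks that never exceed the maximum batch size. A minimal sketch, where the limit of 128 is taken from a related report on this page and the helper name is hypothetical:

```python
def iter_batches(sentences, max_batch_size):
    """Yield consecutive slices of at most max_batch_size sentences."""
    for start in range(0, len(sentences), max_batch_size):
        yield sentences[start:start + max_batch_size]


# 300 tokenized dummy questions split under an assumed limit of 128.
questions = [['what', 'is', 'this', '?']] * 300
batch_sizes = [len(batch) for batch in iter_batches(questions, max_batch_size=128)]
print(batch_sizes)
```

Each chunk can then be converted to ids and run through the session separately, and the per-chunk outputs concatenated afterwards.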

Thank you

a strange error occurred

'ResourceExhaustedError (see above for traceback): OOM when allocating tensor of shape [16384] and type float'

I modified the code of train_elmo.py like this:

options = {
'bidirectional': True,

 # 'char_cnn': {'activation': 'relu',
 #  'embedding': {'dim': 16},
 #  'filters': [[1, 32],
 #   [2, 32],
 #   [3, 64],
 #   [4, 128],
 #   [5, 256],
 #   [6, 512],
 #   [7, 1024]],
 #  'max_characters_per_token': 50,
 #  'n_characters': 6707,
 #  'n_highway': 2},

I did this because I want to run the code based only on word embeddings, without character embeddings.
Then this 'ResourceExhaustedError' occurred.
Could you tell me how to fix it?
Thanks!

Hub version to train new embedding

Can I use the TF Hub version to train my own ELMo embeddings? My corpus is Chinese.
If so, can you give me a simple example?
Thank you so much.

What's the relationship of "BidirectionalLanguageModel" and "LanguageModel"?

In train_elmo.py, "LanguageModel" is used, but in usage_character.py and usage_token.py, the "BidirectionalLanguageModel" class is used.

So what is the relationship between the class "BidirectionalLanguageModel" in bilm/model.py and the class "LanguageModel" in bilm/training.py? Thanks.

LSTM final states as the initial states of next batch

Hi!

It seems to me from the code provided that the final states of each batch are fed as the initial states of the next batch. However, in data.py the examples in a batch seem to be the continuation of the previous example in the same batch (when the sentence is greater than the BPTT rollout steps). If what I'm saying is correct, we are feeding the final states to a new batch that is not the continuation of the sentences in the previous batch.

Why is that so? What am I missing?

Converging to 25 ppl after 7 days?

Hi,

I've been training the model on the 1 Billion Word Benchmark for 7 days now on 4 Tesla K80 GPUs and it seems to be converging to a perplexity around 25 (it has not improved for 24h now). See tail of log below.
Is this expected behaviour? Has it converged?

Batch 142200, train_perplexity=24.746037
Total time: 585740.8309390545
Batch 142300, train_perplexity=25.46843
Total time: 586129.8147296906
Batch 142400, train_perplexity=25.55357
Total time: 586523.1840500832
Loading data from: ../../data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00026-of-00100
Loaded 306324 sentences.
Finished loading
Loading data from: ../../data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00028-of-00100
Loaded 305485 sentences.
Finished loading
Batch 142500, train_perplexity=26.139242
Total time: 586992.946965456
WARNING:tensorflow:Error encountered when serializing lstm_output_embeddings.
Type is unsupported, or the types of the items don't match field type in CollectionDef.
'list' object has no attribute 'name'
Batch 142600, train_perplexity=24.84199
Total time: 587395.0743260384
Batch 142700, train_perplexity=25.43104
Total time: 587794.4523823261
Batch 142800, train_perplexity=25.182297
Total time: 588190.2893879414
Batch 142900, train_perplexity=24.556465
Total time: 588584.6505479813
Batch 143000, train_perplexity=25.966608
Total time: 588982.1930603981
Batch 143100, train_perplexity=25.03588
Total time: 589376.4338204861
Batch 143200, train_perplexity=25.981043
Total time: 589773.4447641373
Loading data from: ../../data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00065-of-00100
Loaded 305213 sentences.
Finished loading
Batch 143300, train_perplexity=25.373167
Total time: 590195.4948370457
Loading data from: ../../data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00041-of-00100
Loaded 306092 sentences.

Details about the training process.

I'm really sorry to bother you again. There are two ways to get the perplexity of a language model. (1) You feed the real words of the sentence to the model and the model predicts the next word, as in training. (2) You feed the word that the model just predicted back into the model and the model predicts the next one, as in inference.
So could you tell me which way you used to get the perplexity of 39.4?
I'm not good at expressing my view in English, so thanks for your patience!
By the way, could you tell me the learning rate you used for training ELMo?
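For reference, the standard definition of perplexity corresponds to option (1): it is the exponential of the average negative log-probability that the model assigns to each real next word given the real preceding words. The sketch below shows that calculation on made-up probabilities. Averaging the forward and backward values to get a single biLM figure is my assumption about how such a number would be combined, not a confirmed detail of this project.

```python
import math


def perplexity(gold_token_probs):
    """exp of the mean negative log-probability assigned to the real next tokens."""
    mean_nll = -sum(math.log(p) for p in gold_token_probs) / len(gold_token_probs)
    return math.exp(mean_nll)


# Made-up probabilities that a forward and a backward LM gave to the real tokens.
forward = perplexity([0.25, 0.25, 0.25, 0.25])
backward = perplexity([0.5, 0.125, 0.5, 0.125])

# Assumed way of reporting one number for a bidirectional model.
print(forward, backward, (forward + backward) / 2)
```

Option (2), feeding back the model's own predictions, describes text generation and does not give a perplexity on a held-out corpus.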

Question about the weight_layers

Hi, thanks for the great paper and nice implementation.

After reading through the paper and the code, I still feel confused about the weight_layers.

I saw that dump_token_embeddings only returns the intermediate LSTM states, not the final weighted ELMo vector. As mentioned in the paper, the weight_layers should be trained with the downstream task. However, in usage_token.py, the code directly creates the weight_layers and uses them without training. So I am confused about how the weight layers are used. Could you please explain it a little?

Also, in the test_elmo.py, why is the expected_elmo calculated this way? I don't understand why the actual_elmo will be close to the values following these calculations. Could you please also explain it?
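A NumPy sketch of the weighting described in the paper may help: the layers are combined with softmax-normalized scalar weights and scaled by a scalar gamma. I am assuming the weights start at zero and gamma at one before any downstream training, in which case the untrained combination is simply the mean of the layers. The function name and tensor sizes are hypothetical.

```python
import numpy as np


def scalar_mix(layers, raw_weights, gamma=1.0):
    """Collapse the layer axis of (n_sentences, n_layers, length, dim) with softmax weights."""
    raw_weights = np.asarray(raw_weights, dtype=np.float64)
    weights = np.exp(raw_weights - raw_weights.max())
    weights /= weights.sum()
    # Weighted sum over the layer axis, then the global scale.
    return gamma * np.einsum('l,blnd->bnd', weights, layers)


layers = np.random.rand(2, 3, 5, 8)  # small stand-in for real biLM layers
untrained = scalar_mix(layers, raw_weights=[0.0, 0.0, 0.0])

print(untrained.shape)
print(np.allclose(untrained, layers.mean(axis=1)))
```

In a downstream model the raw weights and gamma would be trainable variables updated together with the task parameters, so the usage script creating them untrained is consistent with that picture.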

Thanks a lot.

How to fine tune the existing weights on new data ?

I converted the hdf5 file back into a ckpt file (using the custom_getter method in bilm/model.py) and tried to use it with the architecture in bilm/training.py, but the loaded weights give very bad perplexity on heldout data when I run run_test.py. Are the architectures in bilm/model.py and bilm/training.py compatible? If you feel I'm doing something wrong, would it be possible for you to share the ckpt file for the given hdf5 file?

Thanks

Can you please release the training code?

Hi,
I would like to train ELMo on my own dataset. Can you please release the training code so that I can use the weights it generates in the bilm-tf application? I would be thankful to get some insight from your training code even if it is not ready for GitHub.

Thank you.

Could I train the code on 2 GPUs with LSTM dim 1200 and projection dim 150?

Can I train a language model and generate a weights file by myself?

Thank you !

How long does it take to train the model?

How long does it take to train the model from the ELMO paper? I read that you used 3 GPUs. Which ones?
I want to get a rough idea before I can train my own.
This is not an issue per se, so if there's a different forum to discuss these things please let me know.

And congratulations on winning the best paper award at NAACL!

details about parameters and hyperparameters

Hi, could you share some details of the hyper-parameters used while training the biLM, like the optimizer and dropout rate? Thanks.

Code loads only two of the data shards while training.

I have a gigantic dataset to train ELMo on, so I split the training set into 1000 separate files. While loading data for training I see that only two of the files are loaded (reverse=True and False). Why is that? Or am I missing something?

And btw Congratulations on winning the best paper award at NAACL!

Thanks,

question about cnn embedding dim and lstm dim

In the code, both the CNN embedding dim and the individual LSTM output dim are 512.
The paper says it computes a task-specific weighting of all biLM layers.
Each biLM layer embedding is the concatenation of [forward-lstm, backward-lstm], so its dim should be 1024.
So how is a weighting computed between the biLM layer embeddings (1024) and the CNN embedding (512)? How can they be added when they have different dims?
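As I read bilm/model.py, the mismatch is resolved by concatenating the token (CNN) embedding with itself, so that every layer has the same width before weighting. Treat that as my understanding and not an authoritative answer. The NumPy sketch below uses random stand-ins for the real tensors.

```python
import numpy as np

n_sentences, length, half_dim = 2, 5, 512

# Random stand-ins for the CNN token embedding and one LSTM layer's two directions.
token = np.random.rand(n_sentences, length, half_dim)
forward_lstm = np.random.rand(n_sentences, length, half_dim)
backward_lstm = np.random.rand(n_sentences, length, half_dim)

# Assumed behaviour: the 512-dim token layer is duplicated to reach 1024 dims.
token_layer = np.concatenate([token, token], axis=-1)
lstm_layer = np.concatenate([forward_lstm, backward_lstm], axis=-1)

# With equal widths the layers can be stacked and weighted together.
stacked = np.stack([token_layer, lstm_layer], axis=1)
print(token_layer.shape, lstm_layer.shape, stacked.shape)
```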

Can I train ELMo for my own language?

Thank you very much!
Best regards!

Why are embeddings of feature length 1024?

(More of a question than an issue)
The embeddings are of shape (None, 3, None, 1024). Is there any specific reason why embeddings have a size of 1024? Which hyper parameter should I change if I want to reduce the embedding size?

Which benchmark do you use in training elmo?

Sorry to bother you again. I found that there are two benchmarks in https://github.com/ciprian-chelba/1-billion-word-language-modeling-benchmark, the big one (9.9G) and the small one (1.7G). Could you tell me which benchmark you used for training ELMo?

What happens in case of OOV character?

Does the model break if it sees an out of vocabulary character?
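The encoding excerpt quoted in an earlier issue on this page converts each word with word.encode('utf-8', 'ignore'), so the character ids are UTF-8 byte values. If that is the whole story, there is no out-of-vocabulary character in the usual sense, because every byte falls in the range 0 to 255. A small sketch of that observation, with a hypothetical helper name:

```python
def char_ids(word):
    """Byte values of the UTF-8 encoding, mirroring the excerpt quoted above."""
    return list(word.encode('utf-8', 'ignore'))


# Non-ASCII characters simply become several byte ids each.
accented = char_ids('é')
mixed = char_ids('naïve 日本')

print(accented)
print(all(0 <= i <= 255 for i in mixed))
```

Characters never seen in training still map to valid ids; how useful their embeddings are is a separate question.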

use before assign error

In data.py:

j = 0
for k, chr_id in enumerate(word_encoded, start=1):
    code[k] = chr_id
    j = k
# j holds the last index written, so k is defined even if word_encoded is empty
k = j

no.of GPUs used for training 1 Billion Word Benchmark ?

I am getting OOM when using a single GTX 1080Ti.

How to concatenate a ELMo vector with the corresponding context-independent token representation?

As described in the paper "Deep contextualized word representations", before being fed into NLP tasks, the ELMo vectors are concatenated with context-independent token representations X like this: [X; ELMo]

But how exactly are they concatenated? Is it element-wise, or do we just join the two vectors end-to-end?

I saw in the source code that the LSTM layers' outputs in the biLM are concatenated element-wise with tf.concat([lstm_output1, lstm_output2], axis=-1), so I feel like the concatenation between ELMo and X should also be element-wise.
But if they are combined element-wise, does X always have to follow the dimension of ELMo's internal LSTM layers?
For example, I see that given 2 sentences and a maximum sentence length of 10, the vectors created by weight_layers have shape (2, 10, 32), with 32 being the concatenation of two LSTM layers (forward and backward) whose dimension is 16 (16x2 = 32). However, if we were to combine ELMo with X element-wise as introduced in the paper, X would also need to have shape (num_sentences, max_sentence_length, 32), which limits the possibility of X's embedding dimension being different from 32.

As far as I understand the options.json file, the "projection_dim" hyperparameter determines the internal LSTM layer dimension.
Then, is there any way to manipulate the LSTM layer dimension in the biLM (possibly through lstm { ... projection_dim = ? ... } in the options.json file)? Or am I missing something?
(I ask this question because when I tried to change projection_dim and ran the test, I came across the following error.)

======================================================================
ERROR: test_weighted_layers (main.TestWeightedLayers)

Traceback (most recent call last):
File "elmo.py", line 136, in test_weighted_layers
self._check_weighted_layer(1.0, do_layer_norm=True, use_top_only=False)
File "elmo.py", line 36, in _check_weighted_layer
bilm_ops = model(character_ids)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/bilm-0.1-py3.5.egg/bilm/model.py", line 97, in call
max_batch_size=self._max_batch_size)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/bilm-0.1-py3.5.egg/bilm/model.py", line 286, in init
self._build()
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/bilm-0.1-py3.5.egg/bilm/model.py", line 290, in _build
self._build_word_char_embeddings()
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/bilm-0.1-py3.5.egg/bilm/model.py", line 415, in _build_word_char_embeddings
dtype=DTYPE)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 1317, in get_variable
constraint=constraint)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 1079, in get_variable
constraint=constraint)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 417, in get_variable
return custom_getter(**custom_getter_kwargs)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/bilm-0.1-py3.5.egg/bilm/model.py", line 275, in custom_getter
return getter(name, *args, **kwargs)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 394, in _true_getter
use_resource=use_resource, constraint=constraint)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 786, in _get_single_variable
use_resource=use_resource)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 2220, in variable
use_resource=use_resource)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 2210, in
previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 2193, in default_variable_creator
constraint=constraint)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 235, in init
constraint=constraint)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 343, in _init_from_args
initial_value(), name="initial_value", dtype=dtype)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 770, in
shape.as_list(), dtype=dtype, partition_info=partition_info)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/bilm-0.1-py3.5.egg/bilm/model.py", line 246, in ret
varname_in_file, shape, weights.shape)
ValueError: Invalid shape initializing CNN_proj/W_proj, got [124, 8], expected (124, 16)

Ran 1 test in 0.099s

FAILED (errors=1)

I'm currently studying CNNs, so it was hard for me to trace back through this error, but it looks like projection_dim depends on some other value.

To sum up, all I want to know is how to manipulate ELMo's embedding dimension in order to match the size of ELMo with that of the context-independent token representations.

Please correct or ask me if any of my questions is unclear or mistaken.
Thank you for any help you may provide!!
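On the concatenation question, a small NumPy sketch may clarify what joining along the last axis does: the two vectors for each token are placed end-to-end, so only the sentence and token axes have to match, not the feature dimension. The sizes below are hypothetical (300 for X, 32 for ELMo as in the example above), and np.concatenate stands in for tf.concat with axis=-1.

```python
import numpy as np

num_sentences, max_sentence_length = 2, 10

# Hypothetical context-independent embeddings X and ELMo vectors of different widths.
x = np.random.rand(num_sentences, max_sentence_length, 300)
elmo = np.random.rand(num_sentences, max_sentence_length, 32)

# Joining along the last axis places the two vectors for each token end-to-end.
combined = np.concatenate([x, elmo], axis=-1)

print(combined.shape)
```

Under this reading there is no need to change projection_dim to match X; the downstream model just has to accept the summed width.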
