allenai / bilm-tf
Tensorflow implementation of contextualized word representations from bi-directional language models
License: Apache License 2.0
I had a few questions about the set of trainable parameters in the TF Hub releases of ELMo. The initial release mentions that the LSTM cell parameters are trainable
(and this is what I expected, fine-tuning on the downstream task's supervised labels). However, I recently came across this paper which mentioned that the LSTM parameters in ELMo are fixed, and it also seems to be the case in the current release of ELMo on TF Hub.
Which of these parameters are meant to be trainable during the downstream task?
After running inference with the batch, the returned biLM embeddings are a numpy array with shape (n_sentences, 3, max_sentence_length, 1024), after removing the special begin/end tokens.
I assume that "(n_sentences, 3, max_sentence_length, 1024)" should be "(n_sentences, max_sentence_length, 1024)"?
I have been training ELMo on a 30k-sentence dataset for the last 24 hours and it has still not finished. The training perplexity has stayed at about 2.12 for a while. I am also not sure what the output log means; I am getting something like this:
Batch 85900, train_perplexity=2.1179967
Total time: 87088.18443918228
Loading data from: ./data/elmo/small/data/part2.txt
Loaded 1014 sentences.
Finished loading
Loading data from: ./data/elmo/small/data/part1.txt
Loaded 29000 sentences.
Finished loading
Loading data from: ./data/elmo/small/data/part1.txt
Loaded 29000 sentences.
Finished loading
When is training going to stop? Do I need to terminate it myself?
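A rough way to estimate when training stops. As far as I can tell, bilm/training.py runs for n_epochs * (n_train_tokens // (batch_size * unroll_steps * n_gpus)) batches and does not otherwise look at the size of the data, so if n_train_tokens in train_elmo.py was left at its 1B Word Benchmark default, a 30k-sentence run would simply keep cycling over the files (as the repeated "Loading data from" lines suggest). The helper name and the values below are assumptions recalled from train_elmo.py; check them against your copy.

```python
def total_batches(n_train_tokens, batch_size, unroll_steps, n_gpus, n_epochs):
    """Estimate (batches per epoch, total batches) for a bilm-tf style run.

    Hypothetical helper: assumes the loop length depends only on these options.
    """
    tokens_per_batch = batch_size * unroll_steps * n_gpus
    batches_per_epoch = n_train_tokens // tokens_per_batch
    return batches_per_epoch, n_epochs * batches_per_epoch


# Values I recall as the train_elmo.py defaults (assumption).
per_epoch, total = total_batches(
    n_train_tokens=768648884, batch_size=128, unroll_steps=20, n_gpus=3, n_epochs=10
)
print(per_epoch, total)  # 100084 1000840
```

Setting n_train_tokens to the real token count of your training files would make the loop length match your data.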
Also, I have 2 files, part1.txt (for training) and part2.txt (for validation). I am not sure whether ELMo is actually using part1 for training and part2 for validation. How can I make sure of that?
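The log above loads part2.txt during training, which suggests the training prefix matched both files. As far as I can tell, the bilm-tf data loader expands --train_prefix with glob, so it is worth checking what a pattern matches; held-out evaluation is, as I understand the README, a separate step with bin/run_test.py and its --test_prefix argument. A small sketch (matching_shards is a hypothetical helper):

```python
import glob
import os
import tempfile


def matching_shards(pattern):
    """Return the sorted file names that a glob-style prefix would pick up."""
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


# Reproduce the layout from the question in a temporary directory.
with tempfile.TemporaryDirectory() as data_dir:
    for name in ("part1.txt", "part2.txt"):
        with open(os.path.join(data_dir, name), "w", encoding="utf-8") as f:
            f.write("a tokenized sentence\n")

    # A wildcard over the whole folder also matches the validation file.
    print(matching_shards(os.path.join(data_dir, "*")))       # ['part1.txt', 'part2.txt']
    # A narrower pattern keeps the held-out file out of training.
    print(matching_shards(os.path.join(data_dir, "part1*")))  # ['part1.txt']
```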
I noticed that the method _convert_word_to_char_ids found in bilm/data.py can't handle encoding errors under certain conditions. The problem is in the code chunk below:
word_encoded = word.encode('utf-8', 'ignore')[:(self.max_word_length-2)]
code[0] = self.bow_char
for k, chr_id in enumerate(word_encoded, start=1):
    code[k] = chr_id
code[k + 1] = self.eow_char
As you can see, if a token consists of a single character that fails to encode, the word_encoded variable will be an empty bytes string. The enumerate for-loop then exits without ever assigning k, so the last line fails with the following error:
UnboundLocalError: local variable 'k' referenced before assignment
This can be handled with an exception, which could flag the failed token and print a warning. Since I haven't gone deep into the specifics of the library, I am not sure if this is a proper solution, so I thought I might as well bring this to your attention.
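A minimal sketch of one possible fix, written as a standalone function rather than the library's method: initializing k = 0 (the BOW slot) before the loop means an empty encoding yields just BOW, EOW and padding, and a warning flags the token as suggested above. The constant values are assumptions based on my reading of bilm/data.py, and "\ud800" (a lone surrogate) is one input that encode('utf-8', 'ignore') reduces to an empty bytes string.

```python
import warnings

# Special ids assumed to mirror bilm's UnicodeCharsVocabulary (begin/end of word, padding).
BOW_CHAR, EOW_CHAR, PAD_CHAR = 258, 259, 260
MAX_WORD_LENGTH = 50


def word_to_char_ids(word, max_word_length=MAX_WORD_LENGTH):
    """Map a token to a fixed-length list of byte ids wrapped in BOW/EOW markers."""
    code = [PAD_CHAR] * max_word_length
    word_encoded = word.encode("utf-8", "ignore")[: max_word_length - 2]
    if not word_encoded:
        # Flag the token instead of crashing.
        warnings.warn("token %r has no encodable characters" % (word,))
    code[0] = BOW_CHAR
    k = 0  # defined even when the loop body never runs
    for k, chr_id in enumerate(word_encoded, start=1):
        code[k] = chr_id
    code[k + 1] = EOW_CHAR
    return code


print(word_to_char_ids("cat")[:6])     # [258, 99, 97, 116, 259, 260]
print(word_to_char_ids("\ud800")[:3])  # [258, 259, 260]
```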
EDIT:
Another thing I have noticed is that empty files in the training data folder cause training to fail once they are processed, which means training could run for days only to fail on an empty file. To save users the trouble, it would be very kind of you to warn them that empty files cause a problem, or maybe add some logic to skip such files safely.
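Until the loader skips such files itself, a pre-flight check can list them so they can be removed before training. This is a sketch under the assumption that the training files are selected by a glob pattern (like the --train_prefix value); find_empty_shards is a hypothetical helper, and to be safe it also treats whitespace-only files as empty.

```python
import glob
import os


def find_empty_shards(pattern):
    """Return the files matched by `pattern` that contain no non-blank lines."""
    empty = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            if not any(line.strip() for line in f):
                empty.append(path)
    return empty


# Example: print(find_empty_shards(os.path.join("data", "*")))
```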
Hello, I encountered a problem: “tensorflow.python.framework.errors_impl.InvalidArgumentError: Sampler's range is too small.
[[Node: lm/sampled_softmax_loss_1/LogUniformCandidateSampler = LogUniformCandidateSampler[num_sampled=8192, num_true=1, range_max=6603, seed=0, seed2=0, unique=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](lm/Reshape_6, ^lm/dropout_1/mul)]]
”
I want to train on words only, without characters, so I changed the code to use "load_vocab(args.vocab_file)"
and removed the CNN entry from the options dict. I don't know why this error occurs.
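The node in the error shows num_sampled=8192 but range_max=6603: with unique=True, the log-uniform candidate sampler cannot draw more distinct word ids than the vocabulary contains. A sketch of a sanity check on the options dict before building the graph; the option names are as I recall them from train_elmo.py, and fit_negative_samples is a hypothetical helper. Capping at the vocabulary size is only the minimum change; with about 6.6k words, a clearly smaller value is probably more sensible.

```python
def fit_negative_samples(options):
    """Return a copy of `options` whose negative-sample count fits the vocabulary."""
    vocab_size = options["n_tokens_vocab"]
    requested = options["n_negative_samples_batch"]
    if requested > vocab_size:
        # Sampling without replacement cannot exceed the number of classes.
        return dict(options, n_negative_samples_batch=vocab_size)
    return dict(options)


opts = {"n_tokens_vocab": 6603, "n_negative_samples_batch": 8192}
print(fit_negative_samples(opts)["n_negative_samples_batch"])  # 6603
```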
Sorry to bother you again!
In the paper, you said that L2 regularization is added while training the model, but I didn't find it in the training code (training.py). Could you tell me where the L2 term is added in your training code?
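As far as I can tell, the L2 term in bilm-tf is not in training.py: the paper adds λ‖w‖² on the ELMo layer weights in the downstream task, and weight_layers in bilm/elmo.py exposes this through an l2_coef argument applied to its scalar layer weights. The sketch below only shows what such a penalty computes; the helper names are hypothetical. Because the raw weights start at zero, the penalty pulls them back toward zero, that is, toward an equal average of the layers.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def l2_penalty(raw_weights, l2_coef):
    """l2_coef * ||w||^2 on the raw (pre-softmax) layer weights."""
    return l2_coef * sum(w * w for w in raw_weights)


raw = [0.5, -0.5, 1.0]
print(l2_penalty(raw, l2_coef=0.001))  # term added to the task loss
print(softmax([0.0, 0.0, 0.0]))        # zero weights give equal layer weights
```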
Hi~
I'm trying to use ELMo to replace the original embedding-lookup word representation in my model, and I see that you have a TensorFlow model at https://tfhub.dev/google/elmo/2. But based on the example, I can only generate word representations, and there is no API for mapping a representation back to words. Could you help me with that?
Thanks
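ELMo has no decoder from vectors back to words, so some kind of lookup is needed. One option, offered as an assumption rather than anything the module provides, is a nearest-neighbor search against a table of vectors you have computed for candidate words with the same model and layer weighting. nearest_word and the toy table are hypothetical:

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_word(vector, table):
    """Return the word in `table` whose vector is most similar to `vector`."""
    return max(table, key=lambda word: cosine(vector, table[word]))


toy_table = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}  # stand-in for real ELMo vectors
print(nearest_word([0.9, 0.2], toy_table))  # cat
```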
The instructions say that to train ELMo on a new dataset, we need to provide a set of held-out files. But how? The training command only accepts the training files.
In the paper, you said that the perplexity of the forward and backward LMs is 39.6. Did you feed the gold words when validating the biLM?
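For reference, perplexity under teacher forcing (the gold previous words are fed in and the model scores the gold next word) is the exponential of the mean negative log-likelihood. I believe bilm-tf's run_test.py evaluates held-out text this way, since it feeds held-out text as inputs and its next tokens as targets, but that is my reading of the code rather than a confirmed answer. A toy computation:

```python
import math


def perplexity(gold_token_probs):
    """exp(mean negative log-likelihood) of the probabilities given to the gold next tokens."""
    nll = -sum(math.log(p) for p in gold_token_probs) / len(gold_token_probs)
    return math.exp(nll)


print(perplexity([0.5, 0.25, 0.125]))  # ~4.0: the geometric mean probability is 1/4
print(perplexity([0.1] * 5))           # ~10.0: like a uniform choice among 10 words
```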
I ran the training code, but memory usage keeps increasing until it runs out. My computer has 128 GB of memory, and the training files take up only 1.3 GB on disk.
Thanks a lot for this work and making it available!
I used ELMo contextualized embeddings in my Keras framework (DeLFT) and could reproduce the excellent results for the CoNLL 2003 NER task, actually slightly better than what you reported in your NAACL 2018 paper (92.47 averaged over 10 training runs, using the 5.5B ELMo model, warm-up, and concatenation with GloVe embeddings in a Lample 2016 BiLSTM-CRF architecture).
However, when using ELMo embeddings on the OntoNotes CoNLL-2012 NER dataset, I see a large drop of 5.0 f-score points compared to GloVe only. The drop is the same whether I use ELMo alone or ELMo embeddings concatenated with GloVe.
Here is the evaluation with Glove without ELMo:
Evaluation on test set:
f1 (micro): 86.17
             precision    recall  f1-score   support
QUANTITY        0.7321    0.7810    0.7558       105
EVENT           0.6275    0.5079    0.5614        63
NORP            0.9193    0.9215    0.9204       841
CARDINAL        0.8294    0.7487    0.7870       935
ORDINAL         0.7982    0.9128    0.8517       195
ORG             0.8451    0.8635    0.8542      1795
LANGUAGE        0.7059    0.5455    0.6154        22
TIME            0.6000    0.5943    0.5972       212
PRODUCT         0.7333    0.5789    0.6471        76
FAC             0.6630    0.4519    0.5374       135
DATE            0.8015    0.8571    0.8284      1602
MONEY           0.8714    0.8631    0.8672       314
LAW             0.6786    0.4750    0.5588        40
PERCENT         0.8808    0.8682    0.8745       349
WORK_OF_ART     0.6480    0.4880    0.5567       166
LOC             0.7500    0.7709    0.7603       179
GPE             0.9494    0.9388    0.9441      2240
PERSON          0.9038    0.9306    0.9170      1988
avg / total     0.8618    0.8615    0.8617     11257
And here are the results with ELMo:
Evaluation on test set:
f1 (micro): 79.62
             precision    recall  f1-score   support
WORK_OF_ART     0.5510    0.6506    0.5967       166
PRODUCT         0.6582    0.6842    0.6710        76
MONEY           0.8116    0.8503    0.8305       314
FAC             0.7130    0.5704    0.6337       135
LANGUAGE        0.7778    0.6364    0.7000        22
QUANTITY        0.1361    0.8000    0.2327       105
TIME            0.6370    0.4387    0.5196       212
GPE             0.9535    0.9437    0.9486      2240
EVENT           0.6316    0.7619    0.6906        63
PERCENT         0.8499    0.8596    0.8547       349
ORG             0.9003    0.8758    0.8879      1795
LOC             0.7611    0.7654    0.7632       179
PERSON          0.9297    0.9452    0.9374      1988
ORDINAL         0.8148    0.1128    0.1982       195
LAW             0.5405    0.5000    0.5195        40
NORP            0.9191    0.9322    0.9256       841
CARDINAL        0.8512    0.1102    0.1951       935
DATE            0.8537    0.5137    0.6415      1602
avg / total     0.8423    0.7548    0.7962     11257
I see that the drop is always for named entity classes somehow related to numbers (f1 ORDINAL -65, CARDINAL -59, QUANTITY -52, DATE -19, etc.), while the recognition of almost all the other classes actually improves with ELMo.
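The per-class differences can be recomputed directly from the two reports above (f1-score column, in points). This only reorganizes the numbers already shown:

```python
# f1-scores copied from the two evaluation reports above
glove = {
    "QUANTITY": 0.7558, "EVENT": 0.5614, "NORP": 0.9204, "CARDINAL": 0.7870,
    "ORDINAL": 0.8517, "ORG": 0.8542, "LANGUAGE": 0.6154, "TIME": 0.5972,
    "PRODUCT": 0.6471, "FAC": 0.5374, "DATE": 0.8284, "MONEY": 0.8672,
    "LAW": 0.5588, "PERCENT": 0.8745, "WORK_OF_ART": 0.5567, "LOC": 0.7603,
    "GPE": 0.9441, "PERSON": 0.9170,
}
elmo = {
    "WORK_OF_ART": 0.5967, "PRODUCT": 0.6710, "MONEY": 0.8305, "FAC": 0.6337,
    "LANGUAGE": 0.7000, "QUANTITY": 0.2327, "TIME": 0.5196, "GPE": 0.9486,
    "EVENT": 0.6906, "PERCENT": 0.8547, "ORG": 0.8879, "LOC": 0.7632,
    "PERSON": 0.9374, "ORDINAL": 0.1982, "LAW": 0.5195, "NORP": 0.9256,
    "CARDINAL": 0.1951, "DATE": 0.6415,
}

# Difference in f1 points (ELMo minus GloVe), rounded to two decimals
deltas = {label: round((elmo[label] - glove[label]) * 100, 2) for label in glove}

# Print from the largest drop to the largest gain
for label, delta in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(f"{label:12s} {delta:+7.2f}")
```

Besides ORDINAL, CARDINAL, QUANTITY and DATE, the other number-like classes TIME, MONEY and PERCENT also go down, as does LAW; the remaining ten classes improve.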
I am wondering what could cause this behavior (apart from an implementation error on my side). Did you observe something similar?
Do you apply any special normalization of numbers to the corpus before training the biLM?
I am using the default tokenization of OntoNotes/CoNLL-2012; should I perhaps use a different tokenization?
I trained the ELMo provided here on my own corpus. I wonder if there is a way to wrap ELMo as a Keras layer and integrate it into a Keras model. If so, could you please provide an example like usage_character.py? Thank you very much.
I didn't find where to adjust the dimension of the input or output.
Hi, I have a problem using ELMo with spaCy.
I use spaCy to preprocess the text data, and without ELMo the results look fine. However, when I use the model with both spaCy and ELMo, I get a very bad result (0.08), and the TensorFlow debugger shows many NaN and inf values. If I use NLTK with ELMo, the result is what I expect.
I think maybe something is wrong with how I am using ELMo. However, when I read the ELMo source code, I didn't see any relationship between ELMo and the tokenizer (NLTK or spaCy). I used the pretrained ELMo weights for SQuAD. I've been stuck on this problem for a long time and really want to know whether I missed something. Do I need to train a new ELMo model when I switch to spaCy?
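One difference worth ruling out (a hypothesis, not a confirmed cause): spaCy keeps runs of extra whitespace, such as newlines or double spaces, as tokens of their own, while NLTK's word_tokenize drops them, so the two pipelines can feed ELMo different token sequences. A sketch that drops whitespace-only tokens while keeping each kept token's original index, so answer spans can be mapped back; drop_whitespace_tokens is hypothetical:

```python
def drop_whitespace_tokens(tokens):
    """Remove whitespace-only tokens; also return the original index of each kept token."""
    kept = [(i, tok) for i, tok in enumerate(tokens) if tok.strip()]
    return [tok for _, tok in kept], [i for i, _ in kept]


tokens = ["Who", " ", "wrote", "it", "?", "\n"]  # e.g. what spaCy may emit
clean, index_map = drop_whitespace_tokens(tokens)
print(clean)      # ['Who', 'wrote', 'it', '?']
print(index_map)  # [0, 2, 3, 4]
```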
In the paper "Deep contextualized word representations" there was a supplemental section about fine tuning biLM.
I would like to know how to do it, specifically:
I guess the restart_ckpt_file argument can be used, but I don't know how to use it.
Thanks in advance!
Can you please make it clearer why the second dimension in the embeddings is 3?
Is it possible to hack the models to generate a question from an arbitrary given text?
Are you aware of any research on this?
Looks like the project hasn't been released. Is that correct?
Thank you
I tried a batch of 160 sentences and it broke; I found that the maximum batch size in model.py is fixed at 128.
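As far as I can tell, BidirectionalLanguageModel in bilm/model.py takes a max_batch_size argument (128 by default), so one option is to pass a larger value when constructing it; another is to split the input into chunks of at most that size before running the graph. A sketch of the chunking (batched is a hypothetical helper):

```python
def batched(sentences, max_batch_size=128):
    """Yield consecutive slices of at most max_batch_size sentences."""
    for start in range(0, len(sentences), max_batch_size):
        yield sentences[start:start + max_batch_size]


sentences = [["token"]] * 160
print([len(chunk) for chunk in batched(sentences)])  # [128, 32]
```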
With the training command in the README, I got this error.
I am thinking of using Elmo in my project, but the model provided in the readme is too large for our application. Thanks!
Hi!
When I try to run bin/run_test.py on GPUs, I get:
....
018-06-18 12:27:48.890298: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats: [35/1118]
Limit: 15922230068
InUse: 15585278208
MaxInUse: 15922230016
NumAllocs: 376
MaxAllocSize: 15173454848
2018-06-18 12:27:48.890313: W tensorflow/core/common_runtime/bfc_allocator.cc:277] *****************************************************************************xxxxxxxxxxxxxxxxxxxxxxx
2018-06-18 12:27:48.890342: W tensorflow/core/framework/op_kernel.cc:1158] Resource exhausted: OOM when allocating tensor with shape[512,5555540]
Traceback (most recent call last):
File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call
return fn(*args)
File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
status, run_metadata)
File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
next(self.gen)
File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[512,5555540]
[[Node: lm/transpose = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](lm/softmax/W/read/_111, lm/transpose/sub_1)]]
[[Node: lm/mul_8/_141 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_565_lm/mul_8", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "bin/run_test.py", line 42, in <module>
main(args)
File "bin/run_test.py", line 29, in main
test(options, ckpt_file, data, batch_size=args.batch_size)
File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/bilm-0.1-py3.6.egg/bilm/training.py", line 1024, in test
feed_dict=feed_dict
File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[512,5555540]
[[Node: lm/transpose = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](lm/softmax/W/read/_111, lm/transpose/sub_1)]]
[[Node: lm/mul_8/_141 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_565_lm/mul_8", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Caused by op 'lm/transpose', defined at:
File "bin/run_test.py", line 42, in <module>
main(args)
File "bin/run_test.py", line 29, in main
test(options, ckpt_file, data, batch_size=args.batch_size)
File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/bilm-0.1-py3.6.egg/bilm/training.py", line 970, in test
model = LanguageModel(test_options, False)
File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/bilm-0.1-py3.6.egg/bilm/training.py", line 71, in __init__
self._build()
File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/bilm-0.1-py3.6.egg/bilm/training.py", line 425, in _build
self._build_loss(lstm_outputs)
File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/bilm-0.1-py3.6.egg/bilm/training.py", line 507, in _build_loss
tf.transpose(self.softmax_W)
File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1278, in transpose
ret = gen_array_ops.transpose(a, perm, name=name)
File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3658, in transpose
result = _op_def_lib.apply_op("Transpose", x=x, perm=perm, name=name)
File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "<some_path>/env/allen_elmo/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[512,5555540]
[[Node: lm/transpose = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](lm/softmax/W/read/_111, lm/transpose/sub_1)]]
[[Node: lm/mul_8/_141 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_565_lm/mul_8", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
My vocabulary has 5555540 tokens.
It runs on the CPU (export CUDA_VISIBLE_DEVICES=""), but it is too slow. I cannot change the size of the vocabulary.
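The numbers in the log account for the failure. Assuming float32 weights (4 bytes each) and the shape [512, 5555540] from the error, which the traceback attributes to tf.transpose(self.softmax_W) in _build_loss, the softmax matrix alone takes about 10.6 GiB, and transposing it allocates a second tensor of the same size, more than the Limit reported by the allocator. A worked check (matrix_bytes is a hypothetical helper):

```python
BYTES_PER_FLOAT32 = 4


def matrix_bytes(rows, cols, bytes_per_elem=BYTES_PER_FLOAT32):
    """Memory needed for a dense rows x cols matrix."""
    return rows * cols * bytes_per_elem


vocab_size = 5555540     # from the question
dim = 512                # first dimension in the OOM message
gpu_limit = 15922230068  # "Limit" line of the allocator stats

softmax_w = matrix_bytes(vocab_size, dim)
print(softmax_w)                    # 11377745920 bytes for one copy
print(round(softmax_w / 2**30, 2))  # the same in GiB
print(2 * softmax_w > gpu_limit)    # True: weight plus its transpose do not fit
```

This ignores activations and the logits computed over the full vocabulary, so the real requirement is presumably higher still.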
Constrained by the computational resource available, we have to work with TF v1.0 (the manager of supercomputer cluster wants to keep it that way for the benefit of other users). I was wondering if there's any way we could still be able to use ELMo with v1.0.
Thanks!
I have a model that performs a sentiment analysis task and uses GloVe as its word embedding; at startup I load the GloVe file glove.xxxB.yyyd.txt (xxx = number of tokens, yyy = dimension). Now I need to load the ELMo file equivalent to this GloVe file instead. In other words, I need a one-to-one mapping between GloVe and ELMo. Is that possible? And if so, what is the output dimension of ELMo?
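Because ELMo vectors depend on context, there is no exact one-to-one replacement for a GloVe file. One workaround, not part of this repository, is to average the contextual vectors each word receives over a corpus into a static table; for the original model those vectors would be 1024-dimensional, as far as I know. A sketch with toy 2-d vectors (average_by_word is hypothetical):

```python
def average_by_word(occurrences):
    """Average the vectors seen for each word.

    occurrences: iterable of (word, vector) pairs, one per token occurrence in context.
    """
    sums, counts = {}, {}
    for word, vec in occurrences:
        acc = sums.setdefault(word, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[word] = counts.get(word, 0) + 1
    return {word: [x / counts[word] for x in acc] for word, acc in sums.items()}


pairs = [("bank", [1.0, 0.0]), ("bank", [0.0, 1.0]), ("river", [2.0, 2.0])]
print(average_by_word(pairs))  # {'bank': [0.5, 0.5], 'river': [2.0, 2.0]}
```

The averaging loses exactly the context sensitivity that makes ELMo useful, so feeding the contextual vectors per sentence is usually preferable when the model can accept them.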
In train_elmo.py, "n_characters" is defined as 261. However, in the pretrained models' configs, n_characters is set to 262. Is there any particular reason?
Test model : https://raw.githubusercontent.com/allenai/bilm-tf/master/tests/fixtures/model/options.json
Pretrained model : https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_options.json
Both models have n_characters=262
Moreover, while reading a pre-trained model, we increase the size by one to add padding
Line 220 in 81a4b54
Line 120 in 81a4b54
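A sketch of where the off-by-one comes from, assuming the special ids I recall from bilm/data.py (256 byte values, then begin/end-of-sentence, begin/end-of-word and padding ids 256 to 260) and that the inference-time Batcher shifts every id by +1 so that 0 can mark padding. The constant names are illustrative:

```python
# Special ids (assumed): begin/end of sentence, begin/end of word, padding
BOS_CHAR, EOS_CHAR, BOW_CHAR, EOW_CHAR, PAD_CHAR = 256, 257, 258, 259, 260

training_ids = list(range(PAD_CHAR + 1))       # ids 0..260 used during training
n_characters_train = len(training_ids)         # 261

inference_ids = [i + 1 for i in training_ids]  # shifted so that 0 is free for padding
n_characters_infer = max(inference_ids) + 1    # 262 embedding rows

print(n_characters_train, n_characters_infer)  # 261 262
```

Under these assumptions, an options.json that still says 261 would give a char_embed shape one row short of the padded weights.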
Hi!
After saving a checkpoint, I tried to load weights.hdf5 and got this error:
...
shape.as_list(), dtype=dtype, partition_info=partition_info)
File "/some_path/bilm-tf/bilm/model.py", line 238, in ret
varname_in_file, shape, weights.shape)
ValueError: Invalid shape initializing char_embed, got [261, 16], expected (262, 16)
Can anybody help me?
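The message suggests the graph was built with n_characters = 261 from options.json, while the loader expects 262 rows for char_embed. As I understand the README, the options.json used with a converted weights.hdf5 should have n_characters set to 262 after training. A small sketch that patches the file; set_n_characters is a hypothetical helper, and it assumes the usual char_cnn section:

```python
import json


def set_n_characters(options_path, n_characters=262):
    """Rewrite options.json so char_cnn.n_characters matches the inference-time vocabulary."""
    with open(options_path, encoding="utf-8") as f:
        options = json.load(f)
    options["char_cnn"]["n_characters"] = n_characters
    with open(options_path, "w", encoding="utf-8") as f:
        json.dump(options, f, indent=2)
    return options


# Example: set_n_characters("/path/to/options.json")
```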
Hi,
While trying to create embeddings for questions as shown in usage_token.py, I get an error from the tensor because the batch size differs from max_batch_size.
How can I handle that case? Have you encountered such a problem in the project?
Thank you
'ResourceExhaustedError (see above for traceback): OOM when allocating tensor of shape [16384] and type float'
I modified the code of train_elmo.py like this:
options = {
'bidirectional': True,
# 'char_cnn': {'activation': 'relu',
# 'embedding': {'dim': 16},
# 'filters': [[1, 32],
# [2, 32],
# [3, 64],
# [4, 128],
# [5, 256],
# [6, 512],
# [7, 1024]],
# 'max_characters_per_token': 50,
# 'n_characters': 6707,
# 'n_highway': 2},
I did this because I want to run this code using only word embeddings, without character embeddings.
Then this 'ResourceExhaustedError' occurred.
Could you tell me how to fix it?
Thanks!
Can I use the TensorFlow Hub version to train my own ELMo embeddings? My corpus is in Chinese.
If so, could you give me a simple example?
Thank you so much.
In train_elmo.py, the LanguageModel class is used, but in usage_character.py and usage_token.py, the BidirectionalLanguageModel class is used.
So what is the relationship between the BidirectionalLanguageModel class in bilm/model.py and the LanguageModel class in bilm/training.py? Thanks.
Hi!
It seems to me from the code provided that the final states of each batch are fed as the initial states of the next batch. However, in data.py the examples in a batch seem to be the continuation of the previous example in the same batch (when the sentence is longer than the BPTT rollout steps). If what I'm saying is correct, we are feeding the final states to a new batch that is not the continuation of the sentences in the previous batch.
Why is that so? What am I missing?
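As far as I can tell, _get_batch in bilm's data.py keeps one token stream per batch row, so row i of batch t+1 picks up where row i of batch t stopped, which is what makes carrying the LSTM state over meaningful. A simplified sketch of that idea, not the library's code (stream_batches is hypothetical, and characters stand in for tokens):

```python
def stream_batches(sentences, batch_size, num_steps):
    """Yield batches of token rows where row i continues row i of the previous batch.

    Each row keeps its own leftover tokens; when a row runs out, it pulls the next
    sentence from the shared iterator. Stops as soon as the data runs out.
    """
    source = iter(sentences)
    leftovers = [[] for _ in range(batch_size)]
    while True:
        batch = []
        for i in range(batch_size):
            row = []
            while len(row) < num_steps:
                if not leftovers[i]:
                    try:
                        leftovers[i] = list(next(source))
                    except StopIteration:
                        return
                take = num_steps - len(row)
                row.extend(leftovers[i][:take])
                leftovers[i] = leftovers[i][take:]
            batch.append(row)
        yield batch


batches = list(stream_batches(["abcde", "xy", "fghij", "uvw", "klm", "opq"], 2, 3))
for b in batches:
    print(["".join(row) for row in b])
# ['abc', 'xyf']
# ['deu', 'ghi']
# ['vwk', 'jop']
```

Reading row 0 across batches gives "abc", "deu", "vwk": a long sentence is split across consecutive batches in the same row, not across rows of one batch.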
Hi,
I've been training the model on the 1 Billion Word Benchmark for 7 days now on 4 Tesla K80 GPUs, and it seems to be converging to a perplexity of around 25 (it has not improved for 24 hours). See the tail of the log below.
Is this expected behaviour? Has it converged?
Batch 142200, train_perplexity=24.746037
Total time: 585740.8309390545
Batch 142300, train_perplexity=25.46843
Total time: 586129.8147296906
Batch 142400, train_perplexity=25.55357
Total time: 586523.1840500832
Loading data from: ../../data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00026-of-00100
Loaded 306324 sentences.
Finished loading
Loading data from: ../../data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00028-of-00100
Loaded 305485 sentences.
Finished loading
Batch 142500, train_perplexity=26.139242
Total time: 586992.946965456
WARNING:tensorflow:Error encountered when serializing lstm_output_embeddings.
Type is unsupported, or the types of the items don't match field type in CollectionDef.
'list' object has no attribute 'name'
Batch 142600, train_perplexity=24.84199
Total time: 587395.0743260384
Batch 142700, train_perplexity=25.43104
Total time: 587794.4523823261
Batch 142800, train_perplexity=25.182297
Total time: 588190.2893879414
Batch 142900, train_perplexity=24.556465
Total time: 588584.6505479813
Batch 143000, train_perplexity=25.966608
Total time: 588982.1930603981
Batch 143100, train_perplexity=25.03588
Total time: 589376.4338204861
Batch 143200, train_perplexity=25.981043
Total time: 589773.4447641373
Loading data from: ../../data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00065-of-00100
Loaded 305213 sentences.
Finished loading
Batch 143300, train_perplexity=25.373167
Total time: 590195.4948370457
Loading data from: ../../data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00041-of-00100
Loaded 306092 sentences.
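Individual log lines fluctuate by about a point, so averaging over windows of logged batches makes a plateau easier to see. A sketch that parses the "Batch N, train_perplexity=X" lines shown above (the regex is written for this log format; parse_perplexities and window_means are hypothetical helpers). Training perplexity alone cannot tell you whether the model has converged on unseen data, so evaluating a checkpoint on held-out files is the more reliable check.

```python
import re
from statistics import mean

LOG_LINE = re.compile(r"Batch (\d+), train_perplexity=([0-9.]+)")


def parse_perplexities(lines):
    """Extract (batch, perplexity) pairs from training log lines."""
    pairs = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            pairs.append((int(match.group(1)), float(match.group(2))))
    return pairs


def window_means(values, size):
    """Mean of consecutive, non-overlapping windows of `size` values."""
    return [mean(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


log = """Batch 142200, train_perplexity=24.746037
Total time: 585740.8309390545
Batch 142300, train_perplexity=25.46843
Batch 142400, train_perplexity=25.55357
Batch 142500, train_perplexity=26.139242""".splitlines()

ppl = [p for _, p in parse_perplexities(log)]
print(window_means(ppl, 2))
```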
I'm really sorry to bother you again. There are two ways to get the perplexity of a language model: (1) you feed the real words of the sentence to the model and it predicts the next word, which is called training; (2) you feed the word the model just predicted back into the model and it predicts the next one, which is called inference.
So could you tell me which way you used to get the perplexity of 39.4?
I'm not good at expressing my views in English; thanks for your patience!
By the way, could you tell me the learning rate you used to train ELMo?
Hi, thanks for the great paper and nice implementation.
After reading through the paper and the code, I still feel confused about the weight_layers.
I saw that dump_token_embeddings only returns the intermediate LSTM states, not the final weighted ELMo vector. As mentioned in the paper, the weight_layers should be trained with the downstream task. However, in usage_token.py, the code directly creates the weight_layers and uses them without training, so I am confused about how the weight layers are meant to be used. Could you please explain it a little?
Also, in the test_elmo.py, why is the expected_elmo calculated this way? I don't understand why the actual_elmo will be close to the values following these calculations. Could you please also explain it?
Thanks a lot.
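A sketch of the weighting, following the paper's ELMo_k = γ Σ_j s_j h_{k,j} with s = softmax(w). As far as I can tell, weight_layers creates w as trainable variables initialized to zeros and γ initialized to one, so before any downstream training the result is simply the average of the layers; that is why it can be used right away in usage_token.py and then trained along with the task. scalar_mix is a hypothetical helper working on one token position:

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scalar_mix(layers, raw_weights, gamma=1.0):
    """gamma * sum_j softmax(raw_weights)_j * layers[j] for one token position.

    layers: one vector per biLM layer, all of the same length.
    """
    s = softmax(raw_weights)
    dim = len(layers[0])
    return [gamma * sum(s[j] * layers[j][d] for j in range(len(layers))) for d in range(dim)]


layers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # token layer + 2 LSTM layers (toy sizes)
print(scalar_mix(layers, [0.0, 0.0, 0.0]))     # zero init: plain average of the layers
print(scalar_mix(layers, [0.0, 0.0, 5.0]))     # training can shift weight to the top layer
```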
I converted the hdf5 file back to a ckpt file (using the custom_getter method in bilm/model.py) and tried to use it with the architecture in bilm/training.py, but the loaded weights give very bad perplexity on held-out data when I run run_test.py. Are the architectures in bilm/model.py and bilm/training.py compatible? If you think I'm doing something wrong, could you share the ckpt file for the given hdf5 file?
Thanks
Hi,
I would like to train ELMo on my own dataset. Could you please release the training code so that I can use the weights it generates in the bilm-tf application? I would be grateful for any insight into your training code, even if it is not ready for GitHub.
Thank you.
Thank you!
How long does it take to train the model from the ELMO paper? I read that you used 3 GPUs. Which ones?
I want to get a rough idea before I can train my own.
This is not an issue per se, so if there's a different forum to discuss these things please let me know.
And congratulations on winning the best paper award at NAACL!
Hi, could you share some hyperparameter details for training the biLM, like the optimizer and dropout rate? Thanks.
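As far as I can tell, bilm/training.py uses tf.train.AdagradOptimizer with learning_rate = options.get('learning_rate', 0.2) and initial_accumulator_value=1.0, and train_elmo.py sets 'dropout': 0.1, but please check this against the code. A pure-Python sketch of the Adagrad update with those settings for one scalar parameter (adagrad_step is hypothetical):

```python
import math


def adagrad_step(param, grad, accumulator, lr=0.2):
    """One Adagrad update: accumulate squared gradients, scale the step by 1/sqrt(accumulator)."""
    accumulator = accumulator + grad * grad
    param = param - lr * grad / math.sqrt(accumulator)
    return param, accumulator


param, acc = 1.0, 1.0  # the accumulator starts at initial_accumulator_value
for grad in (1.0, 1.0, 1.0):
    param, acc = adagrad_step(param, grad, acc)
    print(round(param, 4), acc)
```

With a constant gradient, each step is smaller than the last because the accumulator keeps growing, which is why Adagrad tolerates a fixed, fairly large base learning rate.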
I have a gigantic dataset to train ELMo on, so I split the training set into 1000 separate files. While loading data for training, I see that only two of the files are loaded (reverse=True and False). Why is that? Or am I missing something?
And by the way, congratulations on winning the best paper award at NAACL!
Thanks,
In the code, both the CNN embedding dimension and each individual LSTM output dimension are 512.
The paper says it computes a task-specific weighting of all biLM layers.
Each biLM layer embedding is the concatenation [forward-lstm, backward-lstm], so its dimension should be 1024.
So how is a weighting computed between the biLM layer embeddings (1024) and the CNN embedding (512)? How can they be added when their dimensions differ?
Thank you very much!
Best Regards!
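A sketch of how the sizes line up, assuming (from my reading of bilm/model.py) that the context-independent token representation, projected to projection_dim, is concatenated with itself so that the token layer matches the width of the [forward; backward] LSTM layers. Under that assumption every layer, and therefore the weighted ELMo vector, has 2 * projection_dim entries, which is where the 3 and the 1024 in the (n_sentences, 3, max_sentence_length, 1024) shape come from with the default options. layer_dims is hypothetical, and pretrained weight files fix these sizes, so changing projection_dim means training a new model.

```python
def layer_dims(options):
    """Per-layer output sizes of the biLM for a given options dict (sketch)."""
    proj = options["lstm"]["projection_dim"]
    n_lstm_layers = options["lstm"]["n_layers"]
    token_layer = 2 * proj                    # [token; token], duplicated to match the LSTM width
    lstm_layers = [2 * proj] * n_lstm_layers  # [forward; backward] for each LSTM layer
    return [token_layer] + lstm_layers


token = [0.1] * 512
print(len(token + token))  # 1024: a 512-d vector joined with itself end to end
print(layer_dims({"lstm": {"projection_dim": 512, "n_layers": 2}}))  # [1024, 1024, 1024]
```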
(More of a question than an issue)
The embeddings are of shape (None, 3, None, 1024). Is there any specific reason why embeddings have a size of 1024? Which hyper parameter should I change if I want to reduce the embedding size?
Sorry to bother you again. I found two benchmarks in https://github.com/ciprian-chelba/1-billion-word-language-modeling-benchmark: the big one (9.9G) and the small one (1.7G). Could you tell me which benchmark you used to train ELMo?
Does the model break if it sees an out of vocabulary character?
in data.py
j=0
for k, chr_id in enumerate(word_encoded, start=1):
    code[k] = chr_id
    j=k
k=j
I am getting OOM errors when using a single GTX 1080 Ti.
As described in the paper "Deep contextualized word representations", before being fed into NLP tasks, ELMo vectors are concatenated with context-independent token representations X like this: [X; ELMo]
But how exactly are they concatenated? Is it element-wise, or do we just join the two vectors end to end?
I saw from the source codes that the lstm layers' outputs in the bilm are concatenated element-wise with tf.concat([lstm_output1, lstm_output2], axis=-1), so I feel like the concatenation between ELMo and X should be also element-wise.
But, if it is combined element-wise, then does X always have to follow the dimension of ELMo's internal lstm layers?
For example, I see that given 2 sentences and a max sentence length of 10, the vectors created by weight_layers have shape (2, 10, 32), with 32 being the concatenated size of the two LSTM directions (forward and backward), each of dimension 16 (16x2 = 32). However, if we were to combine ELMo with X element-wise as introduced in the paper, X would also need shape (num_sentences, max_sentence_length, 32), which rules out an embedding size for X other than 32.
As far as I understand the options.json file, the "projection_dim" hyperparameter determines the internal LSTM layer dimension.
Then, is there any way to change the LSTM layer dimension in the biLM (possibly through lstm { ... projection_dim = ? ... } in options.json)? Or am I missing something?
(I ask this question because when I tried to change projection_dim and ran, I came across the following error)
Traceback (most recent call last):
File "elmo.py", line 136, in test_weighted_layers
self._check_weighted_layer(1.0, do_layer_norm=True, use_top_only=False)
File "elmo.py", line 36, in _check_weighted_layer
bilm_ops = model(character_ids)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/bilm-0.1-py3.5.egg/bilm/model.py", line 97, in __call__
max_batch_size=self._max_batch_size)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/bilm-0.1-py3.5.egg/bilm/model.py", line 286, in __init__
self._build()
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/bilm-0.1-py3.5.egg/bilm/model.py", line 290, in _build
self._build_word_char_embeddings()
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/bilm-0.1-py3.5.egg/bilm/model.py", line 415, in _build_word_char_embeddings
dtype=DTYPE)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 1317, in get_variable
constraint=constraint)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 1079, in get_variable
constraint=constraint)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 417, in get_variable
return custom_getter(**custom_getter_kwargs)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/bilm-0.1-py3.5.egg/bilm/model.py", line 275, in custom_getter
return getter(name, *args, **kwargs)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 394, in _true_getter
use_resource=use_resource, constraint=constraint)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 786, in _get_single_variable
use_resource=use_resource)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 2220, in variable
use_resource=use_resource)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 2210, in <lambda>
previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 2193, in default_variable_creator
constraint=constraint)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 235, in __init__
constraint=constraint)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 343, in _init_from_args
initial_value(), name="initial_value", dtype=dtype)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 770, in <lambda>
shape.as_list(), dtype=dtype, partition_info=partition_info)
File "/home/youngmokcho/note_recognition/venv/lib/python3.5/site-packages/bilm-0.1-py3.5.egg/bilm/model.py", line 246, in ret
varname_in_file, shape, weights.shape)
ValueError: Invalid shape initializing CNN_proj/W_proj, got [124, 8], expected (124, 16)
Ran 1 test in 0.099s
FAILED (errors=1)
I'm currently studying CNNs, so it was somewhat hard for me to trace this error back, but it looks like projection_dim depends on some other value.
To sum up, all I want to know is how to manipulate elmo's embedding dimension in order to match the size of ELMo with that of context-independent token representations.
Please correct or ask me if any of my questions is unclear or mistaken.
Thank you for any help you may provide!!
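tf.concat([a, b], axis=-1) joins tensors end to end along the last (feature) axis; it is not element-wise, so only the leading dimensions (batch and time) have to agree and X can keep its own width. A pure-Python sketch of the per-token [X; ELMo] concatenation (concat_features is hypothetical; sizes are toy values):

```python
def concat_features(x, elmo):
    """Per-token [x; ELMo]: join feature vectors end to end, like tf.concat(..., axis=-1)."""
    if len(x) != len(elmo):
        raise ValueError("x and elmo must cover the same tokens")
    return [x_t + e_t for x_t, e_t in zip(x, elmo)]


x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # 2 tokens, 3-d context-independent features
elmo = [[0.1] * 32, [0.2] * 32]         # 2 tokens, 32-d ELMo vectors
combined = concat_features(x, elmo)
print(len(combined), len(combined[0]))  # 2 35
```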