GenerateNgrams may contain a bug

def GenerateNgrams(words, ngrams):
    nglist = []
    for ng in ngrams:
        for word in words:
            nglist.extend([word[n:n+ng] for n in range(len(word)-ng+1)])
    return nglist

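Maybe it should look like the following, so that n-grams are built from consecutive words rather than from the characters inside each word: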

def GenerateNgrams(words, ngrams):
    nglist = []
    for ng in ngrams:
        nglist.extend([''.join(words[n:n+ng]) for n in range(len(words)-ng+1)])
    return nglist
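
To make the difference between the two versions concrete, here is a minimal sketch that runs both on the same input. The function names `char_ngrams` and `word_ngrams` and the `sep` parameter are mine, not from the repository; which behaviour is intended is for the maintainers to say.

```python
def char_ngrams(words, ngrams):
    """Character n-grams taken inside each word (what the current code does)."""
    nglist = []
    for ng in ngrams:
        for word in words:
            nglist.extend(word[n:n + ng] for n in range(len(word) - ng + 1))
    return nglist


def word_ngrams(words, ngrams, sep=""):
    """N-grams of consecutive words (the proposed change).

    `sep` is a hypothetical extra: the proposal joins with '', but a
    separator such as ' ' keeps word boundaries visible.
    """
    nglist = []
    for ng in ngrams:
        nglist.extend(sep.join(words[n:n + ng])
                      for n in range(len(words) - ng + 1))
    return nglist


words = ["the", "cat", "sat"]
print(char_ngrams(words, [2]))       # ['th', 'he', 'ca', 'at', 'sa', 'at']
print(word_ngrams(words, [2]))       # ['thecat', 'catsat']
print(word_ngrams(words, [2], " "))  # ['the cat', 'cat sat']
```

Note that joining with an empty string can make different word pairs collide (for example "a"+"bc" and "ab"+"c"), which is why a separator may be worth considering.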

Horovod checkpointing

Each process should probably save its checkpoints to its own directory.

I think multiple processes are racing to write checkpoints to the same file.
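
A minimal sketch of two ways to avoid the race, assuming the process rank is available as an integer (with Horovod that would be `hvd.rank()`). The helper names and the `rank-N` directory layout are hypothetical, not something the repository defines.

```python
import os


def checkpoint_dir_for_rank(base_dir, rank):
    """Option 1: give every process its own directory, e.g. base_dir/rank-0."""
    return os.path.join(base_dir, "rank-%d" % rank)


def checkpoint_dir_chief_only(base_dir, rank):
    """Option 2: only rank 0 gets the shared directory.

    Other ranks get None, so the caller can skip saving there (an
    Estimator given model_dir=None falls back to a temporary directory).
    """
    return base_dir if rank == 0 else None


# Simulate four workers; in a real job `rank` would come from hvd.rank().
for rank in range(4):
    print(rank,
          checkpoint_dir_for_rank("data/models/mydataset", rank),
          checkpoint_dir_chief_only("data/models/mydataset", rank))
```

Option 2 is the lighter choice when all workers hold identical weights after each step, since only one copy of the checkpoint is written.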

`DataLossError: corrupted record` during training

Hello, during training the following error occurs:

tensorflow.python.framework.errors_impl.DataLossError: corrupted record at 0
[[{{node read_batch_features/read/ReaderReadUpToV2}} = ReaderReadUpToV2[_device="/job:localhost/replica:0/task:0/device:CPU:0"](read_batch_features/read/TFRecordReaderV2, read_batch_features/file_name_queue, read_batch_features/read/ReaderReadUpToV2/num_records)]]

My environment is:
Python 3.6,
tensorflow 1.12.0,
macOS 10.13

Here is detailed information:

FastTrain 1000
WARNING:tensorflow:From RunConfig.init (from tensorflow.contrib.learn.python.learn.estimators.run_config) is deprecated and will be removed in a future version.
Instructions for updating:
When switching to tf.estimator.Estimator, use tf.estimator.RunConfig instead.
ParseSpec {'text': VarLenFeature(dtype=tf.string), 'label': FixedLenFeature(shape=(), dtype=tf.string, default_value=None)}
Input file: data/train-label-text.txt
WARNING:tensorflow:From /Users/hans/repos/simple-tests/classify/workspace/ read_batch_features (from tensorflow.contrib.learn.python.learn.learn_io.graph_io) is deprecated and will be removed in a future version.
Instructions for updating:
WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_io/ read_keyed_batch_features (from tensorflow.contrib.learn.python.learn.learn_io.graph_io) is deprecated and will be removed in a future version.
Instructions for updating:
WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_io/ read_keyed_batch_examples (from tensorflow.contrib.learn.python.learn.learn_io.graph_io) is deprecated and will be removed in a future version.
Instructions for updating:
WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_io/ string_input_producer (from is deprecated and will be removed in a future version.
Instructions for updating:
Queue-based input pipelines have been replaced by Use, out_type=tf.int64)[0]).repeat(num_epochs). If shuffle=False, omit the .shuffle(...).
WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/ input_producer (from is deprecated and will be removed in a future version.
Instructions for updating:
Queue-based input pipelines have been replaced by Use, out_type=tf.int64)[0]).repeat(num_epochs). If shuffle=False, omit the .shuffle(...).
WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/ limit_epochs (from is deprecated and will be removed in a future version.
Instructions for updating:
Queue-based input pipelines have been replaced by Use
WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/ QueueRunner.init (from is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the module.
WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/ add_queue_runner (from is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the module.
WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_io/ TFRecordReader.init (from tensorflow.python.ops.io_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Queue-based input pipelines have been replaced by Use
WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_io/ shuffle_batch_join (from is deprecated and will be removed in a future version.
Instructions for updating:
Queue-based input pipelines have been replaced by Use
WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_io/ queue_parsed_features (from tensorflow.contrib.learn.python.learn.learn_io.graph_io) is deprecated and will be removed in a future version.
Instructions for updating:
WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/ops/ sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a tf.sparse.SparseTensor and use tf.sparse.to_dense instead.
WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/ start_queue_runners (from is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the module.
Traceback (most recent call last):
File "", line 199, in
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/platform/", line 125, in run
File "", line 193, in main
File "", line 167, in FastTrain
estimator.train(input_fn=train_input, steps=FLAGS.train_steps, hooks=hooks)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/estimator/", line 354, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/estimator/", line 1207, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/estimator/", line 1241, in _train_model_default
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/estimator/", line 1471, in _train_with_estimator_spec
_, loss =[estimator_spec.train_op, estimator_spec.loss])
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/", line 783, in exit
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/", line 821, in _close_internal
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/", line 1069, in close
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/", line 1229, in close
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/", line 389, in join
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/", line 692, in reraise
raise value.with_traceback(tb)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/training/", line 257, in _run
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/client/", line 1257, in _single_operation_run
self._call_tf_sessionrun(None, {}, [], target_list, None)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/client/", line 1407, in _call_tf_sessionrun
tensorflow.python.framework.errors_impl.DataLossError: corrupted record at 0
[[{{node read_batch_features/read/ReaderReadUpToV2}} = ReaderReadUpToV2[_device="/job:localhost/replica:0/task:0/device:CPU:0"](read_batch_features/read/TFRecordReaderV2, read_batch_features/file_name_queue, read_batch_features/read/ReaderReadUpToV2/num_records)]]
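
One thing that may be worth checking, as a guess based on the log line `Input file: data/train-label-text.txt`: the failing op is a `TFRecordReaderV2`, and "corrupted record at 0" is what one would expect if the very first bytes of the input are not a valid TFRecord header, for example when a plain-text file is passed where a TFRecord file is expected. The sketch below checks only the 12-byte header of the first record (an 8-byte little-endian length followed by a masked CRC-32C of those 8 bytes). The function names are mine; it does not use TensorFlow.

```python
import struct

_CRC32C_POLY = 0x82F63B78  # reflected Castagnoli polynomial


def _build_table():
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ _CRC32C_POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc32c(data):
    """Plain table-driven CRC-32C over a bytes object."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data):
    """CRC-32C with the rotate-and-add mask used in TFRecord files."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def looks_like_tfrecord(first_bytes):
    """True if the first 12 bytes form a valid TFRecord length header."""
    if len(first_bytes) < 12:
        return False
    (stored_crc,) = struct.unpack("<I", first_bytes[8:12])
    return masked_crc32c(first_bytes[:8]) == stored_crc


# Build the start of a well-formed record by hand and check it.
payload = b"example"
length = struct.pack("<Q", len(payload))
record_start = length + struct.pack("<I", masked_crc32c(length)) + payload
print(looks_like_tfrecord(record_start))  # True
print(looks_like_tfrecord(b"__label__en some plain text line\n"))
```

To test a real file, read its first 12 bytes in binary mode and pass them to `looks_like_tfrecord`. If the check fails on the training file, the file probably needs to be converted to TFRecord format before training.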

ValueError: Shape must be rank 2 but is rank 3 for 'concat' (op: 'ConcatV2') with input shapes: [?,16], [?,1,16], []


I'm trying to train a text classifier using train_langdetect.sh without Horovod installed.
And I'm getting this error:

  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/framework/", line 1607, in _create_c_op
    c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape must be rank 2 but is rank 3 for 'concat' (op: 'ConcatV2') with input shapes: [?,16], [?,1,16], [].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 199, in <module>
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/platform/", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/absl/", line 299, in run
    _run_main(main, args)
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/absl/", line 250, in _run_main
  File "", line 193, in main
  File "", line 167, in FastTrain
    estimator.train(input_fn=train_input, steps=FLAGS.train_steps, hooks=hooks)
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/", line 1191, in _train_model_default
    features, labels, ModeKeys.TRAIN, self.config)
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/", line 1149, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "", line 118, in model_fn
    input_layer = tf.concat([text_embedding, ngram_embedding], -1)
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/util/", line 180, in wrapper
    return target(*args, **kwargs)
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/ops/", line 1420, in concat
    return gen_array_ops.concat_v2(values=values, axis=axis, name=name)
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/ops/", line 1257, in concat_v2
    "ConcatV2", values=values, axis=axis, name=name)
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/framework/", line 794, in _apply_op_helper
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/util/", line 507, in new_func
    return func(*args, **kwargs)
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/framework/", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/framework/", line 3426, in _create_op_internal
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/framework/", line 1770, in __init__
  File "/Users/kodlan/tensorflow_venv/lib/python3.7/site-packages/tensorflow_core/python/framework/", line 1610, in _create_c_op
    raise ValueError(str(e))
ValueError: Shape must be rank 2 but is rank 3 for 'concat' (op: 'ConcatV2') with input shapes: [?,16], [?,1,16], [].

Do you have any suggestions as to what could be causing this?
Thank you.
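
Reading the error against the failing line `tf.concat([text_embedding, ngram_embedding], -1)`, the input shapes are listed in argument order, so `text_embedding` appears to be `[?,16]` and `ngram_embedding` `[?,1,16]`. Concatenation needs inputs of equal rank, so one possible fix is to drop the size-1 axis first (in TensorFlow that would be `tf.squeeze(ngram_embedding, axis=1)`). The sketch below only does the shape arithmetic on tuples, with `None` standing in for the unknown batch size; the helper names are hypothetical.

```python
def squeeze_shape(shape, axis):
    """Shape after removing a size-1 axis; raises if that axis is not 1."""
    axis = axis % len(shape)
    if shape[axis] != 1:
        raise ValueError("cannot squeeze axis %d of shape %s" % (axis, shape))
    return shape[:axis] + shape[axis + 1:]


def concat_shape(shapes, axis=-1):
    """Shape of a concatenation; raises if ranks or other dims differ."""
    ranks = {len(s) for s in shapes}
    if len(ranks) != 1:
        raise ValueError("rank mismatch in concat: %s" % (shapes,))
    rank = ranks.pop()
    axis = axis % rank
    out = list(shapes[0])
    for shape in shapes[1:]:
        for d in range(rank):
            if d != axis and shape[d] != out[d]:
                raise ValueError("dimension %d differs: %s" % (d, shapes))
        out[axis] += shape[axis]
    return tuple(out)


text_embedding = (None, 16)      # [?,16]
ngram_embedding = (None, 1, 16)  # [?,1,16]

try:
    concat_shape([text_embedding, ngram_embedding])
except ValueError as err:
    print("fails as in the report:", err)

fixed = squeeze_shape(ngram_embedding, axis=1)
print(concat_shape([text_embedding, fixed]))  # (None, 32)
```

Why `ngram_embedding` picks up the extra axis here is a separate question; it may depend on the TensorFlow version, since the same script evidently built its graph under 1.12 in the report above.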

Error during training


Thank you for making this code available. I am attempting to run it against a simple dataset of mine, but I am running into the following error. Any pointers would be appreciated:

Processing training dataset file
Processing test dataset file
INFO:tensorflow:Using config: {'_log_step_count_steps': 100, '_master': '', '_evaluation_master': '', '_model_dir': 'data/models/mydataset', '_save_checkpoints_secs': None, '_task_id': 0, '_keep_checkpoint_max': 5, '_task_type': None, '_tf_random_seed': None, '_num_worker_replicas': 0, '_save_checkpoints_steps': 1000, '_session_config': , '_save_summary_steps': 100, '_is_chief': True, '_cluster_spec': < object at 0x117c32e80>, '_num_ps_replicas': 0, '_tf_config': gpu_options {
per_process_gpu_memory_fraction: 1
, '_environment': 'local', '_keep_checkpoint_every_n_hours': 10000}
ParseSpec {'label': FixedLenFeature(shape=(1,), dtype=tf.int64, default_value=None), 'text': VarLenFeature(dtype=tf.string)}
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into data/models/mydataset/model.ckpt.
INFO:tensorflow:loss = 3.87008, step = 1
2017-08-23 14:27:17.797448: W tensorflow/core/framework/] Invalid argument: Received a label value of -1 which is outside the valid range of [0, 47). Label values: 19 41 22 19 31 1 39 8 22 4 43 12 27 39 19 43 22 44 21 19 4 42 19 21 27 9 41 6 41 44 1 14 5 6 37 6 41 1 6 16 42 39 4 0 25 14 4 30 6 31 9 19 41 41 41 6 23 1 19 19 9 17 26 41 43 19 41 23 22 14 14 9 6 41 1 1 -1 6 23 31 16 14 20 6 41 19 4 1 21 31 23 34 4 6 11 6 1 4 30 32 44 17 43 4 44 32 13 9 44 4 41 41 4 6 9 19 22 40 9 23 4 21 41 0 6 5 20 37
Traceback (most recent call last):
File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/", line 1327, in _do_call
return fn(*args)
File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/", line 1306, in _run_fn
status, run_metadata)
File "/anaconda3/lib/python3.5/", line 66, in exit
File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/", line 466, in raise_exception_on_not_ok_status
tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of -1 which is outside the valid range of [0, 47). Label values: 19 41 22 19 31 1 39 8 22 4 43 12 27 39 19 43 22 44 21 19 4 42 19 21 27 9 41 6 41 44 1 14 5 6 37 6 41 1 6 16 42 39 4 0 25 14 4 30 6 31 9 19 41 41 41 6 23 1 19 19 9 17 26 41 43 19 41 23 22 14 14 9 6 41 1 1 -1 6 23 31 16 14 20 6 41 19 4 1 21 31 23 34 4 6 11 6 1 4 30 32 44 17 43 4 44 32 13 9 44 4 41 41 4 6 9 19 22 40 9 23 4 21 41 0 6 5 20 37
[[Node: SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](SparseSoftmaxCrossEntropyWithLogits/Reshape, SparseSoftmaxCrossEntropyWithLogits/Reshape_1)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "", line 218, in
File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/platform/", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "", line 208, in main
File "", line 181, in FastTrain
estimator.train(input_fn=train_input, steps=FLAGS.train_steps, hooks=None)
File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/estimator/", line 241, in train
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/estimator/", line 686, in _train_model
_, loss =[estimator_spec.train_op, estimator_spec.loss])
File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/", line 518, in run
File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/", line 862, in run
File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/", line 818, in run
return*args, **kwargs)
File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/", line 972, in run
File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/", line 818, in run
return*args, **kwargs)
File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/", line 895, in run
File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/", line 1321, in _do_run
options, run_metadata)
File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of -1 which is outside the valid range of [0, 47). Label values: 19 41 22 19 31 1 39 8 22 4 43 12 27 39 19 43 22 44 21 19 4 42 19 21 27 9 41 6 41 44 1 14 5 6 37 6 41 1 6 16 42 39 4 0 25 14 4 30 6 31 9 19 41 41 41 6 23 1 19 19 9 17 26 41 43 19 41 23 22 14 14 9 6 41 1 1 -1 6 23 31 16 14 20 6 41 19 4 1 21 31 23 34 4 6 11 6 1 4 30 32 44 17 43 4 44 32 13 9 44 4 41 41 4 6 9 19 22 40 9 23 4 21 41 0 6 5 20 37
[[Node: SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](SparseSoftmaxCrossEntropyWithLogits/Reshape, SparseSoftmaxCrossEntropyWithLogits/Reshape_1)]]

Caused by op 'SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits', defined at:
File "", line 218, in
File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/platform/", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "", line 208, in main
File "", line 181, in FastTrain
estimator.train(input_fn=train_input, steps=FLAGS.train_steps, hooks=None)
File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/estimator/", line 241, in train
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/estimator/", line 630, in _train_model
File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/estimator/", line 615, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "", line 121, in model_fn
labels=labels, logits=logits))
File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/", line 1706, in sparse_softmax_cross_entropy_with_logits
precise_logits, labels, name=name)
File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/", line 2491, in _sparse_softmax_cross_entropy_with_logits
features=features, labels=labels, name=name)
File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/", line 767, in apply_op
File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/", line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File "~/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/", line 1204, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Received a label value of -1 which is outside the valid range of [0, 47). Label values: 19 41 22 19 31 1 39 8 22 4 43 12 27 39 19 43 22 44 21 19 4 42 19 21 27 9 41 6 41 44 1 14 5 6 37 6 41 1 6 16 42 39 4 0 25 14 4 30 6 31 9 19 41 41 41 6 23 1 19 19 9 17 26 41 43 19 41 23 22 14 14 9 6 41 1 1 -1 6 23 31 16 14 20 6 41 19 4 1 21 31 23 34 4 6 11 6 1 4 30 32 44 17 43 4 44 32 13 9 44 4 41 41 4 6 9 19 22 40 9 23 4 21 41 0 6 5 20 37
[[Node: SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](SparseSoftmaxCrossEntropyWithLogits/Reshape, SparseSoftmaxCrossEntropyWithLogits/Reshape_1)]]

The same mydataset.train / mydataset.test is processed fine by fasttext (C++ version). Thank you in advance!
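
A label id of -1 among otherwise valid ids in [0, 47) looks like a lookup that fell back to a default value, which would happen if one example carries a label missing from the label vocabulary (for instance a malformed line, or a label seen only after the vocabulary was built). That is an assumption, not something the log confirms. A minimal sketch for locating such examples, with hypothetical names and made-up sample data:

```python
def find_unmapped_labels(labels, vocab):
    """Return {label: [1-based example numbers]} for labels not in vocab.

    Mirrors a lookup table whose default value is -1: any label that
    would map to -1 is reported together with where it occurs.
    """
    label_to_id = {label: i for i, label in enumerate(vocab)}
    missing = {}
    for line_no, label in enumerate(labels, start=1):
        if label_to_id.get(label, -1) == -1:
            missing.setdefault(label, []).append(line_no)
    return missing


# Made-up data: the third example has a label the vocabulary lacks.
vocab = ["__label__en", "__label__fr"]
labels = ["__label__en", "__label__fr", "__label__de", "__label__en"]
print(find_unmapped_labels(labels, vocab))  # {'__label__de': [3]}
```

Running a check like this over the labels extracted from mydataset.train should show whether any line produces a label outside the vocabulary, and on which line.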

