atpaino / deep-text-corrector


Deep learning models trained to correct input errors in short, message-like text

License: Apache License 2.0

Jupyter Notebook 29.92% Python 70.08%

deep-text-corrector's Issues

I am getting a "movie_dialog_train.txt not found" error

When I run this command:
python correct_text.py --train_path /movie_dialog_train.txt --val_path /movie_dialog_val.txt \
--config DefaultMovieDialogConfig \
--data_reader_type MovieDialogReader \
--model_path /movie_dialog_model
this error shows up:
IOError: [Errno 2] No such file or directory: '/movie_dialog_train.txt'
Am I missing something here? I cannot find this text file in the Cornell corpus either. I'm trying to build a grammar checker for my project. Can anyone help me with this issue?

'zip' object is not subscriptable

When I tried running
python correct_text.py --train_path /movie_dialog_train.txt
--val_path /movie_dialog_val.txt
--config DefaultMovieDialogConfig
--data_reader_type MovieDialogReader
--model_path /movie_dialog_model

it gives me this error:

File "correct_text.py", line 438, in <module>
tf.app.run()
File "/home/abhinavsingh/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "correct_text.py", line 413, in main
data_reader = MovieDialogReader(config, FLAGS.train_path)
File "/home/abhinavsingh/deep-text-corrector-master/text_corrector_data_readers.py", line 82, in __init__
dataset_copies=dataset_copies)
File "/home/abhinavsingh/deep-text-corrector-master/data_reader.py", line 46, in __init__
self.token_to_id = dict(full_token_and_id[:max_vocabulary_size])
TypeError: 'zip' object is not subscriptable
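
In Python 3, zip returns a lazy iterator rather than a list, so slicing it with [:max_vocabulary_size] raises this TypeError. The iterator is also exhausted after one pass, so building a dict from it first would leave nothing for a later slice. The following is a minimal sketch of one possible fix, not necessarily how the repository author would patch data_reader.py. The vocabulary contents and size below are made up for illustration.

```python
from itertools import islice

# Hypothetical stand-ins for the values built in data_reader.py.
vocabulary = ["PAD", "GO", "EOS", "UNK", "the", "you", "a"]
max_vocabulary_size = 5

# Python 2: zip() returned a list, so slicing worked.
# Python 3: zip() returns a one-shot iterator; materialize it before reuse.
full_token_and_id = list(zip(vocabulary, range(len(vocabulary))))
full_token_to_id = dict(full_token_and_id)
token_to_id = dict(full_token_and_id[:max_vocabulary_size])

# Equivalent truncation without building the full list first.
token_to_id_lazy = dict(
    islice(zip(vocabulary, range(len(vocabulary))), max_vocabulary_size))
```

Wrapping the zip call in list() keeps the rest of the original logic unchanged, which is why it is shown first.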

'zip' object is not subscriptable

I am getting this error when I try to run data_reader.
"TypeError: 'zip' object is not subscriptable"

file not found error

When I try to run python preprocessors/preprocess_movie_dialogs.py --raw_data movie_lines.txt
it gives me a file-not-found error:
Traceback (most recent call last):
File "preprocessors/preprocess_movie_dialogs.py", line 23, in <module>
tf.app.run()
File "C:\Users\Sarve\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\platform\app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "C:\Users\Sarve\AppData\Roaming\Python\Python37\site-packages\absl\app.py", line 299, in run
_run_main(main, args)
File "C:\Users\Sarve\AppData\Roaming\Python\Python37\site-packages\absl\app.py", line 250, in _run_main
sys.exit(main(argv))
File "preprocessors/preprocess_movie_dialogs.py", line 14, in main
open(FLAGS.out_file, "w") as out:
FileNotFoundError: [Errno 2] No such file or directory: ''

Can you explain how to run this?
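
The traceback shows open(FLAGS.out_file, "w") failing on an empty path, which suggests that --out_file was not passed; the command quoted in a later issue on this page does pass --out_file preprocessed_movie_lines.txt. The following is a sketch of a guard that reports a missing flag explicitly. It uses argparse instead of the tf.app flags the script relies on, and the function name parse_preprocess_args is hypothetical.

```python
import argparse


def parse_preprocess_args(argv):
    """Parse --raw_data and --out_file, rejecting missing or empty paths."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--raw_data", default="")
    parser.add_argument("--out_file", default="")
    args = parser.parse_args(argv)
    for name in ("raw_data", "out_file"):
        if not getattr(args, name):
            # Fail early with a clear message instead of open('') failing later.
            raise ValueError("--%s must be a non-empty path" % name)
    return args


ok = parse_preprocess_args(
    ["--raw_data", "movie_lines.txt", "--out_file", "preprocessed_movie_lines.txt"])
```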

train_path required for decode?

In the example at the end of the README, decode is called with test_path but not train_path. (That makes sense to me.)

However, in correct_text.py main, FLAGS.train_path is still required even for the code path that runs when FLAGS.decode is true.

Should I change the README, or correct_text.py?
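
Tracebacks in other issues on this page show the reader's constructor iterating over self.read_tokens(train_path), so the training file appears to be what the vocabulary is built from. One possible way to drop that dependency at decode time is to save the vocabulary during training and reload it later. The sketch below assumes this approach; save_vocab, load_vocab and the file name vocab.json are hypothetical and not part of the repository.

```python
import json
import os
import tempfile


def save_vocab(token_to_id, path):
    """Write the token -> id mapping to a JSON file (call after training)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(token_to_id, f)


def load_vocab(path):
    """Read the token -> id mapping back (call before decoding)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Round trip with a made-up vocabulary.
vocab = {"PAD": 0, "GO": 1, "EOS": 2, "UNK": 3, "you": 4}
vocab_path = os.path.join(tempfile.mkdtemp(), "vocab.json")
save_vocab(vocab, vocab_path)
restored = load_vocab(vocab_path)
```

The restored mapping could then be passed as the token_to_id argument that the reader's constructor accepts. Whether the constructor skips reading train_path in that case is an assumption that has not been verified here.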

Cannot replicate

I trained the model as specified in the readme but cannot replicate the results. The following is what I get.

Input: you must have girlfriend
Output: than than than than than than than than than than

Is this because of the training/dataset?

KeyError: 'UNK'

def __init__(self, config, train_path=None, token_to_id=None,
             dropout_prob=0.25, replacement_prob=0.25, dataset_copies=2):
    super(MovieDialogReader, self).__init__(
        config, train_path=train_path, token_to_id=token_to_id,
        special_tokens=[
            PAD_TOKEN, GO_TOKEN, EOS_TOKEN,
            MovieDialogReader.UNKNOWN_TOKEN],
        dataset_copies=dataset_copies)

    self.dropout_prob = dropout_prob
    self.replacement_prob = replacement_prob
    self.UNKNOWN_ID = self.token_to_id[MovieDialogReader.UNKNOWN_TOKEN]

The last line raises the error.
I don't understand where UNKNOWN_ID comes from or what token_to_id actually is.
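
Judging from the constructor above, token_to_id is a dictionary that maps each vocabulary token to an integer id, and UNKNOWN_ID is the id stored under the unknown token. The lookup raises KeyError: 'UNK' whenever that token is not among the dictionary's keys. The following is a sketch of how such a mapping is commonly built, with the special tokens placed first so that truncation cannot drop them. The function build_token_to_id and the sample sentences are illustrative assumptions, not the repository's code.

```python
from collections import Counter


def build_token_to_id(sentences, special_tokens, max_vocabulary_size):
    """Map tokens to ids: special tokens first, then words by frequency."""
    counts = Counter(tok for sent in sentences for tok in sent)
    ordered = list(special_tokens) + [
        tok for tok, _ in counts.most_common() if tok not in special_tokens]
    return {tok: i for i, tok in enumerate(ordered[:max_vocabulary_size])}


special = ["PAD", "GO", "EOS", "UNK"]
sentences = [["you", "must", "have", "a", "girlfriend"], ["you", "must"]]
token_to_id = build_token_to_id(sentences, special, max_vocabulary_size=6)

# The id used for any token that did not make it into the vocabulary.
unknown_id = token_to_id["UNK"]
ids = [token_to_id.get(tok, unknown_id) for tok in ["you", "must", "dance"]]
```

Printing the keys of token_to_id just before the failing line would show whether 'UNK' is actually missing from the mapping.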

'Variable proj_w does not exist, or was not created with tf.get_variable(). ' on Google Colab

The code works in my local environment, but training is too slow, so I moved it to Google Colab. There I got 'Variable proj_w already exists, disallowed. ' while executing the 4th block of the code.

I searched and found that tf.get_variable is always used together with tf.variable_scope, so I thought it might work if I changed tf.get_variable to tf.Variable, but it didn't. The error became:

ValueError Traceback (most recent call last)
in ()
----> 1 train(data_reader, train_path, val_path, model_path)

/content/drive/My Drive/ColabNotebooks/grammarCorrection/correct_text.py in train(data_reader, train_path, test_path, model_path)
145 "Creating %d layers of %d units." % (
146 config.num_layers, config.size))
--> 147 model = create_model(sess, False, model_path, config=config)
148
149 # Read data into buckets and compute their sizes.

/content/drive/My Drive/ColabNotebooks/grammarCorrection/correct_text.py in create_model(session, forward_only, model_path, config)
122 use_lstm=config.use_lstm,
123 forward_only=forward_only,
--> 124 config=config)
125 ckpt = tf.train.get_checkpoint_state(model_path)
126 if ckpt and tf.gfile.Exists(ckpt.model_checkpoint_path):

/content/drive/My Drive/ColabNotebooks/grammarCorrection/text_corrector_models.py in __init__(self, source_vocab_size, target_vocab_size, buckets, size, num_layers, max_gradient_norm, batch_size, learning_rate, learning_rate_decay_factor, use_lstm, num_samples, forward_only, config, corrective_tokens_mask)
108 if self.target_vocab_size > num_samples > 0:
109 # w = tf.get_variable("proj_w", [size, self.target_vocab_size])
--> 110 w = tf.Variable([size, self.target_vocab_size], 'proj_w')
111 w_t = tf.transpose(w)
112 # b = tf.get_variable("proj_b", [self.target_vocab_size])

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variable_scope.py in get_variable(name, shape, dtype, initializer, regularizer, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter, constraint, synchronization, aggregation)
1485 constraint=constraint,
1486 synchronization=synchronization,
-> 1487 aggregation=aggregation)
1488
1489

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variable_scope.py in get_variable(self, var_store, name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter, constraint, synchronization, aggregation)
1235 constraint=constraint,
1236 synchronization=synchronization,
-> 1237 aggregation=aggregation)
1238
1239 def _get_partitioned_variable(self,

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variable_scope.py in get_variable(self, name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter, constraint, synchronization, aggregation)
538 constraint=constraint,
539 synchronization=synchronization,
--> 540 aggregation=aggregation)
541
542 def _get_partitioned_variable(self,

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variable_scope.py in _true_getter(name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, constraint, synchronization, aggregation)
490 constraint=constraint,
491 synchronization=synchronization,
--> 492 aggregation=aggregation)
493
494 # Set trainable value based on synchronization value.

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variable_scope.py in _get_single_variable(self, name, shape, dtype, initializer, regularizer, partition_info, reuse, trainable, collections, caching_device, validate_shape, use_resource, constraint, synchronization, aggregation)
877 raise ValueError("Variable %s does not exist, or was not created with "
878 "tf.get_variable(). Did you mean to set "
--> 879 "reuse=tf.AUTO_REUSE in VarScope?" % name)
880
881 # Create the tensor to initialize the variable with default value.

ValueError: Variable proj_w does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=tf.AUTO_REUSE in VarScope?

I'm still stuck on this error. Can anyone help?

I run decode, then get an error

(env-0.12.0) root@op-System-Product-Name:/home/github/deep-text-corrector# ./predict.sh
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Traceback (most recent call last):
File "correct_text.py", line 439, in <module>
tf.app.run()
File "/home/env/python3.5/env-0.12.0/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "correct_text.py", line 414, in main
data_reader = MovieDialogReader(config, FLAGS.train_path)
File "/home/github/deep-text-corrector/text_corrector_data_readers.py", line 82, in __init__
dataset_copies=dataset_copies)
File "/home/github/deep-text-corrector/data_reader.py", line 32, in __init__
for tokens in self.read_tokens(train_path):
File "/home/github/deep-text-corrector/text_corrector_data_readers.py", line 114, in read_tokens
with open(path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'train'

The script I ran:
python correct_text.py --test_path ./test.txt --config DefaultMovieDialogConfig --data_reader_type MovieDialogReader --model_path ./movie_dialog_model --decode

Why does this happen? FLAGS.train_path is None.

Result err

Hi atpaino,
I have run your project, but I cannot get the right result like the examples you give. My result looks like this:
input: you must have girlfriend
output: you must have

Could you help me analyze the reason for this?
Thanks a lot.

Why lowercase in preproc?

I noticed that the code lowercases text during preprocessing.

https://github.com/atpaino/deep-text-corrector/search?utf8=%E2%9C%93&q=lower%28%29&type=

Because of this:

The system can't use case as a clue.
The system can't correct case.

Did you try it without lowering at first, and there were problems?

(My instinct would be to avoid canonicalisation, and fight the out-of-dataset tokens with data.)
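
The trade-off can be illustrated in a few lines: lowercasing shrinks the vocabulary, but several distinct surface forms collapse into one token, so the original case cannot be recovered afterwards. The sample tokens are made up for illustration and are not taken from the training data.

```python
tokens = ["The", "the", "US", "us", "Apple", "apple"]

cased_vocab = set(tokens)
lowered_vocab = {t.lower() for t in tokens}

# Group the original forms by their lowercased version.
collisions = {}
for t in tokens:
    collisions.setdefault(t.lower(), set()).add(t)

# Lowercased tokens that stand for more than one original form.
ambiguous = {k: v for k, v in collisions.items() if len(v) > 1}
```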

Cannot execute your code due to missing attribute '_linear'

Hi Alex, thanks for your great work! I tried executing your main notebook ("TextCorrector.ipynb"), but I keep getting this error message: AttributeError: module 'tensorflow.python.ops.rnn_cell' has no attribute '_linear'. I ran your code in Jupyter Notebook with Python 3.5 and TensorFlow 1.2.1 (both the latest versions). I don't understand why it says the module lacks an attribute that your code needs. Could you please help explain why this happens, Alex?

txt files and model.

How do I create cleaned_dialog_val.txt, cleaned_dialog_test.txt, and this model: dialog_correcter_model_testnltk?

Plurals?

Would it be harder to make this work (and yield "This tool helps")? Great stuff!

Problem in Run

Can someone tell me how to run this project?

'zip' object is not subscriptable

I have the same problem as here

I changed line 46 to self.token_to_id = dict((k, self.full_token_to_id[k]) for k in list(self.full_token_to_id.keys())[:max_vocabulary_size])

But still got the error:

     44             full_token_and_id = zip(vocabulary, range(len(vocabulary)))
     45             self.full_token_to_id = dict(full_token_and_id)
---> 46             self.token_to_id = dict((k, self.full_token_to_id[k]) for k in list(self.full_token_to_id.keys())[:max_vocabulary_size])
     47 
     48         self.id_to_token = {v: k for k, v in self.token_to_id.items()}

TypeError: 'zip' object is not subscriptable

What version of tensorflow does this code work on?

I tried running this code with multiple tensorflow versions (1.13, 1.1, 0.12) but it keeps giving some error or the other, specifically related to rnn_cell. (cannot import name rnn_cell). Even if I resolve it using contrib package, then I keep getting subsequent errors.
Can someone please tell me which version of tensorflow does this code work with without any errors?
Also, does it work with a specific version of python as well?

Thanks
Aayushee

sampled_loss() got an unexpected keyword argument 'logits'

@atpaino This error occurred when running text_corrector_models.py

How many steps does it need to run for to get decent results ?

I have run it for 30K steps, but I am not getting a corrected output. I get the same output as what's fed into the input.

Input : this is table
Output : this is table

I am expecting it to insert the article and give me "this is a table".
How many more steps should I run it for?

I run decode, then get an error

(env-0.12.0) root@op-System-Product-Name:/home/github/deep-text-corrector# cat predict.sh
python correct_text.py --test_path ./test.txt --config DefaultMovieDialogConfig --data_reader_type MovieDialogReader --model_path ./movie_dialog_model --decode

Should I run "python correct_text.py --train_path ./movie_dialog_train.txt --test_path ./test.txt --config DefaultMovieDialogConfig --data_reader_type MovieDialogReader --model_path ./movie_dialog_model --decode" instead?

That is, should I add --train_path ./movie_dialog_train.txt?

Shape (10, ?, 1, 512) must have rank 0

I am getting this error:
In Seq2seq
828 top_states = [array_ops.reshape(e, [-1, 1, cell.output_size])
829 for e in encoder_outputs]
--> 830 attention_states = array_ops.concat(1, top_states)

Module not found

Can someone provide me with a compiled and executable version of the project? I cannot compile the file because it shows a module-not-found error for tensorflow, and I need the project urgently.

ModuleNotFoundError: No module named 'text_correcter_data_readers'

I tried to play with TextCorrector.ipynb but it doesn't work.

After the line
from text_correcter_data_readers import PTBDataReader, MovieDialogReader

I got the following error:
ModuleNotFoundError: No module named 'text_correcter_data_readers'

I tried to fix it by adding a path:

import sys
sys.path.append('C:\\my_path\\deep-text-corrector-master')

And by adding an empty __init__.py file to the deep-text-corrector-master directory.

But it didn't help either.
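
Tracebacks in other issues on this page reference a file named text_corrector_data_readers.py (spelled "corrector"), whereas the import above uses "correcter", so the spelling of the module name may be the problem rather than the path. The following sketch checks which spelling Python can actually locate; the helper first_importable is hypothetical.

```python
import importlib.util


def first_importable(candidates):
    """Return the first top-level module name that can be located, else None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


# Run this from the notebook after the sys.path.append(...) call.
found = first_importable(
    ["text_correcter_data_readers", "text_corrector_data_readers"])
```

If found is None, neither spelling is visible on sys.path and the appended directory itself is worth checking.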

Decoding is repeating the same word

Hello,
I have an issue

decoded = decode_sentence(sess, model, data_reader, "you must have girlfriend", corrective_tokens=corrective_tokens)
Input: you must have girlfriend
Output: you you you you you you you you you you

Does anyone have an idea, please?
Many thanks

KeyError: 'UNK'

When I run your project, this error occurs. How can I solve this problem?

Traceback (most recent call last):
File "correct_text.py", line 438, in <module>
tf.app.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "correct_text.py", line 413, in main
data_reader = MovieDialogReader(config, FLAGS.train_path)
File "/opt/yangzhanku/correct_text/deep-text-corrector-master/text_corrector_data_readers.py", line 88, in __init__
self.UNKNOWN_ID = self.token_to_id[MovieDialogReader.UNKNOWN_TOKEN]
KeyError: 'UNK'

'str' object has no attribute 'decode'

When I tried running
python preprocessors/preprocess_movie_dialogs.py --raw_data movie_lines.txt
--out_file preprocessed_movie_lines.txt

it gives me this error:
python preprocessors/preprocess_movie_dialogs.py --raw_data movie_lines.txt --out_file preprocessed_movie_lines.txt
/home/abhinavsingh/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
Traceback (most recent call last):
File "preprocessors/preprocess_movie_dialogs.py", line 24, in <module>
tf.app.run()
File "/home/abhinavsingh/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "preprocessors/preprocess_movie_dialogs.py", line 18, in main
s = dialog_line.strip().lower().decode("utf-8", "ignore")
AttributeError: 'str' object has no attribute 'decode'

This is expected, since each line is already a string, but if I remove the decode call it doesn't work.
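
In Python 3, str is already Unicode text and has no decode method; only bytes objects do. One option is to let the file object do the decoding while reading, dropping any bytes that are not valid UTF-8. The following is a sketch under that assumption. The helper clean_lines and the sample input are hypothetical, and the same effect should be available from open(path, encoding="utf-8", errors="ignore").

```python
import io


def clean_lines(binary_stream):
    """Yield stripped, lowercased lines, ignoring bytes that are not valid UTF-8."""
    text_stream = io.TextIOWrapper(
        binary_stream, encoding="utf-8", errors="ignore")
    for line in text_stream:
        yield line.strip().lower()


# In-memory stand-in for a raw corpus file; \xff is not valid UTF-8.
raw = io.BytesIO(b"You MUST have a girlfriend\n\xffHello There\n")
cleaned = list(clean_lines(raw))
```

With decoding handled at read time, the .decode("utf-8", "ignore") call on each line is no longer needed.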

how to train customize word :

Hi, I like your model, but I want to know how to train it on custom words, for example:
U.S..S.A -> U.S.A

module '_pywrap_tensorflow_internal' has no attribute 'TF_ListPhysicalDevices'

I get this error when I run the preprocessing Python script itself.
