
deep-text-corrector's People

Contributors

atpaino

deep-text-corrector's Issues

I am getting a "movie_dialog_train.txt not found" error

When I run this command:

    python correct_text.py --train_path /movie_dialog_train.txt --val_path /movie_dialog_val.txt \
        --config DefaultMovieDialogConfig \
        --data_reader_type MovieDialogReader \
        --model_path /movie_dialog_model

this error shows up:

    IOError: [Errno 2] No such file or directory: '/movie_dialog_train.txt'

Am I missing something here? I cannot find this text file in the Cornell corpus either. I'm trying to build a grammar checker for my project. Can anyone help me with this issue?
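The leading slash in `/movie_dialog_train.txt` makes it an absolute path at the filesystem root, and, as the issue notes, no file with that name ships in the Cornell corpus, so the train/val/test files have to be produced from a preprocessed corpus first. A minimal sketch, assuming you already have a preprocessed file with one sentence per line: `split_lines` and `write_splits` are hypothetical helpers, and the 80/10/10 split and fixed seed are arbitrary choices, not the project's.

```python
import random
from pathlib import Path


def split_lines(lines, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle non-empty lines and split them into (train, val, test) lists."""
    kept = [line for line in lines if line.strip()]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(kept)
    n_val = int(len(kept) * val_frac)
    n_test = int(len(kept) * test_frac)
    val = kept[:n_val]
    test = kept[n_val:n_val + n_test]
    train = kept[n_val + n_test:]
    return train, val, test


def write_splits(preprocessed_path, out_dir, prefix="movie_dialog"):
    """Hypothetical helper: write <prefix>_train/_val/_test.txt into out_dir."""
    text = Path(preprocessed_path).read_text(encoding="utf-8")
    train, val, test = split_lines(text.splitlines())
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, chunk in (("train", train), ("val", val), ("test", test)):
        target = out / f"{prefix}_{name}.txt"
        target.write_text("\n".join(chunk) + "\n", encoding="utf-8")
    return out / f"{prefix}_train.txt"
```

After `write_splits("preprocessed_movie_lines.txt", "data")`, the command would take relative paths such as `--train_path data/movie_dialog_train.txt --val_path data/movie_dialog_val.txt` instead of paths at the filesystem root.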

'zip' object is not subscriptable

When I tried running

    python correct_text.py --train_path /movie_dialog_train.txt \
        --val_path /movie_dialog_val.txt \
        --config DefaultMovieDialogConfig \
        --data_reader_type MovieDialogReader \
        --model_path /movie_dialog_model

it gave me this error:

      File "correct_text.py", line 438, in <module>
        tf.app.run()
      File "/home/abhinavsingh/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
        _sys.exit(main(argv))
      File "correct_text.py", line 413, in main
        data_reader = MovieDialogReader(config, FLAGS.train_path)
      File "/home/abhinavsingh/deep-text-corrector-master/text_corrector_data_readers.py", line 82, in __init__
        dataset_copies=dataset_copies)
      File "/home/abhinavsingh/deep-text-corrector-master/data_reader.py", line 46, in __init__
        self.token_to_id = dict(full_token_and_id[:max_vocabulary_size])
    TypeError: 'zip' object is not subscriptable
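In Python 3, `zip` returns a lazy iterator rather than a list, so slicing it raises exactly this TypeError; the code was likely written for Python 2, where `zip` returns a list. A minimal sketch of the change, reusing the variable names visible in the traceback; the function wrapper itself is illustrative, not the project's code. Materializing the iterator once with `list()` lets both the full and the truncated dictionaries be built from it.

```python
def build_token_maps(vocabulary, max_vocabulary_size):
    """Return (full_token_to_id, token_to_id) for a frequency-ordered vocabulary."""
    # Materialize the iterator once; a bare zip object cannot be sliced in Python 3.
    full_token_and_id = list(zip(vocabulary, range(len(vocabulary))))
    full_token_to_id = dict(full_token_and_id)
    # Keep only the first max_vocabulary_size entries (the slice that failed before).
    token_to_id = dict(full_token_and_id[:max_vocabulary_size])
    return full_token_to_id, token_to_id
```

In `data_reader.py`, the equivalent edit would be to wrap the `zip(...)` call on line 44 in `list(...)` and leave line 46 unchanged.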

file not found error

When I try to run

    python preprocessors/preprocess_movie_dialogs.py --raw_data movie_lines.txt

it gives me a "file not found" error:

    Traceback (most recent call last):
      File "preprocessors/preprocess_movie_dialogs.py", line 23, in <module>
        tf.app.run()
      File "C:\Users\Sarve\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\platform\app.py", line 40, in run
        _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
      File "C:\Users\Sarve\AppData\Roaming\Python\Python37\site-packages\absl\app.py", line 299, in run
        _run_main(main, args)
      File "C:\Users\Sarve\AppData\Roaming\Python\Python37\site-packages\absl\app.py", line 250, in _run_main
        sys.exit(main(argv))
      File "preprocessors/preprocess_movie_dialogs.py", line 14, in main
        open(FLAGS.out_file, "w") as out:
    FileNotFoundError: [Errno 2] No such file or directory: ''

Can you explain how to run this?
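The empty string in the error appears to be the value of `FLAGS.out_file`: the command passes `--raw_data` but not `--out_file`, so the script tries to open `''` for writing. Passing both flags, for example `--out_file preprocessed_movie_lines.txt`, should avoid it. As a sketch of catching this earlier, here is a stand-alone `argparse` front end that marks both flags as required; it is a hypothetical replacement, not the script's actual `tf.app.flags` definitions.

```python
import argparse


def parse_args(argv=None):
    """Parse preprocessing flags, failing fast if either path is missing."""
    parser = argparse.ArgumentParser(
        description="Hypothetical CLI for preprocessing Cornell movie_lines.txt")
    parser.add_argument("--raw_data", required=True,
                        help="path to the raw movie_lines.txt file")
    parser.add_argument("--out_file", required=True,
                        help="path to write the preprocessed lines to")
    args = parser.parse_args(argv)
    if not args.out_file.strip():
        # --out_file "" would still reproduce the original error, so reject it too.
        parser.error("--out_file must not be empty")
    return args
```

With this front end, omitting `--out_file` stops immediately with a usage message naming the missing flag instead of a FileNotFoundError deep inside `main`.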

train_path required for decode?

In the example at the end of the README, decode is called with test_path but not train_path. (That makes sense to me.)

However, in correct_text.py main, FLAGS.train_path is still required even for the code path that runs when FLAGS.decode is true.

Should I change the README, or correct_text.py?
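One likely reason the decode path still uses `train_path`: judging from the tracebacks in other issues on this page, `main` builds `MovieDialogReader(config, FLAGS.train_path)`, which reads the training file to build its vocabulary, and decoding needs the same token-to-id mapping the model was trained with. The reader's constructor also accepts a `token_to_id` argument, so another option could be to save the mapping after training and load it for decoding (whether passing it lets the reader skip the training file is not verified here). A minimal sketch of that idea; `save_vocabulary`, `load_vocabulary`, and the JSON format are hypothetical.

```python
import json
from pathlib import Path


def save_vocabulary(token_to_id, path):
    """Hypothetical helper: persist the token -> id mapping as JSON after training."""
    Path(path).write_text(json.dumps(token_to_id, ensure_ascii=False), encoding="utf-8")


def load_vocabulary(path):
    """Load the mapping back for decoding."""
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    # JSON keeps integer values; int() only guards against hand-edited files.
    return {token: int(idx) for token, idx in raw.items()}
```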

Cannot replicate

I trained the model as described in the README but cannot replicate the results. This is what I get:

Input: you must have girlfriend
Output: than than than than than than than than than than

Is this because of the training/dataset?

KeyError: 'UNK'

    def __init__(self, config, train_path=None, token_to_id=None,
                 dropout_prob=0.25, replacement_prob=0.25, dataset_copies=2):
        super(MovieDialogReader, self).__init__(
            config, train_path=train_path, token_to_id=token_to_id,
            special_tokens=[
                PAD_TOKEN, GO_TOKEN, EOS_TOKEN,
                MovieDialogReader.UNKNOWN_TOKEN],
            dataset_copies=dataset_copies)

        self.dropout_prob = dropout_prob
        self.replacement_prob = replacement_prob
        self.UNKNOWN_ID = self.token_to_id[MovieDialogReader.UNKNOWN_TOKEN]

The last line raises the error. I don't understand where UNKNOWN_ID comes from or what token_to_id actually is.
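Judging from the snippet, `token_to_id` is a dictionary from each vocabulary token to its integer id, built in the base-class constructor, and `UNKNOWN_ID` is simply the id stored for `MovieDialogReader.UNKNOWN_TOKEN`, which the KeyError shows is the string 'UNK'. The lookup fails whenever that special token never made it into the dictionary, for example if the vocabulary is truncated to `max_vocabulary_size` without reserving room for special tokens. A minimal sketch of one way to build such a map so the special tokens always survive truncation; `build_vocab`, whitespace tokenization, and frequency ordering are assumptions, not the project's code.

```python
from collections import Counter


def build_vocab(sentences, special_tokens, max_vocabulary_size):
    """Map tokens to ids, placing special tokens first so truncation never drops them."""
    counts = Counter(token for sentence in sentences for token in sentence.split())
    vocabulary = list(special_tokens)
    # most_common() orders by frequency; ties keep first-seen order.
    vocabulary += [tok for tok, _ in counts.most_common() if tok not in special_tokens]
    # Never cut below the number of special tokens.
    vocabulary = vocabulary[:max(max_vocabulary_size, len(special_tokens))]
    return {token: idx for idx, token in enumerate(vocabulary)}
```

With such a map, out-of-vocabulary words can be encoded as `token_to_id.get(word, token_to_id["UNK"])`, which is the usual reason a reader keeps an `UNKNOWN_ID` around.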

'Variable proj_w does not exist, or was not created with tf.get_variable(). ' on Google Colab

The code works in my local environment, but training is too slow, so I moved it to Google Colab. There I got 'Variable proj_w already exists, disallowed.' while the 4th code block was executing.
I searched and found that tf.get_variable is always used together with tf.variable_scope, so I thought it might work if I changed tf.get_variable to tf.Variable, but it didn't. The error became:

ValueError Traceback (most recent call last)
in ()
----> 1 train(data_reader, train_path, val_path, model_path)

/content/drive/My Drive/ColabNotebooks/grammarCorrection/correct_text.py in train(data_reader, train_path, test_path, model_path)
145 "Creating %d layers of %d units." % (
146 config.num_layers, config.size))
--> 147 model = create_model(sess, False, model_path, config=config)
148
149 # Read data into buckets and compute their sizes.

/content/drive/My Drive/ColabNotebooks/grammarCorrection/correct_text.py in create_model(session, forward_only, model_path, config)
122 use_lstm=config.use_lstm,
123 forward_only=forward_only,
--> 124 config=config)
125 ckpt = tf.train.get_checkpoint_state(model_path)
126 if ckpt and tf.gfile.Exists(ckpt.model_checkpoint_path):

/content/drive/My Drive/ColabNotebooks/grammarCorrection/text_corrector_models.py in __init__(self, source_vocab_size, target_vocab_size, buckets, size, num_layers, max_gradient_norm, batch_size, learning_rate, learning_rate_decay_factor, use_lstm, num_samples, forward_only, config, corrective_tokens_mask)
108 if self.target_vocab_size > num_samples > 0:
109 # w = tf.get_variable("proj_w", [size, self.target_vocab_size])
--> 110 w = tf.Variable([size, self.target_vocab_size], 'proj_w')
111 w_t = tf.transpose(w)
112 # b = tf.get_variable("proj_b", [self.target_vocab_size])

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variable_scope.py in get_variable(name, shape, dtype, initializer, regularizer, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter, constraint, synchronization, aggregation)
1485 constraint=constraint,
1486 synchronization=synchronization,
-> 1487 aggregation=aggregation)
1488
1489

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variable_scope.py in get_variable(self, var_store, name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter, constraint, synchronization, aggregation)
1235 constraint=constraint,
1236 synchronization=synchronization,
-> 1237 aggregation=aggregation)
1238
1239 def _get_partitioned_variable(self,

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variable_scope.py in get_variable(self, name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter, constraint, synchronization, aggregation)
538 constraint=constraint,
539 synchronization=synchronization,
--> 540 aggregation=aggregation)
541
542 def _get_partitioned_variable(self,

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variable_scope.py in _true_getter(name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, constraint, synchronization, aggregation)
490 constraint=constraint,
491 synchronization=synchronization,
--> 492 aggregation=aggregation)
493
494 # Set trainable value based on synchronization value.

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variable_scope.py in _get_single_variable(self, name, shape, dtype, initializer, regularizer, partition_info, reuse, trainable, collections, caching_device, validate_shape, use_resource, constraint, synchronization, aggregation)
877 raise ValueError("Variable %s does not exist, or was not created with "
878 "tf.get_variable(). Did you mean to set "
--> 879 "reuse=tf.AUTO_REUSE in VarScope?" % name)
880
881 # Create the tensor to initialize the variable with default value.

ValueError: Variable proj_w does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=tf.AUTO_REUSE in VarScope?

I'm still stuck on this error. Can anyone help?

I ran decode, then got an error

    (env-0.12.0) root@op-System-Product-Name:/home/github/deep-text-corrector# ./predict.sh
    I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
    I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
    I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
    I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
    I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
    Traceback (most recent call last):
      File "correct_text.py", line 439, in <module>
        tf.app.run()
      File "/home/env/python3.5/env-0.12.0/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 43, in run
        sys.exit(main(sys.argv[:1] + flags_passthrough))
      File "correct_text.py", line 414, in main
        data_reader = MovieDialogReader(config, FLAGS.train_path)
      File "/home/github/deep-text-corrector/text_corrector_data_readers.py", line 82, in __init__
        dataset_copies=dataset_copies)
      File "/home/github/deep-text-corrector/data_reader.py", line 32, in __init__
        for tokens in self.read_tokens(train_path):
      File "/home/github/deep-text-corrector/text_corrector_data_readers.py", line 114, in read_tokens
        with open(path, "r") as f:
    FileNotFoundError: [Errno 2] No such file or directory: 'train'

The script I ran:

    python correct_text.py --test_path ./test.txt --config DefaultMovieDialogConfig --data_reader_type MovieDialogReader --model_path ./movie_dialog_model --decode

Why does this happen, given that FLAGS.train_path is None?
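The traceback shows `MovieDialogReader.__init__` reading the training file even though `--decode` is set, so a missing or unset training path only surfaces deep inside the data reader. A minimal pre-flight sketch that checks path flags before any model code runs; `require_existing_file` is a hypothetical helper, and whether decoding should need `--train_path` at all is the open question from the "train_path required for decode?" issue.

```python
import os


def require_existing_file(flag_name, path):
    """Hypothetical pre-flight check: fail with a clear message before building the model."""
    if not path:
        raise ValueError(f"--{flag_name} is required but was not set")
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"--{flag_name} points to {path!r}, which is not an existing file")
    return path
```

Calling, for example, `require_existing_file("train_path", FLAGS.train_path)` at the top of `main` would report the missing flag by name instead of failing inside `read_tokens`.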

Result error

Hi atpaino,
I have run your project, but I cannot get results like the examples you give. My result looks like this:
Input: you must have girlfriend
Output: you must have

Could you help me analyze the reason for this?
Thanks a lot.

Cannot execute your code due to missing attribute '_linear'

Hi Alex, thanks for your great work! I tried executing your main file ("TextCorrector.ipynb"), but I keep getting this error message: AttributeError: module 'tensorflow.python.ops.rnn_cell' has no attribute '_linear'. I ran your code in Jupyter Notebook with Python 3.5 (the latest version) and TensorFlow 1.2.1 (also the latest). I don't understand why it keeps saying that a module lacks an attribute the code needs. Could you please explain why this happens, Alex?

txt files and model.

How do I create cleaned_dialog_val.txt, cleaned_dialog_test.txt, and this model: dialog_correcter_model_testnltk?

Plurals?


Would it be harder to make this work (and yield "This tool helps")? Great stuff!

'zip' object is not subscriptable

I have the same problem as here

I changed line 46 to

    self.token_to_id = dict((k, self.full_token_to_id[k]) for k in list(self.full_token_to_id.keys())[:max_vocabulary_size])

but I still got the error:

     44             full_token_and_id = zip(vocabulary, range(len(vocabulary)))
     45             self.full_token_to_id = dict(full_token_and_id)
---> 46             self.token_to_id = dict((k, self.full_token_to_id[k]) for k in list(self.full_token_to_id.keys())[:max_vocabulary_size])
     47 
     48         self.id_to_token = {v: k for k, v in self.token_to_id.items()}

TypeError: 'zip' object is not subscriptable
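The traceback shows the edited line 46, yet it reports the old `zip` error, and nothing on the new line slices a `zip` object. That pattern is typical of a long-lived interpreter, such as a Jupyter kernel, that imported `data_reader` before the edit: Python keeps executing the module it already loaded but reads source lines from disk when printing a traceback. A minimal sketch, assuming that is the situation here, of forcing a fresh import with the standard `importlib.reload`; restarting the kernel has the same effect.

```python
import importlib
import sys


def reload_modules(module_names):
    """Re-import modules in the given order, importing any that are not loaded yet.

    Order matters: reload a base module (e.g. data_reader) before modules that
    subclass its classes (e.g. text_corrector_data_readers); otherwise the
    subclasses keep pointing at the stale base class.
    """
    reloaded = []
    for name in module_names:
        module = sys.modules.get(name)
        if module is None:
            module = importlib.import_module(name)
        else:
            module = importlib.reload(module)
        reloaded.append(module)
    return reloaded
```

In the notebook, `reload_modules(["data_reader", "text_corrector_data_readers"])` followed by re-importing `MovieDialogReader` from the reloaded module would pick up the edited line 46.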

What version of tensorflow does this code work on?

Hi

I tried running this code with multiple TensorFlow versions (1.13, 1.1, 0.12), but it keeps giving one error or another, specifically related to rnn_cell ("cannot import name rnn_cell"). Even if I resolve that using the contrib package, I keep getting further errors.
Can someone please tell me which version of TensorFlow this code works with without any errors?
Also, does it require a specific version of Python?

Thanks
Aayushee

How many steps does it need to run for to get decent results ?

I have run it for 30K steps, but I am not getting corrected output; the output is the same as the input.

Input : this is table
Output : this is table

I expect it to insert the article and give me "this is a table".
How many more steps should I run it for ?

I ran decode, then got an error

    (env-0.12.0) root@op-System-Product-Name:/home/github/deep-text-corrector# cat predict.sh
    python correct_text.py --test_path ./test.txt --config DefaultMovieDialogConfig --data_reader_type MovieDialogReader --model_path ./movie_dialog_model --decode

Should I run this instead?

    python correct_text.py --train_path ./movie_dialog_train.txt --test_path ./test.txt --config DefaultMovieDialogConfig --data_reader_type MovieDialogReader --model_path ./movie_dialog_model --decode

That is, add --train_path ./movie_dialog_train.txt?

Shape (10, ?, 1, 512) must have rank 0

I am getting this error in seq2seq:

    828     top_states = [array_ops.reshape(e, [-1, 1, cell.output_size])
    829                   for e in encoder_outputs]
    --> 830 attention_states = array_ops.concat(1, top_states)
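The message is consistent with the argument-order change to `concat` in TensorFlow 1.0: older releases took `concat(concat_dim, values)`, while 1.0 and later take `concat(values, axis)`. Called with the old order, the list of ten `[?, 1, 512]` tensors lands in the axis slot, gets packed into a `(10, ?, 1, 512)` tensor, and fails the rank-0 check an axis must pass. Under TensorFlow 1.0+ the line would read `array_ops.concat(top_states, 1)`. To show what the line computes without requiring TensorFlow, here is a NumPy analogue (an illustration, not the seq2seq source); `np.concatenate` likewise takes the arrays first and the axis second.

```python
import numpy as np


def build_attention_states(encoder_outputs, output_size):
    """Stack per-step encoder outputs into a [batch, steps, output_size] array."""
    # Each step's output is [batch, output_size]; add a length-1 time axis.
    top_states = [np.reshape(e, (-1, 1, output_size)) for e in encoder_outputs]
    # Values first, axis second: the same order tf.concat uses from TF 1.0 on.
    return np.concatenate(top_states, axis=1)
```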

Module not found

Can someone provide a compiled, executable version of the project? I cannot compile the file because it shows a "module not found" error for tensorflow, and I need the project urgently.

ModuleNotFoundError: No module named 'text_correcter_data_readers'

I tried to play with TextCorrector.ipynb but it doesn't work.

After the line

    from text_correcter_data_readers import PTBDataReader, MovieDialogReader

I got this error:

    ModuleNotFoundError: No module named 'text_correcter_data_readers'

I tried to fix it by adding a path:

    import sys
    sys.path.append('C:\\my_path\\deep-text-corrector-master')

and by adding an empty __init__.py file to the deep-text-corrector-master directory.

But it didn't help either.
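The tracebacks elsewhere on this page load the readers from `text_corrector_data_readers.py` (spelled "corrector"), while the notebook imports `text_correcter_data_readers` ("correcter"), so adding the directory to `sys.path` cannot help if no file with the second spelling exists; an `__init__.py` is also not needed to import a top-level module from a directory on `sys.path`. Changing the import to `from text_corrector_data_readers import PTBDataReader, MovieDialogReader` looks like the direct fix. For a notebook that should tolerate either spelling, a small sketch; `import_first_available` is a hypothetical helper built on `importlib.import_module`.

```python
import importlib


def import_first_available(*module_names):
    """Import and return the first module in module_names that exists."""
    missing = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError as exc:
            if exc.name != name:
                # Something other than the requested module is missing
                # (e.g. one of its own imports such as tensorflow); don't hide it.
                raise
            missing.append(name)
    raise ModuleNotFoundError(f"none of these modules could be imported: {missing}")
```

Usage: `readers = import_first_available("text_corrector_data_readers", "text_correcter_data_readers")`, then `MovieDialogReader = readers.MovieDialogReader`.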

Decoding is repeating the same word

Hello,
I have an issue:

    decoded = decode_sentence(sess, model, data_reader, "you must have girlfriend", corrective_tokens=corrective_tokens)

Input: you must have girlfriend
Output: you you you you you you you you you you

Does anyone have an idea?
Many thanks

KeyError: 'UNK'

When I run your project, this error occurs. How can I solve this problem?

    Traceback (most recent call last):
      File "correct_text.py", line 438, in <module>
        tf.app.run()
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 43, in run
        sys.exit(main(sys.argv[:1] + flags_passthrough))
      File "correct_text.py", line 413, in main
        data_reader = MovieDialogReader(config, FLAGS.train_path)
      File "/opt/yangzhanku/correct_text/deep-text-corrector-master/text_corrector_data_readers.py", line 88, in __init__
        self.UNKNOWN_ID = self.token_to_id[MovieDialogReader.UNKNOWN_TOKEN]
    KeyError: 'UNK'

'str' object has no attribute 'decode'

When I tried running

    python preprocessors/preprocess_movie_dialogs.py --raw_data movie_lines.txt \
        --out_file preprocessed_movie_lines.txt

it gave me this error:

    python preprocessors/preprocess_movie_dialogs.py --raw_data movie_lines.txt --out_file preprocessed_movie_lines.txt
    /home/abhinavsingh/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
      from ._conv import register_converters as _register_converters
    Traceback (most recent call last):
      File "preprocessors/preprocess_movie_dialogs.py", line 24, in <module>
        tf.app.run()
      File "/home/abhinavsingh/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
        _sys.exit(main(argv))
      File "preprocessors/preprocess_movie_dialogs.py", line 18, in main
        s = dialog_line.strip().lower().decode("utf-8", "ignore")
    AttributeError: 'str' object has no attribute 'decode'

This makes sense, since each line is already a string, but if I remove decode it doesn't work either.
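In Python 3, lines read from a file opened in text mode are already `str`, which has no `.decode` method; `.decode("utf-8", "ignore")` on each line is a Python 2 idiom. One Python 3 equivalent is to let `open(path, encoding="utf-8", errors="ignore")` do the decoding; another, as in this minimal sketch, is to decode only when the line is still `bytes`. This covers only the failing expression, not a port of the whole script, and `normalize_line` is a hypothetical name.

```python
def normalize_line(dialog_line):
    """Python 3 stand-in for dialog_line.strip().lower().decode("utf-8", "ignore")."""
    if isinstance(dialog_line, bytes):
        # Drop undecodable bytes, matching the "ignore" error handler above.
        dialog_line = dialog_line.decode("utf-8", "ignore")
    # Lowercasing after decoding also lowercases non-ASCII letters,
    # a small difference from lowercasing the raw Python 2 byte string.
    return dialog_line.strip().lower()
```

Inside the script's loop, `s = normalize_line(dialog_line)` would replace line 18 whether the file was opened in text or binary mode.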
