Comments (11)
It can't right now, because I don't have a dataset for it. But there are a lot of projects that use the MNIST dataset for identifying handwritten digits.
from handwriting-ocr.
Thank you for your reply. I'm building a handwriting recognition system for test papers. I previously trained a CNN on the data, but I ran into difficulties with text segmentation. Does this project require text segmentation?
It depends on the recognition method. In one approach, I segment the text with a bidirectional RNN and then classify the individual characters with a CNN. I also tested classification using CTC, which processes images of whole words (and could be adapted to process whole lines of words).
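To illustrate the CTC idea mentioned above: a CTC model outputs one score vector per time step across the word image, and the text is recovered by collapsing repeated classes and dropping a special "blank" class. The following is only a minimal sketch of greedy CTC decoding, not the project's code. The function name `ctc_greedy_decode` and the convention that class 0 is the blank are assumptions made for this example; the notebooks may use a different blank index.

```python
def ctc_greedy_decode(scores, alphabet):
    """Greedy CTC decoding.

    scores:   one list of class scores per time step
    alphabet: characters for classes 1..N (class 0 is assumed to be the blank)
    """
    BLANK = 0  # assumption for this sketch
    decoded = []
    previous = BLANK
    for step in scores:
        # Pick the highest-scoring class at this time step
        best = max(range(len(step)), key=step.__getitem__)
        # Emit a character only when the class changes and is not the blank
        if best != previous and best != BLANK:
            decoded.append(alphabet[best - 1])
        previous = best
    return "".join(decoded)


# Build one-hot scores for the class path 1,1,0,1,2,2,0
path = [1, 1, 0, 1, 2, 2, 0]
scores = [[1.0 if c == k else 0.0 for c in range(4)] for k in path]
print(ctc_greedy_decode(scores, "abc"))  # prints "aab"
```

The blank between the two runs of class 1 is what allows a doubled letter to survive the collapsing step.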
What are the training-data requirements for the Bi-RNN and the CNN? Do I need a single-letter dataset?
Yes, for the CNN you need a single-letter dataset. For the Bi-RNN you need a dataset containing images of whole words along with text files containing the positions of the lines separating individual letters. If you already have the words, you can use WordClassDM.py to create the letter-separating lines manually.
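To make the relationship between the two datasets concrete: once a word image has its separating-line positions, the single-letter images can be cut out of it. Below is a rough sketch with NumPy, assuming the image is a 2-D `(height, width)` array and the positions are x-coordinates in pixels. The function name `cut_letters` and this data layout are assumptions for the example, not the format that WordClassDM.py actually writes.

```python
import numpy as np


def cut_letters(word_img, gap_positions):
    """Split a (height, width) word image at the given x-coordinates."""
    width = word_img.shape[1]
    # Add the image borders so the first and last letters are included;
    # the set() avoids duplicates if the borders are already listed.
    bounds = sorted(set([0, *gap_positions, width]))
    return [word_img[:, left:right] for left, right in zip(bounds, bounds[1:])]


# A dummy 2x10 "image" cut at x=3 and x=7 gives widths 3, 4 and 3
img = np.arange(20).reshape(2, 10)
letters = cut_letters(img, [3, 7])
print([letter.shape[1] for letter in letters])  # prints [3, 4, 3]
```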
Regarding the project I mentioned earlier: if I use OCR.ipynb for recognition, do I need two models, the one trained by CharClassifier.ipynb to identify characters and the one trained by GapClassifier-BiRNN.ipynb to segment them?
Are both of these trained on data read from data/words2/? I now have some handwritten letter training sets and word training sets. Can I process them with WordClassDM.py to build training sets?
Can you explain what GapClassifier-BiRNN-Attention.ipynb, GapClassifier-Attention-RNN.ipynb, GapClassification.ipynb, and GapClassification-CharClass.ipynb do?
I'm sorry to take up your time, but this is my first recognition project and there is a lot I don't understand. @Breta01
Is the Gap-Classifier used in OCR.ipynb the Classifier-BiRNN.ipynb model?
OK, yes, you need two models, and OCR.ipynb uses the Classifier-BiRNN.ipynb model.
I train both models from data/words2/ because it contains images of words along with files that contain the positions of the gaplines. For CharClassifier I just cut out those separated letters... If you have letter and word training sets, you can process the word set with WordClassDM.py (this needs manual work) and then train the two models. (If you have individual letters, you could create artificial words for training with already known gapline positions, but I don't have code for that.)
GapClassifier-BiRNN-Attention.ipynb, GapClassifier-Attention-RNN.ipynb, GapClassification-CharClass.ipynb, and GapClassifier.ipynb are only experimental models that don't perform very well. Just skip those files.
GapClassification.ipynb demonstrates the process of separating characters, but the final code for separation is in ocr/charSeg.py.
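The artificial-words idea mentioned above (joining individual letters into a word whose gapline positions are known by construction) could look roughly like this. It is only a sketch and not part of the project: the function name `make_word`, the fixed blank spacing between letters, and the choice to place each gapline in the middle of that spacing are all assumptions. Real handwriting has touching and overlapping letters, so such data would be easier than real words.

```python
import numpy as np


def make_word(letter_imgs, spacing=2, background=0):
    """Join equal-height letter images into one word image.

    Returns the word image and the x-positions of the gaplines.
    """
    height = letter_imgs[0].shape[0]
    gap = np.full((height, spacing), background, dtype=letter_imgs[0].dtype)
    pieces, gap_positions, x = [], [], 0
    for i, img in enumerate(letter_imgs):
        if i > 0:
            # Put the gapline in the middle of the blank spacing
            gap_positions.append(x + spacing // 2)
            pieces.append(gap)
            x += spacing
        pieces.append(img)
        x += img.shape[1]
    return np.hstack(pieces), gap_positions


# Two dummy letters (5x3 and 5x4) separated by 2 background columns
a = np.ones((5, 3), dtype=np.uint8)
b = np.ones((5, 4), dtype=np.uint8)
word, gaps = make_word([a, b])
print(word.shape, gaps)  # prints (5, 9) [4]
```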
I ran into this problem when running WordClassDM.py:
Traceback (most recent call last):
  File "E:/Jupar/handwriting-ocr-master/WordClassDM.py", line 218, in <module>
    Cycler(args.index, args.data, args.save)
  File "E:/Jupar/handwriting-ocr-master/WordClassDM.py", line 76, in __init__
    self.blockLoad()
  File "E:/Jupar/handwriting-ocr-master/WordClassDM.py", line 118, in blockLoad
    self.data_loc, self.org_idx + self.idx, 100)
  File "E:/Jupar/handwriting-ocr-master/WordClassDM.py", line 50, in loadImages
    printProgressBar(i - idx, upper - idx - 1)
  File "E:\Jupar\handwriting-ocr-master\ocr\viz.py", line 20, in printProgressBar
    percent = ("{0:." + str(decimals) + "f}").format(100 * (iteration / float(total)))
ZeroDivisionError: float division by zero
Process finished with exit code 1
I made no changes to the words_raw data.
The printProgressBar() call is just for visualising the loading progress, so you can remove it.
It looks like the -1 shouldn't be there.
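For context on the error above: the traceback shows the percentage being computed as `iteration / float(total)`, and the call site passes `upper - idx - 1` as the total, so a block containing exactly one image gives a total of 0. Besides removing the call or the `-1`, another option is to guard the division. The helper below is a hypothetical stand-in written for this example (the name `progress_percent` does not come from ocr/viz.py); it reuses the formatting expression from the traceback and treats an empty job as finished.

```python
def progress_percent(iteration, total, decimals=1):
    """Format completion as a percentage string without dividing by zero."""
    if total <= 0:
        # Nothing to wait for, so report the job as complete
        fraction = 1.0
    else:
        fraction = iteration / float(total)
    return ("{0:." + str(decimals) + "f}").format(100 * fraction)


print(progress_percent(1, 4))  # prints 25.0
print(progress_percent(0, 0))  # prints 100.0
```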
Thanks for your help, your program is very good