Hi,
The dicts are composed like so:
Each one is a dictionary; for each key it contains two pieces of information:
- The full text (without the position of each fragment), where every '\n' or '\t' has been replaced by a space
- A numpy array with one element per character of the text: 0 if that character does not belong to a target category, otherwise the number of the category it belongs to.
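As a minimal sketch of that structure (the key name, the receipt text, and the category number 4 are all hypothetical), one entry of such a dict could be built like this:

```python
import numpy as np

raw_text = "TAN WOON YANN\nTOTAL\t9.00"          # hypothetical receipt text
clean_text = raw_text.replace("\n", " ").replace("\t", " ")

# One label per character: 0 = no category; here the characters of
# "9.00" are marked with a hypothetical category number 4 (total).
labels = np.zeros(len(clean_text), dtype=np.int64)
start = clean_text.index("9.00")
labels[start:start + len("9.00")] = 4

entry = {"X00016469612": (clean_text, labels)}   # key name is hypothetical
```

A character-level model can then learn to predict the label array from the text alone.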
If you want to create your own training file:
- Put your images and texts in the same folder
- In the main of task3/src/my_data.py, call create_data with the path of the folder, or just uncomment lines 222 and 223
The create_data function is located in task3/src/my_data.py at line 139.
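The first step above only requires that each image sits next to its text file; a minimal sketch of that pairing, assuming matching file stems and hypothetical .jpg/.txt extensions:

```python
from pathlib import Path

def pair_images_and_texts(folder):
    """Pair each image with the text file of the same stem in one folder."""
    folder = Path(folder)
    pairs = {}
    for img in sorted(folder.glob("*.jpg")):
        txt = img.with_suffix(".txt")
        if txt.exists():           # skip images without a matching text file
            pairs[img.stem] = (img, txt)
    return pairs
```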
from icdar-2019-sroie.
Could you please help me by providing an example of what a "texts" file would be, along with an image?
Is it the output of hOCR, or just text saved in a Word document?
The text can either be extracted from the already existing text file (provided by the SROIE challenge, which contains both the text and its position). Since task 3 does not need the position information, the function sort_text (file my_data.py, line 109) takes the text file as input and returns only the text, with \n separating the lines.
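A sketch of what such a position-stripping step can look like, assuming the SROIE annotation format of eight comma-separated box coordinates followed by the transcript (this is an illustration, not the repository's sort_text):

```python
def strip_positions(lines):
    """Keep only the transcript from annotation lines shaped like
    'x1,y1,x2,y2,x3,y3,x4,y4,text', joined by newlines."""
    texts = []
    for line in lines:
        # Split at most 8 times: the transcript may itself contain commas.
        parts = line.strip().split(",", 8)
        if len(parts) == 9:
            texts.append(parts[8])
    return "\n".join(texts)
```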
If you don't want to use the provided text files and would rather use your own images, you can use an OCR engine such as Amazon Textract, Tesseract, or many others to extract the text from the image (be sure to convert the extracted text to uppercase) and provide this text (basically a string containing \n) to the create_data function (file my_data.py, line 139), replacing txt_files with the text you extracted.
This also means you are not forced to save the text extracted from an image into a text file: just tweak the create_data function to take as input a path to a folder containing all the images you want to build a dataset from, and create an array holding the extracted text corresponding to each image.
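The tweak described, going from a folder of images straight to per-image text, can be sketched with the OCR engine left as an injected function (the folder layout and the .jpg extension are assumptions):

```python
from pathlib import Path

def texts_from_folder(folder, ocr):
    # ocr(path) -> extracted text for that image; any engine works.
    # The result is uppercased, as recommended above; newlines are kept.
    return {p.stem: ocr(p).upper() for p in sorted(Path(folder).glob("*.jpg"))}
```

The returned dict (file stem to uppercase text) can then be fed to create_data in place of the text files.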
@Karim-Baig I think we don't need images for task 3. Task 3 uses an RNN for character-level classification. I guess that's also one reason why the task doesn't work for more complex documents.
related issue
Yes, since task 3 does not require images but only text, you can do the text extraction as a standalone program instead of doing it while creating the training file.
@NISH1001 had mentioned passing both the text and positions as embeddings to a CNN in order to better localize the key-value pairs in documents. Is there any implementation of what you mentioned?
I looked at LayoutLM, but the pipeline for creating and predicting on a custom dataset was not clear!
CharGrid: I was not able to find any open implementation of it.
@Karim-Baig one possibility is to use some character-level embedding. In this case (task 3), it is already doing that with an RNN. However, I presume we could pre-train those layers in some way on a lot of text before doing the downstream (classification) task. That might help it generalize, I guess.
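Character-level models like the one in task 3 start from a char-to-index vocabulary that feeds the embedding layer; a minimal sketch of such a mapping (the vocabulary contents are hypothetical):

```python
# Hypothetical character vocabulary for a char-level model.
# Index 0 is reserved for unknown characters, so unseen input
# can never produce an out-of-range embedding index.
VOCAB = ["<unk>"] + sorted("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 .,:/-")

CHAR2IDX = {c: i for i, c in enumerate(VOCAB)}

def encode(text):
    """Map each character to its vocabulary index (0 for unknown)."""
    return [CHAR2IDX.get(c, 0) for c in text]
```

Pre-training would then mean learning the embedding table for these indices on unlabeled text before the classification stage.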
On another project (not related to this thread), I am using fastText-based embeddings with a Graph Neural Network for classification. The results are way better than the RNN-based architecture's. There I generated the embeddings in an unsupervised manner from all the documents, and eventually used them for classification (along with some custom features for each word/token).
> @NISH1001 had mentioned passing both the text and positions as embeddings to a CNN in order to better localize the key-value pairs in documents. Is there any implementation of what you mentioned?
> I looked at LayoutLM, but the pipeline for creating and predicting on a custom dataset was not clear!
> CharGrid: I was not able to find any open implementation of it.
About CharGrid: I haven't tried that either. One variation I have tried is feeding fastText embeddings to a UNet, directly adding a 64-dimensional vector to the input image. This bumps the input up to 67 channels (64 + 3). However, training is very inefficient because of the high dimensionality. I saw one paper doing the same with 32 + 3 input channels and tried that; training was still very inefficient (in both memory and time).
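The channel bump described is just a concatenation of a per-pixel text embedding with the RGB channels; a numpy sketch with hypothetical sizes:

```python
import numpy as np

H, W, EMB = 32, 32, 64               # hypothetical image size and embedding dim

image = np.random.rand(H, W, 3)       # RGB channels
embed = np.random.rand(H, W, EMB)     # one 64-d text embedding per pixel

# Stack along the channel axis: 3 + 64 = 67 input channels,
# which is why training becomes memory- and time-hungry.
grid = np.concatenate([image, embed], axis=-1)
```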
I have placed the test files into tmp/task3-test(347p) for images,
and tmp/text.task1&2-test(361p) for text files containing coordinates and extracted text, exactly in the format of the ICDAR dataset.
When I run my_data.py to create test_dict.pth, it runs fine and shows no error.
However, when I run test.py using the trained model.pth file, it shows:
Traceback (most recent call last):
File "test.py", line 44, in <module>
test()
File "test.py", line 27, in test
oupt = model(text_tensor)
File "C:\Users\1632613\AppData\Local\conda\conda\envs\gpoc\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\1632613\Pictures\task3\my_models\__init__.py", line 13, in forward
embedded = self.embed(inpt)
File "C:\Users\1632613\AppData\Local\conda\conda\envs\gpoc\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\1632613\AppData\Local\conda\conda\envs\gpoc\lib\site-packages\torch\nn\modules\sparse.py", line 158, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "C:\Users\1632613\AppData\Local\conda\conda\envs\gpoc\lib\site-packages\torch\nn\functional.py", line 1916, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
Any idea? @aureliensimon @patrick22414 @Karim-Baig @lyrab96
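For what it's worth, nn.Embedding raises "IndexError: index out of range in self" when some input index is negative or >= num_embeddings, i.e. the test text maps to a character index the embedding table has no row for. A framework-free sketch of a check that locates such indices before the model call (the vocabulary size is hypothetical):

```python
NUM_EMBEDDINGS = 128   # hypothetical: must equal the vocab size used at training

def find_out_of_range(indices, num_embeddings=NUM_EMBEDDINGS):
    """Return the positions whose index would crash nn.Embedding."""
    return [pos for pos, idx in enumerate(indices)
            if not 0 <= idx < num_embeddings]
```

If this returns anything for your test tensor, the test-time character encoding does not match the one used to build the training dict.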
@anitchakraborty, I am facing the same issue. How did you manage to get past it? Please help!