Input data preprocessing
Code reference: bros/preprocess/funsd_spade/preprocess.py, lines 74 to 86 (commit 55c52d0)
- The data must have 4-point quadrangle coordinates. If you have rectangle coordinates, transform them into shape (8,).
- Tokenize the transcription (ground truth or OCR output) using the BERT tokenizer.
Code reference: bros/preprocess/funsd_spade/preprocess.py, line 31 (commit 55c52d0)
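The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code in preprocess.py: the clockwise corner order starting at the top-left and the helper names rect_to_quad and tokenize_with_boxes are assumptions, and the tokenizer is passed in as a plain callable (with Hugging Face transformers, the tokenize method of a BertTokenizer would fit that slot).

```python
from typing import Callable, List, Sequence, Tuple


def rect_to_quad(box: Sequence[float]) -> List[float]:
    """Convert an axis-aligned rectangle [x0, y0, x1, y1] into a flat
    4-point quadrangle of shape (8,).

    Assumption: corners are listed clockwise from the top-left:
    top-left, top-right, bottom-right, bottom-left.
    """
    x0, y0, x1, y1 = box
    return [x0, y0, x1, y0, x1, y1, x0, y1]


def tokenize_with_boxes(
    words: Sequence[str],
    quads: Sequence[Sequence[float]],
    tokenize: Callable[[str], List[str]],
) -> Tuple[List[str], List[List[float]]]:
    """Split every word into subword tokens and give each token the
    quadrangle of the word it came from (hypothetical helper)."""
    tokens: List[str] = []
    token_quads: List[List[float]] = []
    for word, quad in zip(words, quads):
        for piece in tokenize(word):
            tokens.append(piece)
            token_quads.append(list(quad))  # copy so tokens don't share one list
    return tokens, token_quads


if __name__ == "__main__":
    # Toy stand-in for a WordPiece tokenizer, for demonstration only.
    toy = lambda w: [w[:2], "##" + w[2:]] if len(w) > 2 else [w]
    quads = [rect_to_quad([0, 0, 4, 2]), rect_to_quad([5, 0, 7, 2])]
    print(tokenize_with_boxes(["Date", "to"], quads, toy))
```

Giving every subword the box of its parent word is the simplest way to keep tokens and coordinates aligned one-to-one; whether BROS expects exactly this is worth checking against the referenced lines.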
KIE task
Please refer to bros/preprocess/funsd_spade/preprocess.py, lines 96 to 116 (commit 55c52d0).
Thank you!
For now I am interested in the token classification task. To clarify, suppose that for each document I have:
- a list of words
- a list of bounding boxes corresponding to those words
- a list of labels, one for each box
Which type of preprocessing should I use? For FUNSD I see there are two types: funsd and funsd_spade.
I ran both preprocessing scripts and see that the parse entries differ between the processed files. I would appreciate it if you could explain, conceptually, the reason for this difference.
Simply put:
- funsd: for the BIO-tagging decoder
- funsd_spade: for the SPADE-style decoder
Since the BIO-tagging approach is common, I recommend trying this method first.
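As a rough illustration of what BIO tagging means for FUNSD-style data, the sketch below assumes each entity is given as its list of words plus one label (FUNSD uses question, answer, header and other), maps other to O, and tags the first word of an entity B-<label> and the remaining words I-<label>. The helper names entities_to_bio and expand_tags_to_subwords, and the choice to tag continuation subwords with I-<label>, are assumptions for illustration, not the behavior of the repository's funsd preprocessing.

```python
from typing import Callable, List, Sequence, Tuple


def entities_to_bio(
    entities: Sequence[Tuple[List[str], str]],
    outside_label: str = "other",
) -> Tuple[List[str], List[str]]:
    """Flatten (words, label) entities into word-level BIO tags.

    Words of an entity labelled `outside_label` get "O"; otherwise the
    first word gets "B-<label>" and the rest get "I-<label>".
    """
    words: List[str] = []
    tags: List[str] = []
    for entity_words, label in entities:
        for i, word in enumerate(entity_words):
            words.append(word)
            if label == outside_label:
                tags.append("O")
            elif i == 0:
                tags.append(f"B-{label}")
            else:
                tags.append(f"I-{label}")
    return words, tags


def expand_tags_to_subwords(
    words: Sequence[str],
    tags: Sequence[str],
    tokenize: Callable[[str], List[str]],
) -> Tuple[List[str], List[str]]:
    """Propagate word-level BIO tags to subword tokens.

    The first subword keeps the word's tag; continuation subwords of a
    B-/I- word become I-<label> (an assumed convention); O stays O.
    """
    out_tokens: List[str] = []
    out_tags: List[str] = []
    for word, tag in zip(words, tags):
        for j, piece in enumerate(tokenize(word)):
            out_tokens.append(piece)
            if j == 0 or tag == "O":
                out_tags.append(tag)
            else:
                out_tags.append("I-" + tag[2:])  # strip "B-" or "I-" prefix
    return out_tokens, out_tags
```

A decoder trained on such tags predicts one label per token, and entities are recovered by grouping each B- token with the I- tokens that follow it.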