First of all, Great Work! Appreciate the novel concept of targeting multiple languages

Invoice Extraction about lilt HOT 4 CLOSED

jpwang commented on August 12, 2024

Invoice Extraction

from lilt.

Comments (4)

jpWang commented on August 12, 2024

Thanks for your attention to our work.

Q1: Are these invoices monolingual or multilingual? For resource-rich languages such as English, LiLT+English-Roberta often performs better than LiLT+InfoXLM. Furthermore, you don't need to train a completely fresh model from scratch. For example, if your invoices are in English, you can load the pre-trained LiLT+English-Roberta weights and continue pre-training on your 1 million unlabeled samples for a while.

Q2: Less than a week for the experimental setup described in our paper.

Q3: You can refer to the layout diversity (task difficulty), number of samples, and SOTA performance on public academic datasets such as FUNSD, CORD, SROIE, EPHOIE, and XFUND. Generally speaking, compared with these datasets, 5000 samples is already a reasonably sufficient number. You can also refer to the fine-tuning strategies we provide and the experimental setup described in our paper.
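As a rough illustration of the Q1 advice, the sketch below picks a starting checkpoint from the document languages and loads its pre-trained weights with a fresh token-classification head. It assumes the Hugging Face `transformers` implementation of LiLT. The hub IDs are assumptions to verify before use, and `pick_checkpoint` and `load_for_fine_tuning` are hypothetical helper names, not part of the LiLT repository. The English-only rule is a simplification of "often performs better", not a guarantee.

```python
from typing import Iterable

# Hub IDs are assumptions for illustration; verify them before use.
ENGLISH_CHECKPOINT = "SCUT-DLVCLab/lilt-roberta-en-base"
MULTILINGUAL_CHECKPOINT = "SCUT-DLVCLab/lilt-infoxlm-base"


def pick_checkpoint(languages: Iterable[str]) -> str:
    """Hypothetical helper: choose a starting checkpoint from language codes."""
    langs = {lang.strip().lower() for lang in languages}
    if not langs:
        raise ValueError("at least one language code is required")
    # English-only corpora start from the English RoBERTa variant;
    # anything else falls back to the multilingual InfoXLM variant.
    return ENGLISH_CHECKPOINT if langs == {"en"} else MULTILINGUAL_CHECKPOINT


def load_for_fine_tuning(languages: Iterable[str], num_labels: int):
    """Load pre-trained weights plus a new token-classification head."""
    # Imported lazily so pick_checkpoint works without transformers installed.
    from transformers import AutoTokenizer, LiltForTokenClassification

    checkpoint = pick_checkpoint(languages)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = LiltForTokenClassification.from_pretrained(
        checkpoint, num_labels=num_labels
    )
    return tokenizer, model
```

Calling `load_for_fine_tuning(["en"], num_labels=...)` downloads the weights on first use. Continued pre-training on unlabeled invoices would start from the same checkpoint but needs a pre-training objective and loop, which this sketch does not cover.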

vibeeshan025 commented on August 12, 2024

Thanks a lot. The invoices are currently monolingual but may be extended to other languages. I believe the LiLT+LN-Roberta method is well suited to my specific task. Is it really necessary to pre-train again on my own data to achieve better results, or is fine-tuning alone sufficient?

First I want to see a few results before spending time and money on doing pretraining. That's why I am asking.

jpWang commented on August 12, 2024

Generally, performing fine-tuning alone can achieve a satisfactory result. But when you really want to utilize the unlabeled "in-domain" samples, or you really want to further improve the performance, you can try the strategy of continuing pre-training.
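One way to act on this is to fine-tune first, score the result on a held-out set, and only then decide whether continued pre-training is worth the cost. The sketch below is a minimal illustration under assumptions: entities are represented as `(start, end, label)` tuples, the metric is exact-match entity-level F1, and both `entity_f1` and `should_try_continued_pretraining` are hypothetical helpers. The target score is something you choose; it is not a threshold from the paper.

```python
from typing import Set, Tuple

Entity = Tuple[int, int, str]  # (start token, end token, label); assumed format


def entity_f1(predicted: Set[Entity], gold: Set[Entity]) -> float:
    """Entity-level F1: a prediction counts only on exact span and label match."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def should_try_continued_pretraining(
    baseline_f1: float, target_f1: float, has_unlabeled_in_domain_data: bool
) -> bool:
    """Suggest continued pre-training only when fine-tuning alone falls short
    of the chosen target and unlabeled in-domain samples are available."""
    return baseline_f1 < target_f1 and has_unlabeled_in_domain_data
```

For example, a fine-tuned baseline that already meets your target makes `should_try_continued_pretraining` return `False`, which matches the advice that fine-tuning alone is usually satisfactory.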

Bhageshwarsingh commented on August 12, 2024

@vincentAGNES Hi, I recently came across your project. I have a few questions, and it would be very helpful if you could make some time to help me out.
