Comments (4)

jpWang commented on August 12, 2024

Thanks for your attention to our work.

Q1: Are these invoices monolingual or multilingual? For resource-rich languages such as English, LiLT+English-Roberta often performs better than LiLT+InfoXLM. Furthermore, you don't need to train a completely fresh model from scratch. For example, if your invoices are in English, you can load the pre-trained LiLT+English-Roberta weights and continue pre-training on your 1 million unlabeled samples for a while.

Q2: Less than a week for the experimental setup described in our paper.

Q3: You can use the layout diversity (task difficulty), number of samples, and SOTA performance of public academic datasets such as FUNSD, CORD, SROIE, EPHOIE, and XFUND as reference points. Generally speaking, compared with these datasets, 5,000 samples is already a reasonably sufficient number. You can also refer to the fine-tuning strategies we provide and the experimental setup described in our paper.

from lilt.

vibeeshan025 commented on August 12, 2024

Thanks a lot. The invoices are currently monolingual but may be extended to other languages. I believe the LiLT+LN-Roberta method is better suited to my specific task. Is it really necessary to pre-train again on my own data to achieve better results, or is fine-tuning alone sufficient?

I want to see a few results first, before spending time and money on pre-training. That's why I am asking.

jpWang commented on August 12, 2024

Generally, fine-tuning alone can achieve a satisfactory result. But if you want to make use of the unlabeled "in-domain" samples, or want to further improve performance, you can try continued pre-training.
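As a rough illustration of what "continued pre-training on unlabeled samples" involves on the data side, here is a minimal sketch that assumes a masked-language-modeling style objective: some words on a page are hidden while their bounding boxes stay visible, and the model is trained to recover them. This is not the LiLT training code. The function name `mask_for_pretraining`, the `<mask>` string, and the 15% default rate are all assumptions made for illustration, and a real pipeline would work on tokenizer IDs rather than raw words.

```python
import random

MASK_TOKEN = "<mask>"  # assumption: a RoBERTa-style mask token string


def mask_for_pretraining(words, boxes, mask_prob=0.15, seed=None):
    """Build one masked training example from an unlabeled page.

    Returns (masked_words, boxes, labels). labels[i] holds the original
    word where position i was masked and None elsewhere. Boxes are
    returned unchanged so the layout remains visible to the model.
    """
    if len(words) != len(boxes):
        raise ValueError("words and boxes must have the same length")
    rng = random.Random(seed)
    masked_words, labels = [], []
    for word in words:
        if rng.random() < mask_prob:
            masked_words.append(MASK_TOKEN)  # hide the text
            labels.append(word)              # target to recover
        else:
            masked_words.append(word)
            labels.append(None)              # not scored
    return masked_words, list(boxes), labels


# Hypothetical OCR output for one invoice page: words plus (x0, y0, x1, y1) boxes.
words = ["Invoice", "No.", "12345", "Total", "99.00"]
boxes = [(50, 40, 180, 70), (190, 40, 230, 70), (240, 40, 330, 70),
         (50, 900, 130, 930), (800, 900, 900, 930)]

masked, kept_boxes, labels = mask_for_pretraining(words, boxes, mask_prob=0.5, seed=0)
```

No labels are needed for this step, which is why the unlabeled in-domain invoices can be used directly. The fine-tuning stage afterwards still relies on the labeled samples.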

Bhageshwarsingh commented on August 12, 2024

@vincentAGNES Hi, I recently came across your project. I have a few questions, and it would be very helpful if you could make some time to help me out.
