Comments (4)

lalitpagaria commented on May 18, 2024

@julien-c Sorry, I did not mean anything bad about transformers. I just want to test multiple things, hence I raised this request.

These are my main concerns about using the transformers lib as a dependency -

  • It increases the size of the resulting package
  • The version always needs to be bumped to use the latest changes
  • If instead there were a transformers client library, it would be very lean to add to a package
  • This client lib should be able to connect to both a local transformers pipeline (i.e. pure OSS running on the system via Docker or a REST interface) and commercial APIs

from rust-bert.

julien-c commented on May 18, 2024

No worries at all, your questions are interesting!

guillaume-be commented on May 18, 2024

Hello @lalitpagaria,

This is an interesting idea which I had not considered. The primary objective of the library was to bring state-of-the-art capabilities natively to the Rust ecosystem, not necessarily to improve on the Python version. Essentially this would become a Python port of the Rust port of a Python library.

The main advantage of Python bindings for the Rust library over the Python version may be a performance gain for text generation tasks (see https://guillaume-be.github.io/2020-11-21/generation_benchmarks).

I am unsure regarding the further benefits you mention:

  • I believe the adoption among developers is very high for the original Python Transformers library. This very broad community allows for a very high update frequency, which enables quick identification and correction of bugs and quick implementation of state-of-the-art models. I see you mention the need to bump the version of Transformers to use the latest changes - this is probably more an advantage than a drawback.
  • The size of the packages will in both cases mostly be driven by PyTorch and the embedded CUDA library. These can represent several GBs for the latest CUDA versions and will be orders of magnitude larger than the binaries of the Rust or Python versions. I don't believe this is a promising area for optimization.
  • Ease and effectiveness of deployment would probably drive you to use the Rust version of the library and the powerful server solutions existing for Rust (e.g. rocket or actix) over Python-based servers.

The Python bindings would probably be a bit trickier to implement than the tokenizer bindings:

  • it involves downloading resource files and passing complex configuration objects as input
  • it involves downloading and linking the Libtorch library (by building tch-rs), which is likely to be less straightforward than downloading PyTorch
  • potentially additional changes would be required to keep the interface close to the Python version of Transformers (pretrained models leverage struct "variants" instead of paths)
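
To illustrate the last point, here is a minimal sketch of what mapping a Python-style string identifier onto a typed Rust value could look like at the binding boundary. The `ModelVariant` enum and its `from_name` function are hypothetical names made up for this example and are not rust-bert's actual API; the two strings are only example Transformers model names.

```rust
/// Hypothetical stand-in for a fixed set of pretrained models exposed as
/// typed values instead of free-form paths (NOT rust-bert's real types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModelVariant {
    BertBaseUncased,
    DistilBertBaseUncased,
}

impl ModelVariant {
    /// Map a Python-style string identifier to a typed variant.
    /// Returns `None` for anything that is not a known variant,
    /// e.g. an arbitrary local path.
    fn from_name(name: &str) -> Option<Self> {
        match name {
            "bert-base-uncased" => Some(Self::BertBaseUncased),
            "distilbert-base-uncased" => Some(Self::DistilBertBaseUncased),
            _ => None,
        }
    }
}

fn main() {
    for name in ["bert-base-uncased", "my/local/checkpoint"] {
        match ModelVariant::from_name(name) {
            Some(variant) => println!("{} -> {:?}", name, variant),
            None => println!("{} -> no matching variant", name),
        }
    }
}
```

A binding layer that wants to accept arbitrary paths, as the Python library does, would need an extra code path for the `None` case, which is the kind of interface change referred to above.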

I apologize for the delay in responding; this is a complex topic which I needed some time to reflect on. I think Python bindings may make sense - but they would require several important architecture/API decisions for the library - and I would like to drive these changes if they are added to this repository. I'd of course be happy to include you in reviewing and discussing the addition when adding these bindings. This is something I am planning to start looking into in the next few weeks.

Alternatively, an intermediate solution could be to start implementing the bindings in a separate repository (using rust-bert as a dependency from cargo). I believe building bindings with PyO3 does not require the Rust source code to live in the same repository: as you can see in the rust-tokenizers library, the Python bindings actually just use the Rust crate as a dependency, which happens to be a local path in this case (but could come from crates.io). This would allow you to try things out if you are mostly interested in testing the potential benefits for your project. If you try this out, feel free to reach out, I'd be glad to help wherever I can.
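
As a rough sketch of the dependency options mentioned here, the Cargo.toml of a separate bindings crate could look something like the following. The package name, the relative path and the wildcard versions are placeholders chosen for illustration, not values taken from rust-bert or rust-tokenizers.

```toml
[package]
name = "rust-bert-python"   # hypothetical crate name
version = "0.1.0"
edition = "2018"

[lib]
crate-type = ["cdylib"]     # PyO3 extension modules are built as C dynamic libraries

[dependencies]
# Placeholder version; pin a real one in practice.
pyo3 = { version = "*", features = ["extension-module"] }

# Pick ONE source for rust-bert:
rust-bert = "*"                                                     # from crates.io (placeholder version)
# rust-bert = { path = "../rust-bert" }                             # local checkout, e.g. a git submodule
# rust-bert = { git = "https://github.com/guillaume-be/rust-bert" } # fetched by cargo directly from git
```

With any of the three sources, the bindings code itself stays the same; only where cargo finds the crate changes.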

lalitpagaria commented on May 18, 2024

Thank you @guillaume-be for the detailed response. I like the following suggestion of yours -

Alternatively, an intermediate solution could be to start implementing the bindings in a separate repository (using rust-bert as a dependency from cargo). I believe building bindings with PyO3 does not require the Rust source code to live in the same repository: as you can see in the rust-tokenizers library, the Python bindings actually just use the Rust crate as a dependency, which happens to be a local path in this case (but could come from crates.io). This would allow you to try things out if you are mostly interested in testing the potential benefits for your project. If you try this out, feel free to reach out, I'd be glad to help wherever I can.

This would be more independent and make it easier to test the hypothesis.

BTW, what do you mean by this -

which happens to be a local path in this case (but could come from crates.io)

Isn't it possible to add rust-bert as a git submodule and then add its local path as a dependency?

Anyway, I am a noob in the Rust world and my main motivation is to learn by doing, so if you can guide me I can work on it in a separate repository.
