
tf-transformers's Introduction




Tensorflow Transformers

tf-transformers: faster and easier state-of-the-art Transformer in TensorFlow 2.0

Imagine auto-regressive generation being 90x faster. tf-transformers (TensorFlow Transformers) is designed to harness the full power of TensorFlow 2, built specifically for Transformer-based architectures.

These models can be applied on:

  • 📝 Text, for tasks like text classification, information extraction, question answering, summarization, translation, and text generation, in over 100 languages.
  • 🖼️ Images, for tasks like image classification, object detection, and segmentation.
  • 🗣️ Audio, for tasks like speech recognition and audio classification. (Coming soon)

Unique Features

  • Faster auto-regressive decoding
  • TFLite support
  • Creating TFRecords is simple.
  • Auto-batching of tf.data.Dataset or tf.RaggedTensor inputs (see the sketch after this list)
  • Everything is a dictionary (inputs and outputs)
  • Multiple mask modes: causal, user-defined, prefix
  • tensorflow-text tokenizer support
  • GPU, TPU, and multi-GPU trainers with wandb, multiple callbacks, and automatic TensorBoard logging
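The dictionary-first design plugs straight into plain tf.data pipelines. Below is a minimal batching sketch; the feature names input_ids and input_mask are illustrative assumptions, not a schema required by the library.

import tensorflow as tf

# Toy records: every example is a dictionary of tensors
# (the keys below are illustrative, not required by tf-transformers).
examples = [
    {"input_ids": [101, 2023, 2003, 102], "input_mask": [1, 1, 1, 1]},
    {"input_ids": [101, 7592, 102], "input_mask": [1, 1, 1]},
]

dataset = tf.data.Dataset.from_generator(
    lambda: iter(examples),
    output_signature={
        "input_ids": tf.TensorSpec(shape=[None], dtype=tf.int32),
        "input_mask": tf.TensorSpec(shape=[None], dtype=tf.int32),
    },
)

# Pad each batch to its longest sequence; ragged batching is an alternative
# when downstream ops accept tf.RaggedTensor inputs.
batched = dataset.padded_batch(2)
for batch in batched:
    print({name: tensor.shape for name, tensor in batch.items()})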

Benchmark on GPT2 text generation

GPT2 text generation with max_length=64, num_beams=3.

tf_transformers : 31 minutes
huggingface_tf  : 83 minutes
huggingface_pt  : 36 minutes
huggingface_jax : 35 minutes

From 83 minutes to 31 minutes is a significant speedup (roughly 2.7x). On average, tf-transformers is 80-90% faster than the HuggingFace TensorFlow implementation, and in most cases it is comparable to or faster than PyTorch.

More benchmarks can be found in the benchmark section.
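For reference, wall-clock numbers like the ones above can be collected with a plain timing loop around whichever generation callable is being compared. The generate_fn argument below is a placeholder, not a tf-transformers API.

import time

def benchmark_generation(generate_fn, batches, warmup=1):
    """Time generate_fn (a placeholder callable) over a list of input batches."""
    for batch in batches[:warmup]:
        generate_fn(batch)  # warm-up pass: triggers tracing / graph compilation
    start = time.perf_counter()
    for batch in batches:
        generate_fn(batch)
    return time.perf_counter() - start

# Usage sketch: elapsed = benchmark_generation(lambda b: generate(b), batches)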

Installation

With pip

This repository is tested on Python 3.7+ and TensorFlow 2.7.

Recommended prerequisites

pip install sentencepiece
pip install "tensorflow-text>=2.7.3"
pip install tqdm

Install TensorFlow >= 2.7.0 (CPU or GPU) as appropriate for your machine. You should install tf-transformers in a virtual environment. If you're unfamiliar with Python virtual environments, check out the user guide.

First, create a virtual environment with the version of Python you're going to use and activate it.

Then, you will need to install TensorFlow. Please refer to the TensorFlow installation page for the specific install command for your platform. We highly recommend installing tensorflow-text (https://www.tensorflow.org/text).

Once TensorFlow is installed, tf-transformers can be installed using pip as follows:

pip install tf-transformers
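After installation, you can sanity-check the environment as follows (the __version__ attribute on tf_transformers is assumed here, hence the getattr fallback):

import tensorflow as tf
import tensorflow_text as tf_text  # recommended for in-graph tokenization
import tf_transformers

print("TensorFlow:", tf.__version__)
print("tensorflow-text:", tf_text.__version__)
print("tf-transformers:", getattr(tf_transformers, "__version__", "unknown"))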

From source

git clone https://github.com/legacyai/tf-transformers.git
pip install poetry
cd tf-transformers
poetry install

Quick tour

The tf-transformers API is very simple and minimalistic.

>>> from tf_transformers.models import GPT2Model
>>> model = GPT2Model.from_pretrained('gpt2')
>>> model.save_checkpoint("/tmp/gpt2_model/") # Save Model

For text generation, it is very important to add use_auto_regressive=True. This is required for all models.

>>> from tf_transformers.models import GPT2Model
>>> model = GPT2Model.from_pretrained('gpt2', use_auto_regressive=True)
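Generation itself is typically driven through a decoder helper. The sketch below is written from memory of the project's tutorials; treat TextDecoder and its arguments as assumptions and consult the tutorials for the exact API.

# Hedged sketch: TextDecoder and its arguments are assumptions, not a verified API.
from tf_transformers.models import GPT2Model
from tf_transformers.text import TextDecoder  # assumed helper

model = GPT2Model.from_pretrained('gpt2', use_auto_regressive=True)
decoder = TextDecoder(model=model)

# inputs is a dictionary of tokenized tensors (key names depend on the model).
# results = decoder.decode(inputs, mode='beam', num_beams=3, max_iterations=64)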

To serialize, save, and load a model:

>>> from tf_transformers.models import GPT2Model
>>> model = GPT2Model.from_pretrained('gpt2')
>>> model.save_transformers_serialized("/tmp/gpt2_serialized/")

# To load a serialized model for inference in production:

>>> import tensorflow as tf
>>> loaded = tf.saved_model.load("/tmp/gpt2_serialized/")
>>> model  = loaded.signatures['serving_default']
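The serving signature is called with a dictionary of tensors, and the expected input names depend on the model, so inspect them before calling. The input_ids key in the commented call below is an assumption for illustration.

import tensorflow as tf

loaded = tf.saved_model.load("/tmp/gpt2_serialized/")
serving_fn = loaded.signatures['serving_default']

# Inspect the expected input and output structures before calling the signature.
print(serving_fn.structured_input_signature)
print(serving_fn.structured_outputs)

# Example call (the 'input_ids' key is an assumption; use the names printed above):
# outputs = serving_fn(input_ids=tf.constant([[50256, 15496, 995]], dtype=tf.int32))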

Model inputs and outputs

In tf-transformers we mostly follow the Keras Functional API. All models in tf-transformers are connected and always provide the following functionality.

Model inputs

If the model is a tf.keras.Model or tf_transformers.core.LegacyModel, use print(model.input).

If it is a tf.keras.layers.Layer or tf_transformers.core.LegacyLayer, use print(model.model_inputs).

Model outputs

If the model is a tf.keras.Model or tf_transformers.core.LegacyModel, use print(model.output).

If it is a tf.keras.layers.Layer or tf_transformers.core.LegacyLayer, use print(model.model_outputs).
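For example, with the GPT2 model from the quick tour (the key names shown in the comments are illustrative):

from tf_transformers.models import GPT2Model

model = GPT2Model.from_pretrained('gpt2')

# Both are dictionaries of symbolic tensors, keyed by name.
print(model.input)   # e.g. {'input_ids': <KerasTensor ...>, ...}
print(model.output)  # e.g. {'token_embeddings': <KerasTensor ...>, ...}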

Tutorials

We have tutorials covering pre-training, fine-tuning, classification, QA, NER, and much more.

Model usage

TFlite Tutorials

Why should I use tf-transformers?

  1. Use state-of-the-art models in production, with less than 10 lines of code.

    • High-performance models, better than all official TensorFlow-based models
    • Very simple classes for all downstream tasks
    • Complete TFLite support for all tasks
  2. Make industry-grade experience available to students and the community through clear tutorials.

  3. Train any model on GPU, multi-GPU, or TPU with the familiar tf.keras.Model.fit (see the sketch after this list).

    • Train state-of-the-art models in a few lines of code.
    • All models are completely serializable.
  4. Customize any model or pipeline with minimal or no code changes.
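Because the models are standard Keras models with dictionary outputs, training follows the usual compile/fit pattern. The sketch below assumes a task model whose output dictionary contains a logits entry; the key name class_logits is an assumption for illustration.

import tensorflow as tf

def compile_for_classification(model, logits_key='class_logits', lr=2e-5):
    """Compile a tf-transformers task model for plain model.fit training.

    logits_key is an assumed name for the logits entry in the model's
    output dictionary; adjust it to match your task model.
    """
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
        loss={logits_key: tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)},
        metrics={logits_key: ['accuracy']},
    )
    return model

# Usage sketch: train_dataset yields (inputs_dict, {'class_logits': labels}) pairs.
# compile_for_classification(task_model).fit(train_dataset, epochs=3)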

Research

The Research section has code for pre-training different models, ranging from MLM and T5 to CLIP. All these scripts are designed to harness the full power of the tensorflow-io pipeline and have been tested on TPU v2 and TPU v3. Bugs are expected, but the scripts serve as a starting point for practitioners to build on or modify what we have already done.

Contributions

Joint ALBERT (smallest and best Transformer-based model) on GLUE.

We have conducted a few experiments to squeeze the power of ALBERT base models (the concept is applicable to any model, and in tf-transformers it works out of the box).

The idea is to minimize the loss for the specified task at each layer of the model and check the predictions at each layer. As per our experiments, we are able to get the best smaller model (thanks to ALBERT), and from layer 4 onwards we beat all smaller models on the GLUE benchmark. By layer 6, we reach a GLUE score of 81.0, which is 4 points ahead of DistilBERT (GLUE score 77) and MobileBERT (GLUE score 78).

The ALBERT model has 14 million parameters, and by using layer 6, we were able to speed up computation by 50%.

The concept is applicable to all the models and tasks.
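A conceptual sketch of the joint objective: compute the task loss on every layer's predictions and minimize their sum. How per-layer logits are obtained from the model is left out here; the list-of-logits input is an assumption for illustration.

import tensorflow as tf

def joint_layer_loss(per_layer_logits, labels):
    """Sum the task loss over per-layer predictions (conceptual sketch).

    per_layer_logits: list of [batch, num_classes] tensors, one per layer.
    labels: [batch] integer class labels.
    """
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    return tf.add_n([loss_fn(labels, logits) for logits in per_layer_logits])

# Dummy example: 2 layers, batch of 3, 4 classes.
dummy_logits = [tf.random.normal([3, 4]) for _ in range(2)]
dummy_labels = tf.constant([0, 2, 1])
print(joint_layer_loss(dummy_logits, dummy_labels))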

Codes + Read More

Long Block Sequence Transformer

By splitting the input sequence into blocks for attention and merging them with an FFN layer, we have shown that smaller machines can process sequences of up to 4096 tokens on a single V100 GPU. The model outperforms Pegasus Base (128 million parameters) on PubMed summarisation despite having only 60 million parameters.
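A conceptual sketch of block-local attention (not the project's implementation): split the sequence into fixed-size blocks, run self-attention within each block, and merge the blocks back so a downstream FFN can mix information across them.

import tensorflow as tf

def block_self_attention(x, block_size, mha):
    """Apply self-attention within fixed-size blocks (conceptual sketch).

    x: [batch, seq_len, hidden] with seq_len divisible by block_size.
    mha: a tf.keras.layers.MultiHeadAttention instance.
    """
    batch, seq_len, hidden = tf.unstack(tf.shape(x))
    num_blocks = seq_len // block_size
    # Reshape so that each block attends only to itself.
    blocks = tf.reshape(x, [batch * num_blocks, block_size, hidden])
    attended = mha(query=blocks, value=blocks, key=blocks)
    # Merge the blocks back into the full sequence.
    return tf.reshape(attended, [batch, seq_len, hidden])

# Example: a 4096-token sequence processed as 8 blocks of 512 tokens.
mha = tf.keras.layers.MultiHeadAttention(num_heads=8, key_dim=64)
x = tf.random.normal([1, 4096, 512])
print(block_self_attention(x, 512, mha).shape)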



Codes + Read More

Supported Models architectures

tf-transformers currently provides the following architectures.

  1. ALBERT (from Google Research and the Toyota Technological Institute at Chicago) released with the paper ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
  2. BERT (from Google) released with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
  3. BERT For Sequence Generation (from Google) released with the paper Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
  4. ELECTRA (from Google Research/Stanford University) released with the paper ELECTRA: Pre-training text encoders as discriminators rather than generators by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
  5. GPT-2 (from OpenAI) released with the paper Language Models are Unsupervised Multitask Learners by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
  6. MT5 (from Google AI) released with the paper mT5: A massively multilingual pre-trained text-to-text transformer by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
  7. RoBERTa (from Facebook), released together with the paper RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
  8. T5 (from Google AI) released with the paper Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
  9. Vision Transformer (ViT) (from Google AI) released with the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
  10. CLIP (from OpenAI) released with the paper Learning Transferable Visual Models From Natural Language Supervision by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.

Citation

We now have a page you can cite for the tf-transformers library.

tf-transformers's People

Contributors

legacyai, s4sarath


tf-transformers's Issues

HF models are not using key-value caching?

I was reading the code for the HF GPT2 benchmark, and it seems like key-value caching is not being used? This is pretty important for any kind of autoregressive generation and would greatly speed up the decoding time. HF models have had support for key-value caching for a while, see config arguments use_cache and past_key_values here: https://huggingface.co/docs/transformers/model_doc/gpt2#transformers.GPT2LMHeadModel.

I think it would be important for this project to re-benchmark the HF models with key-value caching enabled, as that is standard practice and without it the HF numbers are being handicapped.
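For context, Hugging Face's generate() supports key-value caching via use_cache; below is a minimal sketch of how the HF TensorFlow baseline could be run with caching enabled (standard transformers API, shown here only as a point of comparison):

from transformers import GPT2Tokenizer, TFGPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = TFGPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("The quick brown fox", return_tensors="tf")
# use_cache=True reuses past key/value states at each decoding step
# instead of re-running attention over the full prefix.
outputs = model.generate(
    inputs["input_ids"],
    max_length=64,
    num_beams=3,
    use_cache=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))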

Colab

This is great work!!! I have problems with TF2+HF (too many errors, reported to TF2), so I aim to switch to tf-transformers. However, the library did not work in Colab; I guess there are some missing files? Thanks.
