
nano-BERT


Nano-BERT: A Simplified and Understandable Implementation of BERT

Nano-BERT is a straightforward, lightweight and comprehensible custom implementation of BERT, inspired by the foundational "Attention is All You Need" paper. The primary objective of this project is to distill the essence of transformers by stripping away unnecessary complexity and detail, making it an ideal starting point for anyone aiming to grasp the fundamental ideas behind transformers.

Key Features and Focus 🚀:

  • Simplicity and Understandability: Nano-BERT prioritizes simplicity and clarity, making it accessible for anyone looking to understand the core concepts of transformers.

  • Multi-Headed Self Attention: The implementation of multi-headed self-attention is intentionally less efficient but more descriptive. Each attention head is treated as a separate object, emphasizing transparency over optimization techniques like matrix transposition and efficient multiplication.

  • Educational Purposes: This project is designed for educational purposes, offering a learning platform for individuals interested in transformer architectures.

  • Customizability: Nano-BERT allows extensive customization, enabling users to experiment with various parameters such as the number of layers, heads, and embedding sizes. It serves as a playground for exploring the impact of different configurations on model performance.

  • Inspiration: The project draws inspiration from ongoing research on efficient LLM fine-tuning (space-model). It is also influenced by Andrej Karpathy's deep learning YouTube series, particularly the nanoGPT project.

  • Motivation and Development: Nano-BERT originated from the author's curiosity about embedding custom datasets into a three-dimensional space using BERT. To achieve this, the goal was to construct a fully customizable version of BERT, providing complete control over the model's behavior. The motivation was to comprehend how BERT could handle datasets with words as tokens, diverging from the common sub-word approach.
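The per-head design mentioned above can be sketched roughly as follows. This is an illustrative PyTorch snippet, not the repo's actual code: the class names `AttentionHead` and `MultiHeadAttention` and their signatures are assumptions, chosen only to show what "each attention head as a separate object" looks like compared with a fused tensor implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionHead(nn.Module):
    """One self-attention head kept as its own module for readability."""

    def __init__(self, n_embed, head_size):
        super().__init__()
        self.key = nn.Linear(n_embed, head_size, bias=False)
        self.query = nn.Linear(n_embed, head_size, bias=False)
        self.value = nn.Linear(n_embed, head_size, bias=False)

    def forward(self, x):
        k, q, v = self.key(x), self.query(x), self.value(x)
        # scaled dot-product attention; bidirectional (no mask), as in BERT
        weights = F.softmax(q @ k.transpose(-2, -1) / k.size(-1) ** 0.5, dim=-1)
        return weights @ v


class MultiHeadAttention(nn.Module):
    """Heads stored as a plain list of objects instead of one fused tensor op."""

    def __init__(self, n_embed, n_head):
        super().__init__()
        head_size = n_embed // n_head
        self.heads = nn.ModuleList(
            AttentionHead(n_embed, head_size) for _ in range(n_head)
        )
        self.proj = nn.Linear(n_embed, n_embed)

    def forward(self, x):
        # run each head independently, then concatenate and project back
        return self.proj(torch.cat([h(x) for h in self.heads], dim=-1))


x = torch.randn(2, 8, 12)  # (batch, seq_len, n_embed)
out = MultiHeadAttention(n_embed=12, n_head=4)(x)
print(out.shape)  # torch.Size([2, 8, 12])
```

Looping over a `ModuleList` of heads is slower than the usual reshape-and-batch trick, but it makes the data flow of each head explicit, which is the trade-off the project deliberately makes.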

Community Engagement 💬: While Nano-BERT is not intended for production use, contributions, suggestions, and feedback from the community are highly encouraged. Users are welcome to propose improvements, simplifications, or enhanced descriptions by creating pull requests or issues.

Exploration and Experimentation 🌎: Nano-BERT's flexibility enables users to experiment freely. Parameters like the number of layers, heads, and embedding sizes can be tailored to specific needs. This customizable nature empowers users to explore diverse configurations and assess their impact on model outcomes.

Note: Nano-BERT was developed with a focus on educational exploration and understanding, and it should be used in educational and experimental contexts only!

Installation 🛠️

Prerequisites

  • Python 3.10.x
  • pip

pip install torch

Note: to run the demos you may need some additional packages, but for the base model all you need is PyTorch:

pip install tqdm scikit-learn matplotlib plotly

Package installation

โš ๏ธ: currently only available through GitHub, but pip version is coming soon!

git clone https://github.com/StepanTita/nano-BERT.git

Usage Example ⚙️

from nano_bert.model import NanoBERT
from nano_bert.tokenizer import WordTokenizer

vocab = [...]  # a list of tokens (or words) to use in tokenizer

tokenizer = WordTokenizer(vocab=vocab, max_seq_len=128)

# Usage:
input_ids = tokenizer('This is a sentence')  # or tokenizer(['This', 'is', 'a', 'sentence'])

# Instantiate the NanoBERT model
nano_bert = NanoBERT(input_ids)

# Example usage
embedded_text = nano_bert.embedding(input_ids)
print(embedded_text)
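For intuition, the kind of word-level tokenization used above can be approximated in a few lines: split on whitespace, map each word to a vocabulary index, and pad or truncate to `max_seq_len`. This sketch is hypothetical — the function name, the `<pad>`/`<unk>` ids, and the padding scheme are assumptions for illustration, not `WordTokenizer`'s actual implementation.

```python
def word_tokenize(text, vocab, max_seq_len=8, pad_id=0, unk_id=1):
    """Map whitespace-split words to vocab ids, padded/truncated to max_seq_len."""
    words = text.split() if isinstance(text, str) else text  # accept str or list
    ids = [vocab.get(w, unk_id) for w in words][:max_seq_len]
    return ids + [pad_id] * (max_seq_len - len(ids))


vocab = {"<pad>": 0, "<unk>": 1, "this": 2, "is": 3, "a": 4, "sentence": 5}
print(word_tokenize("this is a sentence", vocab))  # [2, 3, 4, 5, 0, 0, 0, 0]
```

Because the vocabulary holds whole words rather than sub-word pieces, any word outside it falls back to the unknown id — the trade-off of the word-as-token approach the project explores.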

Results 📈:

Benchmarks 🏆:

All of the following experiments use this configuration:

n_layer = 1
n_head = 1
dropout = 0.1
n_embed = 3
max_seq_len = 128
epochs = 200
batch_size = 32

| Dataset                   | Accuracy | F-1 Score |
| ------------------------- | -------- | --------- |
| IMDB Sentiment (2-class)  | 0.734    | 0.745     |
| HateXplain Data (2-class) | 0.693    | 0.597     |

Result plots for IMDB:

(Figures: accuracy-IMDB, f1-IMDB)

Interpretation ⁉️:

Attentions Visualized:

(Figures: Attention-IMDB-1 through Attention-IMDB-4)

Embeddings Visualized in 3D:

(Figures: Embeddings-3d-1 through Embeddings-3d-5)

Note: see demo.ipynb and imdb_demo.ipynb for more complete examples.

License 📄

This project is licensed under the MIT License. See the LICENSE.md file for details.
