
awesome-huge-models

MIT License

A collection of AWESOME things about HUGE AI models.

There is a trend, led by big companies, of training large-scale deep learning models (in terms of parameters, dataset size, and FLOPs). These models achieve SoTA performance at a high price, with bags of training tricks and distributed training systems. Keeping an eye on this trend informs us of the current boundaries of AI models. [Intro in Chinese]

To support the open-sourcing of LLMs, we highlight the open-source models here:

LLaMA-65B, GLM-130B, BLOOM-176B, OPT-175B, T5-11B, UL2-20B, RWKV-14B, Cerebras-GPT-13B, Dolly-12B.

Contents

Survey

Big models in NLP

Models

Language Model

  • StableLM [Stability AI] Apr. 2023 [open]

    Field: Language
    Params: 3B, 7B, 15B, 30B, 65B, 175B
    Training Data: The Pile (1.5T tokens)
  • Dolly 2.0 [Databricks] Apr. 2023 [open]

    Field: Language
    Params: 12B
    Training Data: databricks-dolly-15k
  • Cerebras-GPT [Cerebras] Mar. 2023 [open]
    Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster [Preprint]

    Field: Language
    Params: 13B
    Training Data: (371B tokens)
  • GPT-4 [OpenAI] Mar. 2023 [close]
    GPT-4 Technical Report [Preprint]

    Field: Language-Vision
  • LLaMA [Meta] Feb. 2023 [open]
    LLaMA: Open and Efficient Foundation Language Models [Preprint]

    Field: Language
    Params: 65B
    Training Data: 4TB (1.4T tokens)
    Training Cost: 1,022,362 A100 GPU hours (2048 80G-A100 x 21 days; see the sketch after this list)
    Training Power Consumption: 449 MWh
  • RWKV-4-14B [Personal] Dec. 2022 [open]

    Field: Language
    Params: 14B
    Training Data: (332B tokens)
  • AnthropicLM [Anthropic] Dec. 2022 [close]
    Constitutional AI: Harmlessness from AI Feedback [Preprint]

    Field: Language
    Params: 52B
  • BLOOM [BigScience] Nov. 2022 [open]
    A 176B-Parameter Open-Access Multilingual Language Model [Preprint]

    Field: Language
    Params: 176B
    Training Data: 174GB (336B tokens)
    Training Cost: 1M A100 GPU hours = 384 80G-A100 x 4 months
    Training Power Consumption: 475 MWh
    Training Framework: Megatron + Deepspeed
  • Galactica [Meta] Nov. 2022 [open] Galactica: A Large Language Model for Science [Preprint]

    Field: Language
    Params: 125M, 1.3B, 6.7B, 30B, 120B
  • Pythia [EleutherAI] Oct. 2022 [open]

    Field: Language
    Params: 12B
  • GLM-130B [BAAI] Oct. 2022 [open]
    GLM-130B: An Open Bilingual Pre-trained Model [ICLR'23]

    Field: Language
    Params: 130B
    Training Data: (400B tokens)
    Training Cost: 516,096 A100 hours = 768 40G-A100 x 28 days
    Training Framework: Megatron + Deepspeed
  • UL2 [Google] May 2022 [open]
    Unifying Language Learning Paradigms [Preprint]

    Field: Language
    Params: 20B
    Training Data: 800GB (1T tokens)
    Architecture: En-De
    Training Framework: Jax + T5x
  • OPT [Meta] May 2022 [open]
    OPT: Open Pre-trained Transformer Language Models [Preprint]

    Field: Language
    Params: 175B
    Training Data: 800GB (180B tokens)
    Training Cost: 809,472 A100 hours =  992 80G-A100 x 34 days
    Training Power Consumption: 356 MWh
    Architecture: De
    Training Framework: Megatron + Fairscale
  • PaLM [Google] Apr. 2022 [close]
    PaLM: Scaling Language Modeling with Pathways [Preprint]

    Field: Language
    Params: 540B
    Training Data: 3TB (780B tokens)
    Training Cost: $10M (16,809,984 TPUv4core-hours, 64 days)
    Training petaFLOPs: 2.5B
    Architecture: De
    Training Framework: Jax + T5x
  • GPT-NeoX [EleutherAI] Apr. 2022 [open]
    GPT-NeoX-20B: An Open-Source Autoregressive Language Model [Preprint]

    Field: Language
    Params: 20B
    Training Data: 525GiB
    Training petaFLOPs: 93B
    Architecture: De
    Training Framework: Megatron + DeepSpeed
  • InstructGPT [OpenAI] Mar. 2022 [close]
    Training language models to follow instructions with human feedback [Preprint]

    Field: Language
    Params: 175B
  • Chinchilla [DeepMind] Mar. 2022 [close]
    Training Compute-Optimal Large Language Models [Preprint]

    Field: Language
    Params: 70B
    Training Data: 5.2TB (1.4T tokens)
    Training petaFLOPs: 580M
    Architecture: De
  • EVA 2.0 [BAAI] Mar. 2022 [open]
    EVA2.0: Investigating Open-Domain Chinese Dialogue Systems with Large-Scale Pre-Training [Preprint]

    Field: Language (Dialogue)
    Params: 2.8B
    Training Data: 180G (1.4B samples, Chinese)
  • AlphaCode [DeepMind] Mar. 2022 [close]
    Competition-Level Code Generation with AlphaCode [Preprint]

    Field: Code Generation
    Params: 41B
    Training Data: (967B tokens)
    Architecture: De
  • ST-MoE [Google] Feb. 2022 [close]
    ST-MoE: Designing Stable and Transferable Sparse Expert Models [Preprint]

    Field: Language
    Params: 296B
    Architecture: En-De, MoE
  • LaMDA [Google] Jan. 2022 [close]
    LaMDA: Language Models for Dialog Applications [Preprint]

    Field: Language (Dialogue)
    Params: 137B
    Training Data: (1.56T words)
    Training petaFLOPs: 360M
    Architecture: De
  • GLaM [Google] Dec. 2021 [close]
    GLaM: Efficient Scaling of Language Models with Mixture-of-Experts [Preprint]

    Field: Language
    Params: 1.2T
    Architecture: De, MoE
  • Gopher [DeepMind] Dec. 2021 [close]
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher [Preprint]

    Field: Language
    Params: 280B
    Training Data: 1.3TB (300B tokens)
    Training petaFLOPs: 630M
    Architecture: De
  • Yuan 1.0 [Inspur] Oct. 2021 [close]
    Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning [Preprint]

    Field: Language
    Params: 245B
    Training Data: 5TB (180B tokens, Chinese)
    Training petaFLOPs: 410M
    Architecture: De, MoE
  • MT-NLG [Microsoft, Nvidia] Oct. 2021 [close]
    Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model [Preprint]

    Field: Language
    Params: 530B
    Training Data: (339B tokens)
    Training petaFLOPs: 1.4B
    Architecture: De
  • PLATO-XL [Baidu] Sept. 2021 [close]
    PLATO-XL: Exploring the Large-scale Pre-training of Dialogue Generation [Preprint]

    Field: Language (Dialogue)
    Params: 11B
    Training Data: (1.2B samples)
  • Jurassic-1 [AI21 Labs] Aug. 2021 [close]
    Jurassic-1: Technical Details and Evaluation [Preprint]

    Field: Language
    Params: 178B
    Training petaFLOPs: 370M
    Architecture: De
  • Codex [OpenAI] July 2021 [close]
    Evaluating Large Language Models Trained on Code [Preprint]

    Field: Code Generation
    Params: 12B
    Training Data: 159GB
    Architecture: De
  • ERNIE 3.0 [Baidu] July 2021 [close]
    ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation [Preprint]

    Field: Language
    Params: 10B
    Training Data: 4TB (375B tokens, with knowledge graph)
    Architecture: En
    Objective: MLM
  • CPM-2 [BAAI] June 2021 [open]
    CPM-2: Large-scale Cost-effective Pre-trained Language Models [Preprint]

    Field: Language
    Params: 198B
    Training Data: 2.6TB (Chinese 2.3TB, English 300GB)
    Architecture: En-De
    Objective: MLM
  • HyperCLOVA [Naver] May 2021 [close]
    What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers [Preprint]

    Field: Language
    Params: 82B
    Training Data: (562B tokens, Korean)
    Training petaFLOPs: 63B
    Architecture: De
  • ByT5 [Google] May 2021 [open]
    ByT5: Towards a token-free future with pre-trained byte-to-byte models [TACL'22]

    Field: Language
    Params: 13B
    Training Data: (101 languages)
    Architecture: En-De
  • PanGu-α [Huawei] Apr. 2021 [close]
    PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation [Preprint]

    Field: Language
    Params: 200B
    Training Data: 1.1TB (Chinese)
    Training petaFLOPs: 58M
    Architecture: De
  • mT5 [Google] Mar. 2021 [open]
    mT5: A massively multilingual pre-trained text-to-text transformer [Preprint]

    Field: Language
    Params: 13B
    Training Data: (101 languages)
    Architecture: En-De
  • WuDao-WenHui [BAAI] Mar. 2021 [open]

    Field: Language
    Params: 2.9B
    Training Data: 303GB (Chinese)
  • GLM [BAAI] Mar. 2021 [open]
    GLM: General Language Model Pretraining with Autoregressive Blank Infilling [Preprint]

    Field: Language
    Params: 10B
    Architecture: De
  • Switch Transformer [Google] Jan. 2021 [open]
    Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity [Preprint]

    Field: Language
    Params: 1.6T
    Training Data: 750GB
    Training petaFLOPs: 82M
    Architecture: En-De, MoE
    Objective: MLM
  • CPM [BAAI] Dec. 2020 [open]
    CPM: A Large-scale Generative Chinese Pre-trained Language Model [Preprint]

    Field: Language
    Params: 2.6B
    Training Data: 100G (Chinese)
    Training petaFLOPs: 1.8M
    Architecture: De
    Objective: LTR
  • GPT-3 [OpenAI] May 2020 [close]
    Language Models are Few-Shot Learners [NeurIPS'20]

    Field: Language
    Params: 175B
    Training Data: 45TB (680B Tokens)
    Training Time: 95 A100 GPU years (835584 A100 GPU hours, 355 V100 GPU years)
    Training Cost: $4.6M
    Training petaFLOPs: 310M
    Architecture: De
    Objective: LTR
  • Blender [Meta] Apr. 2020 [close]
    Recipes for building an open-domain chatbot [Preprint]

    Field: Language (Dialogue)
    Params: 9.4B
  • T-NLG [Microsoft] Feb. 2020 [close]

    Field: Language
    Params: 17B
    Training petaFLOPs: 16M
    Architecture: De
    Objective: LTR
  • Meena [Google] Jan. 2020 [close]
    Towards a Human-like Open-Domain Chatbot [Preprint]

    Field: Language (Dialogue)
    Params: 2.6B
    Training Data: 341GB (40B words)
    Training petaFLOPs: 110M
  • DialoGPT [Microsoft] Nov. 2019 [open]
    DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation [ACL'20]

    Field: Language (Dialogue)
    Params: 762M
    Training Data: (147M conversation)
    Architecture: De
  • T5 [Google] Oct. 2019 [open]
    Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [JMLR'19]

    Field: Language
    Params: 11B
    Training Data: 800GB
    Training Cost: $1.5M
    Training petaFLOPs: 41M
    Architecture: En-De
    Objective: MLM
  • Megatron-LM [Nvidia] Sept. 2019 [open]
    Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism [Preprint]

    Field: Language
    Params: 8.3B
    Training Data: 174 GB
    Training petaFLOPs: 9.1M
    Architecture: De
    Objective: LTR
    Training Framework: Megatron
  • Megatron-BERT [Nvidia] Sept. 2019 [open]
    Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism [Preprint]

    Field: Language
    Params: 3.9B
    Training Data: 174 GB
    Training petaFLOPs: 57M
    Architecture: En
    Objective: MLM
    Training Framework: Megatron
  • RoBERTa [Meta] July 2019 [open]
    RoBERTa: A Robustly Optimized BERT Pretraining Approach [Preprint]

    Field: Language
    Params: 354M
    Training Data: 160GB
    Training Time: 1024 V100 GPU days
    Architecture: En
    Objective: MLM
  • XLNet [Google] June 2019 [open]
    XLNet: Generalized Autoregressive Pretraining for Language Understanding [NeurIPS'19]

    Field: Language
    Params: 340M
    Training Data: 113GB (33B words)
    Training Time: 1280 TPUv3 days
    Training Cost: $245k
    Architecture: En
    Objective: PLM
  • GPT-2 [OpenAI] Feb. 2019 [open]
    Language Models are Unsupervised Multitask Learners [Preprint]

    Field: Language
    Params: 1.5B
    Training Data: 40GB (8M web pages)
    Training Cost: $43k
    Training petaFLOPs: 1.5M
    Architecture: De
    Objective: LTR
  • BERT [Google] Oct. 2018 [open]
    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [NAACL'19]

    Field: Language
    Params: 330M
    Training Data: 16GB (3.3B words)
    Training Time: 64 TPUv2 days (280 V100 GPU days)
    Training Cost: $7k
    Training petaFLOPs: 290k
    Architecture: En
    Objective: MLM, NSP
  • GPT [OpenAI] June 2018 [open] Improving Language Understanding by Generative Pre-Training [Preprint]

    Field: Language
    Params: 117M
    Training Data: 1GB (7k books)
    Training petaFLOPs: 18k
    Architecture: De
    Objective: LTR
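
A quick back-of-the-envelope check of the GPU-hour and energy figures quoted in entries such as LLaMA and BLOOM above, using the LLaMA-65B numbers. This is only a sketch: the ~400 W average draw per 80G-A100 and the 1.1 overhead factor are illustrative assumptions, not values reported in the paper.

```python
# Rough sanity check of the LLaMA-65B training figures listed above.
gpus, days = 2048, 21

gpu_hours = gpus * days * 24
print(f"GPU hours: {gpu_hours:,}")   # ~1.03M, close to the listed 1,022,362

# Energy: assume ~400 W average draw per 80G-A100 and a datacenter overhead
# factor of 1.1 -- both numbers are assumptions for illustration only.
watts_per_gpu, overhead = 400, 1.1
mwh = gpu_hours * watts_per_gpu * overhead / 1e6   # Wh -> MWh
print(f"Energy: ~{mwh:.0f} MWh")     # same ballpark as the listed 449 MWh
```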

Vision Models

  • MAE->WSP-2B [Meta] Mar. 2023 [close]
    The effectiveness of MAE pre-pretraining for billion-scale pretraining

    Field: Vision
    Params: 6.5B
    Training Data: (3B images)
    Architecture: Transformer
    Objective: MAE, Weakly-Supervised
  • OpenCLIP G/14 [LAION] Mar. 2023 [open]

    Field: Vision-Language
    Params: 2.5B
    Training Data: (2B images)
  • ViT-22B [Google] Feb. 2023 [close] Scaling Vision Transformers to 22 Billion Parameters

    Field: Vision
    Params: 22B
    Training Data: (4B images)
    Architecture: Transformer
    Objective: Supervised
  • InternImage-G [Shanghai AI Lab] Nov. 2022 [open] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions [CVPR'23 Highlight]

    Field: Vision
    Params: 3B
    Architecture: CNN
    Core Operator: Deformable Convolution v3
  • Stable Diffusion [Stability AI] Aug. 2022 [open]

    Field: Image Generation (text to image)
    Params: 890M
    Training Data: (5B images)
    Architecture: Transformer, Diffusion
  • Imagen [Google] May 2022
    Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding [Preprint]

    Field: Image Generation (text to image)
    Text Encoder: T5
    Image Decoder: Diffusion, Upsampler
  • Flamingo [DeepMind] Apr. 2022 [close]
    Flamingo: a Visual Language Model for Few-Shot Learning [Preprint]

    Field: Vision-Language
    Params: 80B
  • DALL·E 2 [OpenAI] Apr. 2022
    Hierarchical Text-Conditional Image Generation with CLIP Latents [Preprint]

    Field: Image Generation (text to image)
    Text Encoder: GPT2 (CLIP)
    Image Encoder: ViT (CLIP)
    Image Decoder: Diffusion, Upsampler
  • BaGuaLu [BAAI, Alibaba] Apr. 2022
    BaGuaLu: targeting brain scale pretrained models with over 37 million cores [PPoPP'22]

    Field: Vision-Language
    Params: 174T
    Architecture: M6
  • SEER [Meta] Feb. 2022 [open]
    Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision [Preprint]

    Field: Vision
    Params: 10B
    Training Data: (1B images)
    Architecture: Convolution
    Objective: SwAV
  • ERNIE-ViLG [Baidu] Dec. 2021
    ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation [Preprint]

    Field: Image Generation (text to image)
    Params: 10B
    Training Data: (145M text-image pairs)
    Architecture: Transformer, dVAE + De
  • NUWA [Microsoft] Nov. 2021 [open]
    NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion [Preprint]

    Field: Vision-Language
    Generation: Image, Video
    Params: 870M
  • SwinV2-G [Microsoft] Nov. 2021 [open]
    Swin Transformer V2: Scaling Up Capacity and Resolution [CVPR'22]

    Field: Vision
    Params: 3B
    Training Data: (70M images)
    Architecture: Transformer
    Objective: Supervised
  • Zidongtaichu [CASIA] Sept. 2021 [close]

    Field: Image, Video, Language, Speech
    Params: 100B
  • ViT-G/14 [Google] June 2021
    Scaling Vision Transformers [Preprint]

    Field: Vision
    Params: 1.8B
    Training Data: (300M images)
    Training petaFLOPs: 3.4M
    Architecture: Transformer
    Objective: Supervised
  • CoAtNet [Google] June 2021 [open]
    CoAtNet: Marrying Convolution and Attention for All Data Sizes [NeurIPS'21]

    Field: Vision
    Params: 2.4B
    Training Data: (300M images)
    Architecture: Transformer, Convolution
    Objective: Supervised
  • V-MoE [Google] June 2021
    Scaling Vision with Sparse Mixture of Experts [NeurIPS'21]

    Field: Vision
    Params: 15B
    Training Data: (300M images)
    Training Time: 16.8k TPUv3 days
    Training petaFLOPs: 33.9M
    Architecture: Transformer, MoE
    Objective: Supervised
  • CogView [BAAI, Alibaba] May 2021
    CogView: Mastering Text-to-Image Generation via Transformers [NeurIPS'21]

    Field: Vision-Language
    Params: 4B
    Training Data: (30M text-image pairs)
    Training petaFLOPs: 27M
    Image Encoder: VAE
    Text Encoder & Image Decoder: GPT2
  • M6 [Alibaba] Mar. 2021
    M6: A Chinese Multimodal Pretrainer [Preprint]

    Field: Vision-Language
    Params: 10T
    Training Data: 300G Texts + 2TB Images
    Training petaFLOPs: 5.5M
    Fusion: Single-stream
    Objective: MLM, IC
  • DALL·E [OpenAI] Feb. 2021
    Zero-Shot Text-to-Image Generation [ICML'21]

    Field: Image Generation (text to image)
    Params: 12B
    Training Data: (250M text-image pairs)
    Training petaFLOPs: 47M
    Image Encoder: dVAE
    Text Encoder & Image Decoder: GPT2
  • CLIP [OpenAI] Jan. 2021
    Learning Transferable Visual Models From Natural Language Supervision [ICML'21]

    Field: Vision-Language
    Training Data: 400M text-image pairs
    Training petaFLOPs: 11M
    Image Encoder: ViT
    Text Encoder: GPT-2
    Fusion: Dual Encoder
    Objective: CMCL
  • ViT-H/14 [Google] Oct. 2020 [open]
    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [ICLR'21]

    Field: Vision
    Params: 632M
    Training Data: (300M images)
    Training petaFLOPs: 13M
    Architecture: Transformer
    Objective: Supervised
  • iGPT-XL [OpenAI] June 2020 [open]
    Generative Pretraining From Pixels [ICML'20]

    Field: Image Generation
    Params: 6.8B
    Training Data: (1M images)
    Training petaFLOPs: 33M
    Architecture: Transformer, De
  • BigGAN-deep [DeepMind] Sept. 2018 [open]
    Large Scale GAN Training for High Fidelity Natural Image Synthesis [ICLR'19]

    Field: Image Generation
    Params: 158M
    Training Data: (300M images)
    Training petaFLOPs: 3M
    Architecture: Convolution, GAN
    Resolution: 512x512

Reinforcement Learning

  • PaLM-E [Google] Mar. 2023 [close]
    PaLM-E: An Embodied Multimodal Language Model [Preprint]

    Field: Reinforcement Learning
    Params: 562B (540B LLM + 22B ViT)
  • Gato [DeepMind] May 2022 [close]
    A Generalist Agent [Preprint]

    Field: Reinforcement Learning
    Params: 1.2B
    Training Data: (604 Tasks)
    Objective: Supervised

Speech

  • USM [Google] Mar. 2023 [close]
    Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages [Preprint]

    Field: Speech
    Params: 2B
    Training Data: 12,000,000 hours
  • Whisper [OpenAI] Sept. 2022 [close]
    Robust Speech Recognition via Large-Scale Weak Supervision [Preprint]

    Field: Speech
    Params: 1.55B
    Training Data: 680,000 hours
    Objective: Weakly Supervised
  • HuBERT [Meta] June 2021 [open]
    HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units [Preprint]

    Field: Speech
    Params: 1B
    Training Data: 60,000 hours
    Objective: MLM
  • wav2vec 2.0 [Meta] Oct. 2020 [open]
    wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations [NeurIPS'20]

    Field: Speech
    Params: 317M
    Training Data: 50,000 hours
    Training petaFLOPs: 430M
    Objective: MLM
  • DeepSpeech 2 [Baidu] Dec. 2015 [open]
    Deep Speech 2: End-to-End Speech Recognition in English and Mandarin [ICML'16]

    ```yaml
    Field: Speech
    Params: 300M
    Training Data: 21,340 hours
    ```
    

Science

  • AlphaFold 2 [DeepMind] July 2021 [open]
    Highly accurate protein structure prediction with AlphaFold [Nature]

    Field: Biology
    Params: 21B
    Training petaFLOPs: 100k

Distributed Training Framework

Deep learning frameworks supporting distributed training are marked with *.

PyTorch Ecosystem

XLA Ecosystem

Other Frameworks

Inference Frameworks

Recommendation Training Framework

  • HET [Tencent] Dec. 2021
    HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework [VLDB'22]

  • Persia [Kuaishou] Nov. 2021
    Persia: An Open, Hybrid System Scaling Deep Learning-based Recommenders up to 100 Trillion Parameters [Preprint]

    Embeddings Params: 100T
  • ZionEX [Meta] Apr. 2021
    Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models [ISCA'21]

    Embeddings Params: 10T
  • ScaleFreeCTR [Huawei] Apr. 2021
    ScaleFreeCTR: MixCache-based Distributed Training System for CTR Models with Huge Embedding Table [SIGIR'21]

  • Kraken [Kuaishou] Nov. 2020
    Kraken: Memory-Efficient Continual Learning for Large-Scale Real-Time Recommendations [SC'20]

  • TensorNet [Qihoo360] Sept. 2020 [open]

  • HierPS [Baidu] Mar. 2020
    Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems [MLSys'20]

  • AIBox [Baidu] Oct. 2019
    AIBox: CTR Prediction Model Training on a Single Node [CIKM'19]

    Embeddings Params: 0.1T
  • XDL [Alibaba] Aug. 2019
    XDL: an industrial deep learning framework for high-dimensional sparse data [DLP-KDD'19]

    Embeddings Params: 0.01T

Key Explanations

  • Company tags: the most related company name; other institutes may also be involved in the work.
  • Params: number of parameters of the largest model
  • Training data size, training cost and training petaFLOPs may have some uncertainty.
  • Training cost
    • TPUv2 hour: $4.5
    • TPUv3 hour: $8
    • V100 GPU hour: $0.55 (2022)
    • A100 GPU hour: $1.10 (2022)
  • Architecture
    • En: Encoder-based Language Model
    • De: Decoder-based Language Model
    • En-De: Encoder-Decoder-based Language Model
    • All three architectures above are Transformer-based.
    • MoE: Mixture of Experts
  • Objective (see the explanations in Sections 6–8 of this paper)
    • MLM: Masked Language Modeling
    • LTR: Left-To-Right Language Modeling
    • NSP: Next Sentence Prediction
    • PLM: Permuted Language Modeling
    • IC: Image Captioning
    • VLM: Vision-Language Matching
    • CMCL: Cross-Modal Contrastive Learning
  • FLOPs: number of FLOating-Point operations [explanation]
    • 1 petaFLOPs = 1e15 FLOPs
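
The Training petaFLOPs figures above can be roughly cross-checked with the common ~6 x params x tokens estimate for dense Transformer training, and the dollar costs with the per-hour rates listed above. A minimal sketch: the 6ND rule is a standard approximation rather than the exact accounting used in each paper, and the helper names are made up for illustration.

```python
# Helpers for reproducing the "Training petaFLOPs" and "Training Cost" fields.

GPU_HOUR_USD = {"V100": 0.55, "A100": 1.10}   # 2022 rates listed above
TPU_HOUR_USD = {"TPUv2": 4.5, "TPUv3": 8.0}

def train_petaflops(params: float, tokens: float) -> float:
    """Approximate training compute: ~6 FLOPs per parameter per token."""
    return 6 * params * tokens / 1e15          # 1 petaFLOPs = 1e15 FLOPs

def train_cost_usd(hours: float, device: str = "A100") -> float:
    """Estimated training cost from accelerator hours at the rates above."""
    return hours * {**GPU_HOUR_USD, **TPU_HOUR_USD}[device]

# Chinchilla: 70B params x 1.4T tokens -> ~588M petaFLOPs (listed: 580M).
print(f"{train_petaflops(70e9, 1.4e12) / 1e6:.0f}M petaFLOPs")

# BLOOM: ~1M 80G-A100 hours -> ~$1.1M at the 2022 A100 rate.
print(f"${train_cost_usd(1_000_000, 'A100'):,.0f}")
```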
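
For the objective abbreviations, a toy illustration of the two most common ones, MLM and LTR; the token lists are made up for illustration.

```python
# MLM (e.g., BERT): mask some positions and predict them from both sides.
# LTR (e.g., GPT): predict each next token from the left context only.
tokens = ["the", "model", "reads", "the", "corpus"]

masked_input = ["the", "[MASK]", "reads", "the", "[MASK]"]
mlm_targets  = {1: "model", 4: "corpus"}    # position -> original token

ltr_inputs  = tokens[:-1]                   # ["the", "model", "reads", "the"]
ltr_targets = tokens[1:]                    # ["model", "reads", "the", "corpus"]
```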
