LLM Foundry

This repository contains code for training, finetuning, evaluating, and deploying LLMs for inference with Composer and the MosaicML platform. Designed to be easy to use, efficient, and flexible, this codebase enables rapid experimentation with the latest techniques.

You'll find in this repo:

  • llmfoundry/ - source code for models, datasets, callbacks, utilities, etc.
  • scripts/ - scripts to run LLM workloads
    • data_prep/ - convert text data from original sources to StreamingDataset format
    • train/ - train or finetune HuggingFace and MPT models from 125M - 70B parameters
      • train/benchmarking - profile training throughput and MFU
    • inference/ - convert models to HuggingFace or ONNX format, and generate responses
      • inference/benchmarking - profile inference latency and throughput
    • eval/ - evaluate LLMs on academic (or custom) in-context-learning tasks
  • mcli/ - launch any of these workloads using MCLI and the MosaicML platform
  • TUTORIAL.md - a deeper dive into the repo, example workflows, and FAQs

DBRX

DBRX is a state-of-the-art open-source LLM trained by the Databricks Mosaic team. It uses a Mixture-of-Experts (MoE) architecture and was trained with optimized versions of Composer, LLM Foundry, and MegaBlocks. The model has 132B total parameters and 36B active parameters. We have released two DBRX models:

Model          Context Length  Download
DBRX Base      32768           https://huggingface.co/databricks/dbrx-base
DBRX Instruct  32768           https://huggingface.co/databricks/dbrx-instruct

Our model weights and code are licensed for both researchers and commercial entities. The Databricks Open Source License can be found at LICENSE, and our Acceptable Use Policy can be found here.

For more information about the DBRX models, see https://github.com/databricks/dbrx.
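
As a rough sketch (not an official recipe), DBRX Instruct can be loaded through the Hugging Face transformers library as shown below. The gated-repo access, hardware sizing, and trust_remote_code details here are assumptions to verify against the DBRX repo.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes you have accepted the DBRX license on the Hugging Face Hub and exported
# HUGGING_FACE_HUB_TOKEN; the 132B-parameter model needs several 80GB GPUs, and
# device_map="auto" assumes `accelerate` is installed.
tokenizer = AutoTokenizer.from_pretrained("databricks/dbrx-instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "databricks/dbrx-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",        # shard across available GPUs
    trust_remote_code=True,   # may be unnecessary on newer transformers versions
)

inputs = tokenizer("What is a Mixture-of-Experts model?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))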

MPT

Mosaic Pretrained Transformers (MPT) are GPT-style models with some special features -- Flash Attention for efficiency, ALiBi for context length extrapolation, and stability improvements to mitigate loss spikes. As part of MosaicML's Foundation series, we have open-sourced several MPT models:

Model               Context Length  Download                                             Commercial use?
MPT-30B             8192            https://huggingface.co/mosaicml/mpt-30b              Yes
MPT-30B-Instruct    8192            https://huggingface.co/mosaicml/mpt-30b-instruct     Yes
MPT-30B-Chat        8192            https://huggingface.co/mosaicml/mpt-30b-chat         No
MPT-7B-8k           8192            https://huggingface.co/mosaicml/mpt-7b-8k            Yes
MPT-7B-8k-Chat      8192            https://huggingface.co/mosaicml/mpt-7b-8k-chat       No
MPT-7B              2048            https://huggingface.co/mosaicml/mpt-7b               Yes
MPT-7B-Instruct     2048            https://huggingface.co/mosaicml/mpt-7b-instruct      Yes
MPT-7B-Chat         2048            https://huggingface.co/mosaicml/mpt-7b-chat          No
MPT-7B-StoryWriter  65536           https://huggingface.co/mosaicml/mpt-7b-storywriter   Yes

To try out these models locally, follow the instructions in scripts/inference/README.md to prompt HF models using our hf_generate.py or hf_chat.py scripts.
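
For example, a quick smoke test with hf_generate.py might look like the following sketch; the model choice and prompt are arbitrary, and the --trust_remote_code flag is needed because the MPT checkpoints on the Hub ship custom model code:

cd scripts/inference
python hf_generate.py \
  --name_or_path mosaicml/mpt-7b-instruct \
  --trust_remote_code \
  --max_new_tokens 128 \
  --prompts "Here's a quick recipe for baking chocolate chip cookies: Start by"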

MPT Community

We've been overwhelmed by all the amazing work the community has put into MPT! Here we provide a few links to some of them:

  • ReplitLM: replit-code-v1-3b is a 2.7B Causal Language Model focused on Code Completion. The model has been trained on a subset of the Stack Dedup v1.2 dataset covering 20 languages such as Java, Python, and C++
  • LLaVa-MPT: Visual instruction tuning to get MPT multimodal capabilities
  • ggml: Optimized MPT version for efficient inference on consumer hardware
  • GPT4All: locally running chat system, now with MPT support!
  • Q8MPT-Chat: 8-bit optimized MPT for CPU by our friends at Intel


Something missing? Contribute with a PR!


Hardware and Software Requirements

This codebase has been tested with PyTorch 2.2 on NVIDIA A100s and H100s. It may also work on systems with other devices, such as consumer NVIDIA cards and AMD cards, but we are not actively testing these systems. If you have success or failure using LLM Foundry on other systems, please let us know in a GitHub issue and we will update the support matrix!

Device          Torch Version  CUDA Version  Status
A100-40GB/80GB  2.2.1          12.1          ✅ Supported
H100-80GB       2.2.1          12.1          ✅ Supported

MosaicML Docker Images

We highly recommend using our prebuilt Docker images. You can find them here: https://hub.docker.com/orgs/mosaicml/repositories.

The mosaicml/pytorch images are pinned to specific PyTorch and CUDA versions, and are stable and rarely updated.

The mosaicml/llm-foundry images are built with new tags on every commit to the main branch. You can select a specific commit hash, such as mosaicml/llm-foundry:2.2.1_cu121_flash2-36ab1ba, or use the latest one via mosaicml/llm-foundry:2.2.1_cu121_flash2-latest.

Please note: the mosaicml/llm-foundry images do not come with the llm-foundry package preinstalled, just the dependencies. You will still need to pip install llm-foundry, either from PyPI or from source.

Docker Image                                          Torch Version  CUDA Version       LLM Foundry dependencies installed?
mosaicml/pytorch:2.2.1_cu121-python3.11-ubuntu20.04   2.2.1          12.1 (Infiniband)  No
mosaicml/llm-foundry:2.2.1_cu121_flash2-latest        2.2.1          12.1 (Infiniband)  Yes
mosaicml/llm-foundry:2.2.1_cu121_flash2_aws-latest    2.2.1          12.1 (EFA)         Yes
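
For example, pulling the latest LLM Foundry image and starting an interactive GPU container might look like the following sketch (it assumes Docker with the NVIDIA Container Toolkit is installed; remember that llm-foundry itself still needs to be pip installed inside the container):

docker pull mosaicml/llm-foundry:2.2.1_cu121_flash2-latest
docker run -it --rm --gpus all mosaicml/llm-foundry:2.2.1_cu121_flash2-latest /bin/bash
# Inside the container, follow the Installation steps below to clone and install llm-foundry.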

Installation

This assumes you already have PyTorch, CMake, and packaging installed. If not, you can install them with pip install cmake packaging torch.

To get started, clone the repo and set up your environment. Instructions to do so differ slightly depending on whether you're using Docker.

With Docker (recommended)

We strongly recommend working with LLM Foundry inside a Docker container (see our recommended Docker image above). If you are doing so, follow these steps to clone the repo and install the requirements.

git clone https://github.com/mosaicml/llm-foundry.git
cd llm-foundry
pip install -e ".[gpu]"  # or `pip install -e .` if no NVIDIA GPU.

Without Docker (not recommended)

If you choose not to use Docker, you should create and use a virtual environment.

git clone https://github.com/mosaicml/llm-foundry.git
cd llm-foundry

# Create and activate a virtual environment
python3 -m venv llmfoundry-venv
source llmfoundry-venv/bin/activate

pip install cmake packaging torch  # setup.py requires these be installed

pip install -e ".[gpu]"  # or `pip install -e .` if no NVIDIA GPU.

TransformerEngine and amp_fp8 support

NVIDIA H100 GPUs support FP8; using it additionally requires the following installations:

pip install flash-attn==1.0.7 --no-build-isolation
pip install git+https://github.com/NVIDIA/[email protected]

See here for more details on enabling TransformerEngine layers and amp_fp8.
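
Once those packages are installed, FP8 is enabled through the training config. The override below is only a sketch: precision=amp_fp8 follows Composer's precision naming, and model.fc_type=te (TransformerEngine linear layers) is an assumption to verify against the linked documentation.

composer train/train.py \
  train/yamls/pretrain/mpt-125m.yaml \
  precision=amp_fp8 \
  model.fc_type=te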

AMD (BETA support)

In our testing of AMD GPUs, the env setup includes:

git clone https://github.com/mosaicml/llm-foundry.git
cd llm-foundry

# Create and activate a virtual environment
python3 -m venv llmfoundry-venv-amd
source llmfoundry-venv-amd/bin/activate

# installs
pip install cmake packaging torch
pip install -e .  # This installs some things that are not needed but they don't hurt
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2

Lastly, install the ROCm enabled flash attention (instructions here).

Notes:

  1. We don't yet have a Docker image where everything works perfectly. You might need to up/downgrade some packages (in our case, we needed to downgrade to numpy==1.23.5) before everything works without issue.

Intel Gaudi

Support for LLM Foundry on Intel Gaudi devices is experimental. Please use the habana_alpha branch and see the README on that branch for install instructions and known issues.

For training and inference performance results on Intel Gaudi2 accelerators, see our blog: https://www.databricks.com/blog/llm-training-and-inference-intel-gaudi2-ai-accelerators

Quickstart

Note: Make sure to go through the installation steps above before trying the quickstart!

Here is an end-to-end workflow for preparing a subset of the C4 dataset, training an MPT-125M model for 10 batches, converting the model to HuggingFace format, evaluating the model on the COPA task, and generating responses to prompts.

(Remember, this is a quickstart just to demonstrate the tools. To get good quality, the LLM must be trained for longer than 10 batches 😄)

cd scripts

# Convert C4 dataset to StreamingDataset format
python data_prep/convert_dataset_hf.py \
  --dataset c4 --data_subset en \
  --out_root my-copy-c4 --splits train_small val_small \
  --concat_tokens 2048 --tokenizer EleutherAI/gpt-neox-20b --eos_text '<|endoftext|>'

# Train an MPT-125m model for 10 batches
composer train/train.py \
  train/yamls/pretrain/mpt-125m.yaml \
  data_local=my-copy-c4 \
  train_loader.dataset.split=train_small \
  eval_loader.dataset.split=val_small \
  max_duration=10ba \
  eval_interval=0 \
  save_folder=mpt-125m

# Convert the model to HuggingFace format
python inference/convert_composer_to_hf.py \
  --composer_path mpt-125m/ep0-ba10-rank0.pt \
  --hf_output_path mpt-125m-hf \
  --output_precision bf16 \
  # --hf_repo_for_upload user-org/repo-name

# Evaluate the model on a subset of tasks
composer eval/eval.py \
  eval/yamls/hf_eval.yaml \
  icl_tasks=eval/yamls/copa.yaml \
  model_name_or_path=mpt-125m-hf

# Generate responses to prompts
python inference/hf_generate.py \
  --name_or_path mpt-125m-hf \
  --max_new_tokens 256 \
  --prompts \
    "The answer to life, the universe, and happiness is" \
    "Here's a quick recipe for baking chocolate chip cookies: Start by"

Note: the composer command used above to train the model refers to the Composer library's distributed launcher.

If you have a write-enabled HuggingFace auth token, you can optionally upload your model to the Hub! Just export your token like this:

export HUGGING_FACE_HUB_TOKEN=your-auth-token

and uncomment the line containing --hf_repo_for_upload ... in the above call to inference/convert_composer_to_hf.py.
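
For reference, the conversion call from the quickstart with that flag uncommented would look like this (user-org/repo-name is a placeholder for your own Hub repo):

python inference/convert_composer_to_hf.py \
  --composer_path mpt-125m/ep0-ba10-rank0.pt \
  --hf_output_path mpt-125m-hf \
  --output_precision bf16 \
  --hf_repo_for_upload user-org/repo-name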

Registry

You can use the registry to customize your workflows without forking the library. Some components of LLM Foundry are registrable, such as models, loggers, and callbacks. This means that you can register new options for these components, and then use them in your yaml config.

Discovering registrable components

To help find and understand registrable components, you can use the llmfoundry registry CLI command.

We currently provide two commands:

  • llmfoundry registry get [--group]: List all registries and their components, optionally restricted to a specific group. Example usage: llmfoundry registry get --group loggers or llmfoundry registry get
  • llmfoundry registry find <group> <name>: Get information about a specific registered component. Example usage: llmfoundry registry find loggers wandb

Use --help on any of these commands for more information.

How to register

There are a few ways to register a new component:

Python entrypoints

You can specify registered components via a Python entrypoint if you are building your own package with registered components.

For example, the following would register the MyLogger class from a foundry_registry package, under the key my_logger, in the llmfoundry loggers registry:

[build-system]
requires = ["setuptools>=42", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "foundry_registry"
version = "0.1.0"
dependencies = [
    "mosaicml",
    "llm-foundry",
]

[project.entry-points."llmfoundry_loggers"]
my_logger = "foundry_registry.loggers:MyLogger"

Direct call to register

You can also register a component directly in your code:

from composer.loggers import LoggerDestination
from llmfoundry.registry import loggers

class MyLogger(LoggerDestination):
    pass

loggers.register("my_logger", func=MyLogger)

Decorators

You can also use decorators to register components directly from your code:

from composer.loggers import LoggerDestination
from llmfoundry.registry import loggers

@loggers.register("my_logger")
class MyLogger(LoggerDestination):
    pass

For both the direct call and decorator approaches, if using the LLM Foundry train/eval scripts, you will need to provide the code_paths argument, which is a list of files that need to be executed in order to register your components. For example, you may have a file called foundry_imports.py that contains the following:

from foundry_registry.loggers import MyLogger
from llmfoundry.registry import loggers

loggers.register("my_logger", func=MyLogger)

You would then provide code_paths to the train/eval scripts in your yaml config:

...
code_paths:
  - foundry_imports.py
...

Learn more about LLM Foundry!

Check out TUTORIAL.md to keep learning about working with LLM Foundry. The tutorial highlights example workflows, points you to other resources throughout the repo, and answers frequently asked questions!

Contact Us

If you run into any problems with the code, please file GitHub issues directly in this repo.

If you want to train LLMs on the MosaicML platform, reach out to us at [email protected]!



llm-foundry's Issues

FasterTransformer

Hi, I saw in the MPT model card that the models can run with FasterTransformer, but I didn't find any details about that anywhere. Can you share the conversion scripts or help with this?

Thanks

Finetune MPT models with local dataset

Hello Team,

Can you please give guidance on how to finetune on local datasets? The instructions given in scripts/train are not very clear. The yaml below was given as a sample example:

train_loader:
  name: finetuning
  dataset:
    hf_name: my-local-dataset
    hf_kwargs:
      data_files:
        train: /path/to/train.jsonl
    preprocessing_fn: my.import.path:my_preprocessing_fn
    split: train

So, do we need to create a loading script to load the local dataset using huggingface datasets, or is there a way we can directly use jsonl file paths instead of converting to a huggingface dataset? If so, what changes do I need to make in the yaml file?

Reproduce result of Boolq on LLaMA-7B

Hi
The zero-shot performance on BoolQ in the LLaMA paper is 76.5, while llm-foundry gives only 62.16 (zero-shot) when following tasks.yaml. Is the result in the blog few-shot? What is the zero-shot result on BoolQ using llm-foundry, and where can I find the config to reproduce the result in the blog image?

---update---

metrics/boolq/10-shot/InContextLearningMultipleChoiceAccuracy: 0.734413206577301

Add GGML support

It would be nice to have the model supported by GGML, so as to make quantized versions of it or future derivatives also run without GPU. See ggerganov/llama.cpp#1333 (comment)

I gave it a shot myself today, but only managed to get the quantized version to output gibberish - apparently it's not just the state_dict parameter names in the checkpoint that are different from LLaMA (and also from the already supported dolly-v2 and stablelm models), but also the actual transformer implementation... the meat which would go into ggml/examples/mpt/main.cpp... which is way over my head.

Perhaps someone from your dev team could at least assess how difficult such an integration would be - for someone who actually understands the implementation and/or LLMs in general. My feeling is that it requires more understanding of the mosaicml model than of GGML.

How to run on a V100 GPU

When I run “composer train.py yamls/mpt/125m.yaml train_loader.dataset.split=train_small eval_loader.dataset.split=val_small”, I get the error below. My GPU is a V100.


Traceback (most recent call last):
File "", line 21, in _bwd_kernel
KeyError: ('2-.-0-.-0-1e8410f206c822547fb50e2ea86e45a6-2b0c5161c53c71b37ae20a9996ee4bb8-c1f92808b4e4644c1732e8338187ac87-42648570729a4835b21c1c18cebedbfe-12f7ac1ca211e037f62a7c0c323d9990-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.float16, torch.float16, torch.float16, None, torch.float16, torch.float32, torch.float16, torch.float16, torch.float32, torch.float32, 'fp32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), ('none', True, 64, False, True, True, True, 128, 128), (True, True, True, (False,), True, True, True, True, True, True, (False,), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (False, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train.py", line 254, in
main(cfg)
File "train.py", line 243, in main
trainer.fit()
File "/root/mpt-env/lib/python3.8/site-packages/composer/trainer/trainer.py", line 1766, in fit
self._train_loop()
File "/root/mpt-env/lib/python3.8/site-packages/composer/trainer/trainer.py", line 1940, in _train_loop
total_loss_dict = self._train_batch(use_grad_scaling)
File "/root/mpt-env/lib/python3.8/site-packages/composer/trainer/trainer.py", line 2118, in _train_batch
self._train_microbatches(microbatches, total_loss_dict)
File "/root/mpt-env/lib/python3.8/site-packages/composer/trainer/trainer.py", line 2213, in _train_microbatches
microbatch_loss_dict = self._train_microbatch(use_grad_scaling, current_batch_size, is_final_microbatch)
File "/root/mpt-env/lib/python3.8/site-packages/composer/trainer/trainer.py", line 2340, in _train_microbatch
microbatch_loss.backward(create_graph=self._backwards_create_graph)
File "/root/mpt-env/lib/python3.8/site-packages/torch/_tensor.py", line 488, in backward
torch.autograd.backward(
File "/root/mpt-env/lib/python3.8/site-packages/torch/autograd/init.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/root/mpt-env/lib/python3.8/site-packages/torch/autograd/function.py", line 267, in apply
return user_fn(self, *args)
File "/root/mpt-env/lib/python3.8/site-packages/flash_attn/flash_attn_triton.py", line 827, in backward
_flash_attn_backward(do, q, k, v, o, lse, dq, dk, dv,
File "/root/mpt-env/lib/python3.8/site-packages/flash_attn/flash_attn_triton.py", line 694, in _flash_attn_backward
_bwd_kernel[grid](
File "/root/mpt-env/lib/python3.8/site-packages/triton/runtime/jit.py", line 106, in launcher
return self.run(*args, grid=grid, **kwargs)
File "/root/mpt-env/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 73, in run
timings = {config: self._bench(*args, config=config, **kwargs)
File "/root/mpt-env/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 73, in
timings = {config: self._bench(*args, config=config, **kwargs)
File "/root/mpt-env/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 63, in _bench
return do_bench(kernel_call)
File "/root/mpt-env/lib/python3.8/site-packages/triton/testing.py", line 140, in do_bench
fn()
File "/root/mpt-env/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 62, in kernel_call
self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **current)
File "/root/mpt-env/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 200, in run
return self.fn.run(*args, **kwargs)
File "", line 43, in _bwd_kernel
RuntimeError: Triton Error [CUDA]: invalid argument
ERROR:composer.cli.launcher:Rank 0 crashed with exit code 1.
Waiting up to 30 seconds for all training processes to terminate. Press Ctrl-C to exit immediately.
Global rank 0 (PID 6259) exited with code 1
ERROR:composer.cli.launcher:Global rank 0 (PID 6259) exited with code 1

Windows support ?

Hello,
I have been unable to run the model on Windows, since the install fails: it requires Triton, which is only supported on Linux.

Any ideas?

Thanks in advance

Configs say optimizer is AdamW but blog post says the optimizer is LION

The training configs (e.g. https://github.com/mosaicml/llm-foundry/blob/main/scripts/train/yamls/mpt/7b.yaml#L60) say the optimizer is decoupled_adamw, which calls through to here returning a DecoupledAdamW. The announcement blog post says all the models are trained with the LION optimizer:

We also train our MPT models with [the Lion optimizer](https://arxiv.org/abs/2302.06675) rather than AdamW,
which provides stable update magnitudes and cuts optimizer state memory in half.

Are the training config yamls in this repository supposed to be the ones used for the MPT training runs? If not, will the actual configs be made available?

Broken on docker image?

I am trying to follow the Quickstart guide on the mosaicml/pytorch docker image and running into issues when trying the exact commands.

The training step is broken. In particular, there seems to be an issue setting up the StreamingDataset required for training.

For example, the command in the training README:

python ../../llmfoundry/data/text_data.py --local_path ./my-copy-c4 --split val_small

is broken:

Bus error (core dumped)
root@e81df48d8ecb:/home/llm-foundry# /usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 2 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Is this just me or can anyone repro?

URL not found

I am getting:

huggingface_hub.utils._errors.HfHubHTTPError: 404 Client Error: Not Found for url: https://huggingface.co/api/models/mosaicml/mpt-7b-chat 

the problem about mosaicml-streaming

when I follow pip install -e ".[gpu]",I find the error about mosaicml-streaming
#-------------------------------------------------------------------------------------------
root@7730f5bd29fa:/home/mosaicml/llm-foundry# pip list|grep stream
mosaicml-streaming 0.2.1
root@7730f5bd29fa:/home/mosaicml/llm-foundry#
root@7730f5bd29fa:/home/mosaicml/llm-foundry# python scripts/data_prep/convert_dataset_hf.py --dataset c4 --data_subset en --out_root ./my-copy-c4 --splits train_small val_small --concat_tokens 2048 --tokenizer EleutherAI/gpt-neox-20b --eos_text '<|endoftext|>'
Traceback (most recent call last):
  File "/home/mosaicml/llm-foundry/llmfoundry/__init__.py", line 8, in <module>
    from llmfoundry.data import (ConcatTokensDataset,
  File "/home/mosaicml/llm-foundry/llmfoundry/data/__init__.py", line 5, in <module>
    from llmfoundry.data.denoising import (MixtureOfDenoisersCollator,
  File "/home/mosaicml/llm-foundry/llmfoundry/data/denoising.py", line 19, in <module>
    from llmfoundry.data.text_data import StreamingTextDataset
  File "/home/mosaicml/llm-foundry/llmfoundry/data/text_data.py", line 15, in <module>
    from streaming import Stream, StreamingDataset
ImportError: cannot import name 'Stream' from 'streaming' (/usr/lib/python3/dist-packages/streaming/__init__.py)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/mosaicml/llm-foundry/scripts/data_prep/convert_dataset_hf.py", line 19, in <module>
    from llmfoundry.data.datasets import ConcatTokensDataset, NoConcatDataset
  File "/home/mosaicml/llm-foundry/llmfoundry/__init__.py", line 32, in <module>
    raise ImportError(
ImportError: Please make sure to pip install . to get the requirements for the LLM example.
root@7730f5bd29fa:/home/mosaicml/llm-foundry#

TypeError: Object of type DictConfig is not JSON serializable for HFCausalLM

When using train.py in scripts/train with a config like

model:
  name: hf_causal_lm
  device: cpu
  pretrained: true
  pretrained_model_name_or_path: mosaicml/mpt-7b
  config_overrides:
    attn_config:
      attn_impl: triton

we get the error

Traceback (most recent call last):
  File "/home/paperspace/llm-foundry/scripts/train/train.py", line 376, in <module>
    main(cfg)
  File "/home/paperspace/llm-foundry/scripts/train/train.py", line 365, in main
    trainer.fit()
  File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/composer/trainer/trainer.py", line 1766, in fit
    self._train_loop()
  File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/composer/trainer/trainer.py", line 1996, in _train_loop
    self.engine.run_event(Event.BATCH_CHECKPOINT)
  File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/composer/core/engine.py", line 293, in run_event
    self._run_nonlogger_callbacks(event)
  File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/composer/core/engine.py", line 475, in _run_nonlogger_callbacks
    self._run_callbacks(event, callbacks)
  File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/composer/core/engine.py", line 467, in _run_callbacks
    cb.run_event(event, self.state, self.logger)
  File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/composer/core/callback.py", line 96, in run_event
    return event_cb(state, logger)
  File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/composer/callbacks/checkpoint_saver.py", line 346, in batch_checkpoint
    self._save_checkpoint(
  File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/composer/callbacks/checkpoint_saver.py", line 384, in _save_checkpoint
    saved_path = checkpoint.save_checkpoint(
  File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/composer/utils/checkpoint.py", line 518, in save_checkpoint
    'state': state.state_dict(),
  File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/composer/core/state.py", line 838, in state_dict
    state_dict['integrations'] = self._get_integrations_state_dict()
  File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/composer/core/state.py", line 727, in _get_integrations_state_dict
    integrations['huggingface'] = self.model.get_metadata()
  File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/composer/models/huggingface.py", line 404, in get_metadata
    self.model.config.save_pretrained(model_dir)
  File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/transformers/configuration_utils.py", line 456, in save_pretrained
    self.to_json_file(output_config_file, use_diff=True)
  File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/transformers/configuration_utils.py", line 845, in to_json_file
    writer.write(self.to_json_string(use_diff=use_diff))
  File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/transformers/configuration_utils.py", line 831, in to_json_string
    return json.dumps(config_dict, indent=2, sort_keys=True) + "\n"
  File "/usr/lib/python3.9/json/__init__.py", line 234, in dumps
    return cls(
  File "/usr/lib/python3.9/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/usr/lib/python3.9/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.9/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/usr/lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type DictConfig is not JSON serializable

when a checkpoint is being saved. There's a chance I'm passing in the info incorrectly though

Evaluation result mismatch

I tried to run the 0-shot evaluation on winograd for the MPT-7b model. This is the result that I got:
Eval metrics/winograd/0-shot/InContextLearningMultipleChoiceAccuracy: 0.5055

This is the script that I use:
python eval/eval.py \
  eval/yamls/hf_eval.yaml \
  icl_tasks=eval/yamls/winograd.yaml \
  model_name_or_path=mosaicml/mpt-7b

The reported number for 0-shot winograd for MPT-7b is 0.878. Is there anything missing here?


Error in FSDP with composer

When finetuning an MPT-7B model with 8 GPUs, I get the following error when training is about to begin (after model and dataset loading, etc.):

Traceback (most recent call last):
  File "scripts/train/train.py", line 254, in <module>
    main(cfg)
  File "scripts/train/train.py", line 197, in main
    trainer = Trainer(
  File "/mnt/nvme/home/llm_foundry/lib/python3.8/site-packages/composer/trainer/trainer.py", line 1330, in __init__
    self._rng_state = checkpoint.load_checkpoint(
  File "/mnt/nvme/home/llm_foundry/lib/python3.8/site-packages/composer/utils/checkpoint.py", line 216, in load_checkpoint
    rng_state_dicts = _restore_checkpoint(
  File "/mnt/nvme/home/llm_foundry/lib/python3.8/site-packages/composer/utils/checkpoint.py", line 446, in _restore_checkpoint
    state_dict = safe_torch_load(composer_states_filepath)
  File "/mnt/nvme/home/llm_foundry/lib/python3.8/site-packages/composer/utils/checkpoint.py", line 421, in safe_torch_load
    state_dict = torch.load(composer_states_filepath, map_location=map_location)
  File "/mnt/nvme/home/llm_foundry/lib/python3.8/site-packages/torch/serialization.py", line 771, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/mnt/nvme/home/llm_foundry/lib/python3.8/site-packages/torch/serialization.py", line 270, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/mnt/nvme/home/llm_foundry/lib/python3.8/site-packages/torch/serialization.py", line 251, in __init__
    super(_open_file, self).__init__(open(name, mode))
IsADirectoryError: [Errno 21] Is a directory: '/tmp/tmp7_pd5dbu/rank0_checkpoint'

This referenced rank0_checkpoint and its parent tmp7_pd5dbu (or whatever it is called for that run) do not exist when I check.

Any idea on this?

Does it work on a local machine or for someone with limited resources?

Does it work on my local machine, or is a GPU necessary to run this model?

I tried to load the model on my local machine with 15GB of RAM, but it gets stuck loading the model because of its size.

Is there any way to run it on a local machine? If so, please guide me on what additional parameters need to be passed.

import transformers
import torch

config = transformers.AutoConfig.from_pretrained(
    'mosaicml/mpt-7b',
    trust_remote_code=True
)

config.attn_config['attn_impl'] = 'triton'

model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b',
    trust_remote_code=True,
    config=config,
    torch_dtype=torch.bfloat16
)
model.to(device='cuda:0')

FasterTransformer inference with MPT-7b

Hello! How do I run MPT-7B as advertised with FasterTransformer? I need the fastest inference possible, whether that is with FasterTransformer, FlashAttention, etc.

I would be very grateful for any help!

Error in triton implementation

I used this configuration

config = transformers.AutoConfig.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  trust_remote_code=True
)
config.attn_config['attn_impl'] = 'triton'

model = transformers.AutoModelForCausalLM.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  config=config,
  torch_dtype=torch.bfloat16,
  trust_remote_code=True
)
model.to(device='cuda:0')

to load the instruct model.

But I get the following error:
TypeError: dot() got an unexpected keyword argument 'trans_b'

Generate shorter sentences

I am using mpt-instruct via HF.

I need to generate shorter sentences.
I'm using mpt-instruct for creating titles.

I need to restrict the length of the sentences.
What options are available?

I tried
length_penalty = -0.9, but it didn't work. Is there a suggested value for generating shorter sentences?
Should I control the length using max_tokens or max_length?

I need to generate shorter sentences, which will be used for creating titles/headings.


Precision.AMP_BF16 is not supported for CPU training

Hi,
I am trying to fine-tune the pretrained MPT-7B from the HF hub.
I followed #68 and set the yaml as follows:
model:
  name: hf_causal_lm

  device: cpu

  pretrained: true
  pretrained_model_name_or_path: mosaicml/mpt-7b
  init_device: cpu
  d_model: 4096
  n_heads: 32
  n_layers: 32
  expansion_ratio: 4
  max_seq_len: ${max_seq_len}
  vocab_size: 50368
  attn_config:
    attn_impl: triton

I am unclear on why it says I am using a CPU device. The init_device option is confusing, as it only allows cpu and meta, while cpu is used if you want to load a pretrained model from the HF hub.

In the Trainer class, the device is not passed in explicitly, so it is None by default. I tried to pass in gpu as the device and got another error: "ValueError: DeviceGPU cannot be created as torch.cuda is not available."

MPT-7B Finetuning Jupyter notebook request

@vchiley @samhavens @alextrott16 ,
I was going through the MPT-7B model fine-tuning documentation.
It is definitely well written, but quite hard to grasp at first glance.

Therefore, I am putting forth this request to create a fine-tuning Jupyter notebook that folks can use to train using their local GPU or cloud GPUs (paid, like A100s, etc.).

It would be great to have the same as a Jupyter notebook:
from loading the MPT model, to loading the instruction set (or a combination of instruction sets), and then finally running the model fine-tuning.

How to install torch 1.13.1+cu117?

pip3 install torch does not install torch with support for CUDA 11.7. Therefore I'm not able to install all requirements in a new venv. pip list shows that the torch version is 1.13.1, not 1.13.1+cu117. I'm on WSL on Windows (Ubuntu 22.04).

(llmfoundry-venv) ighodgao@isha-determined:~/llm-foundry$ pip install .[gpu]
Processing /home/ighodgao/llm-foundry
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Collecting xentropy-cuda-lib@ git+https://github.com/HazyResearch/flash-attention.git@v0.2.8#subdirectory=csrc/xentropy
Cloning https://github.com/HazyResearch/flash-attention.git (to revision v0.2.8) to /tmp/pip-install-f5ns9hrp/xentropy-cuda-lib_1cb623aeb89f429dbb3595be2c38f9c1
Running command git clone --filter=blob:none --quiet https://github.com/HazyResearch/flash-attention.git /tmp/pip-install-f5ns9hrp/xentropy-cuda-lib_1cb623aeb89f429dbb3595be2c38f9c1
Running command git checkout -q 33e0860c9c5667fded5af674882e731909096a7f
Resolved https://github.com/HazyResearch/flash-attention.git to commit 33e0860c9c5667fded5af674882e731909096a7f
Running command git submodule update --init --recursive -q
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [13 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "/tmp/pip-install-f5ns9hrp/xentropy-cuda-lib_1cb623aeb89f429dbb3595be2c38f9c1/csrc/xentropy/setup.py", line 98, in
raise_if_cuda_home_none("--xentropy")
File "/tmp/pip-install-f5ns9hrp/xentropy-cuda-lib_1cb623aeb89f429dbb3595be2c38f9c1/csrc/xentropy/setup.py", line 48, in raise_if_cuda_home_none
raise RuntimeError(
RuntimeError: --xentropy was requested, but nvcc was not found. Are you sure your environment has nvcc available? If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.

  torch.__version__  = 1.13.1+cu117


  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Protobuf version conflict

error: protobuf 3.20.3 is installed but protobuf>=4.22.0 is required by {'mosaicml-cli'}

I get this when I run setup.py

So I did a pip install protobuf==4.22.0

Now I get this error

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
mosaicml-cli 0.4.0a6 requires argcomplete>=2.0.0, which is not installed.
mosaicml-cli 0.4.0a6 requires arrow>=1.2.2, which is not installed.
mosaicml-cli 0.4.0a6 requires backoff>=2.2.1, which is not installed.
mosaicml-cli 0.4.0a6 requires docker>=5.0.3, which is not installed.
mosaicml-cli 0.4.0a6 requires gql[websockets]>=3.4.0, which is not installed.
mosaicml-cli 0.4.0a6 requires prompt-toolkit>=3.0.29, which is not installed.
onnx 1.13.1 requires protobuf<4,>=3.20.2, but you have protobuf 4.22.0 which is incompatible.

It's the last line that shows the conflicting versions of protobuf.

How to install MPT-7B?

Hey, I wanted to use the HF transformer library for the currently fastest inference possible (triton).
I am trying to use https://github.com/mosaicml/llm-foundry/blob/main/scripts/inference/hf_generate.py

python test.py --temperature 1.0 \
  --name_or_path "../../../text/models/mpt-7b" \
  --top_p 0.95 \
  --top_k 50 \
  --seed 1 \
  --max_new_tokens 256 \
  --attn_impl triton \
  --prompts \
    "The answer to life, the universe, and happiness is" \
    "MosaicML is an ML training efficiency startup that is known for" \
    "Here's a quick recipe for baking chocolate chip cookies: Start by" \
    "The best 5 cities to visit in Europe are" \
  --device cuda \
  --trust_remote_code

However, I only receive this error (I installed the latest flash_attn because I can not install the exact version provided)
[and it takes ages to load before the "Loading shard" appears]:

Loading HF Config...
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Loading HF model to device=cuda and dtype=torch.bfloat16...
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
/home/ubuntu/.cache/huggingface/modules/transformers_modules/mpt-7b/attention.py:144: UserWarning: While `attn_impl: triton` can be faster than `attn_impl: flash` it uses more memory. When training larger models this can trigger alloc retries which hurts performance. If encountered, we recommend using `attn_impl: flash` if your model does not use `alibi` or `prefix_lm`.
  warnings.warn('While `attn_impl: triton` can be faster than `attn_impl: flash` ' + 'it uses more memory. When training larger models this can trigger ' + 'alloc retries which hurts performance. If encountered, we recommend ' + 'using `attn_impl: flash` if your model does not use `alibi` or `prefix_lm`.')
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00,  2.07s/it]
n_params=6649286656

Loading HF tokenizer...
/home/ubuntu/ml/llm/inference/hf/transformers-direct/test.py:167: UserWarning: pad_token_id is not set for the tokenizer. Using eos_token_id as pad_token_id.
  warnings.warn(

Generate kwargs:
{'max_new_tokens': 256, 'temperature': 1.0, 'top_p': 0.95, 'top_k': 50, 'use_cache': True, 'do_sample': True, 'eos_token_id': 0, 'pad_token_id': 0}

Tokenizing prompts...
NOT using autocast...
Warming up...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Traceback (most recent call last):
  File "<string>", line 21, in _fwd_kernel
KeyError: ('2-.-0-.-0-3d28bf8dd4111863b189d74cf84b730b-d6252949da17ceb5f3a278a70250af13-3b85c7bef5f0a641282f3b73af50f599-2d732a2488b7ed996facc3e641ee56bf-3498c340fd4b6ee7805fd54b882a04f5-e1f133f98d04093da2078dfc51c36b72-b26258bf01f839199e39d64851821f26-d7c06e3b46e708006c15224aac7a1378-f585402118c8a136948ce0a49cfe122c', (torch.bfloat16, torch.bfloat16, torch.bfloat16, torch.bfloat16, torch.bfloat16, torch.float32, torch.float32, 'fp32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), ('vector', True, 128, False, False, True, 128, 128), (True, True, True, True, True, True, True, (False,), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (False, False), (True, False), (True, False), (True, False), (True, False), (True, False), (False, False), (False, False), (True, False), (True, False), (True, False), (True, False)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/triton/compiler.py", line 937, in build_triton_ir
    generator.visit(fn.parse())
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/ast.py", line 407, in visit
    return visitor(node)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/triton/compiler.py", line 183, in visit_Module
    ast.NodeVisitor.generic_visit(self, node)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/ast.py", line 415, in generic_visit
    self.visit(item)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/ast.py", line 407, in visit
    return visitor(node)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/triton/compiler.py", line 252, in visit_FunctionDef
    has_ret = self.visit_compound_statement(node.body)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/triton/compiler.py", line 177, in visit_compound_statement
    self.last_ret_type = self.visit(stmt)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/ast.py", line 407, in visit
    return visitor(node)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/triton/compiler.py", line 678, in visit_For
    self.visit_compound_statement(node.body)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/triton/compiler.py", line 177, in visit_compound_statement
    self.last_ret_type = self.visit(stmt)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/ast.py", line 407, in visit
    return visitor(node)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/triton/compiler.py", line 319, in visit_AugAssign
    self.visit(assign)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/ast.py", line 407, in visit
    return visitor(node)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/triton/compiler.py", line 301, in visit_Assign
    values = self.visit(node.value)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/ast.py", line 407, in visit
    return visitor(node)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/triton/compiler.py", line 339, in visit_BinOp
    rhs = self.visit(node.right)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/ast.py", line 407, in visit
    return visitor(node)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/triton/compiler.py", line 797, in visit_Call
    return fn(*args, _builder=self.builder, **kws)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/triton/impl/base.py", line 22, in wrapper
    return fn(*args, **kwargs)
TypeError: dot() got an unexpected keyword argument 'trans_b'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/ml/llm/inference/hf/transformers-direct/test.py", line 269, in <module>
    main(parse_args())
  File "/home/ubuntu/ml/llm/inference/hf/transformers-direct/test.py", line 218, in main
    _ = _generate(encoded_inp)
  File "/home/ubuntu/ml/llm/inference/hf/transformers-direct/test.py", line 209, in _generate
    return model.generate(
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/transformers/generation/utils.py", line 1485, in generate
    return self.sample(
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/transformers/generation/utils.py", line 2524, in sample
    outputs = self(
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/mpt-7b/modeling_mpt.py", line 237, in forward
    outputs = self.transformer(input_ids=input_ids, past_key_values=past_key_values, attention_mask=attention_mask, prefix_mask=prefix_mask, sequence_id=sequence_id, return_dict=return_dict, output_attentions=output_attentions, output_hidden_states=output_hidden_states, use_cache=use_cache)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/mpt-7b/modeling_mpt.py", line 183, in forward
    (x, past_key_value) = block(x, past_key_value=past_key_value, attn_bias=attn_bias, attention_mask=attention_mask, is_causal=self.is_causal)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/mpt-7b/blocks.py", line 36, in forward
    (b, _, past_key_value) = self.attn(a, past_key_value=past_key_value, attn_bias=attn_bias, attention_mask=attention_mask, is_causal=is_causal)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/mpt-7b/attention.py", line 171, in forward
    (context, attn_weights) = self.attn_fn(query, key, value, self.n_heads, softmax_scale=self.softmax_scale, attn_bias=attn_bias, key_padding_mask=key_padding_mask, is_causal=is_causal, dropout_p=self.attn_dropout_p, training=self.training, needs_weights=needs_weights)
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/mpt-7b/attention.py", line 111, in triton_flash_attn_fn
    attn_output = flash_attn_triton.flash_attn_func(query, key, value, attn_bias, reset_is_causal, softmax_scale)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/flash_attn/flash_attn_triton.py", line 810, in forward
    o, lse, ctx.softmax_scale = _flash_attn_forward(
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/flash_attn/flash_attn_triton.py", line 623, in _flash_attn_forward
    _fwd_kernel[grid](
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/triton/runtime/autotuner.py", line 199, in run
    return self.fn.run(*args, **kwargs)
  File "<string>", line 41, in _fwd_kernel
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/triton/compiler.py", line 1621, in compile
    next_module = compile(module)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/triton/compiler.py", line 1550, in <lambda>
    lambda src: ast_to_ttir(src, signature, configs[0], constants)),
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/triton/compiler.py", line 962, in ast_to_ttir
    mod, _ = build_triton_ir(fn, signature, specialization, constants)
  File "/home/ubuntu/.anaconda3/envs/transformers-direct/lib/python3.9/site-packages/triton/compiler.py", line 942, in build_triton_ir
    raise CompilationError(fn.src, node) from e
triton.compiler.CompilationError: at 78:24:
def _fwd_kernel(
    Q, K, V, Bias, Out,
    Lse, TMP,  # NOTE: TMP is a scratchpad buffer to workaround a compiler bug
    softmax_scale,
    stride_qb, stride_qh, stride_qm,
    stride_kb, stride_kh, stride_kn,
    stride_vb, stride_vh, stride_vn,
    stride_bb, stride_bh, stride_bm,
    stride_ob, stride_oh, stride_om,
    nheads, seqlen_q, seqlen_k, seqlen_q_rounded, headdim,
    CACHE_KEY_SEQLEN_Q, CACHE_KEY_SEQLEN_K,
    BIAS_TYPE: tl.constexpr,
    IS_CAUSAL: tl.constexpr,
    BLOCK_HEADDIM: tl.constexpr,
    EVEN_M: tl.constexpr, EVEN_N: tl.constexpr, EVEN_HEADDIM: tl.constexpr,
    BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
):
    start_m = tl.program_id(0)
    off_hb = tl.program_id(1)
    off_b = off_hb // nheads
    off_h = off_hb % nheads
    # off_b = tl.program_id(1)
    # off_h = tl.program_id(2)
    # off_hb = off_b * nheads + off_h
    # initialize offsets
    offs_m = start_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = tl.arange(0, BLOCK_N)
    offs_d = tl.arange(0, BLOCK_HEADDIM)
    # Initialize pointers to Q, K, V
    # Adding parenthesis around indexing might use int32 math instead of int64 math?
    # https://github.com/openai/triton/issues/741
    # I'm seeing a tiny bit of difference (5-7us)
    q_ptrs = Q + off_b * stride_qb + off_h * stride_qh + (offs_m[:, None] * stride_qm + offs_d[None, :])
    k_ptrs = K + off_b * stride_kb + off_h * stride_kh + (offs_n[:, None] * stride_kn + offs_d[None, :])
    v_ptrs = V + off_b * stride_vb + off_h * stride_vh + (offs_n[:, None] * stride_vn + offs_d[None, :])
    if BIAS_TYPE == 'vector':
        b_ptrs = Bias + off_b * stride_bb + off_h * stride_bh + offs_n
    elif BIAS_TYPE == 'matrix':
        b_ptrs = Bias + off_b * stride_bb + off_h * stride_bh + (offs_m[:, None] * stride_bm + offs_n[None, :])
    # initialize pointer to m and l
    t_ptrs = TMP + off_hb * seqlen_q_rounded + offs_m
    lse_i = tl.zeros([BLOCK_M], dtype=tl.float32) - float("inf")
    m_i = tl.zeros([BLOCK_M], dtype=tl.float32) - float("inf")
    acc_o = tl.zeros([BLOCK_M, BLOCK_HEADDIM], dtype=tl.float32)
    # load q: it will stay in SRAM throughout
    # [2022-10-30] TD: Triton bug - in the case of EVEN_M=True and EVEN_N=False, if we just call
    # tl.load(q_ptrs), we get the wrong output!
    if EVEN_M & EVEN_N:
        if EVEN_HEADDIM:
            q = tl.load(q_ptrs)
        else:
            q = tl.load(q_ptrs, mask=offs_d[None, :] < headdim, other=0.0)
    else:
        if EVEN_HEADDIM:
            q = tl.load(q_ptrs, mask=offs_m[:, None] < seqlen_q, other=0.0)
        else:
            q = tl.load(q_ptrs, mask=(offs_m[:, None] < seqlen_q) & (offs_d[None, :] < headdim),
                        other=0.0)
    # loop over k, v and update accumulator
    end_n = seqlen_k if not IS_CAUSAL else tl.minimum((start_m + 1) * BLOCK_M, seqlen_k)
    for start_n in range(0, end_n, BLOCK_N):
        start_n = tl.multiple_of(start_n, BLOCK_N)
        # -- compute qk ----
        if EVEN_N & EVEN_M:  # If we just do "if EVEN_N", there seems to be some race condition
            if EVEN_HEADDIM:
                k = tl.load(k_ptrs + start_n * stride_kn)
            else:
                k = tl.load(k_ptrs + start_n * stride_kn, mask=offs_d[None, :] < headdim, other=0.0)
        else:
            if EVEN_HEADDIM:
                k = tl.load(k_ptrs + start_n * stride_kn, mask=(start_n + offs_n)[:, None] < seqlen_k,
                            other=0.0)
            else:
                k = tl.load(k_ptrs + start_n * stride_kn,
                            mask=((start_n + offs_n)[:, None] < seqlen_k) & (offs_d[None, :] < headdim),
                            other=0.0)
        qk = tl.zeros([BLOCK_M, BLOCK_N], dtype=tl.float32)
        qk += tl.dot(q, k, trans_b=True)
                        ^

getting messy response (response includes #'s)

I implemented the Instruct model locally.
Here is a sample response I got:

[Screenshot from 2023-05-10 12-29-24: a generated response containing runs of '#' characters]

I get multiple '#' characters in the response.

Code:

def __call__(
    self, instruction: str, **generate_kwargs: Dict[str, Any]
) -> str:
    s = PROMPT_FOR_GENERATION_FORMAT.format(instruction=instruction)
    input_ids = self.tokenizer(s, return_tensors="pt").input_ids
    input_ids = input_ids.to(self.model.device)
    with torch.no_grad():
        output_ids = self.model.generate(input_ids, **generate_kwargs)
    # Slice the output_ids tensor to keep only the newly generated tokens
    new_tokens = output_ids[0, len(input_ids[0]):]
    output_text = self.tokenizer.decode(new_tokens, skip_special_tokens=True)
    return output_text

self.generate_text = InstructionTextGenerationPipeline(model=self.model, tokenizer=self.tokenizer)

data ={"max_new_tokens": MAX_NEW_TOKENS,
"temperature": 0.1,
"top_p": 1,
"use_cache": True,
"top_k": 0
}
response =  self.generate_text(prompt, **data)
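
The repeated '#' characters are consistent with the model running past its answer and starting the next "### ..." header of the dolly-style prompt format the instruct model was tuned on. A hedged sketch of one way to cut generation off at that point, using transformers' stopping criteria (the class and the exact stop string are illustrative assumptions, not something from this repo):

import torch
from transformers import AutoTokenizer, StoppingCriteria, StoppingCriteriaList

tokenizer = AutoTokenizer.from_pretrained('mosaicml/mpt-7b-instruct')

class StopOnTokens(StoppingCriteria):
    """Stop generation once the most recent tokens match any stop sequence."""

    def __init__(self, stop_token_ids):
        self.stop_token_ids = stop_token_ids

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop_ids in self.stop_token_ids:
            if input_ids[0, -len(stop_ids):].tolist() == stop_ids:
                return True
        return False

# Stop as soon as the model starts emitting the next "### " prompt header.
stop_ids = [tokenizer('### ', add_special_tokens=False).input_ids]
stopping = StoppingCriteriaList([StopOnTokens(stop_ids)])
# ...then call model.generate(input_ids, stopping_criteria=stopping, **generate_kwargs).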

How to use train.py to finetune the pre-trained MPT-7B?

It seems like we need to run the pre-training process to get a checkpoint file before we can fine-tune the MPT model.

1b_local_data_sft.yaml mentions that we have to replace load_path with our own checkpoint path.

Can I use the .bin weights on Hugging Face as the pre-trained model to load from?
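
A hedged note on this: the published MPT-7B weights on the Hub can serve directly as the pre-trained starting point, so a local pre-training run isn't needed just to produce a checkpoint. load_path expects a Composer-format checkpoint, whereas the HF-style finetuning configs (such as the dolly SFT YAML discussed in other issues below) point their pretrained-model field at a Hub repo id or a local HF folder instead. A minimal check that the Hub weights load (model id and dtype are illustrative assumptions):

import torch
import transformers

# The released MPT-7B weights load straight from the Hub and can be finetuned from there,
# so no separate pre-training run is required just to obtain a starting checkpoint.
model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
print(sum(p.numel() for p in model.parameters()))  # roughly 6.7B parameters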

Can we use PyTorch Version 2 to train models?

Hi @bmosaicml team,
Thank you for the amazing open source code and trained weights.

I am wondering why we cannot use PT v2 for training models. As you know, PT2 is much faster than PT1. However, this README only mentions PT1.

It is quite hard to install flash-attention because of compatibility constraints between PyTorch 1.x and CUDA (plus redundant CUDA installs), while PyTorch 2 already supports flash attention natively.

Best,
Linh
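
For context on that last point: PyTorch 2 exposes fused, FlashAttention-style kernels through torch.nn.functional.scaled_dot_product_attention, which is why a separate flash-attn build feels redundant there. A minimal illustration (assumes a CUDA GPU; shapes are arbitrary):

import torch
import torch.nn.functional as F

# PyTorch 2's built-in fused attention, no separate flash-attn build required.
q, k, v = (torch.randn(1, 8, 128, 64, device='cuda', dtype=torch.bfloat16) for _ in range(3))
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 128, 64])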

Finetuning

Hi Team, I tried the finetuning code given in the repo with 7b_dolly_sft.yaml and ran it for one epoch. Please find the details below:

[epoch=1][batch=927/927]:
Train time/batch: 926
Train time/sample: 59238
Train time/batch_in_epoch: 926
Train time/sample_in_epoch: 59238
Train time/token: 121319424
Train time/token_in_epoch: 121319424
Train memory/allocated_mem: 57.2720
Train memory/active_mem: 57.2720
Train memory/inactive_mem: 4.2835
Train memory/reserved_mem: 78.8820
Train memory/alloc_retries: 1
Train trainer/device_train_microbatch_size: 8
Train loss/train/total: 5.4634
Train metrics/train/LanguageCrossEntropy: 5.4463
Train metrics/train/LanguagePerplexity: 231.9032
Train throughput/batches_per_sec: 0.0554
Train throughput/samples_per_sec: 3.4448
Train throughput/device/batches_per_sec: 0.0277
Train throughput/device/samples_per_sec: 1.7224
Train throughput/flops_per_sec: 304591151416294.6250
Train throughput/device/flops_per_sec: 152295575708147.3125
Train throughput/device/mfu: 0.4881
Train time/train: 4.8149
Train time/val: 0.0000
Train time/total: 4.8149

After that, I converted the Composer checkpoint into a standard HF checkpoint folder using convert_composer_to_hf.py and tried running the inference code as given in the repo:

python hf_generate.py --name_or_path hf_test_model --temperature 1.0 --top_p 0.95 --top_k 50 --seed 1 --max_new_tokens 256 --prompts "Who invented Ford vehicles ?"

But I am not sure why I am getting garbage output; please find the logs below:

python hf_generate.py --name_or_path hf_test_model --temperature 1.0 --top_p 0.95 --top_k 50 --seed 1 --max_new_tokens 256 --prompts "Who invented Ford vehicles ?"
Loading HF Config...
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Loading HF model to device=cuda:0 and dtype=torch.bfloat16...
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
You are using config.init_device='cpu', but you can also use config.init_device="meta" with Composer + FSDP for fast initialization.
n_params=6658859008

Loading HF tokenizer...

Generate kwargs:
{'max_new_tokens': 256, 'temperature': 1.0, 'top_p': 0.95, 'top_k': 50, 'use_cache': True, 'do_sample': True, 'eos_token_id': 0, 'pad_token_id': 0}

Tokenizing prompts...
NOT using autocast...
Warming up...
Generating responses...
####################################################################################################
Who invented Ford vehicles ?term that of of, state. people than to of that of state is " health people people to people a country time of people, and American important known known first the in are known of. the country include modern people the state of people, as people. A, on of the one of a group of his. B-.
####################################################################################################

Can you please advise if I am doing something wrong?
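
One thing worth ruling out first (a hedged guess, since a train loss around 5.5 after a full epoch is far higher than expected when finetuning from a pre-trained 7B model): the run may have started from a random initialization rather than the MPT-7B weights, e.g. if the YAML did not actually point at the pretrained model. A quick comparison against the base model can confirm it (hf_test_model is the converted folder from the command above; loading both bf16 models on CPU needs roughly 26 GB of RAM):

import torch
import transformers

# If finetuning really started from MPT-7B, the finetuned embeddings should stay close
# to the base model's; a large gap usually means training started from a random init.
base = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b', trust_remote_code=True, torch_dtype=torch.bfloat16)
tuned = transformers.AutoModelForCausalLM.from_pretrained(
    'hf_test_model', trust_remote_code=True, torch_dtype=torch.bfloat16)

w_base = base.get_input_embeddings().weight.float()
w_tuned = tuned.get_input_embeddings().weight.float()
print('mean abs diff of input embeddings:', (w_base - w_tuned).abs().mean().item())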

Installation issue

Cannot run python data_prep/convert_dataset_hf.py --dataset c4 --data_subset en --out_root my-copy-c4 --splits train_small val_small --concat_tokens 2048 --tokenizer EleutherAI/gpt-neox-20b --eos_text '<|endoftext|>'

Getting:

Traceback (most recent call last):
  File "/home/ighodgao/llm-foundry/scripts/data_prep/convert_dataset_hf.py", line 14, in <module>
    from streaming import MDSWriter
ModuleNotFoundError: No module named 'streaming'

pip3 install streaming returns a ModuleNotFoundError for Cython, which I then install manually, but then I am not able to install streaming due to more installation problems.

I am in a python venv and have installed requirements already as outlined in the docs.
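
For reference (an observation about package naming, not repo guidance): the StreamingDataset library imports as streaming but is distributed on PyPI as mosaicml-streaming, so pip install streaming pulls an unrelated project and fails on Cython. Installing mosaicml-streaming, or running pip install -e . from the repo root so the pinned version comes in, should make the import resolve:

# Quick check that the right distribution is installed: `streaming` here must come from
# the `mosaicml-streaming` package, not the unrelated `streaming` project on PyPI.
from streaming import MDSWriter  # the import used by scripts/data_prep/convert_dataset_hf.py

print(MDSWriter)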

Error in Triton implementation

Using the configuration below, I loaded the instruct model:

config = transformers.AutoConfig.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  trust_remote_code=True
)
config.attn_config['attn_impl'] = 'triton'

model = transformers.AutoModelForCausalLM.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  config=config,
  torch_dtype=torch.bfloat16,
  trust_remote_code=True
)
model.to(device='cuda:0')

But I got this error:
TypeError: dot() got an unexpected keyword argument 'trans_b'
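
This usually indicates that the installed Triton is newer than the version the MPT 'triton' attention kernel was written against: recent Triton releases removed the trans_b keyword from tl.dot. A hedged diagnostic (the kernel-side rewrite and the pinned-dependency suggestion are assumptions, not official guidance):

# Check which Triton is installed; newer releases dropped the trans_b keyword from
# tl.dot, which produces exactly this TypeError when attn_impl='triton' runs.
import triton

print(triton.__version__)
# In newer Triton, the kernel call tl.dot(q, k, trans_b=True) would instead be written
# as tl.dot(q, tl.trans(k)); installing the Triton version pinned by this repo's GPU
# extras (pip install -e ".[gpu]") is usually the simpler fix.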

Fused Cross Entropy is not installed. Either (1) have a CUDA-compatible GPU and `pip install .[gpu]`, or (2) set your config model.loss_fn=torch_crossentropy.

Initializing model...
Traceback (most recent call last):
  File "/content/llm-foundry/llmfoundry/models/mpt/modeling_mpt.py", line 619, in __init__
    from flash_attn.losses.cross_entropy import CrossEntropyLoss as FusedCrossEntropyLoss  # type: ignore # isort: skip
  File "/usr/local/lib/python3.10/dist-packages/flash_attn/losses/cross_entropy.py", line 9, in <module>
    import xentropy_cuda_lib
ImportError: /usr/local/lib/python3.10/dist-packages/xentropy_cuda_lib.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda20CUDACachingAllocator9allocatorE

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/llm-foundry/scripts/train/train.py", line 254, in <module>
    main(cfg)
  File "/content/llm-foundry/scripts/train/train.py", line 144, in main
    model = build_composer_model(cfg.model, tokenizer)
  File "/content/llm-foundry/scripts/train/train.py", line 67, in build_composer_model
    return COMPOSER_MODEL_REGISTRY[model_cfg.name](model_cfg, tokenizer)
  File "/content/llm-foundry/llmfoundry/models/mpt/modeling_mpt.py", line 624, in __init__
    raise ValueError(
ValueError: Fused Cross Entropy is not installed. Either (1) have a CUDA-compatible GPU and `pip install .[gpu]`, or (2) set your config model.loss_fn=torch_crossentropy.
ERROR:composer.cli.launcher:Rank 0 crashed with exit code 1.
Waiting up to 30 seconds for all training processes to terminate. Press Ctrl-C to exit immediately.
Global rank 0 (PID 13871) exited with code 1
ERROR:composer.cli.launcher:Global rank 0 (PID 13871) exited with code 1
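
The undefined-symbol ImportError higher in the traceback is the root cause: the prebuilt xentropy_cuda_lib extension was compiled against a different torch/CUDA combination than the one now installed. Reinstalling the GPU extras against the current torch, or setting model.loss_fn=torch_crossentropy in the training config (as the error message suggests), avoids the crash. A small hedged check of which path is available:

# If this import fails with an undefined-symbol ImportError, the fused cross-entropy
# extension was built against a different torch/CUDA than is installed; fall back to
# model.loss_fn=torch_crossentropy or rebuild the extension.
try:
    from flash_attn.losses.cross_entropy import CrossEntropyLoss  # noqa: F401
    print('fused cross entropy is available')
except ImportError as err:
    print(f'fused cross entropy unavailable: {err}')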

Error in triton implementation

I used this configuration

config = transformers.AutoConfig.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  trust_remote_code=True
)
config.attn_config['attn_impl'] = 'triton'

model = transformers.AutoModelForCausalLM.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  config=config,
  torch_dtype=torch.bfloat16,
  trust_remote_code=True
)
model.to(device='cuda:0')

to load the instruct model.

But I got the following error:
TypeError: dot() got an unexpected keyword argument 'trans_b'

HF Auth Token issue

running
python inference/hf_generate.py --name_or_path mpt-125m-hf --max_new_tokens 256 --prompts "The answer to life, the universe, and happiness is" "Here's a quick recipe for baking chocolate chip cookies: Start by"

with all requirements installed
gives

Loading HF Config...
Traceback (most recent call last):
  File "/home/ighodgao/llm-foundry/lfvenv/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 259, in hf_raise_for_status
    response.raise_for_status()
  File "/home/ighodgao/llm-foundry/lfvenv/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/mpt-125m-hf/resolve/main/config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ighodgao/llm-foundry/lfvenv/lib/python3.10/site-packages/transformers/utils/hub.py", line 409, in cached_file
    resolved_file = hf_hub_download(
  File "/home/ighodgao/llm-foundry/lfvenv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/ighodgao/llm-foundry/lfvenv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1195, in hf_hub_download
    metadata = get_hf_file_metadata(
  File "/home/ighodgao/llm-foundry/lfvenv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/ighodgao/llm-foundry/lfvenv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1541, in get_hf_file_metadata
    hf_raise_for_status(r)
  File "/home/ighodgao/llm-foundry/lfvenv/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 291, in hf_raise_for_status
    raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 404 Client Error. (Request ID: Root=1-645aab74-09aa00034330c99310638c3a)

Repository Not Found for url: https://huggingface.co/mpt-125m-hf/resolve/main/config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ighodgao/llm-foundry/scripts/inference/hf_generate.py", line 123, in main
    config = AutoConfig.from_pretrained(args.name_or_path,
  File "/home/ighodgao/llm-foundry/lfvenv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 916, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/ighodgao/llm-foundry/lfvenv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 573, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/ighodgao/llm-foundry/lfvenv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 628, in _get_config_dict
    resolved_config_file = cached_file(
  File "/home/ighodgao/llm-foundry/lfvenv/lib/python3.10/site-packages/transformers/utils/hub.py", line 424, in cached_file
    raise EnvironmentError(
OSError: mpt-125m-hf is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ighodgao/llm-foundry/scripts/inference/hf_generate.py", line 271, in <module>
    main(parse_args())
  File "/home/ighodgao/llm-foundry/scripts/inference/hf_generate.py", line 131, in main
    raise RuntimeError(
RuntimeError: If you are having auth problems, try logging in via `huggingface-cli login` or by setting the environment variable `export HUGGING_FACE_HUB_TOKEN=... using your access token from https://huggingface.co/settings/tokens.

I have already tried both suggested methods (logging in with the CLI was successful, and the env variable is already set to the same write-access token).
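
Despite the final hint about auth, this particular failure is not an auth problem: mpt-125m-hf is being treated as a Hub repo id because it is not a valid local directory relative to the current working directory, hence the 404. A quick check (the path name comes from the command above):

import os

# If this prints False, transformers falls back to looking up "mpt-125m-hf" on the Hub,
# which 404s; passing the absolute path of the converted checkpoint folder to
# --name_or_path sidesteps the ambiguity entirely.
print(os.path.isdir('mpt-125m-hf'))
print(os.path.abspath('mpt-125m-hf'))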

fatal error: cuda.h: No such file or directory

When running the training section of the README, I get an error regarding cuda.h. Is it possible to specify a path for Composer to look for the CUDA headers? I have cuda.h under ~/anaconda3/include/cuda.h and also ~/anaconda3/include/cuda_runtime.h, which some general Stack Overflow articles recommended when running into issues with cuda.h.

What I'm running:

# Train an MPT-125m model for 10 batches
composer train/train.py \
  train/yamls/mpt/125m.yaml \
  data_local=my-copy-c4 \
  train_loader.dataset.split=train_small \
  eval_loader.dataset.split=val_small \
  max_duration=10ba \
  eval_interval=0 \
  save_folder=mpt-125m

The error I'm getting:

/tmp/tmpd3ybp9wv/main.c:2:10: fatal error: cuda.h: No such file or directory
    2 | #include "cuda.h"
      |          ^~~~~~~~
compilation terminated.

There is also a giant stack trace, which I can include if that's useful.
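
For background: Triton JIT-compiles its kernels at run time, so the host compiler needs cuda.h from a full CUDA toolkit on its include path (a conda cudatoolkit-dev package or a system CUDA install provides it; the runtime-only cudatoolkit does not). One hedged workaround, assuming the headers under ~/anaconda3/include really come from a complete toolkit, is to extend the compiler's include search path before launching training, since gcc honors CPATH:

import os

# Point the host compiler at the directory containing cuda.h before the Triton kernels
# get JIT-compiled; gcc treats CPATH as extra -I search paths. Alternatively, export
# CPATH (or install a full CUDA toolkit) in the shell before running the composer command above.
cuda_include = os.path.expanduser('~/anaconda3/include')
os.environ['CPATH'] = cuda_include + os.pathsep + os.environ.get('CPATH', '')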

How to fine-tune instruct mpt-7b model?

I see that train.py under scripts/train builds a model from a model configuration. I took a look at 7b_dolly_sft.yaml; do you think I could further tune the instruct model somehow?

What name do I specify in the yaml for the instruct model? Or can I give it a custom path to the instruct model if I have saved it locally? I'm looking to freeze the weights of all layers other than the last one and then fine-tune.

Please guide me. Thanks a bunch!
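
On the freezing part, a hedged sketch (parameter names follow the MPT implementation, where blocks are named transformer.blocks.<i> and MPT-7B has 32 of them; adapt the index to whatever "last layer" means for your setup). The same from_pretrained call accepts a local directory in place of the Hub id:

import torch
import transformers

# Load the instruct model (a local path works here too) and freeze everything except
# the final transformer block before handing the model to a trainer.
model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b-instruct',  # or a local directory containing the saved model
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

last_block_prefix = 'transformer.blocks.31.'  # assumption: 32 blocks, indexed 0-31
for name, param in model.named_parameters():
    param.requires_grad = name.startswith(last_block_prefix)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f'trainable parameters: {trainable:,}')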

Crash in hf_chat.py after lots of activity with longer output

After running the chat a few dozen times through the loop with --max_new_tokens=500 and several longer outputs, the script crashes in model.generate() with the message:

RuntimeError: The size of tensor a (2049) must match the size of tensor b (2048) at non-singleton dimension 3

I attempted to limit the tokens on input with:

tokenizer = AutoTokenizer.from_pretrained(args.name_or_path, max_new_tokens=1024,
                                          **from_pretrained_kwargs)

but get the same result.
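
This is the 2048-token training context being exceeded: the chat history keeps growing, and once the prompt plus the requested new tokens pass the model's max sequence length, generate() fails with this size mismatch. Note also that max_new_tokens is a generation argument, so passing it to AutoTokenizer.from_pretrained has no effect. A hedged sketch of trimming the history before each call (variable names are illustrative):

from transformers import AutoTokenizer

# Keep the prompt short enough that prompt + new tokens fit in the 2048-token context.
tokenizer = AutoTokenizer.from_pretrained('mosaicml/mpt-7b-chat')

max_seq_len = 2048
max_new_tokens = 500
budget = max_seq_len - max_new_tokens

history_text = 'stand-in for the accumulated conversation'
input_ids = tokenizer(history_text, return_tensors='pt').input_ids
input_ids = input_ids[:, -budget:]  # drop the oldest tokens when the history gets too long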

GPU memory issue

I used the train script from the README to kick off 125M model training for 10 batches, exactly as in the example, and was surprised to see it use almost all of my GPU memory (3080, 16 GB).

Here's the output from the last batch:
[batch=10/10]:
Train time/batch: 9
Train time/sample: 2304
Train time/batch_in_epoch: 9
Train time/sample_in_epoch: 2304
Train time/token: 4718592
Train time/token_in_epoch: 4718592
Train memory/allocated_mem: 3.8951
Train memory/active_mem: 3.8951
Train memory/inactive_mem: 5.5924
Train memory/reserved_mem: 13.5500
Train memory/alloc_retries: 3
Train trainer/device_train_microbatch_size: 8
Train loss/train/total: 10.9654
Train metrics/train/LanguageCrossEntropy: 10.9651
Train metrics/train/LanguagePerplexity: 57818.7188
Train time/train: 0.0606
Train time/val: 0.0000
Train time/total: 0.0606
Train lr-DecoupledAdamW/group0: 0.0001
Train time/remaining_estimate: 0.0000

I would ideally want to train/fine-tune a larger model, but even trying the 350M model fails:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.54 GiB (GPU 0; 16.00 GiB total capacity; 13.53 GiB already allocated; 6.00 MiB free; 14.01 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
ERROR:composer.cli.launcher:Rank 0 crashed with exit code 1.

I checked with nvidia-smi before running the train script and the GPU had no memory allocated.
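
As a rough sanity check on why even small models fill a 16 GB card (the assumptions here are mine, not a statement about this repo's exact defaults): with full-precision gradients plus Adam's two moment buffers, persistent training state alone is on the order of 16 bytes per parameter, before activations, dataloader buffers, or allocator overhead. Lowering device_train_microbatch_size or the global batch size in the YAML is the usual lever:

# Back-of-envelope: ~16 bytes/param of persistent state (weights + grads + two Adam
# moments, assuming fp32 for the latter), excluding activations and allocator overhead.
def train_state_gib(n_params: float, bytes_per_param: int = 16) -> float:
    return n_params * bytes_per_param / 2**30

for n_params in (125e6, 350e6, 1.3e9):
    print(f'{n_params / 1e6:.0f}M params -> ~{train_state_gib(n_params):.1f} GiB of weights + optimizer state')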

Question about the "streaming" package

I followed the instruction pip install -e ".[gpu]" in the README, but the "streaming" package was not installed, and I got the error "no module named streaming".
When I use pip install streaming to download streaming==0.1.2, a lot of errors about _cython.c occur.
I was wondering which version of the "streaming" package I should install?
