
starcoder's Introduction

💫 StarCoder

Paper | Model | Playground | VSCode | Chat

What is this about?

💫 StarCoder is a language model (LM) trained on source code and natural language text. Its training data incorporates more than 80 different programming languages as well as text extracted from GitHub issues and commits and from notebooks. This repository showcases how to explore and evaluate this LM's capabilities.

News

  • May 9, 2023: We've fine-tuned StarCoder to act as a helpful coding assistant 💬! Check out the chat/ directory for the training code and play with the model here.

Disclaimer

Before you can use the model, go to hf.co/bigcode/starcoder and accept the agreement. Then make sure you are logged in to the Hugging Face Hub with:

huggingface-cli login
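Alternatively, you can log in from Python with the huggingface_hub client (a minimal sketch; pass your own access token):

from huggingface_hub import login

# prompts for a Hugging Face access token, or accepts one directly
login()  # or login(token="hf_...")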

Table of Contents

  1. Quickstart
  2. Fine-tuning
  3. Evaluation
  4. Inference hardware requirements

Quickstart

StarCoder was trained on GitHub code, so it can be used to perform code generation. More precisely, the model can complete the implementation of a function or infer the next characters in a line of code. This can be done with the help of the 🤗 transformers library.

Installation

First, install all the libraries listed in requirements.txt:

pip install -r requirements.txt

Code generation

The code generation pipeline is as follows:

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"
device = "cuda" # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# to save memory, consider using fp16 or bf16 by passing e.g. torch_dtype=torch.float16 (requires importing torch)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs)
# clean_up_tokenization_spaces=False prevents a tokenizer edge case which can result in spaces being removed around punctuation
print(tokenizer.decode(outputs[0], clean_up_tokenization_spaces=False))

or

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
checkpoint = "bigcode/starcoder"

model = AutoModelForCausalLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)
print(pipe("def hello():"))
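As the comment in the first snippet suggests, memory can be roughly halved by loading the weights in half precision. A minimal sketch, assuming a CUDA-capable GPU is available:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# load the weights in fp16 (~30GB instead of >60GB in fp32)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.float16).to("cuda")

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda")
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], clean_up_tokenization_spaces=False))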

For hardware requirements, check the section Inference hardware requirements.

Text-generation-inference

docker run -p 8080:80 -v $PWD/data:/data -e HUGGING_FACE_HUB_TOKEN=<YOUR BIGCODE ENABLED TOKEN> -d  ghcr.io/huggingface/text-generation-inference:latest --model-id bigcode/starcoder --max-total-tokens 8192

For more details, see here.
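Once the container is up, you can query it over its HTTP API. A sketch with curl (the prompt and generation parameters are illustrative):

curl http://localhost:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "def print_hello_world():", "parameters": {"max_new_tokens": 32}}'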

Fine-tuning

Here, we showcase how we can fine-tune this LM on a specific downstream task.

Step by step installation with conda

Create a new conda environment and activate it

conda create -n env
conda activate env

Install the PyTorch version compatible with your version of CUDA (see here); for example, the following command works with CUDA 11.6:

conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia

Install transformers and peft

conda install -c huggingface transformers 
pip install git+https://github.com/huggingface/peft.git

Note that you can also install the latest development version of transformers from source with:

pip install git+https://github.com/huggingface/transformers

Install datasets, accelerate and huggingface_hub

conda install -c huggingface -c conda-forge datasets
conda install -c conda-forge accelerate
conda install -c conda-forge huggingface_hub

Finally, install bitsandbytes and wandb

pip install bitsandbytes
pip install wandb

To get the full list of arguments with descriptions you can run the following command on any script:

python scripts/some_script.py --help

Before you run any of the scripts make sure you are logged in and can push to the hub:

huggingface-cli login

Make sure you are logged in to wandb:

wandb login

Now that everything is done, you can clone the repository and get into the corresponding directory.
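For example (assuming the upstream repository location):

git clone https://github.com/bigcode-project/starcoder.git
cd starcoder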

Datasets

💫 StarCoder can be fine-tuned for multiple downstream tasks. Our interest here is to fine-tune StarCoder to make it follow instructions. Instruction fine-tuning has gained a lot of attention recently, as it provides a simple framework for teaching language models to align their outputs with human needs. The procedure requires quality instruction datasets, which contain multiple instruction-answer pairs. Unfortunately, such datasets are not ubiquitous, but thanks to Hugging Face 🤗's datasets library we have access to some good proxies. To fine-tune cheaply and efficiently, we use Hugging Face 🤗's PEFT as well as Tim Dettmers' bitsandbytes.
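For illustration, here is a minimal sketch of how 8-bit loading and LoRA adapters fit together; the hyperparameters and target_modules below are assumptions for illustration, not the exact values used by finetune.py:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# load the base model in 8-bit to fit it on a single GPU
model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder", load_in_8bit=True, device_map="auto"
)

# attach small trainable LoRA adapters instead of updating all 15B parameters
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["c_proj", "c_attn", "q_attn"],  # assumed attention projection names
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()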

Stack Exchange

Stack Exchange is a well-known network of Q&A websites on topics in diverse fields. It is a place where a user can ask a question and obtain answers from other users. Those answers are scored and ranked based on their quality. stack-exchange-instruction is a dataset obtained by scraping the site in order to build a collection of Q&A pairs. A language model can then be fine-tuned on that dataset to make it elicit strong and diverse question-answering skills.
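To get a feel for the data, you can stream a few examples with the datasets library (a sketch; the data_dir and split values mirror the --subset and --split arguments of the command below):

from datasets import load_dataset

# stream the dataset so it does not have to be fully downloaded
dataset = load_dataset(
    "ArmelR/stack-exchange-instruction",
    data_dir="data/finetune",
    split="train",
    streaming=True,
)
print(next(iter(dataset)))  # each example contains "question" and "response" fields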

To execute the fine-tuning script run the following command:

python finetune/finetune.py \
  --model_path="bigcode/starcoder"\
  --dataset_name="ArmelR/stack-exchange-instruction"\
  --subset="data/finetune"\
  --split="train"\
  --size_valid_set 10000\
  --streaming\
  --seq_length 2048\
  --max_steps 1000\
  --batch_size 1\
  --input_column_name="question"\
  --output_column_name="response"\ 
  --gradient_accumulation_steps 16\
  --learning_rate 1e-4\
  --lr_scheduler_type="cosine"\
  --num_warmup_steps 100\
  --weight_decay 0.05\
  --output_dir="./checkpoints" \

The size of the SE dataset is easier to manage when using streaming. We also have to specify the split of the dataset that is used. For more details, check the dataset's page on 🤗. Similarly, we can modify the command to account for the availability of multiple GPUs:

python -m torch.distributed.launch \
  --nproc_per_node number_of_gpus finetune/finetune.py \
  --model_path="bigcode/starcoder"\
  --dataset_name="ArmelR/stack-exchange-instruction"\
  --subset="data/finetune"\
  --split="train"\
  --size_valid_set 10000\
  --streaming \
  --seq_length 2048\
  --max_steps 1000\
  --batch_size 1\
  --input_column_name="question"\
  --output_column_name="response"\ 
  --gradient_accumulation_steps 16\
  --learning_rate 1e-4\
  --lr_scheduler_type="cosine"\
  --num_warmup_steps 100\
  --weight_decay 0.05\
  --output_dir="./checkpoints" \

Merging PEFT adapter layers

If you train a model with PEFT, you'll need to merge the adapter layers with the base model if you want to run inference / evaluation. To do so, run:

python finetune/merge_peft_adapters.py --base_model_name_or_path model_to_merge --peft_model_path model_checkpoint

# Push merged model to the Hub
python finetune/merge_peft_adapters.py --base_model_name_or_path model_to_merge --peft_model_path model_checkpoint --push_to_hub

For example

python finetune/merge_peft_adapters.py --base_model_name_or_path bigcode/starcoder --peft_model_path checkpoints/checkpoint-1000 --push_to_hub
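Under the hood, the merge script does roughly the following (a sketch using the peft API; paths are illustrative):

from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("bigcode/starcoder")
# attach the trained adapter, then fold its weights into the base model
model = PeftModel.from_pretrained(base, "checkpoints/checkpoint-1000")
merged = model.merge_and_unload()
merged.save_pretrained("starcoder-merged")  # standalone checkpoint, no PEFT needed at inference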

Evaluation

To evaluate StarCoder and its derivatives, you can use the BigCode-Evaluation-Harness for evaluating Code LLMs.
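For example, a HumanEval run might look roughly like the following; the flag names come from the harness documentation and may differ between versions, so check its README:

accelerate launch main.py \
  --model bigcode/starcoder \
  --tasks humaneval \
  --n_samples 1 \
  --batch_size 1 \
  --allow_code_execution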

Inference hardware requirements

In FP32 the model requires more than 60 GB of RAM; you can load it in FP16 or BF16 with ~30 GB, or in 8-bit with under 20 GB of RAM:

# make sure you have accelerate and bitsandbytes installed
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder")
# for fp16, replace `load_in_8bit=True` with `torch_dtype=torch.float16`
model = AutoModelForCausalLM.from_pretrained("bigcode/starcoder", device_map="auto", load_in_8bit=True)
print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
Memory footprint: 15939.61 MB
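For comparison, a bf16 load (~30 GB) looks like this (a sketch, assuming a GPU with bfloat16 support):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder", device_map="auto", torch_dtype=torch.bfloat16
)
print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")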

You can also try starcoder.cpp, a C++ implementation based on the ggml library.

starcoder's People

Contributors

arjunguha, armelrandy, dpfried, eltociear, eryk-mazus, lewtun, loubnabnl, lvwerra


starcoder's Issues

Model loading failed for inference - KeyError: 'gpt_bigcode'


KeyError Traceback (most recent call last)
/tmp/ipykernel_1004512/2718782402.py in <module>
2 tokenizer = AutoTokenizer.from_pretrained(checkpoint,use_auth_token=True)
3 # to save memory consider using fp16 or bf16 by specifying torch.dtype=torch.float16 for example
----> 4 model = AutoModelForCausalLM.from_pretrained(checkpoint,use_auth_token=True).to(device)

~/anaconda3/envs/verilog_gpt/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
421 kwargs["_from_auto"] = True
422 if not isinstance(config, PretrainedConfig):
--> 423 config, kwargs = AutoConfig.from_pretrained(
424 pretrained_model_name_or_path, return_unused_kwargs=True, trust_remote_code=trust_remote_code, **kwargs
425 )

~/anaconda3/envs/verilog_gpt/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
743 return config_class.from_pretrained(pretrained_model_name_or_path, **kwargs)
744 elif "model_type" in config_dict:
--> 745 config_class = CONFIG_MAPPING[config_dict["model_type"]]
746 return config_class.from_dict(config_dict, **kwargs)
747 else:

~/anaconda3/envs/verilog_gpt/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py in __getitem__(self, key)
450 return self._extra_content[key]
451 if key not in self._mapping:
--> 452 raise KeyError(key)
453 value = self._mapping[key]
454 module_name = model_type_to_module_name(key)

KeyError: 'gpt_bigcode'

Publish VS Code Plugin on OpenVSX

As per title. Right now the plugin is only published on the proprietary VS Code marketplace. Would it be possible to publish it on OpenVSX too? Then VSCode derived editors like Theia would be able to use it.

dataset_config_name is not set in config.yaml

if data_args.dataset_config_name is not None:

It generates the following error (the message is repeated across processes):

AttributeError: 'DataArguments' object has no attribute 'dataset_config_name'

Does anyone run successfully with CPU only offline?

I tried to run the model offline with a CPU-only Python driver script, but my attempts keep failing.
Here is my adapted file:

Attempt 1:

from transformers import AutoModelForCausalLM, AutoTokenizer ,BitsAndBytesConfig

checkpoint = "bigcode/starcoder"
device = "cpu" # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto").to(device)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)

outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))

While I got following error messages:

ValueError: The current `device_map` had weights offloaded to the disk. Please 
provide an `offload_folder` for them. Alternatively, make sure you have `safetensors`
installed if the model you are using offers the weights in this format.

Attempt 2:

I also tried the Hugging Face quantization approach documented at https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig , AutoConfig

quantization_config = BitsAndBytesConfig(llm_int8_enable_fp32_cpu_offload=True)

device_map = {
    "transformer.word_embeddings": 0,
    "transformer.word_embeddings_layernorm": 0,
    "lm_head": "cpu",
    "transformer.h": 0,
    "transformer.ln_f": 0,
}
tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder")

model_8bit = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder",
    device_map=device_map,
    quantization_config=quantization_config,
)

print(f"Memory footprint: {model_8bit.get_memory_footprint() / 1e6:.2f} MB")

But got another error as well:

ValueError: transformer.wte.weight doesn't have any device set.

Some system env info:

ubuntu 18
python 3.8.5
torch1.10.1+cuda111

So I'm not sure whether a GPU is required for inference, or how to properly configure the device_map for the model.
I hope someone who knows about this can help me.
Thanks.

`bfloat16` / `float16` StarCoder keeps producing `<|endoftext|>` for HumanEval inputs in greedy decoding

Thanks for open-sourcing this amazing work. However, I tried StarCoder with half precision and greedy decoding, but it simply produces <|endoftext|> for the majority of problems in HumanEval. From the paper, the pass@1 is pretty good (33.6 / 40.8), so I am curious whether this is expected or whether I did not use it in the right way. (Sorry, the A6000 we have cannot load the full-precision version...) Thanks!

A quick repro:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

checkpoint = "bigcode/starcoder"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16
).cuda()

text = R'''from typing import List


def separate_paren_groups(paren_string: str) -> List[str]:
    """ Input to this function is a string containing multiple groups of nested parentheses. Your goal is to
    separate those group into separate strings and return the list of those.
    Separate groups are balanced (each open brace is properly closed) and not nested within each other
    Ignore any spaces in the input string.
    >>> separate_paren_groups('( ) (( )) (( )( ))')
    ['()', '(())', '(()())']
    """
'''

inputs = tokenizer.encode(text, return_tensors="pt").cuda()
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, temperature=0) # greedy
print(tokenizer.decode(outputs[0]))

When setting a higher temperature, it can produce something different but sometimes invalid (screenshots for temperatures 0.8, 0.2, and 0.1 omitted).

Project Uses Code Under Non-Permissive License to train AI models in violation of these licenses.

People had their work added to the training set without their explicit opt in permission and without their consent.

This means that this entire project stack, as it's called, is stolen code, and makes the output stolen as well, because you're generating code off of other people's work without their consent and not remunerating them. This is theft. This is a violation of all the licenses.

To make this worse, your website says everything is permissively licensed when that's not true. You're actively lying to the community about where you got your training set and code from, telling people it's based on public data that was permissible to train on, when, in reality, you stole the code and didn't tell the authors, and now you're trying to walk it back and pretend like they can opt out when in reality this entire thing wouldn't exist unless you had stolen our code.

As somebody who was illegally added to that data set without his consent, I opt out. Earlier versions of this project could not exist without my code. And so this project should not exist at all because it only exists due to the theft of our code. This project shouldn't exist in its current form because it's built on stolen property.

ONNX-formatted model

Is it possible to release the model as a serialized ONNX file? It would probably also be a good idea to release some sample code using an ONNX inference engine behind a public RESTful API.

Can the StarCoder model be used for encoding, and how?

Besides the well-known ChatGPT, more and more startups and researchers now note the great value and potential of the OpenAI embeddings API (https://platform.openai.com/docs/guides/embeddings). It enables many domain-specific adaptations and applications, like LlamaIndex, soft prompting, retrieval-augmented generation, etc.

Therefore, I wonder whether StarCoder can be used for encoding. If the answer is yes, how should we make it usable? By modifying the network layers, or solely the inference code?

I know there is StarEncoder (~125M parameters); is it already suitable for encoding?

During finetuning (finetuner_starcoder.py), I'm running out of GPU memory during the checkpoint saving step (save_pretrained)

Looks like GPU usage almost doubles during saving (save_pretrained - get_peft_model_state_dict function). Is there a way to avoid this?

stack trace:

  File "finetune_starcoder.py", line 343, in <module>
    main(args)
  File "finetune_starcoder.py", line 332, in main
    run_training(args, train_dataset, eval_dataset)
  File "finetune_starcoder.py", line 323, in run_training
    trainer.train()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2006, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2291, in _maybe_log_save_evaluate
    self._save_checkpoint(model, trial, metrics=metrics)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2348, in _save_checkpoint
    self.save_model(output_dir, _internal_call=True)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2830, in save_model
    self._save(output_dir)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2873, in _save
    state_dict = self.model.state_dict()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1818, in state_dict
    module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1818, in state_dict
    module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1818, in state_dict
    module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
  [Previous line repeated 4 more times]
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1815, in state_dict
    self._save_to_state_dict(destination, prefix, keep_vars)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/bitsandbytes/nn/modules.py", line 268, in _save_to_state_dict
    self.weight.data = undo_layout(self.state.CxB, self.state.tile_indices)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 100, in undo_layout
    return outputs.reshape(rows, cols).contiguous()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 36.00 MiB (GPU 0; 39.56 GiB total capacity; 36.25 GiB already allocated; 24.56 MiB free; 37.79 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
  0%|          | 1/8000 [00:22<49:33:48, 22.31s/it]

Generate (Android,..) project and use it as context in StarCoder (for whole life-cycle help)?

Seeing a lot of typical test examples, like a Python counting function or code completion, I'm wondering if there is something in the works to help at the project level.
I tried GPT4All with an Android (Studio) sample app and it gave me some code and XML, but it was far from trivial to find where to put the pieces.

Taking the GPT-4 Code Interpreter, with file uploads as context, spitting out marvelous analysis and graphics:
I'm wondering if or when StarCoder will be able to generate a project according to a prompt and/or further use it as advanced context to help across the whole life-cycle.

Thanks for any insights
G.

OOM on T4 inference

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

Could anyone help me resolve this problem? The T4 has almost 15 GB of GPU memory.

(screenshot of the out-of-memory error omitted)

And if I use:

with torch.device(device):
    model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.float16)

There is another issue (screenshot omitted).

KeyError: 'gpt_bigcode'

Hi, I am trying to run the given inference code in the README, but it gives me the following error after downloading the files:

Downloading (…)okenizer_config.json: 100%|██████████| 677/677 [00:00<00:00, 44.4kB/s]
Downloading (…)olve/main/vocab.json: 100%|██████████| 777k/777k [00:00<00:00, 1.02MB/s]
Downloading (…)olve/main/merges.txt: 100%|██████████| 442k/442k [00:00<00:00, 774kB/s]
Downloading (…)/main/tokenizer.json: 100%|██████████| 2.06M/2.06M [00:00<00:00, 2.09MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 532/532 [00:00<00:00, 99.9kB/s]
Downloading (…)lve/main/config.json: 100%|██████████| 1.04k/1.04k [00:00<00:00, 271kB/s]
Traceback (most recent call last):                                   
  File "/home/workspace//starcoder/starcoder.py", line 7, in <module>                                                               
    model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)
  File "/root/miniconda3/envs/starcoder/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 441, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(                     
  File "/root/miniconda3/envs/starcoder/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 917, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]         
  File "/root/miniconda3/envs/starcoder/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 623, in __getitem__
    raise KeyError(key)                                              
KeyError: 'gpt_bigcode' 

minimum hardware

What is the minimum hardware someone has run this on? I have 32 GB of CPU RAM and a 4090 GPU (24 GB), and no luck so far...

VS Code plugin is breaking the completion

Hello! Thank you for your work.

I've encountered a strange behavior using a VS Code plugin (HF autocompletion). Sometimes it breaks the completion and inserts it from the middle, like this (screenshot omitted):

Looks like there are some issues with the plugin.

VS Code version 1.79.0-insider

Finetune.py OOM when saving checkpoint if trained on 24GB 3090

I am attempting to finetune the model using the command provided in the README. I am getting CUDA OutOfMemoryError:

OutOfMemoryError: CUDA out of memory. Tried to allocate 144.00 MiB (GPU 0; 23.69 GiB total capacity; 21.01 GiB already allocated; 77.06 MiB free; 22.23 GiB reserved in total by PyTorch) If reserved memory is >> 
allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The hardware is a 24GB 3090. It goes out of memory when saving the file, otherwise the training runs well.

To reproduce the error quickly, I add --save_freq 2 to trigger the error early on, e.g.:
python3 finetune/finetune-split.py --model_path="bigcode/starcoder" --dataset_name="ArmelR/stack-exchange-instruction" --subset="data/finetune" --split="train" --size_valid_set 10000 --streaming --seq_length 256 --save_freq 2 --max_steps 1000 --batch_size 1 --input_column_name="question" --output_column_name="response"

I've reduced sequence length here but have tried other context lengths as well.

If I leave save_freq at its default, I get a full training run until the final stage, and then it crashes.

Here is an example wandb training run:
Example

CMDLINE:
(star2) gpu@gpu:~/code/starcoder$ python3 finetune/finetune.py --model_path="bigcode/starcoder" --dataset_name="ArmelR/stack-exchange-instruction" --subset="data/finetune" --split="train" --size_valid_set 10000 --streaming --seq_length 256 --save_freq 2 --max_steps 1000 --batch_size 1 --input_column_name="question" --output_column_name="response"

bin /home/gpu/.local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda116.so
/home/gpu/.local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/home/gpu/miniconda3/envs/star2/lib/libcudart.so'), PosixPath('/home/gpu/miniconda3/envs/star2/lib/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get CUDA error: invalid device function errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: CUDA runtime path found: /home/gpu/miniconda3/envs/star2/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 116
CUDA SETUP: Loading binary /home/gpu/.local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda116.so...
Loading the dataset in streaming mode
100%|██████████| 400/400 [00:27<00:00, 14.72it/s]
The character to token ratio of the dataset is: 3.46
Loading the model
Loading checkpoint shards: 100%|██████████| 7/7 [00:30<00:00, 4.34s/it]
trainable params: 35553280 || all params: 15553009664 || trainable%: 0.22859421274773536
Starting main loop
Training...
/home/gpu/.local/lib/python3.10/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
warnings.warn(
wandb: Currently logged in as: richarjb. Use wandb login --relogin to force relogin
wandb: wandb version 0.15.2 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.14.2
wandb: Run data is saved locally in /home/gpu/code/starcoder/wandb/run-20230507_134929-5jvhgep5
wandb: Run wandb offline to turn off syncing.
wandb: Syncing run StarCoder-finetuned

/home/gpu/.local/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:318: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
/home/gpu/.local/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:318: UserWarning: MatMul8bitLt: inputs will be cast from torch.bfloat16 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
Traceback (most recent call last):
File "/home/gpu/code/starcoder/finetune/finetune.py", line 314, in
main(args)
File "/home/gpu/code/starcoder/finetune/finetune.py", line 303, in main
run_training(args, train_dataset, eval_dataset)
File "/home/gpu/code/starcoder/finetune/finetune.py", line 293, in run_training
trainer.train()
File "/home/gpu/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1662, in train
return inner_training_loop(
File "/home/gpu/.local/lib/python3.10/site-packages/transformers/trainer.py", line 2006, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
File "/home/gpu/.local/lib/python3.10/site-packages/transformers/trainer.py", line 2291, in _maybe_log_save_evaluate
self._save_checkpoint(model, trial, metrics=metrics)
File "/home/gpu/.local/lib/python3.10/site-packages/transformers/trainer.py", line 2348, in _save_checkpoint
self.save_model(output_dir, _internal_call=True)
File "/home/gpu/.local/lib/python3.10/site-packages/transformers/trainer.py", line 2830, in save_model
self._save(output_dir)
File "/home/gpu/.local/lib/python3.10/site-packages/transformers/trainer.py", line 2873, in _save
state_dict = self.model.state_dict()
File "/home/gpu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1818, in state_dict
module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
File "/home/gpu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1818, in state_dict
module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
File "/home/gpu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1818, in state_dict
module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
[Previous line repeated 4 more times]
File "/home/gpu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1815, in state_dict
self._save_to_state_dict(destination, prefix, keep_vars)
File "/home/gpu/.local/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 268, in _save_to_state_dict
self.weight.data = undo_layout(self.state.CxB, self.state.tile_indices)
File "/home/gpu/.local/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 100, in undo_layout
return outputs.reshape(rows, cols).contiguous()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 144.00 MiB (GPU 0; 23.69 GiB total capacity; 21.01 GiB already allocated; 77.06 MiB free; 22.23 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

How do I use the chat feature?

  1. Is there a saved checkpoint that I can use to load the chatting feature of the model?
  2. Also, I tried fine-tuning the model by following the instructions given in the chat folder's README.md. I am getting the error below; any help would be appreciated.

Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 259, in hf_raise_for_status
    response.raise_for_status()
  File "/home/ec2-user/anaconda3/lib/python3.8/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/api/repos/create

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/lib/python3.8/site-packages/huggingface_hub/hf_api.py", line 2185, in create_repo
    hf_raise_for_status(r)
  File "/home/ec2-user/anaconda3/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 301, in hf_raise_for_status
    raise HfHubHTTPError(str(e), response=response) from e
huggingface_hub.utils._errors.HfHubHTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/api/repos/create (Request ID: Root=1-647a7100-1e66ef850974653b0ad5470f)

You don't have the rights to create a model under this namespace

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 259, in hf_raise_for_status
    response.raise_for_status()
  File "/home/ec2-user/anaconda3/lib/python3.8/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/api/models/lewtun/starchat-alpha

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 348, in <module>
    main()
  File "train.py", line 257, in main
    trainer = Trainer(
  File "/home/ec2-user/anaconda3/lib/python3.8/site-packages/transformers/trainer.py", line 551, in __init__
    self.init_git_repo(at_init=True)
  File "/home/ec2-user/anaconda3/lib/python3.8/site-packages/transformers/trainer.py", line 3532, in init_git_repo
    create_repo(repo_name, token=self.args.hub_token, private=self.args.hub_private_repo, exist_ok=True)
  File "/home/ec2-user/anaconda3/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/ec2-user/anaconda3/lib/python3.8/site-packages/huggingface_hub/hf_api.py", line 2193, in create_repo
    self.repo_info(repo_id=repo_id, repo_type=repo_type, token=token)
  File "/home/ec2-user/anaconda3/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/ec2-user/anaconda3/lib/python3.8/site-packages/huggingface_hub/hf_api.py", line 1794, in repo_info
    return method(
  File "/home/ec2-user/anaconda3/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/ec2-user/anaconda3/lib/python3.8/site-packages/huggingface_hub/hf_api.py", line 1604, in model_info
    hf_raise_for_status(r)
  File "/home/ec2-user/anaconda3/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 291, in hf_raise_for_status
    raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 404 Client Error. (Request ID: Root=1-647a7100-127bed5713d651f01e5d2b19)

Repository Not Found for url: https://huggingface.co/api/models/lewtun/starchat-alpha.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 635862 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 635863 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 635864 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 635865 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 635866 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 635867 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 635870 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 635861) of binary: /home/ec2-user/anaconda3/bin/python
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/ec2-user/anaconda3/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/ec2-user/anaconda3/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/ec2-user/anaconda3/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/ec2-user/anaconda3/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/ec2-user/anaconda3/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------


Only a small amount of output is produced

When I run the model, this message appears:
Setting pad_token_id to eos_token_id:0 for open-end generation.
Input length of input_ids is 21, but max_length is set to 20. This can lead to unexpected behavior. You should consider increasing max_new_tokens.

and the output is only a few characters, not whole words like the demo on the model's page on the Hugging Face website.

AssertionError: Check batch related parameters. train_batch_size is not equal to micro_batch_per_gpu * gradient_acc_step * world_size

Hi

I am working on FineTuning StarCoder by following the README in the /chat directory. I encounter the following Assertion error:

AssertionError: Check batch related parameters. train_batch_size is not equal to micro_batch_per_gpu * gradient_acc_step * world_size 256 != 4 * 8 * 1
	ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 91809) of binary: /home/ubuntu/anaconda3/envs/chat/bin/python

when I run

TRANSFORMERS_VERBOSITY=info torchrun --nproc_per_node=8 train.py config.yaml --deepspeed=deepspeed_z3_config_bf16.json

System info:

  • OS: Ubuntu 22.04.5
  • GPU count and types: 8 X A100 (80GB) GPUs
  • Python version: 3.10
  • deepspeed: 0.9.2
  • accelerate: 0.19.0

Has anyone encountered this issue? It looks very similar to the issue. Looks like the world_size in DeepSpeed package is always 1.

Any pointers will be greatly appreciated. Thanks in advance.

finetune time

Using an A800 80GB, how long does it take to fine-tune? I am stuck...

Batching Errors

I'm seeing batching errors when updating to the latest text-generation-inference container.

Latest container image:

ghcr.io/huggingface/text-generation-inference                                        latest                                      7b12068effa3   2 hours ago     9.15GB

I cloned the model repo, which is the only difference between my setup and the one-line docker command provided in the readme.

Here is my interactive session:

deepops@a100:~/Projects/starcoder$ NV_GPU=0 nvidia-docker run  -p 8080:80 -v /raid/data:/data -e HUGGING_FACE_HUB_TOKEN=<removed> -e HF_HUB_ENABLE_HF_TRANSFER=0 -ti --rm --entrypoint bash ghcr.io/huggingface/text-generation-inference:latest
root@fce683d4ae5a:/usr/src# text-generation-launcher --model-id /data/starcoder --max-total-tokens 8192
2023-05-10T17:11:17.297074Z  INFO text_generation_launcher: Args { model_id: "/data/starcoder", revision: None, sharded: None, num_shard: Some(1), quantize: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 8192, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: false }
2023-05-10T17:11:17.297178Z  INFO text_generation_launcher: Starting download process.
2023-05-10T17:11:18.827834Z  INFO download: text_generation_launcher: Files are already present on the host. Skipping download.

I notice in the startup that max_batch_size is set to None.

I have a VS Code session with the HF Code Autocomplete plugin driving requests to the generate endpoint. Batch size 1 works fine, but when my typing outpaces the responses, I start to see batching errors on the inference server.

Specific error:

2023-05-10T17:15:27.396195Z ERROR shard-manager: text_generation_launcher: Method Decode encountered an error.
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 58, in serve
    server.serve(model_id, revision, sharded, quantize, uds_path)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 155, in serve
    asyncio.run(serve_inner(model_id, revision, sharded, quantize))
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 634, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/opt/conda/lib/python3.9/site-packages/grpc_interceptor/server.py", line 159, in invoke_intercept_method
    return await self.intercept(
> File "/opt/conda/lib/python3.9/site-packages/text_generation_server/interceptor.py", line 20, in intercept
    return await response
  File "/opt/conda/lib/python3.9/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 82, in _unary_interceptor
    raise error
  File "/opt/conda/lib/python3.9/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 73, in _unary_interceptor
    return await behavior(request_or_iterator, context)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 86, in Decode
    batch = self.model.batch_type.concatenate(batches)
  File "/opt/conda/lib/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/causal_lm.py", line 351, in concatenate
    _, num_heads, padded_sequence_length, head_dim = first_past_kvs[0][1].shape
IndexError: index 1 is out of bounds for dimension 0 with size 1
 rank=0
2023-05-10T17:15:27.396322Z ERROR batch{batch_size=2}:decode:decode{size=2}:decode{size=2}: text_generation_client: router/client/src/lib.rs:33: Server error: index 1 is out of bounds for dimension 0 with size 1
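
Until the underlying concatenate bug is fixed upstream, one workaround I may try (a sketch based only on the launcher flags visible in the Args line above, not on any documented fix) is to stop the server from ever merging decode batches by allowing a single in-flight request; note that this trades the crash for "overloaded" responses when requests overlap:

docker run -p 8080:80 -v $PWD/data:/data -e HUGGING_FACE_HUB_TOKEN=<YOUR BIGCODE ENABLED TOKEN> -d \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id bigcode/starcoder \
  --max-total-tokens 8192 \
  --max-concurrent-requests 1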

Fine-tuning on programming languages other than Python

Hello,

It is really exciting to see your work! May I know whether the code for fine-tuning on other programming languages will be released in the near future? So far, the only version I could find online targets Python.

Thanks in advance.

Dataset Preparation for Proprietary Python Library

@loubnabnl and @ArmelRandy

Thank you for your work on the StarCoder code LLM. I greatly appreciate it.

I'm interested in fine-tuning StarCoder on one of the Python libraries used at my company for a code generation project.

Since this library hasn't been exposed publicly, it wasn't part of StarCoder's training data. Hence, StarChat doesn't do a good job of generating code that uses this library.

I have a question regarding dataset generation for this use case:
I have a few code examples that I am planning to turn into prompt-completion pairs. Apart from these, the Python library has a lot of commands that aren't used in the code examples. In order to make the LLM aware of these commands, would it be beneficial to include mappings of commands to their explanations?

For example, for the numpy library my dataset would look like this:

Query: Return a sorted copy of an array.
Command: numpy.sort(a, axis=-1, kind=None, order=None)

<If I have a code example, I will also include that in the dataset, but if I don't have an example>

Query: Sort array [[1,4],[3,1]] along the last axis.
Command: a = np.array([[1,4],[3,1]]) \n np.sort(a)
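
For concreteness, this is roughly how I would serialize such mappings as prompt-completion pairs (the file name, the field names, and the prompt template below are my own assumptions, not a format prescribed by this repo):

# Hypothetical helper: write command/explanation mappings (and code examples,
# when available) as prompt-completion pairs in JSON Lines.
import json

records = [
    {"prompt": "Query: Return a sorted copy of an array.\nCommand:",
     "completion": " numpy.sort(a, axis=-1, kind=None, order=None)"},
    {"prompt": "Query: Sort array [[1,4],[3,1]] along the last axis.\nCommand:",
     "completion": " a = np.array([[1,4],[3,1]])\nnp.sort(a)"},
]

with open("library_finetune.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")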

Hardware requirements for inference and fine tuning.

Hey,

I am finishing a project on evaluating code language models on "creative" programming (shadercode). Just yesterday I finished fine-tuning santacoder on three different datasets to evaluate on my metric.

Now this new project popped up, but it's vastly larger. Prohibitively so. I was unable to run 6B models on the RTX A5000 I have access to.
For fine-tuning santacoder (no_fp16, batch_size 2 and a sequence length of 2048), 97% of the 24 GB of VRAM was used with a slightly adapted version of the provided script.

Do you have any numbers on the requirements for PEFT on this model? And is there an established way to run inference with the hardware I listed?

Thanks for releasing yet another interesting model!
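
For the inference side, here is a minimal sketch of what I would try first on a single ~24 GB card, assuming 8-bit quantization via bitsandbytes is acceptable (whether it actually fits depends on sequence length and batch size, so treat it as an experiment rather than a guarantee):

# Sketch: 8-bit weights put the 15.5B parameters at roughly 16 GB, leaving some
# headroom for activations and the KV cache on a 24 GB GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    load_in_8bit=True,   # requires bitsandbytes and accelerate
    device_map="auto",
)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))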

Has anyone attempted to fine-tune the StarCoder model with their own code?

As per the title, I have attempted to fine-tune StarCoder with 400 MB of my own Python code. I concatenated all .py files into a single text file, similar to the content column of the bigcode/the-stack-dedup Parquet. I then scanned the text and sliced it into 1024-character code snippets to train the model for 1000 steps. However, during validation, I could not achieve any recall of my own code.

I have been informed that fine-tuning a language model does not require a large dataset.
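
For reference, a minimal sketch of the preprocessing I described (a hypothetical reconstruction with made-up paths, not code from this repo):

# Concatenate all .py files and cut the text into fixed 1024-character snippets.
from pathlib import Path

corpus = "\n".join(p.read_text(errors="ignore") for p in Path("my_code").rglob("*.py"))
snippets = [corpus[i:i + 1024] for i in range(0, len(corpus), 1024)]
print(f"{len(snippets)} snippets of up to 1024 characters each")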

Missing dependency in "latest" text-generation-inference image

Dear Bigcode contributors,

I'm getting an error about a missing pygments dependency when trying to serve bigcode/starcoderbase using the current "latest" text-generation-inference image at ghcr.io/huggingface/text-generation-inference:latest

{"timestamp":"2023-05-24T20:01:28.796631Z","level":"ERROR","fields":{"message":"Shard 0 failed to start:\nTraceback (most recent call last):\n\n File "/usr/src/transformers/src/transformers/utils/import_utils.py", line 1157, in _get_module\n return importlib.import_module("." + module_name, self.name)\n\n File "/opt/conda/lib/python3.9/importlib/init.py", line 127, in import_module\n return _bootstrap._gcd_import(name[level:], package, level)\n\n File "", line 1030, in _gcd_import\n\n File "", line 1007, in _find_and_load\n\n File "", line 986, in _find_and_load_unlocked\n\n File "", line 680, in _load_unlocked\n\n File "", line 850, in exec_module\n\n File "", line 228, in _call_with_frames_removed\n\n File "/usr/src/transformers/src/transformers/modeling_utils.py", line 83, in \n from accelerate import version as accelerate_version\n\n File "/opt/conda/lib/python3.9/site-packages/accelerate/init.py", line 32, in \n from .utils import rich\n\n File "/opt/conda/lib/python3.9/site-packages/accelerate/utils/rich.py", line 19, in \n from rich.traceback import install\n\n File "/home/toolkit/.local/lib/python3.9/site-packages/rich/traceback.py", line 11, in \n from pygments.lexers import guess_lexer_for_filename\n\nModuleNotFoundError: No module named 'pygments'\n\n\nThe above exception was the direct cause of the following exception:\n\n\nTraceback (most recent call last):\n\n File "/opt/conda/bin/text-generation-server", line 8, in \n sys.exit(app())\n\n File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 58, in serve\n from text_generation_server import server\n\n File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 12, in \n from text_generation_server.cache import Cache\n\n File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cache.py", line 3, in \n from text_generation_server.models.types import Batch\n\n File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/init.py", line 10, in \n from text_generation_server.models.flash_causal_lm import FlashCausalLM\n\n File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_causal_lm.py", line 10, in \n from transformers import AutoTokenizer, PreTrainedTokenizerBase, PreTrainedModel\n\n File "", line 1055, in _handle_fromlist\n\n File "/usr/src/transformers/src/transformers/utils/import_utils.py", line 1147, in getattr\n module = self._get_module(self._class_to_module[name])\n\n File "/usr/src/transformers/src/transformers/utils/import_utils.py", line 1159, in _get_module\n raise RuntimeError(\n\nRuntimeError: Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):\nNo module named 'pygments'\n\n"},"target":"text_generation_launcher"}
{"timestamp":"2023-05-24T20:01:28.796804Z","level":"INFO","fields":{"message":"Shutting down shards"},"target":"text_generation_launcher"}
Error: ShardCannotStart

torch.cuda.OutOfMemoryError: CUDA out of memory When Trying to Save the Model

Howdy!

I am using the finetune/finetune.py script. It trains on an NVIDIA A40, but at the end, when it tries to save the model/checkpoint, it raises torch.cuda.OutOfMemoryError: CUDA out of memory.

Here is a full traceback:

Traceback (most recent call last):
  File "/scratch/user/seyyedaliayati/auto-test-gpt/finetune.py", line 336, in <module>
    main(args)
  File "/scratch/user/seyyedaliayati/auto-test-gpt/finetune.py", line 325, in main
    run_training(args, train_dataset, eval_dataset)
  File "/scratch/user/seyyedaliayati/auto-test-gpt/finetune.py", line 313, in run_training
    trainer.train()
  File "/scratch/user/seyyedaliayati/.conda/envs/env/lib/python3.10/site-packages/transformers/trainer.py", line 1664, in train
    return inner_training_loop(
  File "/scratch/user/seyyedaliayati/.conda/envs/env/lib/python3.10/site-packages/transformers/trainer.py", line 2019, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/scratch/user/seyyedaliayati/.conda/envs/env/lib/python3.10/site-packages/transformers/trainer.py", line 2308, in _maybe_log_save_evaluate
    self._save_checkpoint(model, trial, metrics=metrics)
  File "/scratch/user/seyyedaliayati/.conda/envs/env/lib/python3.10/site-packages/transformers/trainer.py", line 2365, in _save_checkpoint
    self.save_model(output_dir, _internal_call=True)
  File "/scratch/user/seyyedaliayati/.conda/envs/env/lib/python3.10/site-packages/transformers/trainer.py", line 2866, in save_model
    self._save(output_dir)
  File "/scratch/user/seyyedaliayati/.conda/envs/env/lib/python3.10/site-packages/transformers/trainer.py", line 2909, in _save
    state_dict = self.model.state_dict()
  File "/scratch/user/seyyedaliayati/.conda/envs/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1448, in state_dict
    module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
  File "/scratch/user/seyyedaliayati/.conda/envs/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1448, in state_dict
    module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
  File "/scratch/user/seyyedaliayati/.conda/envs/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1448, in state_dict
    module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
  [Previous line repeated 4 more times]
  File "/scratch/user/seyyedaliayati/.conda/envs/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1445, in state_dict
    self._save_to_state_dict(destination, prefix, keep_vars)
  File "/scratch/user/seyyedaliayati/.conda/envs/env/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 268, in _save_to_state_dict
    self.weight.data = undo_layout(self.state.CxB, self.state.tile_indices)
  File "/scratch/user/seyyedaliayati/.conda/envs/env/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 96, in undo_layout
    outputs = torch.empty_like(tensor)  # note: not using .index_copy because it was slower on cuda
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 144.00 MiB (GPU 0; 47.38 GiB total capacity; 44.56 GiB already allocated; 109.19 MiB free; 46.17 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Any ideas what's happening and how to solve this issue?
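
One thing I may try first, since the error message itself suggests it, is tuning the CUDA allocator before the first allocation (the 128 MB value below is just an illustrative guess; this mitigates fragmentation but does not reduce the total memory the save path needs):

import os

# Must run before the first CUDA allocation, e.g. at the very top of finetune.py,
# or be exported in the shell before launching the script.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"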

Trying to fine-tune the starcoderbase model using finetune.py - multiple GPUs

I am trying to fine-tune the bigcode/starcoderbase model on a compute node with 8 A100 GPUs (80 GB VRAM each).
My initial step is to adjust the parameters.
I get the impression that training becomes slow if I increase the per-device batch size from 1 to 32 (an overall batch of 256).
I can see memory usage increase from 5 GB to 61 GB, so it clearly utilizes more memory, but
the progress bar does not appear for more than 5 hours, whereas with a per-device batch size of 1 (an overall batch of 16) the progress bar comes up and estimates 2.5 hours for my initial fine-tuning. In both cases the GPUs alternate, with only one at 100% utilization at any given time. It just goes far slower with a batch size of 32.
I assumed it would be faster with a bigger batch size, but it feels as if increasing the batch size multiplies the training work instead of increasing the speed.
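
For reference, here is the back-of-the-envelope arithmetic I am reasoning about (assuming the overall batch of 256 comes from a per-device batch of 32 with 8 gradient-accumulation steps, and that device_map="auto" leaves a single data-parallel worker; both are my assumptions, not measured facts):

def sequences_per_optimizer_step(per_device_batch, grad_accum, data_parallel_workers=1):
    # With the HF Trainer, one optimizer step consumes
    # per_device_batch * grad_accum sequences per data-parallel worker.
    return per_device_batch * grad_accum * data_parallel_workers

small = sequences_per_optimizer_step(1, 16)   # 16 sequences per step
large = sequences_per_optimizer_step(32, 8)   # 256 sequences per step
print(large / small)  # 16.0 -> with max_steps fixed, the larger run does ~16x more work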

Here's my code

import os  
# map all available GPUs to the visible device list
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5,6,7"
os.environ["WANDB_DISABLED"] = "true"

import argparse
import os

import torch
from accelerate import Accelerator
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training, set_peft_model_state_dict
from torch.utils.data import IterableDataset
from tqdm import tqdm
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments, logging, set_seed
from transformers import TrainerCallback, TrainingArguments, TrainerState, TrainerControl
from transformers.trainer_utils import PREFIX_CHECKPOINT_DIR
from pynvml import *

def print_gpu_utilization():
    nvmlInit()
    handle = nvmlDeviceGetHandleByIndex(0)
    info = nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU memory occupied: {info.used//1024**2} MB.")


def print_summary(result):
    print(f"Time: {result.metrics['train_runtime']:.2f}")
    print(f"Samples/second: {result.metrics['train_samples_per_second']:.2f}")
    print_gpu_utilization()

"""
Fine-Tune StarCoder on Private code
"""

class SavePeftModelCallback(TrainerCallback):
    def on_save(
        self,
        args: TrainingArguments,
        state: TrainerState,
        control: TrainerControl,
        **kwargs,
    ):
        checkpoint_folder = os.path.join(args.output_dir, f"{PREFIX_CHECKPOINT_DIR}-{state.global_step}")

        kwargs["model"].save_pretrained(checkpoint_folder)

        pytorch_model_path = os.path.join(checkpoint_folder, "pytorch_model.bin")
        torch.save({}, pytorch_model_path)
        return control


class LoadBestPeftModelCallback(TrainerCallback):
    def on_train_end(
        self,
        args: TrainingArguments,
        state: TrainerState,
        control: TrainerControl,
        **kwargs,
    ):
        print(f"Loading best peft model from {state.best_model_checkpoint} (score: {state.best_metric}).")
        best_model_path = os.path.join(state.best_model_checkpoint, "adapter_model.bin")
        adapters_weights = torch.load(best_model_path)
        model = kwargs["model"]
        set_peft_model_state_dict(model, adapters_weights)
        return control
    

def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_path", type=str, default="bigcode/starcoderbase")
    parser.add_argument("--training_data", type=str, default="training-data/training_data_12151.json")
    parser.add_argument("--subset", type=str)
    parser.add_argument("--split", type=str)
    parser.add_argument("--size_valid_set", type=int, default=10000)
    parser.add_argument("--streaming", action="store_true")
    parser.add_argument("--shuffle_buffer", type=int, default=5000)

    # parser.add_argument("--input_column_name", type=str, default="prompt")
    # parser.add_argument("--output_column_name", type=str, default="completion")

    parser.add_argument("--seq_length", type=int, default=2048)
    parser.add_argument("--max_steps", type=int, default=10000)
    parser.add_argument("--batch_size", type=int, default=1)
    parser.add_argument("--gradient_accumulation_steps", type=int, default=16)
    parser.add_argument("--eos_token_id", type=int, default=49152)

    parser.add_argument("--lora_r", type=int, default=16)
    parser.add_argument("--lora_alpha", type=int, default=32)
    parser.add_argument("--lora_dropout", type=float, default=0.05)

    parser.add_argument("--learning_rate", type=float, default=5e-6)
    parser.add_argument("--lr_scheduler_type", type=str, default="cosine")
    parser.add_argument("--num_warmup_steps", type=int, default=100)
    parser.add_argument("--weight_decay", type=float, default=0.05)

    parser.add_argument("--local_rank", type=int, default=0)
    parser.add_argument("--no_fp16", action="store_false")
    parser.add_argument("--bf16", action="store_true", default=True)
    parser.add_argument("--no_gradient_checkpointing", action="store_false", default=False)
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--num_workers", type=int, default=None)
    parser.add_argument("--output_dir", type=str, default="./checkpoints")
    parser.add_argument("--log_freq", default=100, type=int)
    parser.add_argument("--eval_freq", default=100, type=int)
    parser.add_argument("--save_freq", default=1000, type=int)

    return parser.parse_args(args=[
        "--max_steps", "200",
        "--num_workers", "20",
        "--seq_length", "1024",
        "--batch_size", "1",
        "--log_freq", "1",
        "--num_warmup_steps", "100"     
    ])

def chars_token_ratio(dataset, tokenizer, nb_examples=400):
    """
    Estimate the average number of characters per token in the dataset.
    """
    total_characters, total_tokens = 0, 0
    for _, example in tqdm(zip(range(nb_examples), iter(dataset)), total=nb_examples):
        text = prepare_sample_text(example)
        total_characters += len(text)
        if tokenizer.is_fast:
            total_tokens += len(tokenizer(text).tokens())
        else:
            total_tokens += len(tokenizer.tokenize(text))

    return total_characters / total_tokens


def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

def prepare_sample_text(item):
    """Prepare the text from a sample of the dataset."""
    metadata = item['metadata']  
    code_snippet = item['code_snippet'] 

    prompt = f"Metadata: {metadata} | Code:"  
    text = f"{prompt} {code_snippet}"
    return text

class CodeConstantLengthDataset(IterableDataset):
    """
    Iterable dataset that returns constant length chunks of tokens from stream of text files.
        Args:
            tokenizer (Tokenizer): The processor used for proccessing the data.
            dataset (dataset.Dataset): Dataset with text files.
            infinite (bool): If True the iterator is reset after dataset reaches end else stops.
            seq_length (int): Length of token sequences to return.
            num_of_sequences (int): Number of token sequences to keep in buffer.
            chars_per_token (int): Number of characters per token used to estimate number of tokens in text buffer.
    """

    def __init__(
        self,
        tokenizer,
        dataset,
        infinite=False,
        seq_length=1024,
        num_of_sequences=1024,
        chars_per_token=3.6
    ):
        self.tokenizer = tokenizer
        self.dataset = dataset
        self.concat_token_id = tokenizer.eos_token_id
        self.seq_length = seq_length
        self.infinite = infinite
        self.current_size = 0
        self.max_buffer_size = seq_length * chars_per_token * num_of_sequences

    def __iter__(self):
        iterator = iter(self.dataset)
        more_examples = True
        while more_examples:
            buffer, buffer_len = [], 0
            while True:
                if buffer_len >= self.max_buffer_size:
                    break
                try:
                    item = next(iterator)
                    buffer.append(prepare_sample_text(item))
                    buffer_len += len(buffer[-1])
                except StopIteration:
                    if self.infinite:
                        iterator = iter(self.dataset)
                    else:
                        more_examples = False
                        break
            tokenized_inputs = self.tokenizer(buffer, truncation=False)["input_ids"]
            all_token_ids = []
            for tokenized_input in tokenized_inputs:
                all_token_ids.extend(tokenized_input + [self.concat_token_id])
            for i in range(0, len(all_token_ids), self.seq_length):
                input_ids = all_token_ids[i : i + self.seq_length]
                if len(input_ids) == self.seq_length:
                    self.current_size += 1
                    yield {
                        "input_ids": torch.LongTensor(input_ids),
                        "labels": torch.LongTensor(input_ids),
                    }

def create_datasets(tokenizer, args):
    dataset = load_dataset('json', data_files=args.training_data)
    # first shuffle dataset
    dataset['train'] = dataset['train'].shuffle(seed=42)
    # split dataset into train and test, note there is no test set, so we need to split train
    dataset = dataset['train'].train_test_split(test_size=0.1)

    train_data = dataset["train"]
    valid_data = dataset["test"]

    print(f"Size of the train set: {len(train_data)}. Size of the validation set: {len(valid_data)}")

    chars_per_token = chars_token_ratio(train_data, tokenizer)
    print(f"The character to token ratio of the dataset is: {chars_per_token:.2f}")

    train_dataset = CodeConstantLengthDataset(
        tokenizer,
        train_data,
        infinite=True,
        seq_length=args.seq_length,
        chars_per_token=chars_per_token
    )
    valid_dataset = CodeConstantLengthDataset(
        tokenizer,
        valid_data,
        infinite=False,
        seq_length=args.seq_length,
        chars_per_token=chars_per_token
    )
    return train_dataset, valid_dataset

args = get_args()
set_seed(args.seed)
os.makedirs(args.output_dir, exist_ok=True)
logging.set_verbosity_info()
tokenizer = AutoTokenizer.from_pretrained(args.model_path, use_auth_token=True)
train_dataset, eval_dataset = create_datasets(tokenizer, args)
train_data = train_dataset
val_data = eval_dataset

print("Loading the model")
# disable caching mechanism when using gradient checkpointing
model = AutoModelForCausalLM.from_pretrained(
    args.model_path,
    use_auth_token=True,
    use_cache=not args.no_gradient_checkpointing,
    load_in_8bit=True,
    device_map="auto"#{"": Accelerator().process_index},
)
model = prepare_model_for_int8_training(model)

lora_config = LoraConfig(
    r=args.lora_r,
    lora_alpha=args.lora_alpha,
    lora_dropout=args.lora_dropout,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules = ["c_proj", "c_attn", "q_attn"]
)

model = get_peft_model(model, lora_config)

print_trainable_parameters(model)

print("Starting main loop")

training_args = TrainingArguments(
    output_dir=args.output_dir,
    dataloader_drop_last=True,
    evaluation_strategy="steps",
    max_steps=args.max_steps,
    eval_steps=args.eval_freq,
    save_steps=args.save_freq,
    logging_steps=args.log_freq,
    per_device_train_batch_size=args.batch_size,
    per_device_eval_batch_size=args.batch_size,
    learning_rate=args.learning_rate,
    lr_scheduler_type=args.lr_scheduler_type,
    warmup_steps=args.num_warmup_steps,
    gradient_accumulation_steps=args.gradient_accumulation_steps,
    gradient_checkpointing=not args.no_gradient_checkpointing,
    fp16=not args.no_fp16,
    bf16=args.bf16,
    weight_decay=args.weight_decay,
    run_name="StarCoderBase-finetuned",
    ddp_find_unused_parameters=False,
)

trainer = Trainer(model=model, args=training_args, train_dataset=train_data, eval_dataset=val_data, callbacks=[SavePeftModelCallback, LoadBestPeftModelCallback])

print("Training...")
result = trainer.train()
print_summary(result)

print("Saving last checkpoint of the model")
model.save_pretrained(os.path.join(args.output_dir, "final_checkpoint/"))

Some of the screens when I start fine-tuning with batch size of 1:

[Four screenshots of GPU memory and utilization during fine-tuning, taken 2023-05-26, omitted.]

error

(venv) PS D:\Python project\venv> python starcoder.py
Traceback (most recent call last):
File "C:\WINDOWS\system32\venv\Lib\site-packages\huggingface_hub\utils_errors.py", line 259, in hf_raise_for_status
response.raise_for_status()
File "C:\WINDOWS\system32\venv\Lib\site-packages\requests\models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/bigcode/starcoder/resolve/main/tokenizer_config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\WINDOWS\system32\venv\Lib\site-packages\transformers\utils\hub.py", line 409, in cached_file
resolved_file = hf_hub_download(
^^^^^^^^^^^^^^^^
File "C:\WINDOWS\system32\venv\Lib\site-packages\huggingface_hub\utils_validators.py", line 120, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "C:\WINDOWS\system32\venv\Lib\site-packages\huggingface_hub\file_download.py", line 1166, in hf_hub_download
metadata = get_hf_file_metadata(
^^^^^^^^^^^^^^^^^^^^^
File "C:\WINDOWS\system32\venv\Lib\site-packages\huggingface_hub\utils_validators.py", line 120, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "C:\WINDOWS\system32\venv\Lib\site-packages\huggingface_hub\file_download.py", line 1507, in get_hf_file_metadata
hf_raise_for_status(r)
File "C:\WINDOWS\system32\venv\Lib\site-packages\huggingface_hub\utils_errors.py", line 291, in hf_raise_for_status
raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-64543c9b-38c8826659eb2e211b453be8)

Repository Not Found for url: https://huggingface.co/bigcode/starcoder/resolve/main/tokenizer_config.json.
Please make sure you specified the correct repo_id and repo_type.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\Python project\venv\starcoder.py", line 6, in
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\WINDOWS\system32\venv\Lib\site-packages\transformers\models\auto\tokenization_auto.py", line 642, in from_pretrained
tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\WINDOWS\system32\venv\Lib\site-packages\transformers\models\auto\tokenization_auto.py", line 486, in get_tokenizer_config
resolved_config_file = cached_file(
^^^^^^^^^^^^
File "C:\WINDOWS\system32\venv\Lib\site-packages\transformers\utils\hub.py", line 424, in cached_file
raise EnvironmentError(
OSError: bigcode/starcoder is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with use_auth_token or log in with huggingface-cli login and pass use_auth_token=True.
(venv) PS D:\Python project\venv> python starcoder.py
Traceback (most recent call last):
File "C:\WINDOWS\system32\venv\Lib\site-packages\huggingface_hub\utils_errors.py", line 259, in hf_raise_for_status
response.raise_for_status()
File "C:\WINDOWS\system32\venv\Lib\site-packages\requests\models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/bigcode/starcoder/resolve/main/config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\WINDOWS\system32\venv\Lib\site-packages\transformers\utils\hub.py", line 409, in cached_file
resolved_file = hf_hub_download(
^^^^^^^^^^^^^^^^
File "C:\WINDOWS\system32\venv\Lib\site-packages\huggingface_hub\utils_validators.py", line 120, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "C:\WINDOWS\system32\venv\Lib\site-packages\huggingface_hub\file_download.py", line 1166, in hf_hub_download
metadata = get_hf_file_metadata(
^^^^^^^^^^^^^^^^^^^^^
File "C:\WINDOWS\system32\venv\Lib\site-packages\huggingface_hub\utils_validators.py", line 120, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "C:\WINDOWS\system32\venv\Lib\site-packages\huggingface_hub\file_download.py", line 1507, in get_hf_file_metadata
hf_raise_for_status(r)
File "C:\WINDOWS\system32\venv\Lib\site-packages\huggingface_hub\utils_errors.py", line 291, in hf_raise_for_status
raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-64543d48-024479d8123941844694a131)

Repository Not Found for url: https://huggingface.co/bigcode/starcoder/resolve/main/config.json.
Please make sure you specified the correct repo_id and repo_type.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\Python project\venv\starcoder.py", line 4, in
model = AutoModelForCausalLM.from_pretrained(model_ckpt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\WINDOWS\system32\venv\Lib\site-packages\transformers\models\auto\auto_factory.py", line 441, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\WINDOWS\system32\venv\Lib\site-packages\transformers\models\auto\configuration_auto.py", line 916, in from_pretrained
config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\WINDOWS\system32\venv\Lib\site-packages\transformers\configuration_utils.py", line 573, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\WINDOWS\system32\venv\Lib\site-packages\transformers\configuration_utils.py", line 628, in _get_config_dict
resolved_config_file = cached_file(
^^^^^^^^^^^^
File "C:\WINDOWS\system32\venv\Lib\site-packages\transformers\utils\hub.py", line 424, in cached_file
raise EnvironmentError(
OSError: bigcode/starcoder is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with use_auth_token or log in with huggingface-cli login and pass use_auth_token=True.

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

Hi, the model is too large for my GPU memory, so I want to use 2 GPUs to dispatch the model, with a device_map like:

{'transformer.wte': 0, 'lm_head': 0, 'transformer.wpe': 0, 'transformer.drop': 0, 'transformer.h.0': 0, 'transformer.h.1': 0, 'transformer.h.2': 0, 'transformer.h.3': 0, 'transformer.h.4': 0, 'transformer.h.5': 0, 'transformer.h.6': 0, 'transformer.h.7': 0, 'transformer.h.8': 0, 'transformer.h.9': 0, 'transformer.h.10': 0, 'transformer.h.11': 0, 'transformer.h.12': 0, 'transformer.h.13': 0, 'transformer.h.14': 0, 'transformer.h.15': 0, 'transformer.h.16': 0, 'transformer.h.17': 0, 'transformer.h.18': 0, 'transformer.h.19': 0, 'transformer.h.20': 0, 'transformer.h.21': 0, 'transformer.h.22': 0, 'transformer.h.23': 0, 'transformer.h.24': 0, 'transformer.h.25': 0, 'transformer.h.26.ln_1': 0, 'transformer.h.26.attn': 0, 'transformer.h.26.ln_2': 0, 'transformer.h.26.mlp.c_fc': 0, 'transformer.h.26.mlp.c_proj': 1, 'transformer.h.26.mlp.act': 1, 'transformer.h.26.mlp.dropout': 1, 'transformer.h.27': 1, 'transformer.h.28': 1, 'transformer.h.29': 1, 'transformer.h.30': 1, 'transformer.h.31': 1, 'transformer.h.32': 1, 'transformer.h.33': 1, 'transformer.h.34': 1, 'transformer.h.35': 1, 'transformer.h.36': 1, 'transformer.h.37': 1, 'transformer.h.38': 1, 'transformer.h.39': 1, 'transformer.ln_f': 1}

It loads the model successfully, but I get an error during inference:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

How can I fix it?
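
One sketch that might help (untested against this exact setup; "GPTBigCodeBlock" is the class name used elsewhere in this thread, and the memory limits are illustrative assumptions) is to let accelerate infer a device map that never splits a transformer block, instead of writing the map by hand:

from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("bigcode/starcoder")
with init_empty_weights():
    empty_model = AutoModelForCausalLM.from_config(config)

# Keep each GPTBigCodeBlock on a single device so no intra-block tensors cross GPUs.
device_map = infer_auto_device_map(
    empty_model,
    max_memory={0: "30GiB", 1: "30GiB"},
    no_split_module_classes=["GPTBigCodeBlock"],
)
model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder",
    device_map=device_map,
    torch_dtype="auto",
)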

Starcoder model integration in Huggingchat

Hello @ArmelRandy @lewtun, and other authors. In the README, I found that the "Chat" link leads me to the HuggingChat page, but I can't seem to find any presence of StarCoder on the HuggingChat page.

If StarCoder has already been integrated into HuggingChat, would you mind directing me to where I can find the integration?

Thank you.

Obtaining different results when run locally

Hello everyone,
I'm trying to use the StarCoderBase model locally to generate a docstring. The version in the bigcode-playground works perfectly, but when I use the model locally I obtain very different results.

Since the playground app doesn't show whether there are extra configurations for the tokenizer or the model, I wondered whether I was doing something wrong or whether there is an actual problem when running the model locally.
The example used in the playground:
\n\n/**\n@description <FILL_HERE>\n*/\npublic static int bitcount(int n) {\nint count = 0;\nwhile (n != 0) {\n n = (n ^ (n - 1));\n count++;\n}\nreturn count;\n}\n
My code so far is quite basic:

tokenizer = AutoTokenizer.from_pretrained(model_dir + model_name, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(model_dir + model_name, local_files_only=True, device_map="auto", trust_remote_code=True)
text = '<fim-prefix>\n\n/**\n@description <fim-suffix>\n*/\npublic static int bitcount(int n) {\nint count = 0;\nwhile (n != 0) {\n    n = (n ^ (n - 1));\n        count++;\n}\nreturn count;\n}\n<fim-middle>'

input_ids = tokenizer(text, return_tensors="pt", return_token_type_ids=False).input_ids.to('cuda')

generated_ids = model.generate(
    input_ids,
    num_return_sequences=5,
    do_sample=True,
    top_p=0.9,  # pad_token_id=tokenizer.pad_token_id,
    temperature=0.3,
    max_new_tokens=128,
    repetition_penalty=1.2,
)

I mainly get empty outputs or random HTML noise. Maybe someone has advice, or a way to check whether the model is up to date.
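
One thing I still want to double-check is which FIM sentinel tokens the local tokenizer actually defines; to the best of my knowledge the StarCoder tokenizers register underscore variants such as <fim_prefix>, <fim_suffix> and <fim_middle>, and a hyphenated variant would be split into ordinary subwords, which could explain empty or noisy output. A quick check:

# List every vocabulary entry that mentions "fim" and confirm that the sentinel
# used in the prompt maps to a single token id rather than several subwords.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bigcode/starcoderbase")
print([t for t in tok.get_vocab() if "fim" in t])
print(tok("<fim_prefix>")["input_ids"])  # should be a single id if the token is special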

Best regards.

use the model offline

Can I deploy the model locally in a non-Internet environment and use StarCoder via the VS Code plugin? If so, how should I do it?
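
A minimal sketch for the model side, assuming the weights are downloaded once on a machine with internet access and copied over (this does not cover the VS Code plugin's own endpoint configuration, which would still need to point at a locally served model):

import os
from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

# Run this part where internet access is available, then copy the folder.
local_dir = snapshot_download("bigcode/starcoder")

# On the offline machine, forbid any hub lookups and load purely from disk.
os.environ["HF_HUB_OFFLINE"] = "1"
tokenizer = AutoTokenizer.from_pretrained(local_dir, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(local_dir, local_files_only=True)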

data preprocess code

From the Hugging Face blog:
all code for data preprocessing will be released

Where is this code?
Could you provide a link?

Thanks!

Cuda OOM during generate() call on 4 GPUs

I'm using the starcoderbase model across four GPUs to run inference. For some reason, one of the prompts I'm using causes a CUDA OOM error. Putting the tensor on the GPU shows it's about 420 MB (the shape is [1, 6086]), which still seems high, but not high enough to cause the error. The error occurs during the call to generate, after the model has been parallelized and the input has been put on the GPU. The full error is:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 6.59 GiB (GPU 0; 31.75 GiB total capacity; 
23.14 GiB already allocated; 2.50 GiB free; 28.32 GiB reserved in total by PyTorch) 
If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  
See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Here's a minimal example of my code:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, CodeGenForCausalLM, CodeGenTokenizer, AutoConfig
from huggingface_hub import hf_hub_download, snapshot_download
import accelerate


with open("single_prompt.txt", "r") as f:
    prompt = f.read()

tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoderbase", revision="main", device_map="auto")
weights_location = snapshot_download("bigcode/starcoderbase") # The model is sharded and there's a .index.json file to direct the shards

config = AutoConfig.from_pretrained("bigcode/starcoderbase", pad_token_id=tokenizer.eos_token_id)
with accelerate.init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

model.tie_weights()

model = accelerate.load_checkpoint_and_dispatch(model, weights_location, device_map="auto", no_split_module_classes=["GPTBigCodeBlock"])

prompt_tokenized = tokenizer(prompt, return_tensors="pt", truncation=False)

# Manually truncate
max_length = 8192
max_new_tokens = 256

token_len = prompt_tokenized["input_ids"].shape[1]
if token_len > max_length - max_new_tokens:
    prompt_tokenized["input_ids"] = prompt_tokenized["input_ids"][:, token_len - (max_length - max_new_tokens):]
    prompt_tokenized["attention_mask"] = prompt_tokenized["attention_mask"][:, token_len - (max_length - max_new_tokens):]


prompt_tokenized = prompt_tokenized.to("cuda")
sample = model.generate(**prompt_tokenized, max_new_tokens=max_new_tokens)
pc = tokenizer.decode(sample[:, prompt_tokenized["input_ids"].shape[1]:][0], truncate_before_pattern=[r"\n\n^#", "^'''", "\n\n\n"])

It's also worth noting that I don't have the same issue with similarly sized models like CodeGen 16B, although the max context length is smaller for those models, which might contribute to the issue.
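
One thing I plan to try, on the assumption that the checkpoint is currently being materialized in fp32 (which alone would roughly double the footprint compared to fp16): pass an explicit half-precision dtype when dispatching the shards, so that the weights and the KV cache for an 8K-token prompt stay in 16-bit.

import torch
import accelerate
from huggingface_hub import snapshot_download
from transformers import AutoConfig, AutoModelForCausalLM

weights_location = snapshot_download("bigcode/starcoderbase")
config = AutoConfig.from_pretrained("bigcode/starcoderbase")
with accelerate.init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)
model.tie_weights()

# Same dispatch as above, but with an explicit half-precision dtype.
model = accelerate.load_checkpoint_and_dispatch(
    model,
    weights_location,
    device_map="auto",
    no_split_module_classes=["GPTBigCodeBlock"],
    dtype=torch.bfloat16,  # or torch.float16
)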
