When I run inference with the model on the CPU, the output is garbled.
Here is my code:
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
tokenizer = AutoTokenizer.from_pretrained("/media/nvme/johnson/model-space/Giraffe-v1-Tokenizer", use_fast=False)
model = AutoModelForCausalLM.from_pretrained("/media/nvme/johnson/model-space/Giraffe-v1-delta-13b-scaled-16")
# model = AutoModelForCausalLM.from_pretrained("/media/nvme/johnson/model-space/13B-Alpaca-Base")
device = "cpu"
model.to(device)
generation_config = GenerationConfig(
    temperature=0.2,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.2,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    min_new_tokens=32,
    max_new_tokens=256,
)
prompts = [
    "Develop a C++ program that reads a text file line by line and counts the number of occurrences of a specific word in the file."
]
outputs = ""
for idx, prompt in enumerate(prompts):
    batch = tokenizer(prompt, return_tensors="pt", return_token_type_ids=False).to(device)
    generated_ids = model.generate(**batch, generation_config=generation_config)
    generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True).lstrip()
    outputs += generated_text + "\n\n"
    print(f"=== EXAMPLE {idx} ===")
    print()
    print(generated_text)
    print()
    print("======================")
    print()
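In case it helps with diagnosing this, here is a small debugging snippet that can be run after the loop above. It is a minimal sketch that reuses the model, tokenizer, batch, and generation_config objects already defined, and the slicing assumes batch size 1. It prints the raw generated token IDs of the continuation alongside their decoded text, which should show whether the model is emitting degenerate IDs (e.g. one token repeated) or whether the decoding step is at fault:

import torch  # only needed for inference_mode; generate() already disables gradients internally

with torch.inference_mode():
    debug_ids = model.generate(**batch, generation_config=generation_config)

# Keep only the newly generated tokens (everything after the prompt).
prompt_len = batch["input_ids"].shape[1]
new_tokens = debug_ids[0, prompt_len:]
print("raw ids:", new_tokens.tolist())
print("decoded:", tokenizer.decode(new_tokens, skip_special_tokens=True))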
Here is my output, which is obviously garbled.