
Comments (2)

dosubot commented on June 29, 2024

Hey there, @syedfahimabrar! I'm here to help you with any bugs, questions, or contribution-related matters. Let me know how I can assist you!

To resolve the issue of the Mistral-7b model adding a duplicate BOS token, which produces a warning and overflows the context window when using TreeSummarize with llamacpp, make sure the BOS token is not added more than once while text chunks are repacked. One way to manage this is to modify the messages_to_prompt function so it handles the BOS token correctly.

Here is the relevant part of the messages_to_prompt function that ensures the prompt starts with a system prompt and avoids duplicate BOS tokens:

def messages_to_prompt(messages):
    """Build a llama.cpp prompt string from chat messages, emitting each
    special tag exactly once so the model's BOS token is not duplicated."""
    prompt = ""
    for message in messages:
        if message.role == 'system':
            prompt += f"<|system|>\n{message.content}</s>\n"
        elif message.role == 'user':
            prompt += f"<|user|>\n{message.content}</s>\n"
        elif message.role == 'assistant':
            prompt += f"<|assistant|>\n{message.content}</s>\n"

    # ensure we start with a system prompt, insert blank if needed
    if not prompt.startswith("<|system|>\n"):
        prompt = "<|system|>\n</s>\n" + prompt

    # add final assistant prompt
    prompt = prompt + "<|assistant|>\n"

    return prompt

This function guarantees the prompt starts with a system block: it checks whether the prompt already begins with "<|system|>\n", and if not, prepends a blank "<|system|>\n</s>\n" block, so the tag is never emitted twice [1].
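As a quick sanity check, the function can be exercised with minimal stand-ins for llama_index's chat message objects (SimpleNamespace is used here purely for illustration, and the function body is repeated from the snippet above so the example runs standalone):

```python
from types import SimpleNamespace

# messages_to_prompt repeated from the snippet above so this runs standalone
def messages_to_prompt(messages):
    prompt = ""
    for message in messages:
        if message.role == 'system':
            prompt += f"<|system|>\n{message.content}</s>\n"
        elif message.role == 'user':
            prompt += f"<|user|>\n{message.content}</s>\n"
        elif message.role == 'assistant':
            prompt += f"<|assistant|>\n{message.content}</s>\n"
    if not prompt.startswith("<|system|>\n"):
        prompt = "<|system|>\n</s>\n" + prompt
    return prompt + "<|assistant|>\n"

# Stand-ins for llama_index ChatMessage objects (illustrative only)
messages = [SimpleNamespace(role="user", content="Summarize the document.")]
prompt = messages_to_prompt(messages)
print(prompt)
```

With a single user message and no system message, the output starts with the blank system block and ends with the trailing assistant tag, and the system tag appears exactly once.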

Additionally, you can modify the repack method in the PromptHelper class to handle the BOS token correctly:

class PromptHelper:
    def repack(self, summary_template, text_chunks):
        # Ensure BOS token is not duplicated
        processed_chunks = []
        for chunk in text_chunks:
            if chunk.startswith(BOS_TOKEN):
                chunk = chunk[len(BOS_TOKEN):]
            processed_chunks.append(chunk)
        
        # Repack the chunks to fit the context window
        # (Assuming repack logic here)
        repacked_chunks = self._repack_logic(summary_template, processed_chunks)
        
        return repacked_chunks

    def _repack_logic(self, summary_template, text_chunks):
        # Implement the logic to repack text chunks to fit the context window
        # This is a placeholder for the actual repacking logic
        pass

Make sure to replace BOS_TOKEN with the actual BOS token used by the Mistral-7b model. This modification ensures that the BOS token is not duplicated during the repacking process, thus preventing the warning and context window overflow [2].
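For Mistral-7b the BOS token is the SentencePiece marker `<s>`, so the stripping step reduces to a check like the following (a standalone sketch; the surrounding PromptHelper plumbing is omitted, and strip_leading_bos is an illustrative helper, not a llama_index API):

```python
BOS_TOKEN = "<s>"  # Mistral-7b's beginning-of-sequence token

def strip_leading_bos(chunk: str) -> str:
    """Remove a leading BOS token so llama.cpp can add its own exactly once."""
    if chunk.startswith(BOS_TOKEN):
        return chunk[len(BOS_TOKEN):]
    return chunk

chunks = ["<s>First chunk of text.", "Second chunk of text."]
processed = [strip_leading_bos(c) for c in chunks]
print(processed)  # ['First chunk of text.', 'Second chunk of text.']
```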

By implementing these changes, you should be able to avoid the issue of duplicate BOS tokens and the resulting context window overflow.



logan-markewich commented on June 29, 2024

@syedfahimabrar set the tokenizer so that token counting is correct:

Settings.tokenizer = AutoTokenizer.from_pretrained(
    # I used llama2 here, use whatever model matches for you
    "NousResearch/Llama-2-7b-chat-hf" 
)
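For context, llama_index treats Settings.tokenizer as a callable that maps a string to a list of tokens, and token counting is then the length of that list. A toy sketch of that counting (the whitespace splitter is a stand-in for the real AutoTokenizer, and count_tokens is a hypothetical helper, not a llama_index API):

```python
# count_tokens is a hypothetical helper mirroring how token counting works
# once a tokenizer callable is configured: call it and take the length.
def count_tokens(text, tokenizer):
    return len(tokenizer(text))

# Toy whitespace tokenizer; a real subword tokenizer (e.g. AutoTokenizer)
# typically yields more tokens than a whitespace split, which is why a
# mismatched tokenizer undercounts and lets prompts overflow the window.
whitespace_tokenizer = lambda text: text.split()

print(count_tokens("Summarize this text for me", whitespace_tokenizer))  # 5
```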

