
Comments (2)

dosubot commented on June 29, 2024

Hey there, @syedfahimabrar! I'm here to help you with any bugs, questions, or contribution-related matters. Let me know how I can assist you!

To resolve the issue of the Mistral-7b model adding a duplicate BOS token, which produces a warning and overflows the context window when using TreeSummarize with llamacpp, make sure the BOS token is not added more than once while text chunks are repacked. One way to manage this is to modify the messages_to_prompt function so it handles the BOS token correctly.

Here is the relevant part of the messages_to_prompt function that ensures the prompt starts with a system prompt and avoids duplicate BOS tokens:

def messages_to_prompt(messages):
    """Build a llama.cpp prompt string from chat messages, emitting each
    special tag exactly once so the model's BOS token is not duplicated."""
    prompt = ""
    for message in messages:
        if message.role == 'system':
            prompt += f"<|system|>\n{message.content}</s>\n"
        elif message.role == 'user':
            prompt += f"<|user|>\n{message.content}</s>\n"
        elif message.role == 'assistant':
            prompt += f"<|assistant|>\n{message.content}</s>\n"

    # ensure we start with a system prompt, insert blank if needed
    if not prompt.startswith("<|system|>\n"):
        prompt = "<|system|>\n</s>\n" + prompt

    # add final assistant prompt
    prompt = prompt + "<|assistant|>\n"

    return prompt

This function guarantees the prompt starts with a system block: it checks whether the prompt already begins with "<|system|>\n", and if not, prepends a blank "<|system|>\n</s>\n" block, so the tag is never emitted twice [1].
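As a quick sanity check, the function can be exercised with minimal stand-ins for llama_index's chat message objects (SimpleNamespace is used here purely for illustration, and the function body is repeated from the snippet above so the example runs standalone):

```python
from types import SimpleNamespace

# messages_to_prompt repeated from the snippet above so this runs standalone
def messages_to_prompt(messages):
    prompt = ""
    for message in messages:
        if message.role == 'system':
            prompt += f"<|system|>\n{message.content}</s>\n"
        elif message.role == 'user':
            prompt += f"<|user|>\n{message.content}</s>\n"
        elif message.role == 'assistant':
            prompt += f"<|assistant|>\n{message.content}</s>\n"
    if not prompt.startswith("<|system|>\n"):
        prompt = "<|system|>\n</s>\n" + prompt
    return prompt + "<|assistant|>\n"

# Stand-ins for llama_index ChatMessage objects (illustrative only)
messages = [SimpleNamespace(role="user", content="Summarize the document.")]
prompt = messages_to_prompt(messages)
print(prompt)
```

With a single user message and no system message, the output starts with the blank system block and ends with the trailing assistant tag, and the system tag appears exactly once.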

Additionally, you can modify the repack method in the PromptHelper class to handle the BOS token correctly:

class PromptHelper:
    def repack(self, summary_template, text_chunks):
        # Ensure BOS token is not duplicated
        processed_chunks = []
        for chunk in text_chunks:
            if chunk.startswith(BOS_TOKEN):
                chunk = chunk[len(BOS_TOKEN):]
            processed_chunks.append(chunk)
        
        # Repack the chunks to fit the context window
        # (Assuming repack logic here)
        repacked_chunks = self._repack_logic(summary_template, processed_chunks)
        
        return repacked_chunks

    def _repack_logic(self, summary_template, text_chunks):
        # Implement the logic to repack text chunks to fit the context window
        # This is a placeholder for the actual repacking logic
        pass

Make sure to replace BOS_TOKEN with the actual BOS token used by the Mistral-7b model. This modification ensures that the BOS token is not duplicated during the repacking process, thus preventing the warning and context window overflow [2].
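For Mistral-7b the BOS token is the SentencePiece marker `<s>`, so the stripping step reduces to a check like the following (a standalone sketch; the surrounding PromptHelper plumbing is omitted, and strip_leading_bos is an illustrative helper, not a llama_index API):

```python
BOS_TOKEN = "<s>"  # Mistral-7b's beginning-of-sequence token

def strip_leading_bos(chunk: str) -> str:
    """Remove a leading BOS token so llama.cpp can add its own exactly once."""
    if chunk.startswith(BOS_TOKEN):
        return chunk[len(BOS_TOKEN):]
    return chunk

chunks = ["<s>First chunk of text.", "Second chunk of text."]
processed = [strip_leading_bos(c) for c in chunks]
print(processed)  # ['First chunk of text.', 'Second chunk of text.']
```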

By implementing these changes, you should be able to avoid the issue of duplicate BOS tokens and the resulting context window overflow.



logan-markewich commented on June 29, 2024

@syedfahimabrar set the tokenizer so that token counting is correct:

Settings.tokenizer = AutoTokenizer.from_pretrained(
    # I used llama2 here, use whatever model matches for you
    "NousResearch/Llama-2-7b-chat-hf" 
)
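For context, llama_index treats Settings.tokenizer as a callable that maps a string to a list of tokens, and token counting is then the length of that list. A toy sketch of that counting (the whitespace splitter is a stand-in for the real AutoTokenizer, and count_tokens is a hypothetical helper, not a llama_index API):

```python
# count_tokens is a hypothetical helper mirroring how token counting works
# once a tokenizer callable is configured: call it and take the length.
def count_tokens(text, tokenizer):
    return len(tokenizer(text))

# Toy whitespace tokenizer; a real subword tokenizer (e.g. AutoTokenizer)
# typically yields more tokens than a whitespace split, which is why a
# mismatched tokenizer undercounts and lets prompts overflow the window.
whitespace_tokenizer = lambda text: text.split()

print(count_tokens("Summarize this text for me", whitespace_tokenizer))  # 5
```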

