
Comments (6)

cdxeve commented on June 3, 2024

Hi, thanks for your feedback🤗! The prompt template that uses a system prompt and "[/INST]" is specifically designed for the chat model.

We highly recommend switching from 'AdaptLLM/finance-LLM' to 'AdaptLLM/finance-chat' for improved response quality.

Regarding your use-case, here's an example using the recommended 'finance-chat' model:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("AdaptLLM/finance-chat")
tokenizer = AutoTokenizer.from_pretrained("AdaptLLM/finance-chat", use_fast=False)

# Put your query here
query_str = 'xxx'

your_system_prompt = 'Please, check if the answer can be inferred from the pieces of context provided. If the answer cannot be inferred from the context, just state that the question is out of scope and do not provide any answer.'

# Please integrate 'your system prompt' into the input instruction part following 'our system prompt'.
query_prompt = f"<s>[INST] <<SYS>>\nYou are a helpful, respectful, and honest assistant. Always answer as helpfully as possible, while being safe. Your responses should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\n{your_system_prompt}\n{query_str} [/INST]"

# NOTE: another option might be: skipping our system prompt and directly starting from your system prompt like this:
# query_prompt = f"{your_system_prompt}\n{query_str} [/INST]"

inputs = tokenizer(query_prompt, return_tensors="pt", add_special_tokens=False).input_ids.to(model.device)
outputs = model.generate(input_ids=inputs, max_length=4096)[0]

answer_start = int(inputs.shape[-1])
pred = tokenizer.decode(outputs[answer_start:], skip_special_tokens=True)

print(f'### User Query:\n{query_str}\n\n### Assistant Output:\n{pred}')
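
For a retrieval-style query like yours, query_str would typically bundle the retrieved context and the question. The snippet below is purely illustrative (the names and values are placeholders, not part of our code):

# Illustrative only: fold the retrieved passages and the question into query_str
context_str = '<retrieved passages go here>'
question = 'Can the figure asked about be inferred from these passages?'
query_str = f'Context:\n{context_str}\n\nQuestion: {question}'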

Feel free to let us know if you have any more questions🤗.


mirix commented on June 3, 2024

Thanks for the advice, I will try that immediately.

But that configuration has been tested with many models. I have also tried going with the defaults and many other combinations.


mirix commented on June 3, 2024

Hi,

I am trying 'AdaptLLM/finance-chat' as suggested and it seems to work fine.

However, the generation configuration does not seem to be taken into account.

First, with transformers 4.36.2, I receive the following warning twice:

/home/emoman/.local/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:389: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.

It seems that the generation kwargs from the script are completely ignored and that they are read directly from 'generation_config.json'.
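
For reference, the generation kwargs in my script look roughly like this (values are illustrative; this is the kind of call that appears to have no effect):

# Roughly what the script passes at call time; these kwargs seem to be ignored
outputs = model.generate(
    input_ids=inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.2,
)[0]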

So, if I alter that file to:

{
    "_from_model_config": true,
    "bos_token_id": 1,
    "eos_token_id": 2,
    "pad_token_id": 32000,
    "do_sample": true,
    "temperature": 0.0000001,
    "top_p": 0.0000001,
    "top_p": 1,
    "repetition_penalty": 0.1,
    "transformers_version": "4.31.0.dev0"
}

The warnings disappear, but the model keeps repeating itself, which would seem to indicate that 'repetition_penalty' is being ignored.

Some people suggest setting '_from_model_config' to false, but it does not change anything.


cdxeve commented on June 3, 2024

Hi, thanks for the feedback. I think we can resolve this warning by unsetting temperature and top_p.

Remove temperature and top_p from the generation_config.json file, making it look like this:

{
    "_from_model_config": true,
    "bos_token_id": 1,
    "eos_token_id": 2,
    "pad_token_id": 32000,
    "transformers_version": "4.31.0.dev0"
}

I've tested this with transformers version 4.36.2, and it works fine now.
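
If editing the file on disk is inconvenient, something like the following should have the same effect at generation time (a sketch, assuming transformers 4.36.x; not tested against this checkpoint):

# Sketch: reset the sampling-only fields to their library defaults
# so that generate() no longer sees the do_sample/temperature mismatch
model.generation_config.temperature = 1.0
model.generation_config.top_p = 1.0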


mirix commented on June 3, 2024

Yes, thank you. It works.

But, generally speaking, I believe that the generation kwargs explicitly set in the script should override the default configuration file.

It would seem that, if they don't, transformers has switched to some sort of legacy mode.

Finally, the model does respond to repetition_penalty and other generation parameters.

But it is extremely capricious and I haven't found a way to consistently avoid repetition other than post-processing.

It is a pity because the model seems very good for my purposes.

I believe that this volatility may be intrinsic to vanilla Llama-2 and not a consequence of the "reading comprehension" adaptation.

That being the case, perhaps the best solution would be to replace vanilla Llama with something better stabilised, such as Mistral. Tulu also shows very steady behaviour.


cdxeve commented on June 3, 2024

Hi,

Thanks for your recommendation to switch our base models to Mistral and Tulu. Mistral is indeed in our future plans.

Regarding this issue:

But, generally speaking, I believe that the generation kwargs explicitly set in the script should override the default configuration file

I completely agree that "generation kwargs explicitly set in the script should override the default configuration file".

But there may be some conflicts in your config settings.

{
    "_from_model_config": true,
    "bos_token_id": 1,
    "eos_token_id": 2,
    "pad_token_id": 32000,
    "do_sample": true,
    "temperature": 0.0000001,
    "top_p": 0.0000001,
    "top_p": 1,
    "repetition_penalty": 0.1,
    "transformers_version": "4.31.0.dev0"
}

According to the official documentation: https://huggingface.co/docs/transformers/v4.36.1/en/main_classes/text_generation#generation

Firstly, setting the temperature to an extremely small value near 0 (0.0000001) creates a highly concentrated token distribution, behaving similarly to "do_sample"=false. This contradicts your setting of "do_sample": true.

Secondly, there are conflicting values for top_p in your configuration.

Thirdly, the repetition_penalty value of 0.1 actually rewards repetition and would make the problem even worse; a value higher than 1, such as 1.2, is recommended to discourage repetition.

The simplest setting for your config is the following; you may refer to the official documentation for your specific use case:

{
    "_from_model_config": true,
    "bos_token_id": 1,
    "eos_token_id": 2,
    "pad_token_id": 32000,
    "repetition_penalty": 1.2,
    "transformers_version": "4.31.0.dev0"
}
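
Equivalently, the penalty can be passed at call time; kwargs given to generate() should take precedence over generation_config.json (a sketch based on the standard generate API):

# Sketch: pass repetition_penalty directly; values > 1 discourage repetition
outputs = model.generate(
    input_ids=inputs,
    max_length=4096,
    repetition_penalty=1.2,
)[0]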

