Comments (6)
Hi, thanks for your feedback🤗! The prompt template that uses a system prompt and "[/INST]" is specifically designed for the chat model.
We highly recommend switching from 'AdaptLLM/finance-LLM' to 'AdaptLLM/finance-chat' for improved response quality.
Regarding your use-case, here's an example using the recommended 'finance-chat' model:
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("AdaptLLM/finance-chat")
tokenizer = AutoTokenizer.from_pretrained("AdaptLLM/finance-chat", use_fast=False)
# Put your query here
query_str = 'xxx'
your_system_prompt = 'Please, check if the answer can be inferred from the pieces of context provided. If the answer cannot be inferred from the context, just state that the question is out of scope and do not provide any answer.'
# Integrate your system prompt into the input instruction, following our system prompt.
query_prompt = f"<s>[INST] <<SYS>>\nYou are a helpful, respectful, and honest assistant. Always answer as helpfully as possible, while being safe. Your responses should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\n{your_system_prompt}\n{query_str} [/INST]"
# NOTE: alternatively, you can skip our system prompt and start directly from your own:
# query_prompt = f"{your_system_prompt}\n{query_str} [/INST]"
inputs = tokenizer(query_prompt, return_tensors="pt", add_special_tokens=False).input_ids.to(model.device)
outputs = model.generate(input_ids=inputs, max_length=4096)[0]
answer_start = int(inputs.shape[-1])
pred = tokenizer.decode(outputs[answer_start:], skip_special_tokens=True)
print(f'### User Query:\n{query_str}\n\n### Assistant Output:\n{pred}')
Feel free to let us know if you have any more questions🤗.
from lmops.
Thanks for the advice, I will try that immediately.
But that configuration has been tested with many models. I have also tried going with the defaults and many other combinations.
Hi,
I am trying 'AdaptLLM/finance-chat' as suggested and it seems to work fine.
However, the generation configuration does not seem to be taken into account.
First, with transformers 4.36.2, I receive the following warning twice:
/home/emoman/.local/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:389: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
It seems that the generation kwargs from the script are completely ignored and that the settings are read directly from 'generation_config.json'.
So, if I alter that file to:
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 32000,
  "do_sample": true,
  "temperature": 0.0000001,
  "top_p": 0.0000001,
  "top_p": 1,
  "repetition_penalty": 0.1,
  "transformers_version": "4.31.0.dev0"
}
The warnings disappear, but the model keeps repeating itself, which would seem to indicate that 'repetition_penalty' is being ignored.
Some people suggest setting '_from_model_config' to false, but it does not change anything.
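For what it's worth, a workaround that avoids editing the file on disk might be to build a `GenerationConfig` in code and pass it at call time; according to the transformers documentation, call-time settings take precedence over the defaults loaded from `generation_config.json`. A minimal sketch (the parameter values here are purely illustrative, not recommendations):

```python
from transformers import GenerationConfig

# Build a generation config in code instead of editing
# generation_config.json on disk; values are illustrative only.
gen_config = GenerationConfig(
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.2,
)

# Passing it to generate should override the file-based defaults:
# outputs = model.generate(input_ids=inputs, generation_config=gen_config)
```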
Hi, thanks for the feedback. I think we can resolve this warning by unsetting `temperature` and `top_p`.
Remove `temperature` and `top_p` from the `generation_config.json` file, making it look like this:
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 32000,
  "transformers_version": "4.31.0.dev0"
}
I've tested this with transformers version 4.36.2, and it works fine now.
Yes, thank you. It works.
But, generally speaking, I believe that the generation kwargs explicitly set in the script should override the default configuration file.
It would seem that, if they don't, transformers has switched to some sort of legacy mode.
Finally, the model does respond to repetition_penalty and other generation parameters.
But it is extremely capricious and I haven't found a way to consistently avoid repetition other than post-processing.
It is a pity because the model seems very good for my purposes.
I believe that this volatility may be intrinsic to vanilla Llama-2 and is not a consequence of the "reading comprehension" adaptation.
That being the case, perhaps the best solution would be to replace vanilla Llama with something better stabilised, such as Mistral. Tulu also shows very steady behaviour.
Hi,
Thanks for your recommendation to switch our base models to Mistral and Tulu. Mistral is indeed in our future plans.
Regarding this issue:
> But, generally speaking, I believe that the generation kwargs explicitly set on the script should override the default configuration file
I completely agree that "generation kwargs explicitly set on the script should override the default configuration file".
But there might be some conflicts in your config settings.
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 32000,
  "do_sample": true,
  "temperature": 0.0000001,
  "top_p": 0.0000001,
  "top_p": 1,
  "repetition_penalty": 0.1,
  "transformers_version": "4.31.0.dev0"
}
According to the official documentation: https://huggingface.co/docs/transformers/v4.36.1/en/main_classes/text_generation#generation
Firstly, setting the `temperature` to an extremely small value near 0 (0.0000001) creates a highly concentrated token distribution, behaving similarly to `do_sample=False`. This contradicts your setting of `"do_sample": true`.
Secondly, there are conflicting values for `top_p` in your configuration: the key appears twice, first as 0.0000001 and then as 1.
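To illustrate why a near-zero temperature effectively disables sampling, here is a small self-contained sketch (plain Python, not transformers code) of softmax with temperature: as the temperature approaches 0, almost all probability mass collapses onto the highest logit, so sampling degenerates into greedy decoding.

```python
import math

# Softmax with temperature: lower temperature sharpens the distribution.
def softmax_with_temperature(logits, temperature):
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax_with_temperature([2.0, 1.0, 0.1], 1e-7)
# With a near-zero temperature, virtually all probability mass
# lands on the highest logit, mimicking greedy decoding.
```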
Then, the `repetition_penalty` value of 0.1 would make the problem even worse; a value higher than 1, such as 1.2, is recommended to reduce repetition.
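As a rough illustration of why a penalty below 1 backfires, here is a self-contained sketch modelled on (but not identical to) the rule used by transformers' `RepetitionPenaltyLogitsProcessor`: positive scores of already-generated tokens are divided by the penalty and negative ones multiplied by it, so a penalty of 0.1 boosts repeated tokens instead of suppressing them.

```python
# Sketch of a repetition penalty applied to raw token scores; assumes
# the divide-if-positive / multiply-if-negative rule of transformers'
# RepetitionPenaltyLogitsProcessor.
def apply_repetition_penalty(scores, generated_ids, penalty):
    scores = list(scores)
    for tok in set(generated_ids):
        if scores[tok] > 0:
            scores[tok] /= penalty   # penalty > 1 lowers positive scores
        else:
            scores[tok] *= penalty   # penalty > 1 pushes negatives lower
    return scores

scores = [2.0, -1.0, 0.5]
# With penalty 1.2, previously generated tokens 0 and 1 are discouraged.
discouraged = apply_repetition_penalty(scores, [0, 1], 1.2)
# With penalty 0.1, the same tokens are strongly *encouraged* instead.
encouraged = apply_repetition_penalty(scores, [0, 1], 0.1)
```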
The simplest setting for your config is the following; you may refer to the official documentation for your specific use case:
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 32000,
  "repetition_penalty": 1.2,
  "transformers_version": "4.31.0.dev0"
}