Comments (4)
I believe this is due to the model having a default max_seq_len of 2048. Our mosaicml/mpt-7b-chat model should be able to extrapolate to longer sequences, but you have to set a different max_seq_len via the script arguments. For example:

```
python hf_chat.py -n mosaicml/mpt-7b-chat --max_seq_len 4096 ...
```
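If you are loading the model directly with transformers instead of through the script, the same override can be set on the config before loading. A minimal sketch following the pattern shown on the MPT model cards:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Raise the context limit from the default 2048 before instantiating the model.
config = AutoConfig.from_pretrained('mosaicml/mpt-7b-chat', trust_remote_code=True)
config.max_seq_len = 4096

model = AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b-chat',
    config=config,
    trust_remote_code=True,
)
```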
I think it'll eventually overflow unless the history is periodically pruned.
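For instance, a rough sketch of a guard you could run before each generate call (hypothetical helper; assumes the script's tokenizer and max_seq_len are in scope):

```python
def history_too_long(history: str, tokenizer, max_seq_len: int,
                     reserve: int = 512) -> bool:
    # Leave `reserve` tokens of headroom for the next user turn and the reply.
    n_tokens = len(tokenizer(history)['input_ids'])
    return n_tokens > max_seq_len - reserve
```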
True. Thanks for the suggestion.
We would need to add that logic to the model's source code on the HF Hub. I don't think we can make that change via llm-foundry code, so I'll close this issue for now and see whether automatic pruning is an option later.
Feel free to add more comments if other issues arise and I'll reopen if necessary.
One possibility would be to prune the conversation in the hf_chat.py script to some number of previous Q/A pairs, e.g.:

```diff
 def conversation(model, tokenizer: Tokenizer, user_inp: str, history: str,
                  **generate_kwargs: Dict[str, Any]) -> Tuple[str, str, float]:
     if history != '':
+        if len(history.split("<|im_start|>")) > 12:
+            # The first element from split() is the empty string, so skip it;
+            # keep the first turn and drop the oldest user input / assistant
+            # response pair.
+            newhistory = "<|im_start|>" + history.split("<|im_start|>")[1]
+            for y in history.split("<|im_start|>")[4:]:
+                newhistory += "<|im_start|>" + y
+            history = newhistory
```

(A little ugly, but it works for me.)
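The same idea pulled into a standalone helper might look like this (a hypothetical refactor, not part of llm-foundry; the default max_turns=11 mirrors the `> 12` split-length check above, but it drops all excess pairs in one pass rather than one pair per call):

```python
def prune_history(history: str, max_turns: int = 11) -> str:
    sep = '<|im_start|>'
    turns = history.split(sep)[1:]  # split() yields a leading empty string; drop it
    if len(turns) <= max_turns:
        return history
    # Keep the first turn (e.g. the initial prompt) plus the most recent turns.
    kept = [turns[0]] + turns[-(max_turns - 1):]
    return ''.join(sep + t for t in kept)
```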