Comments (2)
Sorry for being late on this issue - I just noticed it this morning.
I don't have a good guess for the reason yet. Let me test it today or tomorrow and get back to you.
from tk-instruct.
Hi @timoschick, I finally figured out the reason: it's the space at the end of the input.
The 3b models were trained on GPUs, and for that we used `src/ni_collator.py` to convert each example into an input/output pair. For the example you provided above, here is the collator's encoded output:
'Definition: In this task, you are given concept set (with 3 to 5 concepts) that contain mentions of names of people, places, activities, or things. These concept sets reflect reasonable concept co-occurrences in everyday situations. All concepts given as input are separated by "#". Your job is to generate a sentence describing a day-to-day scene using all concepts from a given concept set.\n\n Positive Example 1 -\nInput: mountain#ski#skier.\n Output: Skier skis down the mountain.\n\n Positive Example 2 -\nInput: call#character#contain#wallpaper.\n Output: queen of wallpaper containing a portrait called film character .\n\nNow complete the following example -\nInput: cob#corn#eat.\nOutput: '
Notice the `\n` tokens in the middle and the space at the end. I tried using this as the input (with or without the `\n` tokens in the middle), and the model gave me the right output.
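For reference, here is a minimal sketch of how a prompt in that layout could be assembled by hand. It only mirrors the string shown above and is not the actual code in `src/ni_collator.py`; the function name `build_prompt` and its parameters are hypothetical, and the inputs are assumed to already carry their final period.

```python
from typing import List, Tuple


def build_prompt(
    definition: str,
    positive_examples: List[Tuple[str, str]],
    task_input: str,
) -> str:
    """Assemble a prompt that mirrors the encoded string shown above.

    Hypothetical helper: it copies the spacing seen in that string,
    including the leading space before "Positive Example" and "Output:"
    in the examples, and the single space after the final "Output:".
    """
    parts = [f"Definition: {definition}\n\n"]
    for i, (example_input, example_output) in enumerate(positive_examples, start=1):
        parts.append(
            f" Positive Example {i} -\n"
            f"Input: {example_input}\n"
            f" Output: {example_output}\n\n"
        )
    # The prompt ends with "Output: " (note the trailing space).
    parts.append(
        f"Now complete the following example -\nInput: {task_input}\nOutput: "
    )
    return "".join(parts)


prompt = build_prompt(
    "In this task, you are given concept set ...",  # definition shortened here
    [("mountain#ski#skier.", "Skier skis down the mountain.")],
    "cob#corn#eat.",
)
```

The detail that matters for this issue is the last line of the prompt: it ends in `Output: ` with a space, not `Output:`.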
That said, I am quite surprised that the model is so sensitive to the trailing space. I guess the same thing happens for every model <= 3b, since we trained those on GPUs with the collator. The 11b model was trained on TPUs without this space.
As for solutions, I think the simplest fix on your side is to add a space at the end of the input. On our side, maybe we should retrain the models <= 3b?
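A minimal sketch of that workaround, assuming you build the prompt string yourself before tokenizing it. The helper name `ensure_trailing_space` is hypothetical and not part of the repo:

```python
def ensure_trailing_space(prompt: str) -> str:
    """Return the prompt ending in exactly one trailing space.

    Hypothetical helper: strips any spaces already at the end, then
    appends a single one, so calling it twice changes nothing.
    """
    return prompt.rstrip(" ") + " "


raw_prompt = "Now complete the following example -\nInput: cob#corn#eat.\nOutput:"
fixed_prompt = ensure_trailing_space(raw_prompt)
# fixed_prompt is what you would then pass to your tokenizer / model call.
```

This only normalizes the end of the string; the rest of the prompt is left untouched.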
cc @danyaljj