I'm unable to reproduce the predictions found in Tk-Instruct


Unable to reproduce Tk-Instruct predictions on Natural Instructions test about tk-instruct HOT 2 CLOSED

yizhongw commented on May 26, 2024 1


from tk-instruct.

Comments (2)

yizhongw commented on May 26, 2024

Sorry for the late reply on this issue - I only noticed it this morning.
I don't have a good guess at the cause yet. Let me test it today or tomorrow and get back to you.


yizhongw commented on May 26, 2024

Hi @timoschick, I finally figured out the reason: it's the space at the end of the input.

The 3b models were trained on GPUs. For that, we used src/ni_collator.py to convert each example into an input/output pair. For the example you provided above, here is the encoded output of the collator:
'Definition: In this task, you are given concept set (with 3 to 5 concepts) that contain mentions of names of people, places, activities, or things. These concept sets reflect reasonable concept co-occurrences in everyday situations. All concepts given as input are separated by "#". Your job is to generate a sentence describing a day-to-day scene using all concepts from a given concept set.\n\n Positive Example 1 -\nInput: mountain#ski#skier.\n Output: Skier skis down the mountain.\n\n Positive Example 2 -\nInput: call#character#contain#wallpaper.\n Output: queen of wallpaper containing a portrait called film character .\n\nNow complete the following example -\nInput: cob#corn#eat.\nOutput: '

Notice the \n tokens in the middle and the space at the end. I tried using this as the input (with and without the \n in the middle), and the model gave me the right output.
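To make the layout of that string easier to see, here is a minimal sketch that rebuilds it piece by piece. It is reconstructed only from the encoded output quoted above, not from the code in src/ni_collator.py; the function name `build_prompt` and the shortened definition are hypothetical.

```python
from typing import List, Tuple


def build_prompt(definition: str,
                 positive_examples: List[Tuple[str, str]],
                 instance_input: str) -> str:
    """Assemble a prompt in the layout of the quoted collator output.

    Hypothetical helper: the separators below are copied from the quoted
    string, including the leading space before "Positive Example" and
    "Output:" in the demonstrations, and the trailing space at the very end.
    """
    parts = [f"Definition: {definition}\n\n"]
    for i, (ex_in, ex_out) in enumerate(positive_examples, start=1):
        parts.append(
            f" Positive Example {i} -\nInput: {ex_in}\n Output: {ex_out}\n\n"
        )
    # The final "Output: " ends with a single space and no newline.
    parts.append(
        f"Now complete the following example -\nInput: {instance_input}\nOutput: "
    )
    return "".join(parts)


prompt = build_prompt(
    "Generate a sentence using all concepts.",  # shortened placeholder definition
    [("mountain#ski#skier.", "Skier skis down the mountain.")],
    "cob#corn#eat.",
)
print(repr(prompt))
```

Printing with `repr` makes the \n characters and the trailing space visible, which is the quickest way to compare a hand-built input against the string above.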

That said, I am quite surprised that the model is so sensitive to the trailing space. I guess the same thing happens for every model <= 3b, since we trained all of those on GPUs with the collator. The 11b model was trained on TPUs without this space.

As for solutions, I think the simplest fix on your side is to add a space at the end of the input. On our side, maybe we should retrain the models <= 3b?
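A minimal sketch of that workaround, assuming you build the prompt string yourself before tokenizing it. The helper name `ensure_trailing_space` is hypothetical and not part of the repository.

```python
def ensure_trailing_space(prompt: str) -> str:
    """Return the prompt ending in exactly one space.

    Hypothetical helper: it only touches trailing spaces, so newlines
    elsewhere in the prompt are left as they are.
    """
    return prompt.rstrip(" ") + " "


without_space = "Now complete the following example -\nInput: cob#corn#eat.\nOutput:"
fixed = ensure_trailing_space(without_space)
print(repr(fixed))
```

The helper is idempotent, so applying it to an input that already ends with a space leaves it unchanged. The result is what you would then pass to the tokenizer for the models <= 3b.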

cc @danyaljj
