
Comments (2)

yizhongw commented on May 26, 2024

Sorry for the late reply on this issue; I only noticed it this morning.
I don't have a good guess at the cause yet. Let me test it today or tomorrow and get back to you.

from tk-instruct.

yizhongw commented on May 26, 2024

Hi @timoschick, I finally figured out the reason: it's the space at the end of the input.

The 3b models were trained on GPUs. For that, we used src/ni_collator.py to convert each example into an input/output pair. For the example you provided above, here is the encoded output of the collator:
'Definition: In this task, you are given concept set (with 3 to 5 concepts) that contain mentions of names of people, places, activities, or things. These concept sets reflect reasonable concept co-occurrences in everyday situations. All concepts given as input are separated by "#". Your job is to generate a sentence describing a day-to-day scene using all concepts from a given concept set.\n\n Positive Example 1 -\nInput: mountain#ski#skier.\n Output: Skier skis down the mountain.\n\n Positive Example 2 -\nInput: call#character#contain#wallpaper.\n Output: queen of wallpaper containing a portrait called film character .\n\nNow complete the following example -\nInput: cob#corn#eat.\nOutput: '

Notice the \n tokens in the middle and the space at the end. I tried using this as the input (with or without the \n in the middle), and the model gave me the right output.
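For reference, here is a minimal sketch of how a prompt in this layout could be assembled by hand. It is reconstructed only from the string shown above, not from src/ni_collator.py itself, so the function name `build_prompt` and its parameters are hypothetical, and the inputs are assumed to already carry their trailing period.

```python
def build_prompt(definition, positive_examples, task_input):
    """Assemble a prompt that mirrors the layout of the string shown above.

    Hypothetical helper: the layout is inferred from the collator output
    quoted in this thread, not copied from src/ni_collator.py.
    """
    parts = [f"Definition: {definition}\n\n"]
    for i, (example_input, example_output) in enumerate(positive_examples, start=1):
        # Each demonstration keeps the leading spaces seen in the quoted string.
        parts.append(
            f" Positive Example {i} -\n"
            f"Input: {example_input}\n"
            f" Output: {example_output}\n\n"
        )
    # The prompt must end with "Output: " (note the trailing space).
    parts.append(
        "Now complete the following example -\n"
        f"Input: {task_input}\n"
        "Output: "
    )
    return "".join(parts)


if __name__ == "__main__":
    prompt = build_prompt(
        "Generate a sentence describing a day-to-day scene using all concepts.",
        [("mountain#ski#skier.", "Skier skis down the mountain.")],
        "cob#corn#eat.",
    )
    print(repr(prompt))
```

The definition passed in the usage example is shortened for readability; in practice it would be the full task definition.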

That said, I am quite surprised that the model is so sensitive to the trailing space. I suspect the same thing happens for every model <= 3b, since we trained those on GPUs with the collator. The 11b model was trained on TPUs without this space.

As for solutions, I think the simplest fix on your side is to add a space at the end of the input. On our side, maybe we should retrain the models <= 3b?
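The workaround can be sketched as a small normalization step applied before tokenization. The helper name `ensure_trailing_space` is hypothetical, and stripping any other trailing whitespace first is an assumption on my part, made so that the prompt always ends with exactly one space.

```python
def ensure_trailing_space(prompt: str) -> str:
    """Return the prompt ending in exactly one space.

    Hypothetical helper: trailing whitespace (spaces, newlines) is dropped
    and a single space is appended, matching the "Output: " ending that
    the <= 3b models saw during training.
    """
    return prompt.rstrip() + " "


if __name__ == "__main__":
    raw = "Now complete the following example -\nInput: cob#corn#eat.\nOutput:"
    fixed = ensure_trailing_space(raw)
    print(repr(fixed[-9:]))
```

This only matters for the models <= 3b; per the note above, the 11b model was trained without the trailing space, so the step should be skipped there.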

cc @danyaljj

