
Comments (3)

jenkinv commented on August 24, 2024

Generation after LoRA training does not stop properly

The code at lora/data/wikisql.py removes the bos_token and eos_token, assuming the tokenizer will add them automatically. However, this is not the case for all tokenizers, as demonstrated with the Mistral-7B-v0.1 tokenizer. This leads to problems where the generated text doesn't stop properly after training with the wikisql dataset.

from utils import load

_, tokenizer, _ = load("mistralai/Mistral-7B-v0.1")

print(tokenizer.encode("a"))

This will output the sequence with bos_token, but without eos_token:

[1, 264]

To resolve this issue, we need to explicitly enable the addition of eos_token for the tokenizer. Here's the corrected code snippet:

from utils import load

_, tokenizer, _ = load("mistralai/Mistral-7B-v0.1")

# Enable adding eos_token
tokenizer.add_eos_token = True

print(tokenizer.encode("a"))

This will output the correct sequence with both bos_token and eos_token:

[1, 264, 2]

where:

  • 1 is the id of bos_token
  • 264 is the id of 'a'
  • 2 is the id of eos_token
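A quick way to check which special tokens a tokenizer adds is to look at the first and last ids of an encoded sequence. The following is a minimal sketch only: `special_token_report` is a hypothetical helper, not part of mlx-examples, and the ids 1 and 2 are the Mistral-7B-v0.1 values quoted above.

```python
def special_token_report(ids, bos_id, eos_id):
    """Return (has_bos, has_eos) for a list of token ids."""
    has_bos = len(ids) > 0 and ids[0] == bos_id
    has_eos = len(ids) > 0 and ids[-1] == eos_id
    return has_bos, has_eos


# Ids taken from the outputs quoted in this issue (bos_token = 1, eos_token = 2).
print(special_token_report([1, 264], bos_id=1, eos_id=2))     # (True, False)
print(special_token_report([1, 264, 2], bos_id=1, eos_id=2))  # (True, True)
```

Running the same check on the output of `tokenizer.encode(...)` for another model should show whether that tokenizer needs eos_token enabled explicitly.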

Therefore, we need to add the following line to mlx-examples/lora/lora.py within the train function:

# Add this line to turn on add_eos_token
tokenizer.add_eos_token = True

This ensures the model is trained with proper sequence termination and generates complete text after training with the wikisql dataset.

This solution is specifically tested with the Mistral-7B tokenizer and may need adjustments for other tokenizers.

python lora.py --model mistralai/Mistral-7B-v0.1 \
               --adapter-file adapters.npz \
               --max-tokens 50 \
               --temp 0.2 \
               --prompt "table: Order
columns: Name,City,Amount,Category,Date
Q: Tomorrow is 2024/05/01.What is the total amount of HangZhou yesterday?
A: "
Loading pretrained model
Fetching 10 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 82080.31it/s]
Total parameters 7243.436M
Trainable parameters 1.704M
Loading datasets
Generating
table: Order
columns: Name,City,Amount,Category,Date
Q: Tomorrow is 2024/05/01.What is the total amount of HangZhou yesterday?
A: SELECT SUM Amount FROM Order WHERE City = 'HangZhou' AND Date = '2024/05/01'
Q: What is the total amount of HangZhou yesterday?
A:
==========

The run above shows the failure: after producing the SQL answer, the model does not stop and starts generating a new question.
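To illustrate why a missing eos_token in the training data leads to this behavior, here is a toy sketch of a stop-on-eos generation loop. It is an assumption about how such loops typically work, not the code in lora.py; `generate_until_eos` and the token ids are made up for the example.

```python
def generate_until_eos(token_stream, eos_id, max_tokens):
    """Collect tokens until eos_id appears or max_tokens is reached."""
    output = []
    for token in token_stream:
        if token == eos_id or len(output) >= max_tokens:
            break
        output.append(token)
    return output


# A model that learned to emit eos (id 2) stops right after its answer.
print(generate_until_eos(iter([10, 11, 12, 2, 13, 14]), eos_id=2, max_tokens=50))
# [10, 11, 12]

# A model that never emits eos keeps going until the token budget runs out.
print(generate_until_eos(iter([10, 11, 12, 13, 14, 15]), eos_id=2, max_tokens=4))
# [10, 11, 12, 13]
```

If the training examples never end with eos_token, the model has no reason to emit it, so only the `--max-tokens` limit ends generation.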

from mlx-examples.

awni commented on August 24, 2024

It might be good to make it an option and default enable it.

Also for the example case you showed, does training with the eos token fix it?
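One possible shape for such an option is a command-line flag that is on by default and can be switched off. This is a sketch under assumptions: the flag name `--no-add-eos-token` and the `build_parser` helper are hypothetical and may not match what lora.py actually uses.

```python
import argparse


def build_parser():
    """Build a parser with a hypothetical eos option that defaults to enabled."""
    parser = argparse.ArgumentParser(description="LoRA fine-tuning (sketch)")
    parser.add_argument(
        "--no-add-eos-token",
        dest="add_eos_token",
        action="store_false",  # store_false makes the default value True
        help="Do not append eos_token to training examples.",
    )
    return parser


# Parse an empty argument list to show the default.
args = build_parser().parse_args([])
print(args.add_eos_token)  # True
```

The parsed value could then be assigned with `tokenizer.add_eos_token = args.add_eos_token` inside the train function, mirroring the one-line fix proposed above.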


jenkinv commented on August 24, 2024

> It might be good to make it an option and default enable it.

Good advice, I will follow it.

> Also for the example case you showed, does training with the eos token fix it?

Yes, training with the eos token fixes it.

This solution has been confirmed to work for Mistral-7B, gemma-2b, and MiniCPM-2B. However, it may not be compatible with Meta-Llama-3-8B and Qwen1.5-4B. Further investigation is required to determine the specific reasons for this incompatibility and develop appropriate solutions.
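For tokenizers where setting `add_eos_token` has no effect, one conceivable fallback is to append the eos id after encoding. This is an untested assumption, not the fix discussed in this thread, and it has not been verified on Meta-Llama-3-8B or Qwen1.5-4B. `encode_with_eos` and `fake_encode` are hypothetical names used only for illustration.

```python
def encode_with_eos(encode, text, eos_id):
    """Encode text and make sure the result ends with eos_id."""
    ids = list(encode(text))
    if not ids or ids[-1] != eos_id:
        ids.append(eos_id)
    return ids


def fake_encode(text):
    """Stand-in encoder for the demo: bos id 1 followed by character codes."""
    return [1] + [ord(c) for c in text]


print(encode_with_eos(fake_encode, "a", eos_id=2))             # [1, 97, 2]
# An encoder that already appends eos is left unchanged.
print(encode_with_eos(lambda t: [1, 97, 2], "a", eos_id=2))    # [1, 97, 2]
```

With a real tokenizer, `encode` would be `tokenizer.encode` and `eos_id` would come from the tokenizer itself (for Hugging Face tokenizers, `tokenizer.eos_token_id`).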

#760

