Comments (2)
By default, generate applies the model's chat template, so the prompt is treated as a chat message. You can pass the raw prompt text instead by specifying --ignore-chat-template. There is currently no way to do the completion version in the CLI, but if you use the API you could do it like this:
from mlx_lm import load, generate

model, tokenizer = load("mistralai/Mistral-7B-Instruct-v0.1")

prompt = ""      # the user prompt
completion = ""  # the partial assistant completion to continue from

text = tokenizer.apply_chat_template(
    [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": completion},
    ],
    tokenize=False,
    add_generation_prompt=True,
)

response = generate(model, tokenizer, prompt=text)
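For intuition about what apply_chat_template does with those messages, here is a rough sketch (not from the source) of the string the Mistral-Instruct-v0.1 template is commonly documented to produce. Exact spacing and tokens may differ from your tokenizer, so verify against its actual output; note in particular that an assistant turn is normally closed with an end-of-sequence token, which you may want to strip for a true continuation prompt.

```python
# Hedged sketch of the Mistral-Instruct-v0.1 chat format; verify against
# tokenizer.apply_chat_template on the real tokenizer before relying on it.
def render_mistral_chat(messages, bos="<s>", eos="</s>"):
    text = bos
    for m in messages:
        if m["role"] == "user":
            text += f"[INST] {m['content']} [/INST]"
        elif m["role"] == "assistant":
            # The template closes assistant turns with EOS; for a
            # continuation-style prompt you would drop this trailing token.
            text += m["content"] + eos
    return text

print(render_mistral_chat([
    {"role": "user", "content": "Write a haiku"},
    {"role": "assistant", "content": "Autumn leaves"},
]))
# → <s>[INST] Write a haiku [/INST]Autumn leaves</s>
```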
from mlx-examples.
I see, thank you!
In case it helps anyone down the line:
My goal was to generate using fine-tuned adapters, with a system prompt and a user prompt in the chat format. Based on generate.py and utils.py, I load the model with the adapters like this:
from mlx_lm import load, generate

model_repo = "mlx-community/Meta-Llama-3-8B-Instruct-4bit"
adapter_path = "/adapters"  # path to the trained LoRA adapters

model, tokenizer = load(model_repo, adapter_path=adapter_path)
Then, I can generate:
system_prompt = "Be a helpful assistant"
prompt = "Hey, tell me about Llama"
text = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ],
    tokenize=False,
    add_generation_prompt=True,
)

response = generate(model, tokenizer, prompt=text, verbose=True)
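To make the effect of add_generation_prompt=True concrete, here is a hedged sketch (not from the source) of the Llama 3 Instruct chat format that the call above is expected to render. The special-token strings are the published Llama 3 tokens, but check the real tokenizer's output before depending on the exact layout.

```python
# Hedged sketch: manually render the Llama-3-Instruct chat format that
# tokenizer.apply_chat_template(..., add_generation_prompt=True) is
# expected to produce. Verify against your actual tokenizer.
def render_llama3_chat(messages, add_generation_prompt=True):
    text = "<|begin_of_text|>"
    for m in messages:
        text += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # The open assistant header cues the model to generate its reply.
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text

messages = [
    {"role": "system", "content": "Be a helpful assistant"},
    {"role": "user", "content": "Hey, tell me about Llama"},
]
print(render_llama3_chat(messages))
```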