Comments (3)
I'm not certain this is the problem, so it would be good to validate it. But fusing can cause precision issues. In low precision, `c = a + b` can give very inexact results if `a` and `b` have very different magnitudes: if `a` is big and `b` is small, then `c = a + b = a`. In your case, if the adapters have small values and the original weight matrix has large values, fusing can wipe out the adapters, leaving you with effectively the baseline model.
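For a concrete feel for that absorption effect, here is a minimal fp16 sketch (using `mlx.core`; the specific numbers are just illustrative):

```python
# Minimal sketch of fp16 absorption: a small addend below the ulp of a
# large value is rounded away entirely.
import mlx.core as mx

a = mx.array(2048.0, dtype=mx.float16)  # large value; the ulp here is 2.0
b = mx.array(0.5, dtype=mx.float16)     # small value, below that ulp

print((a + b).item())  # 2048.0 -- b is absorbed, i.e. c = a + b = a
print((a.astype(mx.float32) + b.astype(mx.float32)).item())  # 2048.5 in fp32
```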
Now, I'm not sure that's happening. There are a couple of things you could do to check:
- Inspect the magnitudes of the weights and the adapters (see the sketch after this list).
- Try fusing and running the model in higher precision (e.g. fp32), just as a test that it works.
- Use a larger scale; this sometimes (but not always) helps here as well.
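As a rough sketch of the first check (the file names and paths below are assumptions based on typical `mlx_lm` LoRA output, not taken from this thread):

```python
# Hedged sketch: compare magnitudes of base weights vs. LoRA adapters.
# "model/model.safetensors" and "adapters/adapters.safetensors" are
# illustrative paths -- substitute your own.
import mlx.core as mx

weights = mx.load("model/model.safetensors")
adapters = mx.load("adapters/adapters.safetensors")

def report(label, tensors):
    # Print the largest absolute entry of each tensor in float32.
    for key, w in tensors.items():
        w = w.astype(mx.float32)
        print(f"{label} {key}: max |w| = {mx.abs(w).max().item():.3e}")

report("weight", weights)
report("adapter", adapters)
```

If the adapter entries are orders of magnitude smaller than the weight entries, low-precision fusing is a plausible culprit.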
Another option is to avoid fusing entirely. There may be a way to run unfused models with llama.cpp (see #816 (comment)). Or you could use MLX LM to run the fine-tuned model instead of llama.cpp; a sketch is below.
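A hedged sketch of the MLX LM route (assuming the adapter directory produced by `mlx_lm` LoRA training; argument names may differ across versions):

```python
# Run the fine-tuned model unfused with MLX LM instead of converting to GGUF.
from mlx_lm import load, generate

# "adapters" is the assumed adapter directory written during training.
model, tokenizer = load("path/to/base-model", adapter_path="adapters")
print(generate(model, tokenizer, prompt="Hello", max_tokens=64))
```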
Facing pretty much the same problem on my end too with a different model (Mistral). #849
This makes sense :) Thank you. Will have to look into that :)
Related Issues (20)
- [Model Request] Add support for IBM's Granite model HOT 2
- [Feature] Export Lora Adapters as GGML HOT 3
- Error when running inference on newly converted OpenELM MLX model, ValueError(f"Received parameters not in model: {extras}.") HOT 1
- LLMEvaluator : libc++abi: terminating due to uncaught exception of type std::invalid_argument: [matmul] Last dimension of first input with shape (1,916,2048) must match second to last dimension of second input with shape (256,32000)
- Unable to allocate memory
- Proposal: Add mypy to .pre-commit-config.yml HOT 2
- Struggling to convert models to MLX HOT 2
- mlx_lm stops generating HOT 1
- lora resume error HOT 2
- Error loading GGUF Mixtral 8x7B Q_8 model HOT 1
- iterate_batches in mlx_lm's Lora trainer is discarding the remainder dataset items (modulo batch size) HOT 1
- 01-ai/Yi-6B-Chat got IndexError: list assignment index out of range HOT 2
- [Feature Request] Finetuning Scripts for Whisper Models HOT 1
- Feature Request - Beam Search Decoder
- Discrepancies in generations from the fine tuned models after and before converting them into GGUF. The output generations go into an infinite loop. HOT 5
- NameError: name 'resume_adapter_file' is not defined HOT 1
- Received parameters not in model: {extras}. HOT 1
- support Gemma 2
- Model type deepseek_v2 not supported.