ml-explore / mlx-examples Goto Github PK

View Code? Open in Web Editor NEW

5.2K 57.0 750.0 4.53 MB

Examples in the MLX framework

License: MIT License

Python 99.96% Shell 0.04%

mlx

mlx-examples's Introduction

MLX Examples

This repo contains a variety of standalone examples using the MLX framework.

The MNIST example is a good starting point to learn how to use MLX.

Some more useful examples are listed below.

Text Models

Transformer language model training.
Large scale text generation with LLaMA, Mistral, Phi-2, and more in the LLMs directory.
A mixture-of-experts (MoE) language model with Mixtral 8x7B.
Parameter efficient fine-tuning with LoRA or QLoRA.
Text-to-text multi-task Transformers with T5.
Bidirectional language understanding with BERT.

Image Models

Image classification using ResNets on CIFAR-10.
Generating images with Stable Diffusion.
Convolutional variational autoencoder (CVAE) on MNIST.

Audio Models

Speech recognition with OpenAI's Whisper.

Multimodal models

Joint text and image embeddings with CLIP.
Text generation from image and text inputs with LLaVA.

Other Models

Semi-supervised learning on graph-structured data with GCN.
Real NVP normalizing flow for density estimation and sampling.

Hugging Face

Note: You can now directly download a few converted checkpoints from the MLX Community organization on Hugging Face. We encourage you to join the community and contribute new models.

Contributing

We are grateful for all of our contributors. If you contribute to MLX Examples and wish to be acknowledged, please add your name to the list in your pull request.

Citing MLX Examples

The MLX software suite was initially developed with equal contribution by Awni Hannun, Jagrit Digani, Angelos Katharopoulos, and Ronan Collobert. If you find MLX Examples useful in your research and wish to cite it, please use the following BibTex entry:

@software{mlx2023,
  author = {Awni Hannun and Jagrit Digani and Angelos Katharopoulos and Ronan Collobert},
  title = {{MLX}: Efficient and flexible machine learning on Apple silicon},
  url = {https://github.com/ml-explore},
  version = {0.0},
  year = {2023},
}

mlx-examples's People

Contributors

Stargazers

Watchers

Forkers

braincomputingsantosh sartaj0107 harshavardhank tomchapin pavanyellow janakan2466 adiesha chriszs btbujiangjun ttt-tkmr jaglinux amoahcp omygpt thargyi74 yaddayaddayaddayadda 1-ashraful-islam gotfusion yparvej srini1971 oxideai cservin69 chatgpt-1 ricardo-larosa nrvo krzyzinskim gavi macewan andypermarobotics leonericsson cnp-ciimar vaibhavs10 josebrunods adhishthite lewieyasuz iamhitarth logesh45 isaacbmiller tyrannicawe hqy168 elliottbarnes tiansiyuan techthiyanes rishabh135 grej r2k0 gekko-z seungkyua stvales honorrong andysingal letarrask benzhaidang rushyam jadenyifanhe tensojka jspaul2003 andrewhazelden kengggg zhutony sorokinvld robertmccraith michael-erasmus tangentleman gravitasse inkwash zb0413 snapbuy harshilbhatia niixxaaa magnusmoaner jacksonzhang0316 hellobiondi jbarrow bbelescot mmnga schaferk pravinacharya freefrancisco manas034 albertomatamoros amazewor-69 godzillayellowquestan bloodsolz 63-gorgeousguru shashipal95 cubacken-l juanwcg sabahamidi2 waybeauty-r x-xglossynn scorpions11 certready3grimbel noticeiver-senterban captainn-icyness l-readyna q-thegaro undshing-l educate-w buggytraumalawrt storiesgigaolympicry

mlx-examples's Issues

Object detection task

It would be more useful to have a object detection model since most of the computer vision modules are directed to NVIDIA.

Mixtral example is giving non-sensical output

It seems that mixtral example is giving non-sensical output when used with non-default arguments. Is this expected?
Here is an example

python mixtral.py --model_path ../../mixtral/ -m 2000 --prompt "keep continuing the following story. In the beginning the Universe was created."
[INFO] Loading model from disk.
[INFO] Starting generation...
keep continuing the following story. In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move"""
import random


def test(actual, expected):
    """
      >>> test(10, 10)
      True
    """
    if actual == expected:
        return True
    else:
        return False


def secret(a=1, b=3, c=6, d=12):
    x, y, z, w = a, b, c, d
    a, b, c, d = my_secret(a, b, c, d, y)
    b, y = my_Formatting(a, b, c, d, y)
    x, z = my_XOR(a, b, c, d, x, y, z, w)
    z, w = my_SecretNa(x, z, w)
    y, z, w = my_Formatting(x, y, z, w)
    w, x, y, z = my_XOR(x, y, z, w, x, y, z, w)
    y, z, w = my_Formatting(x, y, z, w)
    w, x, y, z = my_XOR(x, y, z, w, x, y, z, w)
    y, z, w = my_Formatting(x, y, z, w)
    w, x, y, z = my_XOR(x, y, z, w, x, y, z, w)
    b, c = my_NewSecret(x, y, z, w, b, c)
    w, z, x, y = my_NewInfo(a, b, c, d, w, z, x, y)
    b, y, z, x = my_NewChoice(a, b, c, d, b, y, z, x)
    task = my_NextTask(b, y, z, x, task)

    return task


def my_secret(a, b, c, d, x):
    a, b, c, d = 3, 2, 5, 4
    x, y, z, w = 100, 300, 500, 800
    b, y, z, x = 200, 100, 120, 1600

    return x


def my_Formatting(a, b, c, d, x):
    a, b, c, d = 3, 2, 5, 4
    x, y, z, w = 100, 300, 500, 800
    b, y, z, x = 200, 100, 120, 1600

    return x


def my_XOR(a, b, c, d, x, y, z, w):
    a, b, c, d = 3, 2, 5, 4
    x, y, z, w = 100, 300, 500, 800
    b, y, z, x = 200, 100, 120, 1600

    return x


def my_secret2(a, b, c, d, x):
    a, b, c, d = 3, 2, 5, 4
    x, y, z, w = 100, 300, 500, 800
    b, y, z, x = 200, 100, 120, 1600

    return x


def my_NewSecret(a, b, c, d, x):
    a, b, c, d = 3, 2, 5, 4
    x, y, z, w = 100, 300, 500, 800
    b, y, z, x = 200, 100, 120, 1600

    return x


def my_NextTask(a, b, c, d, x):
    a, b, c, d = 3, 2, 5, 4
    x, y, z, w = 100, 300, 500, 800
    b, y, z, x = 200, 100, 120, 1600

    return x


def main():
    print ("starting the following code from the beginning")

    for random.choice('secret'):

        while random.loop()

            when random.choice ('secret2') and random.loop():

                if random.loop() and random.loop():

                    break

                else:

                    continue

        elif random.loop() and random.loop():

            break

        else:

            continue


# for _ in range(10):
#     secret()
#     secret()

# secret()
# if __name__ == "__main__":
#     main()

# check if it gives a different result.
test(1, 1)
# print ("okay long buddy.")
# print ("WOW YOU FOOLED ME!")
# # print ("here is a frog that doesn't like to be exposed to direct sunlight.") ц񋀀  񋀀并并并并并起适不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不不但不但不但不但不但不但不但不但不但不但不但不但不但不但不但不但不但不但不但不都不\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

How can I edit the training dataset for Lora?

I'm trying to finetune on Lora, but I can't find where the Training Dataset is located so I can create my own. Anyone managed to do it on their own dataset?

Please support jax metal example with 64bit float

error in transformer_lm example

I tried to run the transformer LM example in mlx-examples/transformer_lm/main.py as follows:

python main.py --gpu

which produced the output:

Training a transformer with 153.883 M parameters
Iter 10: Train loss 9.027, It/sec 2.114
Iter 20: Train loss 8.388, It/sec 2.272
...
Iter 990: Train loss 6.922, It/sec 2.280
Iter 1000: Train loss 6.916, It/sec 2.277
Traceback (most recent call last):
  File "/Users/vogel/data/mlx-examples/transformer_lm/main.py", line 192, in <module>
    main(args)
  File "/Users/vogel/data/mlx-examples/transformer_lm/main.py", line 115, in main
    val_loss = eval_fn(params, valid)
                       ^^^^^^
NameError: name 'params' is not defined. Did you mean: 'nparams'?

From a quick skim of the code, I suspect model should be passed to eval_fn() instead of params, e.g.:

val_loss = eval_fn(model, valid)

and eval_fn() should be something like:

def eval_fn(model, dataset):
        inputs, targets = map(mx.array, to_samples(context_size, dataset))
        loss = 0
        for s in range(0, targets.shape[0], batch_size):
            bx, by = inputs[s : s + batch_size], targets[s : s + batch_size]
            bx, by = map(mx.array, (bx, by))
            losses = model.loss(bx, by, reduce=False)
            loss += mx.sum(losses).item()
        return loss / len(targets)

Mistral example misses numpy in its requirements.txt

Hi,

It looks like numpy is required to convert the mistral model. But it is missing from the requirements.txt file.

GPT-2 training examples: Gradient accumulation, learning rate decay, fp16 training

thank you so much for MLX, it rocks!!

I've been implementing GPT-2 training in MLX. Please feel free to check it out here: https://github.com/dx-dtran/gpt2-mlx

I had a few questions regarding gradient accumulation, learning rate decay, and float16 training:

I implemented gradient accumulation in the following way. Is this proper, or is there a better way to do it?

# outside training loop:
# allocate memory to accumulate gradients
# same shape as the gradients array (which is the same as the model parameters)

accumulated_grads = tree_map(
    lambda x: mx.zeros_like(x), model.parameters()
)

        # inside training loop:
        # accumulate the gradients by adding to the pre-allocated accumulator array

        accumulated_grads = tree_map(
            lambda acc, new: acc + new * (1.0 / num_grad_accumulation_steps),
            accumulated_grads,
            grads,
        )

        # evaluate the grads immediately each mini-batch step 
        # so we don't build up too much memory for when we eventually update the model parameters

        tree_map(
            lambda grad: mx.eval(grad),
            accumulated_grads,
        )

you can see my full code here

I also tried implementing learning rate decay, but had to define a custom optimizer because I couldn't find whether the AdamW learning_rate attribute was exposed with a setter. Is there already a learning_rate setter?
Finally, I tried implementing float16 training but couldn't find a way to set a custom dtype for the nn.Linear layers. But I'm sure I have to patiently wait for this right? 😄 Because I read you guys are focusing on implementing quantization

Is merging something like my training script into the main repo something you guys would be interested in? If so I'd be happy to help!

Here are some results I was able to get in full float32 precision:

Hardware	Model	Batch Size	Grad Accum Steps	Time per full iteration
M1 Max 64GB	GPT-2 124M	2	1	0.6 seconds
M1 Max 64GB	GPT-2 124M	12	1	4 seconds
M1 Max 64GB	GPT-2 124M	12	40	142 seconds
M1 Max 64GB	GPT-2 124M	16	1	8.13 seconds
M1 Max 64GB	GPT-2 XL 1.5B	2	1	28 seconds
M1 Pro 16GB	GPT-2 124M	1	4	2.4 seconds
M1 Pro 16GB	GPT-2 124M	3	4	10.5 seconds
M1 Pro 16GB	GPT-2 124M	2	240	270 seconds
M1 Pro 16GB	GPT-2 124M	2	1	1.35 seconds

llama lora fune tune using previous adapters weights

Can we reuse a previous adapters.npz in a second fine tune process iteration ?

let say I fine tune for 600 iteration during the first fine tune process, I want to restart the fine tune a second time but I want to benefits of the first run adapters.npz weights. Can we add an option to load them in order to go close to the previous fine tune version of model ?

lora on mistral error

I did convert the model using the mistral instructions not the lora instructions.

I move the mistral-7B-v0.1 path from mistral to lora folder and run this instructions and got the following error:

python lora.py --model mistral-7B-v0.1/ \                                   
               --train \
               --iters 600
Loading pretrained model
Traceback (most recent call last):
  File "/Users/ethangodin/Github/mlx-examples/lora/lora.py", line 288, in <module>
    model, tokenizer = load_model(args.model)
  File "/Users/ethangodin/Github/mlx-examples/lora/lora.py", line 270, in load_model
    model_args = ModelArgs(**config)
TypeError: __init__() got an unexpected keyword argument 'sliding_window'

convert.py doesn't know how to handle large models

Instructions here don't work for large models, which are often stored as shards. Can /llama/convert.py be adapted to handle models available on hugginface such as this? Even saving the shards individually with mlx support can work.

Support the Mistral AI official huggingface weights for Mixtral-8x7B-v0.1

per the current example:

Download the models from HuggingFace:
git clone https://huggingface.co/someone13574/mixtral-8x7b-32kseqlen

I'd really rather not have to re-download many GBs of weight files, and less-so from someone13574 (no offense) when there weights are posted by Mistral AI itself: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1

would this work already? could the example be updated to use these weights?

stable diffusion malloc error

mlx seemed to install fine but I tried the stable diffusion example and keep getting this:

134217728 bytes seems small, there should not be a problem allocating only ~134MB memory, the machine has 8GB RAM & ~4GB free. It's an Apple M2 Mac mini (running headless & connected to via VNC, could that cause issues?),
Or am I misunderstanding something basic, does this need an M2 Pro or something? Though it does go through all the (default) 50 steps before failing

I ran the MLX unit tests and they seemed to run fine, said '154 tests ran, 4 skipped'

(foo) david@Davids-Mac-mini stable_diffusion % python txt2image.py "A photo of an astronaut riding a horse on Mars." --n_images 1 --n_rows 1
/Users/david/mlx/foo/lib/python3.9/site-packages/urllib3/__init__.py:34: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
100%|
 [00:00<?, ?it/s]libc++abi: terminating due to uncaught exception of type std::runtime_error: [malloc_or_wait] Unable to allocate 134217728 bytes.
zsh: abort      python txt2image.py "A photo of an astronaut riding a horse on Mars."  1  1
(foo) david@Davids-Mac-mini stable_diffusion % /Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

python --version 3.9.6. OS: Ventura 13.6.1. tried Python 3.11.5 and does same
It cuts off at "There appears to be %d '

Briefly tried installing via source & also tried on conda, and Python 3.12, also didn't work

In-Depth Query: Enhancements in Large-Scale Text Generation Using LLaMA and Mistral Models in MLX Framework

Dear MLX Framework Contributors,

Firstly, I'd like to express my admiration for the comprehensive range of examples provided in the MLX repository, particularly in the realms of language models and large-scale text generation. The implementation of both LLaMA and Mistral models caught my attention, and I have a couple of in-depth queries and suggestions that might contribute to furthering the robustness and versatility of these models.

Model Scalability and Performance: In the context of large-scale deployments, how do the LLaMA and Mistral models scale in terms of computational efficiency and memory usage? Are there any benchmarks available comparing their performance against other prominent models in similar tasks?
Model Fine-Tuning and Adaptability: While the examples showcase the impressive capabilities of these models, I am curious about their adaptability to more niche domains. Specifically, how effective are the LLaMA and Mistral models in adapting to specialized vocabularies or styles? Furthermore, are there any guidelines or best practices for fine-tuning these models on custom datasets?
Integration with Other MLX Components: Considering the MLX framework's modularity, what are the possibilities and existing examples of integrating LLaMA and Mistral with other components of MLX, such as parameter-efficient tuning with LoRA or image generation with Stable Diffusion?
Future Roadmap: Lastly, I would be interested to learn about any future enhancements or features planned for these models within the MLX framework. Are there ongoing developments that we, as a community, can look forward to or potentially contribute to?

I believe addressing these queries could not only benefit users like myself who are deeply interested in the technical aspects of these models but also enhance the MLX framework's documentation for a broader audience.

Thank you for your time and dedication to this project. I eagerly await your insights and further discussions on these topics.

Best regards,
yihong1120

Slow Lora for llama

I am using a macbook pro m2 max running sonoma 14.2. I tried following the example to fine-tune a llama 7b model using lora on the wikisql dataset. One iter took about 2 hours, the example uses 600 iterations. I am still quite new at this, is it expected to be this slow?

There was an error with the bfloat16 type but I solved it using a solution posted in pull requests #10 . Could this pose a problem?

mlx-examples/lora /convert.py isn't properly handling HF pytorch model format

This overlaps somewhat with #35 and #81. It is probably an enhancement since the Wiki expects Mistral is downloaded from mistral.ai, but it would be a useful one as the framework could be used with Hugging Face model files.

In particular, I'm trying to perform LoRa training on mistralai/Mistral-7B-v0.1 and failing on the initial example step of conversion

I was able to update the script to handle the 'sharded' pytorch model files (not the safetensor files) with this modification to the part of it that loads the weights (this, of course, assumes each .bin file is non-overlapping as suggested by the naming convention):

    from glob import glob
    weight_dict = {}
    for state_path in glob(str(torch_path / "pytorch_model-0000*.bin")):
        print("Weights from", state_path)
        state = torch.load(state_path)
        weight_dict.update(**{k: v.to(torch.float16).numpy() for k, v in state.items()})
    np.savez(str(mlx_path / "weights.npz"), weight_dict)

I immediately run into issues in the last part that copies the parameters. It was looking for a params.json file and (via some minimal fussing) it looks like config.json might be the file it needs to refer to instead. Looking in there, there is a num_key_value_heads entry, which might what is needed instead of the n_heads key the script expects. However, this is as far as I got with the parameter copying logic.

llama confusion

https://github.com/facebookresearch/llama

fetches weights for 13B and 70B weights in multiple files, but by mlx-example/llama expects single file path for weights (not directory). does this mean only 7B supported?

mlx-example/llama can not convert weights from llama2

nikolaydubina@Macintosh llama % python3 convert.py ../../llama/llama-2-7b-chat/consolidated.00.pth mlx_llama_weights.npz
Traceback (most recent call last):
  File "/Users/nikolaydubina/Workspace/mlx-examples/llama/convert.py", line 47, in <module>
    **{k: v for k, v in starmap(map_torch_to_mlx, state.items()) if k is not None}
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nikolaydubina/Workspace/mlx-examples/llama/convert.py", line 47, in <dictcomp>
    **{k: v for k, v in starmap(map_torch_to_mlx, state.items()) if k is not None}
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nikolaydubina/Workspace/mlx-examples/llama/convert.py", line 35, in map_torch_to_mlx
    return key, value.numpy()
                ^^^^^^^^^^^^^
TypeError: Got unsupported ScalarType BFloat16

stable diffusion is not working when I try to pip install

pip install -r requirements.txt is not working

Slow mnist in M1

I am using Apple M1 Pro and ran main.py in Python 3.11. When I used the option --gpu, it's slower than cpu. Is this expected?

llama.py failed to generate on large mode.

For example, the program reports the following error on llama-2-13b-chat:

python3 llama.py llama-2-13b-chat.npz tokenizer.model 'Hello, who are you?'
[INFO] Loading model from disk.
Press enter to start generation
------
Traceback (most recent call last):
  File "/Users/starfish/working/llm/mlx-examples/llama/llama.py", line 306, in <module>
    generate(args)
  File "/Users/starfish/working/llm/mlx-examples/llama/llama.py", line 180, in generate
    for token in model.generate(x, args.temp):
  File "/Users/starfish/working/llm/mlx-examples/llama/llama.py", line 125, in generate
    x, c = l(x, mask=mask)
           ^^^^^^^^^^^^^^^
  File "/Users/starfish/working/llm/mlx-examples/llama/llama.py", line 77, in __call__
    y = self.norm1(x)
        ^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/mlx/nn/layers/normalization.py", line 85, in __call__
    return self.weight * x * n
           ~~~~~~~~~~~~^~~
ValueError: Shapes (5120) and (1,7,2560) cannot be broadcast.

And the program reports the following error on llama-2-70b:

python3 llama.py llama-2-70b.npz tokenizer.model 'Hello, who are you?'
[INFO] Loading model from disk.
Press enter to start generation
------
Traceback (most recent call last):
  File "/Users/starfish/working/llm/mlx-examples/llama/llama.py", line 306, in <module>
    generate(args)
  File "/Users/starfish/working/llm/mlx-examples/llama/llama.py", line 180, in generate
    for token in model.generate(x, args.temp):
  File "/Users/starfish/working/llm/mlx-examples/llama/llama.py", line 125, in generate
    x, c = l(x, mask=mask)
           ^^^^^^^^^^^^^^^
  File "/Users/starfish/working/llm/mlx-examples/llama/llama.py", line 77, in __call__
    y = self.norm1(x)
        ^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/mlx/nn/layers/normalization.py", line 85, in __call__
    return self.weight * x * n
           ~~~~~~~~~~~~^~~
ValueError: Shapes (8192) and (1,7,1024) cannot be broadcast.

Error when running mnist example

When I run python main.py in mnistfolder, here's the output

HELLO WORLD!
Traceback (most recent call last):
  File "/Users/renyi/Code/mlx-examples/mnist/main.py", line 8, in <module>
    import mlx.core as mx
ModuleNotFoundError: No module named 'mlx.core'

I've already installed mlx with pip install mlx but it still doesn't work

Here are some information about my machine and environment

MacBook Air M2 with 24GB memory

ProductName: macOS
ProductVersion: 14.1.1
BuildVersion: 23B81

Python Version: 3.12.0
Pip Version: 23.3.1

llama: Got unsupported ScalarType BFloat16 when converting weights

When trying to convert the PyTorch weights, for example:

python convert.py ../../llama-2-7b/consolidated.00.pth mlx_llama-2-7b.npz

I get:

File "../ml-explore/mlx-examples/llama/convert.py", line 35, in map_torch_to_mlx
    return key, value.numpy()
                ^^^^^^^^^^^^^
TypeError: Got unsupported ScalarType BFloat16

Implemented a fix/workaround with PR #10

Error while finetuning with LoRA: Unable to allocate 12 bytes.

I tried to follow the example in the lora directory, but I get the following error

libc++abi: terminating due to uncaught exception of type std::runtime_error: [malloc_or_wait] Unable to allocate 12 bytes.

I tried to run with the default settings and only changed the iters value to 100

lora % python lora.py --model mlx_model --train --iters 100

I get the following output before I get the error above:

Loading pretrained model
Total parameters 7243.436M
Trainable parameters 1.704M
Loading datasets
Training

Question: migrating existing projects that use NumPy to use MLX

Hi there, I was playing around with SDXL models via Fooocus and the performance there on m1 pro is terrible and I was trying to find a way to optimise that and found out this recently published library exists.

I was wondering if projects such as Fooocus that are using NumPy could use some kind of universal wrapper around NumPy and MLX which would be easily implementable.

Here is my naive idea pseudo code:

import platform
import numpy as original_np

try:
    import mlx.core as ml_np
    mlx_available = True
except ImportError:
    mlx_available = False


def is_apple_silicon():
    return platform.machine() == 'arm64' and platform.system() == 'Darwin'


class NumpyMLXWrapper:
    def __init__(self):
        self.use_mlx = is_apple_silicon() and mlx_available

    def __getattr__(self, name):
        def method(*args, **kwargs):
            if self.use_mlx:
                # Try MLX first
                try:
                    return getattr(ml_np, name)(*args, **kwargs)
                except AttributeError:
                    pass
            # Fallback to NumPy
            return getattr(original_np, name)(*args, **kwargs)
        return method


np = NumpyMLXWrapper()

and then the projects would instead of this

import numpy as np

do this

import numpy_mlx_wrapper as np

Is this something that makes sense to do, and would likely gain a performance boost?

Mixtral conversion to nzp not working on M3 max 128gb

Hi,

I have an M3 max 128gb and I wanted to try mixral, but after downloading the weights and combining them, I tried to run the converter and it start but then the process gets killed.

Do we need more than 128gb for this to work?

Thanks in advance.

Question of speed and some methods

Thanks for this work, I've been looking forward to the framework running on Apple silicon for a long time!
The device I'm using is a MacBook Pro-14ich by M3Max.
I was surprised to find that the mnist project using the mlx framework(main.py) runs slower than the Pytorch(torch_main.py). Can I ask what causes this? Pytorch is a great framework and I use it all the time for learning, but it still surprises me that mlx as a dedicated framework would be slower than Pytorch.
As well, does the mlx framework currently have or have plans to develop some tools like torch.utils.data.Dataset, torch.utils.data.DataLoader classes and the torchvision? I just tried to reproduce a yolo algorithm using mlx, but I didn't find these methods in the documentation.
Finally, looking forward to the continued development of mlx, I'm very much looking forward to the future that UMA and dedicated frameworks can create!

Support for Phi-2 model

I'd be interested in helping contribute this but would need some guidance. right now, it doesn't seem like other convert.py scripts will work out of the box

Microsoft has released the weights in .safetensors format on huggingface: https://huggingface.co/microsoft/phi-2/tree/main

would be great to be able to run this with MLX!

phi2/convert.py: requires flash_attn

When running

% pip install -r requirements.txt
...
% python convert.py
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Traceback (most recent call last):
File "convert.py", line 24, in
convert()
File "convert.py", line 15, in convert
model = AutoModelForCausalLM.from_pretrained(
File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 440, in from_pretrained
model_class = get_class_from_dynamic_module(
File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.8/site-packages/transformers/dynamic_module_utils.py", line 373, in get_class_from_dynamic_module
final_module = get_cached_module_file(
File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.8/site-packages/transformers/dynamic_module_utils.py", line 247, in get_cached_module_file
modules_needed = check_imports(resolved_module_file)
File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.8/site-packages/transformers/dynamic_module_utils.py", line 141, in check_imports
raise ImportError(
ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run pip install flash_attn

pip install flash_attn fails because of cuda requirements (nvcc is required), but as far as I can tell there is not support for nvcc on MacOS. Is this expected on the AArch64 Macs?

LLama example doesn't work with HF format models?

First of all, thank you for the great work in providing us with all those examples. However, when I tried to use the LLama model in HF format, it didn't work. I was able to convert the HF format model weights to npz format by reverse engineering the convert.py and transformers' convert HF from pytorch script. The model seems to load without any errors, but it doesn't output any meaningful words; just a series of random characters.
Here is what I did to convert the HF format model:

Merge all the shard weights into one .pth file, using the following script:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_path = "./models/NousResearch/Llama-2-7b-chat-hf"

out_dir =  "./models/mlx/llama2-chat"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype= torch.float16)

torch.save(model.state_dict(),'./models/mlx/llama2-chat/model.pth')
tokenizer.save_pretrained(out_dir)

Use the modified convert.py as follows to convert HF format weight to npz format.

import argparse
from itertools import starmap

import numpy as np
import torch


def map_torch_to_mlx(key, value):
    if "embed_tokens.weight" in key:
        key = "embedding.weight"

    elif "norm" in key:
        key = key.replace("input_layernorm", "norm1").replace("post_attention_layernorm", "norm2")

    elif "q_proj" in key:
        key= key.replace("q_proj","query_proj")

    elif "k_proj" in key:
        key= key.replace("k_proj","key_proj")  

    elif "v_proj" in key:
        key= key.replace("v_proj","value_proj")  

    elif "o_proj" in key:
        key= key.replace("o_proj","out_proj")  

    elif "gate_proj" in key:
        key= key.replace("mlp.gate_proj","linear1")  

    elif "down_proj" in key:
        key= key.replace("mlp.down_proj","linear3")  

    elif "up_proj" in key:
        key= key.replace("mlp.up_proj","linear2")  

    elif "lm_head" in key:
        key = key.replace("lm_head", "out_proj")

    elif "rope" in key:
        return None, None
    
    key = key.replace('model.', '')
    
    return (
        key,
        value.numpy()
        if value.dtype != torch.bfloat16
        else value.to(torch.float32).numpy(),
    )


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Convert Llama weights to MLX")
    parser.add_argument("torch_weights")
    parser.add_argument("output_file")
    args = parser.parse_args()

    state = torch.load(args.torch_weights)
    np.savez(
        args.output_file,
        **{k: v for k, v in starmap(map_torch_to_mlx, state.items()) if k is not None}
    )

Run the llama.py as instructed in the readme.

Wondering if I missed anything? I had a look at the llama.py and it looks fine to me. I'm not very sure why it doesn't work.

convert.py load large torch model with CUDA device

If the model is large, for example, llama-2-13b, the convert.py will try to load the model with CUDA device and reports an error.

 ✘ python3 ./convert.py ../../llama/llama-2-13b/consolidated.00.pth llama-2-13b
Traceback (most recent call last):
  File "/Users/starfish/working/llm/mlx-examples/llama/./convert.py", line 49, in <module>
    state = torch.load(args.torch_weights)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/serialization.py", line 1014, in load
    return _load(opened_zipfile,
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/serialization.py", line 1422, in _load
    result = unpickler.load()
             ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/serialization.py", line 1392, in persistent_load
    typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/serialization.py", line 1366, in load_tensor
    wrap_storage=restore_location(storage, location),
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/serialization.py", line 381, in default_restore_location
    result = fn(storage, location)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/serialization.py", line 274, in _cuda_deserialize
    device = validate_cuda_device(location)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/serialization.py", line 258, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

Swift examples?

All of the examples are python-based.

Perhaps there could be Swift examples. Perhaps those should be in a different repo which is not only Swift-focused but also assumes Xcode.

Tried to use SDXL-turbo

I made the changes to

_DEFAULT_MODEL = "stabilityai/sdxl-turbo"
_MODELS = {
    # See https://huggingface.co/stabilityai/sdxl-turbo for the model details and license
    "stabilityai/sdxl-turbo": {
        "unet_config": "unet/config.json",
        "unet": "unet/diffusion_pytorch_model.safetensors",
        "text_encoder_config": "text_encoder/config.json",
        "text_encoder": "text_encoder/model.safetensors",
        "vae_config": "vae/config.json",
        "vae": "vae/diffusion_pytorch_model.safetensors",
        "diffusion_config": "scheduler/scheduler_config.json",
        "tokenizer_vocab": "tokenizer/vocab.json",
        "tokenizer_merges": "tokenizer/merges.txt",
    }
}

but I get the error

% python txt2image.py "A photo of an astronaut riding a horse on Mars." --n_images 1
  0%|                                                                                                                                           | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "txt2image.py", line 36, in <module>
    for x_t in tqdm(latents, total=args.steps):
  File "/Users/ageorgios/.pyenv/versions/3.8.18/lib/python3.8/site-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/Users/ageorgios/Models/mlx-examples/stable_diffusion_turbo/stable_diffusion/__init__.py", line 84, in generate_latents
    eps_pred = self.unet(x_t_unet, t_unet, encoder_x=conditioning)
  File "/Users/ageorgios/Models/mlx-examples/stable_diffusion_turbo/stable_diffusion/unet.py", line 395, in __call__
    x, res = block(
  File "/Users/ageorgios/Models/mlx-examples/stable_diffusion_turbo/stable_diffusion/unet.py", line 252, in __call__
    x = self.attentions[i](x, encoder_x, attn_mask, encoder_attn_mask)
  File "/Users/ageorgios/Models/mlx-examples/stable_diffusion_turbo/stable_diffusion/unet.py", line 117, in __call__
    x = block(x, encoder_x, attn_mask, encoder_attn_mask)
  File "/Users/ageorgios/Models/mlx-examples/stable_diffusion_turbo/stable_diffusion/unet.py", line 70, in __call__
    y = self.attn2(y, memory, memory, memory_mask)
  File "/Users/ageorgios/.pyenv/versions/3.8.18/lib/python3.8/site-packages/mlx/nn/layers/transformer.py", line 68, in __call__
    keys = self.key_proj(keys)
  File "/Users/ageorgios/.pyenv/versions/3.8.18/lib/python3.8/site-packages/mlx/nn/layers/linear.py", line 33, in __call__
    x = x @ self.weight.T
ValueError: [matmul] Last dimension of first input with shape (2,13,768) must match second to last dimension of second input with shape (2048,320).

Whisper AttributeError: 'tuple' object has no attribute 'flatten'

As per the readme suggest I tried:

import whisper

speech_file = "test.mp3"

text = whisper.transcribe(speech_file)["text"]

print(text)

And got

(venv) (base) ➜  whisper git:(main) ✗ python app.py 
Traceback (most recent call last):
  File "/Users/adrian/Developer/mlx-examples/whisper/app.py", line 5, in <module>
    text = whisper.transcribe(speech_file)["text"]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adrian/Developer/mlx-examples/whisper/whisper/transcribe.py", line 303, in transcribe
    timestamps = tokens[timestamp_tokens.nonzero().flatten()]
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'tuple' object has no attribute 'flatten'

Performance with M1 Pro 16GB: Is it Normal?

Hello, can you provide a minimum configuration for model usage?

macOS 13.4.1 14-inch M1 Pro 16GB

LLaMA. It takes so long time to chat and not available in fact.

(mlx) llama % python3 llama.py Llama-2-7b-chat.npz tokenizer.model "who are you?"
[INFO] Loading model from disk.
Press enter to start generation
------

The memory consumption reaches around 13GB.
2. Stable Diffusion

(mlx) stable_diffusion % python3 txt2image.py "a beautiful flower" --output flower.png
  2%|█▊                                                                                       | 1/50 [00:20<16:55, 20.72s/it]

The memory consumption reaches around 11GB , and it takes more than ten mins.

Unfortunately, given these observations, it seems that the mlx framework is almost unavailable for machines with 16GB M1 Pro.

Llama 2 conversion doesn't work (but should it?)

As the title says, I couldn't get a usable model for Llama 2 (specifically llama-2-13b). It's not clear whether the sample should only work with llama 1 (and if so, why? llama 2 has been around for quite a while) or also with llama 2.

Here are my steps

git clone https://github.com/ml-explore/mlx-examples.git

conda create -n mlx-experiments python=3.10 -y
conda activate mlx-experiments
pip install mlx==0.0.3

cd mlx-examples/llama
pip install -r requirements.txt

python convert.py /<path>/llama.cpp.2/models/llama-2-13b mlx_llama_weights.npz

> ModuleNotFoundError: No module named 'numpy'

pip install numpy

python convert.py /Users/vladi/Projects/llama.cpp.2/models/llama-2-13b mlx_llama_weights.npz

> IsADirectoryError: [Errno 21] Is a directory: '/<path>/Projects/llama.cpp.2/models/llama-2-13b'

python convert.py /Users/vladi/Projects/llama.cpp.2/models/llama-2-13b/consolidated.00.pth mlx_llama_weights.npz

> RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

# add map_location=torch.device('mps') to torch.load

> TypeError: BFloat16 is not supported on MPS

# add map_location=torch.device('cpu')

> TypeError: Got unsupported ScalarType BFloat16

# replace "return key, value.numpy()" with "return key, value.numpy() if value.dtype != torch.bfloat16 else value.to(torch.float32).numpy()" as per https://github.com/ml-explore/mlx-examples/pull/10/commits/429ddb30dca199c9cfbfe2280cf47875cc3f0be9

python llama.py mlx_llama.npz tokenizer.model "hello"

> RuntimeError: [load] Failed to open file mlx_llama.npz

python llama.py mlx_llama_weights.npz tokenizer.model "hello"

(mlx-experiments) ➜  llama git:(main) ✗ python llama.py mlx_llama_weights.npz tokenizer.model "hello"
[INFO] Loading model from disk.
Press enter to start generation
------
Traceback (most recent call last):
  File "/<path>/Projects/mlx-examples/llama/llama.py", line 306, in <module>
    generate(args)
  File "/<path>/Projects/mlx-examples/llama/llama.py", line 180, in generate
    for token in model.generate(x, args.temp):
  File "/<path>/Projects/mlx-examples/llama/llama.py", line 125, in generate
    x, c = l(x, mask=mask)
  File "/<path>/Projects/mlx-examples/llama/llama.py", line 77, in __call__
    y = self.norm1(x)
  File "/<path>miniconda3/envs/mlx-experiments/lib/python3.10/site-packages/mlx/nn/layers/normalization.py", line 85, in __call__
    return self.weight * x * n
ValueError: Shapes (5120) and (1,2,2560) cannot be broadcast.

At this point I gave up in frustration and set a reminder three weeks from now to check the state of this project.

I'm running an M2 Max with Ventura 13.6.2

EDIT: added pip install -r requirements.txt

[Request] Add demo for ChatGLM3/Baichuan2

ChatGLM3 and Baichuan2 are also very popular open-source large language models, and I hope there will be demos about them.

mistral test failed on M3 max

======================================================================
FAIL: test_generate (__main__.TestMistral)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ethangodin/Github/mlx-examples/mistral/test.py", line 76, in test_generate
    self.assertEqual(tokens, expected)
AssertionError: Lists differ: [302, 272, 16762, 9588, 12807, 2867, 2135, 28723, 851[116 chars] 297] != [302, 272, 11843, 11837, 1587, 28723, 851, 349, 865, [108 chars], 13]

First differing element 2:
16762
11843

  [302,
   272,
-  16762,
-  9588,
+  11843,
+  11837,
-  12807,
?   ^ -

+  1587,
?   ^

-  2867,
-  2135,
   28723,
   851,
   349,
   865,
   264,
   1369,
   28723,
   13,
   13,
+  3381,
+  456,
+  654,
-  1014,
-  16762,
-  9588,
-  12807,
-  2867,
-  2135,
-  325,
-  28749,
-  8340,
-  28731,
-  403,
   264,
-  1587,
-  297]
+  1353,
+  11843,
+  28725,
+  368,
+  682,
+  347,
+  2240,
+  767,
+  298,
+  511,
+  28723,
+  13]

----------------------------------------------------------------------
Ran 2 tests in 20.792s

FAILED (failures=1)

python configuration:

Package                       Version
----------------------------- ---------
appnope                       0.1.2
argon2-cffi                   21.1.0
async-generator               1.10
attrs                         21.2.0
backcall                      0.2.0
backports.functools-lru-cache 1.6.4
bleach                        4.1.0
brotlipy                      0.7.0
certifi                       2021.10.8
cffi                          1.15.0
charset-normalizer            2.0.9
colorama                      0.4.4
conda                         4.11.0
conda-package-handling        1.7.3
cryptography                  36.0.0
debugpy                       1.5.1
decorator                     5.1.0
defusedxml                    0.7.1
entrypoints                   0.3
filelock                      3.13.1
fsspec                        2023.12.1
idna                          3.1
importlib-metadata            4.9.0
importlib-resources           5.4.0
ipykernel                     6.6.0
ipython                       7.30.1
ipython-genutils              0.2.0
ipywidgets                    7.6.5
jedi                          0.18.1
Jinja2                        3.0.3
jsonschema                    4.3.1
jupyter                       1.0.0
jupyter-client                7.1.0
jupyter-console               6.4.0
jupyter-core                  4.9.1
jupyterlab-pygments           0.1.2
jupyterlab-widgets            1.0.2
MarkupSafe                    2.0.1
matplotlib-inline             0.1.3
mistune                       0.8.4
mlx                           0.0.4
mpmath                        1.3.0
nbclient                      0.5.9
nbconvert                     6.3.0
nbformat                      5.1.3
nest-asyncio                  1.5.4
networkx                      3.2.1
notebook                      6.4.6
numpy                         1.26.2
packaging                     21.3
pandocfilters                 1.5.0
parso                         0.8.3
pexpect                       4.8.0
pickleshare                   0.7.5
pip                           21.3.1
prometheus-client             0.12.0
prompt-toolkit                3.0.24
ptyprocess                    0.7.0
pycosat                       0.6.3
pycparser                     2.21
Pygments                      2.10.0
pyOpenSSL                     21.0.0
pyparsing                     3.0.6
pyrsistent                    0.18.0
PySocks                       1.7.1
python-dateutil               2.8.2
pyzmq                         22.3.0
requests                      2.26.0
ruamel-yaml-conda             0.15.80
Send2Trash                    1.8.0
sentencepiece                 0.1.99
setuptools                    59.4.0
six                           1.16.0
sympy                         1.12
terminado                     0.12.1
testpath                      0.5.0
torch                         2.1.1
tornado                       6.1
tqdm                          4.62.3
traitlets                     5.1.1
typing_extensions             4.9.0
urllib3                       1.26.7
wcwidth                       0.2.5
webencodings                  0.5.1
wheel                         0.37.0
widgetsnbextension            3.5.2
zipp                          3.6.0

Phi-2 speed is limited by memory on 8GB Macs

Currently Phi-2 inferences at ~20s/tok on an 8GB machine, in spite of using ~10-20% of CPU/GPU at any given time. Bigger machines see much, much faster inference speeds. Imo, this points to the model being memory bound (although it uses ~5.4GB in fp16). I believe this will be solved by support for quantized models in MLX, but maybe somebody will see an obvious optimization in the model in the mean time.

[llama] Using precomputed weights from mlx-llama no longer works.

I think f0c57c1
broke the example of using the precomputed weights from mlx-llama as these precomputed weights come in the wrong file name convention: expected weights.npz, actual Llama-2-7b-chat.npz.
This can be easily fixed by symlinking: ln -s Llama-2-7b-chat.npz weights.npz. However, llama.py now expects that there is a params.json, which does not exist for the precomputed weights.

So I'd suggest to either update the mlx-llama repo or simply remove the following paragraph from the README.md.

Alternatively, you can also download a select converted checkpoints from the
mlx-llama community organisation on Hugging
Face and skip the conversion step.

I successfully ran the llama example with precomputed weights prior to f0c57c1.

Converting weights from the ai.meta.com model still works as documented.

Mistral is not working on M1 pro !!!

For me, the output gets stuck for the given example as well as for the other prompts.

[INFO] Loading model from disk.
[INFO] Starting generation...
It is a truth universally acknowledged,

Documentation Support

Thanks for this amazing library for Apple Silicon. I would like to know if I can start contributing code examples for each function available in the library so that it will be very useful for newcomers.

@awni, I would like to get confirmation from you before starting to contribute.

regards,
Muhammad Waseem

can you add the Mistral-7B-Instruct-v0.1 model ?

That would be nice to that the chat version of mistral.

https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1

Mistral Lora example

Would be nice to get a Mistral 7B Lora example, similar to the Llama Lora one. Thanks!

mistral example seems to hang on Loading model from disk.

I am trying to run the mistral model 7B following the README's instructions (and adding the missing numpy dep)

This is all I see for a very long time

> python mistral.py --prompt "It is a truth universally acknowledged,"  --temp 0
[INFO] Loading model from disk.

Hardware: Apple M1 Max
OS: 14.1.2 (23B92)
Python 3.11.5
conda 23.7.4

My python process stays at a really low CPU and I am not seeing any disk access that is worthy of fiddling with 14GB of data.

Any way to better debug this or get to something working?

ERROR: No matching distribution found for mlx - Wheels missing for Python 3.12.x

Following the instructions from mistral/readme.md pip fails to find a package called mlx:

(mlx)samm@samm-mbp:~/git/mlx-examples/mistral@main pip install -r requirements.txt                                             <region:ap-southeast-2>
ERROR: Could not find a version that satisfies the requirement mlx (from versions: none)
ERROR: No matching distribution found for mlx

Python 3.12.0
Conda 23.11.0
macOS 14.1.2

*edit: It looks like the wheels are missing for 3.12.x

stable_diffusion : `ValueError: Shapes (2,127,1024) and (77,1024) cannot be broadcast.` with long prompts

Short description: a long prompts text could cause cannot be broadcast error:

prompts = (
    "A picturesque and pristine view of San Francisco, capturing the essence of a"
    "clean and beautiful city. The illustration should depict the iconic landmarks of"
    "San Francisco, like the Golden Gate Bridge, the Transamerica Pyramid, and the"
    "Painted Ladies Victorian houses. The scene is bathed in a warm, golden sunlight,"
    "enhancing the city's charm. Streets are clean, with lush green parks and vibrant"
    "flowers in bloom, showcasing a community that takes pride in their environment."
    "The bay area is visible in the background, with clear blue waters and a peaceful"
    "atmosphere, conveying a sense of harmony and unity within the city."
)

Output:

  0%|                                                                                                                                                   | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/xxxxx/stable_diffusion/demo.py", line 61, in <module>
    for x_t in tqdm(latents, total=args.steps):
  File "/xxxxx/venv/lib/python3.11/site-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/xxxxx/stable_diffusion/stable_diffusion/__init__.py", line 68, in generate_latents
    conditioning = self.text_encoder(tokens)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/xxxxx/stable_diffusion/stable_diffusion/clip.py", line 62, in __call__
    x = x + self.position_embedding.weight[:N]
        ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ValueError: Shapes (2,127,1024) and (77,1024) cannot be broadcast.

Full code:

import argparse
from PIL import Image
from tqdm import tqdm

import mlx.core as mx

from stable_diffusion import StableDiffusion

prompts = (
    "A picturesque and pristine view of San Francisco, capturing the essence of a"
    "clean and beautiful city. The illustration should depict the iconic landmarks of"
    "San Francisco, like the Golden Gate Bridge, the Transamerica Pyramid, and the"
    "Painted Ladies Victorian houses. The scene is bathed in a warm, golden sunlight,"
    "enhancing the city's charm. Streets are clean, with lush green parks and vibrant"
    "flowers in bloom, showcasing a community that takes pride in their environment."
    "The bay area is visible in the background, with clear blue waters and a peaceful"
    "atmosphere, conveying a sense of harmony and unity within the city."
)

parser = argparse.ArgumentParser(
    description="Generate images from a textual prompt using stable diffusion"
)
parser.add_argument("--n_images", type=int, default=1)
parser.add_argument("--steps", type=int, default=50)
parser.add_argument("--cfg", type=float, default=7.5)
parser.add_argument("--negative_prompt", default="")
parser.add_argument("--n_rows", type=int, default=1)
parser.add_argument("--decoding_batch_size", type=int, default=1)
parser.add_argument("--output", default="demo_out.png")
args = parser.parse_args()

sd = StableDiffusion()

# Generate the latent vectors using diffusion
latents = sd.generate_latents(
    prompts,
    n_images=args.n_images,
    cfg_weight=args.cfg,
    num_steps=args.steps,
    negative_text=args.negative_prompt,
)
for x_t in tqdm(latents, total=args.steps):
    mx.simplify(x_t)
    mx.simplify(x_t)
    mx.eval(x_t)

# Decode them into images
decoded = []
for i in tqdm(range(0, args.n_images, args.decoding_batch_size)):
    decoded.append(sd.decode(x_t[i : i + args.decoding_batch_size]))
    mx.eval(decoded[-1])

# Arrange them on a grid
x = mx.concatenate(decoded, axis=0)
x = mx.pad(x, [(0, 0), (8, 8), (8, 8), (0, 0)])
B, H, W, C = x.shape
x = x.reshape(args.n_rows, B // args.n_rows, H, W, C).transpose(0, 2, 1, 3, 4)
x = x.reshape(args.n_rows * H, B // args.n_rows * W, C)
x = (x * 255).astype(mx.uint8)

# Save them to disc
im = Image.fromarray(x.__array__())
im.save(args.output)

requirements version:

charset-normalizer 3.3.2
llvmlite           0.41.1
mlx                0.0.3
more-itertools     10.1.0
mpmath             1.3.0
numba              0.58.1
numpy              1.26.2
sympy              1.12
tqdm               4.66.1

Python environment not easy to set up with requirements for Llama examples

Hello,

First: thanks for creating this repo and initiative to show concrete examples of MLX at work!

Running the Llama examples, and following the doc to install Python packages, I ran into issues of missing packages (especially if using HuggingFace and skipping the step of model weight conversions).

Would you be interested if I shared a Poetry Python env that I created?

Pros:

have a Python virtual env setup in a one-liner with poetry install
accessible to everyone, without the cost of having to install and fix Python environment issues -> it can help democratize MLX by lower the installation cost
setting up Python env with Pyenv, or Poetry, is now considered a better practice than pip + requirements.txt

Cons

poetry would be an added dependency to the project, but it can be as simple as brew install poetry

I'd be happy to collaborate by starting with a dedicated env for the Llama example if you believe it's a good idea.

Thank you!

mnist.mnist() doesn't exist?

the mnist example imports the python mnist library and calls the mnist() member, which doesn't seem to exist:

~/repos/mlx-examples
venv > python3 -V                                                                             Tue Dec  5 23:36:09 2023
Python 3.11.6

~/repos/mlx-examples
venv > pip install mnist                                                                      Tue Dec  5 23:36:15 2023
Collecting mnist
  Using cached mnist-0.2.2-py2.py3-none-any.whl (3.5 kB)
Collecting numpy (from mnist)
  Using cached numpy-1.26.2-cp311-cp311-macosx_11_0_arm64.whl.metadata (115 kB)
Using cached numpy-1.26.2-cp311-cp311-macosx_11_0_arm64.whl (14.0 MB)
Installing collected packages: numpy, mnist
Successfully installed mnist-0.2.2 numpy-1.26.2

~/repos/mlx-examples
venv > python -c "import mnist; mnist.mnist()"                                                Tue Dec  5 23:36:33 2023
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: module 'mnist' has no attribute 'mnist'

nor in the source

error is located here

am i using this wrong/supposed to be importing mnist from another source? replacing the mnist.mnist() call with calls to mnist.train_images/train_labels/test_images/test_labels() seems to work fine

Issue with trying the Phi-2 model

I am having an issue with this thing and am not able to resolve only for the PHI-2:

python3.10 phi2.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO] Generating with Phi-2...
Write a detailed analogy between mathematics and a lighthouse.2023-12-15 16:38:19.662443: F tensorflow/c/experimental/stream_executor/stream_executor.cc:743] Non-OK-status: stream_executor::MultiPlatformManager::RegisterPlatform( std::move(cplatform)) status: INTERNAL: platform is already registered with name: "METAL"
zsh: abort      python3.10 phi2.py

Attempted Fixes

Upgraded to the latest TensorFlow version
Set TF_FORCE_UNIFIED_MEMORY=1
Uninstalled and cleanly reinstalled TensorFlow

Mixtral weights to MLX

Hi,

First of all thank you so much for introducing mlx.

I was looking at the Mixtral convert.py code and I think that line 19

state = torch.load(str(model_path / "consolidated.00.pt"))

should be

state = torch.load(str(model_path / "consolidated.00.pth"))

Combine part from readme:

cd mixtral-8x7b-32kseqlen/
cat consolidated.00.pth-split0 [...] consolidated.00.pth-split10 > consolidated.00.pth

llora/convert.py not working

I could not get the file llora/convert.py to work.
There is a typo with n_heads. it should be:
if "n_kv_heads" not in config: config["n_kv_heads"] = config["n_heads"]; if "head_dim" not in config: config["head_dim"] = config["dim"] // config["n_heads"];
Instead of config["n_kv_heads"] = n_heads; etc.

But even after fixing this, the resulting file did not work. The convert file in /llama/ did work.