ml-explore / mlx
MLX: An array framework for Apple silicon
Home Page: https://ml-explore.github.io/mlx/
License: MIT License
I would like to request native support for the Upsample operation in MLX. Currently, the absence of Upsample limits tasks that require resizing or upsampling data (for example, defining a UNet).
In the meantime, does anyone have alternative methods for emulating upsampling?
Thank you!
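One way to emulate nearest-neighbor upsampling is to repeat each element along the spatial axes. The sketch below uses NumPy rather than MLX, assumes a channels-last (N, H, W, C) layout, and the function name upsample_nearest is illustrative; whether an equivalent repeat operation is available in a given MLX release would need to be checked.

```python
import numpy as np

def upsample_nearest(x, scale=2):
    """Nearest-neighbor upsampling of a channels-last (N, H, W, C) array."""
    x = np.repeat(x, scale, axis=1)     # repeat rows (height)
    return np.repeat(x, scale, axis=2)  # repeat columns (width)

x = np.arange(4, dtype=np.float32).reshape(1, 2, 2, 1)
y = upsample_nearest(x, scale=2)
print(y.shape)
print(y[0, :, :, 0])
```

Each input pixel becomes a scale x scale block in the output, which is the behaviour a UNet decoder usually expects from nearest-neighbor upsampling.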
mx.random.uniform returns all zeros on both CPU and GPU when dtype is set to float16.
To reproduce
import mlx.core as mx
mx.random.uniform(shape=[2,2], dtype=mx.float16)
array([[0, 0],
[0, 0]], dtype=float16)
I would like to request a Go API so that this library can be used from Go.
When building within a docker container with OpenBLAS installed I'm of course getting xcrun: not found
Obtaining file:///app/mlx
Installing build dependencies ... done
Checking if build backend supports build_editable ... done
Getting requirements to build editable ... done
Installing backend dependencies ... done
Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: mlx
Building editable for mlx (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building editable for mlx (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [172 lines of output]
running editable_wheel
creating /tmp/pip-wheel-z5lwk8ns/.tmp-a96b8jhe/mlx.egg-info
writing /tmp/pip-wheel-z5lwk8ns/.tmp-a96b8jhe/mlx.egg-info/PKG-INFO
writing dependency_links to /tmp/pip-wheel-z5lwk8ns/.tmp-a96b8jhe/mlx.egg-info/dependency_links.txt
writing requirements to /tmp/pip-wheel-z5lwk8ns/.tmp-a96b8jhe/mlx.egg-info/requires.txt
writing top-level names to /tmp/pip-wheel-z5lwk8ns/.tmp-a96b8jhe/mlx.egg-info/top_level.txt
writing manifest file '/tmp/pip-wheel-z5lwk8ns/.tmp-a96b8jhe/mlx.egg-info/SOURCES.txt'
reading manifest file '/tmp/pip-wheel-z5lwk8ns/.tmp-a96b8jhe/mlx.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file '/tmp/pip-wheel-z5lwk8ns/.tmp-a96b8jhe/mlx.egg-info/SOURCES.txt'
creating '/tmp/pip-wheel-z5lwk8ns/.tmp-a96b8jhe/mlx-0.0.4.dev2023127+2e126ae.dist-info'
creating /tmp/pip-wheel-z5lwk8ns/.tmp-a96b8jhe/mlx-0.0.4.dev2023127+2e126ae.dist-info/WHEEL
running build_py
running build_ext
-- The CXX compiler identification is GNU 12.2.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Metal not found. Unable to build GPU
-- Accelerate not found, using default backend.
-- Looking for sgemm_
-- Looking for sgemm_ - not found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Looking for sgemm_
-- Looking for sgemm_ - found
-- Found BLAS: /usr/lib/aarch64-linux-gnu/libopenblas.so
-- /usr/lib/aarch64-linux-gnu/libopenblas.so
-- /usr/include/aarch64-linux-gnu
-- Building Python bindings.
-- Found Python: /usr/local/bin/python3.9 (found version "3.9.18") found components: Interpreter Development Development.Module Development.Embed
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- Found pybind11: /usr/local/include (found version "2.11.1")
-- Configuring done (1.1s)
-- Generating done (0.0s)
-- Build files have been written to: /tmp/tmpoi00n5vt.build-temp/mlx.core
[ 1%] Building arange.air
[ 2%] Building indexing.air
[ 4%] Building unary.air
[ 5%] Building sort.air
/bin/sh: 1: xcrun: not found
[ 7%] Building softmax.air
/bin/sh: 1: xcrun: not found
/bin/sh: 1: xcrun: not found
[ 8%] Building scan.air
/bin/sh: 1: xcrun: not found
gmake[2]: *** [mlx/backend/metal/kernels/CMakeFiles/mlx-metallib.dir/build.make:97: mlx/backend/metal/kernels/arange.air] Error 127
/bin/sh: 1: xcrun: not found
By comparison, the llama.cpp Makefile can build with Metal support without needing xcrun.
Everything I do is on the new Mac chips, so I'm very excited. I use C++/cmake/Juce and onnx/torch script. I'm trying to squeeze every last second of performance to do real time inference locally. I've tried swapping backend in onnx for CoreML, but didn't notice an improvement.
In short, I've tried out a few ways to increase inference speed using onnx and torch, I'm very intrigued by mlx.
This is my question: is there any plan to support parsing a graph, such as one from ONNX? This would be an incredibly useful feature.
I've had a brief look at export weights, and looks like you're parsing and renaming some layer names, and exporting to a format that can be loaded in mlx. That script is for torch weights, how about one for onnx files?
I think it would be amazing if you supported parsing of graphs, I appreciate there's an example with 200 lines of code, the llama inference code. I think it could be helpful (potentially) if you were to show the original for comparison, if there's a torch equivalent, show the major conceptual changes/differences in setting up a model.
(You probably think I'm an idiot.) The reason for showing differences is this: if you highlight them (remove torch calls, swap xname for yname), we can visualise the changes needed and potentially write a manual model parser.
And please forgive my naivety, is there anything that would prevent a model parser currently?
For context I've built very simple neural net libraries for MLPs in C++ without any matrix libraries Eigen etc, and have a persistent obsession to understand the lower levels. I would be happy to invest some of my time to make an onnx parser if you could confirm the framework is feature rich enough. Or, should the first tool be a model checker, to see if a model can be converted, whether the ops are supported?
I wonder how this compares to llama.cpp for example in terms of performance in the same settings?
Getting segfault with unit tests on Apple M1 Pro and 13.6.2
$ pip install numpy
Collecting numpy
Downloading numpy-1.26.2-cp312-cp312-macosx_11_0_arm64.whl.metadata (61 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.2/61.2 kB 3.0 MB/s eta 0:00:00
Downloading numpy-1.26.2-cp312-cp312-macosx_11_0_arm64.whl (13.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.7/13.7 MB 9.6 MB/s eta 0:00:00
Installing collected packages: numpy
Successfully installed numpy-1.26.2
$ env CMAKE_BUILD_PARALLEL_LEVEL="" pip install .
Processing ml/mlx
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: mlx
Building wheel for mlx (pyproject.toml) ... done
Created wheel for mlx: filename=mlx-0.0.3.dev2023126+8c96b9a-cp312-cp312-macosx_13_0_arm64.whl size=9421096 sha256=6aaf46747055c8825696a7892f42f43d959842b44446079e28b4933dc9ae2da3
Stored in directory: /private/var/folders/nm/nyjkgrfn3fg207v5z5rz1w8m0000gp/T/pip-ephem-wheel-cache-93ng7che/wheels/1a/d0/9d/cbc077676fa323205d1fc73c17c324df59cac918a9352a52e5
Successfully built mlx
Installing collected packages: mlx
Attempting uninstall: mlx
Found existing installation: mlx 0.0.3.dev2023126+8c96b9a
Uninstalling mlx-0.0.3.dev2023126+8c96b9a:
Successfully uninstalled mlx-0.0.3.dev2023126+8c96b9a
Successfully installed mlx-0.0.3.dev2023126+8c96b9a
$ python -m unittest discover python/tests
.............................................ssss..........[1] 72687 segmentation fault python -m unittest discover python/tests
This feature request proposes an extension of BatchNorm1d for the library. NaiveSyncBatchNorm1d is an extension of the existing nn.BatchNorm1d module, designed to support synchronization across multiple devices, either locally or globally.
The motivation behind this feature request is to enhance capabilities in distributed deep learning scenarios. The proposed NaiveSyncBatchNorm1d module would fill this gap.
I propose the addition of the NaiveSyncBatchNorm1d module. The module provides the following features:
While there are alternative ways to implement synchronized batch normalization, NaiveSyncBatchNorm1d offers a simple and efficient solution.
Here's an example of how NaiveSyncBatchNorm1d can be used in PyTorch:
sync_bn = NaiveSyncBatchNorm1d(num_sync_devices=4, global_sync=False, num_features=64)
output = sync_bn(input_tensor)
I would like to create a PR for the same.
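To illustrate what synchronizing batch statistics across devices involves, here is a minimal pure-Python sketch. It is not the proposed NaiveSyncBatchNorm1d implementation; the function name combine_batch_stats and the two-shard setup are assumptions for illustration. It merges per-device means and (biased) variances into the statistics of the combined batch.

```python
from statistics import fmean, pvariance

def combine_batch_stats(counts, means, variances):
    """Merge per-device batch statistics into global mean and biased variance."""
    total = sum(counts)
    mean = sum(n * m for n, m in zip(counts, means)) / total
    # E[x^2] per device is var + mean^2; average it, then subtract global mean^2.
    second_moment = sum(
        n * (v + m * m) for n, m, v in zip(counts, means, variances)
    ) / total
    return mean, second_moment - mean * mean

# Two hypothetical device shards of one feature.
shard_a = [1.0, 2.0, 3.0]
shard_b = [4.0, 5.0, 6.0, 7.0]

global_mean, global_var = combine_batch_stats(
    counts=[len(shard_a), len(shard_b)],
    means=[fmean(shard_a), fmean(shard_b)],
    variances=[pvariance(shard_a), pvariance(shard_b)],
)
print(global_mean, global_var)
```

Only the counts, means, and variances need to be exchanged between devices, not the activations themselves.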
Creating a meta issue for clarity / visibility so others can upvote / the team can prioritize.
It would be extremely useful to have a Swift ml-explore SDK that can:
One might not even need to code the model in Python mlx in the first place: popular open source architectures could be written directly in Swift and used by many developers.
Hello there, great work !
I was checking whether models in int8, int5, or int4 quantized formats can be used with this package. Could you please create an example if possible?
Thanks for the nice framework! However, the medical imaging community is missing 3D operations, such as conv3d.
It would be awesome if linear algebra operations could be supported directly in MLX, for example the equivalent of PyTorch linalg.solve, which PyTorch currently supports only on CPU and not on MPS.
Thank you.
When running mlx-examples/mnist on MacOS 14.1.1 VM running via Parallels 19.1.1:
user@Users-Virtual-Machine mnist % python main.py --gpu
*** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '-[AppleParavirtDevice newArgumentEncoderWithLayout:]: unrecognized selector sent to instance 0x149029e00'
*** First throw call stack:
(
0 CoreFoundation 0x000000018a682800 __exceptionPreprocess + 176
1 libobjc.A.dylib 0x000000018a179eb4 objc_exception_throw + 60
2 CoreFoundation 0x000000018a7343bc -[NSObject(NSObject) __retain_OA] + 0
3 AppleParavirtGPUMetalIOGPUFamily 0x0000000101fe1118 doUncompressedBlit + 11160
4 Metal 0x0000000194729184 -[_MTLDevice newArgumentEncoderWithArguments:structType:] + 136
5 libmlx.dylib 0x00000001113f660c _ZN3mlx4core6Gather8eval_gpuERKNSt3__16vectorINS0_5arrayENS2_9allocatorIS4_EEEERS4_ + 1380
6 libmlx.dylib 0x00000001113fc54c _ZNSt3__110__function6__funcIZN3mlx4core5metal9make_taskERNS3_5arrayENS_6vectorINS_13shared_futureIvEENS_9allocatorIS9_EEEENS_10shared_ptrINS_7promiseIvEEEEbE3$_2NSA_ISH_EEFvvEEclEv + 148
7 libmlx.dylib 0x0000000110d5ff14 _ZN3mlx4core9scheduler12StreamThread9thread_fnEv + 500
8 libmlx.dylib 0x0000000110d600d0 _ZNSt3__114__thread_proxyB7v160006INS_5tupleIJNS_10unique_ptrINS_15__thread_structENS_14default_deleteIS3_EEEEMN3mlx4core9scheduler12StreamThreadEFvvEPSA_EEEEEPvSF_ + 72
9 libsystem_pthread.dylib 0x000000018a531034 _pthread_start + 136
10 libsystem_pthread.dylib 0x000000018a52be3c thread_start + 8
)
libc++abi: terminating due to uncaught exception of type NSException
zsh: abort python main.py --gpu
I've had similar problems trying to use MPS inside a VM. Are there any plans to support the use of Metal inside VMs?
The top-level README mentions that current device support is limited to CPU and GPU. Is ANE support in the works?
I kept encountering the below error while trying the stable diffusion sample in mlx-examples on an 8GB M2 Mac Mini here. After some investigation (detailed here: ml-explore/mlx-examples#21) I found changing one line of code in MetalAllocator::MetalAllocator() in mlx/backend/metal/allocator.cpp to a much higher limit seems to have fixed the problem (this 1.5 seems maybe a bit conservative for low-RAM Macs):
block_limit_(1.5 * device_->recommendedMaxWorkingSetSize()) {}
https://github.com/davidjoffe/mlx/blob/main/mlx/backend/metal/allocator.cpp
I made a fork with this change, and built from source to test.
I'd like to submit a Pull Request. This change should help for low-RAM Macs like 8GB Macs, though effectively just allows it to use swap instead of failing - arguably better than failing, but in the long run this behavior may need further improvement/refining, and/or giving users more control over whether/how they want this, or perhaps warning, or something.
(foo) david@Davids-Mac-mini stable_diffusion % python txt2image.py "A photo of an astronaut riding a horse on Mars." --n_images 1 --n_rows 1
/Users/david/mlx/foo/lib/python3.9/site-packages/urllib3/__init__.py:34: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
warnings.warn(
100%|
[00:00<?, ?it/s]libc++abi: terminating due to uncaught exception of type std::runtime_error: [malloc_or_wait] Unable to allocate 134217728 bytes.
zsh: abort python txt2image.py "A photo of an astronaut riding a horse on Mars." 1 1
(foo) david@Davids-Mac-mini stable_diffusion % /Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
For running on-device, it would be hugely beneficial if mlx could support quantized kernels. I am the author of AutoAWQ, a framework that allows users to quantize all the new large language models with minimal loss in performance.
When developing AutoAWQ, I always have to SSH into a machine with a CUDA device since it is not adapted to Metal kernels. If mlx adds support for a quantized GEMV kernel, that could change, and it would mean users could run inference on-device.
Reference to CUDA kernel: https://github.com/casper-hansen/AutoAWQ/blob/main/awq_cuda/quantization/gemv_cuda.cu
Other Metal kernels that are related to AWQ: https://github.com/mit-han-lab/TinyChatEngine/tree/main/kernels/metal
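For readers unfamiliar with what a quantized GEMV computes, the following NumPy sketch shows generic group-wise 4-bit min/max quantization followed by a dequantize-then-multiply reference. It is not the AWQ algorithm and not the linked CUDA or Metal kernels; the function names and the group size of 32 are assumptions chosen for illustration.

```python
import numpy as np

def quantize_4bit(w, group_size=32):
    """Quantize each row of w in groups to unsigned 4-bit codes (0..15)."""
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    groups = w.reshape(out_features, in_features // group_size, group_size)
    w_min = groups.min(axis=-1, keepdims=True)
    w_max = groups.max(axis=-1, keepdims=True)
    scale = (w_max - w_min) / 15.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    q = np.clip(np.round((groups - w_min) / scale), 0, 15).astype(np.uint8)
    return q, scale, w_min

def dequantize(q, scale, w_min):
    """Recover an approximate float weight matrix from codes, scales, offsets."""
    groups = q.astype(np.float32) * scale + w_min
    return groups.reshape(groups.shape[0], -1)

def quantized_gemv(q, scale, w_min, x):
    """Reference matrix-vector product using dequantized weights."""
    return dequantize(q, scale, w_min) @ x

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 64)).astype(np.float32)
x = rng.standard_normal(64).astype(np.float32)

q, scale, w_min = quantize_4bit(w)
y_ref = w @ x
y_q = quantized_gemv(q, scale, w_min, x)
print(np.max(np.abs(y_ref - y_q)))
```

A real kernel would pack two 4-bit codes per byte and fuse dequantization into the dot product instead of materializing the float matrix; the sketch only shows the arithmetic.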
Thanks to the mlx team for creating and sharing mlx.
I have managed to get a small CIFAR-10 image classification CNN up and running rather quickly in mlx (inspired by the PyTorch CIFAR-10 tutorial). I have found that pooling layers (e.g. MaxPool2D) are not available yet. I hope that they will be available in the next release(s).
Code is here: https://github.com/menzHSE/mlx-cifar-10-cnn
Heavily borrows from the mnist example in mlx (https://github.com/ml-explore/mlx-examples/tree/main/mnist)
Are Swift bindings to MLX in the roadmap/within scope?
I attempted to wrap the built library and metallib file into a macOS .framework bundle, giving it a correct .modulemap and importing it into a Swift target with C++ interop enabled, but soon ran into this roadblock:
From Swift's documentation here:
Swift currently cannot import C++ modules introduced in the C++20 language standard.
I understand that C++20 is the blocker here, but I'm wondering if somehow the headers could be made backwards compatible for a version that the Swift compiler can understand.
Running either env CMAKE_BUILD_PARALLEL_LEVEL="" pip install . or env CMAKE_BUILD_PARALLEL_LEVEL="" pip install -e . results in the following error:
error: can't copy '/var/folders/b8/6mjky64x0kn0v0s2l_4_pwm00000gn/T/tmpwy1qmvjs.build-lib/mlx/core.cpython-310-darwin.so': doesn't exist or not a regular file
[end of output]
Our team is developing on device training and inference for iOS devices. We wish to know whether MLX supports native model deployment on iOS device hardware such as GPU and CPU?
Hello there, great work!
Please tell me about the relationship between Stream and Device.
I understand a Stream to be a so-called computation graph through which data flows; is that correct? I also assume that the Stream keyword argument of each function specifies which device the computation graph should be executed on; is that correct?
Hi there,
May I ask about support for Apple M3 silicon?
According to the documentation, these are the parameters available for Conv2d.
However, the dilation and groups parameters, as defined in the torch implementation of Conv2d, are needed to implement well-known architectures, e.g. ASPP. The following is the description of these two parameters according to torch:
dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.
groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example, at groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, and both subsequently concatenated.
Thank you for your attention 👍
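The effect of the two parameters on shapes can be worked out with a little arithmetic. The sketch below follows the output-size formula and weight layout documented for torch's Conv2d; the helper names are illustrative and square kernels are assumed.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Spatial output size of a convolution along one dimension."""
    # Dilation spreads the kernel taps apart, enlarging its footprint.
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1

def grouped_weight_shape(in_channels, out_channels, kernel_size, groups=1):
    """Weight shape (out, in / groups, kH, kW) for a grouped convolution."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    return (out_channels, in_channels // groups, kernel_size, kernel_size)

# A 3x3 kernel with dilation=2 covers a 5x5 window.
print(conv_output_size(32, kernel_size=3, dilation=2))             # 28
print(conv_output_size(32, kernel_size=3, padding=2, dilation=2))  # 32

# groups=2 halves the input channels seen by each filter.
print(grouped_weight_shape(64, 128, 3, groups=1))  # (128, 64, 3, 3)
print(grouped_weight_shape(64, 128, 3, groups=2))  # (128, 32, 3, 3)
```

With groups=2 the layer has half as many weights as with groups=1, which is why grouped convolutions are popular in lightweight architectures.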
I am trying to use mx.pad with the optional constant array containing the padding values to be inserted, but it seems that only the first element of the constant array is utilized whatever I try. Am I interpreting the description of mlx.core.pad() the wrong way?
import numpy as np
import mlx.core as mx
nx, ny = 10, 10 # Set grid dimensions
x = mx.array(np.linspace(0,1,nx))
y = mx.array(np.linspace(0,1,ny))
T = mx.zeros([nx, ny])
Tconst = mx.array(np.linspace(9,0,10))
print(Tconst)
T = mx.pad(T, ((1, 1), (1, 1)), Tconst)
print(T)
array([9, 8, 7, ..., 2, 1, 0], dtype=float32)
array([[9, 9, 9, ..., 9, 9, 9],
[9, 0, 0, ..., 0, 0, 9],
[9, 0, 0, ..., 0, 0, 9],
...,
[9, 0, 0, ..., 0, 0, 9],
[9, 0, 0, ..., 0, 0, 9],
[9, 9, 9, ..., 9, 9, 9]], dtype=float32)
Process finished with exit code 0
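One possible workaround, sketched here in NumPy rather than MLX: assuming the pad value is treated as a single scalar (which is what the output above suggests), pad with a scalar first and then overwrite the boundary with the per-element values. The variable names are illustrative, and whether the same slice assignment behaves identically in MLX has not been verified here.

```python
import numpy as np

interior = np.zeros((10, 10), dtype=np.float32)
edge_values = np.linspace(9, 0, 10, dtype=np.float32)

# Step 1: pad every side with one scalar value.
padded = np.pad(interior, ((1, 1), (1, 1)), constant_values=0.0)

# Step 2: write the per-element values into the top boundary row,
# skipping the two corner cells.
padded[0, 1:-1] = edge_values

print(padded.shape)
print(padded[0])
```

The same two-step pattern applies to any other edge by changing the slice on the left-hand side.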
See the end of this tutorial:
https://ml-explore.github.io/mlx/build/html/examples/llama-inference.html
The link after "The full example code is available in" points to the wrong place (https://ml-explore.github.io/mlx/build/html/examples/code).
It should point here: https://github.com/ml-explore/mlx/tree/main/examples
There are also no examples of transformer LLM/Llama inference in either the Python or C++ folders: https://github.com/ml-explore/mlx/tree/main/examples/python
Could you please add examples for both Python and C++?
When I build on my MacBook, the error is:
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/15.0.0/include/arm_neon.h:28:2: error: "NEON intrinsics not available with the soft-float ABI. Please use -mfloat-abi=softfp or -mfloat-abi=hard"
#error "NEON intrinsics not available with the soft-float ABI. Please use -mfloat-abi=softfp or -mfloat-abi=hard"
^
/Volumes/long_yw/mlx/mlx/backend/accelerate/softmax.cpp:59:8: error: unknown type name 'float16x8_t'; did you mean 'float16_t'?
inline float16x8_t neon_fast_exp(float16x8_t x) {
^~~~~~~~~~~
float16_t
/Volumes/long_yw/mlx/mlx/types/half_types.h:16:29: note: 'float16_t' declared here
typedef struct _MLX_Float16 float16_t;
How can I avoid this issue?
as title
Thanks for this library! It is very useful so far.
I am trying to use vmap with a function where argument 1 holds the indices into argument 2 (effectively a boolean mask to sum over). However, when I use vmap, I get the runtime error "Gather vmap is NYI, please change slices instead". When I look at the Slice::vmap function in primitives.cpp, I see that it is just a placeholder with // TODO implement:
https://github.com/ml-explore/mlx/blob/v0.0.4/mlx/primitives.cpp#L1907
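Until gather is supported under vmap, one possible workaround is to express "sum the values at these indices" with comparisons, multiplication, and a reduction instead of indexing. The NumPy sketch below only demonstrates that the two formulations agree; the function names are illustrative, and whether every operation used has a vmap rule in a given MLX version is an assumption that would need checking.

```python
import numpy as np

def gather_sum(indices, values):
    """Reference: index into values, then sum (uses a gather)."""
    return values[indices].sum()

def masked_sum(indices, values):
    """Same result without indexing: count how often each position is selected."""
    positions = np.arange(values.shape[0])
    # counts[j] = number of entries in indices equal to j
    counts = (positions[None, :] == indices[:, None]).sum(axis=0)
    return (counts * values).sum()

values = np.array([1.0, 2.0, 3.0, 4.0])
indices = np.array([0, 2, 2])

print(gather_sum(indices, values))
print(masked_sum(indices, values))
```

The comparison builds a (len(indices), len(values)) intermediate, so this trades memory for avoiding the gather.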
Hi,
Could you write an example showing how to train a model on 100 documents of 100+ pages each, and then ask the system to summarize them and generate a new document with new data, or to train on correct and incorrect examples in order to check new ones? Second, an example of training on PDF or image documents with labeled errors, in order to check new documents. A simple API for these purposes would be helpful.
Hi everyone,
Proposal:
I would like to propose the addition of several other activation functions to the framework:
I am willing to contribute these or others.
Is there any way to import pretrained SOTA CV models (e.g. MobileNet) instead of creating toy CV models? I didn't find examples. Thanks for any response
Hi, I thought it could be nice if there were a few more losses ready to go in mlx. To start with I was thinking:
Maybe also:
I think this should be pretty straightforward to do: simply a case of adding them to python/mlx/nn/losses.py.
Any thoughts? And I am happy to work on this issue.
Can you tell me how to pip install the latest commit?
pip install --upgrade --no-cache-dir https://github.com/ml-explore/mlx/archive/refs/heads/master.zip
or
pip install git+https://github.com/ml-explore/mlx@[commit-sha]
Is there an aversion to including type annotations in the Python implementation?
While the python interpreter doesn’t directly check/enforce type annotations it can have a non-trivial impact on developer experience to have well-typed libraries.
CPU: Intel i7-9700K
Memory: 16 GB
macOS: 14
Can I use mlx?
Can you add more mps accelerated functions to mlx like sine, cosine, bessel functions?
Hello, is there a roadmap for future releases of MLX? I see there are a number of requests regarding features like missing operations (e.g. pooling/upsampling), quantized inference, profiling tools, a Swift API, and so on. Is there anywhere public these are prioritized?
I've been playing with MLX and have been enjoying using it for toy models, and I am currently attempting to implement StripedHyena. However, for anything more serious, I find it hard to justify using MLX over other options: PyTorch and JAX support MPS, are usable on other platforms, utilize Nvidia GPUs for larger jobs, and have a substantially larger community. MLX also doesn't have any conversion support to ONNX (or, ironically, CoreML).
I'm probably speaking for many people, but a roadmap would help immensely for determining how much time to invest in this framework, what kind of work would be best suited for it (e.g. I presume pre-training a foundation model is a non-starter), and what level of commitment there will be from Apple AI/ML.
It looks like this is still missing many matrix operations like QR, SVD, einsum, etc. Is there a clear path to using these with or without MLX?
This has been a similar issue with the PyTorch MPS backend. While there is a long tail of these operations to support, they are essential to many machine learning models. As can be seen in the PyTorch issue, not including them limits the utility of packages like this.
When building the mlx project I'm getting the following error: NEON intrinsics not available with the soft-float ABI.
Please use -mfloat-abi=softfp or -mfloat-abi=hard
ChatGPT suggested adding the following to the Makefile, but this doesn't work:
CFLAGS += -mfloat-abi=softfp
Updating CMakeLists.txt (in the project root) didn't help either:
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mfloat-abi=softfp")
git clone git@github.com:ml-explore/mlx.git mlx && cd mlx
mkdir -p build && cd build
cmake .. && make -j
throws
/Library/Developer/CommandLineTools/usr/lib/clang/15.0.0/include/arm_neon.h:28:2: error: "NEON intrinsics not available with the soft-float ABI. Please use -mfloat-abi=softfp or -mfloat-abi=hard"
#error "NEON intrinsics not available with the soft-float ABI. Please use -mfloat-abi=softfp or -mfloat-abi=hard"
^
/Users/stephan/projects/mlx/mlx/backend/accelerate/softmax.cpp:59:8: error: unknown type name 'float16x8_t'; did you mean 'float16_t'?
inline float16x8_t neon_fast_exp(float16x8_t x) {
^~~~~~~~~~~
float16_t
/Users/stephan/projects/mlx/mlx/types/half_types.h:16:29: note: 'float16_t' declared here
typedef struct _MLX_Float16 float16_t;
^
/Users/stephan/projects/mlx/mlx/backend/accelerate/softmax.cpp:59:34: error: unknown type name 'float16x8_t'; did you mean 'float16_t'?
inline float16x8_t neon_fast_exp(float16x8_t x) {
^~~~~~~~~~~
float16_t
...
Apple M1 Max
I'm on Sonoma 14.1.1 (23B81)
cmake version 3.27.9
Python 3.10.5
Issue Type: Enhancement
Proposal:
I would like to propose an enhancement to the framework by adding a variety of optimizers. While the current implementation includes Stochastic Gradient Descent (SGD) and Adam optimizers, the addition of more optimization algorithms could greatly benefit users with diverse requirements.
Potential Optimizers to Include:
I would appreciate your guidance on the next steps for contributing to this enhancement. Whether it's providing additional details, discussing the proposed optimizers, or collaborating on the implementation, I am eager to contribute to the growth of this fantastic framework.
The Python array API standard standardises common functionality across Python array/tensor libraries. NumPy, PyTorch and CuPy are planning to have full implementations, and Dask and JAX also have implementations in progress. You could implement this in your main namespace or a separate namespace.
Why should you do this? As well as making it easier for users to convert existing NumPy/PyTorch/CuPy code to MLX, there is potential for interoperability with other libraries. For example, from the NumPy ecosystem, SciPy and scikit-learn have partial experimental support for arrays which comply with the standard.
If you are interested in this, the consortium would love to hear feedback over at https://github.com/data-apis/consortium-feedback/. Some potential pain points, such as missing float64 support, have already been discussed very briefly in data-apis/array-api#719.
It is amazing that mlx supports efficient training with metal.
While we have the LLM inference example with Llama and Mistral, could you share an example or advice on how to fine-tune LLM (or quantized model) using MLX?
I have implemented a simple solution of the 2D Heat Conduction Equation with 2 Neumann and 2 Dirichlet BCs. I have the code implemented both using PyTorch and the MLX framework and I am testing the relative performance on an M2 Ultra with 128GB memory.
The MLX code is included below. So far, performance in various tests (on the same machine) shows the MLX version to be somewhere between 2x and 10x faster, depending on the problem size.
However, I have an issue that I need to understand. Depending on the problem size, I need to include the line
if step % 15000== 0: mx.eval(T)
to avoid a segmentation fault. I imagine this has to do with lazy evaluation and arrays being held in buffers? My issue is that I currently have to work out empirically, each time, how often I need to call mx.eval. Is there a more elegant, programmatic way to issue mx.eval automatically at the right frequency based on the problem size?
Here is the complete code below. Thank you for all your help @awni !
# Solving the 2D Heat Conduction Equation with 2 Neumann and 2 Dirichlet BCs
import numpy as np
import matplotlib.pyplot as plt
import time
import mlx.core as mx
# Convergence tolerance to stop early (currently disabled)
#convergence_tolerance = 1e-8
# Grid size and material properties setup
nx, ny = 5000, 5000 # Set grid dimensions
k = 1.0 # Thermal conductivity
# Time-stepping parameters
desired_dt = 0.01 # Desired time step
max_steps = 10000 # Maximum number of time steps
# Creating a linearly spaced grid
x = mx.array(np.linspace(0,1,nx))
y = mx.array(np.linspace(0,1,ny))
dx = x[1] - x[0] # Grid spacing in x direction
dy = y[1] - y[0] # Grid spacing in y direction
# Function to calculate the maximum stable time step for the explicit Euler method
def calculate_max_stable_dt(alpha, dx, dy):
    return (1 / (2 * alpha)) * (1 / (1/dx**2 + 1/dy**2))
# Material properties for stability calculation
rho = 1.0 # Density
cp = 1.0 # Specific heat capacity
alpha = k / (rho * cp) # Thermal diffusivity
# Compute maximum stable time step
dt_max = calculate_max_stable_dt(alpha, dx, dy)
dt = min(dt_max, desired_dt) # Use the smaller of the desired or maximum stable time step
# Initializing the temperature field on the GPU
T = mx.zeros([nx, ny])
T_old = mx.zeros_like(T)
# Applying Dirichlet boundary conditions
T[:, 0] = 0.0 # Set left boundary temperature
T[:, -1] = 1.0 # Set right boundary temperature
# Time-stepping loop for the heat equation
start_time = time.time() # Capture start time
for step in range(max_steps):
    T_old = mx.broadcast_to(T,shape=T.shape)
    # Update interior points using finite difference method
    # Pad the interior points for broadcasting
    T = mx.pad(mx.pad( (T_old[1:-1,1:-1] + dt * k * (
        (T_old[2:, 1:-1] - 2 * T_old[1:-1, 1:-1] + T_old[:-2, 1:-1]) / dx**2 +
        (T_old[1:-1, 2:] - 2 * T_old[1:-1, 1:-1] + T_old[1:-1, :-2]) / dy**2
    )), ((0,0),(0,1)),1),((0,0),(1,0)), 0)
    # Update Neumann boundaries (zero-flux) at top and bottom
    T = mx.concatenate([mx.expand_dims(T[0, :], (0)), T, mx.expand_dims(T[-1, :], (-0))], axis=0)
    if step % 15000 == 0:
        mx.eval(T)
end_time = time.time() # Capture end time
elapsed_time = end_time - start_time
print(f"Elapsed time: {elapsed_time:.2f} seconds")
# Visualizing the temperature field using matplotlib
plt.imshow(T, cmap='hot', interpolation='nearest')
plt.colorbar() # Add a color bar to indicate temperature scales
plt.show()
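As a side check on the time-stepping parameters in the listing above, the stable time step can be worked out in plain Python without MLX. This sketch reuses the calculate_max_stable_dt formula and the grid settings from the code; it only illustrates the arithmetic and does not address the segmentation fault itself.

```python
def calculate_max_stable_dt(alpha, dx, dy):
    """Explicit-Euler stability limit for the 2D heat equation."""
    return (1 / (2 * alpha)) * (1 / (1 / dx**2 + 1 / dy**2))

nx = ny = 5000
dx = dy = 1.0 / (nx - 1)  # spacing of np.linspace(0, 1, nx)
alpha = 1.0               # k / (rho * cp) with all three equal to 1
desired_dt = 0.01
max_steps = 10000

dt_max = calculate_max_stable_dt(alpha, dx, dy)
dt = min(dt_max, desired_dt)

print(f"dx = {dx:.3e}")
print(f"dt = {dt:.3e}")
print(f"simulated time = {dt * max_steps:.3e}")
```

With dx equal to dy the limit reduces to dx**2 / (4 * alpha), so on a 5000 x 5000 grid the stability limit (about 1e-8), not desired_dt, sets the step, and 10000 steps cover only about 1e-4 units of simulated time.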
In case anyone runs into the following error when installing from source:
xcrun: error: unable to find utility "metal", not a developer tool or in PATH
It's solvable with instructions from here: gfx-rs/gfx#2309
I had previously installed only the Xcode command-line tools (xcode-select --install), but was running into the above error. Installing full Xcode and running:
sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer
allowed me to install from source (env CMAKE_BUILD_PARALLEL_LEVEL="" pip install -e .).
It would also be nice to have a binary for this on conda-forge. You can see my start at implementing a recipe for this at https://github.com/conda-forge/staged-recipes/pull/24687/files. As the CI over there doesn't build for Apple silicon, we don't see any failures. I would therefore like to use this issue to reach out for help with build issues.
I've noticed that the current mlx builds support Python versions 3.8 to 3.11. As Python 3.12 is gaining traction, I'm starting a compatibility check by building mlx from source on Python 3.12. Python 3.7 support is declared, but I don't see a build for it among the pip package files here: https://pypi.org/project/mlx/#files
Python 3.11 (cp311)
Python 3.10 (cp310)
Python 3.9 (cp39)
Python 3.8 (cp38)
My goal is to identify any compatibility issues and report back with detailed findings. Depending on the results, I'm also willing to contribute fixes or updates needed to ensure Python 3.12 support.
I'm going to try building it on 3.12 python version from the source code and I will report back the findings.
we can start there:
https://uvadlc-notebooks.readthedocs.io/en/latest/tutorial_notebooks/JAX/tutorial7/GNN_overview.html
or here:
https://danielegrattarola.github.io/posts/2021-03-12/gnn-lecture-part-2.html
but ideally GIN, DMPNN and AttentiveFP are part of the best models so far: https://github.com/aimat-lab/gcnn_keras/tree/master/kgcnn/literature
Seems like the pip install does not install the correct package on my env
>> pip install mlx
Collecting mlx
Using cached mlx-0.0.0-py3-none-any.whl.metadata (505 bytes)
Using cached mlx-0.0.0-py3-none-any.whl (2.1 kB)
Installing collected packages: mlx
Successfully installed mlx-0.0.0
If you trace the installed package it gives
>> cd ~/opt/anaconda3/envs/test/lib/python3.9/site-packages/mlx
>> ls
__init__.py __pycache__
>> cat __init__.py
print("HELLO WORLD!")
Environment:
M1 Mac Air + Miniconda