Comments (13)
It could be that Metal argument buffers aren't supported in a VM environment. A simple check could confirm that:
device->argumentBuffersSupport()
The device must be Tier 2 to support argument buffers.
from mlx.
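The check above can be sketched with metal-cpp (a minimal sketch, assuming the metal-cpp headers are available; this only builds and runs on macOS with a Metal device):

```cpp
// Sketch only: requires macOS and the metal-cpp headers
// (https://developer.apple.com/metal/cpp/). Not portable.
#define NS_PRIVATE_IMPLEMENTATION
#define MTL_PRIVATE_IMPLEMENTATION
#include <Metal/Metal.hpp>
#include <cstdio>

int main() {
    MTL::Device* device = MTL::CreateSystemDefaultDevice();
    if (!device) {
        std::puts("No Metal device found");
        return 1;
    }
    // Tier 2 argument buffer support is what mlx needs here;
    // a paravirtualized device may report only Tier 1.
    MTL::ArgumentBuffersTier tier = device->argumentBuffersSupport();
    std::printf("argumentBuffersSupport = %s\n",
                tier == MTL::ArgumentBuffersTier2 ? "Tier2" : "Tier1");
    device->release();
    return 0;
}
```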
We should be able to run on virtual devices once #683 lands!
I'll admit I don't have any experience using macOS VMs on Parallels - do you have access to Apple Silicon GPUs on your VM?
Running the command system_profiler SPDisplaysDataType
might help us figure out whether there is a GPU with Metal support.
If there isn't Metal support, then I'm afraid we won't be able to help you much further.
system_profiler SPDisplaysDataType doesn't report anything. Running llama.cpp with MPS support compiled in, it reports this:
ggml_metal_init: allocating
ggml_metal_init: found device: Apple Paravirtual device
ggml_metal_init: picking default device: Apple Paravirtual device
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: loading '/Users/user/llama.cpp/ggml-metal.metal'
ggml_metal_init: GPU name: Apple Paravirtual device
ggml_metal_init: GPU family: MTLGPUFamilyApple5 (1005)
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 1024.00 MiB
ggml_metal_init: maxTransferRate = built-in GPU
Very basic usage of Metal, e.g. https://github.com/neurolabusc/Metal/blob/main/minimal, runs as expected, but more advanced examples, e.g. https://github.com/neurolabusc/Metal/tree/main/mmul, have errors (in that particular example, the MPSMultiplication results are all 0).
Is this a VM running on Intel or Apple Silicon? Your report says "hasUnifiedMemory = true", so I'm assuming it's Apple Silicon?
Yes, it's a VM running on an M2 Pro.
Oh, if llama.cpp works, then it must just be a missing ArgumentEncoder function for virtual devices at the Metal framework level, rather than something bigger like I was worried about.
Let me see if I can look into it any further, but since this is at the Metal framework level, I wouldn't expect a quick update.
That said, we only use Metal argument encoders for the Gather and Scatter primitives to do multi-dimensional indexing. They are needed because we have a container that holds multiple device buffers of indices that are all to be used by the kernel.
There is a possibility that someone could write a few simple cases of the Gather and Scatter primitives without a Metal argument buffer by simply unrolling those containers into separate kernel arguments.
For anyone interested in looking into that, these two files would be a starting point:
https://github.com/ml-explore/mlx/blob/main/mlx/backend/metal/indexing.cpp
https://github.com/ml-explore/mlx/blob/main/mlx/backend/metal/kernels/indexing.metal
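As a rough illustration of the unrolling idea (a hypothetical kernel, not mlx's actual code), a gather over a fixed number of index buffers could take each index buffer as a plain argument instead of packing them into an argument buffer:

```metal
// Hypothetical sketch: a 2-axis gather with the index buffers passed as
// separate plain arguments rather than through a Metal argument buffer.
// mlx's real kernels (kernels/indexing.metal) handle an arbitrary number
// of index buffers, which is why they use argument buffers instead.
#include <metal_stdlib>
using namespace metal;

kernel void gather_2(device const float* src      [[buffer(0)]],
                     device const uint*  idx0     [[buffer(1)]],
                     device const uint*  idx1     [[buffer(2)]],
                     device float*       dst      [[buffer(3)]],
                     constant uint&      src_dim1 [[buffer(4)]],
                     uint gid [[thread_position_in_grid]]) {
  // Unrolled indexing: one buffer per axis, fixed at compile time.
  dst[gid] = src[idx0[gid] * src_dim1 + idx1[gid]];
}
```

Each fixed arity would need its own kernel variant, which is the limitation of this approach compared to a single argument-buffer-based kernel.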
I've confirmed that argumentBuffersSupport is reporting 0 in my macOS VM.
> if llama.cpp works
I wouldn't say that llama.cpp works - it detects the paravirtualized GPU, but it reports errors when attempting to offload any layers onto the GPU.
> if llama.cpp works
> I wouldn't say that llama.cpp works - it detects the paravirtualized GPU, but it reports errors when attempting to offload any layers onto the GPU.
In that case, unfortunately, it might be a larger issue of Metal support on virtual devices - do you know whether a simple Metal program is able to run on the machine? Something like this, maybe: https://developer.apple.com/documentation/metal/performing_calculations_on_a_gpu?language=objc
If it's reporting 0, then it's Tier 1, and the argument buffer APIs most probably won't work. Well, they may work, but there are just a lot of limitations.
> are you aware if a simple metal program is able to run on the machine ? Something like this maybe: https://developer.apple.com/documentation/metal/performing_calculations_on_a_gpu?language=objc
Yes, that program runs successfully:
user@Users-Virtual-Machine MetalComputeBasic % ./MetalComputeBasic
Compute results as expected
2023-12-11 10:11:22.711 MetalComputeBasic[656:4830] Execution finished
> Yes, that program runs successfully:
Increasing the memory usage makes this sample program fail in the VM (e.g. changing const unsigned long arrayLength = 1 << 24; to const unsigned long arrayLength = 1 << 27;):
Compute ERROR: index=0 result=0 vs 0.216261=a+b
Assertion failed: (result[index] == (a[index] + b[index])), function -[MetalAdder verifyResults], file MetalAdder.m, line 158.