Comments (10)
Hi! I'd be interested in working on this. I've spent some time thinking about a rough plan after reading through the code:
- Move `candle-kernels` to `candle-cuda` or `candle-cuda-kernels` (the name can be bikeshed'd in the PR)
- Make a `candle-wgsl(-kernels)` crate, with kernels implementing the ops needed. Can maybe re-use some implementations based on https://github.com/webonnx/wonnx
- Add a new backend implementation using wgpu-rs to execute the kernels
- Add tests, info on how to run things
- Maybe add flash attention kernels -- might be a lot of work so probably worth its own follow-up issue.
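To make the kernel-crate idea a bit more concrete, here is a minimal sketch of what such a crate might expose: WGSL kernel sources as Rust string constants that a wgpu-based backend would compile into compute pipelines. The crate layout and the `ADD_F32` name are assumptions for illustration, not candle's actual structure.

```rust
// Hypothetical sketch of a `candle-wgsl(-kernels)` crate: kernels are
// shipped as WGSL source strings; the backend compiles them with wgpu
// (e.g. `device.create_shader_module(...)`) and dispatches them.

/// Elementwise addition of two f32 buffers: out[i] = a[i] + b[i].
pub const ADD_F32: &str = r#"
@group(0) @binding(0) var<storage, read> a: array<f32>;
@group(0) @binding(1) var<storage, read> b: array<f32>;
@group(0) @binding(2) var<storage, read_write> out: array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
    let i = gid.x;
    // Guard against the final partial workgroup.
    if (i < arrayLength(&out)) {
        out[i] = a[i] + b[i];
    }
}
"#;

fn main() {
    // A wgpu backend would hand this source to the device; here we only
    // sanity-check the constant exists and looks like a compute shader.
    assert!(ADD_F32.contains("@compute"));
    println!("kernel source is {} bytes", ADD_F32.len());
}
```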
Some questions:
- Is wrapping with wgpu-rs acceptable? I see that the cuda wrappers use cudarc, and wgpu-rs seems to be the closest equivalent for running WGSL compute shaders. It would also be an easy way to bridge to Vulkan and Metal on native (that would take more work though, e.g. integrating naga).
- Do you want all kernels in one crate? In the above I suggest splitting them into their own backend-specific crates, but I guess that could be abstracted into the `candle-kernels` crate itself if you prefer.
- I will probably focus on the web for now, because I think it is the most promising, but I will probably follow up with testing Vulkan, as I think that will make inference much more portable.
from candle.
WebGPU is certainly on our radar; we already have some wasm-based demos for llama2.c and whisper that you can try in a web browser. When using wasm, candle should leverage your CPU's SIMD instructions, but having WebGPU on top of this would bring it to a far better level.
And if/when candle adds WebGPU support, I'll add it as a backend to Transformers.js! 🚀 Really exciting times! 🔥
Is WebGPU supported now?
Sounds like a very reasonable plan.
I think we can start working without tying things too much to candle; maybe other projects could be interested in having WebGPU support (that's why having cudarc is great: it can be used by other projects, not necessarily candle, and why we keep pushing changes upstream as much as possible, like the NCCL support).
Regarding wgpu-rs: last time I tried Vulkan for compute shaders, the performance was abysmal. And it makes sense; it's not really designed for ML. In general I would go for the most performant solutions from the start, not have backends just for the sake of it. AMD already has libraries intended for ML: https://www.amd.com/en/graphics/servers-solutions-rocm. We could link to that directly if it makes more sense.
For Metal I would have the same opinion: we should try to make Metal usable outside of this crate and be mere users of it.
For any new backend, it is very important to create a way for USERS to create their own kernel/op. It's impossible to keep up with all the innovation imho so the most important thing is to allow users of candle to use any op they want, without having to wait for us to implement it.
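The "users bring their own op" idea could be sketched as a trait with one forward method per backend, which users implement for ops candle doesn't ship. All names below (`CustomOp`, `CpuStorage`, the GELU example) are illustrative stand-ins, not candle's real API.

```rust
// Hedged sketch of a user-extensible op mechanism: a trait the user
// implements per backend. A real design would add e.g. `wgpu_fwd`
// alongside the CPU path. Names here are hypothetical, not candle's.

struct CpuStorage(Vec<f32>);

trait CustomOp: Send + Sync {
    fn name(&self) -> &'static str;
    /// CPU forward pass; backend-specific methods would sit next to it.
    fn cpu_fwd(&self, input: &CpuStorage) -> CpuStorage;
}

/// A user-provided op: tanh-approximation GELU.
struct Gelu;

impl CustomOp for Gelu {
    fn name(&self) -> &'static str { "gelu" }
    fn cpu_fwd(&self, input: &CpuStorage) -> CpuStorage {
        CpuStorage(
            input.0.iter()
                .map(|&x| 0.5 * x * (1.0 + (0.7978845608 * (x + 0.044715 * x * x * x)).tanh()))
                .collect(),
        )
    }
}

fn main() {
    // The framework would store the op behind a trait object.
    let op: Box<dyn CustomOp> = Box::new(Gelu);
    let out = op.cpu_fwd(&CpuStorage(vec![0.0, 1.0]));
    assert_eq!(out.0[0], 0.0); // gelu(0) == 0
    println!("{} -> {:?}", op.name(), out.0);
}
```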
Regarding candle's WebAssembly support: is there any plan to support WebGPU?
re wgpu-rs, I certainly agree that native backends are the best, I only bring up Vulkan/Metal as bonuses. I was suggesting wgpu-rs because it is the major WebGPU library for Rust, it looks like Burn uses it. So I think it is the best library for the job, I just wanted to see if adding the dependency was acceptable. The alternative would be to write a bunch of bindings via web-sys around webgpu APIs.
> For any new backend, it is very important to create a way for USERS to create their own kernel/op.

Certainly! I mostly was discussing the crate rename/split focused on candle-provided kernels. For user-written kernels, would it not be best to simply add `wgpu_fwd` to the `Op{N}` traits that the user may implement? Are there other details I should be aware of?
> Certainly! I mostly was discussing the crate rename/split focused on candle-provided kernels. For user-written kernels, would it not be best to simply add `wgpu_fwd` to the `Op{N}` traits that the user may implement? Are there other details I should be aware of?

Basically yes. `Tensor` is `Send + Sync`, therefore `Op` needs to be `Send + Sync` (because it's kept for gradients). That could end up being a limitation: https://github.com/huggingface/candle/blob/main/candle-examples/examples/llama_multiprocess/model.rs#L33-L38
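The constraint can be checked at compile time. A standalone illustration (not candle code; `OpHandle` is a made-up stand-in for an op stored inside a tensor):

```rust
// Why a backend op stored in a Send + Sync tensor must itself be
// Send + Sync. `OpHandle` is an illustrative stand-in, not candle's Op.

use std::rc::Rc;
use std::sync::Arc;

// Compile-time probe: only accepts Send + Sync values.
fn assert_send_sync<T: Send + Sync>(_: &T) {}

struct OpHandle {
    // Arc-shared data is Send + Sync (when its contents are)...
    _data: Arc<Vec<f32>>,
}

fn main() {
    let op = OpHandle { _data: Arc::new(vec![1.0]) };
    assert_send_sync(&op); // compiles fine

    // ...whereas an Rc-based handle would not be, and storing one in a
    // Tensor-like type would be rejected by the compiler:
    let rc = Rc::new(vec![1.0f32]);
    // assert_send_sync(&rc); // error[E0277]: Rc cannot be shared between threads
    let _ = rc;
    println!("OpHandle is Send + Sync");
}
```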
> I think it is the best library for the job

What other libraries or alternatives are there?
Looking at this: https://www.reddit.com/r/rust/comments/159cbto/announcing_burnwgpu_new_deep_learning/jtf80xq/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button I have the feeling that it's not the correct way. We need only WebGPU, not those 10 other things.
In any case this is not in our short-term roadmap.
> Basically yes. Tensor is Send+Sync

Good news there: WebAssembly doesn't have OS-style threads! The web-worker-based "threads" might require things to be Send/Sync, but I will have to look closer at that.

> What other libraries or alternatives are there?

Honestly, I didn't find any that seemed currently maintained or more than toys.

> We need only webGPU, not those 10 other things.

Yeah, it's possible that wgpu isn't the right project; it is pretty large. But those other features are optional, so I don't know how much it hurts to include it.

> In any case this is not in our short term roadmap.

Fair enough!
One general comment.
> Move candle-kernels to candle-cuda or candle-cuda-kernels (the name can be bikeshed'd in the PR)

I feel like writing those compute shaders in GLSL might be a better option. I have done some rough testing of GPGPU performance, and Vulkan with GLSL seems able to keep up with CUDA, while wgpu with WGSL hits a bottleneck pretty early with the same optimization tricks. On top of that, WebGPU supports GLSL as well, so we could have not only a WebGPU backend but a Vulkan one too (I guess for folks who still want to run it natively but don't have the luxury of an Nvidia GPU, only an Intel/AMD GPU).
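For comparison with the WGSL route, the same elementwise add can be written as a GLSL compute shader, which a Vulkan backend (or wgpu via naga's GLSL frontend) could consume. This is a generic illustrative kernel, again shipped as a source string; it is not taken from candle.

```rust
// Illustrative only: an elementwise f32 add as a GLSL 450 compute
// shader, the kind of kernel a Vulkan backend would compile to SPIR-V.

pub const ADD_F32_GLSL: &str = r#"
#version 450
layout(local_size_x = 64) in;

layout(set = 0, binding = 0) readonly buffer A { float a[]; };
layout(set = 0, binding = 1) readonly buffer B { float b[]; };
layout(set = 0, binding = 2) buffer Out { float o[]; };

void main() {
    uint i = gl_GlobalInvocationID.x;
    // Guard against the final partial workgroup.
    if (i < o.length()) {
        o[i] = a[i] + b[i];
    }
}
"#;

fn main() {
    // A real backend would feed this to shaderc/naga; here we only
    // sanity-check the constant.
    assert!(ADD_F32_GLSL.contains("#version 450"));
    println!("glsl kernel source is {} bytes", ADD_F32_GLSL.len());
}
```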