Right now https://github.com/rustformers/llm doesn't support runtime accelerator selection (rustformers/llm#386), which means the current binaries are Metal-only on macOS and CPU-only on Windows. Until that lands, maybe we should build all Windows binaries for CUDA (with the `cublas` feature).
Currently we support a warm-up prompt that lets you pre-condition the model before chatting with it. You can drop a big text summary into the model and then ask questions about the text, but this is awkward if there's a lot of text. It would be great if we supported file uploads. Let's start with .txt and .pdf. We won't be able to handle images in PDFs yet, but we should at least be able to extract the text for now.
While testing, I loaded a 2B model, then wanted to change the system prompt, so I changed it and hit start, and the new generation was much slower. When I checked, I was able to verify the issue: the app is not unloading GPU memory before loading the new model, so the new model gets loaded into shared memory instead of dedicated GPU memory.
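A possible shape for the fix, sketched with placeholder types (`Model` and `AppState` are illustrative, not the app's real loader): make sure the old model is dropped, releasing its GPU buffers, before the replacement is constructed.

```rust
// Hypothetical sketch: drop the old model *before* allocating the new one,
// so its GPU memory is freed first and the new weights can land in
// dedicated GPU memory rather than spilling into shared memory.
struct Model {
    name: String,
}

impl Drop for Model {
    fn drop(&mut self) {
        // In the real app, GPU buffers would be released here.
        println!("released {}", self.name);
    }
}

struct AppState {
    model: Option<Model>,
}

impl AppState {
    fn swap_model(&mut self, name: &str) {
        // Explicitly drop the previous model first. Assigning directly would
        // also drop it, but only *after* the new value is constructed, which
        // is exactly the window where both models are resident at once.
        drop(self.model.take());
        self.model = Some(Model { name: name.to_string() });
    }
}

fn main() {
    let mut state = AppState { model: None };
    state.swap_model("model-a");
    state.swap_model("model-b"); // "released model-a" prints before model-b is built
}
```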
Gotta say, this is one of the easiest to get going (once I ran from source due to the other ticket's issue, lol).
I did notice a few UI/UX issues I wanted to mention that would be cool to see adjusted. Not sure if they're Windows-only or affect all builds, but since it's Tauri I'd imagine it's the same across all of them...
There's no way to stop generation. I asked it a question and, wow, it was determined to give me the full 2048-token response, lol, and there doesn't seem to be any way to stop it. It would be nice if there were a button next to the spinner to kill the current generation somehow, especially if the LLM starts going down a rabbit hole that doesn't fulfill the original question's intention...
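One way a stop button could work is a shared flag that the generation loop polls between tokens; a minimal sketch, assuming the loop can check a flag per token (names here are illustrative, not the app's actual API):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical sketch: the generation loop checks a shared stop flag before
// each token, so a UI "stop" button can cut a 2048-token run short.
fn generate(stop: &AtomicBool, budget: usize) -> usize {
    let mut produced = 0;
    for _ in 0..budget {
        if stop.load(Ordering::Relaxed) {
            break; // the stop button was pressed; abandon the rest
        }
        produced += 1; // stand-in for decoding one token
    }
    produced
}

fn main() {
    let stop = AtomicBool::new(false);
    assert_eq!(generate(&stop, 2048), 2048); // runs to the full budget

    stop.store(true, Ordering::Relaxed); // what the stop button would do
    assert_eq!(generate(&stop, 2048), 0); // loop exits immediately
    println!("ok");
}
```

In the real app the flag would live in shared state (e.g. behind an `Arc`) so the UI thread can flip it while the worker thread generates.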
The left panel could really use a max-width attribute; having the settings take up 50% of the screen looks a little silly when the window is maximized on a desktop. In fact, after loading the model it could probably collapse to the side behind a hamburger menu to really give it a nice feel. As it is now, it looks nice when it's small.
Sending a message scrolls down, but doesn't fully scroll to show the message that was sent; oddly, it ends up focused on the middle of the first line of the sent text at the bottom of the window.
It would be really nice if the interface showed tokens/s as part of the processing animation. I noticed something odd and I'm not sure if it's an issue or my imagination: I loaded a model and it was generating really fast, then I changed the system message and started again, and this time it felt slower. Without a tokens/s readout it's hard to tell whether it actually is.
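The readout itself is cheap to compute: count the tokens emitted and divide by the wall-clock time since generation started. A rough sketch (where the timer hook goes is an assumption about the loop's shape):

```rust
use std::time::Instant;

// Tokens-per-second is just emitted tokens over elapsed wall-clock seconds.
fn tokens_per_second(tokens: u32, elapsed_secs: f64) -> f64 {
    tokens as f64 / elapsed_secs
}

fn main() {
    let start = Instant::now();
    let mut tokens = 0u32;
    for _ in 0..50 {
        tokens += 1; // stand-in for decoding one token
    }
    let tps = tokens_per_second(tokens, start.elapsed().as_secs_f64());
    println!("{tokens} tokens at {tps:.1} tok/s");
}
```

Updating this figure alongside the spinner would also make regressions like the slowdown above measurable instead of a gut feeling.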
Right now it's a bunch of manually typed text that's prone to failures. Maybe we can consider parsing out the quantization level, the quantization method, the parameter count, and a short description, and then pretty-printing that data.
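As a starting point, some of this could be parsed out of the filename itself; a minimal sketch, assuming GGML-style names like `llama-2-7b.ggmlv3.q4_0.bin` (the token patterns are an assumption about common naming conventions, not a spec):

```rust
// Hypothetical sketch: pull the parameter count and quantization level out of
// a GGML-style model filename by scanning its dot/dash-separated tokens.
fn parse_model_name(name: &str) -> (Option<String>, Option<String>) {
    let lower = name.to_lowercase();
    let mut params = None;
    let mut quant = None;
    for tok in lower.split(|c: char| c == '.' || c == '-') {
        // Parameter count: digits followed by 'b', e.g. "7b" or "13b".
        if let Some(num) = tok.strip_suffix('b') {
            if !num.is_empty() && num.chars().all(|c| c.is_ascii_digit()) {
                params = Some(format!("{}B", num));
            }
        }
        // Quantization level: 'q' followed by a digit, e.g. "q4_0", "q5_1".
        if tok.starts_with('q') && tok[1..].starts_with(|c: char| c.is_ascii_digit()) {
            quant = Some(tok.to_uppercase());
        }
    }
    (params, quant)
}

fn main() {
    let (params, quant) = parse_model_name("llama-2-7b.ggmlv3.q4_0.bin");
    println!("{:?} {:?}", params, quant); // Some("7B") Some("Q4_0")
}
```

Filenames that don't follow the convention would just yield `None`, so we'd still want a manual fallback rather than hard-failing.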
I'm just getting an "os error 3" error code ("path not found") after launch, and I can't do anything inside the app. I've tried both the .exe and the .msi installer on German Win11 22H2 Build 22621.1992.