Did you know that across 19,000 models in the Hugging Face text-generation category, 15.5% of the weights are duplicated? That amounts to approximately 43 terabytes of redundantly stored weights. arXiv paper with the full results coming soon™️.
`cake` is a more efficient way to download and store machine learning models from 🤗 Hugging Face. Think of it as 🐋 docker, but for ML models.
Leveraging the huggingface/safetensors format, it enables:
- Parallelising downloads of multiple layers at the same time.
- Robustness against network failures. `cake` caches each layer to disk, so halting half-way and retrying will not re-download already downloaded layers.
- Deduplication of layers based on their contents, even across different models. If you download `Mistral-7B-v0.1` followed by a fine-tune of it which only modified the top two layers, then `cake` will only download the top two layers (a sketch of this idea follows the list).
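To make the deduplication idea concrete, here is a minimal Rust sketch of a content-addressed layer store. It is an illustration under assumptions, not `cake`'s actual implementation: the `store_layer` helper, the flat hash-named file layout, and the `sha2`/`hex` crates are all placeholders for whatever `cake` really does.

```rust
use sha2::{Digest, Sha256}; // assumed dependencies: sha2, hex
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical content-addressed layer store: a layer's identity is the
/// SHA-256 of its raw bytes, so an identical layer shared by two models
/// is written to disk exactly once.
fn store_layer(cache_dir: &Path, layer_bytes: &[u8]) -> std::io::Result<PathBuf> {
    let hash = hex::encode(Sha256::digest(layer_bytes));
    let path = cache_dir.join(&hash);
    if !path.exists() {
        // New content: persist it. Downloading a fine-tune that only
        // changed the top two layers hits this branch for those two layers
        // and reuses everything already cached from the base model.
        fs::write(&path, layer_bytes)?;
    }
    Ok(path)
}

fn main() -> std::io::Result<()> {
    let cache = Path::new("layer-cache");
    fs::create_dir_all(cache)?;
    let first = store_layer(cache, b"layer bytes")?;
    let second = store_layer(cache, b"layer bytes")?; // dedupes to the same file
    assert_eq!(first, second);
    Ok(())
}
```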
Roadmap:
- Setup linting in CI
- Setup local storage based on layer hashes
- On push to `main`, build the executable and create a release
- Make CLI arguments easier to use for download (example: `cake download foo` instead of `cake download --model-id foo`)
- Setup config and allow overriding of the storage folder, registry URL, etc.
- Setup a public-facing instance of the hashes registry
Currently `cake` can only be built from source. Pre-built binaries coming soon™️.
Run `cake help` to view how to use it.
Run `cake download <MODEL_ID>` to download a model into a folder called `download`, relative to `cake` (config coming soon™️).
Example: `cake download KoboldAI/fairseq-dense-1.3B` will download https://huggingface.co/KoboldAI/fairseq-dense-1.3B from the `main` branch.
`cake` is, at this time, a personal project of mine with two main aims:
- Introducing better tooling into ML workflows
- Learning the `rust` programming language
Contributions targeting either of the above are appreciated and will be reviewed on a best-effort basis.
Given a model name (example: `Mistral-7B-OpenOrca`):
- Extract the layer hashes for the model
- Check if all the layers are stored locally
- Create a diff of the layers available locally and the layers required
- For each missing layer:
  - Pull only that layer from the remote storage [1]
  - Compress it for local storage
- Once all layers are available, export a new full model file
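A minimal Rust sketch of that flow, under stated assumptions: `LayerRef` and the registry shape are hypothetical, the local store is modelled as a plain set of content hashes, and the byte ranges would in practice come from the safetensors header (an 8-byte little-endian length followed by a JSON table with per-tensor `data_offsets`).

```rust
use std::collections::HashSet;

/// Hypothetical registry entry: one per layer, pairing the layer's content
/// hash with its byte range inside the remote model.safetensors file.
struct LayerRef {
    hash: String,
    byte_range: (u64, u64),
}

/// Sketch of the sync flow: diff required layers against the local store,
/// fetch only the missing ones, then export a full model file.
fn sync_model(required: &[LayerRef], local: &HashSet<String>) {
    // Diff: layers the model needs that are not already on disk.
    let missing: Vec<&LayerRef> = required
        .iter()
        .filter(|layer| !local.contains(&layer.hash))
        .collect();

    for layer in &missing {
        // Pull only this layer from remote storage [1], e.g. via an HTTP
        // Range request (see below), then compress it into the local store.
        let (start, end) = layer.byte_range;
        println!("would fetch bytes {start}-{end} for layer {}", layer.hash);
    }

    // With every required layer locally available, the layers would be
    // reassembled into a complete model file for the caller.
}

fn main() {
    let local: HashSet<String> = HashSet::from(["base-layer-0".to_string()]);
    let required = vec![
        LayerRef { hash: "base-layer-0".into(), byte_range: (0, 1024) },
        LayerRef { hash: "tuned-layer-1".into(), byte_range: (1024, 4096) },
    ];
    sync_model(&required, &local); // only "tuned-layer-1" would be fetched
}
```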
As marked with [1], the "remote storage" is not fully figured out yet. Docker has the idea of a registry that could also work here. Using the `Range` HTTP header has allowed us to pull only specific layers from Hugging Face so far.
Example curl: `curl --range 262175808-379616319 -L https://huggingface.co/KoboldAI/fairseq-dense-1.3B/resolve/main/model.safetensors\?download\=true -o model.safetensors`
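And a minimal Rust equivalent of that curl command, assuming the `reqwest` crate with its `blocking` feature enabled (an assumption; `cake`'s actual HTTP stack may differ):

```rust
use reqwest::blocking::Client;
use reqwest::header::RANGE;
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Same byte range as the curl example above: one layer's slice of the file.
    let url = "https://huggingface.co/KoboldAI/fairseq-dense-1.3B/resolve/main/model.safetensors?download=true";
    let resp = Client::new()
        .get(url)
        .header(RANGE, "bytes=262175808-379616319")
        .send()?
        .error_for_status()?;

    // The server answers 206 Partial Content, so the body holds only the
    // requested slice rather than the whole multi-gigabyte file.
    let bytes = resp.bytes()?;
    fs::write("layer.bin", &bytes)?;
    println!("fetched {} bytes", bytes.len());
    Ok(())
}
```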