Comments (2)
Thank you @mrshu for reporting this.
The reason for this error is that we need to update the transformers code to handle this case.
In PyTorch you can do:
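import torch

a = torch.zeros((2, 2))
b = a[:1, :]  # b is a view that shares memory with a
torch.save({"a": a, "b": b}, "weights.pt")  # "weights.pt" is a placeholder path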
This works fine, and the tensors will still share memory after the weights are loaded.
This is not possible in safetensors because it can lead to tricky issues where you allocate more than you need (when loading only "b", for instance, you would allocate the whole "a" tensor and only slice into it). It also causes further problems for things like distributed training and saving, where the weights cannot be shared.
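To make the sharing concrete, here is a minimal sketch of how one might detect tensors that share memory before saving, and how to break the sharing by copying. The helper name find_shared_tensors is hypothetical (it is not part of safetensors or PyTorch), and untyped_storage() assumes PyTorch 2.0 or newer.

```python
from collections import defaultdict

import torch


def find_shared_tensors(tensors):
    """Group tensor names by the address of their underlying storage.

    Hypothetical helper for illustration; assumes PyTorch >= 2.0
    for Tensor.untyped_storage().
    """
    groups = defaultdict(list)
    for name, tensor in tensors.items():
        groups[tensor.untyped_storage().data_ptr()].append(name)
    # Only groups with more than one name actually share memory.
    return [names for names in groups.values() if len(names) > 1]


a = torch.zeros((2, 2))
b = a[:1, :]  # a view into a's memory
tensors = {"a": a, "b": b}

shared = find_shared_tensors(tensors)
print(shared)  # [['a', 'b']]

# clone() gives every tensor its own storage, so nothing is shared anymore.
independent = {name: t.clone() for name, t in tensors.items()}
print(find_shared_tensors(independent))  # []
```

Cloning duplicates the data, so for tied weights it is usually preferable to drop the duplicate name instead and re-tie it after loading.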
You can use this script to convert existing weights: https://github.com/huggingface/safetensors/blob/main/bindings/python/convert.py. More specifically, see these lines: https://github.com/huggingface/safetensors/blob/main/bindings/python/convert.py#L91-L94
I'll open something in transformers to fix this. It's only a matter of skipping decoder.weight while saving for it to work.
This won't cause any issues for transformers, because the weight is automatically tied when you load with AutoModel.from_pretrained(...), meaning the tensors will be correctly shared.
from safetensors.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.