
safetensors's Introduction

Hugging Face Safetensors Library

Safetensors

This repository implements a new, simple format for storing tensors safely (as opposed to pickle) while remaining fast (zero-copy).

Installation

Pip

You can install safetensors with pip:

pip install safetensors

From source

To build from source, you need Rust:

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Make sure it's up to date and using stable channel
rustup update
git clone https://github.com/huggingface/safetensors
cd safetensors/bindings/python
pip install setuptools_rust
pip install -e .

Getting started

import torch
from safetensors import safe_open
from safetensors.torch import save_file

tensors = {
    "weight1": torch.zeros((1024, 1024)),
    "weight2": torch.zeros((1024, 1024)),
}
save_file(tensors, "model.safetensors")

tensors = {}
with safe_open("model.safetensors", framework="pt", device="cpu") as f:
    for key in f.keys():
        tensors[key] = f.get_tensor(key)

Python documentation

Format

  • 8 bytes: N, an unsigned little-endian 64-bit integer, containing the size of the header
  • N bytes: a JSON UTF-8 string representing the header.
    • The header data MUST begin with a { character (0x7B).
    • The header data MAY be padded with trailing whitespace (0x20).
    • The header is a dict like {"TENSOR_NAME": {"dtype": "F16", "shape": [1, 16, 256], "data_offsets": [BEGIN, END]}, "NEXT_TENSOR_NAME": {...}, ...},
      • data_offsets point to the tensor data relative to the beginning of the byte buffer (i.e. not an absolute position in the file), with BEGIN as the starting offset and END as the one-past offset (so total tensor byte size = END - BEGIN).
    • A special key __metadata__ is allowed; it must be a free-form string-to-string map. Arbitrary JSON is not allowed: all values must be strings.
  • Rest of the file: byte-buffer.
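To make the layout concrete, here is a minimal header-parsing sketch in Python (assuming a file produced by save_file above; everything beyond the fields named in the spec is illustrative):

import json
import struct

def read_header(path):
    # Read the 8-byte little-endian header length N, then the N-byte JSON header.
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return header

header = read_header("model.safetensors")
for name, info in header.items():
    if name == "__metadata__":
        continue
    begin, end = info["data_offsets"]
    print(name, info["dtype"], info["shape"], end - begin, "bytes")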

Notes:

  • Duplicate keys are disallowed. Not all parsers may respect this.
  • In general, the accepted subset of JSON is implicitly decided by serde_json for this library. Anything obscure (odd ways to represent integers, newlines, or escapes in UTF-8 strings) might be modified at a later time; this would only be done for safety concerns.
  • Tensor values are not validated; in particular, NaN and +/-Inf may be present in the file.
  • Empty tensors (tensors with one dimension equal to 0) are allowed. They store no data in the byte buffer, yet keep their entry in the header. They don't bring much value but are accepted since they are valid tensors from the perspective of traditional tensor libraries (torch, tensorflow, numpy, ...).
  • 0-rank tensors (tensors with shape []) are allowed; they are simply scalars.
  • The byte buffer needs to be entirely indexed, and cannot contain holes. This prevents the creation of polyglot files.
  • Endianness: little-endian, at least for the moment.
  • Order: 'C' or row-major.

Yet another format?

The main rationale for this crate is to remove the need for pickle, which PyTorch uses by default. There are other formats out there, both ML-specific and more general.

Let's take a look at alternatives and why this format is deemed interesting. This is my very personal and probably biased view:

| Format                  | Safe | Zero-copy | Lazy loading | No file size limit | Layout control | Flexibility | Bfloat16/Fp8 |
| ----------------------- | ---- | --------- | ------------ | ------------------ | -------------- | ----------- | ------------ |
| pickle (PyTorch)        | ✗    | ✗         | ✗            | 🗸                  | ✗              | 🗸           | 🗸            |
| H5 (Tensorflow)         | 🗸    | ✗         | 🗸            | 🗸                  | ~              | ~           | ✗            |
| SavedModel (Tensorflow) | 🗸    | ✗         | ✗            | 🗸                  | 🗸              | ✗           | 🗸            |
| MsgPack (flax)          | 🗸    | 🗸         | ✗            | 🗸                  | ✗              | ✗           | 🗸            |
| Protobuf (ONNX)         | 🗸    | ✗         | ✗            | ✗                  | ✗              | ✗           | 🗸            |
| Cap'n'Proto             | 🗸    | 🗸         | ~            | 🗸                  | 🗸              | ~           | ✗            |
| Arrow                   | ?    | ?         | ?            | ?                  | ?              | ?           | ✗            |
| Numpy (npy,npz)         | 🗸    | ?         | ?            | 🗸                  | ✗              | ✗           | ✗            |
| pdparams (Paddle)       | ✗    | ✗         | ✗            | 🗸                  | ✗              | 🗸           | 🗸            |
| SafeTensors             | 🗸    | 🗸         | 🗸            | 🗸                  | 🗸              | ✗           | 🗸            |
  • Safe: Can I use a randomly downloaded file and expect not to run arbitrary code?
  • Zero-copy: Does reading the file require more memory than the original file?
  • Lazy loading: Can I inspect the file without loading everything? And can I load only some tensors in it without scanning the whole file (distributed setting)?
  • Layout control: Lazy loading is not necessarily enough: if the information about tensors is spread out in your file, then even if the information is lazily accessible, you might have to access most of your file to read the tensors you need (incurring many disk -> RAM copies). Controlling the layout to keep fast access to single tensors is important.
  • No file size limit: Is there a limit to the file size?
  • Flexibility: Can I save custom code in the format and use it later with zero extra code? (~ means we can store more than pure tensors, but no custom code)
  • Bfloat16/Fp8: Does the format support native bfloat16/fp8 (meaning no weird workarounds are necessary)? This is becoming increasingly important in the ML world.

Main oppositions

  • Pickle: Unsafe, runs arbitrary code
  • H5: Apparently now discouraged for TF/Keras. Seems like a great fit otherwise, actually. Some classic use-after-free issues: https://www.cvedetails.com/vulnerability-list/vendor_id-15991/product_id-35054/Hdfgroup-Hdf5.html. On a very different level than pickle security-wise. Also, 210k lines of code vs ~400 lines for this lib currently.
  • SavedModel: Tensorflow specific (it contains TF graph information).
  • MsgPack: No layout control to enable lazy loading (important for loading specific parts in distributed setting)
  • Protobuf: Hard 2GB max file size limit
  • Cap'n'proto: Float16 support is not present (link), so a manual wrapper over a byte buffer would be necessary. Layout control seems possible, but not trivial, as buffers have limitations (link).
  • Numpy (npz): No bfloat16 support. Vulnerable to zip bombs (DOS). Not zero-copy.
  • Arrow: No bfloat16 support.

Notes

  • Zero-copy: No format is truly zero-copy in ML: the data has to go from disk to RAM/GPU RAM (and that takes time). On CPU, if the file is already in the OS cache, it can truly be zero-copy; on GPU there is no such disk cache, so a copy is always required, but you can avoid allocating all the tensors on CPU first. SafeTensors is not zero-copy for the header. The choice of JSON is fairly arbitrary, but it is human-readable, deserializing it takes a tiny fraction of the time needed to load the actual tensor data, and its size is tiny compared to the tensor data, so I went that way.

  • Endianness: Little-endian. This can be modified later, but it feels really unnecessary at the moment.

  • Order: 'C' or row-major. This seems to have won. We can add that information later if needed.

  • Stride: No striding, all tensors need to be packed before being serialized. I have yet to see a case where it seems useful to have a strided tensor stored in serialized format.

Benefits

Since we can invent a new format we can propose additional benefits:

  • Prevent DOS attacks: We can craft the format in such a way that it's almost impossible to use malicious files to DOS attack a user. Currently, there's a 100MB limit on the size of the header to prevent parsing extremely large JSON. Also, when reading the file, there's a guarantee that addresses in the file do not overlap in any way, meaning that when you're loading a file you should never exceed the size of the file in memory.

  • Faster load: PyTorch seems to be the fastest format to load among the major ML formats. However, it appears to make an extra copy on CPU, which we can bypass in this lib by using torch.UntypedStorage.from_file. Currently, CPU loading times are extremely fast with this lib compared to pickle. GPU loading times are as fast as or faster than the PyTorch equivalent. Loading first on CPU with memmapping with torch, and then moving all tensors to GPU, seems to be faster too, somehow (similar behavior to torch pickle).

  • Lazy loading: in distributed (multi-node or multi-GPU) settings, it's nice to be able to load only part of the tensors on the various models. For BLOOM, using this format brought loading the model on 8 GPUs from 10 minutes with regular PyTorch weights down to 45 seconds. This really speeds up feedback loops when developing on the model. For instance, you don't have to keep separate copies of the weights when changing the distribution strategy (for instance Pipeline Parallelism vs Tensor Parallelism).

License: Apache-2.0

safetensors's People

Contributors

alvarobartt, b-kamphorst, cccntu, chainyo, cospectrum, jbn, julien-c, junnyu, kevinhu, kiszk, kolanich, lachlancahill, laurentmazare, lhoestq, lsb, mcpatate, mfuntowicz, mishig25, narsil, nickkolok, osanseviero, patrickvonplaten, paulbricman, pcuenca, sgugger, stevhliu, thomasw21, wauplin, wouterzwerink, zouharvi


safetensors's Issues

Question about determinism across multiple saves

Hello, thanks for this initiative!

I have faced hashing/comparison problems with pickle as the serialized output is known to be non-deterministic for the same object (e.g. dictionaries with the same items but with different insert histories). Does this library (and its associated format) provide a guarantee regarding the serialization being deterministic for the same input?
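For what it's worth, a quick empirical check is easy to write (a sketch using the numpy API that appears elsewhere in this document; whether it prints True is exactly the guarantee being asked about here):

import numpy as np
import safetensors.numpy

tensors = {"a": np.zeros((2, 2), dtype=np.float32), "b": np.ones((3,), dtype=np.float32)}
first = safetensors.numpy.save(tensors)
# Same items, different insertion order.
second = safetensors.numpy.save(dict(reversed(list(tensors.items()))))
print(first == second)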

[Critical] Design defect: endianness is not stored and can be arbitrary

import numpy as np
import safetensors.numpy

a = safetensors.numpy.save({"a": np.array(range(6), dtype='>u4')})
b = safetensors.numpy.save({"a":np.array(range(6), dtype='<u4')})
print(a)
print(b)
b'7\x00\x00\x00\x00\x00\x00\x00{"a":{"dtype":"U32","shape":[6],"data_offsets":[0,24]}}\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00\x05'
b'7\x00\x00\x00\x00\x00\x00\x00{"a":{"dtype":"U32","shape":[6],"data_offsets":[0,24]}}\x00\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00\x05\x00\x00\x00'

Handle to model file remains open after `safe_open` goes out of scope?

Hi, I want to read the metadata and tensors of a .safetensors file, update the metadata and write the file back (overwriting the original). I tried to do that with this code:

#!/usr/bin/env python

import sys
import os.path
import safetensors.torch

model_path = sys.argv[1] if len(sys.argv) > 1 else None
if not model_path:
    print("Provide a model path.")
    exit(1)

if not os.path.splitext(model_path)[1] == '.safetensors':
    print("Must be .safetensors file.")
    exit(1)

tensors = {}
metadata = {}
with safetensors.torch.safe_open(model_path, framework="pt") as f:
  metadata = f.metadata()
  for k in f.keys():
    tensors[k] = f.get_tensor(k)

safetensors.torch.save_file(tensors, model_path, metadata)

But I get this error:

Traceback (most recent call last):
  File "C:\stable-diffusion-webui\extensions\sd-scripts\tset.py", line 24, in <module>
    safetensors.torch.save_file(tensors, model_path, metadata)
  File "C:\stable-diffusion-webui\extensions\sd-scripts\venv\lib\site-packages\safetensors\torch.py", line 71, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
Exception: Error while serializing: IoError(Os { code: 1224, kind: Uncategorized, message: "The requested operation cannot be performed on a file with a user-mapped section open." })

I expect the handle to be closed when safe_open goes out of scope. I'm not using the "GPU fast load" option set by an envvar, either.

Also, if I wait before the program exits and try to delete the file, Explorer won't let me. I checked with LockHunter and it says no process is using the file.

So if I use safe_open on several model files in a program, they all stay locked until the program exits.

I am on Windows 10, using safetensors 0.2.8 from pip.

[bug] can't save CLIP model. `TypeError: can't multiply sequence by non-int of type 'float'`

environment

google colab
python = 3.7.15
transformers = 4.24.0
safetensors = 0.2.3
numpy = 1.21.6

I also got the same error in the safetensors/convert Space with my CLIP model.

code

from transformers import AutoModel

repo = "openai/clip-vit-base-patch32"
model = AutoModel.from_pretrained(repo)
model.save_pretrained("clip-vit-base-patch32", safe_serialization=True)
/usr/local/lib/python3.7/dist-packages/safetensors/torch.py:15 in <dictcomp>
        "shape": v.shape,
        "data": _tobytes(v, k),
    }
--> for k, v in tensors.items()
    locals: k = 'logit_scale', v = tensor(4.6052)

/usr/local/lib/python3.7/dist-packages/safetensors/torch.py:119 in _tobytes
    ptr = tensor.data_ptr()
    newptr = ctypes.cast(ptr, ctypes.POINTER(ctypes.c_ubyte))
--> data = np.ctypeslib.as_array(newptr, (total_bytes,))  # no internal copy
    return data.tobytes()
    locals: bytes_per_item = 4, length = 1.0, name = 'logit_scale', tensor = tensor(4.6052), total_bytes = 4.0

/usr/local/lib/python3.7/dist-packages/numpy/ctypeslib.py:515 in as_array
--> p_arr_type = ctypes.POINTER(_ctype_ndarray(obj._type_, shape))
    locals: obj = <safetensors.torch.LP_c_ubyte object at 0x7f7ab0153c20>, shape = (4.0,)

/usr/local/lib/python3.7/dist-packages/numpy/ctypeslib.py:348 in _ctype_ndarray
    for dim in shape[::-1]:
-->     element_type = dim * element_type
    locals: dim = 4.0, element_type = <class 'ctypes.c_ubyte'>, shape = (4.0,)

TypeError: can't multiply sequence by non-int of type 'float'

colab

Loading tensors directly on GPU is slower than loading on CPU and then moving

When running the following file:

from safetensors.torch import load_file as safe_load_file
import time
import sys

direct_on_gpu = bool(int(sys.argv[1]))

if direct_on_gpu:
    start_time = time.time()
    # !wget https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/unet/diffusion_pytorch_model.safetensors
    checkpoint = safe_load_file("/home/patrick_huggingface_co/stable-diffusion-v1-4/unet/diffusion_pytorch_model.safetensors", device=0)
    print("Directly on GPU", time.time() - start_time)
else:
    start_time = time.time()
    checkpoint = safe_load_file("/home/patrick_huggingface_co/stable-diffusion-v1-4/unet/diffusion_pytorch_model.safetensors")
    checkpoint = {k: v.to("cuda:0") for k, v in checkpoint.items()}
    print("On CPU", time.time() - start_time)

with:

1.)

python safetensors_bench.py 0

and:

python safetensors_bench.py 1

I get the following results which are quite surprising to me:

On CPU 3.1686861515045166
Directly on GPU 3.930008888244629 

My versions are:

torch: '1.13.1+cu116'
safetensors: '0.2.8'

Tests done on a V100

What's the expected usage of `metadata` when saving a Dict with tensors?

Hi again @Narsil!

So after playing around with safetensors for quite a bit, I wanted to ask what the intended purpose of storing the metadata during save is (assuming it's just informative data). It's not loaded back when calling load, so it's lost unless you manually check the file, which is readable text.

Thanks for your work and help!
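For reference, the metadata can still be read back explicitly through safe_open (a minimal sketch using the API shown earlier in this document):

from safetensors import safe_open

with safe_open("model.safetensors", framework="pt") as f:
    # Returns the free-form string-to-string map from __metadata__, or None if absent.
    print(f.metadata())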

Compatibility with `torch.save()`?

Very cool project!

I was wondering if there's a possibility of using this package as the "backend" for torch.save to enable easier integration with downstream projects that are already using torch.save/load...

Seems like it might be possible, since torch.save and torch.load have a pickle_module= argument. Something like:

import safetensors
torch.save(my_tensor, "my_tensor.pt", pickle_module=safetensors)   # or safetensors.torch

Ability to remotely parse metadata over small HTTP requests

In this branch: https://github.com/huggingface/safetensors/compare/julien-c/js I pushed a proof-of-concept of how, given the simplicity of the format, one can fetch metadata about the weights over small (Range) HTTP requests.

The code is JS (can run in browsers or in Node) but it would be similar in any language.

Here's an example on how to fetch the header in a single file for instance:

async function parseSingleFile(url: URL): Promise<FileHeader> {
	const bufLengthOfHeaderLE = await (
		await fetch(url, {
			headers: {
				Range: "bytes=0-7",
			},
		})
	).arrayBuffer();
	const lengthOfHeader = new DataView(bufLengthOfHeaderLE).getBigUint64(
		0,
		true
	);
	/// ^little-endian
	const header: FileHeader = await (
		await fetch(url, {
			headers: {
				Range: `bytes=8-${7 + Number(lengthOfHeader)}`,
			},
		})
	).json();
	/// no validation for now, we assume it's a valid FileHeader.
	return header;
}

where a FileHeader type is defined as:

type TensorName = string;
type Dtype =
	| "F64"
	| "F32"
	| "F16"
	| "I64"
	| "I32"
	| "I16"
	| "I8"
	| "U8"
	| "BOOL";

interface TensorInfo {
	dtype: Dtype;
	shape: number[];
	data_offsets: [number, number];
}

type FileHeader = Record<TensorName, TensorInfo> & {
	__metadata__: Record<string, string>;
};

Results

As a fun first experiment, I compute the number of params per dtype for all models currently with a safetensors version on the HuggingFace Hub.

Here's the results:

| model | safetensors | params |
| --- | --- | --- |
| gpt2 | single-file | { 'F32' => 137022720 } |
| roberta-base | single-file | { 'F32' => 124697433, 'I64' => 514 } |
| Jean-Baptiste/camembert-ner | single-file | { 'F32' => 110035205, 'I64' => 514 } |
| roberta-large | single-file | { 'F32' => 355412057, 'I64' => 514 } |
| bigscience/bloom-560m | single-file | { 'F16' => 559214592 } |
| hf-internal-testing/tiny-random-bert-safetensors | single-file | { 'F32' => 127463, 'I64' => 512 } |
| hf-internal-testing/tiny-random-bert-sharded-safetensors | index-file | { 'F32' => 87929, 'I64' => 512 } |
| Narsil/small3 | index-file | { 'F32' => 59159, 'I64' => 512 } |
| Narsil/small2 | single-file | { 'F32' => 59159, 'I64' => 512 } |
| hf-internal-testing/tiny-random-bert-safetensors-tf | single-file | { 'F32' => 87929 } |

Thought it'd be fun to share! cc @mishig25 @osanseviero too

Support for model streaming (disk -> VRAM)?

Would it be possible (if it isn't already) for this format to support streaming from disk? Having to load the entire model into RAM, only to pass it off to the GPU, causes tools that use it to require considerable amounts of RAM, but only for small windows of time.

In memory-tight situations (e.g. VMs) being able to stream from disk would reduce model load time (due to avoiding hitting swap) and out-of-memory errors when multiple model changes are done (less opportunity for memory leaks).
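One way to approximate this with the existing API is to load tensors one at a time directly to the target device (a sketch based on the safe_open usage shown earlier; whether an intermediate CPU copy is fully avoided depends on the implementation):

from safetensors import safe_open

tensors = {}
with safe_open("model.safetensors", framework="pt", device="cuda:0") as f:
    for key in f.keys():
        # Each tensor is materialized on the GPU individually instead of loading
        # the whole file into host RAM first.
        tensors[key] = f.get_tensor(key)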

Converting back to pickle (PyTorch)

I know, security and going backwards and all, but I'm looking into this to offer support to SD clients that don't yet support safetensors.

My plan is to convert SD models into both formats so that users can use whichever works for them. So PyTorch pickles would be converted to safetensors and vice versa. Both files would be available for users.

Is this conversion currently possible?
I scanned the existing issues, but since I'm not very versed in Python I couldn't tell at first glance whether this had already been answered.
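A minimal sketch of the reverse conversion (safetensors back to a PyTorch checkpoint); the file names and the nesting under "state_dict" are assumptions about what SD clients expect, not something the library mandates:

import torch
from safetensors.torch import load_file

tensors = load_file("model.safetensors")
# Many Stable Diffusion tools expect the weights nested under a "state_dict" key.
torch.save({"state_dict": tensors}, "model.ckpt")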

Can't convert in colab anymore

Hello again.

I've been using this colab you provided to a user a while ago to convert my models online:

https://colab.research.google.com/drive/1YYzfYZEJTb3dAo9BX6w6eZINIuRsNv6l#scrollTo=Tq4DxOmTwr2M

But lately I'm not able to do it anymore. For example, changing the wget to this model: https://huggingface.co/darkstorm2150/Protogen_v2.2_Official_Release/resolve/main/Protogen_V2.2.ckpt I get this error:


KeyError                                  Traceback (most recent call last)
in <module>
      1 import torch
----> 2 weights = torch.load("/content/Protogen_V2.2.ckpt")["state_dict"]

KeyError: 'state_dict'

If I remove the state_dict it seems to work fine. I don't know what it does, though. Should I remove it? Thank you.

Torch SD-based models tensor invalid for input size

There might be a slight discrepancy between the loading and saving processes in safetensors. Take an SD-based model like SD 1.4 packaged as a PyTorch checkpoint; we'll call it sd-v1-4.ckpt. We can save its state_dict as safetensors and discard the torch format.

Packaging as safetensors

    sf_filename = "sd-v1-4.safetensors"
    filename = "sd-v1-4.ckpt"

    loaded = torch.load(filename)
    loaded = loaded['state_dict']

    # appears to pop nothing in this case
    shared = shared_pointers(loaded)
    for shared_weights in shared:
        for name in shared_weights[1:]:
            loaded.pop(name)

    loaded = {k: v.contiguous() for k, v in loaded.items()}

    save_file(loaded, sf_filename, metadata={"format": "pt"})

    check_file_size(sf_filename, filename)

Loading the tensors

load_file('sd-v1-4.safetensors', device='cpu')

Results in error:

File "venv\lib\site-packages\safetensors\torch.py", line 99, in load_file
result[k] = f.get_tensor(k)
RuntimeError: shape '[1280, 1280, 3, 3]' is invalid for input of size 7290352

Expected behaviour: safetensors fails while trying to save unexpected tensor data or creates tensors which can be loaded
Affected version: safetensors=2.4.0, torch=1.12.1+cu113

ckpt size:
3.97 GB 4,265,381,888 bytes (4,265,380,512 bytes)
safetensor size:
3.97 GB 4,265,148,416 bytes (4,265,146,304 bytes)
SHA fe4efff1e174c627256e44ec2991ba279b3816e364b49f9be2abc0b3ff3f8556

Using pruned version of CompVis/stable-diffusion-v-1-4-original

Apologies if this is already fixed with the addition of more dtypes. I will try to get more info by checking the output and debug info of this specific tensor.

Zero copy is not used when torch memory is sparse and/or has not been garbage collected

Used

os.environ['SAFETENSORS_FAST_GPU'] = '1'

I observed on the webui that despite setting device='cuda' with this flag and using safetensors.torch.load_file, it was taking almost 45 seconds to load a 4GB .safetensors file. However, when trying to replicate it in a separate program but using the same libraries, the model loads fast and only takes a few seconds. These can be represented by these curves, each a 60s time chunk:

[chart: "wrong" case — long load time observed, CPU fallback]

[chart: "right" case — fast load in the separate program]

The program is executed at the red line.

It appears that due to some pollution of memory, webui always falls back to loading from CPU by default. This pollution appears to persist, and if you terminate the webui and then run a separate program which uses safetensors, it also falls back to loading using slow CPU copy.

Steps to replicate:

  1. Run a program that loads various large torch files to CPU (regular torch files with torch.load) and a large file to GPU (safetensors).
  2. Terminate it a few seconds after loading the torch CPU files and the GPU file.
  3. Within 10 seconds, try to launch an external program using safetensors to GPU with the fast GPU flag. The program resorts to slow copy via CPU.
    * Does not replicate if you interrupt the second external program and try again.

[chart: "interrupted" case — cancel to slow copy after pollution]

safetensors tested: 0.2.5

Next steps:

  • print how much memory cuda sees as available at runtime

  • disable windows 10 Hardware-accelerated GPU scheduling

  • figure out how the webui pollutes space (I think it loads a few models with torch to CPU, 5GB to memory)

Flaw in the format design: lack of signature and version information

This format has a flaw: it lacks a signature, with which one could

  • detect that a file is a safetensors collection even if it has the wrong filename extension,
  • detect safetensors collections embedded into other formats without knowing their structure,
  • identify the format version.

We can fix it in a backward-compatible way. We can state that the JSON of the header must start with exactly the following

{
	"__metadata__": {
		"safetensors": [

(lf line endings)

As metadata is by default arbitrary, parsing such tensors won't cause failure on the legacy implementations. Newer implementations must serialize format version into __metadata__.safetensors as an array of 3 semver integers.

Signature scanners can scan for exactly this prefix, then go back by 8 bytes, and try to parse the format. Libraries MUST ensure that the prefix is present with all the whitespaces. If the JSON-serializing lib doesn't emit it, it is easy to fix: just cut the prefix emitted by the lib and then replace with our static prefix with the whitespaces.

For compatibility, when parsing libs MUST NOT fail if the prefix is not exactly this, but can emit a warning.
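A sketch of how a scanner might detect such files under this proposal (the exact prefix bytes, with LF line endings and tab indentation as shown above, are an assumption about the proposed canonical form):

PREFIX = b'{\n\t"__metadata__": {\n\t\t"safetensors": ['

def looks_like_safetensors(data: bytes) -> bool:
    # Find the proposed prefix, then step back 8 bytes to the little-endian header length.
    idx = data.find(PREFIX)
    if idx < 8:
        return False
    header_len = int.from_bytes(data[idx - 8:idx], "little")
    return header_len >= len(PREFIX)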

Get rid of pyO3, use `ctypes` + C FFI

Getting rid of pyO3 and using ctypes instead would allow us to:

  • get rid of the requirement of having and running a Rust compiler in order to build a wheel;
  • reduce bloat by reusing the shared library.

Don't panic, use `NotImplementedError`

  1. It is semantically incorrect.
  2. A dirty workaround has to be used, because pyo3_runtime.PanicException cannot be imported: pyo3_runtime is not a real module.
import safetensors

def getPanicExceptionType():
    # Trigger a panic on purpose (invalid dtype) and capture the exception class.
    try:
        safetensors.safetensors_rust.serialize({"tensor_name": {"dtype": "test", "shape": [0, 0], "data": b""}}, None)
    except BaseException as ex:
        return ex.__class__

PanicException = getPanicExceptionType()

Question about parallelism/rayon

Questions regarding parallelism:

  1. If I'm not mistaken, both tensor serialization & deserialization operations should be parallelizable. Is this assumption correct? For example, I was thinking the line below could be replaced with par_extend:
    buffer.extend(tensor.data);
  2. If point 1 is valid, does it make sense to do so? Or is it not worth it (i.e. the speed-up might not be great, or we don't want to introduce a dependency on rayon)?

Thanks so much! 🤗
cc: @Narsil

Prebuilt package for m1/apple silicon

Is there a way to install this on M1/arm64 mac without installing the rust compiler?

Pip error

Collecting safetensors
  Using cached safetensors-0.2.7.tar.gz (28 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: safetensors
  Building wheel for safetensors (pyproject.toml) ... error
  error: subprocess-exited-with-error

× Building wheel for safetensors (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [24 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-11.0-arm64-cpython-38
creating build/lib.macosx-11.0-arm64-cpython-38/safetensors
copying py_src/safetensors/init.py -> build/lib.macosx-11.0-arm64-cpython-38/safetensors
copying py_src/safetensors/numpy.py -> build/lib.macosx-11.0-arm64-cpython-38/safetensors
copying py_src/safetensors/tensorflow.py -> build/lib.macosx-11.0-arm64-cpython-38/safetensors
copying py_src/safetensors/torch.py -> build/lib.macosx-11.0-arm64-cpython-38/safetensors
copying py_src/safetensors/flax.py -> build/lib.macosx-11.0-arm64-cpython-38/safetensors
running build_ext
running build_rust
error: can't find Rust compiler

Conda error
PackagesNotFoundError: The following packages are not available from current channels:
  • safetensors

Current channels:

Serialization in Safetensors implementation

I think it's a bit weird to use SafeTensors::deserialize(buffer) to deserialize, while for serialization it's safetensors::tensor::serialize(data, data_info). Maybe serialization could move into the SafeTensors implementation. What do you think?

Good use case for Apache Arrow?

Hello! I love the motivation behind safetensors, and was wondering if y'all had thought about using Apache Arrow for the byte-buffer part of things. It would bring a lot of cross-platform/architecture compatibility, works great with very large amounts of data, and has some standards for storing multidimensional tensors (both dense and sparse): https://arrow.apache.org/docs/format/Other.html

Anyway, I'm sure it's already been suggested, but just in case, I wanted to bring it to your attention!

Explicit automatic alignment of header

My use case might be overly specific, but when writing/using vectorised code on an mmap'd safetensors file, the header sometimes causes every tensor to start at an odd-numbered pointer, which breaks even 16-bit vectorisation. Is there a possibility that the Python bindings will get an option to save with an explicit alignment? Just padding the header should be enough for most use cases.
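As a stop-gap, the padding rule in the format section makes post-hoc alignment possible from user code; a sketch (the alignment value and function name are illustrative, not part of the library):

import struct

def align_safetensors(data: bytes, alignment: int = 8) -> bytes:
    # Grow the header with trailing spaces (allowed by the spec) until the byte
    # buffer starts on an aligned boundary.
    (n,) = struct.unpack("<Q", data[:8])
    header, buffer = data[8:8 + n], data[8 + n:]
    pad = -(8 + n) % alignment
    header += b" " * pad
    return struct.pack("<Q", len(header)) + header + buffer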

It refuses to serialize `uint8`

safetensors.numpy.save({"8": np.array(range(-3, 4), dtype=np.uint8)})

ValueError: Safetensor format only accepts little endian

Question about security of safetensors

I realize I might have created a post in the wrong section, so I'll redo it here:

Hi, coming from a non-IT field I would like to ask the owner of the code if they could spend a couple words talking about the security of this new format.

In simple words, what does it mean that this format is safe?

Does it mean I can convert any malicious .ckpt model that I find, and once it is .safetensors it can't infect my machine anymore? Or is this format only protecting against some kinds of threats? If so, which ones? How concerned should I be running random converted .ckpts found over the internet once they are made .safetensors?

Support complex JSON metadata

Currently, metadata is limited to a string->string mapping. It would be much more powerful if metadata could be an arbitrary JSON value. This would allow, for instance, integers and nested dictionaries in metadata. This could be safely implemented by making the metadata field's type in Rust a serde_json::Value (to be clear, I'm talking about the metadata field of the Metadata struct, not the struct itself). I'm unsure how the Python bindings would work exactly, but likely it'd be serialized as a string on the Python side and then temporarily deserialized on the Rust side.

This wouldn't present a DoS vector because there's already a metadata size limit and serde_json has a deserialization depth limit to avoid a stack overflow. The alternative is letting the user implement a more advanced metadata format themselves by e.g. serializing their data to json and then storing that as a string in metadata, which seems more error prone than providing support in safetensors directly.
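The string-valued workaround mentioned above can look like this today (a sketch; the key names and values are illustrative):

import json
import numpy as np
from safetensors.numpy import save_file

nested = {"steps": 1000, "lr": 1e-4}
# Nested data has to be flattened to a string today, e.g. by embedding JSON.
metadata = {"training": json.dumps(nested)}
save_file({"w": np.zeros((2, 2), dtype=np.float32)}, "model.safetensors", metadata=metadata)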

I'm looking at using this to store training metadata of stable diffusion textual inversion embeddings while supporting the safetensors format (see AUTOMATIC1111/stable-diffusion-webui#6625). I'd be happy to implement this but I wanted to make sure such a PR would be accepted first.

Running into "Some tensors share memory" `RuntimeError` when converting finetuned GPT2

First of all, thanks a bunch for all your work on safetensors @Narsil (and HuggingFace for opensourcing it)!

I was keen to try it out in a very straightforward manner with a finetuned GPT2 model with the following script:

from transformers import GPT2LMHeadModel, GPT2Tokenizer
gpt2_model = GPT2LMHeadModel.from_pretrained("./model")
gpt2_model.save_pretrained('./model-safetensor', safe_serialization=True)

Sadly, this resulted in the following RuntimeError:

  File "/Users/mrshu/Library/Caches/pypoetry/virtualenvs/gpt2-experiments-4HuXp8fp-py3.8/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1619, in save_pretrained
    safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
  File "/Users/mrshu/Library/Caches/pypoetry/virtualenvs/gpt2-experiments-4HuXp8fp-py3.8/lib/python3.8/site-packages/safetensors/torch.py", line 70, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/Users/mrshu/Library/Caches/pypoetry/virtualenvs/gpt2-experiments-4HuXp8fp-py3.8/lib/python3.8/site-packages/safetensors/torch.py", line 220, in _flatten
    raise RuntimeError(
RuntimeError: Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'lm_head.weight', 'transformer.wte.weight'}]

This was rather strange for me to run into, as my naive reading of the Docs (https://huggingface.co/docs/safetensors/speed) was that gpt2 should be fairly straightforward to convert.

Can you see anything I might have done wrong? Could it be due to the fact that the model was originally serialized on an older version of transformers? I would appreciate any thoughts and suggestions you could spare.

Thanks!


As the listing above suggests, this was tried on Python 3.8, with the following packages installed:

certifi==2022.9.24
charset-normalizer==2.1.1
filelock==3.8.0
huggingface-hub==0.11.0
idna==3.4
numpy==1.23.5
packaging==21.3
Pillow==9.3.0
pyparsing==3.0.9
PyYAML==6.0
regex==2022.10.31
requests==2.28.1
safetensors==0.2.4
tokenizers==0.13.2
torch==1.13.0
torchaudio==0.13.0
torchvision==0.14.0
tqdm==4.64.1
transformers==4.24.0
typing_extensions==4.4.0
urllib3==1.26.12

If there is any other piece of information I could provide that would help debug this issue, please let me know.
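One common workaround (a sketch, not necessarily the maintainers' recommended fix) is to break the tie between the shared tensors before saving, at the cost of duplicating that tensor on disk:

import safetensors.torch
from transformers import GPT2LMHeadModel

gpt2_model = GPT2LMHeadModel.from_pretrained("./model")
state_dict = gpt2_model.state_dict()
# lm_head.weight is tied to transformer.wte.weight in GPT-2; cloning one of them
# removes the shared storage that save_file refuses to serialize.
state_dict["lm_head.weight"] = state_dict["lm_head.weight"].clone()
safetensors.torch.save_file(state_dict, "model-safetensor/model.safetensors")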

Some clarification about PyTorch format

The table in the README.md is not entirely correct. PyTorch's default format is not just pickle, but a ZIP file containing raw tensor data plus a small-ish pickle file that just refers to the tensor data. This has been the case since version 1.6; see https://github.com/pytorch/pytorch/blob/master/torch/serialization.py#L405 and _use_new_zipfile_serialization.

This means that this format:

  • can be made "almost" zero-copy, as the tensor data can be mmapped and transferred to the desired location one by one. We could consider "delayed" reading
  • lazy loading can be implemented with Tensor subclasses that just retain a pointer to the original file and load it on demand on any operation. They can still provide accessible metadata / device, similarly to fake tensors.
  • layout control is fine, as the tensor shapes are recorded in a small pickle file. The ZIP file structure allows for random access from the directory, and PyTorch even takes care of aligning individual files on page boundaries to make mmapping easier

The added benefit of the ZIP file is that it's ubiquitous - can be easily inspected (or modified!) with regular tools.
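For illustration, inspecting such a checkpoint with nothing but the standard library looks like this (a sketch assuming a file saved by torch.save on torch >= 1.6):

import zipfile

with zipfile.ZipFile("model.pt") as zf:
    for info in zf.infolist():
        # Typically an archive/data.pkl entry plus one archive/data/<n> entry per tensor.
        print(info.filename, info.file_size)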

So the only outstanding problem is a safe pickle implementation. Luckily, we already have one implemented in torch.jit; it just needs a minor polish.

Here's a colab demonstrating internals of the format: https://colab.research.google.com/drive/1DUoqwbmBTiDsTUggZsBJL8UIKv_AjKSa?usp=sharing

cc @Narsil @sgugger

AttributeError: module 'safetensors' has no attribute 'torch'

Hey. Firstly, big thanks for all your amazing work on this! And for the PRs to diffusers. This is going to be so awesome for models deployed to a serverless GPU environment and I really can't wait to try it.

Should torch support work in the latest released version (0.2.5)?

I feel like I must be doing something stupid here, and admit that I'm very new to Python, but I'm trying to get this working in diffusers and get this error regardless of whether I install via conda (with python 3.9) or pip (with python 3.10).

Here's from diffusers:

pipeline.save_pretrained(args.output_dir, safe_serialization=True)

Traceback (most recent call last):
  File "handle_request", line 81, in handle_request
    FutureStatic,
  File "/api/server.py", line 36, in inference
    output = user_src.inference(model_inputs)
  File "/api/app.py", line 281, in inference
    result = TrainDreamBooth(model_id, pipeline, model_inputs, call_inputs)
  File "/api/train_dreambooth.py", line 120, in TrainDreamBooth
    result = main(args, pipeline)
  File "/api/train_dreambooth.py", line 712, in main
    pipeline.save_pretrained(args.output_dir, safe_serialization=True)
  File "/api/diffusers/src/diffusers/pipeline_utils.py", line 248, in save_pretrained
    save_method(
  File "/api/diffusers/src/diffusers/modeling_utils.py", line 224, in save_pretrained
    save_function = safetensors.torch.save_file if safe_serialization else torch.save
AttributeError: module 'safetensors' has no attribute 'torch'

and here's from the python interpreter:

$ python
Python 3.9.15 (main, Nov 24 2022, 14:31:59) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import safetensors

>>> print(safetensors)
<module 'safetensors' from '/opt/conda/envs/xformers/lib/python3.9/site-packages/safetensors/__init__.py'>

>>> safetensors.torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'safetensors' has no attribute 'torch'

>>> safetensors.__dict__
{'__name__': 'safetensors', '__doc__': None, '__package__': 'safetensors', '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x7ff358dccac0>, '__spec__': ModuleSpec(name='safetensors', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7ff358dccac0>, origin='/opt/conda/envs/xformers/lib/python3.9/site-packages/safetensors/__init__.py', submodule_search_locations=['/opt/conda/envs/xformers/lib/python3.9/site-packages/safetensors']), '__path__': ['/opt/conda/envs/xformers/lib/python3.9/site-packages/safetensors'], '__file__': '/opt/conda/envs/xformers/lib/python3.9/site-packages/safetensors/__init__.py', '__cached__': '/opt/conda/envs/xformers/lib/python3.9/site-packages/safetensors/__pycache__/__init__.cpython-39.pyc', '__builtins__': {...}, '__version__': '0.2.5', 'safetensors_rust': <module 'safetensors.safetensors_rust' from '/opt/conda/envs/xformers/lib/python3.9/site-packages/safetensors/safetensors_rust.cpython-39-x86_64-linux-gnu.so'>, 'safe_open': <class 'builtins.safe_open'>}
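For context, Python does not import subpackages implicitly, so the attribute only appears after an explicit import (a minimal sketch of the usual workaround):

import safetensors
import safetensors.torch  # the submodule must be imported explicitly

print(safetensors.torch.save_file)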

Hi, may I ask about the differences between safetensors and FlatBuffers?

It looks like both of them are data-storage libraries. But FlatBuffers can store not only tensors, but also structures and many custom data types. Can safetensors do that too?

Furthermore, can safetensors be used cross-platform with a very tiny dependency, for example on Android or iOS?

Can we improve `FileNotFoundError` when trying to read a file?

E                   File ".../save.py", line 459, in save
E                     with safetensors.safe_open(path, framework="pt", device=str(params.device)) as fi:
E                 FileNotFoundError: No such file or directory (os error 2)

Can we add more informative messages to the FileNotFoundError? Typically, knowing the path would help whenever we compute paths dynamically.
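Until the library includes the path in its message, a small wrapper on the caller side can do it (a sketch; the helper name is illustrative):

import os
import safetensors

def safe_open_checked(path, framework, device="cpu"):
    # Surface the offending path, which the error currently omits.
    if not os.path.exists(path):
        raise FileNotFoundError(f"No such file or directory: {path}")
    return safetensors.safe_open(path, framework=framework, device=device)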

Repo Description Typo

The description of this repo is currently "SImple, safe way to store and distribute tensors" but should probably be "Simple, safe way to store and distribute tensors" instead unless this went over my head.

Excited to follow this project!

Instructions on how to run Python tests

@Narsil, could you add a readme section on how to run python test cases?
For example, test_pt_comparison.py

Should I follow https://pyo3.rs/v0.17.1/getting_started.html & use maturin?

When testing the Python bindings, how do they find safetensors? Also, as I update the Rust implementation of safetensors, how can I make sure those updates are available to the Python binding tests?

from safetensors.torch import save_file, load_file, load, to_dtype

format comparison: numpy's NPY format

https://numpy.org/doc/stable/reference/generated/numpy.lib.format.html

Comes in two flavors: .npy for a single ndarray, and .npz for multiple arrays (which is really nothing more than a .zip archive of .npy entries).

  • Safe: 🗸
  • Zero-copy: 🗸
  • Lazy loading: 🗸
  • No file size limit: 🗸
  • Layout control: ?
  • Flexibility: ?

Caveats: npy may contain pickles if the array is an array of Python objects. The loader does have an allow_pickle flag which is false by default as of numpy 1.16.3.

The original proposal is not the most current documentation of the format, but is useful to review for their thoughts on requirements and alternatives.

At a glance, it looks very similar to this format. The differences I see so far are:

  • No TENSOR_NAME key. .npy is a single tensor; in an .npz, the name is the name of the .zip entry.
  • Since .npy contains only one tensor, there are no offsets.
