Stable Model Format #143

iacore · 2023-04-19T18:01:48Z

iacore
Apr 19, 2023

The model format space is quite wild. I think we should at least have a good enough model format.

Problems with safetensors

not aligned: buffer not aligned huggingface/safetensors#231
no q4_*
need JSON parser

KerfuffleV2 · 2023-04-19T22:39:33Z

KerfuffleV2
Apr 19, 2023

It's going to be hard to do that because people are frequently adding stuff like new quantization types, formats, etc.

So we could invent a new format that had the current q4_* and then tomorrow q4_3 (actually on the horizon) could pop into existence. The rwkv.cpp project also has q4_1_O.

not aligned:

I really don't like safetensors' model saving stuff*, but the format at least doesn't seem like it precludes aligning the data. The way tensors are defined is just [start_offset, end_offset] so if you write the file yourself, you can put them wherever you want including allowing blank spaces for padding.

It also allows a `__metadata__` key that's just a `HashMap<String, String>`, so, for example, you could just define your tensors as `u8` and then add additional details, such as describing that it's `ggml_q4_1`, in that metadata section. Or even just use that to define an offset/length for additional metadata that could be in whatever format you want.
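A minimal sketch of the idea in the two paragraphs above, in Python. It assumes the commonly documented safetensors layout (an 8-byte little-endian header length, a JSON header, then the data section, with `data_offsets` relative to the start of the data section). The 32-byte alignment target, the `build_file` helper, and the `<tensor>.quant` metadata key names are made up for illustration. Whether other readers accept gaps between tensors is disputed later in this thread.

```python
import json
import struct

ALIGN = 32  # assumed alignment target; not mandated by the format


def pad_to(n: int, align: int) -> int:
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def build_file(tensors: dict[str, bytes], metadata: dict[str, str]) -> bytes:
    """Serialise raw tensors as U8 blobs, each starting on an ALIGN boundary."""
    header = {"__metadata__": metadata}
    body = bytearray()
    for name, raw in tensors.items():
        start = pad_to(len(body), ALIGN)
        body.extend(b"\0" * (start - len(body)))  # padding gap before this tensor
        body.extend(raw)
        header[name] = {
            "dtype": "U8",
            "shape": [len(raw)],
            "data_offsets": [start, start + len(raw)],
        }
    text = json.dumps(header, separators=(",", ":")).encode("utf-8")
    # Pad the header with spaces so the data section itself starts aligned.
    text += b" " * (pad_to(8 + len(text), ALIGN) - (8 + len(text)))
    return struct.pack("<Q", len(text)) + text + bytes(body)


blob = build_file(
    {"tensor_a": bytes(10), "tensor_b": bytes(range(10))},
    # Hypothetical key names: the quantization type lives in the metadata,
    # while the tensors themselves are declared as plain bytes.
    {"tensor_a.quant": "ggml_q4_1", "tensor_b.quant": "ggml_q4_1"},
)
```

Writing each tensor to an open file instead of appending to `body` would avoid holding the whole model in memory, since the offsets only depend on the tensor lengths.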

* One thing I really don't like about the way safetensors saves stuff is you basically have to load the entire thing into memory. Also, if you don't pass it a file it will additionally build a gigantic buffer before saving. I think that's just a limitation of how they implemented it, the file format is basically okay (kinda weird it has no magic or anything, so you can't really know you have a safetensors format file just by looking at the data).

need JSON parser

No way around that one, but is it too much of an issue? The allowed format for the JSON in a ST file is also fairly limited, so you could just write your own very simple JSON parser for ST files if you didn't want to pull in a big dependency like serde.

3 replies

mert-kurttutan May 7, 2023

Hi,
Just a comment about loading for safetensors. Lazy loading seems possible for safetensors by using Memmap.
An example of this can be seen in the smelte-rs crate. Lazy loading is also implemented in its Python package.
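For illustration, here is a rough Python sketch of what mmap-based lazy loading can look like. It is not the smelte-rs or safetensors implementation. It assumes the usual layout (8-byte little-endian header length, JSON header, then data), and `read_header` and `load_tensor_bytes` are hypothetical helper names.

```python
import json
import mmap
import struct


def read_header(path):
    """Return (header dict, file offset of the data section) without reading tensor data."""
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(n))
    return header, 8 + n


def load_tensor_bytes(path, name):
    """Map the file and copy out only the bytes that belong to one tensor."""
    header, base = read_header(path)
    begin, end = header[name]["data_offsets"]  # relative to the data section
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return bytes(mm[base + begin : base + end])
```

Only the header and the pages backing the requested slice need to be read, so listing the tensors or pulling out a single one does not require loading the whole file.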

Narsil May 15, 2023

Yup lazy loading is possible definitely.

For saving, it's indeed just a lack of work on making it less demanding on RAM, but most models do fit in RAM, which is why that hasn't been implemented yet.

iacore May 15, 2023
Author

Write your own saving code! .safetensors is very simple to write/parse.

philpax · 2023-04-20T00:26:50Z

philpax
Apr 20, 2023
Maintainer

This would be nice to have, but I agree with Kerfuffle's concerns: it's non-trivial just due to the variability of the ecosystem. I'm not sure what the best path forward here is - my gut feeling is to wait and see what the ecosystem produces.

0 replies

philpax · 2023-05-01T21:03:56Z

philpax
May 1, 2023
Maintainer

Thinking about this more after noticing just how bad the GGML format fragmentation is. safetensors is probably the best bet, but it doesn't support Q4/etc and I don't think it's generally meant for single-file deployment.

If we were to design such a format, I'd want it to feature the following:

store data of any kind, and support additional formats in the future, just like GGML, likely with alignment guarantee
easy to read/write - preferably easier than GGML; you shouldn't have to parse the entire file in order to know where tensors are located
be somewhat self-describing/resilient to change - it should be possible to parse sections of the file without having to parse the entire file (e.g. being able to know what tensors are present without reading the hyperparameters)
include what architecture it is, so that the user doesn't have to specify it and you can't accidentally run it with the wrong code
maybe include a sample computation graph, but that would be pushing it

Are there any existing binary formats that meet those criteria / can encode that kind of data? The first thing that comes to mind is something like BSON with a schema.

3 replies

iacore May 2, 2023
Author

Let's extend current safetensors to store data. That means ABI for q4_* types.

Currently ggml f32/f16 are stored by little endian (on amd64). Is that a concern?

ONNX has the computation graph, although contributing to it would be hard. It's a lot more complex.

philpax May 14, 2023
Maintainer

Back to thinking about this after the latest GGML format break.

safetensors seems like a reasonable next step, but from the discussion here they don't seem to be targeting the single-file deployment case where everything (aside from the computation graph) is provided for an inference engine to be able to run the model. (@Narsil feel free to correct me, that was my read of the conversation)

We could use the JSON to provide all of this information (hyperparameters, vocabulary, other configuration settings), but I think that would become an extension of the safetensors "spec" that would not necessarily be well-standardised and would be out of scope for what that JSON was meant to be used for.

If the single-file deployment use-case was solved, I'd be happy moving forward with ST (assuming the mmap / quant stuff is sorted, which it looks like it is).

iacore May 14, 2023
Author

You can store hyperparams as metadata in JSON. It's well in spec.
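A small sketch of how that could look, assuming (as described earlier in the thread) that `__metadata__` only maps strings to strings. The hyperparameter names and the `hyperparams` key are hypothetical.

```python
import json

# Hypothetical hyperparameters; the names are illustrative only.
hyperparams = {"n_layer": 32, "n_head": 32, "n_embd": 4096}

# __metadata__ is string-to-string, so structured data is serialised to a string first.
header = {"__metadata__": {"hyperparams": json.dumps(hyperparams)}}

# Reading it back takes a second json.loads on the stored string.
restored = json.loads(header["__metadata__"]["hyperparams"])
```

The cost is that any such key is a convention between writer and reader rather than something the format itself defines.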

iacore · 2023-05-02T11:02:14Z

iacore
May 2, 2023
Author

If we go with LLVM naming it would be [16]i4*w (q4_2) and [16]i4*w+b (q4_3).
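To make the notation concrete, here is a sketch of one possible reading of `[16]i4*w+b`: a block of 16 four-bit integers, each dequantised as `q * w + b`. The byte layout below (f16 scale, f16 bias, then unsigned nibbles packed low-nibble-first) is an assumption chosen for illustration and is not claimed to match ggml's actual q4_3 definition. `pack_block` and `unpack_block` are hypothetical names.

```python
import struct

BLOCK = 16  # values per block, from the [16] in the notation


def pack_block(values, w, b):
    """Pack 16 integers in 0..15 as nibbles, preceded by an f16 scale and f16 bias."""
    assert len(values) == BLOCK and all(0 <= v <= 15 for v in values)
    nibbles = bytes(values[i] | (values[i + 1] << 4) for i in range(0, BLOCK, 2))
    return struct.pack("<ee", w, b) + nibbles


def unpack_block(raw):
    """Return the 16 dequantised floats, x = q * w + b."""
    w, b = struct.unpack_from("<ee", raw, 0)
    out = []
    for byte in raw[4 : 4 + BLOCK // 2]:
        out.append((byte & 0x0F) * w + b)  # low nibble first (assumed order)
        out.append((byte >> 4) * w + b)
    return out
```

Under this layout a block takes 12 bytes for 16 values. Dropping the bias field would give the `[16]i4*w` variant.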

0 replies

iacore · 2023-05-02T13:29:24Z

iacore
May 2, 2023
Author

I've made a script to write quantized weights to safetensors

https://github.com/iacore/model-conversions/tree/main/quantize-wizard

8 replies

LLukas22 May 21, 2023
Collaborator

@KerfuffleV2 Thanks, i'll definitely look into this in the future. 👍

Narsil May 22, 2023

they don't have to be contiguous or anything as far as I can see.

They do actually. Contiguous, without holes, and covering the entire file (so that it cannot be made into a polyglot file).

KerfuffleV2 May 22, 2023

They do actually. Contiguous, without holes, and covering the entire file (so that it cannot be made into a polyglot file).

Can you be more specific about why you think I'm wrong?

Just to be clear: I'm talking about the file format itself and not the crate. You can't do what I'm talking about with the official crate (although I'm pretty sure it should be able to read the files).

Also, I wasn't talking about polyglot files. That wouldn't be possible because the header is in a fixed position and has to exist. I'm saying you can put other stuff after the header or between the tensor data.

edit: Also, just in case I wasn't specific enough when I said "contiguous": An individual tensor in the file does have to be contiguous. My claim is that you can put whatever other data in the file that you want, as long as it's after the header part and before/after each individual tensor's data.

Narsil May 22, 2023

Also, I wasn't talking about polyglot files. That wouldn't be possible because the header is in a fixed position and has to exist. I'm saying you can put other stuff after the header or between the tensor data.

Zip files are accessed from the end of the file, and they are not alone; a security audit managed to create polyglot files.

Just to be clear: I'm talking about the file format itself and not the crate. You can't do what I'm talking about with the official crate (although I'm pretty sure it should be able to read the files).

You can do whatever you want with your own code. However the official crate will not read those files and yield an error instead.
Actually there's indeed a missing note for that part in the spec. I updated it.

KerfuffleV2 May 22, 2023

Zip files are accessed from the end of the file

Fair enough. It could work for that type of file, but anyway I wasn't talking about polyglot files originally.

However the official crate will not read those files and yield an error instead.

Why? The metadata in the header defines the position where the tensor data exists, length, etc. Why would it matter if there's something in between the tensors?

For example, suppose the metadata for the header written by the official crate looks like (pseudocode):

{
 [ { "name": "tensor_a", "offset": 60, "length": 10 },
   { "name": "tensor_b", "offset": 70, "length": 10 } ]
}

Why would it be a problem if I saved the file like:

{
 [ { "name": "tensor_a", "offset": 60, "length": 10 },
   { "name": "tensor_b", "offset": 99, "length": 10 } ]
}

and there was some other data in the space between where tensor_a ended and tensor_b starts?

edit:

It does look like the official crate will fail to handle those files based on: https://github.com/huggingface/safetensors/blob/a7969d4683f014ef422cede384da1d1c3b1585bf/safetensors/src/tensor.rs#L484

This behavior is not described in the format definition (even after your changes a little while ago). It also makes no sense to me to enforce an arbitrary restriction that only makes the format less flexible and has no actual benefit.

If what you're trying to do is ensure the start of a tensor is aligned (which is reasonable) then do that instead of blindly checking that there's no space in between the tensors and assuming the file was written by your exact implementation that does things a specific undocumented way.
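As a sketch of the alternative being suggested here (not the official crate's logic), a validator could check bounds, overlap, and start alignment while still allowing gaps. `check_layout` and its `(name, start, end)` span tuples are hypothetical.

```python
def check_layout(spans, data_len, align=1):
    """Validate (name, start, end) spans without requiring them to be gap-free.

    Raises ValueError on out-of-bounds, overlapping, or misaligned tensors.
    """
    prev_end = 0
    for name, start, end in sorted(spans, key=lambda s: s[1]):
        if not (0 <= start <= end <= data_len):
            raise ValueError(f"{name}: out of bounds")
        if start < prev_end:
            raise ValueError(f"{name}: overlaps previous tensor")
        if start % align != 0:
            raise ValueError(f"{name}: start {start} not aligned to {align}")
        prev_end = end


# The layout from the pseudocode above: the gap between the two tensors is accepted.
check_layout([("tensor_a", 60, 70), ("tensor_b", 99, 109)], data_len=109)
```

With `align=4` the same layout would be rejected because `tensor_b` starts at offset 99, which is the kind of property an alignment-motivated check would actually care about.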

I also don't think you can reasonably just make a bunch of amendments to the format definition without doing something like adding versioning and bumping the version. Who would be willing to trust a format if the standard can just suddenly change breaking existing code that was implemented to spec? In this case I also think there's no reason to even do that: what's the advantage to specifically forbidding this?

Hopefully this doesn't sound too harsh, but I honestly really, really hate arbitrary restrictions. I won't just complain about the problem though, if you'd be willing to accept a patch that "fixes" that behavior (if we can agree it should change) then I will look at implementing it and making a pull request to the safetensors repo.

iacore · 2023-05-22T18:29:19Z

iacore
May 22, 2023
Author

Hello people, I made a simple reader for safetensors

https://github.com/iacore/model-conversions/tree/main/safetensors

Maybe it can go in this repo?

0 replies


Replies: 6 comments 14 replies
