Replies: 5 comments 1 reply
-
Lossy image compression works by transforming the data into a representation that looks similar to the human eye, but a model layer compressed to JPEG would likely break.
-
There are lossless image encoding schemes; PNG, for example, has a lossless mode. However, isn't llamafile relying on memory-mapping these areas? If the weights were compressed, you would have to decompress them into a staging area in RAM.
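A minimal sketch of that distinction, assuming Python with NumPy as a stand-in (llamafile itself is C/C++): a raw weight file can be memory-mapped and read in place, while a compressed blob has to be inflated into a separate staging buffer before any value is readable.

```python
import mmap
import tempfile
import zlib

import numpy as np

# Raw weights on disk can be memory-mapped and used in place.
weights = np.arange(16, dtype=np.float32)
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(weights.tobytes())
    path = f.name

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
mapped = np.frombuffer(mm, dtype=np.float32)  # zero-copy view; pages fault in on demand
assert np.array_equal(mapped, weights)

# A compressed blob cannot be used in place: it must first be inflated
# into a staging buffer, costing extra RAM and CPU.
blob = zlib.compress(weights.tobytes())
staging = zlib.decompress(blob)  # full decompressed copy in memory
restored = np.frombuffer(staging, dtype=np.float32)
assert np.array_equal(restored, weights)
```

The mmap path touches only the pages actually read; the compressed path always materializes the whole decompressed tensor.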
-
The concept is the same: it is all just a list of values, whether brightness or weights. The values get leveled to a certain range by some algorithm, just as happens when reducing precision through existing weight quantizations. But images have not only a lot of quantization methods, they also have a lot of methods to compress the values.

Yes, it would require decompressing on the fly, but that is a trade-off between the maximum model size you can fit into fast memory and the compute you have: you spend compute on decompression as weights move to the chip. The important thing is that modern GPUs have hardware support for decompressing images, so it can happen very fast, allowing MUCH bigger models to be kept in fast memory at the cost of some precision (just as with quantization). Or it may allow keeping losslessly compressed weights in fast memory, which would still let us load much bigger models without ANY loss of precision compared to the uncompressed formats used today.
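A CPU-only sketch of the decompress-on-the-fly trade-off (NumPy and zlib as stand-ins; real GPU hardware decoders are not exposed this way): each layer is kept compressed in "fast memory" and inflated only for the moment it participates in a matmul.

```python
import zlib

import numpy as np

rng = np.random.default_rng(0)

# Keep each layer's weights zlib-compressed; remember the shape for reconstruction.
layers = []
for _ in range(3):
    w = rng.standard_normal((64, 64)).astype(np.float32)
    layers.append((zlib.compress(w.tobytes()), w.shape))

def forward(x, layers):
    for blob, shape in layers:
        # Pay CPU for decompression right before use, then discard the copy.
        w = np.frombuffer(zlib.decompress(blob), dtype=np.float32).reshape(shape)
        x = np.maximum(x @ w, 0.0)  # matmul + ReLU
    return x

x = rng.standard_normal((1, 64)).astype(np.float32)
y = forward(x, layers)
assert y.shape == (1, 64)
```

Only one decompressed layer is live at a time, so peak memory is roughly one layer plus the compressed blobs, at the cost of inflating every layer on every forward pass.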
-
Oh, so experimenting with JPEG-style lossy, frequency-based compression? Interesting. Well, there are a bunch of assumptions we would need to investigate first, e.g. whether the positions of weights in an X-dimensional array have spatial relations that won't be significantly impacted by discarding 'high-frequency' information. It certainly needs a proof of concept showing this won't break anything. Has anyone tested the idea in the Python ML / Hugging Face community? (Experimenting with PyTorch is likely a bit easier than testing it in here.)
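A hypothetical pure-NumPy proof of concept along those lines: apply the orthonormal 8x8 DCT-II (the transform JPEG uses), zero the high-frequency coefficients, invert, and look at the reconstruction error. On random weights, which have none of the spatial smoothness of photographs, a noticeable error is expected, which is exactly the assumption that needs checking.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    # Orthonormal DCT-II basis matrix (what JPEG applies to 8x8 blocks).
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    d = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    d[0] *= np.sqrt(1.0 / n)
    d[1:] *= np.sqrt(2.0 / n)
    return d

def lossy_roundtrip(w: np.ndarray, keep: int) -> np.ndarray:
    # Transform a square weight tile, discard "high-frequency" coefficients
    # (those with row + col index >= keep), then transform back.
    d = dct_matrix(w.shape[0])
    coeffs = d @ w @ d.T
    r, c = np.indices(coeffs.shape)
    coeffs[r + c >= keep] = 0.0
    return d.T @ coeffs @ d  # inverse of an orthonormal transform is its transpose

rng = np.random.default_rng(0)
tile = rng.standard_normal((8, 8))
approx = lossy_roundtrip(tile, keep=8)      # roughly half the coefficients dropped
err = np.abs(tile - approx).max()            # nonzero: information was discarded
assert err > 0.0
```

Running this per 8x8 tile over a real checkpoint and measuring perplexity, rather than raw reconstruction error, would be the actual test of whether weights tolerate frequency truncation.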
-
I was thinking about formats and realized that layers of weights fit all the existing image libraries perfectly. In other words, there is a hell of a lot of algorithms for image compression, including quantizing down to N bits, as well as lossless ones.
Couldn't we store model weights with image libraries like PNG or WebP and get good results? Possibly even accessing values without fully unpacking, greatly reducing the memory footprint but paying with CPU for value access?
I feel that image compression algorithms should work well on models, keeping the whole "picture" of the model while reducing its size. It would be very interesting to see how this compares to QLoRA, etc. And perhaps it might allow running much bigger models in less memory without having to write a lot of new code, since the code already exists for images.
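As a rough sketch of the lossless route (stdlib zlib plus NumPy; DEFLATE is the codec inside PNG, so compressing the quantized bytes approximates what a lossless PNG of the weight "image" would store, without pulling in an image library): quantize to 8 bits with a per-tensor scale, deflate the bytes, and check both the size and the worst-case quantization error.

```python
import zlib

import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)

# 8-bit quantization with a single per-tensor scale (a simplified Q8-style scheme).
scale = float(np.abs(w).max()) / 127.0
q = np.round(w / scale).astype(np.int8)

# DEFLATE is the compression PNG uses internally.
raw = w.tobytes()
packed = zlib.compress(q.tobytes(), level=9)
ratio = len(packed) / len(raw)

# Reconstruction error is bounded by half a quantization step.
recon = q.astype(np.float32) * scale
max_err = float(np.abs(w - recon).max())
assert max_err <= scale / 2 + 1e-6
```

On Gaussian random weights the int8 bytes are nearly incompressible, so almost all of the saving comes from the 4-to-1 quantization itself; real weight tensors may deflate somewhat better, and a lossy codec like WebP would trade further size for further error.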
What do you think?