I understand why F16 is required for linear and slerp, but can we do passthrough of quantized layers? Currently it is necessary to go via the huge full-precision models and requantize afterwards, which is a big pain point.
It would be possible to write a standalone script to do this. I don't think working with quantized GGUF models in mergekit-yaml makes much sense, as there are very few operations it would actually be able to support. Just stacking layers would be reasonable, though. I'll add this to my list of things to investigate when I have the time. Thanks for the suggestion!
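In the meantime, a standalone script along these lines might work, using the `gguf` Python package that ships with llama.cpp. This is only a minimal sketch, not anything mergekit provides: the function name `stack_quantized_layers`, the layer-selection logic, and the defaults are all hypothetical, and it deliberately skips the KV-metadata handling a real script would need.

```python
import re

from gguf import GGUFReader, GGUFWriter


def stack_quantized_layers(src_path: str, dst_path: str, layers: list[int],
                           arch: str = "llama") -> None:
    """Copy the listed transformer blocks (plus all non-block tensors) from
    a quantized GGUF into a new file, renumbering blocks 0..N-1, without
    ever dequantizing the weights."""
    reader = GGUFReader(src_path)
    writer = GGUFWriter(dst_path, arch)

    # Group block tensors by their original block index; everything else
    # (embeddings, output head, final norm, ...) passes through unchanged.
    blocks: dict[int, list] = {}
    non_block = []
    for tensor in reader.tensors:
        m = re.match(r"blk\.(\d+)\.(.+)", tensor.name)
        if m:
            blocks.setdefault(int(m.group(1)), []).append((m.group(2), tensor))
        else:
            non_block.append(tensor)

    def emit(name, tensor):
        # GGUF stores dimensions in ggml order, so flip the reader's shape
        # back to numpy order; raw_dtype carries the quantization type
        # through untouched, so the raw block data is copied verbatim.
        writer.add_tensor(name, tensor.data,
                          raw_shape=tensor.shape.tolist()[::-1],
                          raw_dtype=tensor.tensor_type)

    for tensor in non_block:
        emit(tensor.name, tensor)
    for new_idx, old_idx in enumerate(layers):
        for suffix, tensor in blocks[old_idx]:
            emit(f"blk.{new_idx}.{suffix}", tensor)

    # NOTE: a real script must also copy the KV metadata from the source
    # file and update e.g. {arch}.block_count to len(layers); that part
    # is omitted here for brevity.
    writer.write_header_to_file()
    writer.write_kv_data_to_file()
    writer.write_tensors_to_file()
    writer.close()
```

Repeating a block index in `layers` duplicates that layer in the output, which is the usual passthrough/frankenmerge use case. Extending this to pull layer ranges from multiple GGUF files would just mean reading blocks from more than one `GGUFReader`, as long as the quantization formats and tensor shapes line up.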
Is there a way to do this?