
Support for Ternary DiT #470

Open
Lucky-Lance opened this issue Nov 20, 2024 · 16 comments

@Lucky-Lance

Hi,

Ternary quantization has become popular, demonstrating computational speedups and power reductions in projects like llama.cpp and bitnet.cpp. We trained the first ternary DiT network (DiT is a popular architecture nowadays for text-to-image generation), and we would like to know whether we could get some help deploying it on stable-diffusion.cpp.

We asked llama.cpp for help, and they advised us to come here for guidance (link).

@stduhpf (Contributor) commented Nov 20, 2024

I think just updating the ggml submodule to a more recent version should be most of the work.
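
Something along these lines should do it (a rough sketch only; the submodule lives at `ggml/` in this repo, and you would want to pin a commit that actually builds rather than blindly taking master):

```sh
cd stable-diffusion.cpp
git submodule update --init ggml   # make sure the submodule is checked out
cd ggml
git fetch origin
git checkout <known-good-commit>   # or a recent ggml release tag
cd ..
git add ggml && git commit -m "sync ggml submodule"
```

The ternary types themselves already exist upstream, so a plain sync should pull in the quantization support; fixing whatever sd.cpp-side API breakage the bump causes is the remaining work.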

@Lucky-Lance (Author)

Thank you for your suggestion. Updating the ggml submodule to a more recent version sounds like a good starting point. However, I must admit that I have very limited experience with writing kernel code 😵.

@Green-Sky (Contributor) commented Nov 20, 2024

> We trained the first ternary DiT network

There has been one for a while that uses a categorical classifier. Do you mean embedding based?

Here: #331

Edit: oh, it's you. hahah

@Lucky-Lance (Author)

😇👀

@Green-Sky (Contributor)

@stduhpf I will try to make a PR to update to the latest, or at least a newer, ggml. We can then try to do some stuff based on that.

@Lucky-Lance Why did you use labels and not embedding(s) for the classifier? This makes it somewhat unusable for text-to-image.
I love your work though <3.

Are there any plans to "distill" something like flux schnell, i.e. training a new TerDiT on its outputs?
Or an embedding-based one...?

@Lucky-Lance (Author)

Label-based generation was just an earlier attempt of mine. In fact, I've always wanted to work on a text-to-image model, but in actual deployment I only saw reduced memory usage without any improvement in inference speed, which made me less confident about pursuing text-to-image models further. If I receive support here, I would certainly train a text-to-image model afterwards.

Thanks a lot for your support 🤩🥳.

@Lucky-Lance (Author)

I noticed you're facing some problems while upgrading ggml. :( Just checking in to see if you're still planning to support it, and if so, whether it could be completed within one or two months?

@Green-Sky (Contributor)

Well, it all depends on the individual's motivation and time, so no promises. 😅

That being said, after updating ggml I did a test where I quantized flux to tq1_0/tq2_0 (5 weights/byte and 4 weights/byte respectively), and it runs, CPU only, but produces noise. So it might or might not work.
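
For the curious, the byte math behind those numbers: tq2_0 stores each ternary weight {-1, 0, +1} as a 2-bit code, so 4 weights per byte, while tq1_0 exploits 3^5 = 243 ≤ 256 to pack 5 weights per byte. A toy C sketch of tq2_0-style packing (illustrative only; the real ggml block formats also carry per-block scales):

```c
#include <stdint.h>
#include <stdio.h>

// Pack four ternary weights (each in {-1, 0, +1}) into one byte,
// using the 2-bit code (w + 1) per weight -- tq2_0-style, 4 w/byte.
static uint8_t pack4(const int8_t w[4]) {
    uint8_t b = 0;
    for (int i = 0; i < 4; i++)
        b |= (uint8_t)(((w[i] + 1) & 0x3) << (2 * i));
    return b;
}

static void unpack4(uint8_t b, int8_t w[4]) {
    for (int i = 0; i < 4; i++)
        w[i] = (int8_t)((b >> (2 * i)) & 0x3) - 1;
}

int main(void) {
    const int8_t w[4] = {-1, 0, 1, -1};
    int8_t out[4];
    uint8_t b = pack4(w);
    unpack4(b, out);
    printf("byte=0x%02X -> %d %d %d %d\n",
           (unsigned)b, out[0], out[1], out[2], out[3]);
    // tq1_0 is denser: base-3 coding fits 5 trits per byte (~1.6 bits/weight).
    return 0;
}
```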

I will probably continue updating ggml and adapting sd.cpp to its code changes before trying any architectural stuff.
Maybe @stduhpf wants to take a stab at it while I do that?

@Green-Sky (Contributor)

This is what flux schnell with tq1_0/tq2_0 looks like:
[image: flux-schnell_tq1]
(both are identical, which is a good sign)

@Lucky-Lance (Author)

Oh, truly grateful for your efforts! 😆 Hoping everything goes smoothly.

@Green-Sky (Contributor)

Link to the "quantization" PR in llama.cpp that added tq1/tq2: ggerganov/llama.cpp#8151

@Green-Sky (Contributor)

Another thing that I leave to the future is looking into ik's fork with better bitnet support: https://github.com/ikawrakow/ik_llama.cpp

@Lucky-Lance (Author)

Hi, a month has slipped away, and I was wondering if the support is still part of the plan 😌

@stduhpf (Contributor) commented Dec 20, 2024

Ternary data types are now supported. This means that, in theory, any model with the same overall architecture as a supported model (like SD3 or Flux) but trained in ternary would work.
If the architecture is different, then more work is required.
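
So in principle the flow would be something like this (a sketch; `-M convert` and `--type` are per the README, and I'm assuming the new tq type names are accepted now that ggml carries them):

```sh
# quantize an existing model to a ternary type, then run it as usual
./build/bin/sd -M convert -m flux1-schnell.safetensors \
    -o flux1-schnell-tq2_0.gguf --type tq2_0
```

A natively-trained ternary model would instead need its weights exported into the same block layout, rather than being quantized after the fact.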

@Green-Sky (Contributor)

Haven't had time to work on sd.cpp this month, sorry.

Yeah, the bitnets have extra normalization layers in places.
So the real pain point here is the lack of (ideally already implemented) text-embedding conditioning, instead of labels.
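
To make the labels-vs-embeddings point concrete, here is a toy sketch (not sd.cpp code) of the two conditioning styles: the class-conditional model's entire prompt signal is one learned table row, while the SD3/Flux code paths expect a sequence of text-encoder token vectors:

```c
#include <stdio.h>

#define D 4          // toy embedding width
#define N_CLASSES 2  // label-conditional DiT: fixed, closed set of classes

// learned per-class embedding table (toy values)
static const float class_table[N_CLASSES][D] = {
    {0.1f, 0.2f, 0.3f, 0.4f},
    {0.5f, 0.6f, 0.7f, 0.8f},
};

int main(void) {
    // Label conditioning: the whole signal is one row looked up by class id.
    const float *label_cond = class_table[1];
    printf("label cond: 1 vector of width %d (first value %.1f)\n",
           D, label_cond[0]);

    // Text conditioning (SD3/Flux style): a variable-length sequence of token
    // embeddings produced by CLIP/T5 -- open vocabulary, not a fixed table.
    float text_cond[3][D] = {{0}};  // e.g. 3 prompt tokens (toy zeros)
    printf("text cond: %zu vectors of width %d\n",
           sizeof(text_cond) / sizeof(text_cond[0]), D);
    return 0;
}
```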

@Lucky-Lance (Author)

OK I will give it a try 😆
