- plus, a Nerdy Transformer Shuffle node
- New (best) SAE-informed Long-CLIP model with 90% ImageNet/ObjectNet accuracy.
- Code is here, model is at my HF 🤗: https://huggingface.co/zer0int/LongCLIP-SAE-ViT-L-14
- To clarify: put only the node folder into `ComfyUI/custom_nodes`; if you cloned the entire repo, you'll need to move it. Only that folder should be in `ComfyUI/custom_nodes`; you should have an `__init__.py` in your `ComfyUI/custom_nodes/ComfyUI-HunyuanVideo-Nyan` folder. If you see a `README.md` there, that's wrong.
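Quick way to check (a helper sketch, assuming ComfyUI's default directory layout; not part of the node):

```python
# Quick sanity check for the install location (assumes ComfyUI's default
# directory layout; this script is just a helper, not part of the node).
from pathlib import Path

node_dir = Path("ComfyUI/custom_nodes/ComfyUI-HunyuanVideo-Nyan")
if (node_dir / "__init__.py").is_file():
    print("Layout OK.")
else:
    print("Wrong layout: move the node folder itself into custom_nodes.")
```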
- The CLIP model doesn't seem to matter much? True for default Hunyuan Video, False with this node! ✨
- Simply put the `ComfyUI...` folder from this repo into `ComfyUI/custom_nodes`.
- See the example workflow; it's really easy to use, though. The node replaces the loader node.
- Recommended CLIP: huggingface.co/zer0int/CLIP-SAE-ViT-L-14
- Takes 248 tokens; new @ 19/DEC/24 🤗: https://huggingface.co/zer0int/LongCLIP-SAE-ViT-L-14
- Requires kijai/ComfyUI-HunyuanVideoWrapper
- ⚠️ If something breaks because the wrapper is WIP: temporarily fall back to my fork for compatibility.
- Uses the HunyuanVideoWrapper loader node implementation. All credits to the original author!
- My code = only the 2 different 'Nyan nodes' in `hynyan.py`.
- The loader is necessary because the mod changes model buffers; changes are cumulative if the model is not re-loaded.
- You can choose to re-load from file, or from a RAM deepcopy (faster, but may require >64 GB RAM).
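Here's a minimal sketch of why that is, with hypothetical helper names (not the node's actual code):

```python
import copy

# Hypothetical helper names, for illustration only: the node re-loads a
# pristine model because the mod edits buffers in place, so the edits
# would otherwise stack across runs.
def get_fresh_model(pristine_model, load_from_file, use_ram_copy: bool):
    if use_ram_copy:
        # Faster: deepcopy an unmodified model kept in RAM
        # (may require >64 GB of system RAM at HunyuanVideo scale).
        return copy.deepcopy(pristine_model)
    # Slower but memory-cheap: re-read unmodified weights from disk.
    return load_from_file()
```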
- Q: What does it do, this `Factor` for scaling CLIP & LLM? 🤔
- A: Here are some examples, including a 'do NOT set BOTH the CLIP and LLM factors >1' example; a rough sketch of the idea follows the first demo below.
- Prompt:
high quality nature video of a red panda balancing on a bamboo stick while a bird lands on the panda's head, there's a waterfall in the background
- SAE: The bird at least flies (though it takes off rather than landing), and the panda's feet are better (vs. OpenAI).
demo-default.mp4
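Rough mental model of the `Factor` (names and mechanism are illustrative, not the node's real API):

```python
import torch

# Illustrative only; not the node's actual API. The idea: scale each
# text encoder's conditioning before it reaches the video transformer.
# Setting BOTH factors > 1 compounds the boost, hence the warning above.
def apply_factors(clip_emb: torch.Tensor, llm_emb: torch.Tensor,
                  clip_factor: float = 1.0, llm_factor: float = 1.0):
    return clip_emb * clip_factor, llm_emb * llm_factor
```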
- These are all my CLIP models from huggingface.co/zer0int; SAE is best.
- Note the differences in the legs, blurriness, and coherence of small details.
demo-spider-cafe.mp4
Long-CLIP update @ 19/DEC/24: The original CLIP model has a 77-token max input, but only ~20 tokens of effective length; see the original Long-CLIP paper for details. HunyuanVideo demo:
- 69 tokens, normal scene:
- Lens: 16mm. Aperture: f/2.8. Color Grading: Blue-green monochrome. Lighting: Low-key with backlit silhouettes. Background: Gothic cathedral at night, stained glass windows breaking. Camera angle: Over the shoulder of a ninja, tracking her mid-air leap as she lands on a rooftop.
- 52 tokens, OOD (Out-of-Distribution) scene: superior consistency and prompt-following despite the OOD concept.
- In this surreal nightmare documentary, a sizable spider with a human face is peacefully savoring her breakfast at a diner. The spider has a spider body, but a lady's face on the front, and regular human hands at the end of the spider legs.
demo-long-sae-short.mp4
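If you want to count tokens yourself, here's a small sketch using the standard Hugging Face CLIP tokenizer (the Long-CLIP checkpoint may ship its own tokenizer config; 248 is just the limit quoted above):

```python
from transformers import CLIPTokenizer

# Count tokens with the standard CLIP tokenizer; 248 is the Long-CLIP
# limit quoted above, 77 is vanilla CLIP's hard cap.
tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
prompt = ("In this surreal nightmare documentary, a sizable spider with a "
          "human face is peacefully savoring her breakfast at a diner.")

print(len(tok(prompt)["input_ids"]))  # well past the ~20-token effective length

clip_ids = tok(prompt, truncation=True, max_length=77)["input_ids"]   # vanilla CLIP
long_ids = tok(prompt, truncation=True, max_length=248)["input_ids"]  # Long-CLIP
```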
- Q: And what does this confusing, gigantic node for nerds do? 🤔
- A: You can glitch the transformer (video model) by shuffling or skipping MLP and Attention layers:
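For the curious, a minimal sketch of the idea (illustrative only, not the node's actual implementation; it assumes the transformer keeps its layers in an `nn.ModuleList`, generically named `blocks` here):

```python
import random
import torch.nn as nn

# Illustrative sketch only; not the node's actual implementation. Assumes
# the video transformer stores its layers in an nn.ModuleList, generically
# called 'blocks' here.
def glitch_transformer(model: nn.Module, shuffle: bool = True,
                       skip_prob: float = 0.0, seed: int = 0) -> nn.Module:
    rng = random.Random(seed)
    blocks = list(model.blocks)
    if shuffle:
        rng.shuffle(blocks)  # permute the order of MLP/Attention blocks
    # Randomly skip (drop) some blocks entirely:
    blocks = [b for b in blocks if rng.random() >= skip_prob]
    model.blocks = nn.ModuleList(blocks)
    return model
```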