
Text Encoders finally matter 🤖🎥 - scale CLIP & LLM influence! + a Nerdy Transformer Shuffle node


zer0int/ComfyUI-HunyuanVideo-Nyan


ComfyUI-HunyuanVideo-Nyan

Text Encoders finally matter 🤖🎥 - scale CLIP & LLM influence!

  • plus, a Nerdy Transformer Shuffle node

Changes 19/DEC/2024:


  • To clarify: put only the ComfyUI-HunyuanVideo-Nyan node folder into ComfyUI/custom_nodes. If you cloned the entire repo, you'll need to move that folder; it alone should be in ComfyUI/custom_nodes. Your ComfyUI/custom_nodes/ComfyUI-HunyuanVideo-Nyan folder should contain an __init__.py. If you see a README.md there, that's wrong.

clarify

use-node

  • Requires kijai/ComfyUI-HunyuanVideoWrapper
  • ⚠️ If something breaks because of WIP changes: temporarily fall back to my fork for compatibility.
  • Uses HunyuanVideoWrapper -> loader node implementation. All credits to the original author!
  • My code = only the 2 different 'Nyan nodes' in hynyan.py.
  • Loader is necessary as the mod changes model buffers; changes are cumulative if not re-loaded.
  • You can choose to re-load from file - or from RAM deepcopy (faster, may require >64 GB RAM).
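
Why re-loading matters can be shown with a minimal, self-contained sketch. It is a toy, not the code in hynyan.py: `ToyModel`, `scale_buffer`, and the buffer name `text_proj` are hypothetical stand-ins, and plain Python lists take the place of real model buffers. Scaling in place compounds on every run, while restoring from a pristine deepcopy kept in RAM gives a clean starting point each time.

```python
import copy


class ToyModel:
    """Hypothetical stand-in for a loaded model; lists play the role of buffers."""

    def __init__(self):
        self.buffers = {"text_proj": [1.0, 2.0, 3.0]}


def scale_buffer(model, name, factor):
    """Multiply one buffer by `factor` in place (illustrative, not the real mod)."""
    buf = model.buffers[name]
    for i, value in enumerate(buf):
        buf[i] = value * factor


model = ToyModel()
pristine = copy.deepcopy(model)  # untouched copy kept in RAM

# Two runs without re-loading: the factor is applied on top of itself.
scale_buffer(model, "text_proj", 2.0)
scale_buffer(model, "text_proj", 2.0)
twice = list(model.buffers["text_proj"])  # [4.0, 8.0, 12.0]

# Restore from the RAM copy, then apply the factor once.
model = copy.deepcopy(pristine)
scale_buffer(model, "text_proj", 2.0)
once = list(model.buffers["text_proj"])  # [2.0, 4.0, 6.0]
```

The RAM copy avoids reading the checkpoint from disk again, at the cost of holding a second full copy of the model in memory, which is consistent with the RAM note above.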

two-nodes

  • Q: What does it do, this Factor for scaling CLIP & LLM? 🤔
  • A: Here are some examples. Including a 'do NOT set BOTH the CLIP and LLM factors >1' example.
  • Prompt: high quality nature video of a red panda balancing on a bamboo stick while a bird lands on the panda's head, there's a waterfall in the background
  • SAE: The bird at least flies (though it takes off instead of landing), and the panda's feet look better (vs. OpenAI).
demo-default.mp4
  • These are all my CLIP models from huggingface.co/zer0int; SAE is best.
  • See details on legs; blurriness; coherence of small details.
demo-spider-cafe.mp4

🆕 Long-CLIP @ 19/DEC/24: The original CLIP model has 77 tokens max input - but only ~20 tokens effective length. See the original Long-CLIP paper for details. HunyuanVideo demo:

  • 69 tokens, normal scene:
  1. Lens: 16mm. Aperture: f/2.8. Color Grading: Blue-green monochrome. Lighting: Low-key with backlit silhouettes. Background: Gothic cathedral at night, stained glass windows breaking. Camera angle: Over the shoulder of a ninja, tracking her mid-air leap as she lands on a rooftop.
  • 52 tokens, OOD (Out-of-Distribution) scene: Superior handling for consistency and prompt-following despite OOD concept.
  1. In this surreal nightmare documentary, a sizable spider with a human face is peacefully savoring her breakfast at a diner. The spider has a spider body, but a lady's face on the front, and regular human hands at the end of the spider legs.
demo-long-sae-short.mp4

  • Q: And what does this confusing, gigantic node for nerds do? 🤓
  • A: You can glitch the transformer (video model) by shuffling or skipping MLP and Attention layers:
demo-glitchformer.mp4
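
The idea of shuffling or skipping layers can be sketched on a toy stack. This is only an illustration under stated assumptions, not the node's implementation: `make_layer`, `glitch_order`, and `run` are hypothetical names, and each "layer" just records its index so the resulting execution order is visible.

```python
import random


def make_layer(index):
    """Return a toy layer that appends its own index to a trace list."""
    def layer(trace):
        return trace + [index]
    return layer


def glitch_order(num_layers, shuffle_range=None, skip=(), seed=0):
    """Return layer indices in execution order after shuffling a range and skipping layers."""
    order = list(range(num_layers))
    if shuffle_range is not None:
        start, stop = shuffle_range
        segment = order[start:stop]
        random.Random(seed).shuffle(segment)  # seeded, so the glitch is reproducible
        order[start:stop] = segment
    skipped = set(skip)
    return [i for i in order if i not in skipped]


def run(layers, order, x):
    """Apply the layers to x in the given order."""
    for i in order:
        x = layers[i](x)
    return x


layers = [make_layer(i) for i in range(6)]

baseline = run(layers, glitch_order(6), [])             # [0, 1, 2, 3, 4, 5]
skipped = run(layers, glitch_order(6, skip={2, 3}), [])  # [0, 1, 4, 5]
# Shuffle only the middle layers; the first and last stay in place.
shuffled = run(layers, glitch_order(6, shuffle_range=(1, 5), seed=42), [])
```

Limiting the shuffle to a sub-range and seeding it keeps an experiment repeatable while leaving the outermost layers untouched; whether the actual node exposes comparable controls is not claimed here.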
