<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<!-- Meta tags for SEO and social media sharing -->
<meta name="description" content="Training-free diffusion model alignment with sampling demons. This paper proposes a novel stochastic optimization approach, Demon, to guide the denoising process in diffusion models for improved preference alignment without involving backpropagation.">
<!-- Open Graph meta tags for social media -->
<meta property="og:title" content="Training-free Diffusion Model Alignment with Sampling Demons">
<meta property="og:description" content="This preprint discusses a novel method to align diffusion models with user preferences using stochastic optimization. Our approach can integrate non-differentiable reward sources like VLM APIs and human judgements.">
<meta property="og:url" content="https://arxiv.org/abs/2410.05760">
<!-- Banner image for social media -->
<meta property="og:image" content="static/images/UIcat.jpg">
<meta property="og:image:width" content="1999">
<meta property="og:image:height" content="1001">
<!-- Twitter-specific meta tags -->
<meta name="twitter:title" content="Training-free Diffusion Model Alignment with Sampling Demons">
<meta name="twitter:description" content="Our approach improves preference alignment in diffusion models without backpropagation or retraining. The method can use non-differentiable rewards such as Visual-Language Models and human feedback.">
<meta name="twitter:image" content="static/images/UIcat.jpg">
<meta name="twitter:card" content="summary_large_image">
<!-- Keywords for indexing -->
<meta name="keywords" content="diffusion models, stochastic optimization, sampling demons, preference alignment, non-differentiable rewards, arXiv, human judgements">
<meta name="google-site-verification" content="h5SaLSUdeMr-AIU2HVqEDPXGFZzYNxEpsn0pO_3c0Oo">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Training-free Diffusion Model Alignment with Sampling Demons</title>
<link rel="icon" type="image/x-icon" href="static/images/favicon.ico">
<!-- Favicon credit: https://www.flaticon.com/free-icon/smile_13535737 -->
<!-- Google Fonts -->
<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">
<!-- External CSS Libraries -->
<link rel="stylesheet" href="static/css/bulma.min.css">
<link rel="stylesheet" href="static/css/bulma-carousel.min.css">
<link rel="stylesheet" href="static/css/bulma-slider.min.css">
<link rel="stylesheet" href="static/css/fontawesome.all.min.css">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
<link rel="stylesheet" href="static/css/index.css">
<!-- JavaScript Libraries -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script src="https://documentcloud.adobe.com/view-sdk/main.js"></script>
<script defer src="static/js/fontawesome.all.min.js"></script>
<script src="static/js/bulma-carousel.min.js"></script>
<script src="static/js/bulma-slider.min.js"></script>
<script src="static/js/index.js"></script>
<script src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
</head>
<body>
<!-- Header Section -->
<section class="hero">
<div class="hero-body">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column has-text-centered">
<h1 class="title is-1 publication-title">Training-free Diffusion Model Alignment with Sampling Demons</h1>
<div class="is-size-5 publication-authors">
<!-- Paper authors -->
<span class="is-inline-block">
<a href="https://scholar.google.com/citations?user=rDzzITAAAAAJ&hl" target="_blank">Po-Hung Yeh</a>,
</span>
<span class="is-inline-block">
<a href="https://kuanghuei.github.io/" target="_blank">Kuang-Huei Lee</a>,
</span>
<span class="is-inline-block">
<a href="https://homepage.citi.sinica.edu.tw/pages/pullpull/index_en.html" target="_blank">Jun-Cheng Chen</a>
</span>
</div>
<div class="is-size-5 publication-authors">
<span class="is-inline-block">Academia Sinica, Google DeepMind</span>
</div>
<div class="publication-links">
<!-- Paper Links -->
<span class="link-block">
<a href="https://arxiv.org/pdf/2410.05760" target="_blank" class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fas fa-file-pdf"></i>
</span>
<span>Paper</span>
</a>
</span>
<span class="link-block">
<a href="https://github.com/rareone0602/Demon_page" target="_blank" class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fab fa-github"></i>
</span>
<span>Code [Coming Soon]</span>
</a>
</span>
<span class="link-block">
<a href="https://arxiv.org/abs/2410.05760" target="_blank" class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="ai ai-arxiv"></i>
</span>
<span>arXiv</span>
</a>
</span>
</div>
</div>
</div>
</div>
</div>
</section>
<!-- End Header Section -->
<!-- Teaser Video Section -->
<section class="hero teaser">
<div class="container is-max-desktop">
<div class="hero-body">
<!-- Video -->
<figure>
<img src="static/images/ShortVideo.gif" alt="Teaser Image">
</figure>
<!-- Teaser Text -->
<h2 class="subtitle has-text-centered">
Our proposed stochastic optimization algorithm, Demon, guides the denoising process with human preferences through direct user interaction at inference time.
</h2>
<p class="subtitle has-text-centered">
The author selects preferred images (marked by a red border) and leaves non-preferred images unselected, steering generation toward a reference image (top left). Measured by the cosine similarity of DINOv2 features, alignment improves from 0.5414 for the PF-ODE baseline (bottom left) to 0.8549 for the final state (right).
</p>
</div>
</div>
</section>
<!-- End Teaser Video Section -->
<!-- Abstract Section -->
<section class="section hero is-light">
<div class="container is-max-desktop">
<div class="columns is-centered has-text-centered">
<div class="column">
<h2 class="title is-3">Abstract</h2>
<div class="content has-text-justified">
<p>
Aligning diffusion models with user preferences has been a key challenge. Existing methods for aligning diffusion models either require retraining or are limited to differentiable reward functions. To address these limitations, we propose a stochastic optimization approach, dubbed Demon, to guide the denoising process at inference time without backpropagation through reward functions or model retraining. Our approach works by controlling noise distribution in denoising steps to concentrate density on regions corresponding to high rewards through stochastic optimization. We provide comprehensive theoretical and empirical evidence to support and validate our approach, including experiments that use non-differentiable sources of rewards such as Visual-Language Model (VLM) APIs and human judgements. To the best of our knowledge, the proposed approach is the first inference-time, backpropagation-free preference alignment method for diffusion models. Our method can be easily integrated with existing diffusion models without further training. Our experiments show that the proposed approach significantly improves the average aesthetics scores for text-to-image generation.
</p>
</div>
</div>
</div>
</div>
</section>
<!-- End Abstract Section -->
<section class="hero is-small">
<div class="hero-body">
<div class="container is-max-desktop">
<h2 class="title">Method Overview</h2>
<figure>
<img id="method-overview-image" src="static/images/DemonVideo.gif" alt="Method Overview">
</figure>
<!-- Footnotes section -->
<div class="footnotes">
<hr>
<ol>
<li id="footnote1">
For simplicity, the video uses a first-order solver; the paper uses a second-order solver to improve optimization efficiency.
</li>
<li id="footnote2">
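The demo divides the weighted noise by \( \sqrt{K} \), where \( K \) is the number of weighted noise samples. The paper instead projects the weighted noise onto a sphere of radius \( \sqrt{N} \), where \( N \) is the noise dimension. Both serve the same purpose: keeping the combined noise Gaussian-like.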
</li>
</ol>
</div>
</div>
</div>
</section>
<div id="sections-container"></div>
<!-- End Method Overview Section -->
<!-- Preprint Section -->
<section class="hero is-small is-light">
<div class="hero-body">
<div class="container is-max-desktop">
<h2 class="title">Preprint</h2>
<iframe src="static/pdfs/2410.05760v1.pdf" width="100%" height="550"></iframe>
</div>
</div>
</section>
<!-- End Preprint Section -->
<!-- BibTeX Citation -->
<section class="section" id="BibTeX">
<div class="container is-max-desktop content">
<h2 class="title">BibTeX</h2>
<pre><code>@misc{yeh2024trainingfreediffusionmodelalignment,
title={Training-free Diffusion Model Alignment with Sampling Demons},
author={Po-Hung Yeh and Kuang-Huei Lee and Jun-Cheng Chen},
year={2024},
eprint={2410.05760},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2410.05760},
}</code></pre>
</div>
</section>
<!-- End BibTeX Citation -->
<!-- Footer -->
<footer class="footer">
<div class="container">
<div class="columns is-centered">
<div class="column is-8">
<div class="content">
<p>
This page was built using the <a href="https://github.com/eliahuhorwitz/Academic-project-page-template" target="_blank">Academic Project Page Template</a>, which was adapted from the <a href="https://nerfies.github.io" target="_blank">Nerfies</a> project page.
You are free to borrow the source code of this website; we just ask that you link back to this page in the footer.<br> This website is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/" target="_blank">Creative Commons Attribution-ShareAlike 4.0 International License</a>. Favicon credit: <a href="https://www.flaticon.com/free-icon/smile_13535737" target="_blank">André Luiz Gollo</a>.
The teaser and method overview videos are licensed under <a href="https://creativecommons.org/publicdomain/zero/1.0/" target="_blank">Creative Commons Zero (CC0)</a> (<a href="https://github.com/rareone0602/Demon_page" target="_blank">Video Source Code</a>). You are free to distribute, modify, and use these videos without attribution.
</p>
</div>
</div>
</div>
</div>
</footer>
<!-- End Footer -->
</body>
</html>