<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<meta name="author" content="Matthew Finlayson" />
<title>Obtaining logprobs from an LLM API</title>
<style>
html {
color: #1a1a1a;
background-color: #fdfdfd;
}
body {
margin: 0 auto;
max-width: 36em;
padding-left: 50px;
padding-right: 50px;
padding-top: 50px;
padding-bottom: 50px;
hyphens: auto;
overflow-wrap: break-word;
text-rendering: optimizeLegibility;
font-kerning: normal;
}
@media (max-width: 600px) {
body {
font-size: 0.9em;
padding: 12px;
}
h1 {
font-size: 1.8em;
}
}
@media print {
html {
background-color: white;
}
body {
background-color: transparent;
color: black;
font-size: 12pt;
}
p, h2, h3 {
orphans: 3;
widows: 3;
}
h2, h3, h4 {
page-break-after: avoid;
}
}
p {
margin: 1em 0;
}
a {
color: #1a1a1a;
}
a:visited {
color: #1a1a1a;
}
img {
max-width: 100%;
}
svg {
height: auto;
max-width: 100%;
}
h1, h2, h3, h4, h5, h6 {
margin-top: 1.4em;
}
h5, h6 {
font-size: 1em;
font-style: italic;
}
h6 {
font-weight: normal;
}
ol, ul {
padding-left: 1.7em;
margin-top: 1em;
}
li > ol, li > ul {
margin-top: 0;
}
blockquote {
margin: 1em 0 1em 1.7em;
padding-left: 1em;
border-left: 2px solid #e6e6e6;
color: #606060;
}
code {
font-family: Menlo, Monaco, Consolas, 'Lucida Console', monospace;
font-size: 85%;
margin: 0;
hyphens: manual;
}
pre {
margin: 1em 0;
overflow: auto;
}
pre code {
padding: 0;
overflow: visible;
overflow-wrap: normal;
}
.sourceCode {
background-color: transparent;
overflow: visible;
}
hr {
background-color: #1a1a1a;
border: none;
height: 1px;
margin: 1em 0;
}
table {
margin: 1em 0;
border-collapse: collapse;
width: 100%;
overflow-x: auto;
display: block;
font-variant-numeric: lining-nums tabular-nums;
}
table caption {
margin-bottom: 0.75em;
}
tbody {
margin-top: 0.5em;
border-top: 1px solid #1a1a1a;
border-bottom: 1px solid #1a1a1a;
}
th {
border-top: 1px solid #1a1a1a;
padding: 0.25em 0.5em 0.25em 0.5em;
}
td {
padding: 0.125em 0.5em 0.25em 0.5em;
}
header {
margin-bottom: 4em;
text-align: center;
}
#TOC li {
list-style: none;
}
#TOC ul {
padding-left: 1.3em;
}
#TOC > ul {
padding-left: 0;
}
#TOC a:not(:hover) {
text-decoration: none;
}
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
div.columns{display: flex; gap: min(4vw, 1.5em);}
div.column{flex: auto; overflow-x: auto;}
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
/* The extra [class] is a hack that increases specificity enough to
override a similar rule in reveal.js */
ul.task-list[class]{list-style: none;}
ul.task-list li input[type="checkbox"] {
font-size: inherit;
width: 0.8em;
margin: 0 0.8em 0.2em -1.6em;
vertical-align: middle;
}
</style>
<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
<script
src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml-full.js"
type="text/javascript"></script>
<!--[if lt IE 9]>
<script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
<![endif]-->
</head>
<body>
<header id="title-block-header">
<h1 class="title">Obtaining logprobs from an LLM API</h1>
<p class="author">Matthew Finlayson</p>
</header>
<h1 id="introduction">Introduction</h1>
<p>Many LLM APIs give top-<span class="math inline">\(k\)</span>
logprobs in their outputs. What if we want to obtain <em>all</em> the
logprobs? Here I present two algorithms for obtaining logprobs from an
LLM API, the second a generalization of the first, plus a variant that
gathers only the top-<span class="math inline">\(n\)</span> logprobs.
All of them depend on the API allowing us to add a logit bias to any
token.</p>
<h1 id="full-logprobs-in-v-api-calls">Full logprobs in <span
class="math inline">\(v\)</span> API calls</h1>
<p>Suppose for every call to the API we add a bias term <span
class="math inline">\(b\in\mathbb{R}\)</span> to a token <span
class="math inline">\(i\in\mathbb{N}\)</span>. This means that the
model’s logits <span class="math inline">\(\boldsymbol\ell\)</span> are
modified to <span class="math display">\[\begin{aligned}
\boldsymbol\ell'=(\ell_1,\ell_2,\ldots,\ell_i+b,\ldots,\ell_v).
\end{aligned}\]</span> If we collect the biased output <span
class="math inline">\(\log
p_i'=\log\mathrm{softmax}(\boldsymbol\ell')_i\)</span> for each
token <span class="math inline">\(i\)</span>, we now have <span
class="math display">\[\begin{aligned}
\log p_i'
&=\log\frac{\exp(\ell_i+b)}{\exp(\ell_i+b)+\sum_{j\neq i}\exp
\ell_j} \\
&= \log\frac{\exp\ell_i}{\exp\ell_i+\exp(-b)\sum_{j\neq i}\exp
\ell_j},
\end{aligned}\]</span> which we can exponentiate and rearrange to get
<span class="math display">\[\begin{aligned}
\frac{\exp(-b) p'_i}{1-p'_i} &=
\frac{\exp\ell_i}{\sum_{j\neq i}\exp\ell_j}.
\end{aligned}\]</span> Note that the righthand side is the <em>odds</em>
of the token, therefore we can solve for the unbiased probability <span
class="math inline">\(p_i\)</span> of the token <span
class="math display">\[\begin{aligned}
\frac{p_i}{1-p_i} &= \frac{\exp(-b) p'_i}{1-p'_i} \\
p_i &= \frac{\exp(-b) p'_i}{1-p'_i+\exp(-b) p'_i} \\
\log p_i &= \log p'_i + \log
\frac{\exp(-b)}{1-p'_i+\exp(-b) p'_i} \\
\log p_i &= \log p'_i - \log\left(\exp b -\exp(b + \log
p'_i)+p'_i\right)
\label{eq:prob}
\end{aligned}\]</span> Thus, provided <span
class="math inline">\(b\)</span> is large enough that token <span
class="math inline">\(i\)</span> surfaces among the API's returned top
logprobs, we can obtain the unbiased logprob of any token with exactly
one API call, and hence the full logprob vector in <span
class="math inline">\(v\)</span> calls.</p>
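<p>A minimal Python sketch of this recovery (not from the original
derivation): <code>unbias_logprob</code> implements the formula above,
and <code>query_fn</code> is a hypothetical stand-in for the API
request, assumed to accept a logit-bias dictionary and return the biased
logprob of the biased token.</p>
<pre><code>import math

def unbias_logprob(biased_logprob: float, bias: float) -> float:
    """Invert a logit bias b on a single token:
    log p_i = log p'_i - log(exp b - exp(b + log p'_i) + p'_i)."""
    return biased_logprob - math.log(
        math.exp(bias)
        - math.exp(bias + biased_logprob)
        + math.exp(biased_logprob)
    )

def full_logprobs(query_fn, vocab_size: int, bias: float = 30.0) -> list[float]:
    """Recover the entire logprob vector, one API call per token."""
    return [unbias_logprob(query_fn({i: bias}), bias) for i in range(vocab_size)]</code></pre>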
<h1 id="full-logprobs-in-vk-api-calls">Full logprobs in <span
class="math inline">\(v/k\)</span> API calls</h1>
<p>We can adapt this algorithm to solve for <span
class="math inline">\(k\)</span> logprobs at a time by adding the same
bias <span class="math inline">\(b\)</span> to each of tokens 1 through
<span class="math inline">\(k\)</span> in a single API call. Without
loss of generality, let us solve for tokens 1-<span
class="math inline">\(k\)</span> and assume that <span
class="math inline">\(\exp\ell_1=1\)</span> (since logprobs are
invariant to uniform logit translation). Setting <span
class="math inline">\(\lambda=\exp(-b)\)</span> for convenience, we can
solve for each <span class="math inline">\(\exp\ell_i\)</span> <span
class="math display">\[\begin{aligned}
p_i'
&=\frac{\exp\ell_i}{\sum_{j\in[k]}\exp\ell_j+\lambda\sum_{j\not\in[k]}\exp\ell_j}
\\
\frac{\exp\ell_i}{p_i'} &= \sum_{j\in[k]}\exp\ell_j +
\lambda\sum_{j\not\in[k]}\exp\ell_j
= \frac{\exp\ell_1}{p_1'} = \frac1{p_1'} \\
\exp\ell_i &= \frac{p'_i}{p'_1}
\end{aligned}\]</span> which in turn allows us to solve for <span
class="math inline">\(p_i\)</span> <span
class="math display">\[\begin{aligned}
\frac{\exp\ell_i}{p_i'} &= \sum_{j\in[k]}\exp\ell_j +
\lambda\sum_{j\not\in[k]}\exp\ell_j \\
\frac{\exp\ell_i}{\lambda p_i'} -
\frac1\lambda\sum_{j\in[k]}\exp\ell_j &=
\sum_{j\not\in[k]}\exp\ell_j \\
\frac{\exp\ell_i}{\lambda p_i'} -
\frac1\lambda\sum_{j\in[k]}\exp\ell_j &= \frac{\exp\ell_i}{p_i} -
\sum_{j\in[k]}\exp\ell_j \\
p_i &= \frac{\lambda p'_i\exp\ell_i}{\exp\ell_i +
p'_i(\lambda -1) \sum_{j\in[k]}\exp\ell_j} \\
p_i &= \frac{\lambda p'_i}{1 - \sum_{j\in[k]}p'_j +
\lambda\sum_{j\in[k]}p'_j}
\end{aligned}\]</span> which we can take the log of and obtain <span
class="math display">\[\begin{aligned}
\log p_i &= \log p'_i - \log\left(\left(1 -
\sum_{j\in[k]}p'_j\right)\exp b + \sum_{j\in[k]}p'_j\right).
\label{eq:parallel}
\end{aligned}\]</span></p>
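<p>A minimal Python sketch of this batched recovery, under the
assumption that all <span class="math inline">\(k\)</span> tokens in the
batch received the same bias <span class="math inline">\(b\)</span> in
one call and the API returned their biased logprobs:</p>
<pre><code>import math

def unbias_logprobs_batch(biased_logprobs: list[float], bias: float) -> list[float]:
    """Invert a shared logit bias b on k tokens at once:
    log p_i = log p'_i - log((1 - sum_j p'_j) * exp(b) + sum_j p'_j)."""
    batch_mass = sum(math.exp(lp) for lp in biased_logprobs)  # total biased prob of batch
    correction = math.log((1.0 - batch_mass) * math.exp(bias) + batch_mass)
    return [lp - correction for lp in biased_logprobs]</code></pre>
<p>Covering the whole vocabulary this way takes <span
class="math inline">\(\lceil v/k\rceil\)</span> calls, one per batch of
<span class="math inline">\(k\)</span> tokens.</p>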
<h1 id="top-n-logprobs">Top-<span class="math inline">\(n\)</span>
logprobs</h1>
<p>Since most of the probability is often concentrated in the top-<span
class="math inline">\(n\)</span> tokens, one may wish to gather logprobs
for only these top tokens and skip lower-probability tokens. This can be
done by adding a penalty bias <span class="math inline">\(-b\)</span> to
the tokens whose logprobs are already known, surfacing the
next-highest-probability batch of <span
class="math inline">\(k\)</span> tokens. We can then solve for the
unbiased probabilities of the new batch as follows. First, solve
the biased softmax for <span class="math inline">\(\exp\ell_i\)</span>
<span class="math display">\[\begin{aligned}
p'_i &= \frac{\exp\ell_i}{\sum_{j=i}^v\exp\ell_j +
\exp(-b)\sum_{j=1}^{i-1}\exp\ell_j} & \text{Biased softmax} \\
\exp\ell_i &= p'_i\left(\sum_{j=i}^v\exp\ell_j +
\exp(-b)\sum_{j=1}^{i-1}\exp\ell_j\right) \label{eq:bs}
\end{aligned}\]</span> as well as the unbiased softmax <span
class="math display">\[\begin{aligned}
p_i &= \frac{\exp\ell_i}{\sum_{j=1}^v\exp\ell_j} &
\text{Unbiased softmax} \\
\exp\ell_i &= p_i\left(\sum_{j=1}^v\exp\ell_j\right)
\label{eq:ubs}
\end{aligned}\]</span> which we can combine and simplify using the
assumption, again harmless by shift invariance, that <span
class="math inline">\(\sum_{j=1}^{i-1}\exp\ell_j=1\)</span> <span
class="math display">\[\begin{aligned}
p_i\left(\sum_{j=1}^v\exp\ell_j\right) &=
p'_i\left(\sum_{j=i}^v\exp\ell_j +
\exp(-b)\sum_{j=1}^{i-1}\exp\ell_j\right) \\
p_i\left(1 + \sum_{j=i}^v\exp\ell_j\right) &=
p'_i\left(\sum_{j=i}^v\exp\ell_j + \exp(-b)\right) &
\text{Assume $\sum_{j=1}^{i-1}\exp\ell_j=1$} \label{eq:pi}.
\end{aligned}\]</span> We can now use the fact that we know the sum of
the first <span class="math inline">\(i-1\)</span> probabilities to
solve for <span class="math inline">\(\sum_{j=i}^v\exp\ell_j\)</span>
<span class="math display">\[\begin{aligned}
\sum_{j=1}^{i-1}p_j&=\frac{\sum_{j=1}^{i-1}\exp\ell_j}{\sum_{j=1}^v\exp\ell_j}
& \text{Softmax defn.} \\
&= \frac1{1+\sum_{j=i}^v\exp\ell_j} & \text{Recall
$\sum_{j=1}^{i-1}\exp\ell_j=1$} \\
\sum_{j=i}^v\exp\ell_j &= \frac1{\sum_{j=1}^{i-1}p_j} - 1
\label{eq:sump}
\end{aligned}\]</span> and substitute this into our result from earlier
to solve for <span class="math inline">\(p_i\)</span> <span
class="math display">\[\begin{aligned}
\frac{p_i}{\sum_{j=1}^{i-1}p_j} &=
p'_i\left(\frac1{\sum_{j=1}^{i-1}p_j} - 1 + \exp(-b)\right) \\
p_i &= p'_i\left(1 + (\exp(-b)-1) \sum_{j=1}^{i-1}p_j\right).
\end{aligned}\]</span></p>
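<p>A minimal Python sketch of this update rule, assuming we track the
total unbiased probability mass of the tokens found so far and the API
returns biased probabilities for each newly surfaced batch after the
known tokens are penalized by <span
class="math inline">\(-b\)</span>:</p>
<pre><code>import math

def unbias_new_batch(biased_probs: list[float], known_mass: float, bias: float) -> list[float]:
    """Recover unbiased probabilities for a newly surfaced batch:
    p_i = p'_i * (1 + (exp(-b) - 1) * known_mass), where known_mass is
    the summed unbiased probability of all previously recovered tokens."""
    scale = 1.0 + (math.exp(-bias) - 1.0) * known_mass
    return [p * scale for p in biased_probs]

# Hypothetical driver: repeat, penalizing every known token and
# accumulating known_mass, until it exceeds the desired coverage (e.g. 0.99).</code></pre>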
</body>
</html>