
v0.9.4 - don't repeat KV if possible

@kddubey kddubey released this 12 Sep 04:17
· 14 commits to main since this release
a831bd5

Breaking changes

None

New features

  • cappr.huggingface.classify no longer copies the prompt's KVs (cached attention keys and values) when broadcasting the prompt to its completions, provided batch_size=1 or you pass in a single prompt. Instead, it repeats a view of them. This saves memory for tasks with many completions. For example, in the Banking 77 demo, peak reserved CUDA memory drops from 13.8 GB to 8.3 GB (~40% decrease), and peak allocated CUDA memory drops from 9.3 GB to 7.7 GB (~17% decrease).
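
The difference between copying and repeating a view can be sketched in plain PyTorch. This is only an illustration of the general idea, not CAPPr's actual implementation: the tensor shape, the names `prompt_kv`, `copied`, and `viewed`, and the choice of `Tensor.expand` as the view-producing call are all assumptions made for the example.

```python
import torch

# Hypothetical cached keys for ONE prompt, with an assumed layout of
# (batch, heads, prompt_tokens, head_dim). Real caches are much larger.
prompt_kv = torch.randn(1, 4, 16, 8)
num_completions = 77  # e.g. one completion per class

# Copying: allocates num_completions full copies of the prompt's KVs.
copied = prompt_kv.repeat(num_completions, 1, 1, 1)

# Viewing: same logical shape, but no new memory is allocated. The batch
# dimension gets stride 0, so every "row" points at the same data.
viewed = prompt_kv.expand(num_completions, -1, -1, -1)

shares_memory = viewed.data_ptr() == prompt_kv.data_ptr()
copy_is_separate = copied.data_ptr() != prompt_kv.data_ptr()
batch_stride = viewed.stride()[0]

print(copied.shape == viewed.shape)  # both are (77, 4, 16, 8)
print(shares_memory, copy_is_separate, batch_stride)
```

Because an expanded view aliases a single underlying buffer, it should be treated as read-only; writing to it in place would affect every row at once.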

Bug fixes

None