v0.9.4 - don't repeat KV if possible
Breaking changes
None
New features
cappr.huggingface.classify no longer copies the prompt's cached keys and values (KVs) when broadcasting the prompt to its completions if batch_size=1 or if you pass in a single prompt. Instead, it repeats a view of them. This change saves memory for tasks with many completions. For example, in the Banking 77 demo, peak reserved CUDA memory drops from 13.8 GB to 8.3 GB (~40% decrease), and peak allocated CUDA memory drops from 9.3 GB to 7.7 GB (~17% decrease).
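The difference between copying a tensor and repeating a view of it can be illustrated with a minimal PyTorch sketch. This is not CAPPr's actual implementation: the tensor shapes, the variable names, and the choice of `expand` as the view operation are assumptions made for illustration only.

```python
import torch

# Hypothetical shape for one layer's cached keys for a single prompt:
# (batch=1, num_heads, prompt_len, head_dim). Illustrative only.
num_heads, prompt_len, head_dim = 4, 16, 8
num_completions = 77  # e.g. one row per candidate completion

prompt_keys = torch.randn(1, num_heads, prompt_len, head_dim)

# Copying: repeat() allocates new memory holding num_completions copies.
copied = prompt_keys.repeat(num_completions, 1, 1, 1)

# Viewing: expand() returns a view over the original memory. The batch
# dimension gets stride 0, so every row aliases the same prompt data.
viewed = prompt_keys.expand(num_completions, -1, -1, -1)

print("same values:        ", torch.equal(copied, viewed))
print("view shares memory: ", viewed.data_ptr() == prompt_keys.data_ptr())
print("copy shares memory: ", copied.data_ptr() == prompt_keys.data_ptr())
print("batch stride (view):", viewed.stride(0))
print("batch stride (copy):", copied.stride(0))
```

Both tensors have the same shape and values, but only the copy costs extra memory proportional to the number of completions. Note that a later operation which requires contiguous memory may still materialize a copy of an expanded view.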
Bug fixes
None