Thanks for your interesting work. I believe the project provides new theoretical analysis of, and insights into, speculative decoding.
I would like to ask a question about the draft/target memory ratio. The paper states that "the draft models can occupy up to 38∼140% memory footprint of target models", but I could not find any equation related to this. How did you analyze it theoretically? Could you provide the specific equation?