
Fix chunked prefill #2201

Merged
4 commits merged into InternLM:main on Aug 1, 2024

Conversation

lzhangzz (Collaborator) commented on Jul 31, 2024:

  • Add max_prefill_token_num, decoupled from max_context_token_num (see the config sketch below)
    • max_prefill_token_num (>= max_batch_size): maximum number of prefill tokens processed in a forward pass
    • max_context_token_num (>= session_len): maximum number of context tokens (including history) held in a forward pass
  • Fix a data race in chunked prefill

Memory consumption is greatly reduced for large session_len.
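
For context, a minimal sketch of how these limits might be set from lmdeploy's Python API. This is not code from the PR: the exact field names and defaults of TurbomindEngineConfig (max_batch_size, session_len, max_prefill_token_num) should be verified against the lmdeploy version in use, and max_context_token_num is assumed to remain an engine-side setting derived from session_len.

```python
# Hedged sketch (not from this PR): capping prefill work per forward pass
# so that prefill memory is bounded by max_prefill_token_num rather than
# growing with session_len. Verify field names against your lmdeploy version.
from lmdeploy import TurbomindEngineConfig, pipeline

engine_config = TurbomindEngineConfig(
    max_batch_size=64,        # concurrent sequences per forward pass
    session_len=65536,        # max tokens per session (history + new input)
    # Upper bound on prefill tokens in one forward pass; must be
    # >= max_batch_size. Long prompts are chunked to fit this budget.
    max_prefill_token_num=8192,
)

# Hypothetical usage: model id and prompt are placeholders.
pipe = pipeline('internlm/internlm2-chat-7b', backend_config=engine_config)
print(pipe(['Summarize chunked prefill in one sentence.']))
```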

lvhan028 requested review from lvhan028 and irexyc on Jul 31, 2024
lvhan028 added the "enhancement" (New feature or request) label on Jul 31, 2024
lvhan028 merged commit ddb462b into InternLM:main on Aug 1, 2024
9 checks passed
Labels: enhancement (New feature or request)
3 participants