Fix kv_cache_type issue #2219

qingquansong · 2024-09-11T20:05:01Z

Fix kv_cache_type issue related to #1930

Details are described in #1930 (comment).

Fix kv_cache_type issue

qingquansong · 2024-09-20T23:21:38Z

@Barry-Delaney ^^ would you mind taking a look? This seems to help resolve the issue in #1930. Thank you!

lfr-0531 · 2024-09-21T16:33:28Z

Thanks for the fix. We'll merge your changes into the internal code base.

hchings · 2024-09-27T16:48:19Z

Closing this out as it's been merged.

kaiyux · 2024-09-29T02:38:07Z

Hi @qingquansong, thanks a lot for the contribution! Your changes will be included in the next main branch update, and we'll mark you as co-author.

Please also note that the Python benchmark is not recommended and will soon be deprecated. Please take a look at the ongoing support for a benchmarking suite, as well as the C++ benchmark, for support of the latest features.

> Closing this out as it's been merged.

@hchings To clarify, in case there is any confusion: the changes are merged in the internal repo but have not yet been pushed to the external GitHub repo. For future cases, I would suggest closing the PR only after we have pushed the main branch update that includes the changes. Please let me know if there are any questions, thanks!

Thanks again for your support!

qingquansong · 2024-09-29T02:54:48Z

Sounds great, thank you! Besides benchmarking with the C++ throughput API, I'm also switching to the hlapi and benchmarking it with this perf evaluator script. Is that a suggested one to use? It reports both latency and throughput, which is quite nice, and the only thing I'm modifying is the concurrency part (for which I plan to use Poisson request arrivals). Not sure if you think that would be a good feature to add here.
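
For anyone following along, here is a rough sketch of what Poisson request arrivals could look like. It is only an illustration under my own assumptions, not code from the evaluator script: the function name `poisson_arrival_times` and its parameters are hypothetical, and it only computes send times without issuing any requests.

```python
import random


def poisson_arrival_times(rate_per_sec, num_requests, seed=None):
    """Return cumulative send times (in seconds) for a Poisson arrival process.

    Hypothetical helper for illustration; not part of any TensorRT-LLM API.
    """
    if rate_per_sec <= 0:
        raise ValueError("rate_per_sec must be positive")
    rng = random.Random(seed)
    current_time = 0.0
    send_times = []
    for _ in range(num_requests):
        # Gaps between events of a Poisson process are exponentially
        # distributed with mean 1 / rate_per_sec.
        current_time += rng.expovariate(rate_per_sec)
        send_times.append(current_time)
    return send_times
```

A benchmark loop could sleep until each send time before dispatching the next request, so the average load matches `rate_per_sec` while individual gaps vary.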

kaiyux · 2024-09-30T01:30:13Z

@qingquansong Thanks a lot for your attention to those details! The tests/hlapi/hlapi_evaluator.py script is under the tests directory and is currently used for testing purposes only. I would still suggest trying the benchmarking suite, and feel free to let us know your feedback. We'll consolidate the benchmarking workflows.

DanBlanaru · 2024-09-30T13:38:14Z

Hello @qingquansong. I was in charge of the publishing for this week.
It seems GitHub did not like the formatting as it did not properly credit you.
I apologize, I will credit you in next week's push.

DanBlanaru · 2024-10-08T14:43:19Z

Hello @qingquansong, we credited you in the last push to main. Thank you again for the contribution!

Fix kv_cache_type issue

dd524be

qingquansong mentioned this pull request Sep 11, 2024

failed to load whisper decoder engine with paged kv cache #1930

Closed

4 tasks

lfr-0531 self-assigned this Sep 21, 2024

lfr-0531 added the triaged Issue has been triaged by maintainers label Sep 21, 2024

lfr-0531 added the Merged label Sep 23, 2024

hchings closed this Sep 27, 2024

DanBlanaru mentioned this pull request Sep 30, 2024

Update TensorRT-LLM #2273

Merged

kaiyux mentioned this pull request Nov 1, 2024

Update TensorRT-LLM v0.14.0 #2401

Merged
