
Enable fx_graph_cache in gpt-fast example #2935

Merged
6 commits merged on Feb 10, 2024

Conversation

@mreso (Collaborator) commented Feb 9, 2024

Description

This PR enables fx_graph_cache in the gpt-fast example to speed up compile time.
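For context, the Inductor FX graph cache can be turned on either through an environment variable or a config flag set before torch.compile runs. Which mechanism this PR's handler actually uses is determined by the diff itself; the snippet below is only a minimal sketch of the two common ways to enable the cache, assuming PyTorch >= 2.2 (where torch._inductor.config.fx_graph_cache is available):

```python
import os

# Enabling the Inductor FX graph cache lets warm starts reuse compiled
# graphs from disk instead of recompiling, cutting torch.compile time.

# Option 1: environment variable, read by Inductor when torch is imported.
os.environ["TORCHINDUCTOR_FX_GRAPH_CACHE"] = "1"

# Option 2 (equivalent): set the config flag directly after importing torch.
# import torch._inductor.config as inductor_config
# inductor_config.fx_graph_cache = True
```

With the cache enabled, the first compile still pays the full cost; subsequent worker restarts that hit the cache skip most of the Inductor compilation, which is what makes it attractive for a serving example like gpt-fast.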

Fixes #(issue)

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • [x] pytest test/pytest/test_example_gpt_fast.py -k test_handler
===================================================================================================================== test session starts =====================================================================================================================
platform linux -- Python 3.10.12, pytest-7.3.1, pluggy-1.3.0
rootdir: /home/ubuntu/serve
plugins: mock-3.10.0, cov-4.1.0
collected 5 items / 3 deselected / 2 selected

test/pytest/test_example_gpt_fast.py ..                                                                                                                                                                                                                 [100%]

====================================================================================================================== warnings summary =======================================================================================================================
test/pytest/test_example_gpt_fast.py::test_handler[false]
  /home/ubuntu/serve/ts/torch_handler/base_handler.py:13: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    from pkg_resources import packaging

test/pytest/test_example_gpt_fast.py::test_handler[false]
test/pytest/test_example_gpt_fast.py::test_handler[false]
  /home/ubuntu/miniconda3/envs/serve/lib/python3.10/site-packages/pkg_resources/__init__.py:2871: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('zope')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
    declare_namespace(pkg)

test/pytest/test_example_gpt_fast.py::test_handler[false]
test/pytest/test_example_gpt_fast.py::test_handler[true]
test/pytest/test_example_gpt_fast.py::test_handler[true]
test/pytest/test_example_gpt_fast.py::test_handler[true]
  /home/ubuntu/miniconda3/envs/serve/lib/python3.10/site-packages/torch/backends/cuda/__init__.py:321: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=================================================================================================== 2 passed, 3 deselected, 7 warnings in 79.21s (0:01:19) ====================================================================================================
  • [x] pytest test/pytest/test_example_gpt_fast.py -k test_gpt_fast_mar[mar_file_path0]
===================================================================================================================== test session starts =====================================================================================================================
platform linux -- Python 3.10.12, pytest-7.3.1, pluggy-1.3.0
rootdir: /home/ubuntu/serve
plugins: mock-3.10.0, cov-4.1.0
collected 5 items / 4 deselected / 1 selected

test/pytest/test_example_gpt_fast.py 2024-02-09T20:28:14,022 [INFO ] W-29500-gpt_fast_handler_1.0 TS_METRICS - ts_inference_latency_microseconds.Microseconds:6.3719938016E7|#model_name:gpt_fast_handler,model_version:default|#hostname:ip-172-31-15-101,timestamp:1707510494
2024-02-09T20:28:14,022 [INFO ] W-29500-gpt_fast_handler_1.0 TS_METRICS - ts_queue_latency_microseconds.Microseconds:144.753|#model_name:gpt_fast_handler,model_version:default|#hostname:ip-172-31-15-101,timestamp:1707510494
2024-02-09T20:28:14,022 [INFO ] W-29500-gpt_fast_handler_1.0-stdout MODEL_METRICS - HandlerTime.ms:63717.68|#ModelName:gpt_fast_handler,Level:Model|#hostname:ip-172-31-15-101,requestID:69282a5f-3532-497e-863f-4824e81e505a,timestamp:1707510494
.                                                                                                                                                                                                                  [100%]

========================================================================================================= 1 passed, 4 deselected in 75.21s (0:01:15) ==========================================================================================================

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@mreso mreso marked this pull request as ready for review February 9, 2024 21:42
@mreso mreso requested review from agunapal and lxning February 9, 2024 21:42
@agunapal (Collaborator) left a comment


LGTM.
Can you please add a section in the README highlighting this example working with multiple GPUs?
The README talks about it with TP, but it's not very obvious.

@mreso mreso enabled auto-merge February 10, 2024 01:10
@mreso mreso added this pull request to the merge queue Feb 10, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Feb 10, 2024
@msaroufim msaroufim self-requested a review February 10, 2024 04:48
@msaroufim msaroufim added this pull request to the merge queue Feb 10, 2024
Merged via the queue into master with commit e6654ec Feb 10, 2024
15 checks passed
@chauhang chauhang added this to the v0.10.0 milestone Feb 27, 2024