
Set --xla_latency_hiding_scheduler_rerun to 1 #5736

Merged
4 commits merged into master on Oct 26, 2023

Conversation

alanwaketan
Collaborator

Summary:
This flag reruns the latency hiding scheduler if the default
shared memory limit of 95% leads to OOM. Each rerun uses a limit
0.9x that of the previous run, and the number of reruns is currently set to 1.
The shared memory limit refers to --xla_tpu_scheduler_percent_shared_memory_limit.
A lower shared memory limit means less overlap between communication and
computation, and thus worse performance.

Test Plan:
Tested on Llama 2 7B on V4-32.

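For illustration, a minimal sketch (not XLA's implementation) of the rerun schedule described in the summary, assuming the limit simply shrinks by 0.9x on each rerun:

def scheduler_memory_limits(initial_percent=95.0, reruns=1):
    # The default limit is 95%; each rerun multiplies the previous limit by 0.9.
    limits = [initial_percent]
    for _ in range(reruns):
        limits.append(limits[-1] * 0.9)
    return limits

print(scheduler_memory_limits())  # [95.0, 85.5]: one rerun retries at 85.5%

With the rerun count set to 1, a compilation that OOMs at the 95% limit gets exactly one retry at 85.5%.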
alanwaketan self-assigned this Oct 25, 2023
@JackCaoG
Collaborator

LGTM, let's wait for the manual run to pass, then merge this PR.

# A lower shared memory limit means less overlap between communication and
# computation, and thus worse performance.
flags = _set_missing_flags(flags,
                           (('xla_latency_hiding_scheduler_rerun', '1'),))
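For context, a plausible sketch of what a helper like _set_missing_flags might look like; the actual torch_xla implementation may differ. The assumption here is that it appends a default only when the user has not already set the flag:

def _set_missing_flags(flags, sets):
    # flags: list of existing '--name=value' strings; sets: (name, default) pairs.
    for name, default in sets:
        if not any(f.split('=', 1)[0].lstrip('-') == name for f in flags):
            flags.append('--{}={}'.format(name, default))
    return flags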
Collaborator

Nice! I was actually wondering about this: we set XLA_FLAGS here, but in other cases we pass the flags through LIBTPU_INIT_ARGS. Do you know if there's a difference? If you saw the appropriate log output from the test, it seems both work...

Collaborator

Oh right, sorry I missed this. For everything in compiler/xla/xla.proto we use XLA_FLAGS; xla_latency_hiding_scheduler_rerun is one of those TPU-specific flags that needs to be passed in with LIBTPU_INIT_ARGS.
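To make the split concrete, a hedged sketch; --xla_dump_to stands in for any xla.proto flag here, and both variables must be set before the XLA runtime initializes:

import os

# Flags defined in compiler/xla/xla.proto are routed through XLA_FLAGS:
os.environ['XLA_FLAGS'] = '--xla_dump_to=/tmp/xla_dump'
# TPU-specific libtpu flags are routed through LIBTPU_INIT_ARGS:
os.environ['LIBTPU_INIT_ARGS'] = '--xla_latency_hiding_scheduler_rerun=1'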

Collaborator Author

Good catch. I missed this lol.

Collaborator

Ah, makes sense that LIBTPU_INIT_ARGS would be TPU-specific lol, thanks @JackCaoG. Is there a rule of thumb to tell which flag goes where? I'm thinking in terms of hashing the compilation environment; I suppose we'll just need to ensure both env vars are included.
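A hypothetical sketch of that idea (compile_env_fingerprint is illustrative, not an existing torch_xla API): include both flag-carrying variables when fingerprinting the compilation environment.

import hashlib
import os

def compile_env_fingerprint():
    # Both env vars can affect the compiled program, so both feed the hash.
    parts = (os.environ.get('XLA_FLAGS', ''), os.environ.get('LIBTPU_INIT_ARGS', ''))
    return hashlib.sha256('|'.join(parts).encode()).hexdigest()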

@jonb377 (Collaborator) left a comment

LGTM, thanks Jiewen!

@alanwaketan
Collaborator Author

Thanks for the review, Jon.

alanwaketan merged commit 47a33d0 into master on Oct 26, 2023
18 checks passed
jonb377 pushed a commit that referenced this pull request Oct 31, 2023
mbzomowski pushed a commit to mbzomowski-test-org/xla that referenced this pull request Nov 16, 2023
ManfeiBai pushed a commit that referenced this pull request Nov 29, 2023
ManfeiBai pushed a commit that referenced this pull request Nov 29, 2023
chunnienc pushed a commit to chunnienc/xla that referenced this pull request Dec 14, 2023
golechwierowicz pushed a commit that referenced this pull request Jan 12, 2024
bhavya01 pushed a commit that referenced this pull request Apr 22, 2024