Commit
Set --xla_latency_hiding_scheduler_rerun to 1 (#5736)
Summary: This flag reruns the latency hiding scheduler if the default shared memory limit of 95% leads to OOM. Each rerun uses a limit 0.9x that of the previous run, and the number of reruns is now set to 1. The shared memory limit refers to --xla_tpu_scheduler_percent_shared_memory_limit. A lower shared memory limit means less overlap between communication and computation, and thus worse performance.
Test Plan: Tested on Llama 2 7B on V4-32.
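As a rough illustration of the arithmetic in the summary, the sketch below lists the shared memory limit that would be tried on the first run and on each rerun. It is not the scheduler's implementation: the function name `shared_memory_limits` and its parameters are hypothetical, and only the 95% starting value, the 0.9x factor, and the rerun count of 1 come from the commit message.

```python
def shared_memory_limits(initial_percent=95.0, decay=0.9, reruns=1):
    """Return the limit (in percent) for the first run, then for each rerun.

    Hypothetical helper for illustration only; the defaults mirror the
    values quoted in the commit message.
    """
    limits = [initial_percent]
    for _ in range(reruns):
        # Each rerun tries 0.9x the limit used by the previous attempt.
        limits.append(limits[-1] * decay)
    return limits


if __name__ == "__main__":
    for attempt, limit in enumerate(shared_memory_limits()):
        print(f"attempt {attempt}: shared memory limit {limit:.2f}%")
```

With one rerun, only two limits are ever tried, so an OOM on the first run costs at most one step down in communication and computation overlap.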