🐛 Bug
Applying the following patch dumps the GPU peak memory usage of a given benchmark:
diff --git a/benchmarks/experiment_runner.py b/benchmarks/experiment_runner.py
index 9f55e3f02..a71cc20a2 100644
--- a/benchmarks/experiment_runner.py
+++ b/benchmarks/experiment_runner.py
@@ -202,6 +202,7 @@ class ExperimentRunner:
input_tensor):
tracing_time = None
total_time_start = time.perf_counter()
+ torch.cuda.reset_peak_memory_stats()
# Invoke iteration function and measure tracing time w/o waiting on the
# result.
if benchmark_experiment.xla:
@@ -210,7 +211,8 @@ class ExperimentRunner:
input_tensor, collect_full_output=self._args.collect_full_output)
if benchmark_experiment.xla:
tracing_time = time.perf_counter() - t_trace_start
-
+ print("> Max MEM (GB):", torch.cuda.max_memory_allocated() / 10**9)
# Mark step.
self._mark_step(benchmark_experiment)
total_time = time.perf_counter() - total_time_start
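For reference, the same measurement pattern as a standalone sketch (not the actual harness code; measure_peak and step_fn are made-up names for illustration):

import torch

def measure_peak(step_fn, *args, **kwargs):
    # Clear the allocator's running peak so each call reports only its own usage.
    torch.cuda.reset_peak_memory_stats()
    out = step_fn(*args, **kwargs)
    # Wait for all queued kernels so the peak reflects the full step.
    torch.cuda.synchronize()
    print("> Max MEM (GB):", torch.cuda.max_memory_allocated() / 10**9)
    return out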
Running the opacus_cifar10 benchmark with the following command (see below) makes the memory leak explicit: the amount of allocated memory keeps growing with each iteration.
# This should run 4 (2x2) training iterations.
python xla/benchmarks/experiment_runner.py \
--suite-name torchbench --accelerator cuda --xla None --dynamo inductor --test train \
--repeat 2 --iterations-per-run 2 \
--no-resume --print-subprocess \
-k opacus_cifar10
> Max MEM (GB): 3.061852672
> Max MEM (GB): 5.957279232
> Max MEM (GB): 8.821041152
> Max MEM (GB): 11.683754496
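Quick arithmetic on the peaks above, just to make the growth rate explicit (numbers copied from the log):

peaks = [3.061852672, 5.957279232, 8.821041152, 11.683754496]
deltas = [b - a for a, b in zip(peaks, peaks[1:])]
print([round(d, 2) for d in deltas])  # [2.9, 2.86, 2.86] -> ~2.9 GB of extra memory per iteration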
Expected behavior
After the first training iteration, memory usage stays roughly constant.
I think this measurement only works for inductor (I did try it with PyTorch/XLA, but it didn't work). That's probably because only the native CUDA caching allocator is instrumented to gather this information (guess).
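If someone wants to track this on the PyTorch/XLA side, one possible (unverified) alternative is the device memory query sketched below; I'm assuming the installed torch_xla exposes xm.get_memory_info() and that it reports kb_free/kb_total (field names may differ across versions):

import torch_xla.core.xla_model as xm

def xla_used_gb():
    # Query the XLA device's memory counters instead of the CUDA caching allocator.
    device = xm.xla_device()
    info = xm.get_memory_info(device)  # assumed to return {'kb_free': ..., 'kb_total': ...}
    return (info["kb_total"] - info["kb_free"]) / 10**6  # KB -> GB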
Environment
cc @miladm @JackCaoG