
Use benchmark_cls for checking precision. #6375

Merged · 1 commit · Jan 25, 2024

Conversation

@ysiraichi (Collaborator) commented Jan 24, 2024

This PR makes it so we don't have to call load_benchmark just to check which precision to use.
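
For illustration, a minimal sketch of the idea, assuming the precision defaults live as class attributes on the benchmark class (the attribute names follow the discussion below; the helper name is hypothetical):

    # Sketch: read the class-level precision defaults directly off the
    # benchmark class, instead of instantiating it (which loads the model).
    benchmark_cls = self._load_benchmark_cls()  # hypothetical helper name

    train_precision = getattr(
        benchmark_cls, "DEFAULT_TRAIN_CUDA_PRECISION", None)
    eval_precision = getattr(
        benchmark_cls, "DEFAULT_EVAL_CUDA_PRECISION", None)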

cc @miladm @JackCaoG

@zpcore (Collaborator) commented Jan 24, 2024

Refer to this issue for context: #6286. Thanks for making the fix.

The key point, I think, is to prevent leaving behind a dangling object that has, e.g., moved a model to the XLA device. del benchmark doesn't resolve the issue because the object has already claimed the PJRT runtime. A later initialization then triggers the stack-dump error: RuntimeError: Bad StatusOr access: UNKNOWN: TPU initialization failed: open(/dev/vfio/0): Device or resource busy: Device or resource busy; Couldn't open iommu group /dev/vfio/0.
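
To illustrate the failure mode (a hedged sketch; the constructor signature follows the snippet below, and the initialization side effects are as described above):

    # Instantiating the benchmark moves the model to the XLA device, which
    # initializes the PJRT runtime and claims the TPU for this process.
    benchmark = benchmark_cls(test="eval", device="xla", batch_size=1)

    # Deleting the Python object does not release the runtime: the TPU has
    # already been claimed, so a later initialization (e.g., from a
    # subprocess) fails with "Device or resource busy".
    del benchmark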

@zpcore zpcore requested a review from will-cromar January 24, 2024 21:29
@zpcore (Collaborator) commented Jan 24, 2024

Since we only need to detect the precision, we can fetch the information directly without invoking:

    benchmark_cls(
        test=self.benchmark_experiment.test,
        device=device,
        batch_size=self.benchmark_experiment.batch_size,
    )

I think we can call a load_benchmark_precision helper like the following, instead of load_benchmark, to get the precision directly.

    def load_benchmark_precision(self):
      # Requires `import importlib` at module level.
      try:
        module = importlib.import_module(
            f"torchbenchmark.models.{self.model_name}")
      except ModuleNotFoundError:
        module = importlib.import_module(
            f"torchbenchmark.models.fb.{self.model_name}")
      benchmark_train_precision = getattr(
          module.Model, "DEFAULT_TRAIN_CUDA_PRECISION", None)
      benchmark_eval_precision = getattr(
          module.Model, "DEFAULT_EVAL_CUDA_PRECISION", None)
      return benchmark_train_precision, benchmark_eval_precision
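
A hedged usage sketch, assuming benchmark_experiment.test is either "train" or "eval" as in the constructor call above:

    train_precision, eval_precision = self.load_benchmark_precision()
    # Pick the default that matches the experiment's test mode.
    precision = (
        train_precision
        if self.benchmark_experiment.test == "train" else eval_precision)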

WDYT?

@ysiraichi (Collaborator, Author) commented:

Right. Correct me if I'm misunderstanding things, but isn't that exactly what I'm doing here?

@zpcore (Collaborator) commented Jan 25, 2024

> Right. Correct me if I'm misunderstanding things, but isn't that exactly what I'm doing here?

Hah, you are right. I didn't notice that you called benchmark_cls instead.

Now it LGTM!

@zpcore zpcore self-requested a review January 25, 2024 17:41
@zpcore zpcore merged commit a1e51e4 into master Jan 25, 2024
18 checks passed
@lezcano lezcano changed the title from "Use benchmark_cls for checking precision.`" to "Use benchmark_cls for checking precision." Feb 5, 2024
bhavya01 pushed a commit that referenced this pull request Apr 22, 2024
Successfully merging this pull request may close these issues.

benchmarks/torchbench_model: some benchmarks fail to load and kill experiment_runner's main process