TabTransformer CUDA issue #81

Open
duncanmcelfresh opened this issue Dec 1, 2022 · 1 comment
Labels: bug, low-priority

Comments

@duncanmcelfresh
Collaborator

occurs on datasets:

  • openml__Amazon_employee_access__34539
  • openml__PhishingWebsites__14952
  • openml__analcatdata_dmft__3560
  • openml__breast-cancer__145799
  • openml__car__146821
  • openml__connect-4__146195
  • openml__dna__167140
  • openml__kr-vs-kp__3
  • openml__primary-tumor__146032
  • openml__soybean__41
  • openml__splice__45
  • openml__tic-tac-toe__49

traceback:

Traceback (most recent call last):
  File "/home/shared/tabzilla/TabSurvey/tabzilla_experiment.py", line 137, in __call__
    result = cross_validation(model, self.dataset, self.time_limit)
  File "/home/shared/tabzilla/TabSurvey/tabzilla_utils.py", line 236, in cross_validation
    loss_history, val_loss_history = curr_model.fit(
  File "/home/shared/tabzilla/TabSurvey/models/tabtransformer.py", line 120, in fit
    loss.backward()
  File "/opt/conda/envs/torch/lib/python3.10/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/opt/conda/envs/torch/lib/python3.10/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
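
The last line of the traceback suggests CUDA_LAUNCH_BLOCKING=1. A minimal sketch of one way to set it from Python, assuming it has to be in place before the first CUDA call (simplest: before importing torch). The helper name `enable_blocking_launches` is made up for illustration and is not part of TabSurvey or PyTorch:

```python
import os


def enable_blocking_launches(env=None):
    """Set CUDA_LAUNCH_BLOCKING=1 in the given environment mapping.

    With this set, CUDA kernel launches run synchronously, so an error
    is reported at the call that caused it instead of at a later one.
    """
    if env is None:
        env = os.environ
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Assumption: this runs before anything initializes CUDA.
enable_blocking_launches()

# import torch  # import torch only after the variable is set, then rerun the failing fit
```

Setting the variable in the shell that launches the experiment should have the same effect. Synchronous launches are slower, so this is only meant for the debugging run.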
duncanmcelfresh added the bug label Dec 1, 2022
@duncanmcelfresh
Collaborator Author

update - this is a nasty bug. there are a handful of discussions on Stack Exchange and in other GitHub repos trying to diagnose this "CUDA error: invalid configuration argument" error.

this is also an intermittent bug - e.g., it occurs on the datasets listed in the original post, but doesn't occur on many other datasets (e.g., "openml__credit-approval__29" is fine)
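
Since the failure depends on the dataset, one cheap thing to rule out is a zero-sized tensor reaching the model: "invalid configuration argument" is the CUDA error for a kernel launched with invalid grid or block dimensions, and an empty tensor is one commonly reported way to get there. Whether that is what happens here is not established in this thread. A rough sketch of such a check, where the function name and the example shapes (`x_categ`, `x_cont`) are hypothetical and not taken from the TabSurvey code:

```python
def zero_sized(shapes):
    """Return the names whose shape contains a 0-length dimension.

    `shapes` maps a name to a shape tuple. With PyTorch tensors this
    could be built as {name: tuple(t.shape) for name, t in tensors.items()}.
    """
    return [name for name, shape in shapes.items() if 0 in tuple(shape)]


# Hypothetical shapes, for illustration only (not measured on any dataset above).
example = {"x_categ": (128, 9), "x_cont": (128, 0)}
print(zero_sized(example))
```

Running this on the batch shapes for one failing dataset and for "openml__credit-approval__29" would show whether the two differ in this respect.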
