[BUG] reserve allocation should be displayed when erroring due to lack of memory on startup #11168

Open
abellina opened this issue Jul 11, 2024 · 0 comments · May be fixed by #11282
Labels: bug, good first issue


abellina commented Jul 11, 2024

Our QA found the following error message while they were debugging something else, and rightly pointed out that it was confusing:

24/02/06 04:02:51 ERROR RapidsExecutorPlugin: Exception in the executor plugin, shutting down! 
java.lang.IllegalArgumentException: The pool allocation of -498.875 MB (calculated from
spark.rapids.memory.gpu.allocFraction (=1.0) and 141.125 MB free memory) was less than the minimum allocation of
20252.015625 (calculated from spark.rapids.memory.gpu.minAllocFraction (=0.25) and 81008.0625 MB total memory)

The reason for the negative number is that we are missing our reserve amount from the message: the reserve is subtracted from free memory before allocFraction is applied. There are other things to clean up here too: the minimum allocation number has no units, and these values should all be reported in MiB, not MB.

We could say:

The pool allocation of -498.875 MiB (gpu.free: 141.125 MiB, spark.rapids.memory.gpu.allocFraction: 1.0,
spark.rapids.memory.gpu.reserve: 640 MiB => (gpu.free - reserve) * allocFraction = -498.875 MiB) was less than the minimum
allocation of 20252.016 MiB (gpu.total: 81008.063 MiB, spark.rapids.memory.gpu.minAllocFraction: 0.25 => gpu.total *
minAllocFraction = 20252.016 MiB). Please ensure that the GPU has enough free memory, or adjust configuration accordingly.

I added some line breaks to the messages above to make them easier to read.
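
For illustration, the check and the proposed message could be sketched roughly as follows. This is not the plugin's actual code: the function name `check_pool_allocation` and its parameters are hypothetical, all sizes are assumed to be passed in MiB, and the only formulas used are the two quoted in the proposed message, `(gpu.free - reserve) * allocFraction` and `gpu.total * minAllocFraction`.

```python
def check_pool_allocation(free_mib, total_mib, reserve_mib,
                          alloc_fraction, min_alloc_fraction):
    """Return the pool size in MiB, or raise with a message that shows every input.

    Hypothetical helper for illustration only; names are assumptions,
    not taken from the plugin.
    """
    # The reserve comes off free memory *before* allocFraction is applied,
    # which is why the pool size can go negative on a nearly full GPU.
    pool_mib = (free_mib - reserve_mib) * alloc_fraction
    min_mib = total_mib * min_alloc_fraction

    if pool_mib < min_mib:
        raise ValueError(
            f"The pool allocation of {pool_mib:.3f} MiB "
            f"(gpu.free: {free_mib:.3f} MiB, "
            f"spark.rapids.memory.gpu.allocFraction: {alloc_fraction}, "
            f"spark.rapids.memory.gpu.reserve: {reserve_mib:g} MiB => "
            f"(gpu.free - reserve) * allocFraction = {pool_mib:.3f} MiB) "
            f"was less than the minimum allocation of {min_mib:.3f} MiB "
            f"(gpu.total: {total_mib:.3f} MiB, "
            f"spark.rapids.memory.gpu.minAllocFraction: {min_alloc_fraction} => "
            f"gpu.total * minAllocFraction = {min_mib:.3f} MiB). "
            "Please ensure that the GPU has enough free memory, "
            "or adjust configuration accordingly."
        )
    return pool_mib


if __name__ == "__main__":
    # Values taken from the error report above.
    try:
        check_pool_allocation(141.125, 81008.0625, 640.0, 1.0, 0.25)
    except ValueError as err:
        print(err)
```

Putting every input (free memory, reserve, and both fractions) next to the formula that uses it lets a reader reproduce the negative number by hand instead of guessing which term is missing.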

abellina added the "bug", "good first issue", and "? - Needs Triage" labels on Jul 11, 2024
mattahrens removed the "? - Needs Triage" label on Jul 16, 2024
kuhushukla self-assigned this on Jul 31, 2024