Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime::Printer has a scratch field that is ~never used #6469

Open
steven-johnson opened this issue Dec 6, 2021 · 7 comments
Open

runtime::Printer has a scratch field that is ~never used #6469

steven-johnson opened this issue Dec 6, 2021 · 7 comments

Comments

@steven-johnson
Copy link
Contributor

The runtime Printer has a scratch field that (presumably) is meant to avoid using malloc() for small print jobs, but in practice, it appears to ~never be used:

  • The length of scratch is (theoretically) a template parameter, which defaults to 1024
  • ...but scratch is actually declared as char scratch[length <= 256 ? length : 1];, so any length of > 256 becomes just 1 (which in practice means "always use malloc")
  • The places in runtime that override the default are:
    • profiler.cpp (which happens to also end up passing 1024)
    • tracing.cpp (which passes 4096)
    • opencl.cpp (16384)
    • d3d12compute.cpp (64k)
    • d3d12compute.cpp (16)
    • d3d12compute.cpp (256)

So basically, there are only two places (in d3d12compute) that are ever using this field.

The whole thing looks wonky to me -- my guess is that length was a straight passthrough in the past, but got limited this way due to stack-size issues on some architecture, and the ramifications never thought thru.

Anyway, this is an ugly wart that should really be resolved somehow, either by removing scratch entirely (since it's rarely used currently), or by deciding that a 1024-byte sized stack scratch is OK after all, or by deciding that 256-byte scratch is adequate for most things, or... something else.

(Note that d3d12compute.cpp has a StackPrinter class that attempts to make use of this, but the class is unused and should be removed.)

@shoaibkamil
Copy link
Contributor

Good catch, yikes! I definitely did not understand this was the behavior.

Pinging @slomp to chime in about the one in d3d12compute.cpp. I think we wanted to avoid malloc() and the bigger one is intentional.

@steven-johnson
Copy link
Contributor Author

Good catch, yikes! I definitely did not understand this was the behavior.

The API is really a bit too clever for its own good

@slomp
Copy link
Contributor

slomp commented Dec 6, 2021

When I implemented StackPrinter, it was to avoid malloc since I needed larger buffers for tracing / error dumps. The class had it's own internal char buffer [length] that was passed to Printer<type, length> to "override" the internal malloc().

What seems to have happened is that @abadams did some improvements on Halide's Printer and decided to retire my StackPrinter (this is the commit for reference 2e5309a)

@slomp
Copy link
Contributor

slomp commented Dec 6, 2021

Good catch, yikes! I definitely did not understand this was the behavior.

Pinging @slomp to chime in about the one in d3d12compute.cpp. I think we wanted to avoid malloc() and the bigger one is intentional.

Yup, malloc'ing on the 64k case is fine (it's on a debug path, just to dump shader code). Everywhere else, stack allocation is preferred, since it's small and scoped (very short lived).

@steven-johnson
Copy link
Contributor Author

Everywhere else, stack allocation is preferred, since it's small and scoped (very short lived).

Right -- and the point is that that only happens currently in those two cases (16 and 256)

@slomp
Copy link
Contributor

slomp commented Dec 6, 2021

Removing StackPrinter here: https://github.com/halide/Halide/pull/6470/files

As for the regular Printer, I say we just change the default template parameter from 1024 to 256.

(Although, to be honest, I kind of prefer to also have an explicit StackPrinter available since that has a precise behavior).

@abadams
Copy link
Member

abadams commented Dec 6, 2021

IIRC I had to remove the large stack allocations because it was causing failures to even compile the runtime in some circumstances.

steven-johnson added a commit that referenced this issue Dec 6, 2021
Instead of trying to optimize every Printer instance to use stack (and failing), move the StackPrinter concept into printer.h directly and require opt-in at the point of compilation to use stack instead of malloc.

This PR also does a few other drive-by cleanups:
- Ensures that all Printer ctors are explicit
- Makes some template aliases to make using (e.g.) ErrorPrinter with a custom buffer size slightly cleaner syntax
- Have tracing use the `.str()` method, which already deals with MSAN internally
- Make all the Printer data members private
- Fix some evil  code in opencl.cpp that previously used the now-private data members
steven-johnson added a commit that referenced this issue Dec 8, 2021
Instead of trying to optimize every Printer instance to use stack (and failing), move the StackPrinter concept into printer.h directly and require opt-in at the point of compilation to use stack instead of malloc.

This PR also does a few other drive-by cleanups:
- Ensures that all Printer ctors are explicit
- Makes some template aliases to make using (e.g.) ErrorPrinter with a custom buffer size slightly cleaner syntax
- Have tracing use the `.str()` method, which already deals with MSAN internally
- Make all the Printer data members private
- Fix some evil  code in opencl.cpp that previously used the now-private data members
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants