Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance regression: Multiple allocations with "--check-bounds=no" on map after 1.9 #50110

Open
dpinol opened this issue Jun 8, 2023 · 1 comment
Labels
performance Must go faster

Comments

@dpinol
Copy link

dpinol commented Jun 8, 2023

This always reports 1 allocation in julia <1.9

h(x) = (2,)
f(v) = @time Base.map(h, v)
f([2])
  0.000003 seconds (1 allocation: 64 bytes)
1-element Vector{Tuple{Int64}}:
 (2,)

But on julia 1.9 or master, when using --check-bounds=no it reports 6 allocations

h(x) = (2,)
f(v) = @time Base.map(h, v)
f([2])
  0.000020 seconds (6 allocations: 288 bytes)
1-element Vector{Tuple{Int64}}:
 (2,)
@oscardssmith
Copy link
Member

This theoretically should be fixed by #50107. That said, see #48245 for why --checkbounds=no is probably a bad idea.

Keno added a commit that referenced this issue Jun 20, 2023
In 1.9, `--check-bounds=no` has started causing significant performance
regressions (e.g. #50110). This is because we switched a number of functions that
used to be `@pure` to new effects-based infrastructure, which very closely tracks
the the legality conditions for concrete evaluation. Unfortunately, disabling
bounds checking completely invalidates all prior legality analysis, so the only
realistic path we have is to completely disable it.

In general, we are learning that these kinds of global make-things-faster-but-unsafe
flags are highly problematic for a language for several reasons:

- Code is written with the assumption of a particular mode being chosen, so
  it is in general not possible or unsafe to compose libraries (which in a language
  like julia is a huge problem).

- Unsafe semantics are often harder for the compiler to reason about, causing
  unexpected performance issues (although the 1.9 --check-bounds=no issues are
  worse than just disabling concrete eval for things that use bounds checking)

In general, I'd like to remove the `--check-bounds=` option entirely (#48245),
but that proposal has encountered major opposition.

This PR implements an alternative proposal: We introduce a new function
`Core.should_check_bounds(boundscheck::Bool) = boundscheck`. This function is
passed the result of `Expr(:boundscheck)` (which is now purely determined by
the inliner based on `@inbounds`, without regard for the command line flag).

In this proposal, what the command line flag does is simply redefine this
function to either `true` or `false` (unconditionally) depending on the
value of the flag.

Of course, this causes massive amounts of recompilation, but I think this can
be addressed by adding logic to loading that loads a pkgimage with appropriate
definitions to cure the invalidations. The special logic we have now now
to take account of the --check-bounds flag in .ji selection, would be replaced
by automatically injecting the special pkgimage as a dependency to every
loaded image. This part isn't implemented in this PR, but I think it's reasonable
to do.

I think with that, the `--check-bounds` flag remains functional, while having
much more well defined behavior, as it relies on the standard world age
mechanisms.

A major benefit of this approach is that it can be scoped appropriately using
overlay tables. For exmaple:

```
julia> using CassetteOverlay

julia> @MethodTable AssumeInboundsTable;

julia> @overlay AssumeInboundsTable Core.should_check_bounds(b::Bool) = false;

julia> assume_inbounds = @overlaypass AssumeInboundsTable

julia> assume_inbounds(f, args...) # f(args...) with bounds checking disabled dynamically
```

Similar logic applies to GPUCompiler, which already supports overlay tables.
Keno added a commit that referenced this issue Jun 20, 2023
In 1.9, `--check-bounds=no` has started causing significant performance
regressions (e.g. #50110). This is because we switched a number of functions that
used to be `@pure` to new effects-based infrastructure, which very closely tracks
the the legality conditions for concrete evaluation. Unfortunately, disabling
bounds checking completely invalidates all prior legality analysis, so the only
realistic path we have is to completely disable it.

In general, we are learning that these kinds of global make-things-faster-but-unsafe
flags are highly problematic for a language for several reasons:

- Code is written with the assumption of a particular mode being chosen, so
  it is in general not possible or unsafe to compose libraries (which in a language
  like julia is a huge problem).

- Unsafe semantics are often harder for the compiler to reason about, causing
  unexpected performance issues (although the 1.9 --check-bounds=no issues are
  worse than just disabling concrete eval for things that use bounds checking)

In general, I'd like to remove the `--check-bounds=` option entirely (#48245),
but that proposal has encountered major opposition.

This PR implements an alternative proposal: We introduce a new function
`Core.should_check_bounds(boundscheck::Bool) = boundscheck`. This function is
passed the result of `Expr(:boundscheck)` (which is now purely determined by
the inliner based on `@inbounds`, without regard for the command line flag).

In this proposal, what the command line flag does is simply redefine this
function to either `true` or `false` (unconditionally) depending on the
value of the flag.

Of course, this causes massive amounts of recompilation, but I think this can
be addressed by adding logic to loading that loads a pkgimage with appropriate
definitions to cure the invalidations. The special logic we have now now
to take account of the --check-bounds flag in .ji selection, would be replaced
by automatically injecting the special pkgimage as a dependency to every
loaded image. This part isn't implemented in this PR, but I think it's reasonable
to do.

I think with that, the `--check-bounds` flag remains functional, while having
much more well defined behavior, as it relies on the standard world age
mechanisms.

A major benefit of this approach is that it can be scoped appropriately using
overlay tables. For exmaple:

```
julia> using CassetteOverlay

julia> @MethodTable AssumeInboundsTable;

julia> @overlay AssumeInboundsTable Core.should_check_bounds(b::Bool) = false;

julia> assume_inbounds = @overlaypass AssumeInboundsTable

julia> assume_inbounds(f, args...) # f(args...) with bounds checking disabled dynamically
```

Similar logic applies to GPUCompiler, which already supports overlay tables.
Keno added a commit that referenced this issue Jul 18, 2023
In 1.9, `--check-bounds=no` has started causing significant performance
regressions (e.g. #50110). This is because we switched a number of functions that
used to be `@pure` to new effects-based infrastructure, which very closely tracks
the the legality conditions for concrete evaluation. Unfortunately, disabling
bounds checking completely invalidates all prior legality analysis, so the only
realistic path we have is to completely disable it.

In general, we are learning that these kinds of global make-things-faster-but-unsafe
flags are highly problematic for a language for several reasons:

- Code is written with the assumption of a particular mode being chosen, so
  it is in general not possible or unsafe to compose libraries (which in a language
  like julia is a huge problem).

- Unsafe semantics are often harder for the compiler to reason about, causing
  unexpected performance issues (although the 1.9 --check-bounds=no issues are
  worse than just disabling concrete eval for things that use bounds checking)

In general, I'd like to remove the `--check-bounds=` option entirely (#48245),
but that proposal has encountered major opposition.

This PR implements an alternative proposal: We introduce a new function
`Core.should_check_bounds(boundscheck::Bool) = boundscheck`. This function is
passed the result of `Expr(:boundscheck)` (which is now purely determined by
the inliner based on `@inbounds`, without regard for the command line flag).

In this proposal, what the command line flag does is simply redefine this
function to either `true` or `false` (unconditionally) depending on the
value of the flag.

Of course, this causes massive amounts of recompilation, but I think this can
be addressed by adding logic to loading that loads a pkgimage with appropriate
definitions to cure the invalidations. The special logic we have now now
to take account of the --check-bounds flag in .ji selection, would be replaced
by automatically injecting the special pkgimage as a dependency to every
loaded image. This part isn't implemented in this PR, but I think it's reasonable
to do.

I think with that, the `--check-bounds` flag remains functional, while having
much more well defined behavior, as it relies on the standard world age
mechanisms.

A major benefit of this approach is that it can be scoped appropriately using
overlay tables. For exmaple:

```
julia> using CassetteOverlay

julia> @MethodTable AssumeInboundsTable;

julia> @overlay AssumeInboundsTable Core.should_check_bounds(b::Bool) = false;

julia> assume_inbounds = @overlaypass AssumeInboundsTable

julia> assume_inbounds(f, args...) # f(args...) with bounds checking disabled dynamically
```

Similar logic applies to GPUCompiler, which already supports overlay tables.
Keno added a commit that referenced this issue Jul 19, 2023
In 1.9, `--check-bounds=no` has started causing significant performance
regressions (e.g. #50110). This is because we switched a number of functions that
used to be `@pure` to new effects-based infrastructure, which very closely tracks
the the legality conditions for concrete evaluation. Unfortunately, disabling
bounds checking completely invalidates all prior legality analysis, so the only
realistic path we have is to completely disable it.

In general, we are learning that these kinds of global make-things-faster-but-unsafe
flags are highly problematic for a language for several reasons:

- Code is written with the assumption of a particular mode being chosen, so
  it is in general not possible or unsafe to compose libraries (which in a language
  like julia is a huge problem).

- Unsafe semantics are often harder for the compiler to reason about, causing
  unexpected performance issues (although the 1.9 --check-bounds=no issues are
  worse than just disabling concrete eval for things that use bounds checking)

In general, I'd like to remove the `--check-bounds=` option entirely (#48245),
but that proposal has encountered major opposition.

This PR implements an alternative proposal: We introduce a new function
`Core.should_check_bounds(boundscheck::Bool) = boundscheck`. This function is
passed the result of `Expr(:boundscheck)` (which is now purely determined by
the inliner based on `@inbounds`, without regard for the command line flag).

In this proposal, what the command line flag does is simply redefine this
function to either `true` or `false` (unconditionally) depending on the
value of the flag.

Of course, this causes massive amounts of recompilation, but I think this can
be addressed by adding logic to loading that loads a pkgimage with appropriate
definitions to cure the invalidations. The special logic we have now now
to take account of the --check-bounds flag in .ji selection, would be replaced
by automatically injecting the special pkgimage as a dependency to every
loaded image. This part isn't implemented in this PR, but I think it's reasonable
to do.

I think with that, the `--check-bounds` flag remains functional, while having
much more well defined behavior, as it relies on the standard world age
mechanisms.

A major benefit of this approach is that it can be scoped appropriately using
overlay tables. For exmaple:

```
julia> using CassetteOverlay

julia> @MethodTable AssumeInboundsTable;

julia> @overlay AssumeInboundsTable Core.should_check_bounds(b::Bool) = false;

julia> assume_inbounds = @overlaypass AssumeInboundsTable

julia> assume_inbounds(f, args...) # f(args...) with bounds checking disabled dynamically
```

Similar logic applies to GPUCompiler, which already supports overlay tables.
@brenhinkeller brenhinkeller added the performance Must go faster label Aug 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

No branches or pull requests

3 participants