Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mirpasses: make call-argument fixup a MIR pass #818

Merged
merged 4 commits into from
Jul 31, 2023

Conversation

zerbina
Copy link
Collaborator

@zerbina zerbina commented Jul 29, 2023

Summary

Replace the PNode-based fixup for call arguments in ccgcalls with a
MIR pass. This removes a dependency on PNode-based analysis from
cgen, and also makes it possible to, in the future, enable the fixup
for the JS or VM backends.

While the used analysis stays mostly the same, an observable evaluation-
order violation is fixed. Injecting (shallow) copies for values passed
to both immutable and mutable parameters now considers all parameters,
instead of only parameters to the right of immutable ones. For example,
given:

f(a, a) # proc f(x: var T, y: T)

this fixes mutations through x inside f being visible on y.

Details

The analysis used by the MIR pass works mostly the same as the one
previously used in ccgcalls: for each argument value that is not
explicitly passed by-reference, it is analyzed whether:

  • the value is potentially mutated after it is bound to the parameter
    but before the procedure is called
  • the value is potentially also passed to a var parameter
    If either is the case, the argument is shallow-copied to a temporary
    that is then passed to the parameter instead.

The differences compared to the previous analysis are that:

  • checking whether the argument value (or something that potentially
    overlaps with it in memory) is also passed to a mutable parameter
    now also considers preceding parameters. Previously, only the
    following parameters were checked
  • testing for overlapping values considers the whole path now, instead
    of only the root. In effect, this means that for f(a.x, a.y), where
    the second parameter is mutable, no temporary is (unnecessarily)
    injected for the first parameter

For overlap testing, the maybeSameMutableLocation procedure is
introduced, which mirrors the behaviour of dfa.aliases (the routine
used by the previous PNode-based analysis). Since dereferences of
pointer-like values are treated like a normal field access, calls like
f(a[].x, b[].y), where one of the parameters is mutable and a and
b point to the same location, still cause observable evaluation-order
violations.

Finally, the analysis and temporary injection in ccgcalls is removed,
and a test is added for the fixed issue.

Summary
=======

Replace the `PNode`-based fixup for call arguments in `ccgcalls` with a
MIR pass. This removes a dependency on `PNode`-based analysis from
`cgen`, and also makes it possible to, in the future, enable the fixup
when the JS or VM backends are used (both are also affected by the
issue).

While the used analysis stays mostly the same, an observable evaluation-
order violation is fixed. Injecting (shallow) copies for values passed
to both immutable and mutable parameters now considers *all* parameters,
instead of only parameters to the right of immutable ones. For example,
given:
```nim
f(a, a) # proc f(x: var T, y: T)
```
this fixes mutations through `x` inside `f` being visible on `y`.

Details
=======

The analysis used by the MIR pass works mostly the same as the one
previously used in `ccgcalls`: for each argument value that is not
explicitly passed by-reference, it is analyzed whether:
- the value is potentially mutated *after* it is bound to the parameter
  but *before* the procedure is called
- the value is potentially also passed to a `var` parameter
If either is the case, the argument is shallow-copied to a temporary
that is then passed to the parameter instead.

The differences compared to the previous analysis are that:
- checking whether the argument value (or something that potentially
  overlaps with it in memory) is also passed to a mutable parameter
  now also considers preceding parameters. Previously, only the
  following parameters were checked
- testing for overlapping values considers the whole path now, instead
  of only the root. In effect, this means that for `f(a.x, a.y)`, where
  the second parameter is mutable, no temporary is (unnecessarily)
  injected for the first parameter

For overlap testing, the `maybeSameMutableLocation` procedure is
introduced, which mirrors the behaviour of `dfa.aliases` (the routine
used by the previous `PNode`-based analysis). Since dereferences of
pointer-like values are treated like a normal field access, calls like
`f(a[].x, b[].y)`, where one of the parameters is mutable and `a` and
`b` point to the same location, still cause observable evaluation-order
violations.

Finally, the analysis and temporary injection in `ccgcalls` is removed,
and a test is added for the fixed issue.
@zerbina zerbina added bug Something isn't working compiler/backend Related to backend system of the compiler simplification Removal of the old, unused, unnecessary or un/under-specified language features. labels Jul 29, 2023
@zerbina zerbina added this to the MIR phase milestone Jul 29, 2023
@zerbina zerbina requested review from saem and Clyybber July 29, 2023 19:02
## specified range, rather it means that the value *could* be mutated.
var i = start
while i <= last:
# all ``mnkTag`` nodes currently imply some sort of mutation/change
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm increasingly starting to think that generalizing the original mnkModify into mnkTag was a mistake. As the comment mentions, all current value tags represent some form of mutation, but the name mnkTag doesn't really convey that.

I'll think about it some more, but my current opinion is that mnkTag should become mnkMutation (or similar).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless the intention is to generalize across other effects, I agree I think a name indicating mutation/memory makes sense.

Copy link
Collaborator

@saem saem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very cool

## Do note that due to the placement of this pass (it happens after the
## ``injectdestructors`` pass), only *shallow*, non-owning copies of the
## affected arguments are made, meaning that there's the issue of resource-
## like values (refs, seqs, strings, everything else that has a destructor)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm increasingly convinced that destructors, as realized are a misfeature, or one that should be deemphasized. instead we should trigger ops on the containing type, which would be better for data oriented programming and allow for better composition. the op interface might be: =receiveCopy, =takeOwnership, =endOfLife, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does instead we should trigger ops on the containing type mean concretely, since the current hooks are attached to the respective type?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type
  seq[T] = object # for discussion assume this definition
    data: uncheckedArray[T]
    len: int

proc `=takeOwnership`[T](s: seq[T], t: T) = # ... whatever default impl we want

# `seq` is a container and its 'containers ops' are triggered based on a `T`'s lifetime (like current type ops), but call `seq[T]`'s 'container ops'
# this also opens the door on specialization based on the contained type and overriding might be feasible

proc `=takeOwnership`(s: seq[Uri], u: Uri) = # ... specialized impl

I'm not sure about the actual events/hooks we want, but what I'm somewhat certain about is type ops triggered and called on the same type is 'off'. To clarify, Uri could have container ops of its own under this scheme, but they'd apply to the pile of string fields that are in its type definition and not the Uri type itself.


I had separately started thinking about triggered ops on containers before and I was coming at it from a DOD perspective. That further corroborated that type ops on the actual type aren't as useful wrt correctness or saving work, below is a snippet I was using to think about this.

Using the below example, if we specialized the Component types with type ops we'd end up with something "wrong". Those types would start having to know where and how they're stored, plus others couldn't reuse the components and associated systems easily (poor composition).

type
  Position* = object
    x*, y*: int64
  Movement* = object
    dx*, dy*, ddx*, ddy*: float32
  Component* = Position | Movement
  EntityData = object
  Store* = object
    allocator: Allocator
    entities*: seq[EntityData]
    positions*: seq[Position]
    movements*: seq[Movement]
proc `=alloc`[T=Component or EntitData](s: Store, t: typedesc[T])
proc `=dealloc`[T=Component or EntitData](s: Store, a: T)
proc `=transferInto`[T=Component or EntitData](s: Store, a: T)
proc `=copyInto`[T=Component or EntitData](s: Store, a: T)

The last bit I should add is that we effectively do this sort of logic for the stack, the push/pop of activation records, lifting of envs and associated allocs/deallocs, etc. All that is more call stack/container focused, that's what's being bumped, that's the ever present context. If we were actually modelling a call stack, we wouldn't want to put the type ops on each call frame type (that's a lot of object types 😆), we'd want activation record lifecycle events to trigger container ops on the call stack (container).

Anyhow, I hope that makes some sense.

compiler/mir/mirpasses.nim Outdated Show resolved Hide resolved

var i = C(i: 1)
when nimvm: # XXX: doesn't work yet
discard
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this gave me an idea, maybe we should add a doAssertKnownIssue that'll work with inverted behaviour.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it, I've come across multiple cases were that would have been useful.

One bug was that the wrong nodes were compared (`^1` was used instead of
`^i`), and while this doesn't necessarily lead to incorrect behaviour,
it does lead to overlap being detected where there is none.

The second was with the node position for array element comparisons
being wrong.
compiler/mir/mirpasses.nim Outdated Show resolved Hide resolved
Co-authored-by: Clyybber <darkmine956@gmail.com>
@saem
Copy link
Collaborator

saem commented Jul 31, 2023

/merge

@github-actions
Copy link

Merge requested by: @saem

Contents after the first section break of the PR description has been removed and preserved below:


Notes for Reviewers

@chore-runner chore-runner bot added this pull request to the merge queue Jul 31, 2023
Merged via the queue into nim-works:devel with commit df0aaf3 Jul 31, 2023
18 checks passed
@zerbina zerbina deleted the call-argument-fixup-as-mir-pass branch July 31, 2023 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working compiler/backend Related to backend system of the compiler simplification Removal of the old, unused, unnecessary or un/under-specified language features.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants