Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow reassignment of function parameters #2128

Closed
MadLittleMods opened this issue Jul 31, 2021 · 10 comments
Closed

Disallow reassignment of function parameters #2128

MadLittleMods opened this issue Jul 31, 2021 · 10 comments
Labels
rule request Adding a new rule

Comments

@MadLittleMods
Copy link

MadLittleMods commented Jul 31, 2021

Rule request

Thesis

Add a lint rule similar to ESlint no-param-reassign for JavaScript which disallows reassignment of function parameters.

Assignment to variables declared as function parameters can be misleading and lead to confusing behavior, as modifying function parameters will also mutate the arguments object. Often, assignment to function parameters is unintended and indicative of a mistake or programmer error.

https://eslint.org/docs/rules/no-param-reassign

Why? Manipulating objects passed in as parameters can cause unwanted variable side effects in the original caller.

https://airbnb.io/javascript/#functions--mutate-params

It's also about the clarity of mutating and reassigning - it's easier to understand functions when the arguments are static and constant throughout the life of the function.

eslint/archive-website#686 (comment)
Related discussion, eslint/eslint#5306

When a function parameter gets re-assigned, it masks the original argument passed in. With a sufficiently long function, it's not obvious that the variable was assigned to something different than what you expect from quickly reading the function signature then jumping down to the relevant piece you're interested in. When this occurs, it's probably more the case, that you accidentally used the same variable name which is why it should be disallowed.

Even if re-assigning the function parameter was intended, it would probably be more clear to assign a new local variable and maybe make a copy to avoid modifying the underlying object:

def example(context):
  new_context = context
  if True:
    new_context = compute_context()
  • shadowing
  • masking
  • overriding
  • overwriting

Reasoning

There are similar violations already in the best practices section but they don't cover the following scenario.


Recently ran into a real-life bug because of accidentally re-assigning a function parameter, https://github.com/matrix-org/synapse/pull/10439/files/a94217ee34840237867d037cf1133f3a9bf6b95a

Simplified reproduction case:

def ensure_correct_context(event, context):
  remote_auth_chain = get_event_auth(event)
  for auth_event in remote_auth_chain:
    # XXX: Violation should trigger here!
    # We are introducing a bug because `context` which corresponds to `event`
    # is being re-assigned to the `context` for the `auth_event`
    context = compute_event_context(auth_event)
    persist_event(auth_event, context)

  # logic for context_needs_updating

  if context_needs_updating:
    return compute_event_context(event)

  return context

Fixed code:

def ensure_correct_context(event, context):
  remote_auth_chain = get_event_auth(event)
  for auth_event in remote_auth_chain:
    auth_event_context = compute_event_context(auth_event)
    persist_event(auth_event, auth_event_context)

  # logic for context_needs_updating

  if context_needs_updating:
    return compute_event_context(event)

  return context

This rule wouldn't protect from a similar scenario where context is a local variable because Python does not have block scope and there is no distinction between declaration and assignment so we can't be protected that way either. I don't think we can solve this scenario with lints unless the linter wants to interpret Python and enforce block-scope with a unique variable name constraint (not suggesting this).

def ensure_correct_context(event):
  context = compute_event_context(event)
  
  remote_auth_chain = get_event_auth(event)
  for auth_event in remote_auth_chain:
    # XXX: Bug is here but no violation would be triggered
    context = compute_event_context(auth_event)
    persist_event(auth_event, context)

  # logic for context_needs_updating

  if context_needs_updating:
    return compute_event_context(event)

  return context

Comparing to JavaScript

How the same code would behave in JavaScript

This isn't trying to claim JavaScript vs Python. I'm just trying to document what I am familiar with and come up with some lints we can use for Python to avoid the same scenarios.

Converting that problem Python code to JavaScript, it works as expected even though the function parameter and variable in the loop are both named context because JavaScript is block-scoped. The const usage can also be enforced with ESLints prefer-const rule.

'use strict';

function get_event_auth() { return [1,2,3]; }
function compute_event_context(event) { return { foo: event } }
function persist_event() {}

function ensure_correct_context(event, context) {
  const remote_auth_chain = get_event_auth(event);
  for (const auth_event of remote_auth_chain) {
    const context = compute_event_context(auth_event);
    persist_event(auth_event, context);
  }

  // logic for context_needs_updating
  const context_needs_updating = false;

  if (context_needs_updating) {
    return compute_event_context(event);
  }

  return context;
}

ensure_correct_context(999, { foo: 999 })
// -> {foo: 999}
// ✅ expected result

If we forget the const, ESLint warns us about re-assigning a function parameter thanks to the ESlint no-param-reassign rule.

(ESLint demo)

'use strict';

function get_event_auth() { return [1,2,3]; }
function compute_event_context(event) { return { foo: event } }
function persist_event() {}

function ensure_correct_context(event, context) {
  const remote_auth_chain = get_event_auth(event);
  for (const auth_event of remote_auth_chain) {
    context = compute_event_context(auth_event);
    persist_event(auth_event, context);
  }

  // logic for context_needs_updating
  const context_needs_updating = false;

  if (context_needs_updating) {
    return compute_event_context(event);
  }

  return context;
}

ensure_correct_context(999, { foo: 999 })
// -> ESLint: 10:5 - Assignment to function parameter 'context'.
// ✅ The linter caught our mistake

If we change the example where the context is already a local variable in the function instead of a function parameter, it still works because of block scope.

'use strict';

function get_event_auth() { return [1,2,3]; }
function compute_event_context(event) { return { foo: event } }
function persist_event() {}

function ensure_correct_context(event) {
  const context = compute_event_context(event);

  const remote_auth_chain = get_event_auth(event);
  for (const auth_event of remote_auth_chain) {
    const context = compute_event_context(auth_event);
    persist_event(auth_event, context);
  }

  // logic for context_needs_updating
  const context_needs_updating = false;

  if (context_needs_updating) {
    return get_new_context_for_event(event);
  }

  return context;
}

ensure_correct_context(999)
// -> {foo: 999}
// ✅ expected result

And finally, if we forget the const, on the second context, we see Uncaught TypeError: Assignment to constant variable. because of the nice const guarantees (can't be reassigned or redeclared).

'use strict';

function get_event_auth() { return [1,2,3]; }
function compute_event_context(event) { return { foo: event } }
function persist_event() {}

function ensure_correct_context(event) {
  const context = compute_event_context(event);

  const remote_auth_chain = get_event_auth(event);
  for (const auth_event of remote_auth_chain) {
    context = compute_event_context(auth_event);
    persist_event(auth_event, context);
  }

  // logic for context_needs_updating
  const context_needs_updating = false;

  if (context_needs_updating) {
    return get_new_context_for_event(event);
  }

  return context;
}

ensure_correct_context(999)
// -> Uncaught TypeError: Assignment to constant variable.
// ✅ The language mechanics caught our mistake
@MadLittleMods MadLittleMods added the rule request Adding a new rule label Jul 31, 2021
@MadLittleMods MadLittleMods changed the title Disallow Reassignment of Function Parameters Disallow reassignment of function parameters Jul 31, 2021
@sobolevn
Copy link
Member

sobolevn commented Jul 31, 2021

Oh wow! Great issue description! Thank you!

I have one specific usecase in mind that we need to cover / explain:

def some(a: Union[List[str], None] = None):
    if a is None:
         a = []

What do you think about it? It is very common.

@MadLittleMods
Copy link
Author

MadLittleMods commented Aug 2, 2021

I looked back at the original ESLint rule proposal and someone asked a similar question, eslint/eslint#1599 (comment) but it was just described as a code smell with no proper code alternative suggestion.

The AirBnB JavaScript style guide also inherits no-param-reassign and looking at their examples, they suggest using the default parameter syntax where possible, https://airbnb.io/javascript/#es6-default-parameters

// really bad
function handleThings(opts) {
  // No! We shouldn’t mutate function arguments.
  // Double bad: if opts is falsy it'll be set to an object which may
  // be what you want but it can introduce subtle bugs.
  opts = opts || {};
  // ...
}

// still bad
function handleThings(opts) {
  if (opts === void 0) {
    opts = {};
  }
  // ...
}

// good
function handleThings(opts = {}) {
  // ...
}

Or assign the parameter to a new local variable, https://airbnb.io/javascript/#functions--reassign-params

// bad
function f1(a) {
  a = 1;
  // ...
}

function f2(a) {
  if (!a) { a = 1; }
  // ...
}

// good
function f3(a) {
  const b = a || 1;
  // ...
}

function f4(a = 1) {
  // ...
}

The first option is that you could use the # noqa: inline ignoring syntax which makes the usage explicit that you actually want to do it.

For your example, it would probably easiest to put the default value in the function signature:

def some(a: Union[List[str], None] = []):
    pass

some() # -> []
some(None) # -> []
some(["a", "b", "c"]) # -> ["a", "b", "c"]

But I realize there is more validation and default fetching code in real-life scenarios. Here is a real example that I've refactored not to violate the rule,

Original: synapse/events/builder.py#L104-L130:

async def build(
    self,
    prev_event_ids: List[str],
    auth_event_ids: Optional[List[str]],
    depth: Optional[int] = None,
) -> EventBase:
    if auth_event_ids is None:
        state_ids = await self._state.get_current_state_ids(
            self.room_id, prev_event_ids
        )
        auth_event_ids = self._event_auth_handler.compute_auth_events(
            self, state_ids
        )

    # other code...

Updated: might be better to name the parameter as input_auth_event_ids, then create a local auth_event_ids variable that has all of the extra logic done to it.

async def build(
    self,
    prev_event_ids: List[str],
    input_auth_event_ids: Optional[List[str]],
    depth: Optional[int] = None,
) -> EventBase:
    auth_event_ids = input_auth_event_ids
    if auth_event_ids is None:
        state_ids = await self._state.get_current_state_ids(
            self.room_id, prev_event_ids
        )
        auth_event_ids = self._event_auth_handler.compute_auth_events(
            self, state_ids
        )

    # other code...

In this case, I'm really wishing I could also use a const declaration from JavaScript so we could further avoid an accidental re-assignment error like I was seeing in the bug that spawned this issue in the first. We kinda just moved the problem from re-assigning the function parameter to a local variable which we can't track. This feels more like a Python language limitation we can't avoid though.

JavaScript example using const authEventIds:

async function build(prevEventIds, inputAuthEventIds, depth) {
    let computedAuthEventIds;
    if (!inputAuthEventIds) {
        state_ids = await _state.get_current_state_ids(
            self.room_id, prev_event_ids
        );
        computedAuthEventIds = _event_auth_handler.compute_auth_events(
            self, state_ids
        );
    }

    const authEventIds = inputAuthEventIds || computedAuthEventIds;

    // other code...
}

This new rule I'm proposing would catch the original bug from the issue description but that's just because we don't need any of this fancy defaulting logic for it.

@MadLittleMods
Copy link
Author

MadLittleMods commented Aug 2, 2021

I've just learned about PEP 591 which adds typing.Final and a @typing.final decorator!

This works great for patching over the last loophole of re-assigning the local variable unexpectedly!

from typing import Final

async def build(
    self,
    prev_event_ids: List[str],
    input_auth_event_ids: Optional[List[str]],
    depth: Optional[int] = None,
) -> EventBase:
    if input_auth_event_ids is None:
        state_ids = await self._state.get_current_state_ids(
            self.room_id, prev_event_ids
        )
        computed_auth_event_ids = self._event_auth_handler.compute_auth_events(
            self, state_ids
        )
   
    auth_event_ids: Final = input_auth_event_ids if input_auth_event_ids is None else computed_auth_event_ids

    # other code...

In terms of using final to protect function parameters themselves, it looks like it doesn't work. So the lint rule proposed by the issue would still be useful!

Final can’t be used in annotations for function arguments:

https://mypy.readthedocs.io/en/stable/final_attrs.html#details-of-using-final

@sobolevn
Copy link
Member

sobolevn commented Aug 3, 2021

For your example, it would probably easiest to put the default value in the function signature:

def some(a: Union[List[str], None] = []):
    pass

some() # -> []
some(None) # -> []
some(["a", "b", "c"]) # -> ["a", "b", "c"]

This is not the best practice, see: https://florimond.dev/en/posts/2018/08/python-mutable-defaults-are-the-source-of-all-evil/

This use-case is the only concern I have for now, we need to figure out how to handle it.

@MadLittleMods
Copy link
Author

MadLittleMods commented Aug 3, 2021

This is not the best practice, see: https://florimond.dev/en/posts/2018/08/python-mutable-defaults-are-the-source-of-all-evil/

Thanks for the great context around this! Such an easy pitfall.

We could do the same parameter rename, inputA, _a (pick your favorite naming scheme)

def some(inputA: Union[List[str], None] = None):
    if inputA is None:
         a = []

There is a related issue where someone wanted to allow re-assigning if it had the default value: eslint/eslint#14189. Maybe we could just allow re-assigning if the default is None since that only allows the mutable scenario to slip through. Can we go further and only allow under single statement if-blocks that do the x is None?

Or as already mentioned, an explicit allow with # noqa: inline ignoring syntax.

def some(a: Union[List[str], None] = None):
    if a is None:
         # noqa: WPS4XX
         a = []

Do you have any preferences/suggestions?

@sobolevn
Copy link
Member

sobolevn commented Aug 3, 2021

We could do the same parameter rename

We can't do it. Here are two example:

  1. With methods, when we subclass things
class NotOurClass(object):
    def method(self, a: Union[List[str], None] = None):
        """default impl"""


class OurClass(NotOurClass):
    def method(self, a: Union[List[str], None] = None):
        if a is None:
            a = []
        # our own impl
  1. With functions that are using named arguments, where name is an important part of the contract

noqa is also not an option here 😞
Because this pattern is really common and it is something you can hardly replace.

@MadLittleMods
Copy link
Author

I'm running out of my known ideas here. Do you see any way out?

Is this one an option?

There is a related issue where someone wanted to allow re-assigning if it had the default value: eslint/eslint#14189. Maybe we could just allow re-assigning if the default is None since that only allows the mutable scenario to slip through. Can we go further and only allow under single statement if-blocks that do the x is None?

@sobolevn
Copy link
Member

sobolevn commented Aug 3, 2021

There are other corner cases:

So, I really like this idea, but I need to think about the possible limitations.
I will try to ask mypy team about Final type on arguments. Maybe this is the way to go.

@MadLittleMods
Copy link
Author

I will try to ask mypy team about Final type on arguments. Maybe this is the way to go.

Sounds great, please link any discussions 🙂

I was curious on why the limitation was there in the first place but all I have found is from the PEP 591 PR on GitHub. I'm not finding the pre discussion in the mailing list about it though (mailing list noob).

Two more things worth mentioning in rejected ideas: [...]

  1. allow Final in function argument annotations

For the second point, it would cause too much confusion with function not mutating the argument.

-- python/peps#990 (comment)

I'm not sure what is meant there 🤔

Related mailing list threads I could find but not what I was after:

@MadLittleMods
Copy link
Author

Created python/mypy#11076 to propose it in on the mypy side. Also found a bit more discussion there around the original limitation.

In mypy, I could find that Final was added in python/mypy#5522 which mentions:

I don't allow Final in function argument types. One argument is simplicity, another is I didn't see many bugs related to shadowing an argument in function bodies, finally people might have quite different expectations for this. If people will ask, this would be easy to implement.

-- python/mypy#5522

In a previous revision of the documentation, it mentioned this example:

  • Final can be only used as an outermost type in assignments, using it in
    any other position is an error. In particular, Final can't be used in
    annotations for function arguments because this may cause confusions about
    what are the guarantees in this case:
: List[Final[int]] = []  # Error!
def fun(x: Final[List[int]]) ->  None:  # Error!

-- https://github.com/python/mypy/pull/5522/files/c9abd9db835240962467a9ee164f4bbb50d56545#diff-3b595f6f83800d78a304cf528ed39aff352c8cd8282edda26bafced79646baad

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rule request Adding a new rule
Projects
None yet
Development

No branches or pull requests

2 participants