Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type annotations for early stage compiler functions #1504

Merged
merged 9 commits into from
Jul 2, 2019

Conversation

davesque
Copy link
Contributor

What I did

Added type annotations for a number of functions encountered during the first stages of invoking the vyper binary. Also did some refactors to facilitate type checking and annotation.

How to verify it

Run the tests. Reviewers might also find it helpful to look at each commit individually instead of the entire branch diff.

Description for the changelog

  • The API of vyper.compile_codes has been simplified to always return an OrderedDict mapping vyper code file names to their respective compilation results. This means that this function no longer accepts an output_type keyword argument. Users wishing to duplicate the functionality of passing output_type='list' should convert the results of this function to a list like so: list(compile_codes(...).values()).

Cute Animal Picture

Put a link to a cute animal picture inside the parenthesis-->

@davesque davesque added the work in progress Work on this PR or issue is not yet complete but reviewers are free to add their input for guidance. label Jun 27, 2019
@davesque
Copy link
Contributor Author

davesque commented Jun 27, 2019

Actually, I'll just say that this PR is ready for review. I have some other changes that build on top of this but I don't want people to feel like they have to read through too many commits. I'll make those as another PR.

@davesque davesque removed the work in progress Work on this PR or issue is not yet complete but reviewers are free to add their input for guidance. label Jun 27, 2019
@davesque davesque changed the title [WIP] Type annotations for early stage compiler functions Type annotations for early stage compiler functions Jun 27, 2019
@jacqueswww
Copy link
Contributor

jacqueswww commented Jun 27, 2019

I am afraid you'll have to revert the compile_codes interface work as it's now a solidified compiler interface for tools to use and the change is not backwards compatible.
Looks good for the annotations and replacements with compile_code where we could use that.

bin/vyper Outdated

def uniq(seq: Iterable[T]) -> List[T]:
"""
Returns unique items in ``seq`` in order.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: We use imperative present tense among the rest of the Python libraries and I think this should also follow this. (Same as Python docs and many others)

https://github.com/ethereum/snake-charmers-tactical-manual/blob/master/documentation.md#grammar

Return unique items in seq in order

As a rule of thumb, imperative present tense comes out when it fits in this:

If we call this API it will: __________________________

bin/vyper Outdated
T = TypeVar('T')


def uniq(seq: Iterable[T]) -> List[T]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm aware this has nothing to do with this PR and should not be addressed here but it would probably be a good idea to not turn the Iterable[T] into a List[T] and instead do one of these things:

  1. Either just yield the items so that we return an Iterable[T] and have consumers deal with conversion if needed
    or
  2. Return a Tuple[T, ...] instead of a List[T] to not return something that is mutable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cburgdorf I like the yielding suggestion, thanks!

Copy link
Contributor

@cburgdorf cburgdorf Jun 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Minor further suggestion. You could change the type annotation from Generator[T, None, None] to Iterable[T] which is basically a subset of a generator and looks a bit cleaner imho. Basically our convention in Trinity is to use Iterator[T] instead of Generator[T, None, None] whenever you are only interested in the yield type but not in the send type or return type (e.g. not using generator.send(something))

@davesque
Copy link
Contributor Author

davesque commented Jun 27, 2019

@jacqueswww Can you give me an example of why the API of compile_codes can't be changed? Vyper is pre-release software and I would assume these kinds of changes are still on the table. It also seems like compile_codes would be used far less often than command-line invocation of the vyper script. Is its use common enough that changing it is difficult?

Practically speaking, having a function return one type or another depending on the input makes working with mypy difficult. Also, having the API work that way makes it harder to remember how the code behaves.

@jacqueswww
Copy link
Contributor

jacqueswww commented Jun 27, 2019

@davesque vyper-debug and vyper online off the top of my head. Last time we changed it many things ended up breaking and people popped on gitter that we had to handle on a case by case basis. I agree it's better to do it the above way, however I don't see mypy as worth breaking backwards compatibility change i.e. I don't see it as critical to have OrderedDict vs. having to find all projects that use it this way currently.

@fubuloubu
Copy link
Member

Can we ignore it for now?

@davesque
Copy link
Contributor Author

@jacqueswww @fubuloubu Perhaps we can raise a deprecation warning if the arg is provided? I don't agree that mypy is not worth making backwards incompatible changes. It helps tremendously with code readability and is a very worthy endeavor, even if there are some growing pains in adopting it.

@fubuloubu
Copy link
Member

Is there a way to have it support both methods if that is the case?

@cburgdorf
Copy link
Contributor

I don't agree that mypy is not worth making backwards incompatible changes. It helps tremendously with code readability and is a very worthy endeavor, even if there are some growing pains in adopting it.

I can't comment on the specific case (other than that functions with variable return types make me sad 😅 ) but I 💯 % agree that adopting mypy is absolutely worth it. In the long run, it make the code base much more robust. When we started adopting mypy in Trinity we've found plenty of small bugs just by adding type hints! Also it makes refactoring code so much simpler and safer. Mypy does catch lots of things that are easy to go unnoticed otherwise.

@fubuloubu
Copy link
Member

I think it may make sense to adopt the change here, but in a separate PR and perhaps alongside the suggestion to add the vyper-json script that I now realize doesn't have an issue for it.

Those impacted by the change would probably be happy to find that we added a new script to help them with their use case.

@jacqueswww
Copy link
Contributor

jacqueswww commented Jun 27, 2019

I think you guys are misunderstanding what I am saying. I am not saying mypy shouldn't be adopted, I am saying that a single use case that is difficult to annotate is, should rather be difficult to annotate than force backwards incompatibility. I see compatibility to outwards facing APIs as critical to the usage of vyper, we can make the switch but will have to be deprecation warning first for b11 and then disabled in b12.

We changed the interface before and it was painful, just trying to learn from previous mistakes.

@jacqueswww
Copy link
Contributor

jacqueswww commented Jun 27, 2019

@cburgdorf @davesque

OK (bear with me here) so I am not sure why in this specific case it's a problem and am trying to understand something:
I ask the the function to provide me either of 2 output types: a list or a dict format. From the calling perspective it's very clear what going on, and this is even clearer than a type annotation because it happens at point of calling. Looking over the function again think the only change I would make is make output type a mandatory kwarg parameter. But if you have function that says compile_codes(codes, output_type='dict') or perhaps a function that you pass formatter into render(result, formater=Formatter) Why not expect different types as a result?

Why is it a bad practice to return different types from a function, if you ask for that type at calling time? I can see the argument if the output is not dependent on the call parameters - as you have no idea how to handle a call. But being able to this in python is great, and to me has a lot of utility.

@jacqueswww
Copy link
Contributor

Good news: I scanned most implementations I could find calling compile_codes and they don't use list type, so we can probably just #yolo it.

@davesque
Copy link
Contributor Author

davesque commented Jun 27, 2019

@jacqueswww That's convenient if true. Are you remembering that the default result type was list? I ask because I also reviewed a number of call sites for compile_codes in the vyper codebase. Then, I realized that I'd forgotten that the default return type was list and so I had to go back and re-examine everything.

As for the exact reason why it's not convenient, here's a commit that causes mypy to complain (I didn't give an annotation for the interface_codes arg because I'm still trying to figure out the best way to do that): davesque@f36931b .

Here's one of the errors that mypy throws that relates to the return statement in the implementation of compile_code:

vyper/compiler.py:271: error: No overload variant of "__getitem__" of "list" matches argument type "str"
vyper/compiler.py:271: note: Possible overload variants:
vyper/compiler.py:271: note:     def __getitem__(self, int) -> Dict[str, Any]
vyper/compiler.py:271: note:     def __getitem__(self, slice) -> List[Dict[str, Any]]

The problem is that the return type could be a list or a dict, so it has to be a Union of those two types. But then mypy correctly complains that a str value isn't a correct value to use as an index into the result of compile_codes because the result could be a list and a str wouldn't work. So mypy is requiring that every type that is part of that union type will implement a __getitem__ that accepts strings. I think you can fix things like that by adding an isinstance check just above the return statement but it's better to avoid doing that everywhere.

Those sorts of problems will come up for any kind of function that potentially returns different types. I think it's good that mypy complains about stuff like this because it highlights an API that is sort of ambiguous and potentially a source of bugs. A possibly better approach would be to just have two different, appropriately named, functions that return results with the desired structure.

As for functions which take formatters as args, I guess I would try to design such an API so that the formatter only modifies the structure of the result within a single type. So a function which does string templating or something might accept a formatter which could build the result differently, but the result would still be a string either way.

@jacqueswww
Copy link
Contributor

jacqueswww commented Jun 27, 2019

Yes so far I can trace most use the compile_code will check tomorrow again (a bit late here and could've checked wrong).

@cburgdorf
Copy link
Contributor

I ask the the function to provide me either of 2 output types: a list or a dict format. From the calling perspective it's very clear what going on, and this is even clearer than a type annotation because it happens at point of calling

You are generally right that reading compile_codes(codes, output_type='dict') is clear to a human reader.

However, it isn't clear enough to be useful to perform static analyzing on it. A static analyzer can't tell the difference between compile_codes(codes, output_type='dict') or compile_codes(codes, output_type='dicct').

Sure compile_codes will throw an exception if output_type is not with the expected values but this is a runtime thing. Mypy is all about catching errors during development time.

However, there is a way to have a function return different types based on its inputs which is entirely type safe and statically analyzable. E.g. you can have a function parse_to_unicorn that returns a type T where the actual type behind T changes based on the input parameter parser.

def parse_to_unicorn(raw: bytes, parser: UnicornParser[T]) -> T:
    return parser.parse(raw)

But because this code uses generics, mypy knows exactly the return type of parse_to_unicorn at any time based on the parameters it was called with.

Here's the entire code for that.

from abc import ABC, abstractmethod
from typing import Generic, Tuple, TypeVar


T = TypeVar("T")


class UnicornParser(Generic[T], ABC):
    @abstractmethod
    def parse(self, raw: bytes) -> T:
        ...


class TupleUnicornParser(UnicornParser[Tuple[str, ...]]):
    def parse(self, raw: bytes) -> Tuple[str, ...]:
        return ("u", "n", "i", "c", "o", "r", "n")


class StringUnicornParser(UnicornParser[str]):
    def parse(self, raw: bytes) -> str:
        return "unicorn"


def parse_to_unicorn(raw: bytes, parser: UnicornParser[T]) -> T:
    return parser.parse(raw)


first_unicorn = parse_to_unicorn(b"", TupleUnicornParser())

# mypy knows that `first_unicorn` is of type `Tuple[str, ...]`

print(first_unicorn)  # prints: ('u', 'n', 'i', 'c', 'o', 'r', 'n')

second_unicorn = parse_to_unicorn(b"", StringUnicornParser())

# mypy knows that `second_unicorn` is of type `str`

print(second_unicorn)  # prints: unicorn

# mypy rightfully rejects this with: `Unsupported operand types for + ("Tuple[str, ...]" and "str")
# first_unicorn + second_unicorn

@jacqueswww
Copy link
Contributor

jacqueswww commented Jun 28, 2019

@cburgdorf Looking at the above example - does this mean that the only way to return different types in the mypy sphere is by creating extra classes using generics?

How would it work if I wanted to pass a function pointer instead of a class in the above example? ie. it's after all a class with only one function.

@cburgdorf
Copy link
Contributor

Looking at the above example - does this mean that the only way to return different types in the mypy sphere is by creating extra classes using generics?

Nope, generics work both for classes and functions. I just did it class based to show case a more complete example.

I'm with you that classes with single functions are generally a smell and are often better to be replaced with standalone functions.

The generics work out the same.

from typing import Callable, Tuple, TypeVar


T = TypeVar("T")


def tuple_unicorn_parser(raw: bytes) -> Tuple[str, ...]:
    return ("u", "n", "i", "c", "o", "r", "n")


def string_unicorn_parser(raw: bytes) -> str:
    return "unicorn"


def parse_to_unicorn(raw: bytes, parser: Callable[[bytes], T]) -> T:
    return parser(raw)


first_unicorn = parse_to_unicorn(b"", tuple_unicorn_parser)

# mypy knows that `first_unicorn` is of type `Tuple[str, ...]`

print(first_unicorn)  # prints: ('u', 'n', 'i', 'c', 'o', 'r', 'n')

second_unicorn = parse_to_unicorn(b"", string_unicorn_parser)

# mypy knows that `second_unicorn` is of type `str`

print(second_unicorn)  # prints: unicorn

# mypy rightfully rejects this with: `Unsupported operand types for + ("Tuple[str, ...]" and "str")
# first_unicorn + second_unicorn

@jacqueswww
Copy link
Contributor

Ok that's great :)

@davesque
Copy link
Contributor Author

@jacqueswww @cburgdorf Yeah, I should have mentioned generics in the context of the formatter example.

@davesque
Copy link
Contributor Author

davesque commented Jul 1, 2019

@jacqueswww Just wanted to ping you on this. Still feel like doing a #yolo on it?

@jacqueswww
Copy link
Contributor

Yes compile_code output remains the same so we can go ahead.

@davesque
Copy link
Contributor Author

davesque commented Jul 2, 2019

I believe I also included all of @cburgdorf 's suggestions. I'll go ahead and merge this.

@davesque davesque merged commit 18d140f into vyperlang:master Jul 2, 2019
@davesque davesque deleted the type-checking branch July 3, 2019 02:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants