Type annotations for early stage compiler functions #1504

davesque · 2019-06-27T03:37:12Z

What I did

Added type annotations for a number of functions encountered during the first stages of invoking the vyper binary. Also did some refactors to facilitate type checking and annotation.

How to verify it

Run the tests. Reviewers might also find it helpful to look at each commit individually instead of the entire branch diff.

Description for the changelog

The API of vyper.compile_codes has been simplified to always return an OrderedDict mapping vyper code file names to their respective compilation results. This means that this function no longer accepts an output_type keyword argument. Users wishing to duplicate the functionality of passing output_type='list' should convert the results of this function to a list like so: list(compile_codes(...).values()).

Cute Animal Picture

Improves readability and passes mypy.

davesque · 2019-06-27T03:55:14Z

Actually, I'll just say that this PR is ready for review. I have some other changes that build on top of this but I don't want people to feel like they have to read through too many commits. I'll make those as another PR.

jacqueswww · 2019-06-27T08:03:11Z

I am afraid you'll have to revert the compile_codes interface work as it's now a solidified compiler interface for tools to use and the change is not backwards compatible.
Looks good for the annotations and replacements with compile_code where we could use that.

cburgdorf · 2019-06-27T08:54:52Z

bin/vyper

+
+def uniq(seq: Iterable[T]) -> List[T]:
+    """
+    Returns unique items in ``seq`` in order.


Nitpick: We use imperative present tense among the rest of the Python libraries and I think this should also follow this. (Same as Python docs and many others)

https://github.com/ethereum/snake-charmers-tactical-manual/blob/master/documentation.md#grammar

Return unique items in seq in order

As a rule of thumb, imperative present tense comes out when it fits in this:

If we call this API it will: __________________________

cburgdorf · 2019-06-27T08:58:21Z

bin/vyper

+T = TypeVar('T')
+
+
+def uniq(seq: Iterable[T]) -> List[T]:


I'm aware this has nothing to do with this PR and should not be addressed here but it would probably be a good idea to not turn the Iterable[T] into a List[T] and instead do one of these things:

Either just yield the items so that we return an Iterable[T] and have consumers deal with conversion if needed
or

Return a Tuple[T, ...] instead of a List[T] to not return something that is mutable.

@cburgdorf I like the yielding suggestion, thanks!

Great! Minor further suggestion. You could change the type annotation from Generator[T, None, None] to Iterable[T] which is basically a subset of a generator and looks a bit cleaner imho. Basically our convention in Trinity is to use Iterator[T] instead of Generator[T, None, None] whenever you are only interested in the yield type but not in the send type or return type (e.g. not using generator.send(something))

davesque · 2019-06-27T13:54:51Z

@jacqueswww Can you give me an example of why the API of compile_codes can't be changed? Vyper is pre-release software and I would assume these kinds of changes are still on the table. It also seems like compile_codes would be used far less often than command-line invocation of the vyper script. Is its use common enough that changing it is difficult?

Practically speaking, having a function return one type or another depending on the input makes working with mypy difficult. Also, having the API work that way makes it harder to remember how the code behaves.

jacqueswww · 2019-06-27T14:29:49Z

@davesque vyper-debug and vyper online off the top of my head. Last time we changed it many things ended up breaking and people popped on gitter that we had to handle on a case by case basis. I agree it's better to do it the above way, however I don't see mypy as worth breaking backwards compatibility change i.e. I don't see it as critical to have OrderedDict vs. having to find all projects that use it this way currently.

fubuloubu · 2019-06-27T14:55:06Z

Can we ignore it for now?

davesque · 2019-06-27T15:01:05Z

@jacqueswww @fubuloubu Perhaps we can raise a deprecation warning if the arg is provided? I don't agree that mypy is not worth making backwards incompatible changes. It helps tremendously with code readability and is a very worthy endeavor, even if there are some growing pains in adopting it.

fubuloubu · 2019-06-27T16:04:50Z

Is there a way to have it support both methods if that is the case?

cburgdorf · 2019-06-27T19:44:20Z

I don't agree that mypy is not worth making backwards incompatible changes. It helps tremendously with code readability and is a very worthy endeavor, even if there are some growing pains in adopting it.

I can't comment on the specific case (other than that functions with variable return types make me sad 😅 ) but I 💯 % agree that adopting mypy is absolutely worth it. In the long run, it make the code base much more robust. When we started adopting mypy in Trinity we've found plenty of small bugs just by adding type hints! Also it makes refactoring code so much simpler and safer. Mypy does catch lots of things that are easy to go unnoticed otherwise.

fubuloubu · 2019-06-27T19:52:18Z

I think it may make sense to adopt the change here, but in a separate PR and perhaps alongside the suggestion to add the vyper-json script that I now realize doesn't have an issue for it.

Those impacted by the change would probably be happy to find that we added a new script to help them with their use case.

jacqueswww · 2019-06-27T20:14:22Z

I think you guys are misunderstanding what I am saying. I am not saying mypy shouldn't be adopted, I am saying that a single use case that is difficult to annotate is, should rather be difficult to annotate than force backwards incompatibility. I see compatibility to outwards facing APIs as critical to the usage of vyper, we can make the switch but will have to be deprecation warning first for b11 and then disabled in b12.

We changed the interface before and it was painful, just trying to learn from previous mistakes.

jacqueswww · 2019-06-27T20:32:06Z

@cburgdorf @davesque

OK (bear with me here) so I am not sure why in this specific case it's a problem and am trying to understand something:
I ask the the function to provide me either of 2 output types: a list or a dict format. From the calling perspective it's very clear what going on, and this is even clearer than a type annotation because it happens at point of calling. Looking over the function again think the only change I would make is make output type a mandatory kwarg parameter. But if you have function that says compile_codes(codes, output_type='dict') or perhaps a function that you pass formatter into render(result, formater=Formatter) Why not expect different types as a result?

Why is it a bad practice to return different types from a function, if you ask for that type at calling time? I can see the argument if the output is not dependent on the call parameters - as you have no idea how to handle a call. But being able to this in python is great, and to me has a lot of utility.

jacqueswww · 2019-06-27T20:54:35Z

Good news: I scanned most implementations I could find calling compile_codes and they don't use list type, so we can probably just #yolo it.

davesque · 2019-06-27T21:41:36Z

@jacqueswww That's convenient if true. Are you remembering that the default result type was list? I ask because I also reviewed a number of call sites for compile_codes in the vyper codebase. Then, I realized that I'd forgotten that the default return type was list and so I had to go back and re-examine everything.

As for the exact reason why it's not convenient, here's a commit that causes mypy to complain (I didn't give an annotation for the interface_codes arg because I'm still trying to figure out the best way to do that): davesque@f36931b .

Here's one of the errors that mypy throws that relates to the return statement in the implementation of compile_code:

vyper/compiler.py:271: error: No overload variant of "__getitem__" of "list" matches argument type "str"
vyper/compiler.py:271: note: Possible overload variants:
vyper/compiler.py:271: note:     def __getitem__(self, int) -> Dict[str, Any]
vyper/compiler.py:271: note:     def __getitem__(self, slice) -> List[Dict[str, Any]]

The problem is that the return type could be a list or a dict, so it has to be a Union of those two types. But then mypy correctly complains that a str value isn't a correct value to use as an index into the result of compile_codes because the result could be a list and a str wouldn't work. So mypy is requiring that every type that is part of that union type will implement a __getitem__ that accepts strings. I think you can fix things like that by adding an isinstance check just above the return statement but it's better to avoid doing that everywhere.

Those sorts of problems will come up for any kind of function that potentially returns different types. I think it's good that mypy complains about stuff like this because it highlights an API that is sort of ambiguous and potentially a source of bugs. A possibly better approach would be to just have two different, appropriately named, functions that return results with the desired structure.

As for functions which take formatters as args, I guess I would try to design such an API so that the formatter only modifies the structure of the result within a single type. So a function which does string templating or something might accept a formatter which could build the result differently, but the result would still be a string either way.

jacqueswww · 2019-06-27T21:52:32Z

Yes so far I can trace most use the compile_code will check tomorrow again (a bit late here and could've checked wrong).

cburgdorf · 2019-06-28T08:12:43Z

I ask the the function to provide me either of 2 output types: a list or a dict format. From the calling perspective it's very clear what going on, and this is even clearer than a type annotation because it happens at point of calling

You are generally right that reading compile_codes(codes, output_type='dict') is clear to a human reader.

However, it isn't clear enough to be useful to perform static analyzing on it. A static analyzer can't tell the difference between compile_codes(codes, output_type='dict') or compile_codes(codes, output_type='dicct').

Sure compile_codes will throw an exception if output_type is not with the expected values but this is a runtime thing. Mypy is all about catching errors during development time.

However, there is a way to have a function return different types based on its inputs which is entirely type safe and statically analyzable. E.g. you can have a function parse_to_unicorn that returns a type T where the actual type behind T changes based on the input parameter parser.

def parse_to_unicorn(raw: bytes, parser: UnicornParser[T]) -> T:
    return parser.parse(raw)

But because this code uses generics, mypy knows exactly the return type of parse_to_unicorn at any time based on the parameters it was called with.

Here's the entire code for that.

from abc import ABC, abstractmethod
from typing import Generic, Tuple, TypeVar


T = TypeVar("T")


class UnicornParser(Generic[T], ABC):
    @abstractmethod
    def parse(self, raw: bytes) -> T:
        ...


class TupleUnicornParser(UnicornParser[Tuple[str, ...]]):
    def parse(self, raw: bytes) -> Tuple[str, ...]:
        return ("u", "n", "i", "c", "o", "r", "n")


class StringUnicornParser(UnicornParser[str]):
    def parse(self, raw: bytes) -> str:
        return "unicorn"


def parse_to_unicorn(raw: bytes, parser: UnicornParser[T]) -> T:
    return parser.parse(raw)


first_unicorn = parse_to_unicorn(b"", TupleUnicornParser())

# mypy knows that `first_unicorn` is of type `Tuple[str, ...]`

print(first_unicorn)  # prints: ('u', 'n', 'i', 'c', 'o', 'r', 'n')

second_unicorn = parse_to_unicorn(b"", StringUnicornParser())

# mypy knows that `second_unicorn` is of type `str`

print(second_unicorn)  # prints: unicorn

# mypy rightfully rejects this with: `Unsupported operand types for + ("Tuple[str, ...]" and "str")
# first_unicorn + second_unicorn

jacqueswww · 2019-06-28T09:49:22Z

@cburgdorf Looking at the above example - does this mean that the only way to return different types in the mypy sphere is by creating extra classes using generics?

How would it work if I wanted to pass a function pointer instead of a class in the above example? ie. it's after all a class with only one function.

cburgdorf · 2019-06-28T10:08:43Z

Looking at the above example - does this mean that the only way to return different types in the mypy sphere is by creating extra classes using generics?

Nope, generics work both for classes and functions. I just did it class based to show case a more complete example.

I'm with you that classes with single functions are generally a smell and are often better to be replaced with standalone functions.

The generics work out the same.

from typing import Callable, Tuple, TypeVar


T = TypeVar("T")


def tuple_unicorn_parser(raw: bytes) -> Tuple[str, ...]:
    return ("u", "n", "i", "c", "o", "r", "n")


def string_unicorn_parser(raw: bytes) -> str:
    return "unicorn"


def parse_to_unicorn(raw: bytes, parser: Callable[[bytes], T]) -> T:
    return parser(raw)


first_unicorn = parse_to_unicorn(b"", tuple_unicorn_parser)

# mypy knows that `first_unicorn` is of type `Tuple[str, ...]`

print(first_unicorn)  # prints: ('u', 'n', 'i', 'c', 'o', 'r', 'n')

second_unicorn = parse_to_unicorn(b"", string_unicorn_parser)

# mypy knows that `second_unicorn` is of type `str`

print(second_unicorn)  # prints: unicorn

# mypy rightfully rejects this with: `Unsupported operand types for + ("Tuple[str, ...]" and "str")
# first_unicorn + second_unicorn

jacqueswww · 2019-06-28T10:37:07Z

Ok that's great :)

davesque · 2019-06-28T18:50:30Z

@jacqueswww @cburgdorf Yeah, I should have mentioned generics in the context of the formatter example.

davesque · 2019-07-01T17:20:42Z

@jacqueswww Just wanted to ping you on this. Still feel like doing a #yolo on it?

jacqueswww · 2019-07-01T17:26:04Z

Yes compile_code output remains the same so we can go ahead.

davesque · 2019-07-02T01:13:06Z

I believe I also included all of @cburgdorf 's suggestions. I'll go ahead and merge this.

davesque added 6 commits June 26, 2019 16:08

Add typing for extract_file_interface_imports

d31da39

Comments/exc messages

e16c6f9

Refactor uniq in vyper bin

aba01cd

Improves readability and passes mypy.

Add typing for exc_handler in vyper bin

87a862c

Add typing for get_interface_codes in vyper bin

0f8da58

Raise more descriptive exception type

c475dff

davesque added the work in progress Work on this PR or issue is not yet complete but reviewers are free to add their input for guidance. label Jun 27, 2019

davesque force-pushed the type-checking branch from b262833 to c531068 Compare June 27, 2019 03:38

davesque removed the work in progress Work on this PR or issue is not yet complete but reviewers are free to add their input for guidance. label Jun 27, 2019

davesque changed the title ~~[WIP] Type annotations for early stage compiler functions~~ Type annotations for early stage compiler functions Jun 27, 2019

davesque requested review from pipermerriam and jacqueswww June 27, 2019 03:55

davesque added 2 commits June 26, 2019 21:57

Simplify API of compile_codes

890d8c2

Lint and whitespace

2403d26

davesque force-pushed the type-checking branch from 216b69a to 2403d26 Compare June 27, 2019 03:58

cburgdorf reviewed Jun 27, 2019

View reviewed changes

Convert uniq to generator

edbff67

davesque force-pushed the type-checking branch from 052e8c6 to edbff67 Compare June 28, 2019 22:52

jacqueswww approved these changes Jul 1, 2019

View reviewed changes

davesque merged commit 18d140f into vyperlang:master Jul 2, 2019

davesque deleted the type-checking branch July 3, 2019 02:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Type annotations for early stage compiler functions #1504

Type annotations for early stage compiler functions #1504

davesque commented Jun 27, 2019

davesque commented Jun 27, 2019 •

edited

Loading

jacqueswww commented Jun 27, 2019 •

edited

Loading

cburgdorf Jun 27, 2019

cburgdorf Jun 27, 2019

davesque Jun 27, 2019

cburgdorf Jun 27, 2019 •

edited

Loading

davesque commented Jun 27, 2019 •

edited

Loading

jacqueswww commented Jun 27, 2019 •

edited

Loading

fubuloubu commented Jun 27, 2019

davesque commented Jun 27, 2019

fubuloubu commented Jun 27, 2019

cburgdorf commented Jun 27, 2019

fubuloubu commented Jun 27, 2019

jacqueswww commented Jun 27, 2019 •

edited

Loading

jacqueswww commented Jun 27, 2019 •

edited

Loading

jacqueswww commented Jun 27, 2019

davesque commented Jun 27, 2019 •

edited

Loading

jacqueswww commented Jun 27, 2019 •

edited

Loading

cburgdorf commented Jun 28, 2019

jacqueswww commented Jun 28, 2019 •

edited

Loading

cburgdorf commented Jun 28, 2019

jacqueswww commented Jun 28, 2019

davesque commented Jun 28, 2019

davesque commented Jul 1, 2019 •

edited

Loading

jacqueswww commented Jul 1, 2019

davesque commented Jul 2, 2019

Type annotations for early stage compiler functions #1504

Type annotations for early stage compiler functions #1504

Conversation

davesque commented Jun 27, 2019

What I did

How to verify it

Description for the changelog

Cute Animal Picture

davesque commented Jun 27, 2019 • edited Loading

jacqueswww commented Jun 27, 2019 • edited Loading

cburgdorf Jun 27, 2019

Choose a reason for hiding this comment

cburgdorf Jun 27, 2019

Choose a reason for hiding this comment

davesque Jun 27, 2019

Choose a reason for hiding this comment

cburgdorf Jun 27, 2019 • edited Loading

Choose a reason for hiding this comment

davesque commented Jun 27, 2019 • edited Loading

jacqueswww commented Jun 27, 2019 • edited Loading

fubuloubu commented Jun 27, 2019

davesque commented Jun 27, 2019

fubuloubu commented Jun 27, 2019

cburgdorf commented Jun 27, 2019

fubuloubu commented Jun 27, 2019

jacqueswww commented Jun 27, 2019 • edited Loading

jacqueswww commented Jun 27, 2019 • edited Loading

jacqueswww commented Jun 27, 2019

davesque commented Jun 27, 2019 • edited Loading

jacqueswww commented Jun 27, 2019 • edited Loading

cburgdorf commented Jun 28, 2019

jacqueswww commented Jun 28, 2019 • edited Loading

cburgdorf commented Jun 28, 2019

jacqueswww commented Jun 28, 2019

davesque commented Jun 28, 2019

davesque commented Jul 1, 2019 • edited Loading

jacqueswww commented Jul 1, 2019

davesque commented Jul 2, 2019

davesque commented Jun 27, 2019 •

edited

Loading

jacqueswww commented Jun 27, 2019 •

edited

Loading

cburgdorf Jun 27, 2019 •

edited

Loading

davesque commented Jun 27, 2019 •

edited

Loading

jacqueswww commented Jun 27, 2019 •

edited

Loading

jacqueswww commented Jun 27, 2019 •

edited

Loading

jacqueswww commented Jun 27, 2019 •

edited

Loading

davesque commented Jun 27, 2019 •

edited

Loading

jacqueswww commented Jun 27, 2019 •

edited

Loading

jacqueswww commented Jun 28, 2019 •

edited

Loading

davesque commented Jul 1, 2019 •

edited

Loading