
[107] Use Python 3 type annotations #1665

Open
wants to merge 5 commits into master

Conversation

seertaak

seertaak commented Nov 9, 2020

NB. This is a WIP. I am creating this PR to get feedback relatively early.

Issue 107 calls for the use of PEP 484 type annotations in lieu of Pythran's comment directives as a means to convey function signatures to the Pythran compiler. This PR offers an experimental implementation of these annotations.

@serge-sans-paille
Owner

Thanks for triggering that discussion. I think that @paugier will have an opinion on this.
Some early thoughts: does a function signature imply a pythran export? How would we handle overloads? It would be nice to have a few test cases too.

@paugier
Contributor

paugier commented Nov 9, 2020

First naive questions: @seertaak did you have a look at Transonic and at how one can use Pythran with Python 3 type annotations through it?

With your proposal, how would you define array types? In strings as in def fun(a: "float[]")?

It would be nice to present examples of what your proposal improves and how things would be written.

A comment: multi-signatures are a bit more precise than annotations as soon as one uses more than one overload. See for example the difference between:

export fun(int, int)
export fun(float, float)

and

export fun(int, int)
export fun(float, float)
export fun(int, float)
export fun(float, int)

In this case it seems anecdotal, but when there are more arguments it can make a huge difference in terms of compilation time. One needs to be clear about how annotations are converted to signatures.

To express the former in Transonic one can write

T = Union[int, float]
def fun(a: T, b: T): ...

In contrast,

def fun(a: Union[int, float], b: Union[int, float]): ...
# or
def fun(a: "int or float", b: "int or float"): ...

should correspond to the latter.
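To make the conversion question concrete, here is a minimal sketch in plain Python (not Pythran code; expand_naive is a hypothetical helper) of how expanding per-argument Union annotations independently yields the full Cartesian product of signatures, which is exactly the four-export case above:

from itertools import product
from typing import Union, get_args, get_origin

def expand_naive(annotations):
    """Expand each argument's Union independently: the Cartesian product."""
    choices = []
    for ann in annotations:
        if get_origin(ann) is Union:
            choices.append(get_args(ann))
        else:
            choices.append((ann,))
    return list(product(*choices))

# fun(a: Union[int, float], b: Union[int, float]) expands to 4 signatures:
print(expand_naive([Union[int, float], Union[int, float]]))
# [(int, int), (int, float), (float, int), (float, float)]
# whereas T = Union[int, float]; fun(a: T, b: T) is meant to yield only
# the "diagonal": (int, int) and (float, float).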

@seertaak
Author

seertaak commented Nov 9, 2020

PR Goals

  1. Try to implement annotations "conservatively", with minimum disruption. Concretely, pythran directives are still supported; PEP 484 annotations are optional.
  2. If a function is annotated but has no pythran directive, it is not exported. Just like before.
  3. If a function is annotated and has a pythran directive, verify that the argument types match (throw error if not).
  4. If a function foo is annotated and has an "empty" pythran directive (e.g. #pythran export foo), then export the function using the signature gleaned from the type annotations.

30,000ft View

  1. Right at the start of processing - close to where the specs are parsed - we use a custom analysis which extracts annotations from FunctionDecl AST objects and converts them into the spec format (a rough sketch follows this list).
  2. Next, we attempt to reconcile the specs obtained through comment directives and those obtained through type annotations, as described above.
  3. In the middle end, a transformation pass is added to remove all type annotations. This prevents the constant folding pass from failing (it recursively traverses the type annotations and fails to find the names associated with the meta types).
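A hedged sketch of what step 1 might look like, written against the standard ast module rather than Pythran's internal FunctionDecl and spec classes (the function name and the tuple-of-tuples spec shape are assumptions made for illustration; ast.unparse needs Python 3.9+):

import ast

def specs_from_annotations(source):
    """Collect, per function, the tuple of annotated argument types (as text).

    Spirit only: the real pass works on Pythran's own AST and spec objects.
    """
    specs = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            args = tuple(
                ast.unparse(a.annotation) for a in node.args.args
                if a.annotation is not None
            )
            specs[node.name] = (args,)  # a single overload per function for now
    return specs

print(specs_from_annotations("def foo(xs: str):\n    return xs"))
# {'foo': (('str',),)}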

Support/Limitations

  • no support for overloading currently. However, it should be straightforward to implement now that I understand the model is basically C++'s model. (I thought it was HM under the hood which doesn't support overloads IIUC.) To fix this I would need to amend the analysis step (1) above to maintain overload sets rather than single function signatures. The reconciliation would then compare both sets of signatures rather than looking at individual ones.
  • currently the return type annotation (-> int) is ignored. We should verify that the type matches the one inferred by HM.
  • array isn't supported currently. I wasn't sure whether that meant a C array, std::array, or something different, which is why I didn't implement it. It should be straightforward to add.
  • no test cases; these are definitely needed. If you have thoughts on how to create really evil input, let me know.

Misuse: checking

#pythran export foo(int)
def foo(xs: str):
    return xs
# Error produced: "E: Type annotations don't match pythran spec directive for 'foo': expected ((<class 'int'>,),); got ((<class 'str'>,),)."

#pythran export bar
def bar(xs: ndarray[int, :, uint32]):
    return xs
# Error produced:  (Still needs some work e.g. replacing list for uint32.)
# Exception: Invalid type argument to template type ndarray; expected slice or int literal, got: uint32 of type <class 'type'>.                                                                                      
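For illustration only, the check behind the first error could boil down to a tuple comparison; this is a hedged sketch, not the PR's code (PythranSpecError and check_signatures are hypothetical names):

class PythranSpecError(Exception):
    """Stand-in for whatever exception Pythran's front end should raise."""

def check_signatures(name, directive_sig, annotation_sig):
    """Fail loudly when the directive's argument types disagree with the annotations."""
    if directive_sig != annotation_sig:
        raise PythranSpecError(
            "Type annotations don't match pythran spec directive for '{}': "
            "expected {}; got {}.".format(name, directive_sig, annotation_sig)
        )

try:
    check_signatures("foo", ((int,),), ((str,),))
except PythranSpecError as err:
    print("E:", err)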

Primitive Types

Every primitive type is supported except float128 and complex256; for those I got an undefined-name error in numpy, and I'm not sure why. For example:

def times_two(x: int):
    return 2*x

def twice(x: str):
    return 2*x

def blah(z: float32):
    return 2.0*z

Compound Types

Here's what's supported:

import numpy as np

#pythran export foo
def foo(xs: list[int]) -> list[int]:
    return [2*x for x in xs]

#pythran export bar
def bar(strs: set[str]) -> bool:
    return 'hello' in strs

#pythran export baz
def baz(x: tuple[int, str]) -> str:
    return str(x[0]) + x[1]

#pythran export bingo
def bingo(M: ndarray[float32, :, 2]) -> ndarray[float32, :]:
    print(len(M))
    x = np.zeros(len(M))
    for i in range(len(M)):
        x[i] = sum(m_i for m_i in M[i, :])
    return x
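As a usage note (assuming the snippet above is saved as example.py and compiles; the missing ndarray/float32 imports are discussed just below), one would build it with the pythran command line and then call the exported functions from plain Python, roughly like this:

# after running: pythran example.py   (builds a native extension module)
import numpy as np
import example  # hypothetical module name for the snippet above

print(example.foo([1, 2, 3]))                      # [2, 4, 6]
print(example.bar({"hello", "world"}))             # True
print(example.bingo(np.ones((3, 2), np.float32)))  # row sums: [2. 2. 2.]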

Design Questions

  1. Should the #pythran export... directive still be required? I have some code which removes the need for the directive altogether. When I shared my PR I decided to be conservative; I thought there might be a preference for the directives to stay, and I didn't want to rock the boat too much. My personal preference is for the directives to be optional. I would like two decorators: @private and @export, and I would prefer the default to be export rather than making the function private - just like Python. The compiler should complain when type annotations are missing on exported functions.
  2. How should overloading be handled? I suggest using Python's @overload decorator.
  3. The introduction of the Annotated type in Python 3.9 means that, going forward, generic type parameters can be values. This is of obvious relevance in the context of generating C++ (std::array or other fixed-size containers). My view is that we should avoid inventing new syntax at the parser level. Any syntax Pythran uses should be compatible with plain Python, and should feel like plain Python. I believe my design follows this idea. But I should stress that I'm not particularly wedded to the syntax I'm using!
  4. I think some discussion is perhaps warranted on the type system front. Pythran is an interesting hybrid between a more ML-style approach and, of course, C++. I'm not sure I have the correct intuition for the type system yet, I'm afraid. The way types are conveyed - using specs - could use some work. For example, there's no mechanism to express the return type, although it's quite natural to want to express one when using type annotations.
  5. As an aside, I think the project may benefit from a more unified approach to error handling. It would be nice if a pass could output an error and have the location/file etc. printed automatically. I tried to do my best here, but I wasn't quite sure what exception to throw, and didn't want to get too bogged down.
  6. Final aside: the PR history is currently a bit of a mess due to a git mishap. I'll fix that.

@paugier
Contributor

paugier commented Nov 9, 2020

What is the difference with what we can currently do with Transonic-Pythran? (I'm just curious)

@seertaak
Author

seertaak commented Nov 9, 2020

First, Transonic looks really interesting, congrats on that project. It goes without saying that I find Pythran quite delightful. Superficially, the difference is that I eschew the string syntax you cite in favor of a more "standard" (but perhaps longer-winded) notation reminiscent of both Python and C++. Also, I'm submitting this PR for Pythran; I'm not sure if code is shared between Pythran and Transonic...

What is the difference with what we can currently do with Transonic-Pythran? (I'm just curious)

Apart from C (?) arrays, my implementation supports the gamut of types supported by the spec syntax itself. Hopefully this answers your question. Or maybe you are wondering whether the typing is really recursive? (Can you make a list of lists? Yes.)

FYI: I have of course thought about extending the type system of Pythran to include records (MLsub with special accommodation for sized arrays a la Futhark - could be cool). But as mentioned before, I was trying to keep this PR focused.

With your proposal, how would you define array types? In strings as in def fun(a: "float[]")?

The following would work:

def fun(xs: ndarray[float, :]): ...
# next, how I would do arrays
def fun(arr: array[float, 10]): ...  # fixed length, stack-allocated
def fun(arr: array[float, :]): ...   # in effect std::dynarray

@serge-sans-paille
Owner

Thanks for the detailed answers!

One quick note before I dive more in the details:

If a function foo is annotated and has an "empty" pythran directive (e.g. #pythran export foo), then export the function using the signature gleaned from the type annotations.

 #pythran export foo

already means you're exporting a global variable named foo. To avoid conflict with existing notation, I suggest

 #pythran export foo(...)

I'm obviously open to alternatives 👍

And again, thanks for triggering this delightful discussion ;-)

@seertaak
Author

To avoid conflict with existing notation, I suggest

 #pythran export foo(...)

Thanks for pointing this out. I'm happy to use your proposed syntax.

Are you ok with the use of the @overload decorator? Even if we don't use other decorators (like @export), I will need some way to signal, syntactically, that a function is overloaded. I think decorators are the most natural approach, but I'm open to other suggestions.
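For reference, here is how an overload set is already spelled with standard typing.overload; this is a plain-Python sketch of the syntactic signal being discussed, not something Pythran currently consumes:

from typing import Union, overload

@overload
def fun(a: int, b: int) -> int: ...
@overload
def fun(a: float, b: float) -> float: ...

def fun(a: Union[int, float], b: Union[int, float]):
    # single runtime implementation; the decorated stubs only declare signatures
    return a + b

print(fun(1, 2), fun(1.5, 2.5))  # 3 4.0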

To manage expectations, the next slab of time I'll have will be this weekend.

@serge-sans-paille
Owner

I'm ok with @overload but that can fit in another PR to keep this review focused and simple. Some more thoughts:

def bingo(M: ndarray[float32, :, 2]) -> ndarray[float32, :]:

Without the proper imports, this line is no longer valid: you'll get unbound identifier errors for ndarray, float32, etc. You should probably use the values from pythran.typing.* directly. Plus, that may simplify part of your code.
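A hedged sketch of what that suggestion might look like for the bingo example, assuming pythran.typing exposes an NDArray type subscriptable this way (the exact names and the handling of fixed-size dimensions are assumptions, not verified against the PR):

import numpy as np
# assumption: NDArray is importable from pythran.typing and accepts this indexing
from pythran.typing import NDArray

def bingo(M: NDArray[np.float32, :, :]) -> NDArray[np.float32, :]:
    # how this annotation turns into an export is exactly what the PR debates
    x = np.zeros(len(M), dtype=np.float32)
    for i in range(len(M)):
        x[i] = sum(m_i for m_i in M[i, :])
    return x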

@paugier
Contributor

paugier commented Nov 11, 2020

You should probably use the values from pythran.typing.* directly.

But then you need Pythran to run the code even when it's not compiled, don't you?

@serge-sans-paille
Owner

serge-sans-paille commented Nov 11, 2020 via email

@seertaak
Author

seertaak commented Nov 13, 2020

Questions About Implementation

@serge-sans-paille if I understand correctly, what you would like is that:

  1. Instead of parsing the type annotations, the user is required to add a from pythran.typing import * (or similar).
  2. Next, the code that reads the type annotations looks for top-level imports, and evaluates the type annotation AST in an environment that recreates the module imports.
  3. Next, those types are used to generate the specs used for the rest of compilation.

Please correct me where I'm wrong 😄 I actually thought of this approach myself, but had already begun work on the parsing method I used and didn't want to get bogged down. However, what I like about the approach I think you are suggesting is that it keeps you honest: by evaluating the type annotation using Python itself, you ensure that the module can be parsed and evaluated using only Python (and, of course, the Pythran module it refers to - see next point!).
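A hedged sketch of step 2: evaluating each annotation expression in a namespace rebuilt from the module's top-level imports (the helper name is hypothetical, and only from-imports are handled, for brevity):

import ast
import importlib

def annotation_types(source):
    """Map function name -> tuple of evaluated argument annotations."""
    tree = ast.parse(source)
    env = {}
    for node in tree.body:  # rebuild the import environment
        if isinstance(node, ast.ImportFrom) and node.module:
            mod = importlib.import_module(node.module)
            for alias in node.names:
                if alias.name == "*":
                    env.update(vars(mod))
                else:
                    env[alias.asname or alias.name] = getattr(mod, alias.name)
    out = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            out[node.name] = tuple(
                eval(compile(ast.fix_missing_locations(ast.Expression(a.annotation)),
                             "<annotation>", "eval"), env)
                for a in node.args.args if a.annotation is not None
            )
    return out

print(annotation_types("from typing import List\ndef fun(xs: List[int]): ..."))
# {'fun': (typing.List[int],)}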

Type Annotations Implies Pythran Dependency

But then you need Pythran to run the code even when it's not compiled, don't you?

You would need to import the relevant Pythran module, yes, but the alternative embodied by my current changes is that you need to compile the module using Pythran. If you really don't want to require Pythran at all, then there are two options:

  1. For types like list and set you could get away with using Python's own generic types (see the small sketch after this list). The semantics of Pythran would be that these Python lists are converted to Pythran lists, etc. This obviously wouldn't work for NDArray, which is probably the type that most of Pythran's users want in the first place.
  2. You drop type annotations and proceed with the older comment directives.
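A minimal sketch of that first option: annotate with the standard typing generics so the file stays importable and runnable without Pythran (how Pythran would map typing.List to its own list type is precisely the open question here):

from typing import List, Set

def double_all(xs: List[int]) -> List[int]:
    return [2 * x for x in xs]

def has_hello(strs: Set[str]) -> bool:
    return "hello" in strs

# runs as ordinary Python, no Pythran import needed
print(double_all([1, 2, 3]), has_hello({"hi", "hello"}))  # [2, 4, 6] True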

My opinion is that the import of Pythran's typing module is not too onerous.

Rules Surrounding Pythran Comment Directives When Type Annotations Exist

It would be great if you could clarify the following:

  1. For an overload set of size n, how many Pythran comment directives should be required? n? Or k <= n? Or 1?
  2. Should the user be allowed to specify a subset of comment directives fully (i.e. with all the argument types)? Or should all the directives be either fully specified, or (...)? Obviously, if they are all (...) then you will have n identical export directives. I'm not saying that's bad, I'm just clarifying that that's what the input will be.
  3. Currently, textual proximity between a pythran comment directive and the function it relates to is not checked. I'm worried about the potential for confusion; for example, suppose the user fully specifies one overload in a comment directive but mistakenly attaches it to another overload - or somewhere completely different. Should the compiler warn or fail in this case?

For now those are all my questions. Going forward it would make a lot of sense for type annotations to permit return types to be specified, but that's for another day.

@serge-sans-paille
Owner

Hey @seertaak, what's the status of this PR? Looks like it's on pause, which is fine with me - I mean, there's no hurry, just wanted to get some news ;-)

@ashwinvis
Contributor

With NumPy v1.20, there is now an official way of annotating types:

https://numpy.org/devdocs/release/1.20.0-notes.html#numpy-is-now-typed

@paugier
Contributor

paugier commented Feb 9, 2021

I'm a bit skeptical about using numpy.typing for Pythran (or other NumPy accelerators). ArrayLike is a union of many different types from which an array can be created (any sequence, any object implementing the __array__ protocol, etc.).

To use type annotations for Pythran and other NumPy accelerators, one needs much more precise concrete types, and also fused types. See what is doable with transonic.typing (https://transonic.readthedocs.io/en/latest/generated/transonic.typing.html#transonic.typing.Array).
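For comparison, a hedged sketch of the kind of precise array annotation described in the linked transonic.typing documentation (the exact spelling is from memory and may be inaccurate; treat it as illustrative only):

import numpy as np
# assumption: Array is importable like this, per the transonic docs linked above
from transonic import Array

A = Array[np.float64, "2d"]  # a concrete dtype plus an explicit number of dimensions

def row_sums(M: A):
    return M.sum(axis=1)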
