Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python] Add bool to supported types and improve type specification #141

Merged
merged 1 commit into from
Feb 24, 2023

Conversation

thetorpedodog
Copy link
Contributor

@thetorpedodog thetorpedodog commented Feb 24, 2023

This adds boolean to the supported sparse coordinate types and also improves the type definition of coordinate types to ensure that the values in each type are homogeneous:

BadCoord = Union[int, str]
BadCoords = Union[BadCoord, Sequence[BadCoord]]
# ["this", 1] is bad, but is a valid as a BadCoords value

GoodCoord = TypeVar("GoodCoord", int, str)
GoodCoords = Union[GoodCoord, Sequence[GoodCoord]]
# ["this", 1] is bad, and is *not* valid as a GoodCoords value
# ["this", "that"] is a valid as a GoodCoords value
# [1, 2] is valid as a GoodCoords value

I don’t see a bug entry for this but it was something we discussed earlier today, and in the process I realized another thing we could improve.

Context: #960

This adds boolean to the supported sparse coordinate types and also
improves the type definition of coordinate types to ensure that the
values in each type are homogeneous:

    BadCoord = Union[int, str]
    BadCoords = Union[BadCoord, Sequence[BadCoord]]
    # ["this", 1] is bad, but is a valid as a BadCoords value

    GoodCoord = TypeVar("GoodCoord", int, str)
    GoodCoords = Union[GoodCoord, Sequence[GoodCoord]]
    # ["this", 1] is bad, and is *not* valid as a GoodCoords value
    # ["this", "that"] is a valid as a GoodCoords value
    # [1, 2] is valid as a GoodCoords value
types.Slice[pa.TimestampType],
SparseCoord,
Sequence[SparseCoord],
types.Slice[SparseCoord],
pa.Array,
pa.ChunkedArray,
npt.NDArray[np.integer],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also have a numpy-array-like for np.float, np.datetime64, etc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea but I think it makes sense for a future change. (Also, I am not familiar enough with the numpy typing system at the moment to do the right thing here.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@bkmartinjr bkmartinjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good. Led to a question about possibly missing point indexing types to match the new slices (eg, if we have slice(float), why not np.array(float)?)

Copy link
Member

@johnkerl johnkerl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢

types.Slice[pa.TimestampType],
SparseCoord,
Sequence[SparseCoord],
types.Slice[SparseCoord],
pa.Array,
pa.ChunkedArray,
npt.NDArray[np.integer],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johnkerl johnkerl changed the title Add bool to supported types and improve type specification. [python] Add bool to supported types and improve type specification Feb 24, 2023
@thetorpedodog thetorpedodog merged commit 4027413 into main Feb 24, 2023
@thetorpedodog thetorpedodog deleted the type-fix branch February 24, 2023 22:11
thetorpedodog added a commit that referenced this pull request Mar 3, 2023
My previous change which used TypeVar for coordinate types [1] was
actually a Bad Idea.  While it was better in that it prohibited
heterogeneous collections and slices, it was worse in that made the
SparseDFCoord type (and derived types) into generic types, which
we do not want.

The *right* way to avoid repeating all the possible types is to make
a union of the possible *containers*, and then specify *that* with all
the possible contents that may go into those containers.

[1]: #141
thetorpedodog added a commit that referenced this pull request Mar 3, 2023
My previous change which used TypeVar for coordinate types [1] was
actually a Bad Idea.  While it was better in that it prohibited
heterogeneous collections and slices, it was worse in that made the
SparseDFCoord type (and derived types) into generic types, which
we do not want.

The *right* way to avoid repeating all the possible types is to make
a union of the possible *containers*, and then specify *that* with all
the possible contents that may go into those containers.

[1]: #141
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants