Add details to expectations for scalars #308

Merged
merged 31 commits into from
Nov 17, 2023
Commits
751a131
note what may raise
MarcoGorelli Oct 31, 2023
7cb90ea
Merge remote-tracking branch 'upstream/main' into expand-on-scalars
MarcoGorelli Nov 7, 2023
fc65648
list required methods
MarcoGorelli Nov 8, 2023
7c24afd
add scalar class
MarcoGorelli Nov 8, 2023
2714c13
reword
MarcoGorelli Nov 8, 2023
99b91a5
fixup
MarcoGorelli Nov 8, 2023
a867f00
fixup
MarcoGorelli Nov 8, 2023
9e13924
Merge remote-tracking branch 'upstream/main' into expand-on-scalars
MarcoGorelli Nov 8, 2023
f197672
fixup
MarcoGorelli Nov 8, 2023
a417b1c
Merge remote-tracking branch 'upstream/main' into expand-on-scalars
MarcoGorelli Nov 14, 2023
409d8f3
replace Scalar|NullType with Scalar
MarcoGorelli Nov 14, 2023
0db9871
type null as Scalar
MarcoGorelli Nov 14, 2023
6520ac4
add example of working with scalars
MarcoGorelli Nov 14, 2023
46bc08c
use AnyScalar;
MarcoGorelli Nov 15, 2023
b879f31
add Scalar.dtype and Scalar.persist
MarcoGorelli Nov 15, 2023
456c152
Merge remote-tracking branch 'upstream/main' into expand-on-scalars
MarcoGorelli Nov 15, 2023
b8011c7
update shift arg
MarcoGorelli Nov 15, 2023
97d8f9a
use BoolScalar
MarcoGorelli Nov 15, 2023
3b7bcb6
use float scalar in some parts
MarcoGorelli Nov 15, 2023
d598a8d
use float scalar in some parts
MarcoGorelli Nov 15, 2023
a12585b
string scalar for rename
MarcoGorelli Nov 15, 2023
35cd4ed
intscalar for shift
MarcoGorelli Nov 15, 2023
fade164
numeric scalar for correction
MarcoGorelli Nov 15, 2023
29ceed2
simplify
MarcoGorelli Nov 15, 2023
bee402f
update python builtin types desc
MarcoGorelli Nov 15, 2023
d1f4daf
Merge remote-tracking branch 'upstream/main' into expand-on-scalars
MarcoGorelli Nov 15, 2023
15090ac
fixup
MarcoGorelli Nov 15, 2023
24d2ad8
enable extra ruff rule, note AnyScalar
MarcoGorelli Nov 15, 2023
8360d96
remove some unnecessary nitpick ignores
MarcoGorelli Nov 15, 2023
f69679a
return Self from Scalar.persist, add column.persist
MarcoGorelli Nov 17, 2023
216b5e6
fixup
MarcoGorelli Nov 17, 2023
62 changes: 60 additions & 2 deletions spec/design_topics/python_builtin_types.md
@@ -30,12 +30,70 @@ builtin types to CPU. In the above example, the `.mean()` call returns a
`float`. It is likely beneficial though to implement this as a library-specific
scalar object which duck types with `float`. This means that it should (a) have
the same semantics as a builtin `float` when used within a library, and (b)
support usage as a `float` outside of the library (i.e., implement
`__float__`). Duck typing is usually not perfect, for example `isinstance`
support usage as a `float` outside of the library (see below).
Duck typing is usually not perfect, for example `isinstance`
usage on the float-like duck type will behave differently. Such explicit "type
of object" checks don't have to be supported.
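
As a minimal sketch of what "duck types with `float`" can look like, consider the class below. `FancyFloat` is a hypothetical name chosen for illustration and is not taken from any particular library. The class supports conversion through `__float__`, so consumers that coerce their input (such as `math.sqrt`) accept it, while an explicit `isinstance` check does not.

```python
import math


class FancyFloat:
    """Hypothetical library scalar that duck types with ``float``."""

    def __init__(self, value):
        self._value = value

    def __float__(self):
        # Lets float() and coercing consumers such as the math module convert it.
        return float(self._value)


x = FancyFloat(4.0)
root = math.sqrt(x)              # works: math.sqrt converts via __float__
is_float = isinstance(x, float)  # False: FancyFloat is not a float subclass
```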

The following design rule applies everywhere builtin Python types are used
within this API standard: _where a Python builtin type is specified, an
implementation may always replace it by an equivalent library-specific type
that duck types with the Python builtin type._

## Required methods

If a library doesn't use the Python built-in scalars, then its scalars must implement
at least the following operations which return scalars:
- `__lt__`
- `__le__`
- `__eq__`
- `__ne__`
- `__gt__`
- `__ge__`
- `__add__`
- `__radd__`
- `__sub__`
- `__rsub__`
- `__mul__`
- `__rmul__`
- `__mod__`
- `__rmod__`
- `__pow__`
- `__rpow__`
- `__floordiv__`
- `__rfloordiv__`
- `__truediv__`
- `__rtruediv__`
- `__neg__`
- `__abs__`
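
One possible way to cover the operations listed above is sketched here. It assumes a hypothetical `FancyScalar` class that wraps a plain Python value; the class name, the `_value` attribute, and the helper functions are illustrative only. Every operation, including the comparisons, returns another `FancyScalar` rather than a Python builtin.

```python
import operator


class FancyScalar:
    """Hypothetical library scalar wrapping a plain Python value."""

    def __init__(self, value):
        self._value = value

    def __repr__(self):
        return f"FancyScalar({self._value!r})"

    def __neg__(self):
        return FancyScalar(-self._value)

    def __abs__(self):
        return FancyScalar(abs(self._value))


def _unwrap(obj):
    # Accept either another FancyScalar or a plain Python scalar.
    return obj._value if isinstance(obj, FancyScalar) else obj


def _binary(op):
    def method(self, other):
        return FancyScalar(op(self._value, _unwrap(other)))
    return method


def _reflected(op):
    def method(self, other):
        # Reflected form: the FancyScalar is the right-hand operand.
        return FancyScalar(op(_unwrap(other), self._value))
    return method


_COMPARISONS = ("lt", "le", "eq", "ne", "gt", "ge")
_ARITHMETIC = ("add", "sub", "mul", "mod", "pow", "floordiv", "truediv")

# Attach __lt__ ... __ge__ and __add__ ... __truediv__ to the class.
for _name in _COMPARISONS + _ARITHMETIC:
    setattr(FancyScalar, f"__{_name}__", _binary(getattr(operator, _name)))

# Attach the reflected arithmetic variants: __radd__ ... __rtruediv__.
for _name in _ARITHMETIC:
    setattr(FancyScalar, f"__r{_name}__", _reflected(getattr(operator, _name)))
```

With this sketch, `FancyScalar(3.0) + 1`, `2 - FancyScalar(3.0)` and `FancyScalar(3.0) > 1` all produce `FancyScalar` results. A real implementation would more likely delegate to its own backend than to the `operator` module.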

Furthermore, unless the library exclusively allows for lazy execution,
it must also implement the following unary operations which return Python scalars:
- `__int__`
- `__float__`

Review comment (Contributor, Author): punt on these

- `__bool__`
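
For a library that supports eager execution, the conversions listed above (as they appear in this diff) might be sketched as follows. `EagerScalar` and its `_value` attribute are hypothetical names used only for illustration; each method returns a Python builtin.

```python
class EagerScalar:
    """Hypothetical eager scalar whose value is already computed."""

    def __init__(self, value):
        self._value = value

    def __int__(self):
        return int(self._value)

    def __float__(self):
        return float(self._value)

    def __bool__(self):
        return bool(self._value)


as_int = int(EagerScalar(2.7))      # truncates like int(2.7)
as_float = float(EagerScalar(2))    # plain Python float
as_bool = bool(EagerScalar(0.0))    # plain Python bool
```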

For example, if a library implements `FancyFloat` and `FancyBool` scalars,
then the following should all be supported:
```python
df: DataFrame
column_1: Column = df.col('a')
column_2: Column = df.col('b')

scalar: FancyFloat = column_1.std()
result_1: Column = column_2 - column_1.std()
result_2: FancyBool = column_2.std() > column_1.std()
```
The following, however, may raise, depending on the
implementation:
```python
df: DataFrame
column = df.col('a')

if column.std() > 0:  # this line may raise!
    print('std is positive')
```
This is because `if column.std() > 0` will call `(column.std() > 0).__bool__()`,
which must produce a Python scalar. Therefore, a purely lazy dataframe library
may choose to raise here, whereas one which allows for eager execution may return
a Python bool.
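
The lazy case can be illustrated with a small sketch. Everything here is an assumption made for illustration: `LazyScalar`, its `materialize` method, and the choice of `TypeError` are not part of the standard. The scalar records a computation instead of running it, so comparisons stay lazy and `__bool__` refuses to produce a Python `bool`.

```python
class LazyScalar:
    """Hypothetical scalar that defers its computation."""

    def __init__(self, compute):
        self._compute = compute  # zero-argument callable producing the value

    def __gt__(self, other):
        # Stays lazy; for brevity `other` is assumed to be a plain Python value.
        return LazyScalar(lambda: self._compute() > other)

    def __bool__(self):
        raise TypeError("cannot convert a lazy scalar to bool; materialize it first")

    def materialize(self):
        # Hypothetical explicit trigger for the deferred computation.
        return self._compute()


std = LazyScalar(lambda: 1.5)   # stands in for column.std()
comparison = std > 0            # fine: still a LazyScalar

raised = False
try:
    if comparison:              # calls comparison.__bool__(), which raises
        print('std is positive')
except TypeError:
    raised = True
```

An eager library would instead return a Python `bool` from `__bool__`, in which case the `if` statement runs without error.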