feat: Add DataType inference from Python types #3555

Merged: 8 commits, Dec 12, 2024

@jaychia jaychia commented Dec 12, 2024

Closes #3549

@jaychia jaychia marked this pull request as ready for review December 12, 2024 02:12
@github-actions github-actions bot added the feat label Dec 12, 2024

codspeed-hq bot commented Dec 12, 2024

CodSpeed Performance Report

Merging #3555 will degrade performance by 12.53%

Comparing jay/type-convenience (0f8be2c) with main (6ae4e77)

Summary

❌ 1 regressions
✅ 26 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | main | jay/type-convenience | Change |
|---|---|---|---|
| test_iter_rows_first_row[100 Small Files] | 185.1 ms | 211.7 ms | -12.53% |

@pytest.mark.parametrize(
    ["source", "expected"],
    [
        (str, DataType.string()),
Contributor:

Perhaps we could also test:

  • dict (our DataType::Map)
  • tuple (our DataType::Struct, I think, with arbitrary field names)
  • bytearray

Contributor:

trust your judgement here though :)

Contributor Author:

Oh yes, the dict-to-Map one is a good idea. Let me add that.
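For illustration, the cases suggested above could be exercised along these lines. This is a minimal, self-contained sketch and not the code from this PR: `infer_dtype_name` is a hypothetical stand-in that returns type names as strings rather than Daft `DataType` objects, and rejecting `tuple` and `bytearray` is an assumption of the sketch, not a statement about what the PR does. It needs Python 3.9+ for the built-in generics.

```python
from typing import get_args, get_origin

# Hypothetical mapping from plain Python types to data type names.
_SIMPLE_TYPES = {
    str: "string",
    int: "int64",
    float: "float64",
    bool: "bool",
    bytes: "binary",
}


def infer_dtype_name(py_type) -> str:
    """Hypothetical stand-in: infer a data type name from a Python type."""
    if py_type in _SIMPLE_TYPES:
        return _SIMPLE_TYPES[py_type]

    # Parametrized generics such as list[str] or dict[str, int].
    origin = get_origin(py_type)
    args = get_args(py_type)
    if origin is list and len(args) == 1:
        return f"list[{infer_dtype_name(args[0])}]"
    if origin is dict and len(args) == 2:
        key, value = (infer_dtype_name(arg) for arg in args)
        return f"map[{key}, {value}]"

    # Assumption: anything else (tuple, bytearray, ...) is unsupported here.
    raise TypeError(f"cannot infer a data type from {py_type!r}")


# Table of (source, expected) pairs, in the spirit of a parametrized test.
CASES = [
    (str, "string"),
    (int, "int64"),
    (list[str], "list[string]"),
    (dict[str, int], "map[string, int64]"),
]

for source, expected in CASES:
    assert infer_dtype_name(source) == expected
```

Keeping the cases in one table makes it cheap to add a row for each new supported type, and the explicit `TypeError` branch gives unsupported inputs a single place to be tested.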

daft/udf.py Outdated
@@ -394,7 +394,7 @@ def __hash__(self) -> int:

 def udf(
     *,
-    return_dtype: DataType,
+    return_dtype: DataType | type,
Contributor:

I wonder if we could create a type alias in Python for `DataType | type`, as it seems to be used in a lot of places.

Contributor Author:

I remember documentation being a bit messy when I used type aliases for something else. Let me give it a shot again and see how it shows up on docs.

Contributor:

Ah, I forgot how it shows up in documentation... Yeah, I guess I am thinking like a Rust user. In a language that isn't statically typed, it might be confusing for users to have to dig deeper into the docs to see what the alias means 🤔. Trust your judgement here.
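To make the alias idea concrete, here is a rough sketch of what a shared alias for `DataType | type` could look like. Everything in it is hypothetical: the `DataType` class is a toy stand-in, and the names `DataTypeLike` and `resolve_dtype`, as well as the toy `udf` decorator, are assumptions for illustration rather than what this PR ships.

```python
from typing import Union


class DataType:
    """Hypothetical stand-in for a library's DataType class."""

    def __init__(self, name: str) -> None:
        self.name = name


# Hypothetical alias: one name for "a DataType or a plain Python type".
DataTypeLike = Union[DataType, type]


def resolve_dtype(dtype: DataTypeLike) -> DataType:
    """Return a DataType, converting a plain Python type if needed."""
    if isinstance(dtype, DataType):
        return dtype
    if isinstance(dtype, type):
        # Assumption: the Python type's name is enough for this sketch.
        return DataType(dtype.__name__)
    raise TypeError(f"expected a DataType or a Python type, got {dtype!r}")


def udf(*, return_dtype: DataTypeLike):
    """Toy decorator factory that only records the resolved return dtype."""
    resolved = resolve_dtype(return_dtype)

    def decorator(fn):
        fn.return_dtype = resolved
        return fn

    return decorator


@udf(return_dtype=str)
def shout(text):
    return text.upper()
```

The alias keeps signatures short and gives the conversion a single entry point. Whether rendered API docs show the alias name or its expansion depends on the documentation tooling, which is the trade-off discussed in this thread.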

@jaychia jaychia enabled auto-merge (squash) December 12, 2024 07:53
@jaychia jaychia merged commit da6f499 into main Dec 12, 2024
40 of 41 checks passed
@jaychia jaychia deleted the jay/type-convenience branch December 12, 2024 12:59

codecov bot commented Dec 12, 2024

Codecov Report

Attention: Patch coverage is 92.50000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 77.79%. Comparing base (f23ee37) to head (0f8be2c).
Report is 9 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| daft/datatype.py | 90.32% | 3 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3555      +/-   ##
==========================================
+ Coverage   77.69%   77.79%   +0.09%     
==========================================
  Files         710      716       +6     
  Lines       86941    87801     +860     
==========================================
+ Hits        67552    68305     +753     
- Misses      19389    19496     +107     

| Files with missing lines | Coverage Δ |
|---|---|
| daft/expressions/expressions.py | 93.60% <100.00%> (+0.01%) ⬆️ |
| daft/udf.py | 95.58% <100.00%> (+0.02%) ⬆️ |
| daft/datatype.py | 91.73% <90.32%> (-0.16%) ⬇️ |

... and 41 files with indirect coverage changes

Successfully merging this pull request may close these issues.

UDF Type Inference