-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support IGNORE_ERRORS when scanning from pyarrow/pandas #4646
base: master
Are you sure you want to change the base?
Conversation
3fdb113
to
2f21964
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #4646 +/- ##
=======================================
Coverage 86.50% 86.51%
=======================================
Files 1369 1372 +3
Lines 57955 57999 +44
Branches 7203 7209 +6
=======================================
+ Hits 50136 50175 +39
- Misses 7652 7657 +5
Partials 167 167 ☔ View full report in Codecov by Sentry. |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
615623b
to
074809c
Compare
074809c
to
dbad6a6
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
65e95d3
to
2f0172f
Compare
1ee740f
to
ed44e03
Compare
Benchmark ResultMaster commit hash:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are several KUZU_API changes which I don't follow why they're now needed in this PR?
CMakeLists.txt
Outdated
@@ -317,7 +317,7 @@ add_subdirectory(third_party) | |||
if(${BUILD_KUZU}) | |||
add_definitions(-DKUZU_ROOT_DIRECTORY="${PROJECT_SOURCE_DIR}") | |||
add_definitions(-DKUZU_CMAKE_VERSION="${CMAKE_PROJECT_VERSION}") | |||
add_definitions(-DKUZU_EXTENSION_VERSION="0.7.0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm Do we need to upgrade the extension version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a virtual function to TableFuncBindData
which is inherited by some classes in the extensions (e.g. JSONScanBindData) so I probably need to bump the extension version.
That single change (me needing to export ScanBindData/TableFuncBindData) caused a bunch of dynamic casts in the macos extension test to fail due to the casted targets not being exported which is why I added all those
KUZU_API` changes. I'm not completely sure why they showed up now and not before though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay. We don't need to bump extension version manually in PRs now. The CI pipeline will handle this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit strange, as we have virtual function in TableFuncBindData
previously. I wonder why we don't need to mark KUZU_API previously, but now. @acquamarin Can you also take a look at the KUZU_API changes in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably because the only virtual method before is copy
which is pure virtual (so the definitions are all in the child classes). The method I added has a default implementation so it actually needs to exported for child classes in the extensions to access.
@@ -17,6 +18,7 @@ namespace kuzu { | |||
PyArrowScanConfig::PyArrowScanConfig(const common::case_insensitive_map_t<Value>& options) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why PyArrowScan and PandasScan have different ways to convert scanConfig. I need to take another look to see if we can unify them. but can you also take a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the reason is that we don't support the pyarrow scan options (skipNum
and limitNum
) in pandas scan so before this PR we don't actually support any configuration for pandas scan. I could look into if it's doable to support those options in pandas scan and if it is I'll add the support and unify the config in a separate PR.
Benchmark ResultMaster commit hash:
|
Benchmark ResultMaster commit hash:
|
Description
Adds support for the default
IGNORE_ERRORS
behaviour (skipping rows that trigger PK-related exceptions) when scanning from pandas/pyarrow.Fixes #4534
Contributor agreement