-
Notifications
You must be signed in to change notification settings - Fork 254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use column projection during update #322
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a couple of questions
} | ||
|
||
ARROW_ASSIGN_OR_RAISE(auto datum, | ||
::arrow::compute::ExecuteScalarExpression(expression, *schema(), batch)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this bind the expression or is it required to be bound before the AddColumn method is called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the expression is bound when entering this AddColumn
method.
ARROW_ASSIGN_OR_RAISE(arr, CreateArray(datum.scalar(), batch->num_rows())); | ||
} else if (datum.is_chunked_array()) { | ||
auto chunked_arr = datum.chunked_array(); | ||
ARROW_ASSIGN_OR_RAISE(arr, ::arrow::Concatenate(chunked_arr->chunks())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExtensionArray's cannot be concatenated currently - tho compute expressions won't either so ExtensionArray's probably won't make it past ExecuteScalarExpression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a test as follow up? But also, there is no function / kernel is available for extension types right now, this method might fail earlier.
@@ -316,6 +317,41 @@ ::arrow::Result<std::shared_ptr<UpdaterBuilder>> LanceDataset::NewUpdate( | |||
std::move(new_field)); | |||
} | |||
|
|||
::arrow::Result<std::shared_ptr<LanceDataset>> LanceDataset::AddColumn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again the problem here is if the compute expression contains aggregates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be checked via bool Expression::IsScalarExpression() const
.
I can throw a invalid status from AddColumn
ARROW_ASSIGN_OR_RAISE(auto datum, | ||
::arrow::compute::ExecuteScalarExpression(expression, *schema(), batch)); | ||
std::shared_ptr<::arrow::Array> arr; | ||
if (datum.is_scalar()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok so this is a constant literal value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is for case like AddColumn(field, pc::literal(1234))
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be safe if you just add the check/raise on column aggregates
added |
Closes #319 and #321