Use column projection during update #322

Merged: eddyxu merged 10 commits into main from lei/update_projection on Nov 21, 2022

@eddyxu eddyxu commented Nov 21, 2022

Closes #319 and #321

@eddyxu eddyxu requested a review from changhiskhan November 21, 2022 16:47
@eddyxu eddyxu self-assigned this Nov 21, 2022
@eddyxu eddyxu added the c++ C++ issues label Nov 21, 2022
@changhiskhan changhiskhan left a comment

a couple of questions

}

ARROW_ASSIGN_OR_RAISE(auto datum,
                      ::arrow::compute::ExecuteScalarExpression(expression, *schema(), batch));

@changhiskhan (Contributor) commented:

Does this bind the expression, or does it need to be bound before the AddColumn method is called?

  ARROW_ASSIGN_OR_RAISE(arr, CreateArray(datum.scalar(), batch->num_rows()));
} else if (datum.is_chunked_array()) {
  auto chunked_arr = datum.chunked_array();
  ARROW_ASSIGN_OR_RAISE(arr, ::arrow::Concatenate(chunked_arr->chunks()));

@changhiskhan (Contributor) commented:

ExtensionArrays cannot be concatenated currently. Compute expressions don't support them either, though, so ExtensionArrays probably won't make it past ExecuteScalarExpression?

@eddyxu (Contributor Author) replied:

Let's add a test as a follow-up? Also, no function or kernel is available for extension types right now, so this method might fail earlier.

@@ -316,6 +317,41 @@ ::arrow::Result<std::shared_ptr<UpdaterBuilder>> LanceDataset::NewUpdate(
std::move(new_field));
}

::arrow::Result<std::shared_ptr<LanceDataset>> LanceDataset::AddColumn(

@changhiskhan (Contributor) commented:

Again, the problem here is when the compute expression contains aggregates.

@eddyxu (Contributor Author) replied:

This can be checked via bool Expression::IsScalarExpression() const.

I can return an invalid status from AddColumn.

ARROW_ASSIGN_OR_RAISE(auto datum,
                      ::arrow::compute::ExecuteScalarExpression(expression, *schema(), batch));
std::shared_ptr<::arrow::Array> arr;
if (datum.is_scalar()) {

@changhiskhan (Contributor) commented:

OK, so this is a constant literal value?

@eddyxu (Contributor Author) replied:

Yes, this is for cases like AddColumn(field, pc::literal(1234)).
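
For readers following the thread, the result shapes discussed here (a scalar produced by a literal, a chunked array, and presumably a plain array) can be illustrated with a toy model. This is only a sketch using the C++ standard library: `Scalar`, `Array`, `ChunkedArray`, `Datum` and `ToArray` are hypothetical stand-ins, not Arrow or Lance types. The actual change uses `CreateArray` and `::arrow::Concatenate` as shown in the diff, and the handling of the plain-array case is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <variant>
#include <vector>

// Toy stand-ins for the Arrow types (hypothetical, not the Arrow API).
using Scalar = int;
using Array = std::vector<int>;
using ChunkedArray = std::vector<Array>;
using Datum = std::variant<Scalar, Array, ChunkedArray>;

// Normalize an expression result into one contiguous array with
// exactly num_rows values, mirroring the branches in the diff.
Array ToArray(const Datum& datum, std::size_t num_rows) {
  if (std::holds_alternative<Scalar>(datum)) {
    // Constant literal: repeat the value once per row of the batch.
    return Array(num_rows, std::get<Scalar>(datum));
  }
  if (std::holds_alternative<ChunkedArray>(datum)) {
    // Chunked result: concatenate the chunks in order.
    Array out;
    for (const auto& chunk : std::get<ChunkedArray>(datum)) {
      out.insert(out.end(), chunk.begin(), chunk.end());
    }
    assert(out.size() == num_rows);
    return out;
  }
  // Assumed remaining case: the result is already a single array.
  const auto& arr = std::get<Array>(datum);
  assert(arr.size() == num_rows);
  return arr;
}
```

In this model, `ToArray(Datum(1234), 3)` yields three copies of 1234, which corresponds to the `pc::literal(1234)` case mentioned above.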

@changhiskhan changhiskhan left a comment

should be safe if you just add the check/raise on column aggregates

@eddyxu (Contributor Author) commented Nov 21, 2022

> should be safe if you just add the check/raise on column aggregates

Added an IsScalarExpression check.
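
The idea behind that guard can be sketched with a toy expression tree. This is a stdlib-only illustration, not the code in this PR: `Expr`, `FunctionKind`, the free function `IsScalarExpression` and `CheckAddColumn` are all hypothetical names, and the real change relies on Arrow's `Expression::IsScalarExpression()`.

```cpp
#include <string>
#include <vector>

// Hypothetical model: each node is either a scalar (row-wise)
// function, field or literal, or an aggregate such as "sum".
enum class FunctionKind { kScalar, kAggregate };

struct Expr {
  std::string name;
  FunctionKind kind = FunctionKind::kScalar;
  std::vector<Expr> args;
};

// True only if no node anywhere in the tree is an aggregate.
bool IsScalarExpression(const Expr& expr) {
  if (expr.kind != FunctionKind::kScalar) {
    return false;
  }
  for (const auto& arg : expr.args) {
    if (!IsScalarExpression(arg)) {
      return false;
    }
  }
  return true;
}

// Returns an empty string on success, or an error message that a
// real implementation would wrap in an invalid status.
std::string CheckAddColumn(const Expr& expr) {
  if (!IsScalarExpression(expr)) {
    return "Invalid: expression contains aggregates";
  }
  return "";
}
```

An aggregate nested inside an otherwise scalar call, such as `add(sum(x), 1)`, is rejected as well, because the check walks the whole tree.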

@eddyxu eddyxu merged commit 1035ed9 into main Nov 21, 2022
@eddyxu eddyxu deleted the lei/update_projection branch November 21, 2022 23:46
Successfully merging this pull request may close these issues.

Support Projection during appending columns