GH-35879: [C++] Bump bundled google-cloud-cpp to 2.12.0 #36119

Merged
merged 19 commits into from
Jun 27, 2023

kou
Member

@kou kou commented Jun 16, 2023

Rationale for this change

google-cloud-cpp 2.12.0 will fix #35318.

What changes are included in this PR?

Use the latest released version.

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes.

@github-actions

⚠️ GitHub issue #35879 has been automatically assigned in GitHub to PR creator.

@kou
Member Author

kou commented Jun 16, 2023

@coryan If we update the bundled google-cloud-cpp from 2.8.0 to 2.12.0, our GCSFS tests fail with a "missing project id" error:

https://github.com/apache/arrow/actions/runs/5287300615/jobs/9567623572?pr=36119#step:11:223

[ RUN      ] TestGCSFSGeneric.Empty
D:/a/arrow/arrow/cpp/src/arrow/filesystem/gcsfs_test.cc:304: Failure
Failed
'gcs_fs_->CreateDir(bucket_name, true)' failed with Invalid: google::cloud::Status(INVALID_ARGUMENT: missing project id error_info={reason=INVALID_ARGUMENT, domain=gcloud-cpp, metadata={gcloud-cpp.source.function=CreateBucket, gcloud-cpp.source.line=402, gcloud-cpp.source.filename=D:/a/arrow/arrow/build/cpp/google_cloud_cpp_ep-install/include/google/cloud/storage/client.h, gcloud-cpp.version=v2.12.0}}). Detail: [errno 22] Invalid argument

Do we need to update our GCSFS?

@coryan
Contributor

coryan commented Jun 16, 2023

We need to update the tests. The client library now performs more validation, and the testbench is too forgiving. The test used to get away with a request that would have failed against production.

We need to set the GOOGLE_CLOUD_PROJECT environment variable to some value good for testing. Alternatively, we need to expand the GCSFS configuration to accept the project id as one of the (optional) parameters and use that in the test.
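The two alternatives above can be combined into one lookup order. The following is only a rough sketch in Python, not the Arrow or google-cloud-cpp implementation: `GcsOptionsSketch` and `resolve_project_id` are hypothetical names, and the precedence (explicit option first, then `GOOGLE_CLOUD_PROJECT`, otherwise an error) is an assumption made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class GcsOptionsSketch:
    """Hypothetical stand-in for a filesystem options object."""

    # None means "not configured"; the lookup then falls back to the environment.
    project_id: Optional[str] = None


def resolve_project_id(options: GcsOptionsSketch,
                       environ: Mapping[str, str] = os.environ) -> str:
    """Pick the project ID to use when creating a bucket.

    Assumed order for this sketch: explicit option, then the
    GOOGLE_CLOUD_PROJECT environment variable, otherwise an error
    that mirrors the "missing project id" message in the log above.
    """
    if options.project_id:
        return options.project_id
    from_env = environ.get("GOOGLE_CLOUD_PROJECT", "")
    if from_env:
        return from_env
    raise ValueError("missing project id")


# A test can either pass the ID explicitly or rely on the environment.
print(resolve_project_id(GcsOptionsSketch(project_id="test-only-project")))
```

Passing the environment as a parameter keeps the sketch easy to test without mutating the real process environment.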

Should I send a separate PR or should we try to add these changes to this PR?

@kou
Member Author

kou commented Jun 16, 2023

Thanks!

We need to set the GOOGLE_CLOUD_PROJECT environment variable to some value good for testing. Alternatively, we need to expand the GCSFS configuration to accept the project id as one of the (optional) parameters and use that in the test.

If specifying the project ID through the API is useful for Apache Arrow users, how about choosing the alternative approach? If it's only useful for testing, the GOOGLE_CLOUD_PROJECT environment variable approach may be enough.

Should I send a separate PR or should we try to add these changes to this PR?

Could you send a separate PR? I think that it's easier for you.
If the project ID related changes don't need google-cloud-cpp 2.12.0, the separate PR doesn't need to update the bundled google-cloud-cpp. If they do, you can cherry-pick commits from this branch and we can close this PR.

It seems that this PR still has a static linking related problem:

https://github.com/apache/arrow/actions/runs/5287582940/jobs/9568250982?pr=36119#step:5:3381

Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/arrow/r/check/arrow.Rcheck/00LOCK-arrow/00new/arrow/libs/arrow.so':
  /arrow/r/check/arrow.Rcheck/00LOCK-arrow/00new/arrow/libs/arrow.so: undefined symbol: _ZN4absl12lts_2021110213cord_internal18cord_btree_enabledE

So I think that a PR without the google-cloud-cpp update will be easier for you.

@coryan
Contributor

coryan commented Jun 16, 2023

We need to set the GOOGLE_CLOUD_PROJECT environment variable to some value good for testing. Alternatively, we need to expand the GCSFS configuration to accept the project id as one of the (optional) parameters and use that in the test.

If specifying the project ID through the API is useful for Apache Arrow users, how about choosing the alternative approach?

No problem. Happy to do that.

Should I send a separate PR or should we try to add these changes to this PR?

Could you send a separate PR?

Gladly. It may take me until Tuesday, I am not working on Monday.

It seems that this PR still has a static linking related problem:

Ugh, it can be hard to debug the little library dependencies in Abseil. Good luck!

https://github.com/apache/arrow/actions/runs/5287582940/jobs/9568250982?pr=36119#step:5:3381

Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/arrow/r/check/arrow.Rcheck/00LOCK-arrow/00new/arrow/libs/arrow.so':
  /arrow/r/check/arrow.Rcheck/00LOCK-arrow/00new/arrow/libs/arrow.so: undefined symbol: _ZN4absl12lts_2021110213cord_internal18cord_btree_enabledE

So I think that a PR without the google-cloud-cpp update will be easier for you.

Sounds good, I will give it a try.

@kou
Member Author

kou commented Jun 16, 2023

Thanks!

It may take me until Tuesday, I am not working on Monday.

No problem.

@kou
Member Author

kou commented Jun 21, 2023

It seems that the static linking problem is solved.

@coryan
Contributor

coryan commented Jun 22, 2023

You may have noticed, I created a PR to help with the problems here.

kou pushed a commit that referenced this pull request Jun 22, 2023
### Rationale for this change

This fixes #36227, originally motivated by the problems in #36119, but seems like a valuable feature in any case.

### What changes are included in this PR?

- Refactor some code to make it testable.
- Add a new `std::optional<std::string>` field to the `GcsOptions` class.

### Are these changes tested?

Yes, I expanded the unit tests.

### Are there any user-facing changes?

Yes. I updated the field documentation.  If I missed some documentation please let me know.

I am also not familiar with the steps required to update the Python wrappers, if there is some documentation to follow I would appreciate it.  I can expand this PR or send a separate one, your call.

* Closes: #36227

Authored-by: Carlos O'Ryan <coryan@google.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
@kou
Member Author

kou commented Jun 22, 2023

GH-36228 is merged and I've rebased on main.

@assignUser
Member

Quick reminder to also upload the new version to our artifactory!

@kou
Member Author

kou commented Jun 23, 2023

Oh. I forgot it. Thanks!

@kou
Member Author

kou commented Jun 26, 2023

+1

@paleolimbot @thisisnic This pull request changes the R part a bit. Could you review it?

See also the context: #36228

Member

@paleolimbot paleolimbot left a comment

Thank you for updating the R bits! I can't spot anything that wouldn't have been caught by the tests.

@github-actions github-actions bot added the "awaiting merge" label and removed the "awaiting committer review" label Jun 27, 2023
@kou
Member Author

kou commented Jun 27, 2023

Thanks!
I'll merge this.

@kou kou merged commit 4cfe9fa into apache:main Jun 27, 2023
@kou kou deleted the cpp-google-cloud-cpp branch June 27, 2023 20:55
@kou kou removed the "awaiting merge" label Jun 27, 2023
@raulcd
Member

raulcd commented Jun 28, 2023

I am unsure if there's something else to be done but it seems the nightlies for:

Failed with:

E   pyarrow.lib.ArrowInvalid: google::cloud::Status(INVALID_ARGUMENT: missing project id error_info={reason=INVALID_ARGUMENT, domain=gcloud-cpp, metadata={gcloud-cpp.source.function=CreateBucket, gcloud-cpp.source.line=402, gcloud-cpp.source.filename=/build/cpp/google_cloud_cpp_ep-install/include/google/cloud/storage/client.h, gcloud-cpp.version=v2.12.0}}). Detail: [errno 22] Invalid argument

Is there something else to be done? Should I create a new issue for those failures?

@coryan
Contributor

coryan commented Jun 28, 2023

I am unsure if there's something else to be done but it seems the nightlies for:

Failed with:

E   pyarrow.lib.ArrowInvalid: google::cloud::Status(INVALID_ARGUMENT: missing project id error_info={reason=INVALID_ARGUMENT, domain=gcloud-cpp, metadata={gcloud-cpp.source.function=CreateBucket, gcloud-cpp.source.line=402, gcloud-cpp.source.filename=/build/cpp/google_cloud_cpp_ep-install/include/google/cloud/storage/client.h, gcloud-cpp.version=v2.12.0}}). Detail: [errno 22] Invalid argument

Is there something else to be done? Should I create a new issue for those failures?

I probably missed making changes for Python in #36228. I do not know where to start with these changes, if somebody can point me to the right documentation, I will give it a shot.

@raulcd
Member

raulcd commented Jun 28, 2023

Hi @coryan!
No worries. From my understanding, we have to add the new project_id to our GcsFileSystem constructor:
https://github.com/apache/arrow/blob/main/python/pyarrow/_gcsfs.pyx#L83-L89
and pass it on to the underlying CGcsOptions.
We might also have to update the __reduce__ function in that file and update the tests in python/pyarrow/tests/test_fs.py:
https://github.com/apache/arrow/blob/main/python/pyarrow/tests/test_fs.py#L211
I've created #36352 to track it.
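To illustrate why `__reduce__` may need updating along with the constructor, here is a minimal pure-Python sketch. `GcsFileSystemSketch` is a hypothetical stand-in, not the real pyarrow class, and its argument list is an assumption. The point is only that any new constructor argument also has to be returned from `__reduce__`, or a pickled copy would come back without it.

```python
from typing import Optional


class GcsFileSystemSketch:
    """Hypothetical stand-in for a filesystem wrapper; not the pyarrow class."""

    def __init__(self, *, anonymous: bool = False,
                 project_id: Optional[str] = None):
        self.anonymous = anonymous
        # The newly added, optional constructor argument.
        self.project_id = project_id

    @classmethod
    def _reconstruct(cls, kwargs):
        # Rebuild an instance from the keyword arguments saved by __reduce__.
        return cls(**kwargs)

    def __reduce__(self):
        # pickle calls the returned callable with the returned arguments.
        # Leaving project_id out of this dict would silently drop it
        # from every unpickled copy.
        return (
            GcsFileSystemSketch._reconstruct,
            (dict(anonymous=self.anonymous, project_id=self.project_id),),
        )


# Replay the reduce protocol by hand to check the round trip.
fs = GcsFileSystemSketch(project_id="my-test-project")
rebuild, args = fs.__reduce__()
clone = rebuild(*args)
print(clone.project_id)  # my-test-project
```

A test along these lines, added next to the existing filesystem tests, would catch a constructor argument that was forgotten in `__reduce__`.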

@conbench-apache-arrow

Conbench analyzed the 6 benchmark runs on commit 4cfe9fab.

There were 7 benchmark results indicating a performance regression:

The full Conbench report has more details.
