New dependency on pyarrow introduces heavyweight numpy sub-dependency #1196

Closed
di opened this issue Mar 31, 2022 · 7 comments · Fixed by #1282
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@di
Member

di commented Mar 31, 2022

Is your feature request related to a problem? Please describe.
The new dependency on pyarrow, introduced in #1178, creates a new sub-dependency on numpy. Without fully understanding why these dependencies were introduced, a required dependency on numpy feels unnecessarily large for this library.

Describe the solution you'd like
Make the pyarrow and numpy dependencies optional (via extras).
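
As a rough illustration of the request, the packaging metadata might look something like the sketch below. The package name, the core requirement, and every version bound are placeholders chosen for the example, not the library's actual pins. Because numpy arrives only as a sub-dependency of pyarrow, moving pyarrow into an extra would also drop numpy from the default install.

```python
# Sketch of setup.py metadata with pyarrow moved into an optional extra.
# All names and version bounds here are hypothetical placeholders.
SETUP_KWARGS = {
    "name": "example-bigquery-client",
    # Core requirements: installed by a plain `pip install`.
    "install_requires": [
        "requests >= 2.18.0",
    ],
    # Optional requirements: installed only when the extra is requested.
    "extras_require": {
        "pyarrow": ["pyarrow >= 3.0.0"],
    },
}

# Convenience extra that pulls in every optional dependency.
SETUP_KWARGS["extras_require"]["all"] = sorted(
    {
        requirement
        for requirements in SETUP_KWARGS["extras_require"].values()
        for requirement in requirements
    }
)

# In a real setup.py this dict would be passed on as:
#     setuptools.setup(**SETUP_KWARGS)
```

With metadata like this, `pip install example-bigquery-client` would skip pyarrow and numpy, while `pip install "example-bigquery-client[pyarrow]"` would bring them in.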

Describe alternatives you've considered
Pin my usage of google-cloud-bigquery back to a version that does not have these dependencies, or find a way to remove it entirely.

Additional context
The dependency on numpy in pyarrow: https://github.com/apache/arrow/blob/4a90e3994fc9fc10b968ab3439dec636385dec22/python/setup.py#L589-L591

(PS, thanks for your work on this library!)

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Mar 31, 2022
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Apr 1, 2022
@meredithslota meredithslota added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed triage me I really want to be triaged. labels Apr 4, 2022
@tswast
Contributor

tswast commented Apr 5, 2022

Possibly a duplicate of #1142

@tswast
Contributor

tswast commented Apr 5, 2022

For some background, the reason we added pyarrow as a required dependency is that we kept getting issues from folks installing incompatible versions of pyarrow. Perhaps there's another way to address this, but at the moment, pip doesn't do anything to help with incompatible versions in "extras" unless those extras are explicitly installed.

@majorgilles

Concur with the post. We were using this as a replacement for the now-bloated google-api-python-client, and we will be forced to pin it to the latest non-problematic version because we use it from an AWS Lambda environment, which has very specific restrictions on deployment size.

@tswast
Contributor

tswast commented Apr 13, 2022

If someone were to send a PR to do this, I'd be open to it, given the extra dependency does appear to block more use cases than I anticipated.

We'd want to:

@tswast
Contributor

tswast commented Jun 22, 2022

To implement this, we'd basically want to revert #776. I doubt a simple "revert" will be sufficient at this point as 2.x diverged a bit from 3.x.

@rogerhub

I saw my binary size grow from 30-40MB to 234MB, and I found out it's because I upgraded google-cloud-bigquery and picked up a dependency on pyarrow and numpy. Any update on this bug?

I don't think I use the parts of the bigquery package that require these dependencies, so the extra dependencies are just slowing down deployment and startup time (new binary is 5-6x bigger).

@hannes-ucsc

> For some background, the reason we added pyarrow as a required dependency is that we kept getting issues from folks installing incompatible versions of pyarrow.

So now you'll be getting issues from people whose binary size is exploding. Are you sure you landed on the right side of this trade-off? ;-) Can't you just add a dynamic version check after the import?
