Remove PIP_NO_DEPS from YamlTemplate Dockerfile #1748
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #1748 +/- ##
============================================
+ Coverage 42.15% 42.57% +0.41%
- Complexity 3278 3359 +81
============================================
Files 808 815 +7
Lines 47293 47672 +379
Branches 5053 5107 +54
============================================
+ Hits 19938 20295 +357
- Misses 25710 25719 +9
- Partials 1645 1658 +13
@@ -44,7 +44,6 @@ ARG PY_VERSION=${pythonVersion}

 # Set python environment variables
 ENV FLEX_TEMPLATE_PYTHON_PY_FILE=main.py
-ENV PIP_NO_DEPS=True
Is there a reason we have this anywhere? I see we now only recommend this for Beam <=2.37.0, so should we remove it everywhere?
cc/ @tvalentyn who may have more context - https://github.com/GoogleCloudPlatform/python-docs-samples/pull/11277/files - I'm also concerned that we may no longer recommend this because we're already doing something similar, but I'm definitely missing context
We don't need this, we already pass no-deps in https://github.com/apache/beam/blob/44792814ff303cd561a1fb3d0ff6cb1b4b4c3135/sdks/python/apache_beam/runners/portability/stager.py#L756
Context: when Beam stages pipeline dependencies specified in a requirements.txt, if such a dependency uses PEP 517 to package itself (e.g. with a toml file), then in order to find its transitive dependencies, pip rebuilds a wheel, which may be slow and can fail. Hence we decided to stage only top-level dependencies, and we now pass no-deps in stager.py.
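To make the staging behavior described above concrete, here is a minimal sketch of a `pip download` invocation that fetches only the top-level requirements. The helper name `build_pip_download_cmd` and its parameters are hypothetical and are not taken from Beam's stager.py; only the pip flags themselves (`--dest`, `-r`, `--exists-action`, `--no-deps`) are real pip options.

```python
import sys


def build_pip_download_cmd(requirements_file, dest_dir, no_deps=True):
    """Build a `pip download` command line (hypothetical helper, not Beam's code).

    With no_deps=True, pip fetches only the packages listed in the
    requirements file and does not resolve their transitive dependencies.
    """
    cmd = [
        sys.executable, "-m", "pip", "download",
        "--dest", dest_dir,
        "-r", requirements_file,
        "--exists-action", "i",  # ignore files that already exist in dest
    ]
    if no_deps:
        cmd.append("--no-deps")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_pip_download_cmd("requirements.txt", "/tmp/staged")))
```

Because the command itself carries `--no-deps`, staging does not rely on a `PIP_NO_DEPS` environment variable being set in the image.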
Only other place it is defined is the XLANG template, so I can also remove it from that one
Sounds good. I think this is a good change to make, so I'll approve. Given this information, though, I doubt it will actually solve the problem it is targeting.
@damccorm I tested a Yaml job that has a provider specified with dependencies which fails without this change. Tested that it works with this change.
The code @tvalentyn linked applies to the worker container image. The problem here occurs during expansion of provided transforms in the template launcher. In the case of pythonPackage, a venv is cloned from the base venv, and the extra packages defined in the provider spec are installed into the cloned venv. When PIP_NO_DEPS is defined, the transitive deps are not installed, so these custom transform environments do not contain all necessary packages.
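pip treats an environment variable of the form `PIP_<OPTION>` as the matching command-line option, so `PIP_NO_DEPS=True` in the image acts like `--no-deps` on every pip call made by a child process that inherits the environment. The sketch below is only a way to reproduce both behaviors locally; the function name `pip_subprocess_env` is hypothetical, and this is not how the template launcher or this PR handles it (the PR simply removes the `ENV` line from the Dockerfile).

```python
import os


def pip_subprocess_env(base_env=None, no_deps=False):
    """Return a copy of the environment for a pip subprocess.

    Hypothetical helper for local experiments: with no_deps=False the
    PIP_NO_DEPS variable is dropped so pip installs transitive
    dependencies; with no_deps=True it is set, mimicking the old image.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.pop("PIP_NO_DEPS", None)
    if no_deps:
        env["PIP_NO_DEPS"] = "True"
    return env


if __name__ == "__main__":
    old_image_env = {"PATH": "/usr/bin", "PIP_NO_DEPS": "True"}
    print(pip_subprocess_env(old_image_env))
    print(pip_subprocess_env(old_image_env, no_deps=True))
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when invoking pip against a cloned venv.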
An example (using first random package not in beam I could find):
pipeline:
  type: chain
  transforms:
    - type: Create
      config:
        elements:
          - id: 1
    - type: Calendar
    - type: LogForTesting

providers:
  - type: pythonPackage
    config:
      packages:
        - calendar-view
    transforms:
      Calendar: |
        from calendar_view.calendar import Calendar
        from calendar_view.core.event import EventStyles

        @beam.ptransform_fn
        def fn(pcoll):
          calendar = Calendar.build()
          calendar.add_event(day_of_week=0, start='08:00', end='17:00', style=EventStyles.GRAY)
          calendar.save("simple_view.png")
          return pcoll | beam.Map(lambda x: x)
In this example (which can be run locally), if PIP_NO_DEPS is set, calendar-view is installed to the cloned venv during expansion, but the pipeline fails with the message ModuleNotFoundError: No module named 'PIL', since the transitive dep Pillow is not installed. Removing the PIP_NO_DEPS flag results in success.
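A quick way to confirm this kind of failure before launching a pipeline is to check whether the modules a provider needs are importable in the target environment. The following is a minimal sketch using only the standard library; the function name `missing_modules` is hypothetical, and the module names in the usage line are the ones from the example above.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of module names that cannot be found.

    Uses importlib.util.find_spec, which locates a module without
    executing it (parent packages of dotted names are imported).
    """
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package is absent or a spec is malformed.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # In a venv built with PIP_NO_DEPS set, one would expect 'PIL' to be
    # reported here while 'calendar_view' is present.
    print(missing_modules(["calendar_view", "PIL"]))
```

Run with the interpreter of the cloned venv, this reports which imports would raise ModuleNotFoundError at expansion time.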
Signed-off-by: Jeffrey Kinard <jeff@thekinards.com>
63b6bba to d78bd3e
* adding logging to test schemmap
* open schemmap prtection
* Handling for String Primary Keys in Bulk Reader. (#1743) Limitation: For the current PR complete support only upto 3 byte code points.
* Add support for ALO in SpannerChangeStreamsToBigQuery template (#1750)
* Fixing exception in String .isSplittable (#1755)
* Remove PIP_NO_DEPS from YamlTemplate Dockerfile (#1748) Signed-off-by: Jeffrey Kinard <jeff@thekinards.com>
* Terraform template updates for custom transformation (#1746)
* terraform changes for end-to-end template
* terraform updates
* formatting fix
* Add string mapper case for IT (#1757)
* Fixing Index Discovery for 5.7 and removing innodb_parallel_read_threads from init sequences for MySql5.7 compliance. (#1758)
* adding schema map to dml handler
* removing logs for testing
* ut for reverse replication shadow tables (#1759)
* ut for reverse replication shadow tables
* incorporated review comments
* unit test for source writer (#1749)
* source writer unit test
* added git configs for new template
* added spanner pr workflow
* added spanner pr workflow
* added spanner pr workflow
* Adding autoReconnect parameters to URL (#1760)
* map should not be static
* merging
* adding logging to test schemmap
* open schemmap prtection
* adding schema map to dml handler
* removing logs for testing
* map should not be static
* linebreaks
---------
Signed-off-by: Jeffrey Kinard <jeff@thekinards.com>
Co-authored-by: Vardhan Vinay Thigle <39047439+VardhanThigle@users.noreply.github.com>
Co-authored-by: Dip Patel <37777672+Dippatel98@users.noreply.github.com>
Co-authored-by: Jeff Kinard <jeff@thekinards.com>
Co-authored-by: shreyakhajanchi <92910380+shreyakhajanchi@users.noreply.github.com>
Co-authored-by: Deep1998 <deepchowdhury1998@gmail.com>
Co-authored-by: aksharauke <126752897+aksharauke@users.noreply.github.com>
PIP_NO_DEPS is typically added to custom Python templates to speed up launch, but since the YamlTemplate image already contains all dependencies, removing it won't affect startup of native Beam YAML pipelines.
It does, however, prevent transitive dependencies from being installed for packages defined by a pythonPackage provider.
This PR removes PIP_NO_DEPS to allow proper expansion of custom pythonPackage provider transforms.