Remove PIP_NO_DEPS from YamlTemplate Dockerfile #1748

Merged 1 commit on Jul 29, 2024


Polber
Contributor

@Polber Polber commented Jul 26, 2024

PIP_NO_DEPS is typically added to custom Python templates to speed up launch, but since the YamlTemplate image already contains all dependencies, it has no effect on the startup of native Beam YAML pipelines.

It does, however, prevent transitive dependencies from being installed for packages defined by a pythonPackage provider.

This PR removes PIP_NO_DEPS so that custom pythonPackage provider transforms can be compiled and expanded properly.

@Polber Polber requested a review from damccorm July 26, 2024 00:23
@Polber Polber self-assigned this Jul 26, 2024

codecov bot commented Jul 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 42.57%. Comparing base (983c151) to head (d78bd3e).
Report is 1 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1748      +/-   ##
============================================
+ Coverage     42.15%   42.57%   +0.41%     
- Complexity     3278     3359      +81     
============================================
  Files           808      815       +7     
  Lines         47293    47672     +379     
  Branches       5053     5107      +54     
============================================
+ Hits          19938    20295     +357     
- Misses        25710    25719       +9     
- Partials       1645     1658      +13     
| Components | Coverage Δ |
| --- | --- |
| spanner-templates | 64.52% <ø> (+0.89%) ⬆️ |
| spanner-import-export | 64.32% <ø> (-0.13%) ⬇️ |
| spanner-live-forward-migration | 75.00% <ø> (ø) |
| spanner-live-reverse-replication | 51.96% <ø> (ø) |
| spanner-bulk-migration | 83.45% <ø> (+1.01%) ⬆️ |

see 18 files with indirect coverage changes

@@ -44,7 +44,6 @@ ARG PY_VERSION=${pythonVersion}

# Set python environment variables
ENV FLEX_TEMPLATE_PYTHON_PY_FILE=main.py
ENV PIP_NO_DEPS=True

Contributor

Is there a reason we have this anywhere? I see we now only recommend this for Beam <=2.37.0, so should we remove it everywhere?

cc/ @tvalentyn who may have more context - https://github.com/GoogleCloudPlatform/python-docs-samples/pull/11277/files - I'm also concerned that we may no longer recommend this because we're already doing something similar, but I'm definitely missing context


@tvalentyn tvalentyn Jul 26, 2024

We don't need this, we already pass no-deps in https://github.com/apache/beam/blob/44792814ff303cd561a1fb3d0ff6cb1b4b4c3135/sdks/python/apache_beam/runners/portability/stager.py#L756

The context is that when Beam stages pipeline dependencies specified in a requirements.txt, and such a dependency packages itself with PEP 517 (e.g. with a toml file), pip has to rebuild a wheel in order to find its transitive dependencies, which may be slow and can fail. Hence we decided to stage only top-level dependencies, and we now pass no-deps in stager.py.
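For illustration only, a top-level-only staging command could look like the sketch below. This is not the code in stager.py: `build_download_command` and its parameters are hypothetical names chosen for this example. Only the pip options it emits (`download`, `--dest`, `-r`, `--no-deps`) are real.

```python
import sys
from typing import List


def build_download_command(requirements_file: str, dest_dir: str,
                           no_deps: bool = True) -> List[str]:
    """Build a `pip download` invocation for staging requirements.

    Hypothetical helper for illustration; not Beam's actual stager code.
    """
    cmd = [sys.executable, "-m", "pip", "download",
           "--dest", dest_dir, "-r", requirements_file]
    if no_deps:
        # Fetch only the packages listed in the file. Skipping dependency
        # resolution avoids building PEP 517 projects just to read metadata.
        cmd.append("--no-deps")
    return cmd
```

With `no_deps=True`, anything not listed in the requirements file is assumed to be present in the target environment already.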

Contributor Author

The only other place it is defined is the XLANG template, so I can also remove it from that one.

Contributor

Sounds good. I think this is a good change to make, so I'll approve. Given this information, though, I doubt it will actually solve the problem it is targeting.

Contributor Author

@damccorm I tested a YAML job that specifies a provider with dependencies. It fails without this change and works with it.

The link @tvalentyn posted is for the worker container image. The problem here occurs during expansion of provided transforms in the template launcher. In the case of pythonPackage, a venv is cloned from the base venv, and the extra packages defined in the provider spec are installed into the cloned venv. When PIP_NO_DEPS is defined, the transitive deps are not installed, so these custom transform environments do not contain all necessary packages.
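The mechanism can be sketched in a few lines. pip reads options from environment variables of the form `PIP_<OPTION>`, so `PIP_NO_DEPS` set in the image applies to every `pip install` the launcher runs, including installs into a cloned venv. The sketch below is not Beam's provider code and not what this PR changes: `env_without_no_deps` and `build_install_command` are hypothetical helpers that show how an install could be run with the variable stripped.

```python
import os
from typing import Dict, List, Mapping, Optional


def env_without_no_deps(
        env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with PIP_NO_DEPS removed.

    Hypothetical helper: with the variable gone, pip resolves and installs
    transitive dependencies as usual.
    """
    cleaned = dict(os.environ if env is None else env)
    cleaned.pop("PIP_NO_DEPS", None)
    return cleaned


def build_install_command(venv_python: str, packages: List[str]) -> List[str]:
    """Build a `pip install` command that targets a specific venv's Python."""
    return [venv_python, "-m", "pip", "install", *packages]
```

A caller would pass both to `subprocess.run(cmd, env=env_without_no_deps(), check=True)`. Removing the `ENV` line from the Dockerfile, as this PR does, achieves the same result for every install without extra code.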

An example (using first random package not in beam I could find):

pipeline:
  type: chain
  transforms:
    - type: Create
      config:
        elements:
          - id: 1
    - type: Calendar
    - type: LogForTesting

providers:
  - type: pythonPackage
    config:
      packages:
        - calendar-view
    transforms:
      Calendar: |
        from calendar_view.calendar import Calendar
        from calendar_view.core.event import EventStyles
        
        @beam.ptransform_fn
        def fn(pcoll):
          calendar = Calendar.build()
          calendar.add_event(day_of_week=0, start='08:00', end='17:00', style=EventStyles.GRAY)
          calendar.save("simple_view.png")
        
          return pcoll | beam.Map(lambda x: x)

In this example (which can be run locally), if PIP_NO_DEPS is set, calendar-view is installed to the cloned venv during expansion, but the pipeline fails with the message ModuleNotFoundError: No module named 'PIL' because the Pillow transitive dep is not installed. Removing the PIP_NO_DEPS flag results in success.
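A quick way to confirm this kind of failure before launching a job is to check whether the expected modules are importable from the environment in question. The snippet below is a minimal sketch using only the standard library. `missing_modules` is a hypothetical helper name, and the module names passed to it are whatever the provider's packages are expected to pull in (for the example above, `calendar_view` and `PIL`).

```python
import importlib.util
from typing import Iterable, List


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found.

    find_spec returns None for a top-level module that is not installed,
    and it does not import the module itself.
    """
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]
```

Run inside the cloned venv, `missing_modules(["calendar_view", "PIL"])` would be expected to report `PIL` when the install skipped transitive dependencies.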

Signed-off-by: Jeffrey Kinard <jeff@thekinards.com>
@Polber Polber force-pushed the jkinard/yaml-provider-fix branch from 63b6bba to d78bd3e Compare July 26, 2024 18:20
@Polber Polber requested a review from damccorm July 26, 2024 22:36
@Polber Polber merged commit d142989 into GoogleCloudPlatform:main Jul 29, 2024
13 checks passed
dhercher pushed a commit that referenced this pull request Jul 30, 2024
Signed-off-by: Jeffrey Kinard <jeff@thekinards.com>
damccorm pushed a commit that referenced this pull request Aug 3, 2024
* adding logging to test schemmap

* open schemmap prtection

* Handling for String Primary Keys in Bulk Reader. (#1743)

Limitation: For the current PR complete support only upto 3 byte code points.

* Add support for ALO in SpannerChangeStreamsToBigQuery template (#1750)

* Fixing exception in String .isSplittable (#1755)

* Remove PIP_NO_DEPS from YamlTemplate Dockerfile (#1748)

Signed-off-by: Jeffrey Kinard <jeff@thekinards.com>

* Terraform template updates for custom transformation (#1746)

* terraform changes for end-to-end template

* terraform updates

* formatting fix

* Add string mapper case for IT (#1757)

* Fixing Index Discovery for 5.7 and removing innodb_parallel_read_threads from init sequences for MySql5.7 compliance. (#1758)

* adding schema map to dml handler

* removing logs for testing

* ut for reverse replication shadow tables (#1759)

* ut for reverse replication shadow tables

* incorporated review comments

* unit test for source writer (#1749)

* source writer unit test

* added git configs for new template

* added spanner pr workflow

* added spanner pr workflow

* added spanner pr workflow

* Adding autoReconnect parameters to URL (#1760)

* map should not be static

* merging

* adding logging to test schemmap

* open schemmap prtection

* adding schema map to dml handler

* removing logs for testing

* map should not be static

* linebreaks

---------

Signed-off-by: Jeffrey Kinard <jeff@thekinards.com>
Co-authored-by: Vardhan Vinay Thigle <39047439+VardhanThigle@users.noreply.github.com>
Co-authored-by: Dip Patel <37777672+Dippatel98@users.noreply.github.com>
Co-authored-by: Jeff Kinard <jeff@thekinards.com>
Co-authored-by: shreyakhajanchi <92910380+shreyakhajanchi@users.noreply.github.com>
Co-authored-by: Deep1998 <deepchowdhury1998@gmail.com>
Co-authored-by: aksharauke <126752897+aksharauke@users.noreply.github.com>