Add Airflow provider package catalog connector #2438

Merged: 7 commits, Feb 8, 2022

ptitzler (Member) commented Feb 2, 2022

This PR adds a catalog connector for Apache Airflow provider packages. Connector instances require the user to configure a download URL for the Apache Airflow provider package that is installed in the cluster.

Requires #2418, #2409, #2437

What changes were proposed in this pull request?

  • Add a new catalog-connectors directory to the repository root; the new connector lives in its airflow subdirectory
  • Update the Makefile to include a new lint-connectors target, which is also added as a dependency of the lint task
  • Add a new lint-connectors task to the GitHub build.yaml workflow

Notes:

  • The connector is not included in the Elyra release process and needs to be published independently, as necessary.
  • The connector uses the provider archive name (e.g. apache_airflow_providers_ssh - see later comment) and the Python file name (e.g. airflow/providers/ssh/operators/ssh.py) as hash keys, which internally identify operators in the palette.
  • The archive version string (e.g. 2.3.0-py3-none-any) is currently not part of the key, to avoid potential versioning issues. For example, assume user A adds operators from archive ...2.3.0-py3-none-any to the Elyra deployment and creates a pipeline using some of those operators. User B adds operators from an older archive, such as ...2.2.0-py3-none-any. If the full archive name were used as the key, user B would not be able to run pipelines that user A created (and vice versa), because the keys would differ (pseudo code):
    "apache_airflow_providers_ssh-2.3.0-py3-none-any.whl:airflow/providers/ssh/operators/ssh.py:SSHOperator" != "apache_airflow_providers_ssh-2.2.0-py3-none-any.whl:airflow/providers/ssh/operators/ssh.py:SSHOperator"
    
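To illustrate the versioning note above, here is a minimal sketch of how a version-independent key could be derived from a wheel file name. This is not the connector's actual code: the function names (`archive_base_name`, `operator_key`), the `include_version` flag, and the `archive:module:class` layout are assumptions that simply mirror the pseudo code.

```python
import re

# Assumption: archives follow the wheel naming convention
# <distribution>-<version>-<tags>.whl, where the distribution
# part itself contains no hyphen.
_WHEEL_RE = re.compile(r"^(?P<dist>[A-Za-z0-9_.]+)-(?P<version>[^-]+)-.+\.whl$")


def archive_base_name(wheel_file_name: str) -> str:
    """Return the distribution part of a wheel file name, without version or tags."""
    match = _WHEEL_RE.match(wheel_file_name)
    if match is None:
        raise ValueError(f"not a wheel file name: {wheel_file_name!r}")
    return match.group("dist")


def operator_key(
    wheel_file_name: str,
    module_path: str,
    class_name: str,
    include_version: bool = False,
) -> str:
    """Build a hypothetical operator key in the layout used by the pseudo code."""
    if include_version:
        archive = wheel_file_name  # full name, version and tags included
    else:
        archive = archive_base_name(wheel_file_name)
    return f"{archive}:{module_path}:{class_name}"


module = "airflow/providers/ssh/operators/ssh.py"
key_a = operator_key("apache_airflow_providers_ssh-2.3.0-py3-none-any.whl", module, "SSHOperator")
key_b = operator_key("apache_airflow_providers_ssh-2.2.0-py3-none-any.whl", module, "SSHOperator")
print(key_a == key_b)
```

With the version dropped, both users resolve the same operator to the same key; passing `include_version=True` reproduces the mismatch described in the note.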

How was this pull request tested?

  • Install connector from source, as documented in the connector's README
  • Enable the connector as documented in the connector's README, specifying one provider package
  • Open VPE for Airflow
  • Expand palette (core Airflow operators should be displayed)
  • Add and configure operators
  • Export pipeline and review DAG
  • Run pipeline

Error scenario testing:

  • The configured download URL yields a 404
  • The configured download URL does not identify a file
  • The configured download URL identifies a file that is not a zip archive
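
The error scenarios above could be handled along these lines. This is a sketch using only the Python standard library, not the connector's implementation; `download` and `list_python_modules` are hypothetical helper names, and mapping every failure to `ValueError` is an assumption made for brevity.

```python
import io
import urllib.error
import urllib.request
import zipfile
from typing import List


def download(url: str, timeout: float = 30.0) -> bytes:
    """Fetch the archive; an HTTP error such as 404 is reported as ValueError."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        raise ValueError(f"download of {url} failed with HTTP status {err.code}") from err


def list_python_modules(payload: bytes) -> List[str]:
    """Return the sorted .py member names of an in-memory provider archive.

    A wheel is a zip file, so anything that is not a zip archive
    (for example an HTML page returned for a directory URL) is rejected.
    """
    buffer = io.BytesIO(payload)
    if not zipfile.is_zipfile(buffer):
        raise ValueError("downloaded content is not a zip archive")
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        return sorted(name for name in archive.namelist() if name.endswith(".py"))
```

A caller would chain the two, for example `list_python_modules(download(url))`, and surface the `ValueError` message to the user who configured the connector instance.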

Unit testing included the providers listed in this discussion thread

Notes:

  • There are unresolved Elyra Airflow component parser issues that need to be addressed before the Airflow 1.10.15 package can be used (see "Improve Airflow parser functionality", #2418).
  • The connector should already support Airflow 2.x packages, but they have not been tested because Elyra does not support Airflow 2.x.

Developer's Certificate of Origin 1.1

   By making a contribution to this project, I certify that:

   (a) The contribution was created in whole or in part by me and I
       have the right to submit it under the Apache License 2.0; or

   (b) The contribution is based upon previous work that, to the best
       of my knowledge, is covered under an appropriate open source
       license and I have the right under that license to submit that
       work with modifications, whether created in whole or in part
       by me, under the same open source license (unless I am
       permitted to submit under a different license), as indicated
       in the file; or

   (c) The contribution was provided directly to me by some other
       person who certified (a), (b) or (c) and I have not modified
       it.

   (d) I understand and agree that this project and the contribution
       are public and that a record of the contribution (including all
       personal information I submit with it, including my sign-off) is
       maintained indefinitely and may be redistributed consistent with
       this project or the open source license(s) involved.

@ptitzler added the area:documentation, kind:enhancement, and platform: pipeline-Airflow labels Feb 2, 2022
@ptitzler added this to the 3.6.0 milestone Feb 2, 2022

elyra-bot (bot) commented Feb 2, 2022

Thanks for making a pull request to Elyra!

To try out this branch on binder, follow this link: Binder

ptitzler (Member, Author) commented Feb 2, 2022

This PR is a redo of #2416, which had issues.

@ptitzler changed the title from "[HOLD] Add Airflow provider package catalog connector" to "Add Airflow provider package catalog connector" Feb 4, 2022

kiersten-stokes (Member) left a comment

Looking good and working well for me! Just one NIT below

kiersten-stokes (Member) left a comment

LGTM!

@akchinSTC merged commit e7db8ef into elyra-ai:master Feb 8, 2022
@ptitzler deleted the provider-package-catalog branch February 8, 2022 15:41