
Fix deferrable mode for BeamRunPythonPipelineOperator #44386

Merged

Conversation

MaksYermak
Contributor

In this PR I have prepared a fix for an error in deferrable mode for BeamRunPythonPipelineOperator.

This error happens on a distributed system where the triggerer and the worker run on different machines. BeamRunPythonPipelineOperator needs a local .py file to start a job. Users can specify a path to a .py file located in a GCS bucket, and the operator then downloads this file to the local filesystem. In the current deferrable-mode implementation, the operator downloads the .py file before entering deferrable mode, so on a distributed system the file ends up on the worker machine, not on the triggerer machine. When the operator then tries to execute the job on the triggerer machine, Airflow throws an error that the executable .py file does not exist.

A fix for the same issue in BeamRunJavaPipelineOperator: #39371
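To make the failure mode concrete, here is a minimal, hypothetical sketch of the deferral pattern involved. None of the class or method names below come from the Airflow codebase (the real logic lives in the apache-beam provider's operator and trigger); the sketch only illustrates the idea that any file the pipeline needs must be resolved on the triggerer, i.e. inside the trigger's `run()`, rather than on the worker before the operator defers.

```python
# Hypothetical sketch, not the provider's actual code: the class names, the
# _download_if_remote helper, and the serialize() classpath are illustrative.
from airflow.models import BaseOperator
from airflow.triggers.base import BaseTrigger, TriggerEvent


class SketchBeamPythonOperator(BaseOperator):
    """Defers immediately so any GCS download happens on the triggerer."""

    def __init__(self, *, py_file: str, **kwargs):
        super().__init__(**kwargs)
        self.py_file = py_file  # may be a gs:// path

    def execute(self, context):
        # Buggy pattern: download the gs://... file here (on the worker), then defer.
        # Fixed pattern: defer first and let the trigger resolve the file.
        self.defer(
            trigger=SketchBeamPythonTrigger(py_file=self.py_file),
            method_name="execute_complete",
        )

    def execute_complete(self, context, event):
        return event["message"]


class SketchBeamPythonTrigger(BaseTrigger):
    def __init__(self, py_file: str):
        super().__init__()
        self.py_file = py_file

    def serialize(self):
        # Classpath is hypothetical; it must point at this class in your project.
        return ("my_dags.triggers.SketchBeamPythonTrigger", {"py_file": self.py_file})

    async def run(self):
        # Runs inside the triggerer process, so the local copy exists on the
        # machine where the pipeline is actually started.
        local_py_file = await self._download_if_remote(self.py_file)
        # ... start the Beam pipeline with local_py_file here ...
        yield TriggerEvent({"status": "success", "message": local_py_file})

    async def _download_if_remote(self, path: str) -> str:
        # Placeholder: a real implementation would use a GCS hook to copy the
        # object to a temporary file on the triggerer machine.
        return path
```

The essential point is the ordering: in deferrable mode, anything that depends on the local filesystem has to run after `defer()`, inside the trigger, because the worker's filesystem is not visible to the triggerer.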



boring-cyborg bot added the area:providers, provider:apache-beam, and provider:google (Google, including GCP, related issues) labels on Nov 26, 2024
eladkal (Contributor) commented Nov 26, 2024

> This error happens on a distributed system where the triggerer and the worker run on different machines.

Do we have a way to verify whether other operators / other providers suffer from this problem? If so, let's open dedicated issues for them so we can keep track of it.

MaksYermak (Contributor, Author)

> > This error happens on a distributed system where the triggerer and the worker run on different machines.
>
> Do we have a way to verify whether other operators / other providers suffer from this problem? If so, let's open dedicated issues for them so we can keep track of it.

To be honest, I do not see a way to check whether this problem exists for other operators/providers.

eladkal merged commit 43adccf into apache:main Nov 27, 2024
48 checks passed
prabhusneha pushed a commit to astronomer/airflow that referenced this pull request Nov 28, 2024