-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a public interface for custom weight_rule implementation #35210
Add a public interface for custom weight_rule implementation #35210
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the modelling and the code - I am just not sure whether the pickling of DAGs after parsing is sufficient such that a custom weight rule gets to Scheduler. Have you tested with separated DAG parser and Scheduler w/o DAG code available?
If this works then I beleieve with some documentation we can make a full review. LIKE it!
In the last commit, I removed the attribute
I'm still trying to improve it by providing a task instance instead of a task to the new class in order to support dynamic priority weight. |
I had to make some changes to provide a task instance instead of a task to get_weight method. I tested it in Breeze (CeleryExecutor + Postgres) with this dag: from datetime import datetime
from airflow.decorators import dag, task
@dag(
dag_id="test_weight_rule",
schedule_interval=None,
start_date=datetime(2023, 1, 1),
tags=["test"]
)
def test_weight_rule():
@task(weight_rule="absolute", priority_weight=2)
def task1():
print("task1")
@task(weight_rule="downstream")
def task2():
print("task2")
@task(weight_rule="airflow.custom_priority_strategy.CustomPriorityStrategy")
def task3():
print("task3")
@task
def task4():
print("task4")
[task1(), task2(), task3()] >> task4()
test_weight_rule() And for the class from airflow.models.taskinstance import TaskInstance
from airflow.task.priority_strategy import PriorityWeightStrategy
class CustomPriorityStrategy(PriorityWeightStrategy):
def get_weight(self, ti: TaskInstance):
return max(3 - ti._try_number + 1, 1) I created a Dag run, and then I cleared the task a few times:
I will test in more complex examples, multiple concurrent dags, and in different situations (failure then retry, back from the deferred state, backfill, etc.); if all looks good, I'll add the documentation with a few examples and mark it as ready for review. cc: @eladkal |
This looks really good! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good work @hussein-awala
I merge to test it and get some feebacks before 2.8.0 RC. |
Late to the game - but I think (maybe I am wrong) we have potential security issue here (but it's easy to fix). For quite some time wa have been very careful to not allow to execute the code that can be contributed by DAG authors, that can be executed in the context of scheduler. I believe this change breaks this assumption. This Luckily it's easy to mitigate - similarly as timetables for example - the custom weight rules need to be registered via plugin mechanism. Plugins are special - because they can be only registered via "plugins" folder (where DAG authors have no access to write to) or via package entrypoints (and package needs to be installed in scheduler's virtualenv. So I think this needs to be done as follow-up |
cc: @hussein-awala |
Are you sure it can inject "any" code by DAGs? I rather fear that if a user follows the example and deploys a similar copy into the UPDATE: Okay I was wrong. Just tested with an example DAG and if you use the module name |
I will take a look tomorrow at the history of this security issue and if it's the case for the new feature.
If the dags folder is added to |
yep. it is
Nope. There is no protection. Whatever is in PYTHONPATH can be used ... and .. airflow will AUTOMATICALLY add Some more context on that one. Historically, when dag file processor was not standalone this was even more important. This is also due to historical reasons. DAGFileProcessor in likely 9X* of airflow installations is not "standalone". It is a newly forked process, so it is not really the "same" context as scheduler - those are different proceses. But they share everything else (memory, filesystem and they use the same PYTHONPATH - there is only one process to set the original PYTHONPATH to ( It's only after |
BTW. I realized that this is something we should describe in our Security Model https://airflow.apache.org/docs/apache-airflow/stable/security/security_model.html. I will draft PR soon. |
Hey @hussein-awala @jscheffl -> I've proposed #36022 (subject to feedback and discussion - describing the way I understand the current model of DAG Author capabilities - and explaing (I hope) why this change violates it. Since we are close to 2.8.0 - do you think @hussein-awala (of course if we agree that it is, indeed, security vulnerability to leave it in) you will be able to get the "plugin" mechanism in, or should we revert it and implement it again "properly" in 2.9.0 ? |
* Add a public interface for custom weight_rule implementation * Remove _weight_strategy attribute * Move priority weight calculation to TI to support advanced strategies * Fix loading the var from mapped operators and simplify loading it from task * Update default value and deprecated the other one * Update task endpoint API spec * fix tests * Update docs and add dag example * Fix serialization test * revert change in spark provider * Update unit tests (cherry picked from commit 3385113)
…pache#35210)" This reverts commit 3385113.
…ntation (apache#35210)" (apache#36066)" This reverts commit f60d458.
…35210) * Add a public interface for custom weight_rule implementation * Remove _weight_strategy attribute * Move priority weight calculation to TI to support advanced strategies * Fix loading the var from mapped operators and simplify loading it from task * Update default value and deprecated the other one * Update task endpoint API spec * fix tests * Update docs and add dag example * Fix serialization test * revert change in spark provider * Update unit tests
This PR is a proposition to add a new public interface for weight rules (weight strategy) to allow the users to create custom classes to calculate the task priority weight.
Slack thread