Processing logic in Sensors #43579
-
Hi everyone, I have a case where my DAGs are growing and may soon contain hundreds of tasks. Many of these tasks are just sensors waiting for a specific time, which hurts the readability of the DAG. I have already split these DAGs into multiple smaller DAGs with the help of TriggerDagRunOperator, but I would still like to "cut off" these sensors as well. I was thinking about combining the sensor logic with the processing logic that is executed right after the sensor. Is it bad practice to include that processing logic in, let's say, a task decorated with @task.sensor? I was also considering implementing a custom operator inheriting from both DateTimeSensor and PythonOperator, but this seems tricky since these two don't have much to do with each other and I am worried they would somehow clash. Thoughts on that? Any feedback is much appreciated!
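To make it concrete, here is a rough sketch of what I have in mind with @task.sensor in reschedule mode (the cutoff time, poke interval and processing body are just placeholders; note that the callable is re-invoked on every poke, so the processing only runs on the poke where the condition is finally met):

```python
# Rough sketch of "sensor + processing in one task" using @task.sensor.
# Cutoff time, poke interval/timeout and the processing body are placeholders.
import pendulum
from airflow.decorators import dag, task
from airflow.sensors.base import PokeReturnValue


@dag(schedule="@daily", start_date=pendulum.datetime(2024, 1, 1, tz="UTC"), catchup=False)
def sensor_with_processing():

    @task.sensor(poke_interval=300, timeout=6 * 3600, mode="reschedule")
    def wait_and_process() -> PokeReturnValue:
        cutoff = pendulum.now("UTC").start_of("day").add(hours=6)  # placeholder target time
        if pendulum.now("UTC") < cutoff:
            # Condition not met yet; in reschedule mode the worker slot is freed between pokes.
            return PokeReturnValue(is_done=False)
        # Condition met: run the processing that would otherwise live in a separate task.
        result = "processed"  # stand-in for the real processing logic
        return PokeReturnValue(is_done=True, xcom_value=result)

    wait_and_process()


sensor_with_processing()
```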
Replies: 1 comment 1 reply
-
You should look at deferrable operators. Quite often a deferrable operator is precisely this kind of combo: you wait on something first (and defer until it happens) and then you do the actual work.
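For example, something along these lines with DateTimeSensorAsync (the schedule, target time and task bodies here are only illustrative, not from your DAGs). While the sensor is deferred, the triggerer process handles the wait and no worker slot is occupied:

```python
# Hedged sketch of the deferrable approach: DateTimeSensorAsync hands the wait over
# to the triggerer, then a normal task does the processing once the time has passed.
import pendulum
from airflow.decorators import dag, task
from airflow.sensors.date_time import DateTimeSensorAsync


@dag(schedule="@daily", start_date=pendulum.datetime(2024, 1, 1, tz="UTC"), catchup=False)
def wait_then_process():
    wait = DateTimeSensorAsync(
        task_id="wait_until_6am",
        target_time="{{ data_interval_end.add(hours=6) }}",  # deferred until this moment
    )

    @task
    def process():
        ...  # processing logic goes here

    wait >> process()


wait_then_process()
```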
Also, you might take a look at Datasets instead of TriggerDagRunOperator: one DAG produces the dataset and the other consumes it, which automatically triggers the dependent DAG when the first one completes. That also lets you see the dependencies visually, and your Datasets logically represent the actual data that is produced/consumed by the separate DAGs, which makes the whole setup easier to reason about.
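A minimal sketch of the dataset-driven variant, assuming a made-up dataset URI and task bodies:

```python
# Sketch of dataset-driven scheduling as a replacement for TriggerDagRunOperator:
# the producer DAG declares the Dataset as an outlet, the consumer DAG is scheduled
# on it, so it starts automatically when the producer's task succeeds.
import pendulum
from airflow.datasets import Dataset
from airflow.decorators import dag, task

daily_report = Dataset("s3://my-bucket/daily-report.parquet")  # hypothetical dataset URI


@dag(schedule="@daily", start_date=pendulum.datetime(2024, 1, 1, tz="UTC"), catchup=False)
def report_producer():
    @task(outlets=[daily_report])
    def build_report():
        ...  # write the report; the dataset is marked as updated when this task succeeds

    build_report()


@dag(schedule=[daily_report], start_date=pendulum.datetime(2024, 1, 1, tz="UTC"), catchup=False)
def report_consumer():
    @task
    def load_report():
        ...  # downstream processing of the report

    load_report()


report_producer()
report_consumer()
```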