Processing logic in Sensors #43579
-
Hi everyone, I have a case where my DAGs are growing and may soon contain hundreds of tasks. Many of these tasks are just sensors waiting for a specific time, which hurts the readability of the DAG. I have already split these DAGs into multiple smaller DAGs with the help of TriggerDagRunOperator, but I would still like to "cut off" these sensors as well. I was thinking about combining the sensor logic with the processing logic that is executed right after the sensor. Is it bad practice to include that processing logic in, let's say, a task decorated with @task.sensor? I was also considering implementing a custom operator inheriting from both DateTimeSensor and PythonOperator, but this seems tricky since these two don't have much to do with each other and I am worried they would somehow clash. Thoughts on that? Any feedback is much appreciated!
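To make it concrete, here is a rough sketch of what I have in mind with @task.sensor in reschedule mode (the cutoff time, poke interval and processing body are just placeholders; note that the callable is re-invoked on every poke, so the processing only runs on the poke where the condition is finally met):

```python
# Rough sketch of "sensor + processing in one task" using @task.sensor.
# Cutoff time, poke interval/timeout and the processing body are placeholders.
import pendulum
from airflow.decorators import dag, task
from airflow.sensors.base import PokeReturnValue


@dag(schedule="@daily", start_date=pendulum.datetime(2024, 1, 1, tz="UTC"), catchup=False)
def sensor_with_processing():

    @task.sensor(poke_interval=300, timeout=6 * 3600, mode="reschedule")
    def wait_and_process() -> PokeReturnValue:
        cutoff = pendulum.now("UTC").start_of("day").add(hours=6)  # placeholder target time
        if pendulum.now("UTC") < cutoff:
            # Condition not met yet; in reschedule mode the worker slot is freed between pokes.
            return PokeReturnValue(is_done=False)
        # Condition met: run the processing that would otherwise live in a separate task.
        result = "processed"  # stand-in for the real processing logic
        return PokeReturnValue(is_done=True, xcom_value=result)

    wait_and_process()


sensor_with_processing()
```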
Replies: 1 comment 1 reply
-
You should look at deferrable operators. Quite often a deferrable operator is precisely this kind of combo: you wait on something first (and defer until it happens) and then you do the actual work.
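For example, something along these lines with DateTimeSensorAsync (the schedule, target time and task bodies here are only illustrative, not from your DAGs). While the sensor is deferred, the triggerer process handles the wait and no worker slot is occupied:

```python
# Hedged sketch of the deferrable approach: DateTimeSensorAsync hands the wait over
# to the triggerer, then a normal task does the processing once the time has passed.
import pendulum
from airflow.decorators import dag, task
from airflow.sensors.date_time import DateTimeSensorAsync


@dag(schedule="@daily", start_date=pendulum.datetime(2024, 1, 1, tz="UTC"), catchup=False)
def wait_then_process():
    wait = DateTimeSensorAsync(
        task_id="wait_until_6am",
        target_time="{{ data_interval_end.add(hours=6) }}",  # deferred until this moment
    )

    @task
    def process():
        ...  # processing logic goes here

    wait >> process()


wait_then_process()
```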
Also, you might take a look at Datasets instead of TriggerDagRunOperator: one DAG produces the dataset and the other consumes it, which automatically triggers the dependent DAG when the first one completes. That also lets you see the dependencies visually, and your Datasets logically represent the actual data that is produced/consumed by the separate DAGs, which makes the whole setup easier to reason about.
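A minimal sketch of the dataset-driven variant, assuming a made-up dataset URI and task bodies:

```python
# Sketch of dataset-driven scheduling as a replacement for TriggerDagRunOperator:
# the producer DAG declares the Dataset as an outlet, the consumer DAG is scheduled
# on it, so it starts automatically when the producer's task succeeds.
import pendulum
from airflow.datasets import Dataset
from airflow.decorators import dag, task

daily_report = Dataset("s3://my-bucket/daily-report.parquet")  # hypothetical dataset URI


@dag(schedule="@daily", start_date=pendulum.datetime(2024, 1, 1, tz="UTC"), catchup=False)
def report_producer():
    @task(outlets=[daily_report])
    def build_report():
        ...  # write the report; the dataset is marked as updated when this task succeeds

    build_report()


@dag(schedule=[daily_report], start_date=pendulum.datetime(2024, 1, 1, tz="UTC"), catchup=False)
def report_consumer():
    @task
    def load_report():
        ...  # downstream processing of the report

    load_report()


report_producer()
report_consumer()
```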