target multiple jobs from a sensor #4590

Closed
sryza opened this issue Aug 19, 2021 · 2 comments · Fixed by #4745
Labels
area: sensor Related to Sensors

Comments

@sryza
Contributor

sryza commented Aug 19, 2021

A use case we heard from a user

The user has multiple jobs derived from a single graph. Each job processes data from a different data source. They'd like a single sensor to trigger all of them.

Prior to the graph/job/op changes, the user had a partition set for each of these jobs. With the graph/job/op changes, each partition set becomes a job.

Another hypothesized use case

Diverse jobs might all depend on the same event stream, and that event stream may be expensive to read from. E.g. there might be some external API that hosts data, and different jobs might do different things with that data. Querying that external API might be expensive, so you might want to have a single sensor responsible for polling it and dispatching to different jobs.
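A rough sketch of that "poll once, dispatch to many jobs" idea, written without Dagster so it runs on its own. Everything here is an assumption for illustration: `RunRequestSketch` is a stand-in for a run request that names its target job, `fetch_events` stands in for the expensive external API call, and the job names and routing table are made up.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class RunRequestSketch:
    """Hypothetical stand-in for a run request that names its target job."""
    run_key: str
    job_name: str


def fetch_events() -> List[dict]:
    """Placeholder for the single expensive call to the external API."""
    return [
        {"id": "e1", "kind": "upload"},
        {"id": "e2", "kind": "delete"},
        {"id": "e3", "kind": "upload"},
    ]


# Hypothetical routing table: which jobs care about which kind of event.
JOBS_BY_KIND: Dict[str, List[str]] = {
    "upload": ["ingest_job", "thumbnail_job"],
    "delete": ["cleanup_job"],
}


def dispatch(events: Iterable[dict]) -> Iterator[RunRequestSketch]:
    """Fan each event out to every job interested in its kind."""
    for event in events:
        for job_name in JOBS_BY_KIND.get(event["kind"], []):
            # Include the job name in the run key so keys stay unique per job.
            yield RunRequestSketch(
                run_key=f"{job_name}:{event['id']}", job_name=job_name
            )


# The API is read once; the results are dispatched to several jobs.
requests = list(dispatch(fetch_events()))
```

The point of the sketch is only that the expensive read happens once per evaluation, while the number of targeted jobs can grow independently.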

@sryza sryza added the area: sensor Related to Sensors label Aug 19, 2021
@yuhan
Contributor

yuhan commented Aug 25, 2021

say, we have two jobs from the same graph in one repo:

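from dagster import ResourceDefinition

# event_reports is a graph defined elsewhere in the repo.
# hardcoded_resource supplies a fixed value; string_resource takes a
# description and reads its value from run config instead.
event_reports_prod1 = event_reports.to_job(
    name="event_reports_prod1", resource_defs={"mode": ResourceDefinition.hardcoded_resource("prod1")}
)
event_reports_prod2 = event_reports.to_job(
    name="event_reports_prod2", resource_defs={"mode": ResourceDefinition.hardcoded_resource("prod2")}
)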

Option 1
then, we want to have a sensor that targets these two jobs at the same time:

@sensor(jobs=[event_reports_prod1, event_reports_prod2])
def multi_job_sensor(context):
    # some_condition, some_key, and some_other_key are placeholders
    if some_condition:
        yield RunRequest(run_key=some_key, job_name="event_reports_prod1")
    else:
        yield RunRequest(run_key=some_other_key, job_name="event_reports_prod2")

however, many params on @sensor/SensorDefinition are tied to a single job (pipeline_name, solid_selection, mode), so adding a jobs param to @sensor would conflict with them
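One way Option 1 could cope with that conflict is to reject calls that mix the two styles. The following is only a sketch of such a check, not Dagster's actual validation; `check_sensor_targets` is a hypothetical helper, and the parameter names are the ones listed above.

```python
from typing import Optional, Sequence


def check_sensor_targets(
    jobs: Optional[Sequence[str]] = None,
    pipeline_name: Optional[str] = None,
    solid_selection: Optional[Sequence[str]] = None,
    mode: Optional[str] = None,
) -> str:
    """Return "multi" or "single", or raise if both styles are mixed."""
    single_job_params = {
        "pipeline_name": pipeline_name,
        "solid_selection": solid_selection,
        "mode": mode,
    }
    # Collect the single-job params the caller actually supplied.
    given = [name for name, value in single_job_params.items() if value is not None]
    if jobs and given:
        raise ValueError(
            "jobs cannot be combined with single-job params: " + ", ".join(given)
        )
    return "multi" if jobs else "single"
```

For example, `check_sensor_targets(jobs=["a"], mode="default")` raises, while passing only `jobs` or only the single-job params is accepted.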

Option 2
it may make more sense to introduce a separate decorator for multi-job sensors, like:

@multi_target_sensor(jobs=[event_reports_prod1, event_reports_prod2])
def multi_job_sensor(context):
    # some_condition, some_key, and some_other_key are placeholders
    if some_condition:
        yield RunRequest(run_key=some_key, job_name="event_reports_prod1")
    else:
        yield RunRequest(run_key=some_other_key, job_name="event_reports_prod2")
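To make Option 2 concrete, here is a dependency-free sketch of what a separate decorator might do: remember the declared job names and check that every yielded request targets one of them. None of this is Dagster code. `multi_target_sensor_sketch`, `MultiTargetSensorSketch`, and `RunRequestSketch` are hypothetical names, the context is a plain dict, and returning a distinct type is just one possible design.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class RunRequestSketch:
    """Hypothetical stand-in for a run request that names its target job."""
    run_key: str
    job_name: str


class MultiTargetSensorSketch:
    """Hypothetical sensor object that knows the set of jobs it may target."""

    def __init__(self, name: str, job_names: Iterable[str], fn: Callable):
        self.name = name
        self.job_names = frozenset(job_names)
        self._fn = fn

    def evaluate(self, context) -> List[RunRequestSketch]:
        requests = []
        for request in self._fn(context):
            # Reject requests for jobs the sensor did not declare up front.
            if request.job_name not in self.job_names:
                raise ValueError(
                    f"{self.name} yielded a request for undeclared job "
                    f"{request.job_name!r}"
                )
            requests.append(request)
        return requests


def multi_target_sensor_sketch(job_names: Iterable[str]):
    """Decorator factory that wraps a generator function in the sketch class."""
    def decorator(fn: Callable) -> MultiTargetSensorSketch:
        return MultiTargetSensorSketch(fn.__name__, job_names, fn)
    return decorator


@multi_target_sensor_sketch(job_names=["event_reports_prod1", "event_reports_prod2"])
def multi_job_sensor(context):
    # The "source" and "key" entries are made-up context fields.
    if context["source"] == "prod1":
        yield RunRequestSketch(run_key=context["key"], job_name="event_reports_prod1")
    else:
        yield RunRequestSketch(run_key=context["key"], job_name="event_reports_prod2")
```

Calling `multi_job_sensor.evaluate({"source": "prod1", "key": "k1"})` would return a single request aimed at `event_reports_prod1`.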

thoughts?

@yuhan yuhan self-assigned this Aug 25, 2021
@sryza
Contributor Author

sryza commented Aug 25, 2021

@yuhan - thoughts on whether @multi_target_sensor would return a SensorDefinition or some different type?
