Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

package: Add support for different query job types: #452

Merged
merged 12 commits into from
Jun 24, 2024

Conversation

haiqi96
Copy link
Contributor

@haiqi96 haiqi96 commented Jun 19, 2024

Description

This PR is a preparation to let query-scheduler supports more than one type of jobs.
The following changes are made:

  1. Add a new "type" column to the job metadata table.
  2. Adds an abstract base class for QueryJob and QueryConfig, which will be used as the base class for different types of job
  3. Refactor the query-scheduler to handle jobs differently based on their type
  4. Refactor some helper functions in the query-scheduler so they can handle generic QueryJob
  5. Updated job submission logic to add job's type

Validation performed

Built the package
Verified that search and reducer works properly from both commandline and webui

@@ -124,6 +131,7 @@ def fetch_cancelling_search_jobs(db_conn) -> list:
SELECT {QUERY_JOBS_TABLE_NAME}.id as job_id
FROM {QUERY_JOBS_TABLE_NAME}
WHERE {QUERY_JOBS_TABLE_NAME}.status={QueryJobStatus.CANCELLING}
AND {QUERY_JOBS_TABLE_NAME}.type={QueryJobType.SEARCH}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assume we only cancel search job to simplify the implementation

Copy link
Collaborator

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The WebUI changes look good to me.

@haiqi96 haiqi96 changed the title Add job types package: Refactor query job in the search-cluster to support different job types. Jun 19, 2024
@haiqi96 haiqi96 marked this pull request as ready for review June 19, 2024 21:10

@abstractmethod
def job_config(self) -> QueryConfig: ...


class SearchJob(QueryJob):
search_config: SearchConfig
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rename this to just "job_config"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think search_config is fine.

@kirkrodrigues kirkrodrigues requested a review from gibber9809 June 24, 2024 13:03
@@ -124,6 +130,7 @@ def fetch_cancelling_search_jobs(db_conn) -> list:
SELECT {QUERY_JOBS_TABLE_NAME}.id as job_id
FROM {QUERY_JOBS_TABLE_NAME}
WHERE {QUERY_JOBS_TABLE_NAME}.status={QueryJobStatus.CANCELLING}
AND {QUERY_JOBS_TABLE_NAME}.type={QueryJobType.SEARCH}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the naming of QueryJobType.SEARCH gets confusing for me here. When I see this my first thought is wondering about what happens to aggregation jobs, but actually QueryJobType.SEARCH represents search AND aggregation jobs.

I think we might want to eventually split these out into different job types, but to avoid holding you up here how about we just rename this to QueryJobType.SEARCH_OR_AGGREGATION which is ugly but more clear at least.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it also require renaming other places in the query_scheduler?
for example, "pending_query_job" will become "pending_query_or_aggregation_job", and "search_config" becomes "search_or_aggregation_config"?

I personally prefer to not rename those, but only change QueryJobType.SEARCH to QueryJobType.SEARCH_OR_AGGREGATION.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's just rename the SEARCH_OR_AGGREGATION and leave everything else. The aggregations are a query of sorts, so I think its fine.


@abstractmethod
def job_config(self) -> QueryConfig: ...


class SearchJob(QueryJob):
search_config: SearchConfig
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think search_config is fine.

Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small questions / suggestions.

Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title & body, how about:

package: Add support for different query job types:

  • Add abstract classes for QueryJobType and QueryJobConfig.
  • Submit and track job type in the search jobs table.
  • Guard search-job specific handling in the scheduler.

@haiqi96 haiqi96 changed the title package: Refactor query job in the search-cluster to support different job types. package: Add support for different query job types Jun 24, 2024
Copy link
Contributor

@gibber9809 gibber9809 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@haiqi96 haiqi96 changed the title package: Add support for different query job types package: Add support for different query job types: Jun 24, 2024
@haiqi96 haiqi96 merged commit 94509a8 into y-scope:main Jun 24, 2024
2 checks passed
@haiqi96 haiqi96 deleted the AddJobTypes branch June 28, 2024 14:43
jackluo923 pushed a commit to jackluo923/clp that referenced this pull request Dec 4, 2024
- Add abstract classes for QueryJobType and QueryJobConfig.
- Submit and track job type in the search jobs table.
- Guard search-job specific handling in the scheduler.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants