Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve job list API with more fetching capabilities #16415

Merged
merged 16 commits into from
Sep 12, 2022

Conversation

lmossman
Copy link
Contributor

@lmossman lmossman commented Sep 7, 2022

What

Relates to #16369

As described in the issue linked above, the frontend currently does not have a great way to fetch all jobs up to a specific job ID with the current pagination implementation in the list jobs API endpoint.

This PR aims to solve this by adding a couple of fields to the list jobs API:

  • includingJobId is added to the JobListRequestBody, which will be used to retrieve all jobs created after and including the specified job, so that specific jobs can be linked directly with a URL and the frontend can retrieve all necessary jobs with a single request
  • totalJobCount is added to the JobReadList response, which contains the total count of jobs in the connection. This solves a tangentially-related issue where the frontend does not know if more jobs can be retrieved for a connection until all of them have been retrieved. Adding a total count value to the list jobs response is a standard approach in pagination to help the frontend understand if there are more jobs to load, and therefore decide whether or not to show the Load more button.

How

Add the above fields to the open API specification, and implement the logic for them through queries in the DefaultJobPersistence class. The only additional queries being added here are queries to retrieve a single job record and retrieve the count of jobs for a connection, which should both be very quick.

Recommended reading order

  1. airbyte-api/src/main/openapi/config.yaml - the API changes
  2. DefaultJobPersistence.java - the new query implementations
  3. JobHistoryHandler.java - where the new queries are used to form the new response
  4. tests

🚨 User Impact 🚨

No breaking changes for users - just some new fields being added to the list jobs API which API users can decide to use if they would like.

@github-actions github-actions bot added area/api Related to the api area/documentation Improvements or additions to documentation area/platform issues related to the platform area/scheduler area/server labels Sep 7, 2022
@lmossman lmossman temporarily deployed to more-secrets September 7, 2022 22:13 Inactive
@lmossman lmossman force-pushed the lmossman/add-including-job-to-list-jobs-api branch from cd91a7d to a49f985 Compare September 7, 2022 22:14
@lmossman lmossman marked this pull request as ready for review September 7, 2022 22:15
@lmossman lmossman temporarily deployed to more-secrets September 7, 2022 22:16 Inactive
Copy link
Contributor

@gosusnp gosusnp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review here since I have a question about the API change itself.

@@ -3769,6 +3769,9 @@ components:
$ref: "#/components/schemas/JobConfigType"
configId:
type: string
includingJobId:
description: If the job with this ID exists for the specified connection, returns all jobs created after and including this job, or the full pagination pagesize if that list is smaller than a page. Otherwise, this field is ignored.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels hard to use from a client perspective.
If I call the API using this attribute, the only way to know that the jobId doesn't exist is to check all the results to see if the jobId appears.
I am not sure about your use cases, but returning an empty list if the job doesn't exists feels more straightforward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use case is that someone can link to a particular job by creating a url such as https://cloud.airbyte.io/workspaces/a385d3d9-4512-4a3a-9240-cea3d237c822/connections/aa3b8ef1-2f39-48b8-bad4-dd12b0d3876d/status#575674::0 -- notice the #575674::0 at the end; the first number is the job ID (second is the attempt ID). This type of link can also be generated by clicking the "link" icon button on a job in the UI.

My reasoning here was that if someone correctly supplies a job ID that is in the connection, this will work as expected. Otherwise, they are specifying a job ID that doesn't exist in the connection, so there is nothing for us to do with that #jobId::attemptId suffix, and we should therefore just ignore that and return the normal job list.

However, your comment has me thinking that it might be a better user experience if we instead showed a specific message indicating that the linked job could not be found, with a button to return them to the normal job list page. With that approach, I think your suggestion of returning an empty list of the job cannot be found makes sense, so I can look into making that change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed this to return an empty list if the target job cannot be found

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply. Can I voice the opposite opinion here? :D

I feel if we can't find the ID that's passed in, we should just return the specified page size instead and ignore trying to find that job. I think that would play nicer together with the frontend, since we otherwise might want to add additional logic in the FE.

The ID will just be part of the fragment of the URL, a user might accidentally change that to an invalid ID. If you try to load the page like that, we'd now just see an empty page with a "Load more" button. That feels like a bit weird behavior to me, and I think we rather want to just load the 25 first jobs still in this case. We can of course address that in the FE by making another API call if we see the list is empty, but had a totalCount that's > 0, but I feel it would be easier specifying this API in a way that it will return the regular page size if the ID isn't found.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timroes I have actually already put up FE changes to show a message like The linked job could not be found with a link that takes them back to the normal sync history page: #16517
It didn't end up being too much more complex to do this

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even better UX. Ignore my comment then

@Override
public Long getJobCount(final Set<ConfigType> configTypes, final String connectionId) throws IOException {
final Result<Record> result = jobDatabase.query(ctx -> ctx.fetch(
"SELECT COUNT(*) FROM jobs WHERE CAST(config_type AS VARCHAR) in " + Sqls.toSqlInFragment(configTypes) + " AND scope = '" + connectionId
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why we are not using jooq to generate the query here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not any particularly good reason -- I was mainly just following the trend of the other job query implementations. But using jooq is definitely more maintainable, so I can switch to that instead 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've refactored the full queries that I added into jooq and they look a lot better now. Thanks for the suggestion!

@lmossman lmossman requested a review from a team as a code owner September 8, 2022 00:31
@github-actions github-actions bot added the area/frontend Related to the Airbyte webapp label Sep 8, 2022
@lmossman lmossman requested review from gosusnp and removed request for a team September 8, 2022 00:32
@lmossman lmossman temporarily deployed to more-secrets September 8, 2022 00:33 Inactive
@github-actions github-actions bot removed the area/frontend Related to the Airbyte webapp label Sep 8, 2022
@lmossman lmossman temporarily deployed to more-secrets September 8, 2022 00:37 Inactive
@lmossman lmossman temporarily deployed to more-secrets September 8, 2022 00:53 Inactive
@lmossman lmossman force-pushed the lmossman/add-including-job-to-list-jobs-api branch from 8f080cf to d5a946c Compare September 8, 2022 01:06
@lmossman lmossman temporarily deployed to more-secrets September 8, 2022 01:09 Inactive
Copy link
Contributor

@jdpgrailsdev jdpgrailsdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@@ -3769,6 +3769,9 @@ components:
$ref: "#/components/schemas/JobConfigType"
configId:
type: string
includingJobId:
description: If the job with this ID exists for the specified connection, returns all jobs created after and including this job, or the full pagination pagesize if that list is smaller than a page. Returns an empty list if this job is specified and cannot be found in this connection.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rename it starting jobId? It sounds more descriptive than includingJobId

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I actually like startingJobId, I think that is better. I'll make that change 👍

}

// list of jobs up to target job is larger than pagesize, so return that list
final String jobsSubquery = "(SELECT * FROM jobs WHERE CAST(config_type AS VARCHAR) in " + Sqls.toSqlInFragment(configTypes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we convert that to jooq as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that one is a bit harder to convert to jooq, because it is using the jobSelectAndJoin() method which takes in the subquery and inserts that into the SELECT statement string that it builds. However, I may be able to build the query with jooq and then call something like toSql() to convert it into a string that can be passed to that method. I'll play around with that for a bit but no guarantees on if I can get it working properly

Copy link
Contributor Author

@lmossman lmossman Sep 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to convert these to jooq using getSql(ParamType.INLINED), so I have pushed that up to this PR as well! Thanks for the suggestion, I think it looks a lot cleaner now

@@ -353,6 +362,39 @@ public List<Job> listJobs(final Set<ConfigType> configTypes, final String config
jobSelectAndJoin(jobsSubquery) + ORDER_BY_JOB_TIME_ATTEMPT_TIME)));
}

@Override
public List<Job> listJobsIncludingId(final Set<ConfigType> configTypes, final String connectionId, final long targetJobId, final int pagesize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have use case for listJobsWithStatusIncludingJobId? I am asking because it feels like we may just want to have a configurable query function rather than specific queries each time.
Some parts such as pagination should be common for example.
If this make sense, we can track it as a task later, doesn't have to be in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe we have a use case for that today -- currently the only thing using listJobsWithStatus is the JobCleaner which I don't think is even running, and it doesn't have the same need of directly linking to a specific job that the frontend has

@lmossman lmossman temporarily deployed to more-secrets September 8, 2022 18:20 Inactive
@lmossman lmossman temporarily deployed to more-secrets September 8, 2022 19:34 Inactive
@lmossman lmossman temporarily deployed to more-secrets September 8, 2022 22:53 Inactive
@lmossman lmossman temporarily deployed to more-secrets September 9, 2022 00:44 Inactive
@lmossman lmossman force-pushed the lmossman/add-including-job-to-list-jobs-api branch from c57a400 to ed71403 Compare September 9, 2022 17:57
@lmossman lmossman temporarily deployed to more-secrets September 9, 2022 18:00 Inactive
@lmossman lmossman temporarily deployed to more-secrets September 9, 2022 18:00 Inactive
@lmossman lmossman temporarily deployed to more-secrets September 12, 2022 21:17 Inactive
@lmossman lmossman merged commit a15288a into master Sep 12, 2022
@lmossman lmossman deleted the lmossman/add-including-job-to-list-jobs-api branch September 12, 2022 23:32
robbinhan pushed a commit to robbinhan/airbyte that referenced this pull request Sep 29, 2022
* start implementation of new persistence method

* add includingJobId and totalJobCount to job list request

* format

* update local openapi as well

* refactor queries into JOOQ and return empty list if target job cannot be found

* fix descriptions and undo changes from other branch

* switch including job to starting job

* fix job history handler tests

* rewrite jobs subqueries in jooq

* fix multiple config type querying

* remove unnecessary casts

* switch back to 'including' and return multiple of page size necessary to include job

* undo webapp changes

* fix test description

* format
jhammarstedt pushed a commit to jhammarstedt/airbyte that referenced this pull request Oct 31, 2022
* start implementation of new persistence method

* add includingJobId and totalJobCount to job list request

* format

* update local openapi as well

* refactor queries into JOOQ and return empty list if target job cannot be found

* fix descriptions and undo changes from other branch

* switch including job to starting job

* fix job history handler tests

* rewrite jobs subqueries in jooq

* fix multiple config type querying

* remove unnecessary casts

* switch back to 'including' and return multiple of page size necessary to include job

* undo webapp changes

* fix test description

* format
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api Related to the api area/documentation Improvements or additions to documentation area/platform issues related to the platform area/scheduler area/server
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants