Improve job list API with more fetching capabilities #16415

lmossman · 2022-09-07T22:07:29Z

What

Relates to #16369

As described in the issue linked above, the frontend currently does not have a great way to fetch all jobs up to a specific job ID with the current pagination implementation in the list jobs API endpoint.

This PR aims to solve this by adding a couple of fields to the list jobs API:

includingJobId is added to the JobListRequestBody. It retrieves all jobs created after and including the specified job, so that a specific job can be linked directly with a URL and the frontend can retrieve all necessary jobs with a single request.
totalJobCount is added to the JobReadList response and contains the total count of jobs in the connection. This solves a tangentially related issue where the frontend does not know whether more jobs can be retrieved for a connection until all of them have been retrieved. Returning a total count in a list response is a standard pagination approach that lets the frontend tell whether there are more jobs to load, and therefore decide whether to show the Load more button.
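
To make the totalJobCount use case concrete, here is a minimal sketch of the check a client could run after each list response. The class and method names (JobListPaging, hasMoreJobs) are hypothetical and not part of this PR, and the real frontend is not written in Java; the sketch only shows the comparison.

```java
// Hypothetical helper, not part of this PR: decides whether a client
// should offer a "Load more" action after a list jobs response.
final class JobListPaging {

  private JobListPaging() {}

  /**
   * Returns true when the server-reported total (totalJobCount) is larger
   * than the number of jobs the client has already loaded.
   */
  static boolean hasMoreJobs(final long totalJobCount, final int loadedJobCount) {
    if (totalJobCount < 0 || loadedJobCount < 0) {
      throw new IllegalArgumentException("counts must be non-negative");
    }
    return loadedJobCount < totalJobCount;
  }
}
```

With this check the client no longer needs to request pages until one comes back short just to learn that the list is exhausted.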

How

Add the above fields to the OpenAPI specification, and implement their logic through queries in the DefaultJobPersistence class. The only additional queries are one that retrieves a single job record and one that retrieves the count of jobs for a connection, both of which should be very quick.

🚨 User Impact 🚨

No breaking changes for users - just some new fields being added to the list jobs API which API users can decide to use if they would like.

gosusnp

Partial review here since I have a question about the API change itself.

gosusnp · 2022-09-07T22:36:12Z

airbyte-api/src/main/openapi/config.yaml

@@ -3769,6 +3769,9 @@ components:
            $ref: "#/components/schemas/JobConfigType"
        configId:
          type: string
+        includingJobId:
+          description: If the job with this ID exists for the specified connection, returns all jobs created after and including this job, or the full pagination pagesize if that list is smaller than a page. Otherwise, this field is ignored.


This feels hard to use from a client perspective.
If I call the API using this attribute, the only way to know that the jobId doesn't exist is to check all the results to see if the jobId appears.
I am not sure about your use cases, but returning an empty list if the job doesn't exist feels more straightforward.

The use case is that someone can link to a particular job by creating a url such as https://cloud.airbyte.io/workspaces/a385d3d9-4512-4a3a-9240-cea3d237c822/connections/aa3b8ef1-2f39-48b8-bad4-dd12b0d3876d/status#575674::0 -- notice the #575674::0 at the end; the first number is the job ID (second is the attempt ID). This type of link can also be generated by clicking the "link" icon button on a job in the UI.

My reasoning here was that if someone correctly supplies a job ID that is in the connection, this will work as expected. Otherwise, they are specifying a job ID that doesn't exist in the connection, so there is nothing for us to do with that #jobId::attemptId suffix, and we should therefore just ignore that and return the normal job list.

However, your comment has me thinking that it might be a better user experience if we instead showed a specific message indicating that the linked job could not be found, with a button to return them to the normal job list page. With that approach, I think your suggestion of returning an empty list if the job cannot be found makes sense, so I can look into making that change.
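
For illustration, the #jobId::attemptId suffix described above could be parsed along these lines before the job ID is passed to the list jobs endpoint. This is only a sketch: JobLinkFragment and parseJobId are hypothetical names, and the actual parsing lives in the webapp, not in Java.

```java
import java.util.OptionalLong;

// Hypothetical helper, not part of this PR: extracts the job ID from a
// URL fragment of the form "#<jobId>::<attemptId>", e.g. "#575674::0".
final class JobLinkFragment {

  private JobLinkFragment() {}

  /** Returns the job ID, or empty if the fragment is missing or malformed. */
  static OptionalLong parseJobId(final String fragment) {
    if (fragment == null) {
      return OptionalLong.empty();
    }
    // Drop the leading '#' if the caller passed the raw fragment.
    final String body = fragment.startsWith("#") ? fragment.substring(1) : fragment;
    final String[] parts = body.split("::", -1);
    if (parts.length != 2) {
      return OptionalLong.empty();
    }
    try {
      final long jobId = Long.parseLong(parts[0]);
      Integer.parseInt(parts[1]); // the attempt ID must be numeric as well
      return jobId >= 0 ? OptionalLong.of(jobId) : OptionalLong.empty();
    } catch (final NumberFormatException e) {
      return OptionalLong.empty();
    }
  }
}
```

A fragment that fails to parse never reaches the API, so the empty-list case discussed here only arises for a well-formed ID that does not belong to the connection.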

I've changed this to return an empty list if the target job cannot be found

Sorry for the late reply. Can I voice the opposite opinion here? :D

I feel if we can't find the ID that's passed in, we should just return the specified page size instead and ignore trying to find that job. I think that would play nicer together with the frontend, since we otherwise might want to add additional logic in the FE.

The ID will just be part of the fragment of the URL, so a user might accidentally change it to an invalid ID. If you try to load the page like that, we'd now just see an empty page with a "Load more" button. That feels like slightly weird behavior to me, and I think we would rather just load the first 25 jobs in this case. We can of course address that in the FE by making another API call if we see the list is empty but the totalCount is > 0, but I feel it would be easier to specify this API so that it returns the regular page size if the ID isn't found.

@timroes I have actually already put up FE changes to show a message like The linked job could not be found with a link that takes them back to the normal sync history page: #16517
It didn't end up being too much more complex to do this

Even better UX. Ignore my comment then
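
The behavior settled on in this thread can be sketched in memory as follows. This is not the DefaultJobPersistence implementation: it works on a plain list of job IDs rather than SQL, the class name is hypothetical, and the rounding up to a whole number of pages follows one reading of the later commit title "switch back to 'including' and return multiple of page size necessary to include job".

```java
import java.util.List;

// In-memory sketch only; the real method queries the jobs table.
final class IncludingJobIdSketch {

  private IncludingJobIdSketch() {}

  /**
   * @param jobIdsNewestFirst all job IDs of one connection, newest first (assumed ordering)
   * @param targetJobId the job a direct link points at
   * @param pagesize the regular pagination page size
   * @return an empty list if the target job is not in the connection; otherwise the
   *         newest jobs up to and including the target, padded to a whole number of pages
   */
  static List<Long> listJobsIncludingId(final List<Long> jobIdsNewestFirst, final long targetJobId, final int pagesize) {
    if (pagesize <= 0) {
      throw new IllegalArgumentException("pagesize must be positive");
    }
    final int index = jobIdsNewestFirst.indexOf(targetJobId);
    if (index < 0) {
      // Target job not found: signal this with an empty list.
      return List.of();
    }
    final int needed = index + 1; // jobs required to reach the target
    final int pages = (needed + pagesize - 1) / pagesize; // round up to whole pages
    final int limit = Math.min(pages * pagesize, jobIdsNewestFirst.size());
    return List.copyOf(jobIdsNewestFirst.subList(0, limit));
  }
}
```

Returning whole pages keeps subsequent "Load more" requests aligned with the normal page boundaries, which is presumably why the multiple-of-page-size variant was chosen.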

gosusnp · 2022-09-07T22:40:26Z

...eduler-persistence/src/main/java/io/airbyte/scheduler/persistence/DefaultJobPersistence.java

+  @Override
+  public Long getJobCount(final Set<ConfigType> configTypes, final String connectionId) throws IOException {
+    final Result<Record> result = jobDatabase.query(ctx -> ctx.fetch(
+        "SELECT COUNT(*) FROM jobs WHERE CAST(config_type AS VARCHAR) in " + Sqls.toSqlInFragment(configTypes) + " AND scope = '" + connectionId


Any reason why we are not using jooq to generate the query here?

Not any particularly good reason -- I was mainly just following the trend of the other job query implementations. But using jooq is definitely more maintainable, so I can switch to that instead 👍

I've refactored the full queries that I added into jooq and they look a lot better now. Thanks for the suggestion!

jdpgrailsdev

benmoriceau · 2022-09-08T16:49:50Z

airbyte-api/src/main/openapi/config.yaml

@@ -3769,6 +3769,9 @@ components:
            $ref: "#/components/schemas/JobConfigType"
        configId:
          type: string
+        includingJobId:
+          description: If the job with this ID exists for the specified connection, returns all jobs created after and including this job, or the full pagination pagesize if that list is smaller than a page. Returns an empty list if this job is specified and cannot be found in this connection.


Should we rename it to startingJobId? It sounds more descriptive than includingJobId.

Yeah I actually like startingJobId, I think that is better. I'll make that change 👍

benmoriceau · 2022-09-08T16:53:53Z

...eduler-persistence/src/main/java/io/airbyte/scheduler/persistence/DefaultJobPersistence.java

+    }
+
+    // list of jobs up to target job is larger than pagesize, so return that list
+    final String jobsSubquery = "(SELECT * FROM jobs WHERE CAST(config_type AS VARCHAR) in " + Sqls.toSqlInFragment(configTypes)


Should we convert that to jooq as well?

I think that one is a bit harder to convert to jooq, because it is using the jobSelectAndJoin() method, which takes in the subquery and inserts it into the SELECT statement string that it builds. However, I may be able to build the query with jooq and then call something like toSql() to convert it into a string that can be passed to that method. I'll play around with that for a bit, but no guarantees that I can get it working properly.

I was able to convert these to jooq using getSQL(ParamType.INLINED), so I have pushed that up to this PR as well! Thanks for the suggestion, I think it looks a lot cleaner now.

gosusnp · 2022-09-08T17:56:20Z

...eduler-persistence/src/main/java/io/airbyte/scheduler/persistence/DefaultJobPersistence.java

@@ -353,6 +362,39 @@ public List<Job> listJobs(final Set<ConfigType> configTypes, final String config
        jobSelectAndJoin(jobsSubquery) + ORDER_BY_JOB_TIME_ATTEMPT_TIME)));
  }

+  @Override
+  public List<Job> listJobsIncludingId(final Set<ConfigType> configTypes, final String connectionId, final long targetJobId, final int pagesize)


Do we not have a use case for listJobsWithStatusIncludingJobId? I am asking because it feels like we may just want a configurable query function rather than a specific query each time.
Some parts, such as pagination, should be common.
If this makes sense, we can track it as a task later; it doesn't have to be in this PR.

I don't believe we have a use case for that today -- currently the only thing using listJobsWithStatus is the JobCleaner which I don't think is even running, and it doesn't have the same need of directly linking to a specific job that the frontend has

github-actions bot added the area/api, area/documentation, area/platform, area/scheduler and area/server labels Sep 7, 2022

lmossman temporarily deployed to more-secrets September 7, 2022 22:13 Inactive

lmossman force-pushed the lmossman/add-including-job-to-list-jobs-api branch from cd91a7d to a49f985 Compare September 7, 2022 22:14

lmossman marked this pull request as ready for review September 7, 2022 22:15

lmossman requested review from timroes, alovew, benmoriceau, gosusnp and jdpgrailsdev September 7, 2022 22:15

lmossman temporarily deployed to more-secrets September 7, 2022 22:16 Inactive

gosusnp reviewed Sep 7, 2022

lmossman requested a review from a team as a code owner September 8, 2022 00:31

github-actions bot added the area/frontend label Sep 8, 2022

lmossman requested review from gosusnp and removed request for a team September 8, 2022 00:32

lmossman temporarily deployed to more-secrets September 8, 2022 00:33 Inactive

github-actions bot removed the area/frontend label Sep 8, 2022

lmossman temporarily deployed to more-secrets September 8, 2022 00:37 Inactive

lmossman temporarily deployed to more-secrets September 8, 2022 00:53 Inactive

lmossman force-pushed the lmossman/add-including-job-to-list-jobs-api branch from 8f080cf to d5a946c Compare September 8, 2022 01:06

lmossman temporarily deployed to more-secrets September 8, 2022 01:09 Inactive

jdpgrailsdev approved these changes Sep 8, 2022

benmoriceau reviewed Sep 8, 2022

gosusnp approved these changes Sep 8, 2022

lmossman temporarily deployed to more-secrets September 8, 2022 18:20 Inactive

lmossman temporarily deployed to more-secrets September 8, 2022 19:34 Inactive

lmossman temporarily deployed to more-secrets September 8, 2022 22:53 Inactive

gosusnp approved these changes Sep 8, 2022

lmossman temporarily deployed to more-secrets September 9, 2022 00:44 Inactive

lmossman mentioned this pull request Sep 9, 2022

🪟 🐛 Fix direct job linking to work with pagination #16517

Merged

lmossman added 14 commits September 9, 2022 10:56

start implementation of new persistence method

8eb89b8

add includingJobId and totalJobCount to job list request

88a99d8

format

9002461

update local openapi as well

80c213e

refactor queries into JOOQ and return empty list if target job cannot be found

aeaffad

fix descriptions and undo changes from other branch

df71109

switch including job to starting job

d68fc5a

fix job history handler tests

ac750d9

rewrite jobs subqueries in jooq

ef2d6dc

fix multiple config type querying

6419233

remove unnecessary casts

84b8dbe

switch back to 'including' and return multiple of page size necessary to include job

5d53ab7

undo webapp changes

5a5adea

fix test description

ed71403

lmossman force-pushed the lmossman/add-including-job-to-list-jobs-api branch from c57a400 to ed71403 Compare September 9, 2022 17:57

format

c991275

lmossman temporarily deployed to more-secrets September 9, 2022 18:00 Inactive

Merge branch 'master' into lmossman/add-including-job-to-list-jobs-api

09d8b92

lmossman temporarily deployed to more-secrets September 12, 2022 21:17 Inactive

lmossman merged commit a15288a into master Sep 12, 2022

lmossman deleted the lmossman/add-including-job-to-list-jobs-api branch September 12, 2022 23:32

octavia-squidington-iii mentioned this pull request Sep 13, 2022

Bump Airbyte version from 0.40.5 to 0.40.6 #16656

Merged


gosusnp left a comment

gosusnp Sep 7, 2022

lmossman Sep 7, 2022

lmossman Sep 8, 2022

timroes Sep 9, 2022

lmossman Sep 9, 2022

timroes Sep 9, 2022

gosusnp Sep 7, 2022

lmossman Sep 7, 2022

lmossman Sep 8, 2022

jdpgrailsdev left a comment

benmoriceau Sep 8, 2022

lmossman Sep 8, 2022

benmoriceau Sep 8, 2022

lmossman Sep 8, 2022

lmossman Sep 8, 2022 •

edited

Loading

gosusnp Sep 8, 2022

lmossman Sep 8, 2022
