Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

suggestion: save state from temporal temporal #7253

Conversation

cgardens
Copy link
Contributor

@cgardens cgardens commented Oct 21, 2021

Branching off this PR and thinking about a different approach: #7219

Main things I'm trying to focus on:

  1. save the state from temporal instead the job submitter. more robust to transient db failures because we can retry inside temporal (but we can't in the job submitter)
  2. do not add config database as a dependency in the DefaultJobPersistence. I think it's important to keep these databases separate in the code.
  3. Handling making non-document-database-style queries in the configs database. There are two choices here, and I'm pretty ambivalent between them. See options below.

Option 1: Separate Abstraction for "complex" queries in the configs db.

In this PR, I added a new persistence abstraction for the config db, called ConfigPersistence2 (just ignore the bad naming for now, if we go in this direction we can fix it). The new abstraction is akin to what we do for the JobsPersistence, it takes in the database and then all the queries we write against that db can go in there. If we go with this approach we will have 3 abstractions.
* ConfigPersistence - document-database-style interface over the config database.
* ConfigRepository - syntactic sugar over ConfigPersistence
* ConfigPersistence2 - all non document-database-style queries.
* as we normalize the config database schema, ConfigPersistence2 will get bigger and ConfigPersistence will get deemphasized.

Option 2: treat sync state like we do other objects in the configs db

we just add the sync state "table" the airbyte_configs table and have it look the same as all of the other configs. Then we worry about normalizing it later when we do it for the rest of the configs.

If we think Option2 is the right thing for now, that's fine with me too. Option 1 leaves us in a halfsey state and if we are not going to normalize the schemas soon, then that's probably a bad thing.

@github-actions github-actions bot added area/core area/worker Related to worker labels Oct 21, 2021
@tuliren
Copy link
Contributor

tuliren commented Oct 21, 2021

Charles and I talked about this, and we decided to go with option 2. I will merge it into my PR and update it accordingly.

@cgardens cgardens added area/platform issues related to the platform and removed area/core labels Oct 23, 2021
@tuliren tuliren merged commit bba689a into liren/copy-job-attempt-state-take-2 Oct 24, 2021
@tuliren tuliren deleted the cgardens/copy-job-attempt-state-take-2 branch October 24, 2021 16:18
@tuliren
Copy link
Contributor

tuliren commented Oct 24, 2021

@cgardens, to go further on option 2, we can just add a state field in StandardSync, instead of adding a new config schema type. I think that's the most straightforward way before the normalization. WDYT?

@cgardens
Copy link
Contributor Author

i think it is worth keeping them separate because we know the lifecycle of the objects is different. StandardSync is a configuration object that gets updated via user interaction. The state gets updated by the runnings syncs.

tuliren added a commit that referenced this pull request Oct 26, 2021
* Add migration to create latest state table

* Log migration name

* Expose db variables to airbyte-db

* Implement migration

* Fix migration test

* temp

* Rebase on master

* Save state in temporal (#7253)

* Copy state to airbyte_configs table

* Add standard sync state

* Move state methods to config repository

* Add unit tests

* Fix unit tests

* Register standard sync state in migration

* Add comment

* Use config model instead of json node

* Add comments

* Remove unnecessary method

* Fix migration query

* Remove unused config database

* Move persist statement and log the call

* Update dev doc

* Add unit tests for sync workflow

Co-authored-by: Charles <giardina.charles@gmail.com>
schlattk pushed a commit to schlattk/airbyte that referenced this pull request Jan 4, 2022
* Add migration to create latest state table

* Log migration name

* Expose db variables to airbyte-db

* Implement migration

* Fix migration test

* temp

* Rebase on master

* Save state in temporal (airbytehq#7253)

* Copy state to airbyte_configs table

* Add standard sync state

* Move state methods to config repository

* Add unit tests

* Fix unit tests

* Register standard sync state in migration

* Add comment

* Use config model instead of json node

* Add comments

* Remove unnecessary method

* Fix migration query

* Remove unused config database

* Move persist statement and log the call

* Update dev doc

* Add unit tests for sync workflow

Co-authored-by: Charles <giardina.charles@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/platform issues related to the platform area/worker Related to worker
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants