Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EPIC] Add spec to source/destination_definition model #6174

Closed
cgardens opened this issue Sep 17, 2021 · 2 comments
Closed

[EPIC] Add spec to source/destination_definition model #6174

cgardens opened this issue Sep 17, 2021 · 2 comments
Assignees
Labels
needs-triage priority/medium Medium priority type/enhancement New feature or request type/refactoring Focused on refactoring.

Comments

@cgardens
Copy link
Contributor

cgardens commented Sep 17, 2021

Tell us about the problem you're trying to solve

We recently added a cache for specs to improve performance in Cloud. While that seems to solve the performance issues, we're running into a new headache around fetching a spec.

For secrets management, the config repository needs access to the spec to be able to identify which fields are secrets (example). Because these specs now are needed even at times when migrations are running having the possibility that trying to get the spec might force us to kick off a job is uncomfortable. We have double checked that for now it is safe, but it remains an unintuitive dependency that can bite us.

Describe the solution you’d like

I think what we're seeing is that we make our lives much easier by keeping the spec in our database in the source/destination_definitions table. This should be safe to do because the spec in the definitions table is scoped by version which should be static for each connector version.

Even for dev images the behavior requires being a little more careful. Each time you change the spec, you would need to re-update the connector. This is a little odd but I would argue is a pretty minor tradeoff for a development use case that we can document clearly (we can even broadcast it in the UI).

I think this can be done resiliently and without any extra burden on users. There are 2 places that would be affected.

Seeded Connectors

We seed connectors into new versions of Airbyte. Right now we just seed the docker container name and version. In addition we should see the spec itself. This information can be published when we publish a new version of a connector (actually it already is published into the cache, so it's no extra work). Then when we build the Airbyte platform it can consume the specs for each version of the connector that it uses.

Manually updating or adding connectors in the UI

The interface exposed to the user can still be exactly the same here. All they have to do is say the docker image name and tag. Then based off of that, we would run get spec on that container to get the spec before saving it into the source/destination definition. This way, everywhere else in the code, we will know we can fetch the spec from the database and not worry about triggering a job in temporal. If the spec is the bucket cache, we can just use that and we do not even need to run the spec method on that container.

I think this is removes a lot of headache and performance problems in the app without being harder for users to use and without being anymore brittle.

Describe the alternative you’ve considered or used

We've done a lot of caching: in memory cache, bucket cache, etc.

To preempt an argument... this is not messing with the Airbyte protocol. That all stays the same. It is affectively a smart caching strategy that our app / platform uses that embraces the Airbyte protocol. Fundamentally it still relies on connectors exposing a get spec endpoint and that that return value is static and immutable for each version of a connector.

Execution

@jrhizor fyi
@sherifnada wdyt?

@cgardens cgardens added the type/enhancement New feature or request label Sep 17, 2021
@sherifnada
Copy link
Contributor

Paying the price of a static spec seems acceptable for the gain.

we should tie the spec to a specific version and not just to the definition wholesale because the spec would need to be updated every time the connector version is changed

@cgardens cgardens added priority/medium Medium priority type/refactoring Focused on refactoring. labels Sep 20, 2021
@cgardens
Copy link
Contributor Author

This would help in this case as well: #6415 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-triage priority/medium Medium priority type/enhancement New feature or request type/refactoring Focused on refactoring.
Projects
None yet
Development

No branches or pull requests

4 participants