[EPIC] Add spec to source/destination_definition model #6174
Labels
needs-triage
priority/medium
Medium priority
type/enhancement
New feature or request
type/refactoring
Focused on refactoring.
Tell us about the problem you're trying to solve
We recently added a cache for specs to improve performance in Cloud. While that seems to solve the performance issues, we're running into a new headache around fetching a spec.
For secrets management, the config repository needs access to the spec to be able to identify which fields are secrets (example). Because these specs now are needed even at times when migrations are running having the possibility that trying to get the spec might force us to kick off a job is uncomfortable. We have double checked that for now it is safe, but it remains an unintuitive dependency that can bite us.
Describe the solution you’d like
I think what we're seeing is that we make our lives much easier by keeping the spec in our database in the source/destination_definitions table. This should be safe to do because the spec in the definitions table is scoped by version which should be static for each connector version.
Even for dev images the behavior requires being a little more careful. Each time you change the spec, you would need to re-update the connector. This is a little odd but I would argue is a pretty minor tradeoff for a development use case that we can document clearly (we can even broadcast it in the UI).
I think this can be done resiliently and without any extra burden on users. There are 2 places that would be affected.
Seeded Connectors
We seed connectors into new versions of Airbyte. Right now we just seed the docker container name and version. In addition we should see the spec itself. This information can be published when we publish a new version of a connector (actually it already is published into the cache, so it's no extra work). Then when we build the Airbyte platform it can consume the specs for each version of the connector that it uses.
Manually updating or adding connectors in the UI
The interface exposed to the user can still be exactly the same here. All they have to do is say the docker image name and tag. Then based off of that, we would run get spec on that container to get the spec before saving it into the source/destination definition. This way, everywhere else in the code, we will know we can fetch the spec from the database and not worry about triggering a job in temporal. If the spec is the bucket cache, we can just use that and we do not even need to run the spec method on that container.
I think this is removes a lot of headache and performance problems in the app without being harder for users to use and without being anymore brittle.
Describe the alternative you’ve considered or used
We've done a lot of caching: in memory cache, bucket cache, etc.
To preempt an argument... this is not messing with the Airbyte protocol. That all stays the same. It is affectively a smart caching strategy that our app / platform uses that embraces the Airbyte protocol. Fundamentally it still relies on connectors exposing a get spec endpoint and that that return value is static and immutable for each version of a connector.
Execution
StandardSourceDefinitionRead
in the API so that the UI does not need to make a separate call to get a spec.@jrhizor fyi
@sherifnada wdyt?
The text was updated successfully, but these errors were encountered: