-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add deployments to tracking model #4837
Conversation
...eduler/persistence/src/main/java/io/airbyte/scheduler/persistence/DefaultJobPersistence.java
Outdated
Show resolved
Hide resolved
/** | ||
* deploymentId - identifier for the deployment | ||
*/ | ||
private final UUID deploymentId; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opinion: I think it's useful to leave a comment here summarising how deploymetId -> workspace tie together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe instanceId
? I think instance
has been the most common way to refer to the concept of a deployment
(and that's also reflected in our docs). Either is fine with me, just don't want to diverge naming. Feels like it's an AirbyteInstance
with an instanceId
and deploymentMode
/deploymentEnvironment
(vars that describe the deployment of the instance with that id).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed with @jrhizor offline. our goal here is to make sure the data model matches naming in the docs. we are keeping deployment, but i updated naming in docs to match deployment.
/** | ||
* Returns a deployment UUID. | ||
*/ | ||
Optional<UUID> getDeployment() throws IOException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leaving the deployment id -> all workspaces for a deployment comment here is also fine.
@@ -330,6 +338,36 @@ public void importDatabaseFromArchive(final Path storageRoot, final String airby | |||
} | |||
} | |||
|
|||
/** | |||
* The deployment concept is specific to the environment that Airbyte is running in (not the data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
amazing. I was just wondering why.
@@ -176,6 +179,18 @@ public void configure() { | |||
server.join(); | |||
} | |||
|
|||
private static void createDeploymentIfNoneExists(final JobPersistence jobPersistence) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to make sure I understand: it makes sense to have the server do this because the scheduler's start up depends on the server. this way, we know the scheduler can access the deployment id as soon as it starts up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah. at least how we have everything structured now, we know that the main method of the server runs first and scheduler waits for the database to be in a ready state (in other words fully set up by the sever) before it does anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which is actually a great reminder... because now the the tracker depends on the database for the deployment id, so we cannot initialize the tracking client until after busy loop that waits for the db to be set up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment - looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good overall. just some comments on naming.
/** | ||
* deploymentId - identifier for the deployment | ||
*/ | ||
private final UUID deploymentId; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe instanceId
? I think instance
has been the most common way to refer to the concept of a deployment
(and that's also reflected in our docs). Either is fine with me, just don't want to diverge naming. Feels like it's an AirbyteInstance
with an instanceId
and deploymentMode
/deploymentEnvironment
(vars that describe the deployment of the instance with that id).
...eduler/persistence/src/main/java/io/airbyte/scheduler/persistence/DefaultJobPersistence.java
Outdated
Show resolved
Hide resolved
171c035
to
a60ccd9
Compare
deployment clean clean up key/value; add deployment mode add coment explaining deployment id lifecycle move client initialization so that it will include deployment id unit test fix key case docs instance => deployment fix setdeployment method
3706f57
to
129ca8f
Compare
configs.getTrackingStrategy(), | ||
// todo (cgardens) - we need to do the `#runServer` pattern here that we do in `ServerApp` so that | ||
// the deployment mode can be set by the cloud version. | ||
new Deployment(DeploymentMode.OSS, jobPersistence.getDeployment().orElseThrow(), configs.getWorkerEnvironment()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jrhizor i did this heinous thing to get around the merge conflict. Obviously this is not how we want to set deployment mode. do you have thoughts on the right way to inject it? i'm still wrapping my head around the new factory.
throws IOException { | ||
// filter out the deployment record from the import data, if it exists. | ||
Stream<JsonNode> stream = metadataTableStream | ||
.filter(record -> record.get(DefaultJobPersistence.METADATA_KEY_COL).asText().equals(DatabaseSchema.AIRBYTE_METADATA.toString())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cgardens, this line looks problematic. The metadata records are like this:
Record: {"key":"server_uuid","value":"e895a584-7dbf-48ce-ace6-0bc9ea570c34"}
Record: {"key":"deployment_id","value":"55b923a9-42e5-4b0d-a5d2-b8d0316bfb2b"}
Record: {"key":"airbyte_version","value":"dev"}
Record: {"key":"2021-07-26T03:34:31.996221Z_init_db","value":"dev"}
The METADATA_KEY_COL
column is never AIRBYTE_METADATA
. So none of the metadata will be imported. Based on the comment, what you want is the following, right?
.filter(record -> record.get(DefaultJobPersistence.METADATA_KEY_COL).asText().equals(DatabaseSchema.AIRBYTE_METADATA.toString())); | |
.filter(record -> !record.get(DefaultJobPersistence.METADATA_KEY_COL).asText().equals(DEPLOYMENT_ID_KEY)); |
I think this is going to cause a severe problem for anyone that tries to import their configs. Right now the database connection depends on the existence of the service_uuid
metadata. If that record does not exist, the server and scheduler will be waiting for the connection forever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix here: #4977.
depends on #4797
(arguably) blocks #4716
What
How
ServerApp.java
)ConfigDumpImporter.java
)to do