deprecate DEFAULT_WORKSPACE_ID #5009

cgardens · 2021-07-27T01:37:48Z

What


How

In the main method of the ServerApp, create a workspace if no workspace exists. This replaces the existing behavior, where a default workspace that existed but had not yet been assigned a customer id (for Segment tracking) was assigned one at startup. The customer id is now assigned as part of creating the workspace.
Remove the workspace from the seeds in airbyte-config:init.
Add workspaceId to the TrackingClient interface, so that tracking can fetch the appropriate workspace id.
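A minimal sketch of the first and third items, using simplified in-memory stand-ins rather than the real Airbyte classes. `Workspace`, `WorkspaceStore`, `createWorkspaceIfNoneExists`, the `track` signature, and the `server_started` action are all hypothetical names chosen for illustration; only the idea that tracking calls carry a workspace id comes from this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Illustrative sketch only: these types are simplified, hypothetical stand-ins,
// not the actual Airbyte classes.
class WorkspaceBootstrapSketch {

  // Minimal workspace: its own id plus the customer id used for tracking.
  static final class Workspace {
    final UUID workspaceId;
    final UUID customerId;

    Workspace(final UUID workspaceId, final UUID customerId) {
      this.workspaceId = workspaceId;
      this.customerId = customerId;
    }
  }

  // Assumed shape: every tracking call names the workspace it belongs to.
  interface TrackingClient {
    void track(UUID workspaceId, String action);
  }

  // In-memory stand-in for the config persistence layer.
  static final class WorkspaceStore {
    private final List<Workspace> workspaces = new ArrayList<>();

    List<Workspace> list() {
      return new ArrayList<>(workspaces);
    }

    void write(final Workspace workspace) {
      workspaces.add(workspace);
    }
  }

  // Meant to run once at server startup. Creates a workspace only when none exists,
  // and assigns the customer id at creation time instead of patching it in later.
  static Workspace createWorkspaceIfNoneExists(final WorkspaceStore store) {
    final List<Workspace> existing = store.list();
    if (!existing.isEmpty()) {
      return existing.get(0);
    }
    final Workspace created = new Workspace(UUID.randomUUID(), UUID.randomUUID());
    store.write(created);
    return created;
  }

  public static void main(final String[] args) {
    final WorkspaceStore store = new WorkspaceStore();
    final Workspace workspace = createWorkspaceIfNoneExists(store);
    final TrackingClient client =
        (workspaceId, action) -> System.out.println(workspaceId + " " + action);
    client.track(workspace.workspaceId, "server_started");
  }
}
```

Calling `createWorkspaceIfNoneExists` a second time returns the existing workspace, so the startup step is safe to repeat.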

Note

We need to make a trade-off here. Because we are opting not to add an "organization"-level concept now, the tracking story gets a little complicated. The rest of this section assumes that we are not adding an organization concept now. @michel-tricot & @jrhizor, I would love your thoughts on this as a sanity check that I haven't overlooked some better option.

Option 1: (this is this PR as currently implemented)

In this approach the workspace becomes our core tracking identifier, because we are choosing not to have a formal way of connecting workspaces to each other. This gets weird in a few cases:

When users have multiple workspaces in the OSS project.
When users do the old manual upgrade process where the old workspace was replaced by the new one (and aliased in amplitude). That's not really reasonable to think about anymore.

For our metrics in BQ we can build up a user identity by assuming that all workspaces that have ever been on the same deployment are the same user. I am not sure how realistic this is in Amplitude. If it is not, we would have some double counting of users in some metrics, because we would assume each workspace is a separate user.
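To make the double-counting concern concrete, here is a small sketch that counts users two ways from the same data. The notion of a "deployment id" and the map from workspace to deployment are assumptions made for illustration; the PR does not say how a deployment would be identified.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: the workspace and deployment ids are made-up strings.
class UserCountSketch {

  // Treats every workspace as a separate user (what naive per-workspace tracking would do).
  static int countUsersByWorkspace(final Map<String, String> workspaceToDeployment) {
    return workspaceToDeployment.size();
  }

  // Treats all workspaces ever seen on the same deployment as one user.
  static int countUsersByDeployment(final Map<String, String> workspaceToDeployment) {
    return new HashSet<>(workspaceToDeployment.values()).size();
  }

  public static void main(final String[] args) {
    final Map<String, String> workspaceToDeployment = new LinkedHashMap<>();
    workspaceToDeployment.put("workspace-a", "deployment-1");
    workspaceToDeployment.put("workspace-b", "deployment-1");
    workspaceToDeployment.put("workspace-c", "deployment-2");

    System.out.println("by workspace: " + countUsersByWorkspace(workspaceToDeployment));
    System.out.println("by deployment: " + countUsersByDeployment(workspaceToDeployment));
  }
}
```

The gap between the two counts is the overcount that appears when a single user runs several workspaces.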

Option 2: Restrict multiple workspace in the OSS product

The only other thing I just thought of is saying that in the OSS version we allow one workspace. That should resolve, I think, all of these problems. Potentially just keeping this limitation in the UI will be good enough (I think there was only one user trying to use multiple workspace support via the API and I think they aren't doing it anywhere). In the cases that someone has used the API to have multiple workspaces, the metrics will be wrong, but we will assume it is small enough because it is API-only to not affect things. While not ideal long term this probably could get us by until we add an organization concept.

In this case, we can pretty much follow the protocol we already have on master. Basically whenever we use the import endpoint we alias any existing workspaces to whatever the first workspace is in the import. Again in the case where someone has more than one workspace from setting up in the API that'll be a little funky, but it should be incredibly rare and cause only small errors.
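The aliasing rule described above can be sketched as follows. This is an assumption-laden illustration, not the code on master: `AliasClient` and `aliasToFirstImported` are hypothetical names, and the real import path works with workspace configs rather than bare id lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Illustrative sketch only: AliasClient is a hypothetical stand-in for whatever
// sends alias calls to the analytics backend.
class ImportAliasSketch {

  interface AliasClient {
    void alias(UUID previousCustomerId, UUID newCustomerId);
  }

  // Aliases every pre-existing customer id to the first customer id found in the import.
  // Does nothing when the import contains no workspaces.
  static void aliasToFirstImported(
      final List<UUID> existingCustomerIds,
      final List<UUID> importedCustomerIds,
      final AliasClient client) {
    if (importedCustomerIds.isEmpty()) {
      return;
    }
    final UUID target = importedCustomerIds.get(0);
    for (final UUID existing : existingCustomerIds) {
      // Skip the no-op case where the imported workspace is the one we already had.
      if (!existing.equals(target)) {
        client.alias(existing, target);
      }
    }
  }

  public static void main(final String[] args) {
    final List<UUID> existing = new ArrayList<>();
    existing.add(UUID.randomUUID());
    final List<UUID> imported = new ArrayList<>();
    imported.add(UUID.randomUUID());
    imported.add(UUID.randomUUID());

    aliasToFirstImported(existing, imported,
        (previous, next) -> System.out.println("alias " + previous + " -> " + next));
  }
}
```

With more than one imported workspace, only the first is used as the alias target, which is the source of the small errors mentioned above.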

Doing this doesn't seem like that much additional work on top of what's already in this PR. While it's a hack, I at least feel like the metrics will be very accurate. In the multiple-workspace world in OSS, I think we might totally slag our metrics.

Pre-merge Checklist

- Need to write a migration that takes every workspace id equal to DEFAULT_WORKSPACE_ID and generates a new UUID for it. I will likely do this as a separate PR, as it is easy to review separately.
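A rough sketch of what the core of such a migration might look like, assuming the workspace id references have already been loaded into a list. The method name is hypothetical and the default id used in `main` is a placeholder, not the real constant. The one design point it shows is that the replacement UUID is generated once and reused, so every record that pointed at the default workspace still points at the same workspace afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;

// Illustrative sketch only: a real migration would rewrite persisted configs,
// not an in-memory list.
class DefaultWorkspaceMigrationSketch {

  // Returns a copy of the references with every occurrence of the default id
  // replaced by the same replacement id. Other ids are left untouched.
  static List<UUID> replaceDefaultWorkspaceId(
      final List<UUID> workspaceIdReferences,
      final UUID defaultWorkspaceId,
      final UUID replacementId) {
    return workspaceIdReferences.stream()
        .map(id -> id.equals(defaultWorkspaceId) ? replacementId : id)
        .collect(Collectors.toList());
  }

  public static void main(final String[] args) {
    // Placeholder value; the actual DEFAULT_WORKSPACE_ID constant is not shown in this PR.
    final UUID defaultWorkspaceId = new UUID(0L, 0L);
    final UUID replacementId = UUID.randomUUID();

    final List<UUID> references = new ArrayList<>();
    references.add(defaultWorkspaceId);
    references.add(UUID.randomUUID());
    references.add(defaultWorkspaceId);

    System.out.println(replaceDefaultWorkspaceId(references, defaultWorkspaceId, replacementId));
  }
}
```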

cgardens · 2021-07-27T01:39:58Z

...te-scheduler/persistence/src/main/java/io/airbyte/scheduler/persistence/WorkspaceHelper.java

 import io.airbyte.validation.json.JsonValidationException;
 import java.io.IOException;
 import java.util.Objects;
 import java.util.UUID;
 import java.util.concurrent.ExecutionException;
 import org.checkerframework.checker.nullness.qual.NonNull;

+// todo (cgardens) - this class is in an unintuitive module. it is weird that you need to import


I had to move this class so that JobNotifier and JobTracker could get at it. It feels like a lame-brained move, but I couldn't find a better place and was reluctant to add another module just for this.

cgardens · 2021-07-27T01:41:16Z

airbyte-server/src/main/java/io/airbyte/server/ConfigDumpImporter.java

@@ -119,7 +106,6 @@ public ImportRead importDataWithSeed(String targetVersion, File archive, Path se
  private ImportRead importDataInternal(String targetVersion, File archive, Optional<Path> seedPath) {
    Preconditions.checkNotNull(seedPath);

-    final Optional<UUID> previousCustomerIdOptional = getCurrentCustomerId();


No longer trying to track and alias the previous workspace. In the automigration case this was superfluous anyway. In the case of using the import endpoint, this could mess up metrics a little. See the note about the trade-off we are making in the main description of the PR.

tuliren

Nit. Some of the files have changes that just add final keywords to the variables. This seems unnecessary, especially because they are not affected by this PR, and some of the files are tests only (e.g. ConfigPersistenceBuilderTest.java), and I don't think anyone is going to redefine those variables there.

tuliren · 2021-07-27T23:56:09Z

airbyte-analytics/src/main/java/io/airbyte/analytics/LoggingTrackingClient.java

-        identitySupplier.get().getAirbyteVersion(),
-        identitySupplier.get().getCustomerId(),
+        identityFetcher.apply(workspaceId).getAirbyteVersion(),
+        identityFetcher.apply(workspaceId).getCustomerId(),


This function may query the database, so we probably want to store the result of identityFetcher.apply(workspaceId) instead of calling apply twice here.
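A minimal sketch of that suggestion, assuming `identityFetcher` is a `Function<UUID, TrackingIdentity>`. The `TrackingIdentity` class below is a simplified stand-in that only mirrors the two getters visible in the diff, and `describe` is a hypothetical method name.

```java
import java.util.UUID;
import java.util.function.Function;

// Illustrative sketch only: TrackingIdentity here is a simplified stand-in.
class IdentityFetchSketch {

  static final class TrackingIdentity {
    private final String airbyteVersion;
    private final UUID customerId;

    TrackingIdentity(final String airbyteVersion, final UUID customerId) {
      this.airbyteVersion = airbyteVersion;
      this.customerId = customerId;
    }

    String getAirbyteVersion() {
      return airbyteVersion;
    }

    UUID getCustomerId() {
      return customerId;
    }
  }

  // Calls the fetcher exactly once and reuses the result, since each call
  // may be backed by a database query.
  static String describe(
      final Function<UUID, TrackingIdentity> identityFetcher, final UUID workspaceId) {
    final TrackingIdentity identity = identityFetcher.apply(workspaceId);
    return identity.getAirbyteVersion() + " " + identity.getCustomerId();
  }

  public static void main(final String[] args) {
    final UUID workspaceId = UUID.randomUUID();
    // "dev" is a made-up version string for the example.
    System.out.println(describe(id -> new TrackingIdentity("dev", id), workspaceId));
  }
}
```

Besides saving a query, a single fetch also guarantees that the version and the customer id come from the same snapshot of the workspace.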

airbyte-config/persistence/src/test/java/io/airbyte/config/persistence/BaseTest.java

cgardens · 2021-07-28T15:48:28Z

> Nit. Some of the files have changes that just add final keywords to the variables. This seems unnecessary, especially because they are not affected by this PR, and some of the files are tests only (e.g. ConfigPersistenceBuilderTest.java), and I don't think anyone is going to redefine those variables there.

Haha. I just do that instinctively when I read code now so that I know that reassignment can't happen. Does it mess you up if I add the finals in? Or is the issue that it makes the reviewing hard because the diff is noisy? I can revise my behavior to make PR reviews easier.

tuliren · 2021-07-29T00:00:07Z

> Haha. I just do that instinctively when I read code now so that I know that reassignment can't happen. Does it mess you up if I add the finals in? Or is the issue that it makes the reviewing hard because the diff is noisy? I can revise my behavior to make PR reviews easier.

No, the finals do not interfere with my changes, but they are a distraction during code review. In general, reassignment is not a big concern, and people are usually careful not to do it. So having finals all over the place seems like a hassle.

cgardens requested review from jrhizor and tuliren July 27, 2021 01:37

github-actions bot added the area/platform issues related to the platform label Jul 27, 2021

tuliren approved these changes Jul 28, 2021

cgardens mentioned this pull request Jul 28, 2021

Migrations for deprecating default workspace and making namespaceDefinition required #5043

Merged

cgardens added 2 commits July 28, 2021 13:48

wip

147b60e

harmony getting close Revert "wip" This reverts commit 8a7ab00. tracking fixed to work with workspace? purge default workspace id fix HealthCheckHandlerTest fix tests

various type clean up

d75601c

cgardens force-pushed the cgardens/add_workspace_id_to_tracking branch from f908ed6 to d75601c Compare July 28, 2021 20:52

cgardens merged commit 1f58fb7 into master Jul 28, 2021

cgardens deleted the cgardens/add_workspace_id_to_tracking branch July 28, 2021 23:59

cgardens mentioned this pull request Jul 29, 2021

scope tracking by workspace #4838

Closed

tuliren mentioned this pull request Aug 4, 2021

🐛 Create workspace before initializing tracking client #5202

Closed
