-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add astro dbt deploy
command
#1670
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1670 +/- ##
==========================================
- Coverage 86.74% 86.35% -0.40%
==========================================
Files 114 116 +2
Lines 16712 16977 +265
==========================================
+ Hits 14497 14660 +163
- Misses 1324 1394 +70
- Partials 891 923 +32 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few Qs, and one issue in the git URL parsing (some of my Qs might be about the CoreAPI endpoints though, rather than the CLI impl)
cloud/deploy/bundle.go
Outdated
func getDeployment(organizationID, deploymentID string, platformCoreClient astroplatformcore.CoreClient) (*astroplatformcore.Deployment, error) { | ||
resp, err := platformCoreClient.GetDeploymentWithResponse(context.Background(), organizationID, deploymentID) | ||
if err != nil { | ||
return nil, err | ||
} | ||
err = astrocore.NormalizeAPIError(resp.HTTPResponse, resp.Body) | ||
if err != nil { | ||
return nil, err | ||
} | ||
|
||
return resp.JSON200, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This exists almost identical to CoreGetDeployment()
from ./cloud/deployment package -- could we use that instead please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, will use that.
request := astrocore.UpdateDeployRequest{ | ||
BundleTarballVersion: &tarballVersion, | ||
} | ||
resp, err := coreClient.UpdateDeployWithResponse(context.Background(), organizationID, input.DeploymentID, deployID, request) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
finalizeDeploy
calls platformCoreClient.FinalizeDeployWithResponse
-- what's the difference bewtween finalizeDeploy and updateDeploy? (and why do we have both)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In v1alpha1 updateDeploy doesn't allow the user to just update the deploy entry, it is only used to tell Astro to finalize the deploy (i.e. roll out the image/DAGs to the deployment). In v1beta1 we decided to follow CRUD semantics more closely and have updateDeploy be for updating the deploy entry and finalizeDeploy be a verb route (/finalize) that instructs Astro to roll out the deploy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to what jeremy said. We will eventually get the v1alpha1 and v1beta1 terminology the same, once we know all users have moved to a cli version that uses v1beta1 endpoints and we have a couple of customers using v1alpha1 endpoints, those will be migrated to using v1beta1 endpoints
cloud/deploy/bundle.go
Outdated
if git.HasUncommittedChanges() { | ||
logrus.Warn("Local repository has uncommitted changes, skipping Git metadata retrieval") | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know where this ultimately ends up, but instead of having no git metadata could we have the -dirty
flag appended like git describe does
:
$ git describe --tags --dirty
v1.27.0-12-gd79dc210-dirty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention of the Git metadata is to correlate a deploy with a commit that users can look up to see what the contents are/were (like we have done for GitHub integration), so we don't want to stamp the deploy with metadata that only relates to a dirty state on a local machine.
pkg/git/git.go
Outdated
if err != nil { | ||
return true | ||
} | ||
func GetRemoteRepository(remote string) (host, account, repo string, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we should probably have some unit tests of this parsing function please.
I think it will fail on ssh URLs like this one git@github.com:astronomer/astro-cli
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL about Git having these "SCP-like" URLs.
// get the deployment to deploy the dbt project to | ||
var deploymentID string | ||
if len(args) > 0 { | ||
// if provided, use the deployment ID from the command argument | ||
deploymentID = args[0] | ||
} else { | ||
// otherwise, prompt the user to select a deployment | ||
selectedDeployment, err := deployment.GetDeployment(workspaceID, "", deploymentName, false, nil, platformCoreClient, astroCoreClient) | ||
if err != nil { | ||
return err | ||
} | ||
deploymentID = selectedDeployment.Id | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth extracting this into a common/shared function so that the two deploy commands always work the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a fair question. In this change I really grappled with how much I wanted to unravel the existing deploy logic to fit in bundles. Frankly what is already there has over the years become a mess of poorly scoped function calls. This spot is a good example -- the existing deploy command has deployment resolution (including interactive prompting) not in the command code (where I would say it belongs) but buried deep in the service code in deployment.GetDeployment
. Here this also calls that function so at least we're reusing that part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments to be squashed, but looks fine overall 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
cmd/cloud/dbt.go
Outdated
|
||
// check that there is a valid dbt project at the dbt project path | ||
dbtProjectYamlPath := filepath.Join(dbtProjectPath, dbtProjectYmlFilename) | ||
if _, err := os.Stat(dbtProjectYamlPath); os.IsNotExist(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do this as part of PreRunE and move this check to utils
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the advantage of calling it from PreRunE
? I like having these checks in the Run
function so it's in one place. astro deploy
has a few checks in the Run
function and that doesn't seem to be an issue. I will move the logic to a util though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cobra offers PreRunE for this exact use case. To perform validations before the main command logic is executed. If it fails, it stops the execution command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know, but we don't have to use it because it exists. In my view it isn't helpful to have an artificial split like that. We don't use it for most checks in astro deploy
-- it only checks that it is running in a project directory, not whether the workspace can be determined, or that the dags
and image
flags aren't used together, or that the config can be saved, or that there aren't uncommitted changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked that the path is an user input and not derived from where this command is executed. This flow is different from how dag/image deploys work. We might want to confirm this with product once just so we are on the same page. This was not how i was imagining this check would work
My comment was running the preRunE check to confirm the current directory path is a dbt project or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the path is an user input and not derived from where this command is executed
Not quite, the code directly above this uses the current working directory by default, but the user can override it with a flag. This seems pretty useful as an option for CI where you've cloned a repo and want to deploy a dbt project in some subdirectory of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Afaik we also do this in astro deploy
with the hidden (?) dags-path
flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha! moving this to utils should be good then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made it a separate function in the file so we can reuse it for astro dbt delete
.
|
||
// if the deployment ID is not provided, get it from the deployment name or prompt the user | ||
if deploymentID == "" { | ||
selectedDeployment, err := deployment.GetDeployment(workspaceID, "", deploymentName, false, nil, platformCoreClient, astroCoreClient) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should do this under cloud/deploy/bundle.go. My understanding of cmd/cloud is for command definitions. Any work specific to the application should be under cloud/ and cmd/cloud should deal with constructing the command based on the flags passed and any pre-checks needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say this part is about the command, because we have multiple ways that the command can specify the deployment:
- The id as an argument
- The name as a flag
- Otherwise from an interactive prompt, which itself can be filtered by another (workspace id) flag
I don't think it is right to pass down a bunch of argument/flag values to the service function and expect deployment resolution there -- that is coupling the application logic of deploying a bundle to the concern that the command that called it has multiple ways of specifying the deployment, and in this case it's even worse than that because it can sometimes involve an interactive prompt.
It should help that line 100 here is passing all the work of talking to the API to an existing service function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You make a good point and i do see an advantage in this, its just my understanding on how i have understood this cobra package. We might need to refactor this in a bunch of different places in the future
cmd/cloud/root.go
Outdated
NewDeployCmd(), | ||
newDeploymentRootCmd(out), | ||
newWorkspaceCmd(out), | ||
newOrganizationCmd(out), | ||
} | ||
if config.CFG.DbtDeploysEnabled.GetBool() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think we need this, since this is behind a feature flag anyway, and folks trying to use this without the feature flag would be thrown the appropriate error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a good idea to keep the command hidden by default until the feature is at least in public preview. We did it like this for environment objects and it worked well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldnt be hiding the command...If folks run it they would see the error message and if they are curious and want to opt in, thats a good sign for us. Plus for anyone who wants to opt in, i dont see a knob for 2 configurations to be enabled for them to use this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the config and the command is now always added.
@rujhan-arora-astronomer Looping you in since there are some changes to the software code because of change in a common util function. |
} | ||
|
||
// get the current deployment so we can check the deploy is valid | ||
currentDeployment, err := deployment.CoreGetDeployment(c.Organization, input.DeploymentID, platformCoreClient) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we are doing get deployment twice in this flow...Once in cmd/cloud/dbt.go and again in this function call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!!
Description
This change adds a new command
astro dbt deploy
for deploying a dbt project to a deployment independently of the Astro project of the deployment. This uses the new bundle deploy type, where arbitrary files can be mounted on the Airflow containers at a selected mount path.The selection of the deployment is in line with
astro deploy
, where a deployment id argument can be provided, or one can be selected from a list.By default the dbt project will be mounted on the Airflow containers at
/usr/local/airflow/dbt/{dbt project name}
, where the project name is extracted from the dbt project'sdbt_project.yml
file. This mount path can be overridden with--mount-path
.The command is currently hidden behind the CLI config
dbt_deploys_enabled
until the feature is GA.The
goerr113
lint rule is also removed because it was repeatedly being overridden, and doesn't seem to add a lot of value.🧪 Functional Testing
astro dbt deploy
and manually downloaded the bundle to confirm its contentsastro deploy --dags
to confirm DAG deploys are not impacted📸 Screenshots
📋 Checklist
make test
before taking out of draftmake lint
before taking out of draft