Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add astro dbt deploy command #1670

Merged
merged 6 commits into from
Jun 28, 2024
Merged

Add astro dbt deploy command #1670

merged 6 commits into from
Jun 28, 2024

Conversation

jeremybeard
Copy link
Contributor

Description

This change adds a new command astro dbt deploy for deploying a dbt project to a deployment independently of the Astro project of the deployment. This uses the new bundle deploy type, where arbitrary files can be mounted on the Airflow containers at a selected mount path.

The selection of the deployment is in line with astro deploy, where a deployment id argument can be provided, or one can be selected from a list.

By default the dbt project will be mounted on the Airflow containers at /usr/local/airflow/dbt/{dbt project name}, where the project name is extracted from the dbt project's dbt_project.yml file. This mount path can be overridden with --mount-path.

The command is currently hidden behind the CLI config dbt_deploys_enabled until the feature is GA.

The goerr113 lint rule is also removed because it was repeatedly being overridden, and doesn't seem to add a lot of value.

🧪 Functional Testing

  • Unit tests added and updated
  • Ran astro dbt deploy and manually downloaded the bundle to confirm its contents
  • Ran astro deploy --dags to confirm DAG deploys are not impacted

📸 Screenshots

Screenshot 2024-06-24 at 3 35 28 PM

📋 Checklist

  • Rebased from the main (or release if patching) branch (before testing)
  • Ran make test before taking out of draft
  • Ran make lint before taking out of draft
  • Added/updated applicable tests
  • Tested against Astro-API (if necessary).
  • Tested against Houston-API and Astronomer (if necessary).
  • Communicated to/tagged owners of respective clients potentially impacted by these changes.
  • Updated any related documentation

Copy link

codecov bot commented Jun 24, 2024

Codecov Report

Attention: Patch coverage is 63.52941% with 124 lines in your changes missing coverage. Please review.

Project coverage is 86.35%. Comparing base (d79dc21) to head (a7d14af).

Files Patch % Lines
cloud/deploy/bundle.go 60.60% 33 Missing and 19 partials ⚠️
pkg/git/git.go 32.20% 36 Missing and 4 partials ⚠️
cmd/cloud/dbt.go 78.04% 9 Missing and 9 partials ⚠️
pkg/fileutil/files.go 72.72% 3 Missing and 3 partials ⚠️
cloud/deploy/deploy.go 83.33% 3 Missing and 1 partial ⚠️
airflow/docker_image.go 0.00% 1 Missing ⚠️
cmd/cloud/deploy.go 50.00% 0 Missing and 1 partial ⚠️
cmd/software/deploy.go 0.00% 0 Missing and 1 partial ⚠️
houston/houston.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1670      +/-   ##
==========================================
- Coverage   86.74%   86.35%   -0.40%     
==========================================
  Files         114      116       +2     
  Lines       16712    16977     +265     
==========================================
+ Hits        14497    14660     +163     
- Misses       1324     1394      +70     
- Partials      891      923      +32     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few Qs, and one issue in the git URL parsing (some of my Qs might be about the CoreAPI endpoints though, rather than the CLI impl)

Comment on lines 118 to 129
func getDeployment(organizationID, deploymentID string, platformCoreClient astroplatformcore.CoreClient) (*astroplatformcore.Deployment, error) {
resp, err := platformCoreClient.GetDeploymentWithResponse(context.Background(), organizationID, deploymentID)
if err != nil {
return nil, err
}
err = astrocore.NormalizeAPIError(resp.HTTPResponse, resp.Body)
if err != nil {
return nil, err
}

return resp.JSON200, nil
}
Copy link
Contributor

@ashb ashb Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exists almost identical to CoreGetDeployment() from ./cloud/deployment package -- could we use that instead please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, will use that.

cloud/deploy/bundle.go Show resolved Hide resolved
Comment on lines +162 to +165
request := astrocore.UpdateDeployRequest{
BundleTarballVersion: &tarballVersion,
}
resp, err := coreClient.UpdateDeployWithResponse(context.Background(), organizationID, input.DeploymentID, deployID, request)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finalizeDeploy calls platformCoreClient.FinalizeDeployWithResponse -- what's the difference bewtween finalizeDeploy and updateDeploy? (and why do we have both)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In v1alpha1 updateDeploy doesn't allow the user to just update the deploy entry, it is only used to tell Astro to finalize the deploy (i.e. roll out the image/DAGs to the deployment). In v1beta1 we decided to follow CRUD semantics more closely and have updateDeploy be for updating the deploy entry and finalizeDeploy be a verb route (/finalize) that instructs Astro to roll out the deploy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to what jeremy said. We will eventually get the v1alpha1 and v1beta1 terminology the same, once we know all users have moved to a cli version that uses v1beta1 endpoints and we have a couple of customers using v1alpha1 endpoints, those will be migrated to using v1beta1 endpoints

Comment on lines 177 to 180
if git.HasUncommittedChanges() {
logrus.Warn("Local repository has uncommitted changes, skipping Git metadata retrieval")
return nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know where this ultimately ends up, but instead of having no git metadata could we have the -dirty flag appended like git describe does:

$ git describe --tags --dirty
v1.27.0-12-gd79dc210-dirty

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention of the Git metadata is to correlate a deploy with a commit that users can look up to see what the contents are/were (like we have done for GitHub integration), so we don't want to stamp the deploy with metadata that only relates to a dirty state on a local machine.

pkg/git/git.go Outdated
if err != nil {
return true
}
func GetRemoteRepository(remote string) (host, account, repo string, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we should probably have some unit tests of this parsing function please.

I think it will fail on ssh URLs like this one git@github.com:astronomer/astro-cli

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL about Git having these "SCP-like" URLs.

Comment on lines +97 to +109
// get the deployment to deploy the dbt project to
var deploymentID string
if len(args) > 0 {
// if provided, use the deployment ID from the command argument
deploymentID = args[0]
} else {
// otherwise, prompt the user to select a deployment
selectedDeployment, err := deployment.GetDeployment(workspaceID, "", deploymentName, false, nil, platformCoreClient, astroCoreClient)
if err != nil {
return err
}
deploymentID = selectedDeployment.Id
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth extracting this into a common/shared function so that the two deploy commands always work the same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a fair question. In this change I really grappled with how much I wanted to unravel the existing deploy logic to fit in bundles. Frankly what is already there has over the years become a mess of poorly scoped function calls. This spot is a good example -- the existing deploy command has deployment resolution (including interactive prompting) not in the command code (where I would say it belongs) but buried deep in the service code in deployment.GetDeployment. Here this also calls that function so at least we're reusing that part.

cmd/cloud/dbt.go Show resolved Hide resolved
Copy link
Contributor

@neel-astro neel-astro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments to be squashed, but looks fine overall 🚀

cloud/deploy/bundle.go Outdated Show resolved Hide resolved
cloud/deploy/bundle.go Outdated Show resolved Hide resolved
cloud/deploy/bundle_test.go Outdated Show resolved Hide resolved
cloud/deploy/deploy.go Show resolved Hide resolved
cmd/cloud/dbt.go Show resolved Hide resolved
cmd/cloud/dbt.go Outdated Show resolved Hide resolved
cmd/cloud/dbt.go Outdated Show resolved Hide resolved
cmd/cloud/dbt_test.go Outdated Show resolved Hide resolved
config/cloud.go Show resolved Hide resolved
pkg/git/git.go Outdated Show resolved Hide resolved
Copy link
Contributor

@neel-astro neel-astro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

cmd/cloud/dbt.go Outdated Show resolved Hide resolved
cmd/cloud/dbt.go Outdated

// check that there is a valid dbt project at the dbt project path
dbtProjectYamlPath := filepath.Join(dbtProjectPath, dbtProjectYmlFilename)
if _, err := os.Stat(dbtProjectYamlPath); os.IsNotExist(err) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this as part of PreRunE and move this check to utils

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the advantage of calling it from PreRunE? I like having these checks in the Run function so it's in one place. astro deploy has a few checks in the Run function and that doesn't seem to be an issue. I will move the logic to a util though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cobra offers PreRunE for this exact use case. To perform validations before the main command logic is executed. If it fails, it stops the execution command.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, but we don't have to use it because it exists. In my view it isn't helpful to have an artificial split like that. We don't use it for most checks in astro deploy -- it only checks that it is running in a project directory, not whether the workspace can be determined, or that the dags and image flags aren't used together, or that the config can be saved, or that there aren't uncommitted changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked that the path is an user input and not derived from where this command is executed. This flow is different from how dag/image deploys work. We might want to confirm this with product once just so we are on the same page. This was not how i was imagining this check would work

My comment was running the preRunE check to confirm the current directory path is a dbt project or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the path is an user input and not derived from where this command is executed

Not quite, the code directly above this uses the current working directory by default, but the user can override it with a flag. This seems pretty useful as an option for CI where you've cloned a repo and want to deploy a dbt project in some subdirectory of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Afaik we also do this in astro deploy with the hidden (?) dags-path flag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha! moving this to utils should be good then

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made it a separate function in the file so we can reuse it for astro dbt delete.


// if the deployment ID is not provided, get it from the deployment name or prompt the user
if deploymentID == "" {
selectedDeployment, err := deployment.GetDeployment(workspaceID, "", deploymentName, false, nil, platformCoreClient, astroCoreClient)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should do this under cloud/deploy/bundle.go. My understanding of cmd/cloud is for command definitions. Any work specific to the application should be under cloud/ and cmd/cloud should deal with constructing the command based on the flags passed and any pre-checks needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say this part is about the command, because we have multiple ways that the command can specify the deployment:

  • The id as an argument
  • The name as a flag
  • Otherwise from an interactive prompt, which itself can be filtered by another (workspace id) flag

I don't think it is right to pass down a bunch of argument/flag values to the service function and expect deployment resolution there -- that is coupling the application logic of deploying a bundle to the concern that the command that called it has multiple ways of specifying the deployment, and in this case it's even worse than that because it can sometimes involve an interactive prompt.

It should help that line 100 here is passing all the work of talking to the API to an existing service function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You make a good point and i do see an advantage in this, its just my understanding on how i have understood this cobra package. We might need to refactor this in a bunch of different places in the future

NewDeployCmd(),
newDeploymentRootCmd(out),
newWorkspaceCmd(out),
newOrganizationCmd(out),
}
if config.CFG.DbtDeploysEnabled.GetBool() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think we need this, since this is behind a feature flag anyway, and folks trying to use this without the feature flag would be thrown the appropriate error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good idea to keep the command hidden by default until the feature is at least in public preview. We did it like this for environment objects and it worked well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldnt be hiding the command...If folks run it they would see the error message and if they are curious and want to opt in, thats a good sign for us. Plus for anyone who wants to opt in, i dont see a knob for 2 configurations to be enabled for them to use this...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the config and the command is now always added.

@kushalmalani
Copy link
Contributor

@rujhan-arora-astronomer Looping you in since there are some changes to the software code because of change in a common util function.

}

// get the current deployment so we can check the deploy is valid
currentDeployment, err := deployment.CoreGetDeployment(c.Organization, input.DeploymentID, platformCoreClient)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we are doing get deployment twice in this flow...Once in cmd/cloud/dbt.go and again in this function call.

Copy link
Contributor

@kushalmalani kushalmalani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!!

@jeremybeard jeremybeard merged commit 1162f34 into main Jun 28, 2024
3 of 5 checks passed
@jeremybeard jeremybeard deleted the add-dbt-deploy-command branch June 28, 2024 02:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants