-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds Runtime Agnostic Healthcheck #1740
Conversation
075590e
to
e737fd9
Compare
e737fd9
to
d088cd2
Compare
} | ||
|
||
// If we've successfully gotten a healthcheck response, print the status. | ||
err = printStatus(settingsFile, envConns, project, d.composeService, airflowDockerVersion, noBrowser) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
printStatus()
was previously called by the original checkWebserverHealth()
function. That function has been removed and replaced and doesn't concern itself with printing the message. That now just happens at the "top-level" here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call.
} else { | ||
fmt.Println("\nProject is running! All components are now available.") | ||
} | ||
fmt.Println("\nProject is running! All components are now available.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The healthcheck now works for all runtimes so there's no need to differentiate here.
@@ -1433,7 +1397,7 @@ func printStatus(settingsFile string, envConns map[string]astrocore.EnvironmentO | |||
fmt.Println("\nUnable to open the webserver URL, please visit the following link: " + webserverURL) | |||
} | |||
} | |||
return errComposeProjectRunning | |||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why we were returning an error for success, but we don't anymore.
healthcheck: | ||
test: curl --fail http://webserver:8080/health || exit 1 | ||
interval: 2s | ||
retries: 15 | ||
start_period: 5s | ||
timeout: 60s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This original check built into the compose file does not work for orbstack or podman. We remove it here and it's been replaced by the new functionality.
@@ -19,7 +17,7 @@ type Suite struct { | |||
origCmdExec func(cmd string, stdout, stderr io.Writer, args ...string) error | |||
origGetDockerClient func() (client.APIClient, error) | |||
origInitSettings func(id, settingsFile string, envConns map[string]astrocore.EnvironmentObjectConnection, version uint64, connections, variables, pools bool) error | |||
origCheckWebserverHealth func(settingsFile string, envConns map[string]astrocore.EnvironmentObjectConnection, project *types.Project, composeService api.Service, airflowDockerVersion uint64, noBrowser bool, timeout time.Duration) error | |||
origCheckWebserverHealth func(url string, timeout time.Duration) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just updating to the signature.
25af892
to
ca5e07d
Compare
ca5e07d
to
47a5644
Compare
@@ -82,7 +81,6 @@ const ( | |||
var ( | |||
errNoFile = errors.New("file specified does not exist") | |||
errSettingsPath = "error looking for settings.yaml" | |||
errComposeProjectRunning = errors.New("project is up and running") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was an error
that we used to indicate success for some reason. This isn't used anymore as we just return nil
if things work (no error).
// CheckM1Image checks if the CLI is currently running on M1 architecture | ||
// next it checks if the runtime version of the image building is above 6.0.4 | ||
// if the image is M1 architecture and runtime version is less than or equal to 6.0.4 it will print true | ||
var CheckM1Image = func(imageLabels map[string]string) bool { | ||
if !isM1(runtime.GOOS, runtime.GOARCH) { | ||
// the architecture is not arm64 no need to print message | ||
return false | ||
} | ||
runtimeVersion, ok := imageLabels[runtimeVersionLabelName] | ||
if !ok { | ||
// cannot determine runtime version print message by default | ||
return true | ||
} | ||
|
||
return versions.LessThanOrEqualTo(runtimeVersion, M1ImageRuntimeVersion) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is purely used to decide if the astro dev start --wait
flag defaults to 1m
or 5m
. Thinking we just simplify and use 5m
. The CLI progresses as soon as it sees the 200
from the webserver /health
endpoint.
bed5d87
to
4caa877
Compare
startupTimeout = waitTime | ||
// check if user provided a waitTime | ||
// default is 1 minute | ||
if waitTime != 1*time.Minute { | ||
startupTimeout = waitTime | ||
} else if CheckM1Image(imageLabels) { | ||
// user did not provide a waitTime | ||
// if running darwin/M1 architecture | ||
// we wait for a longer startup time | ||
startupTimeout = 5 * time.Minute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm removing a chunk of logic around this M1 architecture check and runtime comparison. The only reason we have all this is to bump the startup timeout from 1m
to 5m
. I've just made 5m
the default across the board.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we feel strongly about not changing the default across the board, we could revert this piece. It just feels like unnecessary cruft at this point, and this feels like a fine time to remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for cutting out these legacy "M1" bits, it is definitely time to remove it.
I believe we added the 5 minute timeout because when Apple Silicon first came out we did not have arm64 support in the Astro Runtime image and so the macOS emulation was really slow. We have long since added that arm64 support, so I'm wondering if the default timeout should be 1 minute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I'm fine with making it 1m
across the board now. @neel-astro that sound ok to you?
@@ -52,7 +51,6 @@ const ( | |||
defaultAirflowVersion = uint64(0x2) //nolint:gomnd | |||
triggererAllowedRuntimeVersion = "4.0.0" | |||
triggererAllowedAirflowVersion = "2.2.0" | |||
M1ImageRuntimeVersion = "6.0.4" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 6.x
version of runtime has been deprecated as of March 2024 as stated in our docs.
airflow/docker.go
Outdated
var checkWebserverHealth = func(settingsFile string, envConns map[string]astrocore.EnvironmentObjectConnection, project *types.Project, composeService api.Service, airflowDockerVersion uint64, noBrowser bool, timeout time.Duration) error { | ||
if config.CFG.DockerCommand.GetString() == podman { | ||
err := printStatus(settingsFile, envConns, project, composeService, airflowDockerVersion, noBrowser) | ||
if err != nil { | ||
if !errors.Is(err, errComposeProjectRunning) { | ||
return err | ||
} | ||
} | ||
} else { | ||
ctx, cancel := context.WithTimeout(context.Background(), timeout) | ||
defer cancel() | ||
// check if webserver is healthy for user | ||
err := composeService.Events(ctx, project.Name, api.EventsOptions{ | ||
Services: []string{WebserverDockerContainerName}, Consumer: func(event api.Event) error { | ||
marshal, err := json.Marshal(map[string]interface{}{ | ||
"action": event.Status, | ||
}) | ||
if err != nil { | ||
return err | ||
} | ||
if string(marshal) == `{"action":"health_status: healthy"}` { | ||
err := printStatus(settingsFile, envConns, project, composeService, airflowDockerVersion, noBrowser) | ||
if err != nil { | ||
return err | ||
} | ||
} | ||
|
||
return nil | ||
}, | ||
}) | ||
if err != nil { | ||
if errors.Is(err, context.DeadlineExceeded) { | ||
fmt.Printf("\n") | ||
return fmt.Errorf("there might be a problem with your project starting up. The webserver health check timed out after %s but your project will continue trying to start. Run 'astro dev logs --webserver | --scheduler' for details.\n\nTry again or use the --wait flag to increase the time out", timeout) | ||
} | ||
if !errors.Is(err, errComposeProjectRunning) { | ||
return err | ||
} | ||
} | ||
} | ||
return nil | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation is being removed in favor of the new one declared in its own file below - health_check.go
.
4caa877
to
0f96cb3
Compare
0f96cb3
to
d5c9fcf
Compare
cmd/airflow_test.go
Outdated
@@ -407,7 +407,7 @@ func (s *AirflowSuite) TestAirflowStart() { | |||
|
|||
mockContainerHandler := new(mocks.ContainerHandler) | |||
containerHandlerInit = func(airflowHome, envFile, dockerfile, imageName string) (airflow.ContainerHandler, error) { | |||
mockContainerHandler.On("Start", "", "airflow_settings.yaml", "", "", false, false, 1*time.Minute, map[string]astrocore.EnvironmentObjectConnection(nil)).Return(nil).Once() | |||
mockContainerHandler.On("Start", "", "airflow_settings.yaml", "", "", false, false, 5*time.Minute, map[string]astrocore.EnvironmentObjectConnection(nil)).Return(nil).Once() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally unit tests should use the constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is excellent, it simplifies the code and gains more functionality as a result! Appreciate the proactive PR comments too, makes it all very clear.
Approving ahead of a couple of small comments to look into.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a step in the right direction, but I am slightly concerned about potentially breaking the behavior for folks using docker-compose override or custom docker-compose in their local setup (applies to both docker and podman users, and I have seen a decent amount of examples where we suggest customers use it to customize any change), because in those cases we can't for sure say that the port mapping would be 8080 or even the same as that of the webserver.port in config (because changing port number is quite a common occurrence).
We need to think through a way to keep supporting those. Maybe adjust our error messaging to ask the user to set the webserver port in the config even if they are using custom docker-compose or override.
// This fires on every tick of our timer to run the healthcheck. | ||
// We return successfully from this function when we get a 200 status code. | ||
case <-ticker.C: | ||
statusCode, _ := healthCheck(ctx, client, url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: either we could log the error or drop the error
from the return arguments
@neel-astro I've updated the healthcheck URL to get the port from the config now. I also attempted to override compose with:
This results in both our standard 8080 and 9090 being bound, so the override file can't wipe out our existing port and it'll always be the first in the list of ports (or the overridden port from the config). Also, the healthcheck that was defined previously via the compose file had 8080 hard-coded, so this should actually result in an improvement here now that we grab it from the config. |
feedeb0
to
7352506
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, this change would still be slightly disruptive for anyone using astro dev start --compose-file <custom-compose-file>
with a different webserver port, but there is a workaround for them by setting astro config set webserver.port <webserver-port>
. But since this solves the problem for a bunch of other broken use cases, I am on board with the change.
@neel-astro do you have an example of what that compose file would look like to override the websever port in a problematic way? As far as I know, any ports defined in a custom compose file are added to the standard definition we provide as the base, so 8080 (or whatever the user configures) will always be bound. Or are you referring to overriding the airflow setting to change the port that |
@schnie there is a functionality within So theoretically, users could export the compose file, edit it to replace the existing 8080 webserver port, and spin up the local setup. |
Hmm, yea I was able to recreate that here locally. Does seem like a bit of an edge case. I did have some code I was playing with that actually grabbed the port from the webserver service in the compose project. Do you know of any use-case where someone out there would be doing this kind of change? If we know of someone actually doing this on the webserver maybe we think about it some more. Otherwise, it does feel like a lot for someone to go through for something like this. |
Yeah, I think it's fine because in such cases the new way to handle the case would be for them to set the |
@yanmastin-astro Add to docs release notes? "Additional improvements: Streamlined the behavior of |
Description
This PR aims to provide a consistent local dev startup experience across container runtimes. Today,
astro dev start
will print the status update and attempt to open the browser to the airflow dashboard once its up and running. This is nice and it works for Docker Desktop. It does not work as expected fororbstack
orpodman
. For these, the command line hangs and we end up opening the browser to a URL that isn't ready yet. This is a bad experience.We're exploring the idea of using
podman
as our default runtime going forward and this is another small fix towards preparing for something like that. This fix will improve todays UX either way.🎟 Issue(s)
Related to https://github.com/astronomer/astro/issues/24344
🧪 Functional Testing
Have tested this using
astro dev start
on:📸 Screenshots
📋 Checklist
make test
before taking out of draftmake lint
before taking out of draftTested against Astro-API (if necessary).Tested against Houston-API and Astronomer (if necessary).