config: make docker health check configurable #1624

Merged 1 commit into aws:dev on Jan 10, 2019

Conversation

Contributor

@adnxn adnxn commented Oct 16, 2018

Summary

carried over from #1522

Implementation details

Testing

  • Builds on Linux (make release)
  • Builds on Windows (go build -o amazon-ecs-agent.exe ./agent)
  • Unit tests on Linux (make test) pass
  • Unit tests on Windows (go test -timeout=25s ./agent/...) pass
  • Integration tests on Linux (make run-integ-tests) pass
  • Integration tests on Windows (.\scripts\run-integ-tests.ps1) pass
  • Functional tests on Linux (make run-functional-tests) pass
  • Functional tests on Windows (.\scripts\run-functional-tests.ps1) pass

New tests cover the changes:

Description for the changelog

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@adnxn adnxn requested a review from a team October 16, 2018 23:15
@adnxn adnxn mentioned this pull request Oct 16, 2018
8 tasks
@@ -50,3 +51,10 @@ func (params *TelemetrySessionParams) time() ttime.Time {
})
return params._time
}

func (params *TelemetrySessionParams) isMetricsDisabled() (bool, error) {
Contributor

The function name is a little confusing to me. It is called isMetricsDisabled, but when the DisableMetrics flag is true, the function does not necessarily return true. Did you mean something like MetricsSessionDisabled? Maybe change the function name and add a descriptive comment here.

Contributor Author

fair. changed to isContainerHealthMetricsDisabled since it checks both DisableMetrics && DisableDockerHealthCheck

result: true,
err: nil,
description: "both telemetry and health metrics were disable should return true",
},
Contributor

In Go, does the && operator short-circuit, i.e. if the first operand is false, is the second one never evaluated? It also seems you are missing a test case: DisableMetrics is false and DisableDockerHealthCheck is true.
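For reference, a minimal sketch of the behaviour being asked about. Go's && does short-circuit, so the right-hand operand is only evaluated when the left-hand one is true. The `config` struct, `sessionDisabled`, and `secondOperandCalls` are hypothetical stand-ins written for illustration; only the two flag names come from this thread.

```go
package main

import "fmt"

// config is a hypothetical stand-in for the agent's configuration; only the
// two field names are taken from the discussion.
type config struct {
	DisableMetrics           bool
	DisableDockerHealthCheck bool
}

// sessionDisabled mirrors the expression under review: it reports true only
// when both flags are set.
func sessionDisabled(cfg config) bool {
	return cfg.DisableMetrics && cfg.DisableDockerHealthCheck
}

// secondOperandCalls counts how often the right-hand operand of && is
// evaluated for a given left-hand value.
func secondOperandCalls(first bool) int {
	calls := 0
	second := func() bool {
		calls++
		return true
	}
	_ = first && second()
	return calls
}

func main() {
	cases := []struct {
		cfg  config
		want bool
	}{
		{config{false, false}, false},
		{config{false, true}, false}, // the combination the reviewer asks about
		{config{true, false}, false},
		{config{true, true}, true},
	}
	for _, c := range cases {
		got := sessionDisabled(c.cfg)
		fmt.Printf("%+v -> %t (want %t)\n", c.cfg, got, c.want)
	}
	fmt.Println("right operand evaluations when left is false:", secondOperandCalls(false))
	fmt.Println("right operand evaluations when left is true:", secondOperandCalls(true))
}
```

Short-circuiting does not change the result here because both operands are plain fields, but listing all four combinations in the test table makes the intended behaviour explicit.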

gomock.Any()).AnyTimes().Return([]string{}, nil)

ctx, cancel := context.WithCancel(context.TODO())
// Cancel the context to cancel async routines
Contributor

why is this needed?

Contributor Author

without it we end up with flaky tests that have runaway goroutines. maybe you've seen errors similar to "goroutine ended after test". please correct me if i'm wrong though.
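A minimal sketch of the pattern described above, using only the standard library: a background goroutine selects on ctx.Done(), and cancelling the context lets it exit before the test returns. The `poll` worker and `workerStopsOnCancel` helper are hypothetical and are not taken from the agent's code.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// poll is a hypothetical background worker. It loops on a ticker until the
// context is cancelled, then returns.
func poll(ctx context.Context, interval time.Duration, wg *sync.WaitGroup) {
	defer wg.Done()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// Periodic work would happen here.
		}
	}
}

// workerStopsOnCancel starts the worker, cancels its context, and reports
// whether the worker exited within one second.
func workerStopsOnCancel() bool {
	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup
	wg.Add(1)
	go poll(ctx, time.Millisecond, &wg)

	// Cancel the context to stop the async routine.
	cancel()

	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println("worker stopped after cancel:", workerStopsOnCancel())
}
```

In a test, the usual form is `defer cancel()` right after creating the context, so the goroutine is stopped on every return path.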

capabilities, err := agent.capabilities()
require.NoError(t, err)

capMap := make(map[string]bool)
Contributor

i think we can use assert.NotContains instead of these

Contributor Author

oh cool, changed.

return
}
if ok {
seelog.Warnf("Metrics were disabled, not start the telemetry session")
Contributor

nit: start -> starting

@@ -73,6 +73,19 @@ func (*mockStatsEngine) GetTaskHealthMetrics() (*ecstcs.HealthMetadata, []*ecstc
return nil, nil, nil
}

// TestDisableMetrics tests the StartMetricsSession will return immediately if
Contributor

just wondering what happens if the logic we test here breaks? it seems that in that case we will just not return immediately, but i'm not sure whether the test will fail.

Contributor Author

hrm yea this is a weird one, this looks like it needs a StartMetricsSession refactor to be able to have more sensible testing. we can track this as tech debt.

Contributor

nit: add a TODO for this

ok, err := tc.param.isMetricsDisabled()
if tc.err != nil {
assert.Error(t, err)
}
Contributor

else we should assert no error?
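One way to cover both branches, sketched with the standard library only. The `params` type, its `disabled` method, and `checkCase` are hypothetical stand-ins; in particular, returning an error for a nil Cfg is an assumption, since that branch is not shown in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// config and params are hypothetical stand-ins for the agent's config and
// TelemetrySessionParams types.
type config struct {
	DisableMetrics           bool
	DisableDockerHealthCheck bool
}

type params struct {
	Cfg *config
}

// disabled follows the shape of the snippet under review. The error for a
// nil Cfg is an assumption made for this sketch.
func (p *params) disabled() (bool, error) {
	if p.Cfg != nil {
		return p.Cfg.DisableMetrics && p.Cfg.DisableDockerHealthCheck, nil
	}
	return false, errors.New("config is empty")
}

// checkCase verifies one table entry and returns a description of the first
// mismatch. It checks the error in both directions: an expected error must
// be present, and an unexpected error is itself a failure.
func checkCase(p *params, want bool, wantErr bool) error {
	got, err := p.disabled()
	if wantErr {
		if err == nil {
			return errors.New("expected an error, got nil")
		}
		return nil
	}
	if err != nil {
		return fmt.Errorf("unexpected error: %w", err)
	}
	if got != want {
		return fmt.Errorf("got %t, want %t", got, want)
	}
	return nil
}

func main() {
	fmt.Println(checkCase(&params{Cfg: &config{true, true}}, true, false))
	fmt.Println(checkCase(&params{}, false, true))
}
```

With testify, the same idea is an `else` branch that calls `assert.NoError(t, err)` next to the existing `assert.Error(t, err)`.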


func (params *TelemetrySessionParams) isMetricsDisabled() (bool, error) {
if params.Cfg != nil {
return params.Cfg.DisableMetrics && params.Cfg.DisableDockerHealthCheck, nil
Contributor

Just for understanding, is DisableMetrics something that we have introduced previously but never used? Would it be better to add a description in the PR summary for it?

Contributor Author

DisableMetrics maps to ECS_DISABLE_METRICS and is already covered in the README. this PR is to add a toggle for the health checks, which relies on the underlying tcs/metrics subsystem.
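A rough sketch of how two such toggles could be read from the environment. ECS_DISABLE_METRICS is named above; the variable name ECS_DISABLE_DOCKER_HEALTH_CHECK, the `config` struct, and the helpers `parseBoolEnv` and `loadConfig` are assumptions for illustration and do not reproduce the agent's actual config loading.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// config is a hypothetical stand-in holding the two toggles discussed here.
type config struct {
	DisableMetrics           bool
	DisableDockerHealthCheck bool
}

// parseBoolEnv reads key through lookup and parses it as a boolean. It
// returns fallback when the variable is unset or cannot be parsed.
func parseBoolEnv(lookup func(string) (string, bool), key string, fallback bool) bool {
	raw, ok := lookup(key)
	if !ok {
		return fallback
	}
	value, err := strconv.ParseBool(raw)
	if err != nil {
		return fallback
	}
	return value
}

// loadConfig builds a config from environment-style lookups. The second
// variable name is an assumption, not confirmed by this thread.
func loadConfig(lookup func(string) (string, bool)) config {
	return config{
		DisableMetrics:           parseBoolEnv(lookup, "ECS_DISABLE_METRICS", false),
		DisableDockerHealthCheck: parseBoolEnv(lookup, "ECS_DISABLE_DOCKER_HEALTH_CHECK", false),
	}
}

func main() {
	cfg := loadConfig(os.LookupEnv)
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function in, rather than calling os.LookupEnv directly, keeps the parsing testable without touching the process environment.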

Contributor Author

adnxn commented Oct 17, 2018

thanks for taking a pass at this, will make changes and need to update the test for this actually as well.

@adnxn adnxn changed the title config: make docker health check configurable [wip] config: make docker health check configurable Oct 17, 2018
@adnxn adnxn force-pushed the richardpen/container-health-config branch from 72f1713 to 877a310 Compare January 9, 2019 22:57
@adnxn adnxn changed the title [wip] config: make docker health check configurable config: make docker health check configurable Jan 9, 2019
Contributor Author

adnxn commented Jan 10, 2019

windows ftest failing:

=== RUN   TestTwoTasksSharedLocalVolume
go : panic: test timed out after 30m0s
At C:\Users\Administrator\AppData\Local\Temp\amazon-ecs-agent\go\src\github.com\aws\amazon-ecs-agent\scripts\run-functional-tests.ps1:26 char:3
+   go test -tags functional -timeout=30m -v ../agent/functional_tests/ ...
+   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (panic: test timed out after 30m0s:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

Contributor Author

adnxn commented Jan 10, 2019

merging this to dev and tracking the intermittent TestTwoTasksSharedLocalVolume failure separately in #1786.

@adnxn adnxn merged commit 6635069 into aws:dev Jan 10, 2019
@adnxn adnxn added this to the 1.25.0 milestone Jan 23, 2019
@adnxn adnxn deleted the richardpen/container-health-config branch February 19, 2019 19:50
@adnxn adnxn restored the richardpen/container-health-config branch February 19, 2019 19:51
@adnxn adnxn deleted the richardpen/container-health-config branch October 21, 2019 20:17
4 participants