Skip to content

Commit

Permalink
eval broker test: enable canceled eval reaper
Browse files Browse the repository at this point in the history
Following the update to the go1.20 toolchain, the eval broker integration test
started to become flaky.

The `Eval.Ack` RPC never blocks when trying to wake up
`reapCancelableEvaluations` because the reaper has a periodic timer that ensures
the evals will get reaped eventually. With the reaper effectively disabled in
the test, it's possible for the eval broker to ack the two ready evals before
the reaper goroutine ever gets scheduled, at which point the cancelable evals
won't get reaped in this test. The test was always wrong in this sense, but the
toolchain update seems to have changed the timing enough that it now fails
frequently.

This changeset also adjusts the timeout and reduces the number of concurrent
schedulers to reduce contention on the eval broker's lock during the test. The
test has been updated to use `shoenig/test/wait` so we can have sensible
reporting of the results rather than just a timeout error when things go
wrong. This is how I verified that the eval broker's internal logic was still
correct and that the problem was just the reaper not running.
  • Loading branch information
tgross committed Feb 9, 2023
1 parent 88cd93b commit e462084
Showing 1 changed file with 23 additions and 10 deletions.
33 changes: 23 additions & 10 deletions nomad/eval_broker_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -9,14 +9,16 @@ import (
"time"

msgpackrpc "github.com/hashicorp/net-rpc-msgpackrpc"
"github.com/shoenig/test"
"github.com/shoenig/test/must"
"github.com/shoenig/test/wait"
"github.com/stretchr/testify/require"

"github.com/hashicorp/nomad/ci"
"github.com/hashicorp/nomad/nomad/mock"
"github.com/hashicorp/nomad/nomad/state"
"github.com/hashicorp/nomad/nomad/structs"
"github.com/hashicorp/nomad/testutil"
"github.com/shoenig/test"
"github.com/shoenig/test/must"
"github.com/stretchr/testify/require"
)

var (
Expand Down Expand Up @@ -1566,17 +1568,28 @@ func TestEvalBroker_IntegrationTest(t *testing.T) {
// be one eval processesed at a time no matter how many schedulers we run

config := DefaultConfig()
config.NumSchedulers = 4
config.EvalReapCancelableInterval = time.Minute * 10
require.NoError(t, srv.Reload(config))
config.NumSchedulers = 2
config.EvalReapCancelableInterval = 500 * time.Millisecond
must.NoError(t, srv.Reload(config))

// assert that all but 2 evals were canceled and that the eval broker state
// has been cleared

require.Eventually(t, func() bool {
got := getEvalStatuses()
return got[structs.EvalStatusComplete] == 2 && got[structs.EvalStatusCancelled] == 9
}, 2*time.Second, time.Millisecond*100)
var got map[string]int

must.Wait(t, wait.InitialSuccess(
wait.Timeout(10*time.Second),
wait.Gap(50*time.Millisecond),
wait.BoolFunc(func() bool {
got = getEvalStatuses()
return got[structs.EvalStatusComplete] == 2 &&
got[structs.EvalStatusCancelled] == 9
}),
),
must.Func(func() string {
return fmt.Sprintf("expected map[complete:2 canceled:9] within timeout, got: %v with broker status=%#v", got, getStats())
}),
)

must.Eq(t, BrokerStats{TotalReady: 0, TotalUnacked: 0,
TotalPending: 0, TotalCancelable: 0}, getStats())
Expand Down

0 comments on commit e462084

Please sign in to comment.