Improve status reporting in UI #1262
46e83b7 to b06d357
…ws in getRepairRunsForClusterPrioritiseRunning.
90fd282 to 5eefcfd
…r and state and then query the main `repair_runs` table using the found UUIDs.
5eefcfd to def2799
…ist.size(), limit.orElse(...))
86c1626 to b5c0989
Here's the Cucumber scenario to verify the feature is correctly implemented:
@sidecar
Scenario Outline: Verify that ongoing repairs are prioritized over finished ones when listing the runs
Given that reaper <version> is running
And reaper has no cluster in storage
When an add-cluster request is made to reaper with authentication
Then reaper has the last added cluster in storage
And a new repair is added for "test" and keyspace "test_keyspace"
And I add and abort 10 repairs for "test" and keyspace "test_keyspace2"
Then when I list the last 10 repairs, I can see 1 repairs at "NOT_STARTED" state
And when I list the last 10 repairs, I can see 9 repairs at "ABORTED" state
When the last added cluster is deleted
Then reaper has no longer the last added cluster in storage
${cucumber.upgrade-versions}
The 2 new steps to implement are:
And I add and abort ?? repairs for "???" and keyspace "???"
Then when I list the last ?? repairs, I can see ? repairs at "????" state
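This scenario tells the old and new behaviour apart, and a worked example shows why. One NOT_STARTED run is created first and ten ABORTED runs are created after it. A plain newest-first listing of the last 10 runs therefore contains no NOT_STARTED run. A listing that puts RUNNING, PAUSED and NOT_STARTED runs first yields 1 NOT_STARTED and 9 ABORTED. The sketch below uses a hypothetical `Run` class with an integer creation sequence in place of Reaper's `RepairRun`. It assumes the prioritised states are the three filtered on later in this review. It illustrates the expected counts and is not the PR's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class ScenarioCountSketch {

  // Hypothetical, simplified repair run: a creation sequence number (higher = newer) and a state.
  static final class Run {
    final int seq;
    final String state;

    Run(int seq, String state) {
      this.seq = seq;
      this.state = state;
    }
  }

  // Assumption: these states are listed ahead of every other state.
  static final Set<String> PRIORITY_STATES = Set.of("RUNNING", "PAUSED", "NOT_STARTED");

  // Scenario data: one NOT_STARTED run created first, then ten runs that end up ABORTED.
  static List<Run> scenarioRuns() {
    List<Run> runs = new ArrayList<>();
    runs.add(new Run(1, "NOT_STARTED"));
    for (int seq = 2; seq <= 11; seq++) {
      runs.add(new Run(seq, "ABORTED"));
    }
    return runs;
  }

  // Plain "newest first" listing, truncated to the limit.
  static List<Run> newestFirst(List<Run> runs, int limit) {
    return runs.stream()
        .sorted(Comparator.comparingInt((Run r) -> r.seq).reversed())
        .limit(limit)
        .collect(Collectors.toList());
  }

  // Priority states first, then newest first within each group, truncated to the limit.
  static List<Run> prioritised(List<Run> runs, int limit) {
    Comparator<Run> byPriority =
        Comparator.comparingInt((Run r) -> PRIORITY_STATES.contains(r.state) ? 0 : 1);
    Comparator<Run> newest = Comparator.comparingInt((Run r) -> r.seq).reversed();
    return runs.stream()
        .sorted(byPriority.thenComparing(newest))
        .limit(limit)
        .collect(Collectors.toList());
  }

  // Counts how many runs in a listing are in the given state.
  static long countInState(List<Run> listing, String state) {
    return listing.stream().filter(r -> r.state.equals(state)).count();
  }

  public static void main(String[] args) {
    List<Run> runs = scenarioRuns();
    List<Run> plain = newestFirst(runs, 10);
    List<Run> prio = prioritised(runs, 10);
    System.out.println("newest first: NOT_STARTED=" + countInState(plain, "NOT_STARTED")
        + ", ABORTED=" + countInState(plain, "ABORTED"));
    System.out.println("prioritised:  NOT_STARTED=" + countInState(prio, "NOT_STARTED")
        + ", ABORTED=" + countInState(prio, "ABORTED"));
  }
}
```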
cc5d146 to bf77898
…iority statuses. Fix issue in MemoryStorage and add test for it.
bf77898 to 5e126e6
src/server/src/test/java/io/cassandrareaper/acceptance/BasicSteps.java (outdated, resolved)
@And("^I add 11 and abort the most recent 10 repairs for cluster \"([^\"]*)\" and keyspace \"([^\"]*)\"$")
public void addAndAbortRepairs(String clusterName, String keyspace) throws Throwable {
`11` and `10` should be variables, so that we can reuse this step with different numbers if needed.
- @And("^I add 11 and abort the most recent 10 repairs for cluster \"([^\"]*)\" and keyspace \"([^\"]*)\"$")
- public void addAndAbortRepairs(String clusterName, String keyspace) throws Throwable {
+ @And("^I add (\\d+) and abort the most recent (\\d+) repairs for cluster \"([^\"]*)\" and keyspace \"([^\"]*)\"$")
+ public void addAndAbortRepairs(int nbRepairRuns, int abortedRepairRuns, String clusterName, String keyspace) throws Throwable {
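With the counts turned into capture groups, each group feeds one of the step method's parameters, in order. The sketch below uses plain `java.util.regex` rather than Cucumber to show what each group captures for a sample step. `StepPatternSketch` and `parse` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StepPatternSketch {

  // The parameterised step expression from the suggestion above, as a Java string literal.
  static final Pattern ADD_AND_ABORT = Pattern.compile(
      "^I add (\\d+) and abort the most recent (\\d+) repairs for cluster \"([^\"]*)\" and keyspace \"([^\"]*)\"$");

  // Returns {nbRepairRuns, abortedRepairRuns, clusterName, keyspace}, or null if the text does not match.
  static String[] parse(String stepText) {
    Matcher m = ADD_AND_ABORT.matcher(stepText);
    if (!m.matches()) {
      return null;
    }
    return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
  }

  public static void main(String[] args) {
    String[] params = parse(
        "I add 11 and abort the most recent 10 repairs for cluster \"test\" and keyspace \"test_keyspace2\"");
    int nbRepairRuns = Integer.parseInt(params[0]);
    int abortedRepairRuns = Integer.parseInt(params[1]);
    System.out.println(nbRepairRuns + " " + abortedRepairRuns + " " + params[2] + " " + params[3]);
  }
}
```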
I've made the requested changes. Thinking on this, should this not be several steps? So perhaps the spec should be:
- When I add 11 repairs
- When I set the state on the most recent 10 repairs to ABORTED
That would promote reusability better, I think.
RUNNERS.parallelStream().forEach(runner -> {
Integer iter = 1;
while (iter <= 11) {
You seem to be creating just 10 repairs here, not 11.
Also I don't think you should use a parallel stream across runners, because each runner will create 10 runs here.
Just pick the first runner by using RUNNERS.get(0).callReaper(...) at line 2913, and remove the surrounding stream.
I was wondering about that. The runners all run against the same cluster? Changed as suggested.
Re: creating too few repairs, with this logic I create 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, which is 11, I'm pretty sure?
I used my fingers to count, so someone needs to rescind my math postgrad if I'm wrong about this.
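Assume `iter` is incremented once per pass; the increment is not shown in the excerpt. Under that assumption, a loop that starts at 1 and runs while `iter <= 11` executes 11 times. The reviewer's other point still applies. If every runner executes the same loop against the same cluster, the number of created runs is multiplied by the number of runners. The minimal sketch below uses hypothetical helper names and only counts iterations; it does not call Reaper.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class RepairCreationCountSketch {

  // Counts the passes of a loop shaped like `iter = 1; while (iter <= n) { ...; iter++; }`.
  static int countIterations(int n) {
    int created = 0;
    Integer iter = 1;
    while (iter <= n) {
      created++; // stands in for one "create repair run" call
      iter++;
    }
    return created;
  }

  // Hypothetical: each runner runs the same loop, as when it is wrapped in
  // RUNNERS.parallelStream().forEach(...), so the totals add up across runners.
  static int countAcrossRunners(List<String> runners, int n) {
    AtomicInteger total = new AtomicInteger();
    runners.parallelStream().forEach(runner -> total.addAndGet(countIterations(n)));
    return total.get();
  }

  public static void main(String[] args) {
    System.out.println(countIterations(11));
    System.out.println(countAcrossRunners(List.of("runner-1", "runner-2"), 11));
  }
}
```

Calling the loop once through a single runner keeps the total equal to the loop count, whatever the number of runners.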
3d23dab to 330b0bb
500662f to e8d061e
d911355 to b2f6ac3
b2f6ac3 to e38cc83
Almost there. I still have a few requests so that we don't create confusion amongst users.
- final Collection<RepairRun> repairRuns = context.storage.getRepairRunsForCluster(clusterName, limit);
+ final Collection<RepairRun> repairRuns = context
+     .storage
+     .getRepairRunsForClusterPrioritiseRunning(clusterName, limit);
You need to do the same on line 665. It's the method that lists repair runs across all registered clusters. It still uses the old getRepairRunsForCluster() method, which isn't optimized to prioritize running repairs.
The problem is that on line 665, it is adding the runs from ALL clusters into a list. I'll need to re-sort the runs so that the statuses are in the right order irrespective of cluster. Once you let me know the precise way you want the sorting done, I might just encapsulate it in a function so that we aren't duplicating that logic in multiple places.
Right, you'll need to merge lists from different clusters and re-sort to apply the limits.
FYI, repair ids are timeuuids, which makes it possible to apply the sorting on them directly using their time component.
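The merge-then-sort approach can be sketched as follows. The sketch assumes a hypothetical `Run` stand-in for `RepairRun` whose id is a version-1 UUID. It also assumes RUNNING, PAUSED and NOT_STARTED are ranked in that order, ahead of all other states. `java.util.UUID.timestamp()` returns the 60-bit timestamp of a version-1 UUID and throws `UnsupportedOperationException` for other versions. That lets runs be ordered newest first within each rank. The `timeUuid` helper exists only to build test data.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.stream.Collectors;

class CrossClusterSortSketch {

  // Hypothetical stand-in for RepairRun: only the fields the sort needs.
  static final class Run {
    final UUID id; // assumed to be a version-1 (time-based) UUID
    final String state;

    Run(UUID id, String state) {
      this.id = id;
      this.state = state;
    }
  }

  // Assumed priority order; any state not listed here ranks after all listed ones.
  static final List<String> PRIORITY = List.of("RUNNING", "PAUSED", "NOT_STARTED");

  static int rank(Run run) {
    int i = PRIORITY.indexOf(run.state);
    return i >= 0 ? i : PRIORITY.size();
  }

  // Merges the per-cluster lists, orders by state rank and then newest first using the
  // timestamp embedded in each timeuuid, and applies the limit to the merged result.
  static List<Run> mergeAndSort(Map<String, List<Run>> runsByCluster, int limit) {
    Comparator<Run> byRank = Comparator.comparingInt(CrossClusterSortSketch::rank);
    Comparator<Run> newestFirst = Comparator.comparingLong((Run r) -> r.id.timestamp()).reversed();
    return runsByCluster.values().stream()
        .flatMap(Collection::stream)
        .sorted(byRank.thenComparing(newestFirst))
        .limit(limit)
        .collect(Collectors.toList());
  }

  // Test-data helper: builds a version-1 UUID whose timestamp() equals ts (60 bits, 100 ns units).
  static UUID timeUuid(long ts) {
    long timeLow = ts & 0xFFFFFFFFL;
    long timeMid = (ts >>> 32) & 0xFFFFL;
    long timeHi = (ts >>> 48) & 0x0FFFL;
    long msb = (timeLow << 32) | (timeMid << 16) | 0x1000L | timeHi; // 0x1000L sets version 1
    long lsb = 0x8000000000000000L; // IETF variant bits; clock sequence and node left at zero
    return new UUID(msb, lsb);
  }

  public static void main(String[] args) {
    Map<String, List<Run>> byCluster = Map.of(
        "cluster-a", List.of(new Run(timeUuid(100), "DONE"), new Run(timeUuid(300), "RUNNING")),
        "cluster-b", List.of(new Run(timeUuid(200), "NOT_STARTED"), new Run(timeUuid(400), "ABORTED")));
    for (Run run : mergeAndSort(byCluster, 3)) {
      System.out.println(run.state + " " + run.id.timestamp());
    }
  }
}
```

Suppose each per-cluster query already returns its top `limit` runs under the same ordering. Then the global top `limit` is always contained in the union of those per-cluster results. Fetching `limit` runs per cluster and re-applying the limit after merging is therefore enough.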
This appears to be done and working now, although the cucumber test doesn't test against multiple clusters. I'm not sure if I should add a multi-cluster test to confirm that this endpoint does indeed work?
I think that would be fairly hard to achieve with the limited resources we have at our disposal.
A mocked test wouldn't help much I guess, so we're left with manual testing for multi cluster 🤷
src/server/src/main/java/io/cassandrareaper/storage/MemoryStorage.java (outdated, resolved)
src/server/src/main/resources/db/cassandra/032_add_2i_status.cql (outdated, resolved)
Co-authored-by: Alexander Dejanovski <alex@thelastpickle.com>
…r_runs/` as well as `/repair_runs/cluster`
… and the timeUUID to determine ordering.
I've created a new method on … My only concern with this is that I don't like having methods on RepairRun, since it doesn't have much right now and is almost a model object (having few methods beyond a …). Maybe we shouldn't be doing the sorting in the storage layer at all, since this type of sorting is purely a presentation-layer concern (so perhaps implement it in `RepairRunResource` and leave the responses from the storage layers unsorted)? I'm open to feedback on the design here, as you know the patterns in this codebase better than I do.
96e642f to 0a0faaa
2e09f7a to 093633c
Manual testing suggests that this isn't working again for some reason. I've tried flipping the ordering in these conditions, but neither ordering appears to give us the result we want in the UI:
More concerning, I haven't seen the cucumber tests fail either way, so they don't appear to be detecting errors. Some ordering may be applied in the UI, since the repair runs are always ordered by start time. They are not ordered by creation time, which the UUID should give us, I think? I'll investigate further.
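On the creation-time question: a version-1 (time-based) UUID embeds a 60-bit timestamp, counted in 100 ns intervals since 1582-10-15T00:00Z, which `UUID.timestamp()` returns. The following converts it to a Unix-epoch `Instant`, assuming the run id is generated when the run is created. `timeUuidForMillis` is a hypothetical helper for building test data.

```java
import java.time.Instant;
import java.util.UUID;

class TimeUuidCreationTimeSketch {

  // 100 ns intervals between the UUID epoch (1582-10-15T00:00:00Z) and the Unix epoch.
  static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

  // Converts the timestamp embedded in a version-1 UUID to an Instant (millisecond precision).
  // UUID.timestamp() throws UnsupportedOperationException for non-version-1 UUIDs.
  static Instant creationTime(UUID timeUuid) {
    long hundredNanosSinceUnixEpoch = timeUuid.timestamp() - UUID_EPOCH_OFFSET;
    return Instant.ofEpochMilli(hundredNanosSinceUnixEpoch / 10_000);
  }

  // Test-data helper: builds a version-1 UUID for the given Unix epoch milliseconds.
  static UUID timeUuidForMillis(long epochMillis) {
    long ts = epochMillis * 10_000 + UUID_EPOCH_OFFSET;
    long msb = ((ts & 0xFFFFFFFFL) << 32)   // time_low
        | (((ts >>> 32) & 0xFFFFL) << 16)    // time_mid
        | 0x1000L                            // version 1
        | ((ts >>> 48) & 0x0FFFL);           // time_hi
    return new UUID(msb, 0x8000000000000000L);
  }

  public static void main(String[] args) {
    System.out.println(creationTime(timeUuidForMillis(1_600_000_000_000L)));
  }
}
```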
src/server/src/main/java/io/cassandrareaper/resources/RepairRunResource.java (outdated, resolved)
src/server/src/main/java/io/cassandrareaper/core/RepairRun.java (outdated, resolved)
Manual testing confirms this PR works as intended. The outstanding questions are:
I have a suggestion to simplify the code a little bit.
No need to try testing multi cluster stuff in cucumber, we won't have enough resources in GHA to do so.
Also, it's fine to keep create/abort in the same cucumber step for now. If we need it later, we can refactor it in two separate steps.
for (RunState state :
    Arrays
        .stream(RunState.values())
        .filter(v ->
            Arrays.asList("RUNNING", "PAUSED", "NOT_STARTED")
                .contains(v.toString()))
        .collect(Collectors.toList())
) {
suggestion (non-blocking): Do you think this could be simplified to the following?
for (String state : Arrays.asList("RUNNING", "PAUSED", "NOT_STARTED")) {
I'm not sure why we need the stream().filter().collect() here, but I could be missing something.
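One subtlety is worth checking before simplifying. `RunState.values()` returns the constants in declaration order, so the stream/filter version visits the three states in the enum's order. A literal list is visited in list order instead. The difference only matters if the loop body appends each state's runs to the result in iteration order. The sketch uses a hypothetical mirror of `RunState` whose declaration order is an assumption, not copied from Reaper. It also iterates enum constants rather than strings, so the loop variable keeps its `RunState` type.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StateIterationOrderSketch {

  // Hypothetical mirror of RunState; the declaration order here is an assumption.
  enum RunState { NOT_STARTED, RUNNING, ERROR, DONE, PAUSED, ABORTED, DELETED }

  // Original form: filters RunState.values(), so the order follows the enum declaration.
  static List<RunState> viaStreamFilter() {
    return Arrays.stream(RunState.values())
        .filter(v -> Arrays.asList("RUNNING", "PAUSED", "NOT_STARTED").contains(v.toString()))
        .collect(Collectors.toList());
  }

  // Simplified form: iterates a literal list, so the order follows the list.
  static List<RunState> viaLiteralList() {
    return Arrays.asList(RunState.RUNNING, RunState.PAUSED, RunState.NOT_STARTED);
  }

  public static void main(String[] args) {
    for (RunState state : viaStreamFilter()) {
      System.out.println("stream/filter: " + state);
    }
    for (RunState state : viaLiteralList()) {
      System.out.println("literal list:  " + state);
    }
  }
}
```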
We could, it would be simpler. I'll make this change.
728c08c to b23b40e
The status page doesn't always show repairs which are actually running. We should sort the returned results according to which statuses are most "interesting" and show the most interesting rows first.
Fixes #1217