
Repairs Not progressing, when Incremental repairs weren't used and cluster has range movement #1367

Open
rathan1723 opened this issue Aug 24, 2023 · 3 comments · May be fixed by #1521


rathan1723 commented Aug 24, 2023


Reaper version in use: 3.2.0

Currently, for non-incremental repairs with one segment per node, Reaper is not resilient against changes in token ranges: any repair schedule created before a range change subsequently fails consistently, resulting in continuous retries. In addition, these failing repairs appear to take priority over repairs for other keyspaces, blocking progress completely.

(Screenshots attached in the original issue.)

ERROR

2023-06-07T14:10:55.595Z: ERROR [2023-06-07 14:10:54,386] [****:1cfdb910-df93-11ed-9996-a1697ff8c53a] i.c.j.ClusterFacade - [tokenRangeToEndpoint] no replicas found for token range io.cassandrareaper.core.Segment@41265603
2023-06-07T14:10:55.595Z: ERROR [2023-06-07 14:10:54,387] [****:1cfdb910-df93-11ed-9996-a1697ff8c53a] i.c.s.RepairRunner - RepairRun FAILURE, scheduling retry
2023-06-07T14:10:55.595Z: java.lang.IllegalArgumentException: no hosts provided to connectAny
2023-06-07T14:10:55.595Z: at com.google.common.base.Preconditions.checkArgument(Preconditions.java:135)
2023-06-07T14:10:55.596Z: at io.cassandrareaper.jmx.JmxConnectionFactory.connectAny(JmxConnectionFactory.java:135)
2023-06-07T14:10:55.596Z: at io.cassandrareaper.jmx.ClusterFacade.connectImpl(ClusterFacade.java:885)
2023-06-07T14:10:55.596Z: at io.cassandrareaper.jmx.ClusterFacade.connect(ClusterFacade.java:869)
2023-06-07T14:10:55.596Z: at io.cassandrareaper.service.RepairRunner.startNextSegment(RepairRunner.java:472)
2023-06-07T14:10:55.596Z: at io.cassandrareaper.service.RepairRunner.run(RepairRunner.java:235)
2023-06-07T14:10:55.596Z: at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
2023-06-07T14:10:55.596Z: at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
2023-06-07T14:10:55.596Z: at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
2023-06-07T14:10:55.596Z: at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
2023-06-07T14:10:55.596Z: at com.codahale.metrics.InstrumentedScheduledExecutorService$InstrumentedRunnable.run(InstrumentedScheduledExecutorService.java:241)
2023-06-07T14:10:55.596Z: at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
2023-06-07T14:10:55.596Z: at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
2023-06-07T14:10:55.596Z: at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
2023-06-07T14:10:55.596Z: at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
2023-06-07T14:10:55.596Z: at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
2023-06-07T14:10:55.596Z: at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66)
2023-06-07T14:10:55.596Z: at java.base/java.lang.Thread.run(Thread.java:829)

2023-06-07T14:10:00.009Z: ERROR [2023-06-07 14:09:59,816] [****:a8424cd0-df92-11ed-9996-a1697ff8c53a] i.c.j.ClusterFacade - [tokenRangeToEndpoint] no replicas found for token range io.cassandrareaper.core.Segment@60b78a64
2023-06-07T14:10:00.009Z: ERROR [2023-06-07 14:09:59,816] [****:a8424cd0-df92-11ed-9996-a1697ff8c53a] i.c.s.RepairRunner - RepairRun FAILURE, scheduling retry
2023-06-07T14:10:00.009Z: java.lang.IllegalArgumentException: no hosts provided to connectAny
(identical stack trace as above)

Code flow for the error:

RepairRunner

private void startNextSegment() throws ReaperException, InterruptedException {
  boolean scheduleRetry = true;

  // We want to know whether a repair was started,
  // so that a rescheduling of this runner will happen.
  boolean repairStarted = false;

  // We have an empty slot, so let's start new segment runner if possible.
  // When in sidecar mode, filter on ranges that the local node is a replica for only.
  LOG.info("Attempting to run new segment...");
  List<RepairSegment> nextRepairSegments
      = context.config.isInSidecarMode()
          ? ((IDistributedStorage) context.storage)
              .getNextFreeSegmentsForRanges(
                  repairRunId, localEndpointRanges)
          : context.storage.getNextFreeSegments(
              repairRunId);

  Optional<RepairSegment> nextRepairSegment = Optional.empty();
  Collection<String> potentialReplicas = new HashSet<>();
  for (RepairSegment segment : nextRepairSegments) {
    Map<String, String> potentialReplicaMap = this.repairRunService.getDCsByNodeForRepairSegment(
        cluster, segment.getTokenRange(), repairUnit.getKeyspaceName(), repairUnit);
    potentialReplicas = repairUnit.getIncrementalRepair()
        ? Collections.singletonList(segment.getCoordinatorHost())
        : potentialReplicaMap.keySet();
    JmxProxy coordinator = clusterFacade.connect(cluster, potentialReplicas);
    if (nodesReadyForNewRepair(coordinator, segment, potentialReplicaMap, repairRunId)) {
      nextRepairSegment = Optional.of(segment);
      break;
    }
  }

Since incremental repair is not in use, Reaper computes potentialReplicas as the keys returned by getDCsByNodeForRepairSegment(), which is derived from the current ring state and from data stored when the schedule was created (segment.getTokenRange()).

RepairRunService

Map<String, String> getDCsByNodeForRepairSegment(
    Cluster cluster,
    Segment segment,
    String keyspace,
    RepairUnit repairUnit) throws ReaperException {

  final int maxAttempts = 2;
  for (int attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      JmxProxy jmxConnection = clusterFacade.connect(cluster);
      // when hosts are coming up or going down, this method can throw an UndeclaredThrowableException
      Collection<String> nodes = clusterFacade.tokenRangeToEndpoint(cluster, keyspace, segment);
      Map<String, String> dcByNode = Maps.newHashMap();
      nodes.forEach(node -> dcByNode.put(node, EndpointSnitchInfoProxy.create(jmxConnection).getDataCenter(node)));
      if (repairUnit.getDatacenters().isEmpty()) {
        return dcByNode;
      } else {
        return dcByNode.entrySet().stream()
          .filter(entry -> repairUnit.getDatacenters().contains(entry.getValue()))
          .collect(Collectors.toMap(entry -> entry.getKey(), entry -> entry.getValue()));
      }
    }

The mapping is computed partly from the output of tokenRangeToEndpoint(), which attempts to find the endpoints of a current ring range that completely encloses the segment under repair.

ClusterFacade

public List<String> tokenRangeToEndpoint(Cluster cluster, String keyspace, Segment segment) {
  Set<Map.Entry<List<String>, List<String>>> entries;
  try {
    entries = getRangeToEndpointMap(cluster, keyspace).entrySet();
  } catch (ReaperException e) {
    LOG.error("[tokenRangeToEndpoint] no replicas found for token range {}", segment, e);
    return Lists.newArrayList();
  }

  for (Map.Entry<List<String>, List<String>> entry : entries) {
    BigInteger rangeStart = new BigInteger(entry.getKey().get(0));
    BigInteger rangeEnd = new BigInteger(entry.getKey().get(1));
    if (new RingRange(rangeStart, rangeEnd).encloses(segment.getTokenRanges().get(0))) {
      return entry.getValue();
    }
  }
  LOG.error("[tokenRangeToEndpoint] no replicas found for token range {}", segment);
  LOG.debug("[tokenRangeToEndpoint] checked token ranges were {}", entries);
  return Lists.newArrayList();
}

With one segment per node, each Segment corresponds to a single token range. If any additive/lateral range movements occur, then for at least some stored segments no current ring range will completely enclose the segment's range. tokenRangeToEndpoint() then returns an empty list, which causes getDCsByNodeForRepairSegment() to return an empty map, which ultimately results in an empty collection of potential coordinators being passed to connectAny().
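To make the failure concrete, here is a small, self-contained illustration of the enclosure check. It is only a sketch with made-up token values, using a simplified non-wrapping containment test rather than Reaper's actual RingRange class:

import java.math.BigInteger;

// Illustration only: why a stored segment stops being enclosed after a range movement.
public class RangeMovementExample {

  // Simplified containment check for non-wrapping ranges: the current ring
  // range must fully contain the stored segment range.
  static boolean encloses(BigInteger rangeStart, BigInteger rangeEnd,
                          BigInteger segStart, BigInteger segEnd) {
    return rangeStart.compareTo(segStart) <= 0 && rangeEnd.compareTo(segEnd) >= 0;
  }

  public static void main(String[] args) {
    // Segment stored when the schedule was created: tokens (0, 100]
    BigInteger segStart = BigInteger.ZERO;
    BigInteger segEnd = BigInteger.valueOf(100);

    // Before the range movement one node owned (0, 100], so the segment was enclosed.
    System.out.println(encloses(BigInteger.ZERO, BigInteger.valueOf(100), segStart, segEnd)); // true

    // After a new node joins at token 50 the ring has (0, 50] and (50, 100].
    // Neither current range encloses the stored segment, so tokenRangeToEndpoint()
    // would return an empty list for it.
    System.out.println(encloses(BigInteger.ZERO, BigInteger.valueOf(50), segStart, segEnd)); // false
    System.out.println(encloses(BigInteger.valueOf(50), BigInteger.valueOf(100), segStart, segEnd)); // false
  }
}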

JmxConnectionFactory (called from RepairRunner via ClusterFacade.connect())

@VisibleForTesting
public final JmxProxy connectAny(Collection<Node> nodes) throws ReaperException {

  Preconditions.checkArgument(
      null != nodes && !nodes.isEmpty(), "no hosts provided to connectAny");

  List<Node> nodeList = new ArrayList<>(nodes);
  Collections.shuffle(nodeList);

The empty replica collection fails the precondition check, resulting in the errors logged above.

The cluster in use does not run incremental repairs. What is the right way to proceed here? Does it make sense to special-case range movement, in the same way that incremental repair is special-cased?

┆Issue is synchronized with this Jira Story by Unito
┆Issue Number: REAP-33


rathan1723 commented Nov 2, 2023

What would be the direction to proceed? I see a code change in 3.2.1 making Reaper resilient to topology changes for incremental repair (an updated code path plus a backend data update). Should I take the same approach for the issue above?

adejanovski (Contributor) commented

Hi @rathan1723, repairs cannot be resilient to topology changes, as the token ranges will be different from the ones that were used to compute the segments. What we did is make them resilient to IP address changes when the topology doesn't change.
Now what we'd need to do is make sure actual topology changes fail the repair early on, instead of trying over and over again to re-run segments that are no longer valid.
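For illustration only (this is not the change proposed in #1521), a fail-early check along those lines might sit in RepairRunner.startNextSegment() right after potentialReplicaMap is computed; failRepairRun() is a hypothetical helper and the surrounding fields and signatures are assumptions based on the excerpt above:

// Hypothetical sketch, not the actual fix: abort the run instead of retrying forever
// when the stored token range no longer maps to any replicas.
if (!repairUnit.getIncrementalRepair() && potentialReplicaMap.isEmpty()) {
  // No current ring range encloses this stored segment, so the topology has
  // changed since the schedule was created and the segment can never succeed.
  LOG.error("No replicas found for a segment of run {}; token ranges have changed since the schedule was created", repairRunId);
  failRepairRun("token ranges changed since schedule creation"); // hypothetical helper marking the run as failed
  return;
}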

andresbeckruiz commented

Hi @adejanovski , I created a patch to fix this issue (#1521) that is ready for review.
