
Repairs Not progressing, when Incremental repairs weren't used and cluster has range movement #1367

Open
rathan1723 opened this issue Aug 24, 2023 · 3 comments · May be fixed by #1521


rathan1723 commented Aug 24, 2023


Reaper version in use: 3.2.0

Currently, for non-incremental repairs with one segment per node, Reaper is not resilient against changes in token ranges: any repair schedule created before a range change subsequently fails consistently, resulting in continuous retries. In addition, these failing repairs appear to take priority over repairs for other keyspaces, blocking progress completely.

(Screenshots attached in the original issue.)

ERROR

2023-06-07T14:10:55.595Z: ERROR [2023-06-07 14:10:54,386] [****:1cfdb910-df93-11ed-9996-a1697ff8c53a] i.c.j.ClusterFacade - [tokenRangeToEndpoint] no replicas found for token range io.cassandrareaper.core.Segment@41265603
2023-06-07T14:10:55.595Z: ERROR [2023-06-07 14:10:54,387] [****:1cfdb910-df93-11ed-9996-a1697ff8c53a] i.c.s.RepairRunner - RepairRun FAILURE, scheduling retry
2023-06-07T14:10:55.595Z: java.lang.IllegalArgumentException: no hosts provided to connectAny
2023-06-07T14:10:55.595Z: at com.google.common.base.Preconditions.checkArgument(Preconditions.java:135)
2023-06-07T14:10:55.596Z: at io.cassandrareaper.jmx.JmxConnectionFactory.connectAny(JmxConnectionFactory.java:135)
2023-06-07T14:10:55.596Z: at io.cassandrareaper.jmx.ClusterFacade.connectImpl(ClusterFacade.java:885)
2023-06-07T14:10:55.596Z: at io.cassandrareaper.jmx.ClusterFacade.connect(ClusterFacade.java:869)
2023-06-07T14:10:55.596Z: at io.cassandrareaper.service.RepairRunner.startNextSegment(RepairRunner.java:472)
2023-06-07T14:10:55.596Z: at io.cassandrareaper.service.RepairRunner.run(RepairRunner.java:235)
2023-06-07T14:10:55.596Z: at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
2023-06-07T14:10:55.596Z: at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
2023-06-07T14:10:55.596Z: at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
2023-06-07T14:10:55.596Z: at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
2023-06-07T14:10:55.596Z: at com.codahale.metrics.InstrumentedScheduledExecutorService$InstrumentedRunnable.run(InstrumentedScheduledExecutorService.java:241)
2023-06-07T14:10:55.596Z: at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
2023-06-07T14:10:55.596Z: at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
2023-06-07T14:10:55.596Z: at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
2023-06-07T14:10:55.596Z: at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
2023-06-07T14:10:55.596Z: at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
2023-06-07T14:10:55.596Z: at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66)
2023-06-07T14:10:55.596Z: at java.base/java.lang.Thread.run(Thread.java:829)

2023-06-07T14:10:00.009Z: ERROR [2023-06-07 14:09:59,816] [****:a8424cd0-df92-11ed-9996-a1697ff8c53a] i.c.j.ClusterFacade - [tokenRangeToEndpoint] no replicas found for token range io.cassandrareaper.core.Segment@60b78a64
2023-06-07T14:10:00.009Z: ERROR [2023-06-07 14:09:59,816] [****:a8424cd0-df92-11ed-9996-a1697ff8c53a] i.c.s.RepairRunner - RepairRun FAILURE, scheduling retry
2023-06-07T14:10:00.009Z: java.lang.IllegalArgumentException: no hosts provided to connectAny
(identical stack trace as above)

Code flow for the error:

RepairRunner

private void startNextSegment() throws ReaperException, InterruptedException {
  boolean scheduleRetry = true;

  // We want to know whether a repair was started,
  // so that a rescheduling of this runner will happen.
  boolean repairStarted = false;

  // We have an empty slot, so let's start new segment runner if possible.
  // When in sidecar mode, filter on ranges that the local node is a replica for only.
  LOG.info("Attempting to run new segment...");
  List<RepairSegment> nextRepairSegments
      = context.config.isInSidecarMode()
          ? ((IDistributedStorage) context.storage)
              .getNextFreeSegmentsForRanges(
                  repairRunId, localEndpointRanges)
          : context.storage.getNextFreeSegments(
              repairRunId);

  Optional<RepairSegment> nextRepairSegment = Optional.empty();
  Collection<String> potentialReplicas = new HashSet<>();
  for (RepairSegment segment : nextRepairSegments) {
    Map<String, String> potentialReplicaMap = this.repairRunService.getDCsByNodeForRepairSegment(
        cluster, segment.getTokenRange(), repairUnit.getKeyspaceName(), repairUnit);
    potentialReplicas = repairUnit.getIncrementalRepair()
        ? Collections.singletonList(segment.getCoordinatorHost())
        : potentialReplicaMap.keySet();
    JmxProxy coordinator = clusterFacade.connect(cluster, potentialReplicas);
    if (nodesReadyForNewRepair(coordinator, segment, potentialReplicaMap, repairRunId)) {
      nextRepairSegment = Optional.of(segment);
      break;
    }
  }

Since incremental repair is not in use, Reaper computes potentialReplicas as the keys returned by getDCsByNodeForRepairSegment(), which is derived from the current ring state and from data stored when the schedule was created (segment.getTokenRange()).

RepairRunService

Map<String, String> getDCsByNodeForRepairSegment(
    Cluster cluster,
    Segment segment,
    String keyspace,
    RepairUnit repairUnit) throws ReaperException {

  final int maxAttempts = 2;
  for (int attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      JmxProxy jmxConnection = clusterFacade.connect(cluster);
      // when hosts are coming up or going down, this method can throw an UndeclaredThrowableException
      Collection<String> nodes = clusterFacade.tokenRangeToEndpoint(cluster, keyspace, segment);
      Map<String, String> dcByNode = Maps.newHashMap();
      nodes.forEach(node -> dcByNode.put(node, EndpointSnitchInfoProxy.create(jmxConnection).getDataCenter(node)));
      if (repairUnit.getDatacenters().isEmpty()) {
        return dcByNode;
      } else {
        return dcByNode.entrySet().stream()
          .filter(entry -> repairUnit.getDatacenters().contains(entry.getValue()))
          .collect(Collectors.toMap(entry -> entry.getKey(), entry -> entry.getValue()));
      }
    }

The mapping is computed partly from the output of tokenRangeToEndpoint(), which attempts to find the endpoints of a current ring range that completely encloses the segment under repair.

ClusterFacade

public List<String> tokenRangeToEndpoint(Cluster cluster, String keyspace, Segment segment) {
  Set<Map.Entry<List<String>, List<String>>> entries;
  try {
    entries = getRangeToEndpointMap(cluster, keyspace).entrySet();
  } catch (ReaperException e) {
    LOG.error("[tokenRangeToEndpoint] no replicas found for token range {}", segment, e);
    return Lists.newArrayList();
  }

  for (Map.Entry<List<String>, List<String>> entry : entries) {
    BigInteger rangeStart = new BigInteger(entry.getKey().get(0));
    BigInteger rangeEnd = new BigInteger(entry.getKey().get(1));
    if (new RingRange(rangeStart, rangeEnd).encloses(segment.getTokenRanges().get(0))) {
      return entry.getValue();
    }
  }
  LOG.error("[tokenRangeToEndpoint] no replicas found for token range {}", segment);
  LOG.debug("[tokenRangeToEndpoint] checked token ranges were {}", entries);
  return Lists.newArrayList();
}

With one segment per node, each Segment corresponds to a single token range. If any additive/lateral range movements occur, then for at least some stored segments no current ring range will completely enclose the segment's range. tokenRangeToEndpoint() then returns an empty list, which causes getDCsByNodeForRepairSegment() to return an empty map, which ultimately results in an empty collection of potential coordinators being passed to connectAny().
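To make the failure concrete, here is a small, self-contained illustration of the enclosure check. It is only a sketch with made-up token values, using a simplified non-wrapping containment test rather than Reaper's actual RingRange class:

import java.math.BigInteger;

// Illustration only: why a stored segment stops being enclosed after a range movement.
public class RangeMovementExample {

  // Simplified containment check for non-wrapping ranges: the current ring
  // range must fully contain the stored segment range.
  static boolean encloses(BigInteger rangeStart, BigInteger rangeEnd,
                          BigInteger segStart, BigInteger segEnd) {
    return rangeStart.compareTo(segStart) <= 0 && rangeEnd.compareTo(segEnd) >= 0;
  }

  public static void main(String[] args) {
    // Segment stored when the schedule was created: tokens (0, 100]
    BigInteger segStart = BigInteger.ZERO;
    BigInteger segEnd = BigInteger.valueOf(100);

    // Before the range movement one node owned (0, 100], so the segment was enclosed.
    System.out.println(encloses(BigInteger.ZERO, BigInteger.valueOf(100), segStart, segEnd)); // true

    // After a new node joins at token 50 the ring has (0, 50] and (50, 100].
    // Neither current range encloses the stored segment, so tokenRangeToEndpoint()
    // would return an empty list for it.
    System.out.println(encloses(BigInteger.ZERO, BigInteger.valueOf(50), segStart, segEnd)); // false
    System.out.println(encloses(BigInteger.valueOf(50), BigInteger.valueOf(100), segStart, segEnd)); // false
  }
}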

JmxConnectionFactory (called from RepairRunner via ClusterFacade.connect())

@VisibleForTesting
public final JmxProxy connectAny(Collection<Node> nodes) throws ReaperException {

  Preconditions.checkArgument(
      null != nodes && !nodes.isEmpty(), "no hosts provided to connectAny");

  List<Node> nodeList = new ArrayList<>(nodes);
  Collections.shuffle(nodeList);

The empty replica collection fails the precondition check, resulting in the errors logged above.

The cluster in use does not run incremental repairs. What is the right way to proceed here? Does it make sense to special-case range movement, in the same way that incremental repair is special-cased?

┆Issue is synchronized with this Jira Story by Unito
┆Issue Number: REAP-33


rathan1723 commented Nov 2, 2023

What would be the direction to proceed? I see a code change in 3.2.1 making Reaper resilient to topology changes for incremental repair (an updated code path plus a backend data update). Should I take the same approach for the issue above?

adejanovski (Contributor) commented

Hi @rathan1723, repairs cannot be resilient to topology changes, as the token ranges will be different from the ones that were used to compute the segments. What we did is make them resilient to IP address changes when the topology doesn't change.
Now what we'd need to do is make sure actual topology changes fail the repair early on, instead of trying over and over again to re-run segments that are no longer valid.
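For illustration only (this is not the change proposed in #1521), a fail-early check along those lines might sit in RepairRunner.startNextSegment() right after potentialReplicaMap is computed; failRepairRun() is a hypothetical helper and the surrounding fields and signatures are assumptions based on the excerpt above:

// Hypothetical sketch, not the actual fix: abort the run instead of retrying forever
// when the stored token range no longer maps to any replicas.
if (!repairUnit.getIncrementalRepair() && potentialReplicaMap.isEmpty()) {
  // No current ring range encloses this stored segment, so the topology has
  // changed since the schedule was created and the segment can never succeed.
  LOG.error("No replicas found for a segment of run {}; token ranges have changed since the schedule was created", repairRunId);
  failRepairRun("token ranges changed since schedule creation"); // hypothetical helper marking the run as failed
  return;
}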

andresbeckruiz commented

Hi @adejanovski , I created a patch to fix this issue (#1521) that is ready for review.
