Supporting load distribution across a cluster #258

Closed
tsg21 opened this issue Mar 11, 2022 · 7 comments

Labels
question Further information is requested

Comments

@tsg21
Contributor

tsg21 commented Mar 11, 2022

The README states:

that said, if you want to distribute work across a cluster at point of submission, this is also supported

I do, in fact, have that exact requirement, but I don't see how to achieve it with the TransactionOutbox API as it stands. How is it supported?

I think you might argue that this functionality is outside the scope of this library, but transaction-outbox is very close to supporting this. All I think it needs is a way to schedule a work item for asynchronous processing by the background thread(s), and not immediate execution in a post-commit hook. This could be done by adding an extra flag to ParameterizedScheduleBuilder.
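To illustrate the shape of what I mean, assuming the existing outbox.with() entry point (the flag name and worker class here are purely hypothetical, not an existing API):

// Hypothetical ParameterizedScheduleBuilder flag, for illustration only: skip the
// post-commit submission and leave the entry for the background flush thread(s).
outbox.with()
    .deferToBackgroundProcessing()
    .schedule(MyWorker.class)
    .process(workItemId);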

Is this a feature you would consider adding? I might be able to spin up a PR if so.

@badgerwithagun
Member

badgerwithagun commented Mar 12, 2022

The issue is that there are too many different messaging protocols, load balancers and service discovery mechanisms, and I wouldn't want to couple the library to any particular one. All the support is there to wire it into yours, though.

I'll outline how ours works, as an example.

First, you need #260, so LGTM.

Our load balancer (Nomad) gives each registered service its own DNS name, e.g. invoicing.service.consul. Therefore all we have to do is to route processing through the load balancer to distribute it around instances of the app.

We define a REST endpoint like this:

POST /outbox/process/:entryId

Which takes a TransactionOutboxEntry as its body, handled like this:

// Deserialise the TransactionOutboxEntry using a copy of the app's ObjectMapper,
// configured with the library's Jackson module so the invocation arguments round-trip
var objectMapper = myNormalObjectMapper.copy();
objectMapper.setDefaultTyping(TransactionOutboxJacksonModule.typeResolver());
objectMapper.registerModule(new TransactionOutboxJacksonModule());
TransactionOutboxEntry entry = objectMapper.readValue(request.bodyAsBytes(), TransactionOutboxEntry.class);

// Hand the entry to a bounded local executor and process it immediately on this instance
Submitter submitter = ExecutorSubmitter.builder().executor(localExecutor).logLevelWorkQueueSaturation(Level.INFO).build();
submitter.submit(entry, outbox.get()::processNow);

Where localExecutor is an ExecutorService backed by an ArrayBlockingQueue of limited size, so the endpoint never blocks.
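For example, something along these lines (pool size and queue capacity purely illustrative):

// A fixed pool over a small bounded queue: when the queue is full, execute() throws
// RejectedExecutionException instead of blocking, and the entry simply waits for the
// normal background flush to pick it up.
ExecutorService localExecutor =
    new ThreadPoolExecutor(4, 4, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(100));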

Then create your own Submitter implementation which pushes requests to that endpoint, again via an ExecutorService with a limited queue:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.function.Consumer;

import com.gruelbox.transactionoutbox.Submitter;
import com.gruelbox.transactionoutbox.TransactionOutboxEntry;

import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.extern.slf4j.Slf4j;

@Slf4j
@Builder
@AllArgsConstructor
class TransactionOutboxRemotingSubmitter implements Submitter {

  private final TransactionOutboxRemotingResource remotingResource;
  private final ExecutorService localExecutor;
  private final String url;

  @Override
  public void submit(TransactionOutboxEntry entry, Consumer<TransactionOutboxEntry> leIgnore) {
    try {
      // Hand off to the bounded local executor so the commit path never blocks
      localExecutor.execute(() -> processRemotely(entry));
      log.info("Queued {} to be sent for remote processing", entry.description());
    } catch (RejectedExecutionException e) {
      // Queue full: the entry stays in the outbox table and the background flusher
      // re-attempts it later
      log.info("Will queue {} for processing when local executor is available", entry.description());
    } catch (Exception e) {
      log.warn("Failed to queue {} for execution at {}. It will be re-attempted later.", entry.description(), url, e);
    }
  }

  private void processRemotely(TransactionOutboxEntry entry) {
    try {
      // Sends to /outbox/process/:entryId using the same objectMapper as on the receiving side
      remotingResource.process(entry.getId(), entry);
      log.info("Submitted {} for remote processing at {}", entry.description(), url);
    } catch (Exception e) {
      log.warn(
        "Failed to submit {} for remote processing at {}. It will be re-attempted later.",
        entry.description(),
        url,
        e
      );
    }
  }
}
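To wire it in, something like this should do it (the transaction manager and persistor are shown only for illustration; use whatever you already have):

TransactionOutbox outbox = TransactionOutbox.builder()
    .transactionManager(transactionManager)
    .persistor(Persistor.forDialect(Dialect.POSTGRESQL_9))
    .submitter(TransactionOutboxRemotingSubmitter.builder()
        .remotingResource(remotingResource)
        .localExecutor(localExecutor)
        .url("http://invoicing.service.consul/outbox/process")
        .build())
    .build();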

Works solidly in production (in fact, once we put this live we ended up DoS-ing the DB with the volume before we tuned everything else 😊)

@badgerwithagun
Member

@tsg21 - I've updated the README on my PR to provide more information on this: https://github.com/gruelbox/transaction-outbox/blob/fix-236/README.md#clustering

@badgerwithagun badgerwithagun added the question Further information is requested label Mar 14, 2022
@badgerwithagun
Member

This has now been merged so you should be good to give it a whirl 👍🏻

@tsg21
Contributor Author

tsg21 commented Mar 16, 2022

I will do that...

@tsg21
Contributor Author

tsg21 commented Sep 27, 2023

We now use this in production backed by AWS SQS and it works extremely reliably, with queues of over 50k items. I would in principle like to contribute the code to the project, but it mashes together Spring and AWS SQS support. I think that for it to fit into the modular structure of the project I would need to decouple the code from Spring, which would require a rewrite.
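For flavour, the core of it is roughly this (heavily simplified; class, field and queue names are illustrative, using the AWS SDK v2 SQS client):

@Slf4j
class SqsSubmitter implements Submitter {

  private final SqsClient sqs;             // software.amazon.awssdk.services.sqs.SqsClient
  private final ObjectMapper objectMapper; // configured with TransactionOutboxJacksonModule, as above
  private final String queueUrl;

  SqsSubmitter(SqsClient sqs, ObjectMapper objectMapper, String queueUrl) {
    this.sqs = sqs;
    this.objectMapper = objectMapper;
    this.queueUrl = queueUrl;
  }

  @Override
  public void submit(TransactionOutboxEntry entry, Consumer<TransactionOutboxEntry> ignored) {
    try {
      // Serialise the entry and push it onto the queue; a listener on each instance
      // deserialises it and calls outbox.processNow(entry)
      sqs.sendMessage(SendMessageRequest.builder()
          .queueUrl(queueUrl)
          .messageBody(objectMapper.writeValueAsString(entry))
          .build());
    } catch (Exception e) {
      // Safe to swallow: the entry stays in the outbox table and the background
      // flusher re-attempts it later
      log.warn("Failed to submit {} to SQS; will retry later", entry.description(), e);
    }
  }
}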

Is there any interest in this?

@badgerwithagun
Member

Very interested indeed @tsg21 .

@badgerwithagun
Member

Something you're likely to have time for, @tsg21 ?

@gruelbox gruelbox locked and limited conversation to collaborators Dec 23, 2023
@badgerwithagun badgerwithagun converted this issue into discussion #543 Dec 23, 2023
