Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RemoveEntriesOlderThan is not bounded #143

Open
ramonsmits opened this issue Jan 7, 2016 · 3 comments
Open

RemoveEntriesOlderThan is not bounded #143

ramonsmits opened this issue Jan 7, 2016 · 3 comments

Comments

@ramonsmits
Copy link
Member

ramonsmits commented Jan 7, 2016

The current implementation both the client side queryover and server side delete statements are unbounded. If you have a very large number of items that will be deleted that that impacts locking and writing huge amounts of data in a single transaction to the journal log which can result in deadlocks

A better implementation is to do batched deletes in separate transactions until there are no more items to be purged.

@DavidBoike Already updated the QueryOver clientside to a server side single SQL query in the master branch but this is missing in develop.

Using SQL server this can be done via DELETE top (10000) FROM OutboxRecord. Oracle uses the LIMIT keyword and other databases might need a nested SELECT like DELETE FROM X WHERE id in (select id from X LIMIT 10000).

An alternative is: not having an bounded resultset, use the rowcount and if it passes a certain tresshold to log a WARNING that indicates that the cleanup interval might be set to low.

@ramonsmits
Copy link
Member Author

ramonsmits commented Jan 7, 2016

After a chat with @DavidBoike we agreed that it would be hard to do in a database agnostic way.

We agreed on to test the affected row count and log a warning when a configurable arbitrary (6.000?) limit is passed. This means that if a customer is having more than 100 msg/s that he will be getting warnings to use a smaller cleanup interval.

https://github.com/Particular/NServiceBus.NHibernate/blob/develop/src/NServiceBus.NHibernate/Outbox/OutboxPersister.cs#L99

Also, the timer logic should make sure that the cleanup task cannot be overlapping. It should detect if the task is already running and if it is, log an error.

https://github.com/Particular/NServiceBus.NHibernate/blob/develop/src/NServiceBus.NHibernate/Outbox/NHibernateOutboxStorage.cs#L91

@ramonsmits
Copy link
Member Author

To extend this, the timer logic should be auto sensing with a min and max period.

By default we should start with a minute.

  • We should keep track of the last X (25?) deletion row counts and execution timestamps
  • Sum those and divide that by an ideal (configurable) batch size (1.000?).
  • Calculate the factor (batchsize/sum)
  • Calculate the sleep time: period in milliseconds * factor
  • min / max this
  • Sleep :-)

In high volume environments we will automatically delete more often resulting in less likely to escalate to a page or table lock.

In low volume environments we will maybe just run the cleanup every hour.

All of these can be configurable:

  • Min cleanup interval
  • Max cleanup interval
  • Cleanup batch size
  • History set size

@ramonsmits
Copy link
Member Author

I checked if this is still relevant and the logic is still the same:

var queryString = $"delete from {typeof(TEntity).Name} where Dispatched = true And DispatchedAt < :date";

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants