
Improve performance #32

Open · 1 task done
mewa opened this issue Oct 15, 2019 · 3 comments

Comments

@mewa commented Oct 15, 2019

First of all, thanks for a nice and robust tool!

I thought it could use some performance improvements. Currently, scanning and dumping data from bigger data sets takes quite a bit of time, and there are a couple of reasons for that:

  • When observing the reads and writes, they happen almost serially. Making them concurrent would certainly improve the situation. By making the reads concurrent we should also be able to scan ahead (currently, when scanning, we're idling while fetching the next chunk of data).

  • Another thing is that the default COUNT used when scanning is 10. Increasing it is definitely low-hanging fruit, but for obvious reasons it cannot be set too high, and since it wouldn't make sense to hard-code an arbitrary value, I feel that giving the user an option to set the COUNT manually is the right thing to do (see the sketch after this list).
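
Roughly what I have in mind, as a minimal sketch of the two points above (using radix v3 here; the `-count` flag and the channel buffer size are made-up knobs, not rump's actual options):

```go
package main

import (
	"flag"
	"fmt"
	"log"

	"github.com/mediocregopher/radix/v3"
)

func main() {
	// Hypothetical flag exposing SCAN's COUNT hint to the user.
	count := flag.Int("count", 1000, "COUNT hint passed to SCAN (Redis default: 10)")
	flag.Parse()

	pool, err := radix.NewPool("tcp", "127.0.0.1:6379", 10)
	if err != nil {
		log.Fatal(err)
	}
	defer pool.Close()

	// Scan ahead: the scanner runs in its own goroutine and keeps the
	// buffered channel topped up while the consumer is busy dumping,
	// instead of idling between SCAN calls.
	keys := make(chan string, *count)
	go func() {
		defer close(keys)
		scanner := radix.NewScanner(pool, radix.ScanOpts{
			Command: "SCAN",
			Count:   *count,
		})
		var key string
		for scanner.Next(&key) {
			keys <- key
		}
		if err := scanner.Close(); err != nil {
			log.Fatal(err)
		}
	}()

	for key := range keys {
		var dump string
		if err := pool.Do(radix.Cmd(&dump, "DUMP", key)); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("dumped %q (%d bytes)\n", key, len(dump))
	}
}
```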

@retpolanne commented Dec 23, 2019

+1 on this issue. The dump-and-restore process can be really slow with rump, especially for a large Redis instance.

I was thinking about which parts can run concurrently and which parts should run together:

  • I believe scanning can't be made concurrent, since each SCAN call relies on the cursor returned by the previous one. Maybe increasing the COUNT would be beneficial in this case.
  • DUMP and TTL could be called concurrently, but they have to return together (since you would want to restore them both with the same command).
  • RESTORE could also be called concurrently.

Maybe we could have:

  • a struct containing both the Dump result and the TTL
  • a goroutine for getting the dump/ttl and adding it to a channel
  • a function that listens to this channel and calls a goroutine to restore on the destination Redis.

Edit: Actually, looking at the code, both read and write are already called within goroutines, and they signal each other via the message bus. So maybe just calling DUMP and TTL within a goroutine that blocks before sending its content to the message bus would be enough.
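
Something along these lines, as a rough sketch with radix v3 (the `entry` struct and the channel wiring are made up, and rump's actual message bus works differently):

```go
import "github.com/mediocregopher/radix/v3"

// entry bundles the two results that must travel together: RESTORE
// needs both the serialized value and the TTL in a single command.
type entry struct {
	Key  string
	Dump string // serialized value from DUMP
	TTL  int64  // remaining TTL in milliseconds; 0 means no expiry
}

// dumper fetches DUMP and PTTL for each key, blocking until both are
// in hand before sending the combined result downstream.
func dumper(src radix.Client, keys <-chan string, out chan<- entry) error {
	defer close(out)
	for key := range keys {
		e := entry{Key: key}
		if err := src.Do(radix.Cmd(&e.Dump, "DUMP", key)); err != nil {
			return err
		}
		if e.Dump == "" {
			continue // key vanished between SCAN and DUMP; skip it
		}
		if err := src.Do(radix.Cmd(&e.TTL, "PTTL", key)); err != nil {
			return err
		}
		if e.TTL < 0 {
			e.TTL = 0 // PTTL returns -1 for "no TTL"; RESTORE wants 0
		}
		out <- e
	}
	return nil
}

// restorer listens on the channel and replays each entry on the
// destination Redis.
func restorer(dst radix.Client, in <-chan entry) error {
	for e := range in {
		err := dst.Do(radix.FlatCmd(nil, "RESTORE", e.Key, e.TTL, e.Dump, "REPLACE"))
		if err != nil {
			return err
		}
	}
	return nil
}
```

The important part is that DUMP and TTL leave the goroutine as one message, so the restore side never has to pair them up again.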

@nixtrace (Member) commented Jan 8, 2020

Thanks for the reports. We're looking into a performance regression introduced with the latest version, related to pipelining. We'll update the issue as soon as we have more details.

@mewa (Author) commented Jan 10, 2020

@vinicyusmacedo What I meant was that you can scan the next bit of work while you're still processing items from the previous scans, effectively interleaving the scans with the dumps. This became an apparent bottleneck when we were extracting keys by pattern, where scans would return no results (or very few), often many times in a row.
As for parallelizing the scan itself, it should be possible in theory (see this article), but I think at this stage doing so is a bit far-fetched.

@nixtrace I don't know about the regression, since we only started using rump in October, but we've definitely seen an improvement with some of our changes (especially with our use case of filtering by pattern), namely:

  • moving scans to a separate goroutine and running dumps in a pool of goroutines - this should also help make use of radix's built-in pipelining (see the sketch after this list);
  • allowing the COUNT param to be changed.
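
The shape of the concurrency change, heavily simplified (the function name and worker count are made up; the real changes are in the branches linked at the end of this comment). The keys channel would be fed by a scan goroutine like the one in my first comment:

```go
import (
	"log"
	"sync"

	"github.com/mediocregopher/radix/v3"
)

// dumpPool drains the keys channel with a fixed pool of goroutines.
// Issuing DUMPs from several goroutines against one radix pool lets
// radix batch the concurrent commands into pipelines under the hood.
func dumpPool(pool *radix.Pool, keys <-chan string, workers int) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for key := range keys {
				var dump string
				if err := pool.Do(radix.Cmd(&dump, "DUMP", key)); err != nil {
					log.Printf("dump %q: %v", key, err)
					continue
				}
				// hand the payload off to the restore/write side here
			}
		}()
	}
	wg.Wait()
}
```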

We've also introduced a QoL change that allows passing a PATTERN to scans, per our own requirements.
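
For illustration, this is roughly how the pattern plugs into the scan (the function name and flag wiring are ours/hypothetical; with radix v3, Pattern becomes SCAN's MATCH argument):

```go
import "github.com/mediocregopher/radix/v3"

// newScanner wires the hypothetical -pattern and -count flags into the
// scan; Pattern is emitted as SCAN's MATCH argument.
func newScanner(client radix.Client, pattern string, count int) radix.Scanner {
	return radix.NewScanner(client, radix.ScanOpts{
		Command: "SCAN",
		Pattern: pattern, // e.g. "user:*"
		Count:   count,
	})
}
```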

I tried to keep these features separate for easy integration (and actually opened this issue with the intent of donating code). Here you can preview the above-mentioned changes, respectively: concurrency, adjusting the count parameter, and adjusting the scan pattern.
