
Improve performance #32

Open · 1 task done
mewa opened this issue Oct 15, 2019 · 3 comments

Comments

@mewa commented Oct 15, 2019

First of all, thanks for a nice and robust tool!

I thought it could use some performance improvements. Currently, scanning and dumping data from bigger data sets takes quite a bit of time, and there are a couple of reasons for that:

  • When observing the reads and writes, they happen almost serially. Making them concurrent would certainly improve the situation. By making the reads concurrent we should also be able to scan ahead (currently, when scanning, we're idling while fetching the next chunk of data).

  • Another thing is that the default COUNT used when scanning is 10. Increasing it is definitely low-hanging fruit, but for obvious reasons it cannot be set too high, and since it wouldn't make sense to hard-code an arbitrary value, I feel that giving the user an option to set the COUNT manually is the right thing to do (see the sketch after this list).
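
Roughly what I have in mind, as a minimal sketch of the two points above (using radix v3 here; the `-count` flag and the channel buffer size are made-up knobs, not rump's actual options):

```go
package main

import (
	"flag"
	"fmt"
	"log"

	"github.com/mediocregopher/radix/v3"
)

func main() {
	// Hypothetical flag exposing SCAN's COUNT hint to the user.
	count := flag.Int("count", 1000, "COUNT hint passed to SCAN (Redis default: 10)")
	flag.Parse()

	pool, err := radix.NewPool("tcp", "127.0.0.1:6379", 10)
	if err != nil {
		log.Fatal(err)
	}
	defer pool.Close()

	// Scan ahead: the scanner runs in its own goroutine and keeps the
	// buffered channel topped up while the consumer is busy dumping,
	// instead of idling between SCAN calls.
	keys := make(chan string, *count)
	go func() {
		defer close(keys)
		scanner := radix.NewScanner(pool, radix.ScanOpts{
			Command: "SCAN",
			Count:   *count,
		})
		var key string
		for scanner.Next(&key) {
			keys <- key
		}
		if err := scanner.Close(); err != nil {
			log.Fatal(err)
		}
	}()

	for key := range keys {
		var dump string
		if err := pool.Do(radix.Cmd(&dump, "DUMP", key)); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("dumped %q (%d bytes)\n", key, len(dump))
	}
}
```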

@retpolanne commented Dec 23, 2019

+1 on this issue. The dump-and-restore process can be really slow with rump, especially for a large Redis instance.

I was thinking about which parts can run concurrently and which parts should run together:

  • I believe scanning can't be made concurrent, since each SCAN call relies on the cursor returned by the previous one. Maybe increasing the COUNT would be beneficial in this case.
  • DUMP and TTL could be called concurrently, but they have to return together (since you would want to restore them both with the same command).
  • RESTORE could also be called concurrently.

Maybe we could have:

  • a struct containing both the Dump result and the TTL
  • a goroutine for getting the dump/ttl and adding it to a channel
  • a function that listens to this channel and calls a goroutine to restore on the destination Redis.

Edit: Actually, looking at the code, both read and write are already called within goroutines, and they signal each other via the message bus. So maybe just calling DUMP and TTL within a goroutine that blocks before sending its content to the message bus would be enough.
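
Something along these lines, as a rough sketch with radix v3 (the `entry` struct and the channel wiring are made up, and rump's actual message bus works differently):

```go
import "github.com/mediocregopher/radix/v3"

// entry bundles the two results that must travel together: RESTORE
// needs both the serialized value and the TTL in a single command.
type entry struct {
	Key  string
	Dump string // serialized value from DUMP
	TTL  int64  // remaining TTL in milliseconds; 0 means no expiry
}

// dumper fetches DUMP and PTTL for each key, blocking until both are
// in hand before sending the combined result downstream.
func dumper(src radix.Client, keys <-chan string, out chan<- entry) error {
	defer close(out)
	for key := range keys {
		e := entry{Key: key}
		if err := src.Do(radix.Cmd(&e.Dump, "DUMP", key)); err != nil {
			return err
		}
		if e.Dump == "" {
			continue // key vanished between SCAN and DUMP; skip it
		}
		if err := src.Do(radix.Cmd(&e.TTL, "PTTL", key)); err != nil {
			return err
		}
		if e.TTL < 0 {
			e.TTL = 0 // PTTL returns -1 for "no TTL"; RESTORE wants 0
		}
		out <- e
	}
	return nil
}

// restorer listens on the channel and replays each entry on the
// destination Redis.
func restorer(dst radix.Client, in <-chan entry) error {
	for e := range in {
		err := dst.Do(radix.FlatCmd(nil, "RESTORE", e.Key, e.TTL, e.Dump, "REPLACE"))
		if err != nil {
			return err
		}
	}
	return nil
}
```

The important part is that DUMP and TTL leave the goroutine as one message, so the restore side never has to pair them up again.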

@nixtrace (Member) commented Jan 8, 2020

Thanks for the reports. We're looking into a performance regression introduced with the latest version, related to pipelining. We'll update the issue as soon as we have more details.

@mewa (Author) commented Jan 10, 2020

@vinicyusmacedo What I meant was that you can scan the next bit of work while you're still processing items from the previous scans, effectively interleaving the scans with the dumps. This became an apparent bottleneck when we were extracting keys by pattern, where scans would return no results (or very few), often many times in a row.
As for parallelizing the scan itself, it should be possible in theory (see this article), but I think at this stage doing so is a bit far-fetched.

@nixtrace I don't know about the regression, since we only started using rump in October, but we've definitely seen an improvement with some of our changes (especially with our use case of filtering by pattern), namely:

  • moving scans to a separate goroutine and running dumps in a pool of goroutines - this should also help make use of radix's built-in pipelining (see the sketch after this list);
  • allowing the COUNT param to be changed.
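
The shape of the concurrency change, heavily simplified (the function name and worker count are made up; the real changes are in the branches linked at the end of this comment). The keys channel would be fed by a scan goroutine like the one in my first comment:

```go
import (
	"log"
	"sync"

	"github.com/mediocregopher/radix/v3"
)

// dumpPool drains the keys channel with a fixed pool of goroutines.
// Issuing DUMPs from several goroutines against one radix pool lets
// radix batch the concurrent commands into pipelines under the hood.
func dumpPool(pool *radix.Pool, keys <-chan string, workers int) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for key := range keys {
				var dump string
				if err := pool.Do(radix.Cmd(&dump, "DUMP", key)); err != nil {
					log.Printf("dump %q: %v", key, err)
					continue
				}
				// hand the payload off to the restore/write side here
			}
		}()
	}
	wg.Wait()
}
```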

We've also introduced a QoL change that allows passing a PATTERN to scans, per our own requirements.
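
For illustration, this is roughly how the pattern plugs into the scan (the function name and flag wiring are ours/hypothetical; with radix v3, Pattern becomes SCAN's MATCH argument):

```go
import "github.com/mediocregopher/radix/v3"

// newScanner wires the hypothetical -pattern and -count flags into the
// scan; Pattern is emitted as SCAN's MATCH argument.
func newScanner(client radix.Client, pattern string, count int) radix.Scanner {
	return radix.NewScanner(client, radix.ScanOpts{
		Command: "SCAN",
		Pattern: pattern, // e.g. "user:*"
		Count:   count,
	})
}
```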

I tried to keep these features separate for easy integration (and actually opened this issue with the intent of donating code). Here you can preview the above-mentioned changes, respectively: concurrency, adjusting the count parameter, and adjusting the scan pattern.
