
Support for compressed batches of messages #63

Open
benwh opened this issue Jun 1, 2020 · 6 comments

benwh commented Jun 1, 2020

Hi @mia-0032

Firstly thanks for your work on this plugin - we've been using it at GoCardless with great success for some time now.

We've recently developed a new feature for this plugin that batches up as many messages as it can within the flush_interval, compresses them with Zlib, and sends them as a single Pub/Sub message, with an attribute set on it to indicate that it contains a compressed batch of messages.

The implementation for this can be found in this PR.
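
For illustration, the publish path looks roughly like this (a minimal sketch against the google-cloud-pubsub gem; the `compressed` attribute name and the naive framing of records into a single payload are assumptions for the example, not necessarily what the PR does):

```ruby
require "zlib"
require "google/cloud/pubsub"

pubsub = Google::Cloud::PubSub.new project_id: "my-project"
topic  = pubsub.topic "logs"

# Compress everything buffered during the flush_interval into one payload
# and publish it as a single Pub/Sub message, flagged via an attribute so
# consumers know to inflate it first. The attribute name is illustrative.
def publish_batch(topic, records)
  payload = Zlib::Deflate.deflate(records.join("\n"))
  topic.publish payload, "compressed" => "true"
end
```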

Given that we can easily achieve a 90% compression ratio, and that we're sending upwards of 10TiB of logs over Pub/Sub each month, this results in significant cost savings while still maintaining decoupling between the sender and receiver. We've been running this code for a couple of weeks now without any issues.


We'd like to contribute this work upstream to your repository. But before we do, I wanted to check whether this is something you'd accept, and what the easiest way of contributing it would be for you. The PR consists of a few main pieces:

  1. Adding Rubocop configuration, and fixing up any violations (we use Rubocop heavily at GoCardless, and find it makes it easier to keep the codebase consistent. We'd be happy to adjust the rules if there are any that you object to)
  2. Adding Prometheus metrics around the existing operations (this gives us a better view of the plugin's performance; see the sketch after this list)
  3. Adding the compress_batches feature (this also adds some more Prometheus metrics, related to compression)
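
For a sense of item 2, the metrics are along the lines of what the prometheus-client gem provides; the metric and label names below are hypothetical, not the ones in the PR:

```ruby
require "prometheus/client"

registry = Prometheus::Client.registry

# Hypothetical counter tracking successful publishes, labelled by topic.
published_total = registry.counter(
  :fluentd_pubsub_messages_published_total,
  docstring: "Messages published to Pub/Sub",
  labels: [:topic],
)

# Incremented on each successful publish call.
published_total.increment(labels: { topic: "logs" })
```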

For us, it would be ideal to get all of these changes merged back into your upstream repository, to make future features easy to develop without our fork diverging significantly.

Which of these commits would you be willing for us to submit as a PR, and in what format: one PR per commit, or one PR with all of the commits? Also, are there any changes you'd like up-front, before we open a PR?

Thanks!

@GLStephen

I'm not 100% sure of the level of maintenance at this point. Your feature sounds very interesting, though. If you end up at a dead end and end up forking, please let me know. My vote would be a single PR, so that others can pull it into their forks if the maintenance here isn't where we need it.

@mia-0032 (Owner)

@benwh @GLStephen Thank you for your comments. I'm sorry for my late reply.

I think the compress_batches feature is very useful. Adding Rubocop configuration is welcome, and the Prometheus functionality seems useful for monitoring.

I've looked at the pull request presented. Is it possible to split it into separate pull requests so that the changes are easier to understand?

@mia-0032 (Owner)

I added a compression/decompression feature in v1.5.0.

benwh (Author) commented Jan 26, 2021

@mia-0032 Thanks for your work on this! Apologies that I haven't had the time to split up the proposed features and PR them.

For our use case, having decompression enabled 'globally' at the subscription level isn't feasible, as we have some producers which can compress and others that cannot, so we rely on an attribute on the Pub/Sub message to mark it as compressed. Additionally, we've found that batching messages provides much of the benefit when compressing.
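
For reference, attribute-based handling on the consumer side could look like the sketch below (using the google-cloud-pubsub gem; the `compressed` attribute and the `process` handler are illustrative, not our actual implementation):

```ruby
require "zlib"
require "google/cloud/pubsub"

pubsub       = Google::Cloud::PubSub.new project_id: "my-project"
subscription = pubsub.subscription "logs-sub"

subscriber = subscription.listen do |message|
  # Inflate only when the producer flagged the message as compressed, so
  # compressed and uncompressed producers can share one subscription.
  data =
    if message.attributes["compressed"] == "true"
      Zlib::Inflate.inflate(message.data)
    else
      message.data
    end
  process data # hypothetical downstream handler
  message.acknowledge!
end
subscriber.start
```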

If I were to rebase and implement the features listed above, would you be willing to merge these in? The only potential downside is that it adds a lot of complexity compared to the current implementation.

@mia-0032 (Owner)

@benwh Thank you for your reply.

I understand that global compression and decompression is inconvenient in some use cases. The current compression feature can't support migrating from no compression to compression, so I feel that determining whether a message is compressed from an attribute is a good solution.

I don't have an environment that publishes a large amount of data, so I can't verify the benefits of batch publishing myself.
But if it has performance benefits, I would like to merge it.

DragosDumitrache commented Feb 2, 2023

@mia-0032 This is something I've taken over from @benwh for the time being. For about 800 million records per day, we pay on average $300/day, which adds up to a significant sum over the course of a month or year. I can't give you access to our Grafana dashboards showing this, but based on that, I'd like to rebase our changes on top of yours and try to integrate them upstream.
