
Support for compressed batches of messages #63

Open
benwh opened this issue Jun 1, 2020 · 6 comments

benwh commented Jun 1, 2020

Hi @mia-0032

Firstly thanks for your work on this plugin - we've been using it at GoCardless with great success for some time now.

We've recently developed a new feature for this plugin that batches up as many messages as it can within the flush_interval, compresses them with Zlib, and sends them as a single Pub/Sub message, with an attribute set on it to indicate that it contains a compressed batch of messages.

The implementation for this can be found in this PR.
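
For illustration, the publish path looks roughly like this (a minimal sketch against the google-cloud-pubsub gem; the `compressed` attribute name and the naive framing of records into a single payload are assumptions for the example, not necessarily what the PR does):

```ruby
require "zlib"
require "google/cloud/pubsub"

pubsub = Google::Cloud::PubSub.new project_id: "my-project"
topic  = pubsub.topic "logs"

# Compress everything buffered during the flush_interval into one payload
# and publish it as a single Pub/Sub message, flagged via an attribute so
# consumers know to inflate it first. The attribute name is illustrative.
def publish_batch(topic, records)
  payload = Zlib::Deflate.deflate(records.join("\n"))
  topic.publish payload, "compressed" => "true"
end
```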

Given that we can easily achieve a 90% compression ratio, and that we're sending upwards of 10TiB of logs over Pub/Sub each month, this results in significant cost savings while still maintaining decoupling between the sender and receiver. We've been running this code for a couple of weeks now without any issues.


We'd like to contribute this work upstream to your repository. But before we do, I wanted to check whether this is something you'd accept, and what the easiest way of contributing it would be for you. The PR consists of a few main pieces:

  1. Adding Rubocop configuration, and fixing up any violations (we use Rubocop heavily at GoCardless, and find it makes it easier to keep the codebase consistent. We'd be happy to adjust the rules if there are any that you object to)
  2. Adding Prometheus metrics around the existing operations (this gives us a better view of the plugin's performance; see the sketch after this list)
  3. Adding the compress_batches feature (this also adds some more Prometheus metrics, related to compression)
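
For a sense of item 2, the metrics are along the lines of what the prometheus-client gem provides; the metric and label names below are hypothetical, not the ones in the PR:

```ruby
require "prometheus/client"

registry = Prometheus::Client.registry

# Hypothetical counter tracking successful publishes, labelled by topic.
published_total = registry.counter(
  :fluentd_pubsub_messages_published_total,
  docstring: "Messages published to Pub/Sub",
  labels: [:topic],
)

# Incremented on each successful publish call.
published_total.increment(labels: { topic: "logs" })
```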

For us, it would be ideal to get all of these changes merged back into your upstream repository, to make future features easy to develop without our fork diverging significantly.

Which of these commits would you be willing for us to submit as a PR, and in what format: one PR per commit, or one PR with all of the commits? Also, are there any changes you'd like up-front, before we open a PR?

Thanks!

@GLStephen

I'm not 100% sure of the level of maintenance at this point. Your feature sounds very interesting, though. If you end up at a dead end and end up forking, please let me know. My vote would be a single PR, so that others can pull it into their forks if the maintenance here isn't where we need it.

@mia-0032 (Owner)

@benwh @GLStephen Thank you for your comments. I'm sorry for my late reply.

I think the compress_batches feature is very useful. Adding Rubocop configuration is welcome, and the Prometheus functionality seems useful for monitoring.

I've looked at the pull request presented. Is it possible to split it into separate pull requests so that the changes are easier to understand?

@mia-0032 (Owner)

I added a compression/decompression feature in v1.5.0.

benwh (Author) commented Jan 26, 2021

@mia-0032 Thanks for your work on this! Apologies that I haven't had the time to split up the proposed features and PR them.

For our use case, having decompression enabled 'globally' at the subscription level isn't feasible, as we have some producers which can compress and others that cannot, so we rely on an attribute on the Pub/Sub message to mark it as compressed. Additionally, we've found that batching messages provides much of the benefit when compressing.
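
For reference, attribute-based handling on the consumer side could look like the sketch below (using the google-cloud-pubsub gem; the `compressed` attribute and the `process` handler are illustrative, not our actual implementation):

```ruby
require "zlib"
require "google/cloud/pubsub"

pubsub       = Google::Cloud::PubSub.new project_id: "my-project"
subscription = pubsub.subscription "logs-sub"

subscriber = subscription.listen do |message|
  # Inflate only when the producer flagged the message as compressed, so
  # compressed and uncompressed producers can share one subscription.
  data =
    if message.attributes["compressed"] == "true"
      Zlib::Inflate.inflate(message.data)
    else
      message.data
    end
  process data # hypothetical downstream handler
  message.acknowledge!
end
subscriber.start
```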

If I were to rebase and implement the features listed above, would you be willing to merge these in? The only potential downside is that it adds a lot of complexity compared to the current implementation.

@mia-0032 (Owner)

@benwh Thank you for your reply.

I understand that global compression and decompression is inconvenient in some use cases. The current compression feature can't support migrating from no compression to compression, so I feel that determining whether a message is compressed from an attribute is a good solution.

I don't have an environment that publishes a large amount of data, so I can't verify the benefits of batch publishing myself.
But if it has performance benefits, I would like to merge it.

DragosDumitrache commented Feb 2, 2023

@mia-0032 This is something I've taken over from @benwh for the time being. For about 800 million records per day, we pay on average $300/day, which adds up to a significant sum over the course of a month or year. I can't give you access to our Grafana dashboards showing this, but based on that, I'd like to rebase our changes on top of yours and try to integrate them upstream.
