Support for batching messages into a single kinesis record #24
Comments
Hello, thanks for your idea. Currently we are using the PutRecords API, which lets us put up to 500 records in a single API call without batching records client-side. As you mentioned, we could batch records client-side up to 1 MB, which might allow more than 500 records per call. However, this would introduce complexity on the consumer side: consumers would have to decompose batched records before processing them. So at this point I don't want to do this. What do you think?
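The 500-record PutRecords ceiling mentioned above maps naturally onto simple chunking before each API call, without any client-side batching of payloads. A minimal sketch (helper and variable names are illustrative, not the plugin's actual code):

```ruby
# The Kinesis PutRecords API accepts at most 500 records per call,
# so pending records are split into call-sized slices.
MAX_RECORDS_PER_CALL = 500

def build_batches(records)
  records.each_slice(MAX_RECORDS_PER_CALL).to_a
end

# 1200 pending log records would become 3 PutRecords calls: 500 + 500 + 200.
pending = Array.new(1200) { |i| { data: "log line #{i}", partition_key: i.to_s } }
batches = build_batches(pending)
puts batches.map(&:size).inspect
```

Each slice would then be handed to one PutRecords call; this keeps every record individually addressable for consumers, which is the property client-side batching would give up.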
@imaifactory Thanks for the response, and nice work so far. I was thinking that a format like this would optimise shard throughput, given the Kinesis shard limits: a 1,000 records/sec PutRecord limit per shard, and 1 MB/sec. But on second thought, if a single log message is ~1 KB, then you're going to hit the MB/sec limit regardless of the PutRecord limit. So, thinking about it, batching (reducing PutRecords calls) would only make sense if you were also able to reduce the size of the messages (optimising the MB/sec rate), for example by enabling GZIP compression (UTF-8) on the batched data. Of course you are right that the consumer would need to parse the record per the scheme in which it was put. I believe such a feature would have to be enabled by a configuration option on fluent-plugin-kinesis. If the user enables it, then naturally they would be responsible for updating their consumer code to parse the batched records in the new format; to me this is no problem. Thanks
Hmm, makes sense, thanks. Regarding compression: I think it would indeed reduce payload size and network throughput; however, it's difficult to tune the batched record size against the shard limitation (1 MB/sec). For example, whether a batch of 1,500 records of 1 KB each fits under 1 MB depends on its contents. It would still reduce your network throughput, of course. Do you still want a compression feature for this?
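The content-dependence pointed out above can be sketched with Ruby's stdlib Zlib: the same number of messages compresses to very different sizes depending on how repetitive they are, so a fixed message count cannot guarantee the gzipped batch stays under the 1 MB record limit. A rough illustration (method names are hypothetical, not the plugin's API):

```ruby
require 'zlib'
require 'json'

ONE_MB = 1_048_576  # Kinesis per-record payload limit, in bytes

# Serialize each message as JSON, join as newline-delimited lines,
# and gzip the whole batch into one Kinesis record payload.
def gzip_batch(messages)
  Zlib.gzip(messages.map(&:to_json).join("\n"))
end

# Highly repetitive logs compress far better than varied payloads, so the
# compressed size of 1,500 messages depends entirely on their contents.
repetitive = Array.new(1500) { { level: 'INFO', msg: 'healthcheck ok' } }
blob = gzip_batch(repetitive)
puts "#{blob.bytesize} bytes compressed, fits under 1 MB: #{blob.bytesize < ONE_MB}"
```

A consumer would reverse this with `Zlib.gunzip` and split on newlines before parsing each line, which is exactly the extra decomposition step the maintainer flags as added complexity.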
@imaifactory Yep, this feature would be very good for me. I think it makes sense to separate those into two configuration options: the compression type and the compression encoding. So finally, the relevant config might read something like:
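A hypothetical sketch of such options in fluentd config syntax (all option names here are illustrative only, not actual fluent-plugin-kinesis parameters):

```
<match my_app.**>
  type kinesis
  stream_name my-stream
  # Hypothetical batching/compression options -- names are illustrative
  batch_enabled true        # turn on client-side batching
  batch_max_messages 500    # flush after this many messages
  batch_timeout 5s          # minimum send timeout
  compression_type gzip     # compress the batched payload
  compression_encoding utf-8
</match>
```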
@cj74 Sorry for the long lag. Thank you for your suggestion about the options. OK, it makes sense. I will put this in my queue.
Hey @imaifactory, hope you are well. Any news on how this feature is going? Thanks
Sorry for keeping you waiting. Unfortunately I don't have an update on this. If you are in a hurry, is it possible for you to send a PR? Also, though I should announce this in a separate thread, I want to tell you that primary ownership of this project is being handed over. I will open an issue to announce it when it is settled. Thank you.
Hey @imaifactory Actually, now that Amazon has released Kinesis Firehose, support for that would trump my requirement for the gzip/batching features, because all I am doing is pushing to S3 (maybe other people would still find gzip/batching useful, though). Kinesis Firehose should also be easier to implement in the project.
Based on your use case, I would opt for Kinesis Firehose: with it, you don't have to deploy consumer applications or worry about whether they are operating normally. So, if Firehose's configuration options for compression and chunking satisfy you, why not use Firehose?
@imaifactory now we just need Firehose support in aws-fluent-plugin-kinesis, then!
Use this! The plugin below was developed by @winebarrel, who is also a great contributor to this plugin!
I added gzip support to fluent-plugin-kinesis. Here is a pull request for it: #39
I'm going to close this.
Hi
It would be cool to be able to batch a number of log messages into a single Kinesis put record in some way, since Kinesis supports up to a 1 MB payload and a single log message can easily be under 1 KB. It seems wasteful on high-throughput systems to use one Kinesis record per log message.
If you create such a feature, it could be provided via a config option, limited by number of log messages and/or total size, with a minimum send timeout.
Formatting of the data blob would be an issue here (e.g. multiple JSON records per message). Perhaps there could be a config option for this too: whether the batch goes into a larger JSON object (records unordered), a JSON array, or simply the same per-message format but newline-separated.
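The blob-format alternatives suggested above could look like this; a hypothetical sketch comparing a JSON array against newline-delimited records (sample data is invented for illustration):

```ruby
require 'json'

messages = [
  { 'time' => '2015-01-01T00:00:00Z', 'msg' => 'start' },
  { 'time' => '2015-01-01T00:00:01Z', 'msg' => 'stop' }
]

# Option A: the whole batch as one JSON array per Kinesis record.
json_array = messages.to_json

# Option B: newline-delimited JSON -- each line keeps the original
# per-message format, so consumers can reuse their existing line parser.
ndjson = messages.map(&:to_json).join("\n")

# A consumer must parse according to whichever scheme was configured:
from_array  = JSON.parse(json_array)
from_ndjson = ndjson.split("\n").map { |line| JSON.parse(line) }
# both recover the same list of messages
```

The newline-separated form is the least disruptive for existing consumers, since each line is still an ordinary single-message record.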
Thanks