Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#38 - Support zlib compression #39

Merged
merged 1 commit into from
Oct 30, 2015

Conversation

rahulashok
Copy link
Contributor

Added Gzip support for Fluent plugin kinesis

@riywo
Copy link
Contributor

riywo commented Oct 23, 2015

Hi @rahul007ashok ,

This is very interesting. I'm checking it out.

@rahulashok
Copy link
Contributor Author

Hi @riywo ,

I was wondering if you have any questions or suggestions. Thanks

@riywo
Copy link
Contributor

riywo commented Oct 28, 2015

Hi @rahul007ashok ,

Could you confirm this contribution is released under Apache 2.0?

Best

@rahulashok
Copy link
Contributor Author

Hi @riywo
Yes, this contribution will be under Apache 2.0.
Rahul

@riywo
Copy link
Contributor

riywo commented Oct 28, 2015

Thank you for confirmation!

@@ -220,7 +222,11 @@ def get_key(name, record)
end

def build_data_to_put(data)
Hash[data.map{|k, v| [k.to_sym, v] }]
if @gzip_compression
Hash[data.map{|k, v| [k.to_sym, k=="data" ? Zlib::Deflate.deflate(v) : v] }]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In write method, this plugin checks whether the size of record exceeds 1MB limitation or not. So, deflating after the checking will cause some trouble.

@riywo
Copy link
Contributor

riywo commented Oct 28, 2015

Hi @rahul007ashok ,

I've checked somehow and added a comment. Could you modify it to deflate before the checking size of records?

Also, could you add the description for gzip_compression option in README.md?

Best,
riywo

@rahulashok
Copy link
Contributor Author

Hi @riywo ,

Added documentation to the README and also modified the write method to deflate before checking the size of records.

Thanks,
Rahul

@riywo
Copy link
Contributor

riywo commented Oct 28, 2015

@rahul007ashok

Thanks for update. Let me check...

Could you rebase your changes to a single commit please?

@riywo riywo added this to the v0.4.0 milestone Oct 28, 2015
@rahulashok rahulashok force-pushed the 38-gzip-compression-support branch 2 times, most recently from 6048d7c to a9577f4 Compare October 28, 2015 16:31
@rahulashok
Copy link
Contributor Author

@riywo
Done.

@rahulashok rahulashok changed the title #38 - Support gzip compression #38 - Support zlib compression Oct 29, 2015
@rahulashok
Copy link
Contributor Author

@riywo
I have modified the pull request to do zlib instead. Zlib compression is more widely used and is more efficient than gzip. Updated code, documentation and commit message to reflect this.

riywo added a commit that referenced this pull request Oct 30, 2015
@riywo riywo merged commit 963a7b6 into awslabs:master Oct 30, 2015
@riywo
Copy link
Contributor

riywo commented Oct 30, 2015

Thank you for the PR! I will release the new version later.

@wryun
Copy link

wryun commented Nov 5, 2015

Don't really know anything about fluentd plugins, but why isn't this done in the format method instead? Then we save space buffering on disk as well (and waste less memory during processing).

@rahulashok
Copy link
Contributor Author

The format method returns a MessagePack serialized array. MessagePack is not compatible with zlib/gzip(or other compression methods). So it's not possible for the format method to do the compression(although this would be better).

@wryun
Copy link

wryun commented Nov 5, 2015

But the format method can also return a raw string, AFAIK.

Would make it somewhat harder to do, but not impossible? e.g. first 4 bytes of each record is length of record. Or you can get fancy and use some encoding like protobuf (i.e. anything which allows concatenation of binary outputs to make a valid sequence).

@wryun
Copy link

wryun commented Nov 5, 2015

Hmm, also, why doesn't MessagePack 'support zlib/gzip'? Can't it just store the zipped field in the binary format? (ok, I should try this myself)

@riywo
Copy link
Contributor

riywo commented Nov 12, 2015

v0.4.0 is also released at rubygems.
https://rubygems.org/gems/fluent-plugin-kinesis/versions/0.4.0

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants