Skip to content
This repository has been archived by the owner on Mar 4, 2021. It is now read-only.

Incorrect handling of Cassandra Snappy compressed SSTables. #10

Closed
danchia opened this issue Feb 21, 2014 · 5 comments
Closed

Incorrect handling of Cassandra Snappy compressed SSTables. #10

danchia opened this issue Feb 21, 2014 · 5 comments
Milestone

Comments

@danchia
Copy link
Contributor

danchia commented Feb 21, 2014

Aegisthus will actually miss out rows in compressed SSTables, as it does not correctly track the sstable file size.

The SSTableRecordReader tracks the position in file to determine if there are still records to read. However, based on SSTableReader it's tracking the position in the uncompressed stream.

However, in AegSplit after initializing the compressedInputStream, we don't adjust end to be the actual uncompressed stream length (available in CompressionMetadata).

I sort of have a working patch, will submit in a PR in a few days.

@charsmith
Copy link
Contributor

Awesome. I wrote the compressed reader as a proof of concept when we were considering adopting them a couple of years ago. But we haven't actually adopted them, so I haven't had a chance to verify the data against any real cases.

@charsmith charsmith added this to the Release 0.2 milestone Mar 19, 2014
@jimternet
Copy link

Did this get merged?

@charsmith
Copy link
Contributor

The pull request was never made so I fixed this in another pull that is now merged. This commit for reference:

638ffa0

I believe this issue is closed and I am going to go ahead and mark it now. There is another project that delves further into compressed SSTables: https://github.com/fullcontact/hadoop-sstable which is probably even better. Netflix doesn't have a lot of compressed tables so we haven't had a need to optimize this use case.

@danchia
Copy link
Contributor Author

danchia commented Jun 9, 2014

Ack, I totally forgot to get around to making the PR..

I've looked through the commit referenced, and that indeed should fix the problem I reference.

@danielbwatson
Copy link
Contributor

Going to close this since it should be fixed now.

This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants