
closes #12 - Create copy constructors for LongArrayOutput and ByteBuf… #13

Open · wants to merge 1 commit into base: master
Conversation

@billoley

Provide a mechanism to get an exact copy of each BitOutput implementation.

If a user keeps a reference to the BitOutput implementation passed to a Compressor/GorillaCompressor, this will allow an exact copy to be made, finalized (as if Compressor.close() had been called), and passed to a Decompressor/GorillaDecompressor without affecting the original Compressor/GorillaCompressor.

Useful in a production environment where we want to read the contents of the Compressor/GorillaCompressor but are not ready to close it.
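To make the intent concrete, here is a minimal sketch of the copy-constructor pattern being proposed. `SimpleBitOutput` is a simplified stand-in written for illustration, not the library's `LongArrayOutput`:

```java
import java.util.Arrays;

// Simplified stand-in for a BitOutput implementation; illustrates the
// proposed copy-constructor pattern, not the actual library class.
public class SimpleBitOutput {
    private final long[] longArray;
    private int position;

    public SimpleBitOutput(int size) {
        this.longArray = new long[size];
        this.position = 0;
    }

    // The proposed copy constructor: deep-copy all mutable state so the
    // copy can be finalized and read while the original keeps ingesting.
    public SimpleBitOutput(SimpleBitOutput other) {
        this.longArray = Arrays.copyOf(other.longArray, other.longArray.length);
        this.position = other.position;
    }

    public void write(long value) {
        longArray[position++] = value;
    }

    public int size() {
        return position;
    }
}
```

A copy made this way can be finalized (as close() would) and handed to a decompressor without touching the original writer's state.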

@burmanm (Owner) commented Jan 30, 2018

Why don't you just use ByteBuffer.duplicate()? It's the same backing array, but you get new pointers that let you read the underlying data without disrupting the original process.

https://docs.oracle.com/javase/7/docs/api/java/nio/ByteBuffer.html#duplicate()
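For reference, duplicate() gives the reader its own position and limit over the same backing array, so reading the duplicate does not move the writer's position. A self-contained JDK sketch:

```java
import java.nio.ByteBuffer;

// Demonstrates ByteBuffer.duplicate(): shared backing array,
// but independent position/limit pointers.
public class DuplicateDemo {
    public static void main(String[] args) {
        ByteBuffer writer = ByteBuffer.allocate(16);
        writer.putLong(42L);                    // writer position is now 8

        ByteBuffer reader = writer.duplicate(); // same bytes, new pointers
        reader.flip();                          // reader: position 0, limit 8
        long value = reader.getLong();          // reads back the value

        System.out.println(value);              // prints 42
        System.out.println(writer.position());  // prints 8: writer undisturbed
    }
}
```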

@billoley (Author)

My understanding is that before passing the data from a Compressor to a Decompressor, Compressor.close() needs to be called.

After I get a copy of the Compressor's BitOutput object, I call:

    newCompressorOutput.writeBits(0x0F, 4);
    newCompressorOutput.writeBits(0xFFFFFFFF, 32);
    newCompressorOutput.skipBit();
    newCompressorOutput.flush();

... and then initialize the Decompressor.

If this is not true, then I may have other options.

I think there are also a few synchronization issues with sharing the underlying data:

  1. I need to continue ingesting and compressing TSDB data while queries are happening.
  2. If I need to age off the compressor data (an N-hour cache) at the same time it is being queried, I would have to synchronize around the use of the Compressor and Decompressor.

@burmanm (Owner) commented Jan 31, 2018

You only need to call close() before reading an unknown number of items with the Decompressor. If you know how many items to read, there's no need for close(). Given that possibility, you could create a wrapper around your active compressor, such as the following example (modify it to your needs, including any concurrency protection):

import java.util.stream.Stream;

import fi.iki.yak.ts.compression.gorilla.ByteBufferBitInput;
import fi.iki.yak.ts.compression.gorilla.ByteBufferBitOutput;
import fi.iki.yak.ts.compression.gorilla.GorillaCompressor;
import fi.iki.yak.ts.compression.gorilla.GorillaDecompressor;
import fi.iki.yak.ts.compression.gorilla.Pair;

public class CompressInterface {

    private int count = 0;
    private ByteBufferBitOutput out;
    private GorillaCompressor compressor;

    public CompressInterface(long blockStamp) {
        out = new ByteBufferBitOutput(8192);
        compressor = new GorillaCompressor(blockStamp, out);
    }

    /**
     * Add a new value to the compression stream
     */
    public void put(long timestamp, double value) {
        compressor.addValue(timestamp, value);
        count++;
    }

    /**
     * Read the current values from the compression stream; the duplicate()
     * gives the decompressor independent read pointers over the same bytes.
     */
    public Stream<Pair> stream() {
        Stream.Builder<Pair> builder = Stream.builder();
        GorillaDecompressor decompressor = new GorillaDecompressor(
                new ByteBufferBitInput(out.getByteBuffer().duplicate()));

        for (int i = 0; i < count; i++) {
            builder.accept(decompressor.readPair());
        }
        return builder.build();
    }
}

As for aging the data, you should close() the compressor as soon as a data point passes your block mark. At that point the underlying buffer is immutable, so there are no synchronization problems. You only need this type of wrapper before the block is closed. Unless I misunderstood your last requirement.
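The age-off pattern described above can be sketched as follows; TimeBlock and its block-mark logic are hypothetical simplifications for illustration, not part of the library:

```java
// Hypothetical sketch of the age-off pattern: close the block once a
// timestamp passes the block mark; after that the compressed data is
// immutable and can be read without synchronization.
public class TimeBlock {
    private final long blockEnd;   // end of this block's time window
    private boolean closed = false;

    public TimeBlock(long blockEnd) {
        this.blockEnd = blockEnd;
    }

    // Ingest a data point; close the block when the timestamp passes the
    // block mark (this is where compressor.close() would be called).
    public synchronized void add(long timestamp) {
        if (!closed && timestamp >= blockEnd) {
            closed = true;
        }
        // ... otherwise compressor.addValue(timestamp, value)
    }

    public synchronized boolean isClosed() {
        return closed;
    }
}
```

Once isClosed() is true, readers no longer need the wrapper or any locking, since the underlying buffer will not change again.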

@burmanm (Owner) commented Feb 13, 2018

@billoley Did the previous comment solve your issue? Or is there still something you would wish to be changed? I might consider bookkeeping "helpers" for next release if you wish to have them from the library.

@billoley (Author) commented Feb 22, 2018 via email

@billoley (Author)

I'm actually using the LongArrayOutput version of BitOutput. The accessor for the backing array makes a copy before returning it, but it sizes the copy to length + 1. When I use this array in a Decompressor and read back the number of entries I added, I get an ArrayIndexOutOfBoundsException. This does not happen if I close the compressor first.

Considering that getLongArray() already copies the array (and this approach doesn't work without replicating a lot of the close() logic), I think adding a copy constructor would be a good idea.

@burmanm (Owner) commented Mar 13, 2018

Can you create a test that causes that ArrayIndexOutOfBoundsException? If you added n-items and can't read n-items back, then there's a bug somewhere.

@burmanm (Owner) commented Mar 14, 2018

I pushed version 2.1.0, which allows getting the underlying array with getLongArray(); it no longer makes a copy.

@billoley (Author) commented Mar 14, 2018 via email

@burmanm (Owner) commented Mar 14, 2018

Fixed in commit 32807f2
