Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a soft reference based shared cache for S3 reads #5357

Merged
merged 29 commits into from
May 14, 2024
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
82bc5a1
Added a concurrent hash map based shared cache
malhotrashivam Apr 4, 2024
61b70ba
Moved to a soft reference based cache
malhotrashivam Apr 4, 2024
03dd512
Added KeyedLongObjectHashMap based cache
malhotrashivam Apr 8, 2024
c3394ce
Merge branch 'main' into sm-s3-cache
malhotrashivam Apr 23, 2024
c03c7f9
Review comments
malhotrashivam Apr 24, 2024
6fcb50f
Moved from a KeyedIntObjectHashMap to KeyedObjectHashMap
malhotrashivam Apr 24, 2024
e0d4025
Renamed the class
malhotrashivam Apr 24, 2024
a8994c4
Merge branch 'main' into sm-s3-cache
malhotrashivam Apr 24, 2024
0da90b5
Merge branch 'main' into sm-s3-cache
malhotrashivam May 3, 2024
322681b
Review comments
malhotrashivam May 3, 2024
b35db77
Removing unused methods
malhotrashivam May 3, 2024
db09a4a
Improved comments
malhotrashivam May 3, 2024
89d1238
Minor tweaks
malhotrashivam May 3, 2024
ca7c74d
ParquetFileReader will use a single use context
malhotrashivam May 3, 2024
ebbe505
Added debug log messages
malhotrashivam May 3, 2024
9e72124
Merge branch 'main' into sm-s3-cache
malhotrashivam May 8, 2024
f65fbf7
Review comments part 1
malhotrashivam May 10, 2024
8a607a1
Review with Ryan Part 2
malhotrashivam May 10, 2024
364f30f
Review with Ryan Part 3
malhotrashivam May 10, 2024
57b5285
Review with Ryan Part 4
malhotrashivam May 13, 2024
60a4e23
Review with Ryan Part 5
malhotrashivam May 13, 2024
06b0226
Updated the defaults for S3 instructions
malhotrashivam May 13, 2024
b716c67
Renaming some variables
malhotrashivam May 13, 2024
8173578
Removed some unnecessary includes
malhotrashivam May 13, 2024
1a6eaae
Slightly tweaked the caching algorithm
malhotrashivam May 13, 2024
7624442
Python review
malhotrashivam May 13, 2024
20c3245
Review with Ryan contd.
malhotrashivam May 13, 2024
5fb2586
More review comments
malhotrashivam May 14, 2024
c0f4192
Some more review comments
malhotrashivam May 14, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -236,7 +236,7 @@ public interface MemoizableOperation<T extends DynamicNode & NotificationStepRec
Configuration.getInstance().getIntegerWithDefault("QueryTable.parallelWhereSegments", -1);

/**
* You can chose to enable or disable the column parallel select and update.
* You can choose to enable or disable the column parallel select and update.
*/
static boolean ENABLE_PARALLEL_SELECT_AND_UPDATE =
Configuration.getInstance().getBooleanWithDefault("QueryTable.enableParallelSelectAndUpdate", true);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -69,7 +69,7 @@ private static DeephavenCompressorAdapterFactory createInstance() {
// Use the Parquet LZ4_RAW codec, which internally uses aircompressor
Lz4RawCodec.class, CompressionCodecName.LZ4_RAW,

// The rest of these are aircompressor codecs which have fast / pure java implementations
// The rest of these are aircompressor codecs that have fast / pure java implementations
JdkGzipCodec.class, CompressionCodecName.GZIP,
LzoCodec.class, CompressionCodecName.LZO,
Lz4Codec.class, CompressionCodecName.LZ4,
Expand Down
3 changes: 3 additions & 0 deletions extensions/s3/build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,9 @@ dependencies {
implementation project(':Configuration')
implementation project(':log-factory')

// TODO Included this to get CleanupReferenceProcessorInstance, where should I move CleanupReferenceProcessorInstance?
implementation project(':engine-updategraph')
malhotrashivam marked this conversation as resolved.
Show resolved Hide resolved

implementation platform('software.amazon.awssdk:bom:2.23.19')
implementation 'software.amazon.awssdk:s3'
implementation 'software.amazon.awssdk:aws-crt-client'
Expand Down

This file was deleted.

Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
//
// Copyright (c) 2016-2024 Deephaven Data Labs and Patent Pending
//
package io.deephaven.extensions.s3;

import io.deephaven.hash.KeyedObjectHashMap;
import io.deephaven.hash.KeyedObjectKey;
import org.jetbrains.annotations.NotNull;
import org.jetbrains.annotations.Nullable;
import software.amazon.awssdk.services.s3.S3Uri;
import io.deephaven.extensions.s3.S3ChannelContext.Request;

/**
* This class uses a ({@link KeyedObjectHashMap}) to cache {@link Request} objects based on their URI and fragment
* index.
*/
final class KeyedHashMapBasedRequestCache implements S3RequestCache {

private final KeyedObjectHashMap<Request.ID, Request> requests;

KeyedHashMapBasedRequestCache() {
requests = new KeyedObjectHashMap<>(new KeyedObjectKey.Basic<>() {
@Override
public Request.ID getKey(@NotNull final Request request) {
return request.getId();
}
});
malhotrashivam marked this conversation as resolved.
Show resolved Hide resolved
}

@Override
@Nullable
public Request getRequest(@NotNull final S3Uri uri, final long fragmentIndex) {
final Request request = requests.get(new Request.ID(uri, fragmentIndex));
return request == null ? null : request.acquire();
}

@Override
@NotNull
public Request getOrCreateRequest(@NotNull final S3Uri uri, final long fragmentIndex,
@NotNull final S3ChannelContext context) {
// TODO Do you think the acquiring part should be done by the caller or here?
// I kept it here because acquire could potentially fail and the caller would have to call again in a loop,
// which
// felt unnecessary compared to a guaranteed acquire.
return requests.compute(new Request.ID(uri, fragmentIndex), (key, existingRequest) -> {
if (existingRequest != null) {
final Request acquired = existingRequest.acquire();
if (acquired != null) {
return acquired;
}
}
final Request newRequest = Request.createAndAcquire(fragmentIndex, context);
// TODO Do you think the init part should be done by the context, inside createAndAcquire or here?
// Kept it here for now because caller doesn't know whether request is new or not. So should call init or
// not. Maybe we can make init more idempotent and the context would call it always.
newRequest.init();
return newRequest;
});
}

@Override
public void remove(@NotNull final Request request) {
requests.removeValue(request);
}
}
Loading
Loading