All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog.
- Updated commons-compress from `1.21` to `1.26.0` (CVE-2024-26308)
- Added jettison as dependency for HDFS-connector fat jar
- Added SECURITY.md
- Updated OCI Java SDK version to `3.39.0`
- Updated org.bouncycastle from `jdk15on` to `jdk15to18`
- Added read retry policy to InputStream
- Updated OCI Java SDK version to `3.34.0`
- Updated com.fasterxml.jackson.core:jackson-databind to `2.16.0`
- Updated org.apache.avro.avro to `1.11.3`
- Use org.apache.commons.commons-text `2.16.0`
- Shaded io.netty packages
- Fixed external kinit OID issue and refactored SpnegoGenerator
- Fixed contract test failure AbstractContractSeekTest.testReadFullyZeroByteFile
- Fixed contract test failure AbstractContractSeekTest.testSeekReadClosedFile
- Added custom authenticator which uses OKE Workload Identity authentication
- Improved logs in InputStreams
- Added support for Kerberos authentication with SPNEGO tokens
- Added observability feature that generates metrics pertaining to operations conducted through the connector, such as reading, writing, and deleting data.
- Added multi-region support for the same configuration. This feature can be enabled by setting `fs.oci.client.multiregion.enabled` to `true`. Once it is enabled, you can append a region code or ID after the namespace, using the `oci://<bucket>@<namespace>.<region>/file` format. This creates a dedicated BmcFilesystem instance for the specified region, so your applications can use different URIs to create BmcFilesystem instances that each target a distinct endpoint.
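The following minimal Java sketch makes the multi-region URI format concrete by splitting such a URI into bucket, namespace and optional region. The class `OciUriParts` is hypothetical and is not the connector's parser. It assumes that the first `.` after the `@` separates the namespace from the region code or ID. For the older `oci://<bucket>@<namespace>/file` form, it returns a `null` region.

```java
import java.net.URI;

// Hypothetical helper (not part of the connector): splits
// oci://<bucket>@<namespace>[.<region>]/path into its parts.
final class OciUriParts {
    final String bucket;
    final String namespace;
    final String region; // null when the URI carries no region suffix
    final String path;

    private OciUriParts(String bucket, String namespace, String region, String path) {
        this.bucket = bucket;
        this.namespace = namespace;
        this.region = region;
        this.path = path;
    }

    static OciUriParts parse(String uriString) {
        URI uri = URI.create(uriString);
        if (!"oci".equals(uri.getScheme())) {
            throw new IllegalArgumentException("Expected the oci:// scheme: " + uriString);
        }
        // The authority is "<bucket>@<namespace>" or "<bucket>@<namespace>.<region>".
        String authority = uri.getRawAuthority();
        int at = authority == null ? -1 : authority.indexOf('@');
        if (at <= 0 || at == authority.length() - 1) {
            throw new IllegalArgumentException("Expected <bucket>@<namespace>: " + uriString);
        }
        String bucket = authority.substring(0, at);
        String host = authority.substring(at + 1);
        // Assumption: the first '.' separates the namespace from the region code or ID.
        int dot = host.indexOf('.');
        String namespace = dot < 0 ? host : host.substring(0, dot);
        String region = dot < 0 ? null : host.substring(dot + 1);
        return new OciUriParts(bucket, namespace, region, uri.getPath());
    }
}
```

Under this reading, `oci://logs@mytenancy.us-ashburn-1/app/part-0` and `oci://logs@mytenancy/app/part-0` name the same object path. The first URI is directed at an explicit region; the second uses whatever default region the configuration implies.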
- Removed unnecessary HeadObject requests on directory objects for getFileStatus
- Enhanced object creation process by eliminating redundant ListObjects requests
- Fixed premature termination when reading a single byte from an object using BmcParallelReadAheadFSInputStream
- Added support for namespace-prefixed domains in the Object Storage service
- Updated OCI Java SDK version to `3.17.1`
- Updated `guava` version from `30.1-jre` to `32.0.1-jre`
- Replaced LinkedBlockingQueue with SynchronousQueue to hand off tasks to the executor
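The standard `java.util.concurrent` API shows the difference between the two queues. In this sketch, a `ThreadPoolExecutor` backed by a `SynchronousQueue` hands each task directly to a worker thread, so tasks do not pile up in a backlog. The class name and pool sizes are assumptions for illustration, not the connector's actual settings. So is the choice of `CallerRunsPolicy` for the case where every worker is busy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of a direct-handoff executor.
final class HandoffExecutorSketch {
    static ThreadPoolExecutor newHandoffExecutor(int maxThreads) {
        return new ThreadPoolExecutor(
                0, maxThreads,
                60L, TimeUnit.SECONDS,
                // No capacity: each task is handed to an idle or newly created worker.
                new SynchronousQueue<>(),
                // Assumption for this sketch: when all workers are busy,
                // the submitting thread runs the task itself.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Runs one small task per number and sums the results.
    static long sumOfSquares(int n, int maxThreads) {
        ThreadPoolExecutor pool = newHandoffExecutor(maxThreads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final long value = i;
                results.add(pool.submit(() -> value * value));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With an unbounded `LinkedBlockingQueue`, a pool grows only up to its core size, and every further task waits in the queue. With a `SynchronousQueue`, the pool grows toward `maxThreads` instead, and backpressure applies once that limit is reached.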
- Added relocation for shaded package `javax.servlet`
- Fixed object loss during a rename operation when the target already existed
- Added support for parallel ranged GET requests in read-ahead mode
- Optimized recursive calls for list files and delete path
- Optimized the implementation of `FileSystem.getContentSummary`
- Replaced multipart requests with PutObject for small object writes in `BmcMultipartOutputStream`
- Added support for OCI Java SDK 3.x
- Added support for parameterized and realm-specific endpoint templates
- Added support for calculating part MD5s on a separate executor in BmcMultipartOutputStream
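`MessageDigest` and an `ExecutorService` are enough to sketch how part MD5s can be computed off the upload path. This is a hypothetical illustration, not the code in `BmcMultipartOutputStream`. `PartMd5Sketch` and its methods are assumptions. So is the Base64 encoding of each digest, chosen because that is the usual form of a `Content-MD5` value.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical sketch: hash upload parts on a dedicated executor.
final class PartMd5Sketch {
    // Base64-encoded MD5 digest of one part.
    static String base64Md5(byte[] part) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return Base64.getEncoder().encodeToString(md5.digest(part));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Submits each digest to md5Executor so hashing does not occupy the
    // upload threads; the digests are returned in part order.
    static List<String> partMd5s(List<byte[]> parts, ExecutorService md5Executor) {
        List<Future<String>> pending = new ArrayList<>();
        for (byte[] part : parts) {
            pending.add(md5Executor.submit(() -> base64Md5(part)));
        }
        List<String> digests = new ArrayList<>();
        try {
            for (Future<String> f : pending) {
                digests.add(f.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return digests;
    }
}
```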
- Updated OCI Java SDK version to `3.12.1`
- Updated `json-smart` from `2.4.7` to `2.4.9`
- Fixed race when concurrently creating objects with BmcMultipartOutputStream
- Fixed renameDir when the destination directory contains the `$` character
- Updated `io.netty:netty-codec` from version `4.1.77.Final` to `4.1.86.Final`
- Fixed `createFileStatus` to use `timeModified` instead of `timeCreated`. This bug caused `LocatedFileStatus` to have the wrong `modification_time` when objects were overwritten in OCI Object Storage.
- Fixed the multipart upload default configuration to use the correct upload size of 128 MiB
- Updated OCI Java SDK version to `2.47.0`
- Updated `com.fasterxml.woodstox:woodstox-core` from version `6.2.3` to `6.4.0`
- Updated `com.fasterxml.jackson.core:jackson-databind` from version `2.12.6.1` to `2.13.4.2`
- Added support for delegation tokens to the HDFS connector. This feature can be enabled by setting the property `fs.oci.delegation.token.filepath` to the path of the file containing the delegation token.
- Updated to Hadoop version 3.3.4
- Updated to OCI Java SDK version 2.38.0
- Fixed multipart upload to use the correct upload size
- Fixed BmcFilesystem cache to correctly manage the items
- Fixed NullPointerException when default filesystem is oci
- Fixed an IOException in ReadAheadFileInputStream when reading all bytes of a parquet file
- Fixed the OCI HDFS connector not emitting the bytesRead input metric in read-ahead mode #77
- Fixed an HDFS connector issue that prevented use of the smart parquet add-on
- Fixed the oci-hdfs jar file so that it no longer contains class files from `hadoop-common` and `hadoop-hdfs`
- Added support for specifying a custom read stream class. This feature can be enabled by setting the property `fs.oci.io.read.custom.stream` to the name of the custom read stream class.
- Added support for specifying a custom write stream class. This feature can be enabled by setting the property `fs.oci.io.write.custom.stream` to the name of the custom write stream class.
- Added support for the smart parquet add-on feature
- Fixed non-daemon threads preventing JVM shutdown when `fs.oci.rename.operation.numthreads` was set to use a single thread for renaming.
- Fixed ArrayIndexOutOfBoundsException in read-ahead mode #69
- Added support for Resource Principals Authentication v2.2
- Added relocation for shaded packages `org.objectweb`
- Added support for caching `BmcFilesystem` instances. This feature can be enabled by setting `fs.oci.caching.filesystem.enabled` to `true`. If enabled, the properties `fs.oci.caching.filesystem.maxsize.count` and `fs.oci.caching.filesystem.initialcapacity.count` control the size of the cache, while either `fs.oci.caching.filesystem.expireafteraccess.seconds` or `fs.oci.caching.filesystem.expireafterwrite.seconds` controls the expiration. If a `BmcFilesystem` instance with the same URI and configuration exists in the cache, that instance is re-used. This improves performance at the cost of the cache's memory footprint.
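The re-use behaviour can be sketched as a cache keyed by URI plus configuration. This hypothetical `FilesystemCacheSketch` uses a plain access-ordered `LinkedHashMap` as an LRU bound, in place of whatever cache library the connector actually uses. Its two constructor arguments loosely stand in for `initialcapacity.count` and `maxsize.count`. The time-based expiration settings are left out.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

// Hypothetical sketch: one instance per (URI, configuration) pair, LRU-bounded.
final class FilesystemCacheSketch<T> {
    private final Map<List<Object>, T> cache;

    FilesystemCacheSketch(int initialCapacity, int maxSize) {
        // accessOrder = true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<List<Object>, T>(initialCapacity, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<List<Object>, T> eldest) {
                return size() > maxSize; // evict the least recently used entry
            }
        };
    }

    synchronized T getOrCreate(URI uri, Map<String, String> conf, Supplier<T> factory) {
        // A sorted copy makes the key independent of the map's iteration order.
        List<Object> key = List.of(uri, new TreeMap<>(conf));
        return cache.computeIfAbsent(key, k -> factory.get());
    }

    synchronized int size() {
        return cache.size();
    }
}
```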
- Updated `log4j` dependencies to version `2.17.1` to address CVE-2021-44832
- Updated `log4j` dependencies to version `2.17.0` to address CVE-2021-45105
- Updated `log4j` dependencies to version `2.16.0` to address CVE-2021-45046
- Updated to OCI Java SDK version 2.11.1
- Updated `log4j` dependencies to version `2.15.0` to address CVE-2021-44228
- Removed dependencies `log4j-core`, `log4j-slf4j-impl` and `log4j-1.2-api` by default. To include these dependencies, run `mvn` with option `-Duse-slf4j-log4j`
- Updated to OCI Java SDK version 2.7.1
- Changed Jetty version to `9.4.44` from `11.0.6` (which requires Java 11 or later) to add back support for older versions of Java
- Added support for disabling auto-close of object streams that are obtained through the `getObject` operation. Auto-close can be disabled by setting `fs.oci.object.autoclose.inputstream` to `false`. If disabled, streams obtained through `getObject` that are completely read will not be closed automatically. If not specified, auto-close is enabled by default.
- Updated to Hadoop version 3.3.1
- Added support for multi-part upload streaming using a finite-sized in-memory buffer. This mode can be enabled by setting `fs.oci.io.write.multipart.inmemory` to `true`. If enabled, `fs.oci.client.multipart.numthreads` should be set to the number of parallel threads (greater than 1). `fs.oci.io.write.multipart.inmemory` cannot be enabled at the same time as `fs.oci.io.write.inmemory`. `fs.oci.io.write.multipart.overwrite` controls whether objects are allowed to be overwritten (default is `false`). `fs.oci.io.write.multipart.inmemory.tasktimeout.seconds` sets how many seconds an upload task waits to start when the maximum number of parallel threads has been reached before it gives up (default is `900`, meaning 15 minutes).
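The constraints on the in-memory multipart settings can be expressed as a small pre-flight check. The sketch below is hypothetical. It reads the documented keys from a plain `Map<String, String>` instead of a Hadoop `Configuration`, and the connector's own validation, if any, may report problems differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check for the in-memory multipart upload settings.
final class InMemoryMultipartCheck {
    static final String MULTIPART_INMEMORY = "fs.oci.io.write.multipart.inmemory";
    static final String INMEMORY = "fs.oci.io.write.inmemory";
    static final String NUM_THREADS = "fs.oci.client.multipart.numthreads";
    static final String TASK_TIMEOUT = "fs.oci.io.write.multipart.inmemory.tasktimeout.seconds";
    static final long DEFAULT_TASK_TIMEOUT_SECONDS = 900; // 15 minutes

    static List<String> problems(Map<String, String> conf) {
        List<String> problems = new ArrayList<>();
        // Boolean.parseBoolean(null) is false, so a missing key means "disabled".
        if (!Boolean.parseBoolean(conf.get(MULTIPART_INMEMORY))) {
            return problems; // mode not enabled: nothing to check
        }
        if (Boolean.parseBoolean(conf.get(INMEMORY))) {
            problems.add(MULTIPART_INMEMORY + " cannot be combined with " + INMEMORY);
        }
        String threads = conf.get(NUM_THREADS);
        try {
            if (threads == null || Integer.parseInt(threads.trim()) <= 1) {
                problems.add(NUM_THREADS + " should be set to a value greater than 1");
            }
        } catch (NumberFormatException e) {
            problems.add(NUM_THREADS + " is not an integer: " + threads);
        }
        return problems;
    }

    static long taskTimeoutSeconds(Map<String, String> conf) {
        String value = conf.get(TASK_TIMEOUT);
        return value == null ? DEFAULT_TASK_TIMEOUT_SECONDS : Long.parseLong(value.trim());
    }
}
```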
- Added support for changing to the Jersey default `HttpUrlConnectorProvider` for sending HTTP requests using the `fs.oci.client.jersey.default.connector.enabled` configuration key
- Added support for changing the maximum number of connections in the connection pool when using the Apache Connector for sending HTTP requests using the `fs.oci.client.apache.max.connection.pool.size` configuration key
- Added support for changing the connection closing strategy when using the Apache Connector for sending HTTP requests using the `fs.oci.client.apache.connection.closing.strategy` configuration key
- Added support for a parallel `renameDirectory` operation and for changing the number of threads used by the `renameDirectory` operation using the `fs.oci.rename.operation.numthreads` configuration key
- Added support for Resource Principals Authentication
- Added support for more ways to build the Object Storage endpoint, using the `fs.oci.client.regionCodeOrId` configuration key or instance metadata
- Updated to OCI Java SDK version 2.0.0
- Jersey's `ApacheConnectorProvider` is now used by default for sending HTTP requests
- Performance issues caused by the switch to Jersey's `ApacheConnectorProvider` as the default. To change back to the Jersey default `HttpUrlConnectorProvider`, and for other performance enhancements, look into `BmcProperties`
- `com.oracle.bmc.hdfs.store.BmcReadAheadFSInputStream` `read()` now correctly identifies EOF
- Added payload caching using the `fs.oci.caching.object.payload.enabled` key.
  - This is an on-disk cache that stores payloads in the directory indicated by `fs.oci.caching.object.payload.directory`.
  - The cache size can be configured using `fs.oci.caching.object.payload.maxweight.bytes` or `fs.oci.caching.object.payload.maxsize.count` (mutually exclusive).
  - The property `fs.oci.caching.object.payload.initialcapacity.count` controls the initial cache size.
  - The cache's eviction policy is controlled using `fs.oci.caching.object.payload.expireafteraccess.seconds` or `fs.oci.caching.object.payload.expireafterwrite.seconds` (mutually exclusive).
  - A consistency policy can be set using `fs.oci.caching.object.payload.consistencypolicy.class`. The default is `com.oracle.bmc.hdfs.caching.StrongConsistencyPolicy`, which keeps the cache consistent with objects in Object Storage: if an object is changed in Object Storage after it was cached, the cached item is evicted and the object is loaded from Object Storage. If you know your data is immutable, you can set this property to `com.oracle.bmc.hdfs.caching.NoOpConsistencyPolicy`. This policy does not check for consistency and therefore halves the number of requests.
  - It is important to (a) close streams read from HDFS, (b) read the streams to their end, or (c) allow those streams to be garbage-collected by the Java runtime. Otherwise, cached items may remain on disk even after they have been evicted.
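A check like the following could catch the two mutually exclusive pairs of payload-cache settings before start-up. `PayloadCacheConfigCheck` is a hypothetical helper over a plain `Map<String, String>`. Only the property names come from the entry above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical check for conflicting payload-cache settings.
final class PayloadCacheConfigCheck {
    private static final String PREFIX = "fs.oci.caching.object.payload.";

    // Pairs of keys that must not be set together.
    private static final String[][] EXCLUSIVE_PAIRS = {
        {PREFIX + "maxweight.bytes", PREFIX + "maxsize.count"},
        {PREFIX + "expireafteraccess.seconds", PREFIX + "expireafterwrite.seconds"},
    };

    static List<String> conflicts(Map<String, String> conf) {
        List<String> conflicts = new ArrayList<>();
        if (!Boolean.parseBoolean(conf.get(PREFIX + "enabled"))) {
            return conflicts; // payload caching is off, so the settings are ignored
        }
        for (String[] pair : EXCLUSIVE_PAIRS) {
            if (conf.containsKey(pair[0]) && conf.containsKey(pair[1])) {
                conflicts.add(pair[0] + " and " + pair[1] + " are mutually exclusive");
            }
        }
        return conflicts;
    }
}
```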
- Updated transitive Jetty dependencies to 9.4.40.v20210413 to address CVE-2021-28165.
- Updated to OCI Java SDK version 1.35.0
- Added metadata caching using the `fs.oci.caching.object.metadata.enabled` and `fs.oci.caching.object.metadata.spec` configuration keys. Note that there is no check for consistency, and if your data in Object Storage changes, the cache may return outdated data. Therefore, it is most appropriate when your data is read-only and does not change. Use caution when applying these settings.
- Added read-ahead and parquet caching. The read-ahead feature is configured using `fs.oci.io.read.ahead` and `fs.oci.io.read.ahead.blocksize`. Parquet caching, which requires `fs.oci.io.read.ahead=true`, is controlled using `fs.oci.caching.object.parquet.enabled` and `fs.oci.caching.object.parquet.spec`. Note that there is no check for consistency, and if your data in Object Storage changes, the cache may return outdated data. Therefore, it is most appropriate when your data is read-only and does not change. Use caution when applying these settings.
- Added Jersey client logging, configured using `fs.oci.client.jersey.logging.enabled`, `fs.oci.client.jersey.logging.level`, and `fs.oci.client.jersey.logging.verbosity`.
- Updated to Hadoop version 3.3.0
- Updated to OCI Java SDK version 1.33.1
- Updated to OCI Java SDK version 1.25.2
- Fixed a potential data corruption problem with `RefreshableOnNotAuthenticatedProvider`. We recommend that you update to this version (3.2.1.3) or later. For details, see #35
- Updated to OCI Java SDK version 1.23.1
- Updated to Hadoop version 3.2.1
- Updated to OCI Java SDK version 1.22.1
- This release incorporates the `hdfs-full` module.
- Updated to OCI Java SDK version 1.22.0
- Updated to OCI Java SDK version 1.17.5
- Updated to OCI Java SDK version 1.17.0
- Added a DelayStrategy that resets the exponential backoff between retries after reaching a maximum time, configurable using `fs.oraclebmc.client.retry.reset.threshold.seconds`
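One reading of the reset behaviour is an exponential sequence that starts over from the base delay once the next delay would exceed the threshold. The sketch below implements that reading. `ResettingBackoffSketch` is hypothetical and works in milliseconds rather than the property's seconds. The connector's `DelayStrategy` may compute its delays differently.

```java
// Hypothetical sketch: exponential backoff that restarts from the base delay
// once the next delay would exceed a reset threshold (an interpretation, not
// the connector's DelayStrategy).
final class ResettingBackoffSketch {
    private final long baseDelayMillis;
    private final long resetThresholdMillis;

    ResettingBackoffSketch(long baseDelayMillis, long resetThresholdMillis) {
        if (baseDelayMillis <= 0 || resetThresholdMillis < baseDelayMillis) {
            throw new IllegalArgumentException("need 0 < base <= threshold");
        }
        this.baseDelayMillis = baseDelayMillis;
        this.resetThresholdMillis = resetThresholdMillis;
    }

    // attempt is 1-based: the first retry waits baseDelayMillis.
    long delayMillis(int attempt) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        long delay = baseDelayMillis;
        for (int i = 1; i < attempt; i++) {
            delay *= 2; // exponential growth between retries
            if (delay > resetThresholdMillis) {
                delay = baseDelayMillis; // threshold exceeded: start the sequence again
            }
        }
        return delay;
    }
}
```

With a 100 ms base and a 1000 ms threshold, the delays for attempts 1 to 6 are 100, 200, 400, 800, 100 and 200 ms. No single wait grows without bound, and the retries keep their spacing.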
- Updated to Java SDK version 1.14.0
- Updated to Java SDK version 1.6.2
- Updated version number to stem from Hadoop version 2.9.2
- Fixed race condition in `BmcFileBackedOutputStream#createBufferFile`
- Support for retries upon failures. The retry timeout is configurable via `fs.oci.client.retry.timeout.seconds`
- Updated to Java SDK version 1.5.12
- BmcDirectFSInputStream#read now attempts to retry the read from the service when an IOException is thrown
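The retry-on-`IOException` behaviour can be sketched as a small wrapper around a read call. `RetryingReadSketch` and `IOCall` are hypothetical names. The sketch retries a fixed number of times and omits the repositioning and backoff that a real stream would likely need between attempts.

```java
import java.io.IOException;

// Hypothetical sketch of retrying a read that may fail with IOException.
final class RetryingReadSketch {
    @FunctionalInterface
    interface IOCall<T> {
        T call() throws IOException;
    }

    // Runs the read, retrying on IOException, for at most maxAttempts attempts in total.
    static <T> T withRetries(IOCall<T> read, int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return read.call();
            } catch (IOException e) {
                // A real stream would likely reopen at its current position here.
                last = e;
            }
        }
        throw last; // every attempt failed: surface the most recent error
    }
}
```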
- Updated to Java SDK version 1.4.2
- Added relocation for shaded packages `javax.annotation`, `javax.validation` and `javax.inject`
- Updated version number to stem from Hadoop version 2.7.7
- Updated to latest Java SDK (1.2.49) to leverage the updated Object Storage UploadManager with HTTP proxy support
- The configuration option `MULTIPART_MIN_PART_SIZE_IN_MB` is now deprecated in favor of `MULTIPART_PART_SIZE_IN_MB`, to match the configuration changes for the UploadManager in the Java SDK
- Bouncy Castle and JSR-305 jars are no longer bundled within the distribution jar and must now be included in the Hadoop CLASSPATH. Required third-party jars are bundled under the `third-party/lib` folder of the distribution zip archive
- Support for configuring an HTTP proxy. More information can be found here
- Disabled caching of stale key id and private key in the `InstancePrincipalsCustomAuthenticator` class
- Updated to latest Java SDK (1.2.42) to pick up bug fixes
- Enabled progress reporting to Application Master during upload operation
- Enabled usage in a Hadoop deployment with Kerberos
- Updated to latest Java SDK (1.2.41) to pick up bug fixes
- Added build instruction and fixed broken GitHub links in README
- Updated version number to stem from Hadoop version 2.7.2
- Release to GitHub
- Support instance principals authentication
- Replaced copy+delete rename operation with renameObject to improve performance
- Fetching the private key password now uses 'getPassword' from the Configuration instead of getting the string in plaintext
- Added ability to override configuration based on bucket and namespace being accessed
- Maven packages renamed from "oracle-bmc-" to "oci-" (group id renamed from "com.oracle.bmc.sdk" to "com.oracle.oci.sdk")
- Renamed configuration properties (from "oraclebmc" to "oci"); old properties are deprecated (see "Deprecated" below).
- Renamed HDFS scheme (from "oraclebmc" to "oci"); old scheme is deprecated (see "Deprecated" below).
- HTTP user agent changed from "Oracle-BMC_HDFS_Connector/" to "Oracle-HDFS_Connector/"
- The old configuration properties ("oraclebmc") are deprecated; please use ("oci") instead. The old properties still work for backward compatibility, as long as the corresponding new property isn't set at the same time.
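The fallback rule for deprecated properties can be sketched as a lookup that prefers the new key. This `DeprecatedKeyLookup` helper is hypothetical. It assumes that each old key is the new key with the `fs.oci.` prefix replaced by `fs.oraclebmc.`, as in `fs.oraclebmc.client.retry.reset.threshold.seconds` above. It reads from a plain `Map` rather than a Hadoop `Configuration`.

```java
import java.util.Map;

// Hypothetical sketch of new-key-first lookup with a deprecated-key fallback.
final class DeprecatedKeyLookup {
    private static final String NEW_PREFIX = "fs.oci.";
    // Assumption: deprecated keys differ only in this prefix.
    private static final String OLD_PREFIX = "fs.oraclebmc.";

    // Returns the "fs.oci." value if set; otherwise the deprecated
    // "fs.oraclebmc." value; otherwise null.
    static String get(Map<String, String> conf, String ociKey) {
        if (!ociKey.startsWith(NEW_PREFIX)) {
            throw new IllegalArgumentException("Expected an fs.oci. key: " + ociKey);
        }
        String value = conf.get(ociKey);
        if (value != null) {
            return value; // the new property wins whenever it is set
        }
        String legacyKey = OLD_PREFIX + ociKey.substring(NEW_PREFIX.length());
        return conf.get(legacyKey);
    }
}
```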
- The old HDFS scheme ("oraclebmc") is deprecated; please use "oci" instead. The old scheme still works for backward compatibility.
- Updated to latest Java SDK (1.2.5) to pick up change for request id truncation (to fix multipart uploads)
- Changed properties and constants to allow for more useful documentation
- Updated maven shade plugin to non-snapshot version
- Internal changes for how properties are loaded
- Support to use multi-part uploads when saving files
- Configuration options to tune multi-part upload behavior (or disable it)
- Bug in directory listing resulting in duplicate directories
- Concurrency issue when creating directory placeholders
- Improved "list directory" performance for large directories
- Using correct Date header for object creation time
- Bug with seek operation
- Updated to Oracle Cloud Infrastructure Java SDK 1.2.0
- Shading a few more dependencies (h2k)
- Doc updates
- Abstract Filesystem to support usage within YARN and Spark
- Updated to Oracle BMCS Java SDK 1.1.0 to pick up bug fixes
- License/copyright headers added to all source files as part of the build
- Now relocating shaded packages for Bouncycastle, Apache Commons, Glassfish
- Updated to Oracle Cloud Infrastructure Java SDK 1.0.1 to pick up bug fixes
- Including MD5 validation during copy operations
- Initial Release
- Support added for Hadoop 2.7.2 using Oracle Cloud Infrastructure Services Java SDK 1.0.0