
Unable to read from Oracle S3 buckets #5720

Open · mcolpus opened this issue Jan 28, 2025 · 6 comments

mcolpus commented Jan 28, 2025

Bug report

Expected behavior and actual behavior

With Nextflow 23.04.3 I can create a channel from an Oracle Cloud (OCI) S3-compatible bucket just fine, but with 23.10.4 or later (I've also tested 24.10.4) it fails with an error, seemingly because it tries to use an Amazon endpoint.

Steps to reproduce the problem

main.nf:

workflow {
    Channel
        .fromPath("s3://my-bucket/clean_reads/*fastq.gz")
        .take(1)
        .view()
}

nextflow.config:

aws {
    accessKey = "*******************"
    secretKey = "*****************************"
    region = "uk-london-1"
    client {
        endpoint = "https://************.compat.objectstorage.uk-london-1.oraclecloud.com"
        s3PathStyleAccess = true
    }
}

Program output

When it works:

NXF_VER=23.04.3 nextflow run main.nf 

Nextflow 24.10.4 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.04.3
Launching `main.nf` [drunk_curran] DSL2 - revision: b1377beb96
/my-bucket/clean_reads/SAMD00000572.clean_1.fastq.gz

When it fails:

NXF_VER=23.10.4 nextflow run main.nf

Nextflow 24.10.4 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.4
Launching `main.nf` [admiring_yalow] DSL2 - revision: b1377beb96
ERROR ~ Unable to execute HTTP request: s3.uk-london-1.amazonaws.com

 -- Check '.nextflow.log' file for details

which makes it look like the configured endpoint is being ignored.

Environment

  • Nextflow version: works on 23.04.3, fails on later versions
  • Java version:
    openjdk 17.0.10 2024-01-16
    OpenJDK Runtime Environment Temurin-17.0.10+7 (build 17.0.10+7)
    OpenJDK 64-Bit Server VM Temurin-17.0.10+7 (build 17.0.10+7, mixed mode, sharing)
  • Operating system: ubuntu 22.04
  • Bash version: GNU bash, version 5.1.16(1)-release
mcolpus changed the title from "Stopped working with Oracle S3 buckets" to "Unable to read from Oracle S3 buckets" on Jan 28, 2025
bentsherman (Member) commented:

Possibly related to #4732

I suspect there is a regression from the AWS config refactor that happened between 23.04 and 23.10.

bobturneruk commented:

Thanks for this @bentsherman. I think @mcolpus has been trying some workarounds suggested on Slack.

mcolpus (Author) commented Feb 6, 2025

I think it is specifically the move from nf-amazon 1.16.2 to 2.0.0 (first introduced with 23.05.0-edge); that's where it stops working.
And 1.16.2 is incompatible with Nextflow >= 23.05.0.
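For anyone bisecting this themselves, one way to test a specific plugin release is to pin it in nextflow.config (a sketch; the version shown is just for illustration):

plugins {
    // Pin nf-amazon to a specific release to check which version regressed.
    // Note: 1.16.2 only loads on Nextflow < 23.05.0, so the NXF_VER used to
    // launch the run has to be paired with a compatible plugin version.
    id 'nf-amazon@1.16.2'
}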

bentsherman (Member) commented:

The AWS config refactor was merged in 23.05: a74e42d

Likely there is something wrong in that commit around custom endpoints. I don't think we tested S3-compatible storage for that PR, so it wouldn't surprise me.

mcolpus (Author) commented Feb 10, 2025

I've done some digging into nextflow/plugins/nf-amazon/src/main/nextflow/cloud/aws/AwsClientFactory.groovy.

In the function getS3Client, I think withForceGlobalBucketAccessEnabled actually wants to be false when a custom endpoint is used.

I added some very basic logging to AmazonS3Client:

import com.amazonaws.Request
import com.amazonaws.Response
import com.amazonaws.handlers.RequestHandler2
import com.amazonaws.http.HttpResponse

// Minimal handler that logs every outgoing S3 request and its response.
public class LoggingRequestHandler extends RequestHandler2 {
    @Override
    public void beforeRequest(Request<?> request) {
        System.out.println("----- Request Details -----");
        System.out.println("Endpoint: " + request.getEndpoint());
        System.out.println("HTTP Method: " + request.getHttpMethod());
        System.out.println("Resource Path: " + request.getResourcePath());
        System.out.println("Parameters: " + request.getParameters());
        System.out.println("---------------------------");
    }

    @Override
    public void afterResponse(Request<?> request, Response<?> response) {
        HttpResponse httpResponse = response.getHttpResponse();
        int statusCode = httpResponse.getStatusCode();
        String statusText = httpResponse.getStatusText();

        System.out.println("HTTP Response: " + statusCode + " " + statusText);
        // print response headers and values
        System.out.println("Response Headers: " + httpResponse.getHeaders());
        System.out.println("Response Body: " + httpResponse.getContent());
    }
}

which is then used in the function getS3Client:

builder.withRequestHandlers(new LoggingRequestHandler())

You can then see that the first S3 request goes to the configured endpoint, but subsequent ones get switched to the default Amazon endpoint:

----- Request Details -----
Endpoint: https://my_namespace.compat.objectstorage.uk-london-1.oraclecloud.com
HTTP Method: HEAD
Resource Path: species-identification-dataset/
Parameters: [:]
---------------------------
HTTP Response: 200 OK
Response Headers: [access-control-allow-credentials:true, access-control-allow-methods:POST,PUT,GET,HEAD,DELETE,OPTIONS, access-control-allow-origin:*, access-control-expose-headers:access-control-allow-credentials,access-control-allow-methods,access-control-allow-origin,content-type,date,opc-client-info,opc-request-id,strict-transport-security,x-amz-bucket-region,x-amz-request-id,x-api-id,x-content-type-options, Content-Type:application/xml, date:Mon, 10 Feb 2025 16:44:02 GMT, opc-request-id:lhr-1:p37-NYIgx..., strict-transport-security:max-age=31536000; includeSubDomains, x-amz-bucket-region:uk-london-1, x-amz-request-id:lhr-1:p37-NYIgx..., x-api-id:s3-compatible, x-content-type-options:nosniff]
Response Body: null
----- Request Details -----
Endpoint: https://s3.uk-london-1.amazonaws.com
HTTP Method: GET
Resource Path: species-identification-dataset/
Parameters: [prefix:[folder/reads], max-keys:[250], encoding-type:[url]]
---------------------------
ERROR ~ Unable to execute HTTP request: s3.uk-london-1.amazonaws.com

 -- Check '.nextflow.log' file for details

Changing to .withForceGlobalBucketAccessEnabled(false) fixes this problem for me.

I gather that withForceGlobalBucketAccessEnabled is meant to handle the case where a user specifies the wrong region: the SDK automatically detects the correct region and caches the change. I think the x-amz-bucket-region: uk-london-1 response header might be triggering this change when it shouldn't.
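As a sketch of what a fix in getS3Client could look like (credentialsProvider, customEndpoint and region are illustrative names here, not the actual fields in AwsClientFactory):

import com.amazonaws.client.builder.AwsClientBuilder
import com.amazonaws.services.s3.AmazonS3ClientBuilder

// Only let the SDK re-resolve bucket regions globally when no custom
// endpoint is configured; with a custom endpoint, global access rewrites
// follow-up requests to the default Amazon endpoint, as the logs above show.
final boolean hasCustomEndpoint = customEndpoint != null

def builder = AmazonS3ClientBuilder.standard()
        .withCredentials(credentialsProvider)
        .withForceGlobalBucketAccessEnabled(!hasCustomEndpoint)

if( hasCustomEndpoint )
    builder.withEndpointConfiguration(
            new AwsClientBuilder.EndpointConfiguration(customEndpoint, region))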

mcolpus (Author) commented Feb 10, 2025

In S3FileSystemProvider.java, the function createFileSystem (L828) sets the variable global:

final boolean global = bucketName!=null;

I'm not sure exactly what the reasoning is, but I think we only want the AWS SDK to change the region automatically if no region is provided. Something like:

final boolean global = awsConfig.getRegion() == null;
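Or, going one step further, the same guard could also cover custom endpoints (a sketch; getEndpoint() is a hypothetical accessor on awsConfig, for illustration only):

// Allow the SDK's global (cross-region) bucket access only when the user
// has pinned neither a region nor a custom endpoint.
final boolean global = awsConfig.getRegion() == null
        && awsConfig.getEndpoint() == null;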
