
Unable to read from Oracle S3 buckets #5720

Open · mcolpus opened this issue Jan 28, 2025 · 6 comments

mcolpus commented Jan 28, 2025

Bug report

Expected behavior and actual behavior

With Nextflow 23.04.3 I can create a channel from an Oracle Cloud (OCI) S3-compatible bucket just fine, but with 23.10.4 or later (I've also tested 24.10.4) it fails with an error, seemingly because it tries to use an Amazon endpoint.

Steps to reproduce the problem

main.nf:

workflow {
    Channel
        .fromPath("s3://my-bucket/clean_reads/*fastq.gz")
        .take(1)
        .view()
}

nextflow.config:

aws {
    accessKey = "*******************"
    secretKey = "*****************************"
    region = "uk-london-1"
    client {
        endpoint = "https://************.compat.objectstorage.uk-london-1.oraclecloud.com"
        s3PathStyleAccess = true
    }
}

Program output

When it works:

NXF_VER=23.04.3 nextflow run main.nf 

Nextflow 24.10.4 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.04.3
Launching `main.nf` [drunk_curran] DSL2 - revision: b1377beb96
/my-bucket/clean_reads/SAMD00000572.clean_1.fastq.gz

When it fails:

NXF_VER=23.10.4 nextflow run main.nf

Nextflow 24.10.4 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.4
Launching `main.nf` [admiring_yalow] DSL2 - revision: b1377beb96
ERROR ~ Unable to execute HTTP request: s3.uk-london-1.amazonaws.com

 -- Check '.nextflow.log' file for details

which makes it look like the configured endpoint is being ignored.

Environment

  • Nextflow version: works on 23.04.3, fails on later versions
  • Java version:
    openjdk 17.0.10 2024-01-16
    OpenJDK Runtime Environment Temurin-17.0.10+7 (build 17.0.10+7)
    OpenJDK 64-Bit Server VM Temurin-17.0.10+7 (build 17.0.10+7, mixed mode, sharing)
  • Operating system: ubuntu 22.04
  • Bash version: GNU bash, version 5.1.16(1)-release
mcolpus changed the title from "Stopped working with Oracle S3 buckets" to "Unable to read from Oracle S3 buckets" on Jan 28, 2025
bentsherman (Member) commented:

Possibly related to #4732

I suspect there is a regression from the AWS config refactor that happened between 23.04 and 23.10.

bobturneruk commented:

Thanks for this @bentsherman. I think @mcolpus has been trying some workarounds suggested on Slack.

mcolpus (Author) commented Feb 6, 2025

I think it is specifically the move from nf-amazon 1.16.2 to 2.0.0 (first introduced with 23.05.0-edge); that's where it stops working.
And 1.16.2 is incompatible with Nextflow >= 23.05.0.
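For anyone bisecting this themselves, one way to test a specific plugin release is to pin it in nextflow.config (a sketch; the version shown is just for illustration):

plugins {
    // Pin nf-amazon to a specific release to check which version regressed.
    // Note: 1.16.2 only loads on Nextflow < 23.05.0, so the NXF_VER used to
    // launch the run has to be paired with a compatible plugin version.
    id 'nf-amazon@1.16.2'
}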

bentsherman (Member) commented:

The AWS config refactor was merged in 23.05: a74e42d

Likely there is something wrong in that commit around custom endpoints. I don't think we tested S3-compatible storage for that PR, so it wouldn't surprise me.

mcolpus (Author) commented Feb 10, 2025

I've done some digging into nextflow/plugins/nf-amazon/src/main/nextflow/cloud/aws/AwsClientFactory.groovy.

In the function getS3Client, I think withForceGlobalBucketAccessEnabled actually wants to be false when a custom endpoint is used.

I added some very basic logging to AmazonS3Client:

import com.amazonaws.Request
import com.amazonaws.Response
import com.amazonaws.handlers.RequestHandler2
import com.amazonaws.http.HttpResponse

// Minimal handler that logs every outgoing S3 request and its response.
public class LoggingRequestHandler extends RequestHandler2 {
    @Override
    public void beforeRequest(Request<?> request) {
        System.out.println("----- Request Details -----");
        System.out.println("Endpoint: " + request.getEndpoint());
        System.out.println("HTTP Method: " + request.getHttpMethod());
        System.out.println("Resource Path: " + request.getResourcePath());
        System.out.println("Parameters: " + request.getParameters());
        System.out.println("---------------------------");
    }

    @Override
    public void afterResponse(Request<?> request, Response<?> response) {
        HttpResponse httpResponse = response.getHttpResponse();
        int statusCode = httpResponse.getStatusCode();
        String statusText = httpResponse.getStatusText();

        System.out.println("HTTP Response: " + statusCode + " " + statusText);
        // print response headers and values
        System.out.println("Response Headers: " + httpResponse.getHeaders());
        System.out.println("Response Body: " + httpResponse.getContent());
    }
}

which is then used in the function getS3Client:

builder.withRequestHandlers(new LoggingRequestHandler())

You can then see that the first S3 request goes to the configured endpoint, but subsequent ones get switched to the default Amazon endpoint:

----- Request Details -----
Endpoint: https://my_namespace.compat.objectstorage.uk-london-1.oraclecloud.com
HTTP Method: HEAD
Resource Path: species-identification-dataset/
Parameters: [:]
---------------------------
HTTP Response: 200 OK
Response Headers: [access-control-allow-credentials:true, access-control-allow-methods:POST,PUT,GET,HEAD,DELETE,OPTIONS, access-control-allow-origin:*, access-control-expose-headers:access-control-allow-credentials,access-control-allow-methods,access-control-allow-origin,content-type,date,opc-client-info,opc-request-id,strict-transport-security,x-amz-bucket-region,x-amz-request-id,x-api-id,x-content-type-options, Content-Type:application/xml, date:Mon, 10 Feb 2025 16:44:02 GMT, opc-request-id:lhr-1:p37-NYIgx..., strict-transport-security:max-age=31536000; includeSubDomains, x-amz-bucket-region:uk-london-1, x-amz-request-id:lhr-1:p37-NYIgx..., x-api-id:s3-compatible, x-content-type-options:nosniff]
Response Body: null
----- Request Details -----
Endpoint: https://s3.uk-london-1.amazonaws.com
HTTP Method: GET
Resource Path: species-identification-dataset/
Parameters: [prefix:[folder/reads], max-keys:[250], encoding-type:[url]]
---------------------------
ERROR ~ Unable to execute HTTP request: s3.uk-london-1.amazonaws.com

 -- Check '.nextflow.log' file for details

Changing to .withForceGlobalBucketAccessEnabled(false) fixes this problem for me.

I gather that withForceGlobalBucketAccessEnabled is meant to handle the case where a user specifies the wrong region: the SDK automatically detects the correct region and caches the change. I think the x-amz-bucket-region: uk-london-1 response header might be triggering this change when it shouldn't.
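As a sketch of what a fix in getS3Client could look like (credentialsProvider, customEndpoint and region are illustrative names here, not the actual fields in AwsClientFactory):

import com.amazonaws.client.builder.AwsClientBuilder
import com.amazonaws.services.s3.AmazonS3ClientBuilder

// Only let the SDK re-resolve bucket regions globally when no custom
// endpoint is configured; with a custom endpoint, global access rewrites
// follow-up requests to the default Amazon endpoint, as the logs above show.
final boolean hasCustomEndpoint = customEndpoint != null

def builder = AmazonS3ClientBuilder.standard()
        .withCredentials(credentialsProvider)
        .withForceGlobalBucketAccessEnabled(!hasCustomEndpoint)

if( hasCustomEndpoint )
    builder.withEndpointConfiguration(
            new AwsClientBuilder.EndpointConfiguration(customEndpoint, region))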

mcolpus (Author) commented Feb 10, 2025

In S3FileSystemProvider.java, the function createFileSystem (L828) sets the variable global:

final boolean global = bucketName!=null;

I'm not sure exactly what the reasoning is, but I think we only want the AWS SDK to change the region automatically if no region is provided. Something like:

final boolean global = awsConfig.getRegion() == null;
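Or, going one step further, the same guard could also cover custom endpoints (a sketch; getEndpoint() is a hypothetical accessor on awsConfig, for illustration only):

// Allow the SDK's global (cross-region) bucket access only when the user
// has pinned neither a region nor a custom endpoint.
final boolean global = awsConfig.getRegion() == null
        && awsConfig.getEndpoint() == null;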
