Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make CheckUrlExists work on Node18 #157

Merged
merged 30 commits into from
Aug 25, 2023
Merged

Conversation

coverbeck
Copy link
Contributor

@coverbeck coverbeck commented Aug 23, 2023

Description
Instead of using node-libcurl, use node-ftp and follow-redirects (which uses http and https) to check if urls exist.

Why it broke

  • The node-libcurl version we were using did not run on Node 18
  • After upgrading to the latest, Node 18 compatible version of node-libcurl, the lambda did not run, because node-libcurl has a native dependency on a version of libstdc++6 that is not deployed in the AWS lambda environment.

The fix
One option was to use a lambda layer to provide the native dependencies. It's a bit tricky, and I created a support case with AWS to figure it out. See JIRA issue for details. I ultimately decided against that approach because:

  1. It was hard :). We'd have to set up a layer, host the 4MB zip file, etc.
  2. I'm not entirely comfortable updating all those core native libraries, especially as AWS upgrades their runtime; we could eventually fall behind their upgrades, there could be mismatches...
  3. The layer, unless we did 2 layers, wouldn't run on ARM CPUs, e.g., my Mac, so I wouldn't be able to test the lambda locally.

So, I just got rid of node-libcurl, and I'm using other node libraries. The cons:

  1. No support for protocols other than http, https, and ftp (but are there any?).
  2. I don't know how curl does a HEAD for FTP; I think it's FTP size, but the behavior could be slightly different.

Issue
SEAB-5759

Security
If there are any concerns that require extra attention from the security team, highlight them here.

Please make sure that you've checked the following before submitting your pull request. Thanks!

  • Ensure that the PR targets the correct branch. Check the milestone or fix version of the ticket.

@coverbeck coverbeck self-assigned this Aug 23, 2023
if (parsedUrl.port) {
options = { port: parsedUrl.port, ...options };
}
await ftpClient.access(options);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this or any other await fails, it will throw an exception, which will be caught on line 46/36 (note to self and anybody else who isn't familiar with JS promises and await.

checkUrlExists/lambda/index.js Outdated Show resolved Hide resolved
}
await ftpClient.access(options);
const size = await ftpClient.size(parsedUrl.path);
return size > 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it plausible that a valid file will be empty? If so, >= here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I was thinking an empty file would never be valid from a workflow input perspective. But I could see it both ways. I'm going to leave it as is for now, but if you or anybody feels strongly, I can change it.

@denis-yuen
Copy link
Member

  1. http, https, and ftp (but are there any?).

Thought we did s3 and gcp, but it looks like at least s3 is converted to https?

https://docs.dockstore.org/en/stable/faq.html?highlight=open%20data#what-is-an-open-data-tool-or-workflow-version
dockstore/dockstore#5397

Copy link
Member

@denis-yuen denis-yuen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check on the story with s3 urls

@@ -7,10 +7,10 @@
"author": "Dockstore",
"license": "MIT",
"dependencies": {
"node-libcurl": "^2.3.4"
"basic-ftp": "^5.0.3",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, basic-ftp has more dependents and more downloads, so this is probably a good move in terms of maintainability
https://www.npmjs.com/package/basic-ftp?activeTab=readme
https://www.npmjs.com/package/node-libcurl

@@ -35,6 +35,10 @@ describe("Tests index", function () {
"ftp://ftp.this.is.fake.or.private.1000genomes.ebi.ac.uk/vol1/ftp/CHANGELOG";
await setupTest(url, false);
});
it("handles an unknown protocol", async () => {
const url = "s3://bucket/objectkey"; // We don't natively support s3 URIs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on the other hand

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on the other hand

node-libcurl doesn't support s3 either. We've been getting around this in the web service by converting s3 and gs urls to https urls before invoking the lambda.

Per node-libcurl npm home page, we are losing support for the DICT, FILE, Gopher, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMTP, SMTPS, Telnet and TFTP protocols, but I don't think any workflows have test parameter files with those protocols

@coverbeck
Copy link
Contributor Author

  1. http, https, and ftp (but are there any?).

Thought we did s3 and gcp, but it looks like at least s3 is converted to https?

https://docs.dockstore.org/en/stable/faq.html?highlight=open%20data#what-is-an-open-data-tool-or-workflow-version dockstore/dockstore#5397

Saw this comment after your inline comment... Yes we convert s3 and gs URIs to https URLs in the web service. In hindsight, maybe we should have done the conversion here.

@coverbeck coverbeck merged commit 1acafc2 into develop Aug 25, 2023
7 checks passed
@coverbeck coverbeck deleted the feature/seab-5759/curl branch August 25, 2023 23:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants