
Slow and unstable downloads from EBI upstreams increasing pipeline failures #305

Closed · 2 tasks done
kltm opened this issue Oct 27, 2022 · 36 comments

Comments

kltm commented Oct 27, 2022

Many downloads are slow to the point that a few megabytes take hours. Figure out the source/cause and fix it. More details to come.

  • Reinstate go-ontology-dev
  • Get EBI downloads back to near their original speed
kltm commented Oct 28, 2022

Going through the logs, we seem to have the following slow downloads:

Trying locally, chebi.owl seems to be fine, but goa_uniprot_all.gaf.gz is extremely slow. Interestingly, switching goa_uniprot_all.gaf.gz to FTP gives me very fast (normal) speeds.

Looking through the logs, goa_uniprot_all.gaf.gz seems to have been getting worse since around the 14th; chebi.owl since the 25th.

kltm commented Oct 28, 2022

Testing again, I can get good times for goa_uniprot_all.gaf.gz by switching to http or ftp, just not https.
That would explain why we took a dive after https://github.com/geneontology/go-site/pull/1914/files .

@cmungall @balhoff Setting the chebi slowness aside for a moment, this seems to be a fundamental issue with EBI's https service (as opposed to http or ftp). If you're getting similar results (I've tried "locally" and at LBL), rather than waiting for an upstream resolution, I'd just as soon switch

https://github.com/geneontology/go-ontology/pull/24202/files
https://github.com/geneontology/go-site/pull/1914/files
etc.

over to http so we can get things ticking over again and revisit later.
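For reference, a minimal sketch of what that bulk swap could look like (the local checkout path is an assumption on my part, and the actual change would go in via the PRs above):

```bash
# Hedged sketch, not the actual PR: swap EBI https URLs to http across the go-site
# dataset metadata; review `git diff` before committing anything.
cd ~/local/src/git/go-site/metadata/datasets
grep -rl 'https://ftp.ebi.ac.uk' . | xargs sed -i 's|https://ftp.ebi.ac.uk|http://ftp.ebi.ac.uk|g'
git diff --stat
```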

kltm commented Oct 28, 2022

Basically, for the same file, we'll get different download speeds depending on the URL scheme. For example:

https://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.owl ~ 4.5KB/s
http://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.owl ~ 15.0MB/s
ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.owl ~ 20.0MB/s

The difference between FTP and HTTP is not large and could probably be accounted for in a few different ways, but the HTTPS download is remarkably slower.
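A quick way to reproduce this comparison (a sketch only; curl reports the average download speed in bytes per second, and standard builds also handle ftp:// URLs):

```bash
for scheme in https http ftp; do
  echo -n "${scheme}: "
  curl -s -o /dev/null -w '%{speed_download} bytes/s\n' \
    "${scheme}://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.owl"
done
```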

kltm commented Oct 28, 2022

@alexsign opened a ticket on our behalf, and soon after the speed issue seems to have resolved for all of the previously slow URLs that I've tested. Thank you!

@kltm kltm closed this as completed Oct 28, 2022
kltm added a commit that referenced this issue Oct 28, 2022
kltm commented Nov 4, 2022

@alexsign Unfortunately, after a short break, the very slow download speeds have started again.

@kltm kltm reopened this Nov 4, 2022
kltm commented Nov 4, 2022

@alexsign Doing some quick testing from a single site:
https://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.owl ~ 424KB/s (~26m)
http://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.owl ~ 422KB/s (~27m)
ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.owl ~ 9.0MB/s (~70s)
It currently appears that http and https now have the same (slow) download speeds, while ftp is at least an order of magnitude faster.

@kltm kltm changed the title Fix slow downloads on pipeline machine Slow downloads from EBI upstreams increasing pipeline failures Nov 4, 2022
@alexsign

@kltm Sorry for the delay on this. The HTTPS download speed should improve now. Can you please test it and let me know?

kltm commented Nov 11, 2022

@alexsign Cheers! It looks like we just missed the window with our current build; our next full build cycle will start tomorrow at midnight, so we'll get the results over the weekend. I've made a note to check in on them.
Thank you for all of your help on this.

kltm commented Nov 14, 2022

@alexsign Doing some testing locally and looking at the current load status, the numbers are completely consistent with #305 (comment); basically, http and https still seem to be extremely slow, while ftp is still, relatively speaking, very fast.

@alexsign

@kltm I sent your feedback to the EBI infrastructure team. Hopefully, they'll figure it out eventually.

@alexsign

@kltm The EBI infrastructure team has made changes again. Please test when convenient.

kltm commented Nov 16, 2022

@alexsign Previous runtime for downloads: 10h 33min; current runtime: 22min 59s. A huge improvement! Thank you for running this back and forth for us.

@kltm kltm closed this as completed Nov 16, 2022
kltm commented Dec 4, 2022

@alexsign I hate to do this; maybe there is a person I should be contacting directly? Since about the 25th or 26th of November, downloads have been slow again.

@kltm kltm reopened this Dec 4, 2022
kltm commented Jan 9, 2023

Looks like things are "fast" again.

@kltm kltm closed this as completed Jan 9, 2023
kltm commented Jan 10, 2023

@alexsign Whoops--my bad. Checking in on this, it is still an ongoing issue, with downloads for our main pipeline runs taking more than ten (10) hours.

@kltm kltm reopened this Jan 10, 2023
kltm commented Mar 9, 2023

Just some current stats:

| File | Throughput | ETA |
| --- | --- | --- |
| http://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.owl.gz | 420KB/s | ~100s |
| http://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.owl | 428KB/s | ~1600s |
| ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.owl.gz | 9.0MB/s | ~6s |
| ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.owl | 7.17MB/s | ~90s |
| https://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.owl.gz | 432KB/s | ~100s |
| https://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.owl | 432KB/s | ~1600s |

These stats are from my local machine. I'm not sure about switching over generally to compressed ontology files--that might end up being a bit fiddly or require us to retool with catalogs (i.e. #27 (comment))--but it's worth keeping in mind.
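If we did go that route ad hoc, the interim version would just be fetch-and-expand (a sketch only; the output path is illustrative, and catalog/pipeline integration is the open question):

```bash
wget -O /tmp/chebi.owl.gz 'http://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.owl.gz'
gunzip -f /tmp/chebi.owl.gz   # leaves /tmp/chebi.owl
```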

@alexsign

@kltm I opened another ticket with the EBI infrastructure team. Hopefully it will be more successful than the one before.

kltm commented Mar 10, 2023

@alexsign No worries here--thank you for that :) The table above was just meant as a recent note on the current state of play from here (not a poke in any way). We appreciate all of your help in mediating this!

@kltm kltm changed the title Slow downloads from EBI upstreams increasing pipeline failures Slow and unstable downloads from EBI upstreams increasing pipeline failures Apr 18, 2023
kltm commented Apr 18, 2023

@pgaudet We've had the second case in a week where either the EBI server or the connection is unstable and we lose a run, e.g.:

00:03:19  Download of https://ftp.ebi.ac.uk/pub/databases/GO/goa/CHICKEN/goa_chicken_isoform.gaf.gz failed: Unable to establish SSL connection.

I think I'm ready to implement a local cache/mirror to support our runs so that we can at least deal with late-stage errors. @cmungall Do you see any problems with this? For the moment it would be a GO-only resource.
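In the meantime, a stopgap in the download step could be a simple retry-then-fallback (a sketch only; the mirror URL and path layout below are illustrative of the proposed GO-hosted mirror, not an existing resource):

```bash
EBI_URL='https://ftp.ebi.ac.uk/pub/databases/GO/goa/CHICKEN/goa_chicken_isoform.gaf.gz'
MIRROR_URL='https://mirror.geneontology.io/pub/databases/GO/goa/CHICKEN/goa_chicken_isoform.gaf.gz'
wget --tries=3 --timeout=60 -O goa_chicken_isoform.gaf.gz "$EBI_URL" \
  || wget --tries=3 --timeout=60 -O goa_chicken_isoform.gaf.gz "$MIRROR_URL"
```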

kltm commented Apr 18, 2023

Noting that grabbing all EBI files, using FTP, takes:
mkdir -p /tmp/foo && cd /tmp/foo \
  && grep -r "ftp.ebi.ac.uk" ~/local/src/git/go-site/metadata/datasets/ | grep -oh "https:.*" > /tmp/files.txt \
  && sed -e 's/https/ftp/g' /tmp/files.txt > /tmp/files.txt.changed \
  && time wget -i /tmp/files.txt.changed

Total wall clock time: 39m 53s
Downloaded: 83 files, 38G in 37m 32s (17.2 MB/s)

Which is a win over the 12+ hours we're currently at.

The upload from local, using the following, took 50m26.029s (real time):

time s3cmd -c ~/SECRET_CRED --mime-type=text/plain put /tmp/foo/* s3://go-mirror

So let's say one and a half hours per mirror attempt.
Pulling from our self-made mirror, we have:

Total wall clock time: 13m 29s
Downloaded: 83 files, 38G in 13m 12s (48.8 MB/s)

This means that we're getting roughly 3x the speed of the old EBI FTP, and likely better stability.
I'd propose that we either 1) switch back to the old EBI FTP or 2) start the process of running off our own mirror.
Tagging @cmungall @pgaudet .
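Pulling the commands above together, the whole refresh pass would look roughly like this (a consolidation sketch only; the paths, bucket name, and credentials file are the ad hoc ones used above, not pipeline configuration):

```bash
#!/usr/bin/env bash
set -euo pipefail

WORKDIR=/tmp/go-ebi-mirror
mkdir -p "$WORKDIR" && cd "$WORKDIR"

# Collect every EBI URL referenced in the dataset metadata and rewrite https -> ftp.
grep -rhoE 'https://ftp\.ebi\.ac\.uk[^" ]*' ~/local/src/git/go-site/metadata/datasets/ \
  | sed 's|^https|ftp|' | sort -u > /tmp/ebi-files.txt

# Fetch over FTP, then push the results up to the mirror bucket.
wget -i /tmp/ebi-files.txt
s3cmd -c ~/SECRET_CRED --mime-type=text/plain put "$WORKDIR"/* s3://go-mirror
```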

kltm commented Apr 18, 2023

I'd also note that we could front this with LBL Cloudflare for free, which might be a fun experiment :)

kltm commented Aug 23, 2023

Currently looking at using pipeline branch goa-copy-to-mirror to interact with mirror.geneontology.io.

kltm commented Aug 23, 2023

Currently running a full mirror build test.

kltm added a commit that referenced this issue Aug 24, 2023
kltm added a commit to geneontology/go-site that referenced this issue Aug 25, 2023
kltm added a commit that referenced this issue Aug 25, 2023
kltm commented Aug 25, 2023

@dustine32 I'm considering a non-breaking schema change for datasets.schema.yaml in the GO metadata: adding mirror_of. Essentially, I'd like to be able to record both where the pipeline directly gets the data (source) and what that location is a mirror of (mirror_of), in the same place. Any thoughts?
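A purely hypothetical illustration of what such an entry might look like (field names other than mirror_of are assumptions here, not the actual schema):

```bash
# Hypothetical example only; not the real datasets.schema.yaml structure.
cat <<'EOF'
- id: goa_chicken_isoform.gaf
  source: https://mirror.geneontology.io/pub/databases/GO/goa/CHICKEN/goa_chicken_isoform.gaf.gz
  mirror_of: https://ftp.ebi.ac.uk/pub/databases/GO/goa/CHICKEN/goa_chicken_isoform.gaf.gz
EOF
```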

@dustine32
Contributor

@kltm Yeah, I think I'm a fan of mirror_of. Is this just an extra field, initially for information only, or does other code need to be changed to use it?

@kltm
Copy link
Member Author

kltm commented Aug 25, 2023

This would be an additional field. Only new code--namely my mirroring code--would need to "see" it, and even then only if it cared about mirrors. Everything else could safely ignore it.

@dustine32
Contributor

Ah, yeah, that sounds good to me!

kltm commented Aug 25, 2023

Now testing on main pipeline branches.

kltm commented Aug 25, 2023

Moving to clearing for testing.

kltm commented Aug 28, 2023

Our download phase has gone from 11+ hours to 15+ minutes. No oddities yet.
