Slow and unstable downloads from EBI upstreams increasing pipeline failures #305
Comments
Going through the logs, we seem to have the following slow downloads:
Trying locally, chebi.owl seems to be fine, while goa_uniprot_all.gaf.gz is very, very slow. Interestingly, for goa_uniprot_all.gaf.gz, switching to FTP gives me very fast (normal) speeds. Looking through the logs, goa_uniprot_all.gaf.gz seems to have been getting worse since around the 14th; chebi.owl since the 25th.
Testing again, I can get good times for goa_uniprot_all.gaf.gz by switching to http or ftp, just not https. @cmungall @balhoff Putting chebi being slow to the side for a moment, this seems to be a fundamental issue with EBI's https service (as opposed to http or ftp). If you're getting similar results (I've tried "locally" and at LBL), rather than waiting for an upstream resolution, I'd just as soon switch https://github.com/geneontology/go-ontology/pull/24202/files over to http so we can get things ticking over again and revisit later.
Basically, for the same file, we'll get different download speeds depending on the scheme. For example, https://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.owl ~ 4.5KB/s. While the difference between FTP and HTTP is not a lot and could probably be accounted for in a few different ways, the HTTPS download is remarkably slower.
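For reference, the kind of per-scheme throughput check described above can be reproduced with a small script. The sketch below is only an illustration of that comparison (not part of the pipeline's actual tooling); it times a partial download of chebi.owl over https, http, and ftp and prints the effective rate.

```python
# Rough sketch: time a partial download of the same CHEBI file over
# different URL schemes to compare throughput.
import time
import urllib.request

URLS = [
    "https://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.owl",
    "http://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.owl",
    "ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.owl",
]
CHUNK = 1 << 20          # read 1 MiB at a time
LIMIT = 10 * (1 << 20)   # stop after 10 MiB; enough for a rate estimate

for url in URLS:
    start = time.monotonic()
    read = 0
    with urllib.request.urlopen(url, timeout=60) as resp:
        while read < LIMIT:
            chunk = resp.read(CHUNK)
            if not chunk:
                break
            read += len(chunk)
    elapsed = time.monotonic() - start
    print(f"{url.split(':')[0]:5s} {read / elapsed / 1024:.1f} KB/s")
```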
@alexsign opened a ticket on our behalf and, soon after, the speed issue seems to have resolved for all the previously slow URLs that I've tested. Thank you!
@alexsign Unfortunately, after a short break, the very slow download speeds have started again.
@alexsign Doing some quick testing from a single site:
@kltm sorry for the delay on this. The HTTPS download speed should improve now. Can you please test it and let me know?
@alexsign Cheers! It looks like we just missed the window with our current build; our next full build cycle will start tomorrow at midnight, so we'll get the results over the weekend. I've made a note to check in on them.
@alexsign Doing some testing locally and looking at the current load status, the numbers are completely consistent with #305 (comment); basically, http and https still seem to be extremely slow, while ftp is still very fast by comparison.
@kltm I sent your feedback to the EBI infrastructure team. Hopefully, they'll figure it out eventually.
@kltm The EBI infrastructure team made changes again. Please test when convenient.
@alexsign Previous runtime for downloads: 10h 33min; current runtime: 22min 59s. A huge improvement! Thank you for running this back and forth for us.
@alexsign I hate to do this; maybe there is a person I should be contacting directly? Since about the 25th or 26th of November, downloads are slow again.
Looks like things are "fast" again. |
@alexsign Whoops--my bad. Checking in on this, it is still an ongoing issue, with downloads for our main pipeline runs taking more than ten (10) hours.
Just some current stats:
Noting current stats from a local machine. I'm not sure about switching over generally to compressed ontology files--that might end up being a bit fiddly or require us to retool with catalogs (i.e. #27 (comment)), but it's worth keeping in mind.
@kltm I opened another ticket with the EBI infrastructure team. Hopefully, it will be more successful than the one before.
@alexsign No worries here--thank you for that :) The table above was just meant to be a recent note on the current state of play from here (not a poke in any way). We appreciate all of your help in mediating this!
@pgaudet We've had a second case in a week where either the EBI server or the connection is unstable and we lose a run, e.g.:
I think I'm ready to implement a local cache/mirror to support our runs so that we can at least deal with late-stage errors. @cmungall Do you see any problems with this? For the moment it would be a GO-only resource.
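For what it's worth, a minimal sketch of what that mirroring step could look like follows; the source list, paths, and file names here are illustrative placeholders rather than the pipeline's actual configuration:

```python
# Hypothetical mirror pass: pull the EBI files we depend on into a local
# staging area that the pipeline can then serve/read from, so a flaky
# upstream connection only affects the mirror step, not the whole run.
import shutil
import urllib.request
from pathlib import Path

# Illustrative list; the real pipeline would read these from its config.
EBI_SOURCES = [
    "ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.owl",
]
MIRROR_DIR = Path("/tmp/go-ebi-mirror")

def mirror(url: str, dest_dir: Path) -> Path:
    """Download one upstream file into dest_dir, exposing it only when complete."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / url.rsplit("/", 1)[-1]
    tmp = dest.parent / (dest.name + ".part")
    with urllib.request.urlopen(url, timeout=120) as resp, tmp.open("wb") as out:
        shutil.copyfileobj(resp, out)
    tmp.replace(dest)  # atomic rename: downstream steps never see partial files
    return dest

for src in EBI_SOURCES:
    print("mirrored", mirror(src, MIRROR_DIR))
```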
Noting that grabbing all EBI files, using FTP, takes:
Which is a win over the 12+ hours we're currently at. The upload from local was
So let's say one and a half hours per mirror attempt.
This means that we're getting 3x the speed of the old EBI FTP, and likely better stability.
I'd also note that we could front this with LBL Cloudflare for free, which might be a fun experiment :)
Currently looking at using pipeline branch
Currently running a full mirror build test. |
@dustine32 I'm considering a non-breaking schema change for
@kltm Yeah, I think I'm a fan of
This would be an additional field. Only new code--namely my mirroring code--would need to "see" it, and even then only if they cared about mirrors. Everything else could safely ignore it.
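A tiny sketch of why this stays non-breaking (the field name and entry shape below are hypothetical, not the actual schema): existing consumers only read the keys they already know about, so an added mirror-related key is simply never touched.

```python
# Hypothetical dataset entry; "mirror_of" stands in for the proposed new field.
dataset = {
    "id": "goa_uniprot_all",
    "source": "http://mirror.example.org/goa_uniprot_all.gaf.gz",  # where we fetch from
    "mirror_of": "ftp://ftp.ebi.ac.uk/pub/...",  # new, optional: records the upstream origin
}

# Existing code: only reads the fields it has always read, so the new key is invisible.
download_url = dataset["source"]

# New mirroring code: opts in to the new field, skipping entries with no mirror info.
upstream = dataset.get("mirror_of")
if upstream is not None:
    print(f"refresh mirror for {dataset['id']} from {upstream}")
```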
Ah, yeah, that sounds good to me! |
Now testing on main pipeline branches. |
Moving to |
Our download phase has gone from 11+h to 15+m. No oddities yet. |
Many downloads are slow to the point of a few megabytes taking hours. Figure out the source/reason and fix. More details to come.