Scaling up runs into throttling/blocks from www.ebi.ac.uk #174

Open
moskalenko opened this issue Mar 31, 2023 · 2 comments

Comments

@moskalenko

Hi. I have a client who is trying to run a few hundred ExpansionHunter analyses at the same time. Unfortunately, all analyses beyond a relatively small set stall because of hung HTTP requests to www.ebi.ac.uk. Stracing a single test job showed the calls below. The user is passing a local GRCh38_full_analysis_set_plus_decoy_hla.fa reference file, but its path differs from the one that shows up in the strace, which is confusing. We're not sure where the attempt to read "/gpfs/internal/sweng/production/Resources/GRCh38_1000genomes/GRCh38_full_analysis_set_plus_decoy_hla.fa" comes from in ExpansionHunter, but the www.ebi.ac.uk lookup seems to be made by EH because of a missing reference. If there's a known workaround for preventing a storm of outgoing requests to www.ebi.ac.uk, please let me know. I'd be happy to host whatever reference data is needed locally. Alternatively, a way to force ExpansionHunter to skip the ids with no local reference would work, too.

Thanks,

Alex

stat("/gpfs/internal/sweng/production/Resources/GRCh38_1000genomes/GRCh38_full_analysis_set_plus_decoy_hla.fa", 0x7ffe7a067ff0) = -1 ENOENT (No such file or directory)
write(2, "Failed to populate reference for id 2387\n", 41) = 41
stat("/home/jdoe/.cache/hts-ref/88/49/c9f185b5ae8ed6d60d3b99c6591c", 0x7ffe7a06c120) = -1 ENOENT (No such file or directory)
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=93, ...}) = 0
open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 2802
fstat(2802, {st_mode=S_IFREG|0644, st_size=329, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b73ce1aa000
read(2802, "# HEADER: This file was autogenerated at 2022-12-22 06:48:12 -0500\n# HEADER: by puppet. While it can still be managed manually, it\n# HEADER: is definitely not recommended.\n127.0.0.1\tlocalhost.localdo"..., 4096) = 329
read(2802, "", 4096) = 0
close(2802) = 0
munmap(0x2b73ce1aa000, 4096) = 0
socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 2802
setsockopt(2802, SOL_IP, IP_RECVERR, [1], 4) = 0
connect(2802, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.16.207.246")}, 16) = 0
poll([{fd=2802, events=POLLOUT}], 1, 0) = 1 ([{fd=2802, revents=POLLOUT}])
sendmmsg(2802, [{msg_hdr={msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\325\1\0\0\1\0\0\0\0\0\0\3www\3ebi\2ac\2uk\0\0\1\0\1", iov_len=31}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, msg_len=31}, {msg_hdr={msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="h\334\1\0\0\1\0\0\0\0\0\0\3www\3ebi\2ac\2uk\0\0\34\0\1", iov_len=31}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, msg_len=31}], 2, MSG_NOSIGNAL) = 2
poll([{fd=2802, events=POLLIN}], 1, 5000) = 1 ([{fd=2802, revents=POLLIN}])
ioctl(2802, FIONREAD, [136]) = 0
recvfrom(2802, "h\334\201\200\0\1\0\1\0\1\0\0\3www\3ebi\2ac\2uk\0\0\34\0\1\300\f\0\5\0\1\0\0\0\3\0\10\3www\1g\300\20\300/\0\6\0\1\0\0\1\4\0I\7ns-1300\tawsdns-34\3org\0\21awsdns-hostmaster\6amazon\3com\0\0\0\0\1\0\0\34 \0\0\3\204\0\22u\0\0\1Q\200", 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.16.207.246")}, [28->16]) = 136
poll([{fd=2802, events=POLLIN}], 1, 4999) = 1 ([{fd=2802, revents=POLLIN}])
ioctl(2802, FIONREAD, [381]) = 0
recvfrom(2802, "\325\201\200\0\1\0\2\0\4\0\10\3www\3ebi\2ac\2uk\0\0\1\0\1\300\f\0\5\0\1\0\0\0\3\0\10\3www\1g\300\20\300+\0\1\0\1\0\0\0D\0\4\301>\301P\300/\0\2\0\1\0\0\36\236\0\27\7ns-1300\tawsdns-34\3org\0\300/\0\2\0\1\0\0\36\236\0\26\6ns-434\tawsdns-54\3com\0\300/\0\2\0\1\0\0\36\236\0\27\7ns-1592\tawsdns-07\2co\300\27\300/\0\2\0\1\0\0\36\236\0\26\6ns-953\tawsdns-55"..., 65536, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.16.207.246")}, [28->16]) = 381
close(2802) = 0
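
Editor's sketch: the second stat() in the trace looks like a lookup in a local per-user reference cache, with a 32-character MD5 split into two 2-character directories and a 28-character file name. Assuming that layout holds, a helper like the one below could be used to check which sequences are already cached before launching many jobs. The function name is hypothetical, and the layout is inferred from this trace only.

```python
from pathlib import PurePosixPath
import string


def hts_ref_cache_path(md5_hex: str, cache_root: str) -> PurePosixPath:
    """Map a sequence MD5 to the cache path layout seen in the strace.

    Assumption: the cache uses <root>/<2 chars>/<2 chars>/<remaining 28 chars>,
    as in ~/.cache/hts-ref/88/49/c9f185b5ae8ed6d60d3b99c6591c.
    """
    md5_hex = md5_hex.lower()
    if len(md5_hex) != 32 or any(c not in string.hexdigits for c in md5_hex):
        raise ValueError("expected a 32-character hexadecimal MD5 digest")
    return PurePosixPath(cache_root) / md5_hex[:2] / md5_hex[2:4] / md5_hex[4:]


if __name__ == "__main__":
    # MD5 reassembled from the path components in the trace above.
    print(hts_ref_cache_path("8849c9f185b5ae8ed6d60d3b99c6591c",
                             "/home/jdoe/.cache/hts-ref"))
```

If the printed path does not exist on a compute node, that sequence would presumably have to be fetched from somewhere else, which matches the ENOENT followed by the DNS lookup in the trace.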

@dennishendriksen

Hello @moskalenko, could it be that ExpansionHunter uses samtools under the hood and you are running into the same issue as HKU-BAL/Clair3#180? In that case setting the environment variable REF_PATH=: might prevent the requests. Context: https://www.htslib.org/doc/samtools.html#REFERENCE_SEQUENCES
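
Editor's sketch of the suggestion above: one way to apply REF_PATH=: to a batch of jobs is to set it only in the environment of each child process, leaving the parent shell untouched. This assumes the tool's reference lookup honors REF_PATH as described in the linked htslib documentation; whether that is true for ExpansionHunter is not confirmed in this thread. The function names are hypothetical, and the command to run is left to the caller.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence


def htslib_offline_env(base: Optional[Mapping[str, str]] = None,
                       ref_path: str = ":") -> dict:
    """Return a copy of the environment with REF_PATH overridden.

    The default value ":" is the one suggested in the comment above.
    """
    env = dict(os.environ if base is None else base)
    env["REF_PATH"] = ref_path
    return env


def run_offline(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Run cmd with the overridden REF_PATH; raises if it exits non-zero."""
    return subprocess.run(list(cmd), env=htslib_offline_env(), check=True)
```

For example, `run_offline(["ExpansionHunter", ...])` with the user's usual arguments would start the job with REF_PATH=: set for that process only. Testing on a single job under strace first would show whether the www.ebi.ac.uk lookups actually disappear.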

@moskalenko

> Hello @moskalenko, could it be that ExpansionHunter uses samtools under the hood and you are running into the same issue as HKU-BAL/Clair3#180? In that case setting the environment variable REF_PATH=: might prevent the requests. Context: https://www.htslib.org/doc/samtools.html#REFERENCE_SEQUENCES

You are right! samtools was not available in the ExpansionHunter environment. I've added it and will ask the user to run a test.
