Reproduce GSB endpoint not registered error #1546

Closed
nieznanysprawiciel opened this issue Jul 28, 2021 · 2 comments
Labels
bug Something isn't working

Comments

@nieznanysprawiciel
Contributor

golemfactory/yagna-triage#102

What:

  • Set up a test network
  • Use a mock task with 50 providers to reproduce the problem
@mfranciszkiewicz
Contributor

mfranciszkiewicz commented Aug 10, 2021

Scenarios:

The drone task was executed directly on the host machine (no Docker), configured with a 30-minute timeout and 1000 jobs.

The Yagna (0.7.3) service setup was a combination of the following:

  • network server
    • public
    • private @ 50% of 1 core, 512 MB RAM
    • private @ 2 cores, 1024 MB RAM
    • private uncapped
  • requestor
    • 50% of 1 core, 512 MB RAM
    • 1 core, 1024 MB RAM
    • uncapped
  • providers
    • 50 providers @ 50% of 1 core, 512 MB RAM (some providers fail to start)
    • 36 providers @ 50% of 1 core, 512 MB RAM
    • 20 providers @ 1 core, 1024 MB RAM
    • 20 providers uncapped
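
The options listed above can be enumerated as a test matrix. This is a minimal sketch and not part of ya-drone-swarm: the `Caps` type and the labels are hypothetical, and it assumes every option was combined with every other, which the list does not state explicitly.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass(frozen=True)
class Caps:
    """Hypothetical description of one setup option; None means no cap."""
    label: str
    cpus: Optional[float] = None
    mem_mb: Optional[int] = None
    count: int = 1


# Values copied from the list above. The public network server is not a
# local container, so it carries no caps here.
NETWORK_SERVERS = [
    Caps("public"),
    Caps("private, 0.5 core / 512 MB", 0.5, 512),
    Caps("private, 2 cores / 1024 MB", 2.0, 1024),
    Caps("private, uncapped"),
]
REQUESTORS = [
    Caps("0.5 core / 512 MB", 0.5, 512),
    Caps("1 core / 1024 MB", 1.0, 1024),
    Caps("uncapped"),
]
PROVIDERS = [
    Caps("50 x 0.5 core / 512 MB", 0.5, 512, count=50),
    Caps("36 x 0.5 core / 512 MB", 0.5, 512, count=36),
    Caps("20 x 1 core / 1024 MB", 1.0, 1024, count=20),
    Caps("20 x uncapped", count=20),
]

# Full cross product, assuming all options are freely combinable.
SCENARIOS = list(product(NETWORK_SERVERS, REQUESTORS, PROVIDERS))

if __name__ == "__main__":
    for server, requestor, providers in SCENARIOS:
        print(f"{server.label} | {requestor.label} | {providers.label}")
```

Under that assumption the matrix has 4 × 3 × 4 = 48 entries; the scenarios actually run may have been a subset.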

Tooling

https://github.com/mfranciszkiewicz/ya-drone-swarm

Environment:

  • Ubuntu 21.04 amd64
  • Docker 20.10.2
  • ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 127768
max locked memory           (kbytes, -l) 4110346
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1048576
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 127768
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

Hardware:

  • Ryzen 5900X (12 cores, 24 logical threads) w/ 32 GB of RAM

In some test scenarios, the sum of resources assigned to Docker containers exceeded the machine's resources.
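
As a rough check of where that happens, the sketch below sums the caps of the capped provider setups and compares them with the host (24 logical threads, 32 GB of RAM). The function names are hypothetical, and the totals count providers only; the network server and requestor caps add to them. A CPU cap is an upper limit rather than a reservation, so exceeding the host total only means the containers can contend for CPU.

```python
HOST_THREADS = 24          # Ryzen 5900X logical threads
HOST_MEM_MB = 32 * 1024    # 32 GB of RAM

# (provider count, CPU cap per provider, memory cap per provider in MB)
CAPPED_PROVIDER_SETUPS = [
    (50, 0.5, 512),
    (36, 0.5, 512),
    (20, 1.0, 1024),
]


def demand(count, cpus, mem_mb):
    """Return the summed CPU and memory caps for one provider setup."""
    return count * cpus, count * mem_mb


def oversubscribed(count, cpus, mem_mb):
    """True if the summed caps exceed the host's threads or memory."""
    total_cpus, total_mem = demand(count, cpus, mem_mb)
    return total_cpus > HOST_THREADS or total_mem > HOST_MEM_MB


if __name__ == "__main__":
    for count, cpus, mem_mb in CAPPED_PROVIDER_SETUPS:
        total_cpus, total_mem = demand(count, cpus, mem_mb)
        flag = "exceeds host" if oversubscribed(count, cpus, mem_mb) else "fits"
        print(f"{count} providers: {total_cpus} cores, {total_mem} MB ({flag})")
```

By this arithmetic, the 50-provider setup alone asks for 25 cores against 24 logical threads, while the 36- and 20-provider setups stay within the host limits until the other services are added.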

Findings:

  1. Resource capping the network server and/or the requestor had no impact on the occurrence of "endpoint address not found".
  2. Resource-capped providers (the 50- and 36-provider setups) were terminating the activity prematurely, and thus "endpoint address not found" occurred frequently. Relevant log entries:
    • Breaking agreement [5d0e7008ad4628841d5fee5da5dc699adf7b85922502d6dbaf1adc3ff1276e5c], reason: Requestor is unreachable more than 4m
    • activity 0f28c54e350a4ff09960145981406889 inactive for 12s, destroying
  3. In the original issue, only a portion of providers was updated to the latest version.
    • "No service registered under given address" logs exist only in ya-service-bus < 0.4.6
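
When going through logs from such a run, it can help to separate lines that point at an early activity termination from lines that point at a GSB error. The following is a minimal sketch under stated assumptions: the patterns are derived only from the messages quoted in this comment, the `classify` helper and its category names are hypothetical, and this is not how yapapi or yagna classify errors.

```python
import re
from typing import Optional

# Patterns built from the log messages quoted in the findings (assumption:
# real log lines contain these fragments verbatim).
TERMINATION_PATTERNS = [
    re.compile(r"Breaking agreement \[[0-9a-f]+\], reason: "),
    re.compile(r"activity [0-9a-f]+ inactive for \d+s, destroying"),
]
GSB_ERROR_PATTERNS = [
    re.compile(r"No service registered under given address"),
    re.compile(r"endpoint address not found", re.IGNORECASE),
]


def classify(line: str) -> Optional[str]:
    """Return 'activity-terminated', 'gsb-error', or None for other lines."""
    if any(p.search(line) for p in TERMINATION_PATTERNS):
        return "activity-terminated"
    if any(p.search(line) for p in GSB_ERROR_PATTERNS):
        return "gsb-error"
    return None


if __name__ == "__main__":
    samples = [
        "activity 0f28c54e350a4ff09960145981406889 inactive for 12s, destroying",
        "No service registered under given address",
        "some unrelated line",
    ]
    for sample in samples:
        print(classify(sample), "<-", sample)
```

Termination patterns are checked first, so a line matching both groups is reported as an activity termination.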

@mfranciszkiewicz
Contributor

yapapi and yajsapi will now report whether the provider terminated the activity or a GSB error occurred. These changes were introduced in golemfactory/yapapi#588.
