too many threads generated till -su: fork: retry: Resource temporarily unavailable #1511
Any update?
Can reproduce using the dog-cat classification workflow example
From heap dump
maaquib added a commit to maaquib/serve that referenced this issue on Mar 23, 2022
When will this be released?
As soon as the PR is merged, it needs a day to be added to the nightly builds: https://pypi.org/project/torchserve-nightly/ For an official release, we will probably add this in 0.6; we are still discussing an exact date with the team.
maaquib added 6 commits to maaquib/serve that referenced this issue on Apr 6, 2022
Context
Your Environment
Expected Behavior
I'm using TorchServe workflows in Docker. During inference, the system keeps generating threads until the shell reports "-su: fork: retry: Resource temporarily unavailable".
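To confirm which process is accumulating threads, one option is to read the `Threads:` field that Linux exposes in `/proc/<pid>/status`. The following is a minimal sketch, not part of TorchServe; the function names `parse_thread_count` and `thread_counts` are hypothetical, and it assumes a Linux container with a readable `/proc`.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_thread_count(status_text: str) -> Optional[int]:
    """Return the 'Threads:' value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    return None


def thread_counts(proc_root: str = "/proc") -> Dict[int, int]:
    """Map each readable PID to its current thread count (Linux only)."""
    root = Path(proc_root)
    if not root.is_dir():
        # No procfs here (for example, not Linux): nothing to report.
        return {}
    counts: Dict[int, int] = {}
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # Only numeric directory names are PIDs.
        try:
            count = parse_thread_count((entry / "status").read_text())
        except OSError:
            # The process exited or is not readable; skip it.
            continue
        if count is not None:
            counts[int(entry.name)] = count
    return counts


if __name__ == "__main__":
    # Print the five processes with the most threads.
    ranked = sorted(thread_counts().items(), key=lambda kv: kv[1], reverse=True)
    for pid, n in ranked[:5]:
        print(f"pid={pid} threads={n}")
```

Running this periodically while the inference requests are in flight should show whether the thread count of the TorchServe Java process keeps growing.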
Current Behavior
Possible Solution
Steps to Reproduce
3. Run inference; there are about 6000 cases to be inferenced.
Failure Logs [if any]
inference/torchserver# 2022-03-14T15:03:32,999 [ERROR] pool-3-thread-2 org.pytorch.serve.metrics.MetricCollector -
java.io.IOException: Cannot run program "/usr/bin/python3" (in directory "/usr/local/lib/python3.6/dist-packages"): error=11, Resource temporarily unavailable
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1128) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1071) ~[?:?]
at java.lang.Runtime.exec(Runtime.java:592) ~[?:?]
at org.pytorch.serve.metrics.MetricCollector.run(MetricCollector.java:42) ~[model-server.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.io.IOException: error=11, Resource temporarily unavailable
at java.lang.ProcessImpl.forkAndExec(Native Method) ~[?:?]
at java.lang.ProcessImpl.<init>(ProcessImpl.java:340) ~[?:?]
at java.lang.ProcessImpl.start(ProcessImpl.java:271) ~[?:?]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1107) ~[?:?]
... 9 more
[15806.571s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 136k, guardsize: 0k, detached.
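Both failures in the log point to the same condition: `error=11` is `EAGAIN` on Linux, which `fork` and `pthread_create` return when a limit on the number of processes or threads has been reached. As a hedged diagnostic sketch, the script below prints the limits that commonly apply inside a container. The helper names are hypothetical, and the cgroup file locations are assumptions that depend on whether the host uses cgroup v1 or v2.

```python
import errno
import os
import resource
from pathlib import Path
from typing import Iterable, Optional

# Assumed locations of the cgroup PID limit (v2 first, then v1).
CGROUP_PIDS_MAX_PATHS = (
    "/sys/fs/cgroup/pids.max",
    "/sys/fs/cgroup/pids/pids.max",
)


def describe_limit(value: int) -> str:
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def read_first_existing(paths: Iterable[str]) -> Optional[str]:
    """Return the stripped contents of the first readable file, or None."""
    for path in paths:
        try:
            return Path(path).read_text().strip()
        except OSError:
            continue
    return None


if __name__ == "__main__":
    # Message the OS associates with EAGAIN, as seen in the log.
    print(f"EAGAIN ({errno.EAGAIN}): {os.strerror(errno.EAGAIN)}")

    # Per-user limit on processes and threads (what `ulimit -u` reports).
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    print(f"RLIMIT_NPROC soft={describe_limit(soft)} hard={describe_limit(hard)}")

    # System-wide thread limit and the container's cgroup PID limit.
    print("threads-max:", read_first_existing(["/proc/sys/kernel/threads-max"]))
    print("cgroup pids.max:", read_first_existing(CGROUP_PIDS_MAX_PATHS))
```

Comparing these values with the observed thread count indicates which limit is being exhausted; raising the limit only postpones the failure if threads are leaking.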