too many threads generated till -su: fork: retry: Resource temporarily unavailable #1552

Merged
lxning merged 10 commits into pytorch:master from issue_1511 on Apr 8, 2022

Conversation

@maaquib (Collaborator) commented Apr 7, 2022

Description

During inference, the system keeps spawning Java threads until process creation fails with "-su: fork: retry: Resource temporarily unavailable".

Fixes #1511
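
For context, the shell prints "fork: retry: Resource temporarily unavailable" when fork fails with EAGAIN, which typically means the per-user process/thread limit (`ulimit -u`) has been reached. The sketch below estimates how much headroom remains. The helper name `thread_headroom` and the sample limit of 4096 are illustrative assumptions, not part of TorchServe, and the commented live invocation assumes procps `ps` on Linux.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TorchServe): how many more threads a user
# can create before reaching a given per-user limit.
# Usage: thread_headroom LIMIT USED
thread_headroom() {
  local limit="$1" used="$2"
  if [ "$limit" = "unlimited" ]; then
    # ulimit -u prints "unlimited" when no cap is set
    echo "unlimited"
  else
    echo $(( limit - used ))
  fi
}

# Illustrative values: an assumed limit of 4096 and the 85 threads
# reported in the validation run.
thread_headroom 4096 85   # prints 4011

# Live values (assumes procps ps; -L prints one line per thread):
# thread_headroom "$(ulimit -u)" "$(ps -L -u "$(id -un)" --no-headers | wc -l)"
```

A headroom that keeps shrinking across inference batches would match the symptom described above.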

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Feature/Issue validation/testing

Verified that only Netty threads are added to the epollEventLoopGroup. The number of Java threads stays constant at 85 across all subsequent inference batches.

> git clone https://github.com/pytorch/serve
> cd serve/examples/Workflows/dog_breed_classification
> mkdir model_store wf_store
> curl -O https://torchserve.pytorch.org/mar_files/cat_dog_classification.mar
> curl -O https://torchserve.pytorch.org/mar_files/dog_breed_classification.mar
> mv *.mar model_store/
> torch-workflow-archiver -f --workflow-name dog_breed_wf --spec-file workflow_dog_breed_classification.yaml --handler workflow_dog_breed_classification_handler.py --export-path wf_store/
> torchserve --start --model-store model_store/ --workflow-store wf_store/ --ncs

> curl -X POST "http://127.0.0.1:8081/workflows?url=dog_breed_wf.war"
> ps -efT | cat | grep wf_store | grep -v grep | wc -l
52
> curl https://raw.githubusercontent.com/udacity/dog-project/master/images/Labrador_retriever_06457.jpg -o Dog1.jpg
> for i in {1..100}; do curl -s http://127.0.0.1:8080/wfpredict/dog_breed_wf -T Dog1.jpg > /dev/null; done
> ps -efT | cat | grep wf_store | grep -v grep | wc -l
85
> for i in {1..100}; do curl -s http://127.0.0.1:8080/wfpredict/dog_breed_wf -T Dog1.jpg > /dev/null; done
> ps -efT | cat | grep wf_store | grep -v grep | wc -l
85
> for i in {1..100}; do curl -s http://127.0.0.1:8080/wfpredict/dog_breed_wf -T Dog1.jpg > /dev/null; done
> ps -efT | cat | grep wf_store | grep -v grep | wc -l
85
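
The manual check above (count threads, send a batch, count again) can be sketched as a small helper that classifies a series of thread-count samples. The function name `thread_trend` is a hypothetical illustration, not a TorchServe tool, and it only compares the last two samples.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide whether thread-count samples have plateaued.
# Prints "stable" if the last two samples are equal, "growing" if the last
# sample is larger than the one before it, and "shrinking" otherwise.
# Usage: thread_trend COUNT COUNT [COUNT ...]
thread_trend() {
  if [ "$#" -lt 2 ]; then
    echo "need at least two samples" >&2
    return 2
  fi
  local prev="" last="" sample
  # Walk all samples, remembering the final two.
  for sample in "$@"; do
    prev="$last"
    last="$sample"
  done
  if [ "$last" -eq "$prev" ]; then
    echo "stable"
  elif [ "$last" -gt "$prev" ]; then
    echo "growing"
  else
    echo "shrinking"
  fi
}

# Samples from the session above: after workflow registration,
# then after each batch of 100 requests.
thread_trend 52 85 85 85
```

In a live run, each sample would come from the same `ps -efT | grep wf_store | grep -v grep | wc -l` pipeline used in the session.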

Checklist:

  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@maaquib requested review from msaroufim and lxning Apr 7, 2022 21:53
@maaquib self-assigned this Apr 7, 2022
@maaquib added this to the v0.6.0 milestone Apr 7, 2022
@pytorch deleted a comment from sagemaker-neo-ci-bot Apr 7, 2022
@maaquib force-pushed the issue_1511 branch 2 times, most recently from 110abd2 to 09e34e6 Apr 7, 2022 22:32
@maaquib force-pushed the issue_1511 branch 2 times, most recently from ed4296d to eb98e56 Apr 7, 2022 22:43
@maaquib closed this Apr 7, 2022
@maaquib reopened this Apr 7, 2022
@sagemaker-neo-ci-bot

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: 2d2defd
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: ed4296d
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-neo-ci-bot

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: eb98e56
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-neo-ci-bot

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: 1d17475
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-neo-ci-bot

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: 1d17475
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-neo-ci-bot

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: 1d17475
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-neo-ci-bot

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: 926e342
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-neo-ci-bot

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: 926e342
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-neo-ci-bot

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: 926e342
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-neo-ci-bot

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: f134ea6
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-neo-ci-bot

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: f134ea6
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-neo-ci-bot

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: f134ea6
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-neo-ci-bot

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: f458fcf
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-neo-ci-bot

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: f458fcf
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-neo-ci-bot

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: f458fcf
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-neo-ci-bot

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: 7e0a436
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-neo-ci-bot

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: 7e0a436
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-neo-ci-bot

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: 7e0a436
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@lxning merged commit 2b98375 into pytorch:master Apr 8, 2022
@maaquib deleted the issue_1511 branch Apr 8, 2022 23:06
@sagemaker-neo-ci-bot

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: 7e0a436
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-neo-ci-bot

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: 7e0a436
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Development

Successfully merging this pull request may close these issues.

too many threads generated till -su: fork: retry: Resource temporarily unavailable
4 participants