Fix for response time out issue during build #420
…to issue_templates
fixed resnet152 custom handler to work with GPU
* issue template * Moving issue templates to root folder per pull request template * adding feature request template * Minor changes related to sections and names. * Update feature_template.md * Update bug_template.md * doc template * minor changes * Minor changes to upload custom model or handler Co-authored-by: mycpuorg <mail@mycpu.org>
* pylint fixes and code cleanup * UT fixes, pylint updates per latest version and updated sanity script to run pylint * fixed pylint and wrapt installation issue * freeze pylint version to 2.4.4 due to bug in version 2.5 * removed duplicate statement to install pylint Co-authored-by: Aaqib <maaquib@gmail.com>
* updated code to now log errors only in case when exception is not genreated by unregister api call * java formatting * fixed illigalstateexception * removed the unrequired handling for illegal monitor state exception Co-authored-by: dhaniram kshirsagar <dhaniram_kshirsagar@persistent.com> Co-authored-by: Aaqib <maaquib@gmail.com>
* changes for Homebrew issue 98 * adding changes for path in benchmark.py * CI Build corrected * benchmark changes corrected path * changed PATH for homebrew * Update Readme Made changes for homebrew installation and instruction for GPU based machine instruction for install_dependencies.sh * Updated Readme.md removed unnecessory lines * corrected indentation corrected indentation for new lines added * Update Readme.md fil added separate commands for CPU and GPU based Co-authored-by: Ubuntu <ubuntu@ip-172-31-35-42.ec2.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-42-124.ec2.internal> Co-authored-by: harshbafna <harsh_bafna@persistent.co.in> Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-67.ec2.internal> Co-authored-by: Harsh Bafna <harshbafna619@gmail.com> Co-authored-by: Aaqib <maaquib@gmail.com>
* Moved plugins SDK to org.pytorch * Updating the repo for publishing plugins SDK * Enabled bintray maven repo temporarily * Reset tests * Added GPG plugin to pom file to sign the plugins SDK * Update the SDK version * Removed bintray repo * Modify the buildspec to not deploy the SDK Co-authored-by: Vamshidhar Dantu <dantu@amazon.com> Co-authored-by: Aaqib <maaquib@gmail.com> Co-authored-by: mycpuorg <mail@mycpu.org>
Co-authored-by: mycpuorg <mail@mycpu.org> Co-authored-by: Brad Heintz <bradheintz@fb.com>
* Snapshot is now independent of model * fixed checkstyle issue * fixed formatting * doc review: configuration and custom_service * addressed review comments * fixed incorrect merge * fixed review comments and corrected the error messages * adding remaining fixes for string constants Co-authored-by: harshbafna <harsh_bafna@persistent.co.in> Co-authored-by: eslesar-aws <eslesar@amazon.com> Co-authored-by: Aaron Markham <markhama@amazon.com> Co-authored-by: Harsh Bafna <harshbafna619@gmail.com>
#285) * Regression Tests Suite - AWS Code Build Infra - Initial Commit for #57 * Script to clone Pytorch Repo & Run Post Man Scripts * Basic Tests for Inference & Managment APIs * AWS Code Build buildspec.yml * Refactor postman tests, add empty postman collection for todos * Add a readme for adding tests * Add more management tests * Fix typo * Snapshot Regression tests - #57 * Fix typo * Comment out snapshot test for now * Weird typos are showing up, try saving everything * Need to manually save every request in postman... * When reregister don't use s3 * More workarounds for rereg * Fix test messages * Remove test for empty reqs * Change get to post * Fix pytest for subprocess - #57 * Enable pytest - #57 * Inference API Testcases / Rename Management Testcases #57 * Wipe Model MARs in Modelstore between tests #54 * Fix Inference Endpoint Execution #54 * Fix file paths - #57 * Fix Initial Workers #57 * Fix Inference Assertions #57 * Fix DenseNet Inference #57 * Fix Densenet Register #54 * Redirect Logs to File #54 * Fix Test Execution Log #54 * README, Build Badge #57 * Format ReadMe #57 * Update README - Add Instruction to add new tests / Run tests locally * Update README * Update branch for build Codebuild Badge Co-authored-by: alexwong <11878166+alexwong@users.noreply.github.com> Co-authored-by: Aaqib <maaquib@gmail.com> Co-authored-by: mycpuorg <mail@mycpu.org> Co-authored-by: Ubuntu <ubuntu@ip-172-31-12-178.us-west-2.compute.internal>
* documentation: fixed issues in installation and quick start in README.md * documentation: Reverting conda related doc changes from PR#286 Fixes #297 * remove minus y (#311) update pip install instructions * Update instructions for installation from source This commit aims to address Issue #312 Tested on a fresh EC2 P3.2 instance: * Followed instructions under https://github.com/pytorch/serve#install-torchserve-for-development * Started the server manually from command line * Ran a ping query Received a response from server with "Healthy" status Signed-off-by: Manoj Rao <mail@mycpu.org> * add separate files for installation one each for linux and macos * update README and remove relative path from script * Install script - Dont uninstall if not previously installed - #132 * explicitly build the source code with gradle prior to install * remove upgrade instructions in favor of clean reinstall from source * update instructions to contain the correct script name Co-authored-by: eslesar-aws <eslesar@amazon.com> Co-authored-by: Aaqib Ansari <maaquib@gmail.com> Co-authored-by: Geeta Chauhan <4461127+chauhang@users.noreply.github.com> Co-authored-by: Aaron Markham <markhama@amazon.com> Co-authored-by: Manoj Rao <mail@mycpu.org> Co-authored-by: Ubuntu <ubuntu@ip-172-31-0-61.us-west-2.compute.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-7-197.us-west-2.compute.internal>
* updated sanity script to use install_from_src_ubuntu script * added java version check * fixed issue with if statement * fixed variable name * fixed repeated cd command * updated artifacts * removed artifacts from buildspec
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Description
We use Java's built-in `destroyForcibly` method to kill the backend worker. According to its documentation, the method does not guarantee that the process is killed immediately. When the CPU is busy or overloaded, termination can take longer than the unregister response timeout.

As a fix, instead of calling `destroyForcibly` and waiting on it, we create another process that runs `kill -9 <worker_pid>` and wait for that process to complete.

Fixes #417
Also addresses #416
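The approach described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the class name `WorkerKiller`, the method `killForcibly`, and the timeout values are hypothetical, and the sketch assumes a Unix-like host with `kill` on the `PATH` and Java 9 or later (for `Process.pid()`).

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not the TorchServe implementation.
class WorkerKiller {

    /**
     * Runs `kill -9 <pid>` in a separate process and waits up to
     * timeoutMs for that kill command itself to finish.
     * Returns true if the kill command exited with status 0 in time.
     */
    static boolean killForcibly(long pid, long timeoutMs)
            throws IOException, InterruptedException {
        Process killer = new ProcessBuilder("kill", "-9", Long.toString(pid)).start();
        if (!killer.waitFor(timeoutMs, TimeUnit.MILLISECONDS)) {
            // The kill command did not return in time; clean it up and report failure.
            killer.destroyForcibly();
            return false;
        }
        return killer.exitValue() == 0;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a backend worker: a long-running `sleep` process.
        Process worker = new ProcessBuilder("sleep", "60").start();

        boolean killIssued = killForcibly(worker.pid(), 5000);

        // Reap the worker so it does not linger as a zombie.
        boolean workerExited = worker.waitFor(5, TimeUnit.SECONDS);

        System.out.println("kill issued: " + killIssued
                + ", worker exited: " + workerExited);
    }
}
```

Waiting on the short-lived `kill` process rather than on `destroyForcibly` keeps the wait bounded by the time it takes to deliver the signal, which is the behavior the description relies on.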
Type of change
Please delete options that are not relevant.
Feature/Issue validation/testing
Successfully executed `install_from_src_ubuntu` on an m4.xlarge EC2 instance and `install_from_src_mac` on a local MacBook.

Logs from the m4.xlarge EC2 instance for executing the `install_from_src_ubuntu` script 10 times in a loop for the `staging_0_1_1` and `issue_417` branches:
staging_install.log
issue_417_install.log
Checklist: