Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add router management functions to SDK #155

Merged

Conversation

deadlycoconuts
Copy link
Contributor

@deadlycoconuts deadlycoconuts commented Jan 18, 2022

Context:

Currently the Turing SDK only supports batch experiments by covering only operations related to ensemblers and ensembling jobs, and we are now expanding its scope to cover online experiments by extending the SDK to cover operations involving other Turing components. Previously we've already implemented the SDK to supporting the listing of routers in PR #148, and the setting up of routers in PR #152. This PR introduces 10 additional SDK methods to support the deletion, retrieval, update, deployment, undeployment of routers and router versions, as well as the listing of deployment events involving a router.

Features:

  • New SDK methods have been created for the Router SDK class to support the following methods:
- router deletion
- router retrieval by `router_id`
- router update
- router deployment
- router undeployment
- listing of router versions
- router version deletion
- router version retrieval
- router version deployment
- listing of router deployment events
  • New SDK child classes RouterVersion and RouterVersionLogConfig (from their parent RouterConfig and LogConfig classes introduced in PR Add create router SDK functionality #152) to support the read-only operations that return RouterVersion objects. In particular the SDK RouterVersion class allows users to extract an SDK RouterConfig object (without the read-only elements) lying beneath, which can be reused and manipulated as any other RouterConfig object

Example:

# retrieving a router's version using the SDK get_version method
latest_version = my_router.get_version(10)

# extract the RouterConfig object beneath the returned RouterVersion object
latest_config = latest_version.get_config()

# manipulate the extracted RouterConfig object directly (from PR #152)
latest_config.routes[0].timeout = "50ms"

# reuse these RouterConfig objects with other SDK methods
my_router.update(latest_config)
  • Tests involving all the newly created components
  • A sample script showing how the SDK methods can be used

Fixes

  • Fix the update instance method of the SDK Router class so that it updates all the attributes of the Router instance (monitoring_url, status, endpoint, etc. as opposed to only config), according to the RouterDetails instance returned by Turing API

Modifications

  • sdk/turing/router/config/router_version.py - new RouterVersion class
  • sdk/turing/router/config/log_config.py - addition of a new RouterVersionLogConfig class
  • sdk/turing/router/config/experiment_config.py - minor modifications to support the creation of OpenAPI ExperimentConfig instances while having config set as None
  • sdk/turing/router/config/router_config.py - addition of a new method to convert a RouterConfig's (private) attributes into a dict, so that RouterVersion can use it to instantiate new RouterConfig objects
  • sdk/turing/router/router.py - addition of new SDK methods corresponding to the 10 new functions mentioned above
  • sdk/turing/router/session.py - addition of new methods to support the 10 new functions mentioned above
  • api/api/openapi-sdk.yaml - addition of all the other API endpoints for routers
  • api/api/specs/routers.yaml - minor modifications to several schemas which were not aligned with the actual Turing API output
  • sdk/turing/generated/model/* - autogenerated API classes changed by the OpenAPI generator given the changes made to the OpenAPI specs (~2000 lines)
  • sdk/tests/* - tests for the new methods and classes
  • sdk/samples/router/general.py - sample file to demonstrate the use of the all the SDK methods developed for routers
  • sdk/samples/router/create_from_existing_router.py - sample file to demonstrate how a router config can be retrieved from an existing router to create a new router

@deadlycoconuts deadlycoconuts requested a review from a team January 18, 2022 07:07
@romanwozniak
Copy link
Contributor

@gojek/turing-dev a friendly reminder to review this PR

sdk/samples/router/general.py Outdated Show resolved Hide resolved
sdk/samples/router/general.py Outdated Show resolved Hide resolved
sdk/samples/router/general.py Outdated Show resolved Hide resolved
sdk/turing/router/config/log_config.py Show resolved Hide resolved
sdk/turing/session.py Outdated Show resolved Hide resolved
except TimeoutError:
raise Exception(f"Turing API is taking too long for router {my_router.id} to get undeployed.")

time.sleep(30)
Copy link
Contributor Author

@deadlycoconuts deadlycoconuts Jan 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call of sleep is a little awkward; it's there to somehow prevent this odd case where the deployment of a router fails when it is 1) undeployed and 2) immediately redeployed. Apparently waiting for the status to turn to undeployed is insufficient to prevent the deployment from crashing.

@deadlycoconuts deadlycoconuts force-pushed the add_router_management_functions branch from c278e49 to 41b10f1 Compare January 20, 2022 05:22
@deadlycoconuts deadlycoconuts force-pushed the add_router_management_functions branch from 24ee1b2 to 158ff88 Compare January 21, 2022 02:36
f"deployed.")

# Wait for the dependencies of the first version to be fully undeployed
time.sleep(15)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is it for? Can it be done as: ?

my_router.wait_for_version_status(RouterStatus.UNDEPLOYED, first_ver_no)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this is also similar to the issue right below; I was kind of trying to test both 'potential sources' of error (either with something undeploying being undeployed completely/properly, or with something being deployed not being deployed completely/properly) in both places.

@romanwozniak
Copy link
Contributor

@terryyylim @ashwinath any more comments from you? Are you comfortable approving this PR? I haven't reviewed it end-to-end, so I rely on you to get this approved

@ashwinath
Copy link
Contributor

ashwinath commented Jan 24, 2022

@gojek/turing-dev a friendly reminder to review this PR

I'll re-review it soon.

Copy link
Contributor

@ashwinath ashwinath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just address @romanwozniak's comments and it should be good to go

@deadlycoconuts
Copy link
Contributor Author

deadlycoconuts commented Jan 24, 2022

Thanks everyone for the really detailed and in-depth review and comments!

@deadlycoconuts deadlycoconuts merged commit 6b67695 into caraml-dev:main Jan 25, 2022
@deadlycoconuts deadlycoconuts deleted the add_router_management_functions branch January 25, 2022 08:18
@deadlycoconuts deadlycoconuts self-assigned this Mar 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants