Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extension sequencing scenario #2933

Closed
wants to merge 24 commits into from
Closed

Conversation

maddieford
Copy link
Contributor

Description

This PR adds the extension sequencing scenario which requires VMSS. It includes changes to the combinator and agent test suite to support running tests on scale set instances.

Issue #


PR information

  • The title of the PR is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
  • If applicable, the PR references the bug/issue that it fixes in the description.
  • New Unit tests were added for the changes made

Quality of Code and Contribution Guidelines

@@ -91,3 +91,6 @@ ENV/
# pyenv
.python-version
.vscode/

# resource group txt created for ext_sequencing scenario in e2e pipeline
resource_groups_to_delete.txt
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The combinator creates a resource group and VMSS for the extension sequencing scenario and writes the resource group name to this file. Later, a task in the pipeline runs to cleanup the resources listed in this file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But do we need to add the file to .gitignore? Probably you were running the tests with cwd within the source code directory. I usually set cwd to a temporary directory outside the source code.

@@ -275,138 +278,147 @@ def _clean_up(self) -> None:
Cleans up any leftovers from the test suite run. Currently just an empty placeholder for future use.
"""

def _setup_node(self, install_test_agent: bool) -> None:
def _setup_node(self, install_test_agent: bool, nodes: List[Dict[str, Any]]) -> None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_setup_node, _collect_node_logs, and _check_agent_log were all updated to take in a list of nodes which indicate which nodes the setup/log actions should be taken.

In the case of extension sequencing scenario, the nodes list represents all the instances of the VMSS. Setup and log collection/checking will happen on each instance. In the case of any other extension, the nodes list will only contain one VM.

@codecov
Copy link

codecov bot commented Sep 27, 2023

Codecov Report

Merging #2933 (40cbd59) into develop (19b970b) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff            @@
##           develop    #2933   +/-   ##
========================================
  Coverage    72.02%   72.02%           
========================================
  Files          106      106           
  Lines        16063    16063           
  Branches      2304     2304           
========================================
  Hits         11569    11569           
  Misses        3962     3962           
  Partials       532      532           

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The combinator was updated to create a VMSS for the extension sequencing scenario. We choose one instance of the VMSS to run the extension sequencing scenario on. We report a test failure if VMSS creation fails

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Template to create virtual machine scale set

"orchestrationMode": "Flexible",
"virtualMachineProfile": {
"extensionProfile": {
"extensions": [
Copy link
Contributor Author

@maddieford maddieford Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This template includes extensions and dependencies which will be added to the scale set at creation.

This is different from the current DCR scenario. The current DCR scenario creates the VMSS, and then repeatedly updates the extensions/dependencies on the scale set in the test to validate the order in which the extensions were enabled. This can be done because the DCR scenario creates a VMSS with 'Uniform' orchestration mode. With Uniform orchestration mode you can update the scale set model and each instance in the scale set will be updated with the new model (including extensions).

We cannot use 'Uniform' orchestration mode for the VMSS in this scenario. Lisa requires that we use 'Flexible' orchestration mode, because while setting up the test environment, LISA looks for the VM in the resource group. In flexible orchestration mode, a VM for each instance is created in the resource group, but for uniform mode there are no VM resources created for the instances.

As a result, this test will look a little different from DCR. We must add the extensions/dependencies at VMSS creation so that each instance will get the extensions/dependencies. We can only validate once, because we cannot update the VM instance extension dependencies after VMSS creation.

Pavan mentioned Flexible VMSS will support UpgradePolicy in the next 1-2 semesters. So I added a TODO to improve this scenario when that happens

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added job in pipeline to cleanup resources created by the combinator

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added clients to do operations on ResourceGroups and VirtualMachineScaleSetClients in azure python sdk

try:
vmss.delete_extension(ext_to_remove)
except HttpResponseError as e:
if "(BadRequest)" in e.message and "provisionAfterExtensions" in e.message:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you try to remove an extension with dependencies CRP reports an error which we catch here

{
  "innererror": {
    "internalErrorCode": "MissingDependentExtensionInArgumentDependentExtensionExistsInVMSSInstance"
  },
  "code": "BadRequest",
  "message": "On resource 'lisawalinuxagentextseq20230923164508e0', extension 'RunCommand' specifies 'VmAccess' in its provisionAfterExtensions property, but the extension 'VmAccess' will no longer exist. First, remove the extension 'RunCommand' or remove 'VmAccess' from the provisionAfterExtensions property of 'RunCommand'."
}

name: "ExtSequencing"
tests:
- "ext_sequencing/ext_sequencing.py"
images: "random(endorsed)"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Original DCR scenario runs on all endorsed images.

I'm choosing to run the scenario on one random endorsed image because VMSS creation is time consuming and test execution cannot start until the combinator is done creating the VMSS resources and environments.

@@ -91,3 +91,6 @@ ENV/
# pyenv
.python-version
.vscode/

# resource group txt created for ext_sequencing scenario in e2e pipeline
resource_groups_to_delete.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But do we need to add the file to .gitignore? Probably you were running the tests with cwd within the source code directory. I usually set cwd to a temporary directory outside the source code.

@@ -142,6 +161,7 @@ def create_environment_list(self) -> List[Dict[str, Any]]:
Note that if the runbook provides an 'image', 'location', or 'vm_size', those values override any values provided in the
configuration of the test suites.
"""
log: logging.Logger = logging.getLogger("lisa")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create self._log and initialize in init() instead of doing this on every method that needs to log?

}

def create_environment_ext_sequencing(c_env_name: str) -> Dict[str, Any]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the scale set is just generic and it does not have anything specific to extension sequencing? Let's remove all the references to "ext_sequencing" and use "scale_set" instead. Maybe "create_scale_set_environment"?

@@ -76,13 +95,13 @@ def __init__(self, runbook: AgentTestSuitesCombinatorSchema) -> None:
if self.runbook.vm_name != '' and (self.runbook.image != '' or self.runbook.vm_size != ''):
raise Exception("Invalid runbook parameters: When 'vm_name' is specified, 'image' and 'vm_size' should not be specified.")

self._created_rg_count = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add type annotations to locals as well

env_name = f"{image_name}-{suite_info.name}"
# The ExtSequencing scenario needs to be run on an instance of a scale set, which is not currently
# supported by Lisa. Creation of test resources for this scenario will be handled by the combinator.
if suite_info.name == "ExtSequencing":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of hardcoding this to a specific test suite, let's add a new property to the test suite yml to indicate it runs on a scale set.

#
sudo find "$BUILD_SOURCESDIRECTORY" -exec chown "$USER" {} \;

az login --service-principal --username "$AZURE_CLIENT_ID" --password "$AZURE_CLIENT_SECRET" --tenant "$AZURE_TENANT_ID"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to login? you are already passing the authentication info in the environment


az login --service-principal --username "$AZURE_CLIENT_ID" --password "$AZURE_CLIENT_SECRET" --tenant "$AZURE_TENANT_ID"
az account set --subscription "$SUBSCRIPTION_ID"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to check KEEP_ENVIRONMENT before deleting the scale set

def create_environment_ext_sequencing(c_env_name: str) -> Dict[str, Any]:
log.info("Creating VMSS for ExtSequencing scenario")
curr_datetime = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
rg_name = f"lisa-WALinuxAgent-extseq-{curr_datetime}-e{self._created_rg_count}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should use the 'name' variable from the runbook instead of hardcoding "WALinuxAgent"

@@ -0,0 +1,20 @@
#!/usr/bin/env bash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will need to move the cleanup code to somewhere in LISA, otherwise developer runs are not cleaned up

@maddieford
Copy link
Contributor Author

VMSS support was added in this PR: #2954

Extension sequencing scenario will be added in a different PR

@maddieford maddieford closed this Oct 31, 2023
@maddieford maddieford deleted the ext_seq branch November 7, 2023 19:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants