Improvements in error handling on end-to-end tests #2716

narrieta · 2022-12-15T23:47:44Z

Implemented a few improvements in error handling; I'm pointing them out with comments within the PR.

codecov · 2022-12-16T20:44:09Z

Codecov Report

Merging #2716 (f0df49e) into develop (d56a3c7) will increase coverage by 0.01%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           develop    #2716      +/-   ##
===========================================
+ Coverage    71.95%   71.96%   +0.01%     
===========================================
  Files          104      104              
  Lines        15765    15765              
  Branches      2244     2244              
===========================================
+ Hits         11343    11345       +2     
- Misses        3906     3909       +3     
+ Partials       516      511       -5

Impacted Files	Coverage Δ
azurelinuxagent/common/cgroupconfigurator.py	`72.24% <0.00%> (+0.31%)`	⬆️

maddieford · 2022-12-19T18:04:19Z

tests_e2e/orchestrator/lib/agent_test_suite.py

-        # The same orchestrator machine may be executing multiple suites on the same test VM, or
-        # the same suite on one or more test VMs; we use this file to mark the build is already done
-        build_done_path = self._working_directory/"build.done"
-        if build_done_path.exists():


Did we remove this case because we're no longer executing multiple suites on the same VM?

Not quite, I explained the motivation in the PR comment just a couple of lines above.

narrieta · 2022-12-15T23:53:15Z

tests_e2e/orchestrator/lib/agent_test_suite.py

@@ -14,9 +14,9 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-
+from collections.abc import Callable


The main change in this module is that I stopped using before_case and after_case for setup and cleanup.

The issue is that errors in before_case are not reported as failures; they simply make LISA skip test execution. Tests can be skipped for valid reasons (e.g. an unsupported OS), but skipping tests on setup failures without reporting an error is not correct, since those errors may go unnoticed on daily runs.

I replaced the use of before_case and after_case with custom code (mainly in the execute() method).
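
As a rough illustration of the idea (not the actual code in agent_test_suite.py), the sketch below drives setup, tests and cleanup by hand, so that a setup error is recorded as a failure instead of silently skipping the tests. The names `run_scenario` and `ScenarioResult` are hypothetical and do not come from LISA or from this PR.

```python
from __future__ import annotations

import logging
from collections.abc import Callable
from dataclasses import dataclass, field

log = logging.getLogger("scenario")


@dataclass
class ScenarioResult:
    # Hypothetical result holder; not part of LISA or the agent test code.
    passed: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)


def run_scenario(setup: Callable[[], None],
                 tests: list[Callable[[], None]],
                 cleanup: Callable[[], None]) -> ScenarioResult:
    result = ScenarioResult()
    try:
        try:
            setup()
        except Exception as exception:
            # A setup error is reported as a failure; the tests are not run.
            log.error("Setup failed: %s", exception)
            result.failed.append("setup")
            return result
        for test in tests:
            try:
                test()
                result.passed.append(test.__name__)
            except Exception as exception:
                log.error("%s failed: %s", test.__name__, exception)
                result.failed.append(test.__name__)
    finally:
        # Cleanup always runs; its errors are logged but do not hide results.
        try:
            cleanup()
        except Exception as exception:
            log.warning("Cleanup failed: %s", exception)
    return result
```

With this shape, a daily run whose setup breaks ends with "setup" in `failed`, which is visible in the results, rather than with every test marked as skipped.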

narrieta · 2022-12-15T23:56:59Z

tests_e2e/orchestrator/lib/agent_test_suite.py

 )
 # E0401: Unable to import 'lisa.sut_orchestrator.azure.common' (import-error)
 from lisa.sut_orchestrator.azure.common import get_node_context  # pylint: disable=E0401

 from azurelinuxagent.common.version import AGENT_VERSION


-class AgentTestSuite(TestSuite):
+class AgentTestScenario(object):


This class used to be a specialization of LISA's TestSuite in order to use the before_case/after_case protocol, but since I am no longer using that protocol, there is no need for that class relationship. Now this is a plain AgentTestScenario class (it is very similar in concept to a DCR scenario).

I renamed the class, but will not rename the .py file until my next PR; otherwise reviewing the actual code changes would be more difficult.

narrieta · 2022-12-15T23:57:47Z

tests_e2e/orchestrator/lib/agent_test_suite.py


-        self._log.info(f"Test Node: {self._vm_name}")


This logic is now in the _setup() method

narrieta · 2022-12-15T23:58:01Z

tests_e2e/orchestrator/lib/agent_test_suite.py

-            self._clean_up()
-
-    def _initialize(self, node: Node, log: Logger) -> None:
-        self._node = node


This is now done in __init__()

narrieta · 2022-12-16T00:05:16Z

tests_e2e/orchestrator/lib/agent_test_suite.py

@@ -112,18 +108,10 @@ def _build_agent_package(self) -> Path:
        """
        Builds the agent package and returns the path to the package.
        """
-        build_path = self._working_directory/"build"
-
-        # The same orchestrator machine may be executing multiple suites on the same test VM, or


I removed the "*.done" files (they actually never worked, since I am removing the working directory on cleanup).

In the current DCR automation we use a new test VM to run a single scenario. I want to be able to run multiple scenarios on the same test VM in the new automation pipeline, and the intention of those "*.done" files was to avoid duplicated setup. However, not only was the implementation incorrect, but the approach was also cumbersome during test development. I will come up with a better approach when I start combining scenarios.

narrieta · 2022-12-16T00:09:40Z

tests_e2e/orchestrator/lib/agent_test_suite.py


-    def _execute_script_on_node(self, script_path: Path, parameters: str = "", sudo: bool = False) -> int:
+    def execute_script_on_node(self, script_path: Path, parameters: str = "", sudo: bool = False) -> int:
+        """


Made it public, added a docstring, and made minor formatting improvements to its logging.

narrieta · 2022-12-16T00:11:01Z

tests_e2e/orchestrator/scripts/collect-logs

@@ -15,5 +15,11 @@ tar --exclude='journal/*' --exclude='omsbundle' --exclude='omsagent' --exclude='
    /var/lib/waagent/ \
    /etc/waagent.conf

+# tar exits with 1 on warnings; ignore those


We are hitting a warning about the logs changing while we are creating the tarball. It's just a warning, but tar exits with code 1 (fatal errors exit with code 2; see the man page for details).
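
The collect-logs script itself is a shell script; the same exit-code policy can be sketched in Python for a caller that invokes tar through subprocess. The helper names `tar_failed` and `create_tarball` are hypothetical, and the exit codes assumed here are those of GNU tar (0 for success, 1 for warnings such as a file changing while it is read, 2 for fatal errors).

```python
import subprocess
from typing import List


def tar_failed(exit_code: int) -> bool:
    """Returns True unless tar succeeded (0) or only reported warnings (1)."""
    return exit_code not in (0, 1)


def create_tarball(archive: str, paths: List[str]) -> None:
    # check=False: we inspect the exit code ourselves instead of raising on 1.
    completed = subprocess.run(["tar", "-czf", archive, *paths], check=False)
    if tar_failed(completed.returncode):
        raise RuntimeError(f"tar failed with exit code {completed.returncode}")
```

Treating only 0 and 1 as acceptable means that a tar process killed by a signal (a negative return code in subprocess) is still reported as a failure.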

narrieta · 2022-12-16T00:12:28Z

tests_e2e/orchestrator/scripts/install-agent

@@ -59,7 +74,7 @@ echo "Restarting service..."
 service $service_name stop

 # Rename the previous log to ensure the new log starts with the agent we just installed
-mv /var/log/waagent.log /var/log/waagent.pre-install.log
+mv /var/log/waagent.log /var/log/waagent."$(date --iso-8601=seconds)".log


Added a timestamp to the previous log. Useful if the scenario is run multiple times on the same VM (e.g. during test development) or if multiple scenarios run on the same VM (in the future).
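
For comparison, here is a minimal Python sketch of the same renaming scheme; install-agent itself remains a shell script, and `rotate_log` is a hypothetical helper, not part of the agent tests. It assumes a POSIX file system, since the ISO 8601 timestamp contains colons.

```python
from datetime import datetime
from pathlib import Path


def rotate_log(log_path: Path) -> Path:
    """Renames log_path to <stem>.<ISO 8601 timestamp><suffix> and returns the new path."""
    # Local time with UTC offset, truncated to seconds, similar in shape to
    # the output of `date --iso-8601=seconds`.
    timestamp = datetime.now().astimezone().isoformat(timespec="seconds")
    rotated = log_path.with_name(f"{log_path.stem}.{timestamp}{log_path.suffix}")
    log_path.rename(rotated)
    return rotated
```

Because each run produces a different name, logs from earlier installs on the same VM are kept instead of being overwritten, as long as two installs do not happen within the same second.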

narrieta · 2022-12-16T00:13:34Z

tests_e2e/scenarios/runbooks/samples/hello_world.py

@@ -0,0 +1,31 @@
+# Microsoft Azure Linux Agent


Added a "sample" LISA test suite that runs on the local machine. Useful to do quick tests/experiments with LISA

narrieta · 2022-12-19T20:24:19Z

tests_e2e/orchestrator/lib/agent_test_suite.py

-        # The same orchestrator machine may be executing multiple suites on the same test VM, or
-        # the same suite on one or more test VMs; we use this file to mark the build is already done
-        build_done_path = self._working_directory/"build.done"
-        if build_done_path.exists():


Not quite, I explained the motivation in the PR comment just a couple of lines above.

nagworld9 · 2022-12-20T20:03:02Z

tests_e2e/orchestrator/lib/agent_test_suite.py

+                rmtree(self._context.working_directory.as_posix())
+            except Exception as exception:
+                self._log.warning(f"Failed to remove the working directory: {exception}")
+        self._context.working_directory.mkdir()


So, do we continue the setup if we fail to remove the directory? Or does mkdir raise an exception if it already exists?

Continue with a warning. This is mainly for the dev scenario; the automation runs work on fresh VMs.

I was reading that mkdir() raises an exception if the directory exists. So, in the dev scenario it won't work if the removal failed.

My point is that we could raise the exception when the removal fails, because the next step, mkdir, throws an exception anyway if the folder exists.

Yes, it will fail, and there will be a warning with the info about the delete failure.

The dev will get an error, see the warning in the log, and fix the issue.

Same difference, with the possibility that the working directory goes away in-between the 2 calls and there is no error
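
The behavior discussed in this thread can be reproduced with a small standalone sketch. It is modeled on the diff hunk above but is not the actual method; `recreate_working_directory` is a hypothetical name, and the `exists()` check is an assumption about the surrounding code.

```python
import logging
from pathlib import Path
from shutil import rmtree

log = logging.getLogger("setup")


def recreate_working_directory(working_directory: Path) -> None:
    if working_directory.exists():
        try:
            rmtree(working_directory.as_posix())
        except Exception as exception:
            # Keep going; the warning records why the delete failed.
            log.warning("Failed to remove the working directory: %s", exception)
    # Path.mkdir() raises FileExistsError if the directory is still there, so a
    # failed delete surfaces as an error, with the warning above explaining it.
    working_directory.mkdir()
```

If the directory disappears between the failed rmtree() and the mkdir() call, setup simply proceeds, which is the case mentioned in the last reply.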

Improvements in error handling on end-to-end tests

3ed992c

narrieta requested review from ZhidongPeng, nagworld9 and maddieford as code owners December 15, 2022 23:47

narrieta added 2 commits December 15, 2022 16:14

Undo changes to daily.yml

60040e3

pylint

6b40fe2

Merge branch 'develop' into error-handling

f0df49e

maddieford approved these changes Dec 19, 2022

narrieta commented Dec 19, 2022

nagworld9 reviewed Dec 20, 2022

nagworld9 approved these changes Dec 20, 2022

narrieta merged commit a3a41bd into Azure:develop Dec 20, 2022

narrieta deleted the error-handling branch December 20, 2022 21:49
