[microTVM] Refactor pytest fixtures #12207
Conversation
I like these changes - especially the work to reduce redundancy between Arduino and Zephyr. However, I'm skeptical about the choice to move our testing infrastructure to a pytest plugin. I could be convinced, but I'd love to better understand the justification.
"relay.ext.cmsisnn.options": {"mcpu": target.mcpu}, | ||
} | ||
): | ||
mod = cmsisnn.partition_for_cmsisnn(mod, params, mcpu=target.mcpu) |
Let's put this into the ExitStack() block below - remove lines 98-106 and insert:

    mod = cmsisnn.partition_for_cmsisnn(mod, params, mcpu=target.mcpu)

in the if use_cmsiss_nn block.
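To make the suggestion concrete, here is a minimal sketch of that layout, assuming a hypothetical build_model wrapper and a use_cmsis_nn flag (names are illustrative, not the exact code in the PR):

from contextlib import ExitStack

from tvm import relay, transform
from tvm.relay.op.contrib import cmsisnn


def build_model(mod, params, target, use_cmsis_nn=False):
    # Hypothetical wrapper: all compilation context handling lives in one ExitStack block.
    with ExitStack() as stack:
        config = {"tir.disable_vectorize": True}
        if use_cmsis_nn:
            config["relay.ext.cmsisnn.options"] = {"mcpu": target.mcpu}
        stack.enter_context(transform.PassContext(opt_level=3, config=config))
        if use_cmsis_nn:
            # Partition here, inside the same block, rather than in a separate
            # `with` statement above it.
            mod = cmsisnn.partition_for_cmsisnn(mod, params, mcpu=target.mcpu)
        return relay.build(mod, target=target, params=params)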
@@ -153,4 +163,4 @@ def evaluate_model_accuracy(session, aot_executor, input_data, true_labels, runs
     num_correct = sum(u == v for u, v in zip(true_labels, predicted_labels))
     average_time = sum(aot_runtimes) / len(aot_runtimes)
     accuracy = num_correct / len(predicted_labels)
-    return average_time, accuracy
+    return average_time, accuracy, predicted_labels
Why are we changing this to return predicted_labels? There are a few cases where we'll want to override evaluate_model_accuracy (say, with the MLPerf Tiny model for anomaly detection), but there won't be a 1:1 correspondence of samples to predicted_labels (because anomaly detection uses an area-under-the-curve metric). I'd prefer to keep it as is.
I made this change since we were using the predicted labels outside of this function for a validity check in some tests. If you think it's out of the scope of this function, I can replicate the whole function in the other repo. Thoughts?
Ah, I'd forgotten we need the labels for some hardware-in-the-loop tests. That seems fine - for anomaly detection and other AUC metrics using the same "template", we could use confidence values (or even just None) in place of predicted_labels. This LGTM now.
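To illustrate the return-shape point with hypothetical helpers (these are not functions from the PR): classification has one predicted label per sample, while an AUC-style anomaly-detection metric only has per-sample scores, so confidences or None fill the third slot.

def summarize_classification(aot_runtimes, true_labels, predicted_labels):
    # 1:1 sample-to-label mapping, so returning predicted_labels is meaningful.
    num_correct = sum(u == v for u, v in zip(true_labels, predicted_labels))
    average_time = sum(aot_runtimes) / len(aot_runtimes)
    accuracy = num_correct / len(predicted_labels)
    return average_time, accuracy, predicted_labels


def summarize_anomaly_detection(aot_runtimes, anomaly_scores, auc):
    # No discrete label per sample: the AUC is computed from the raw scores
    # elsewhere, so confidence values (or just None) go where predicted_labels
    # would otherwise be.
    average_time = sum(aot_runtimes) / len(aot_runtimes)
    return average_time, auc, list(anomaly_scores)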
from tvm.contrib.utils import tempdir


def zephyr_boards() -> dict:
Why are we treating zephyr_boards specially? I'd rather we use get_boards("zephyr") in all those cases.
that was left over, thanks for catching!
def zephyr_boards() -> dict:
    """Returns a dict mapping board to target model"""
We already have a function for this purpose - get_supported_boards in tvm/python/tvm/micro/testing/utils.py. Let's either move that or use it instead.
removed get_boards
    return boards_model


def get_boards(platform: str) -> dict:
See above comment. This should be removed, and get_supported_boards inside tvm/python/tvm/micro/testing/utils.py should be used instead.
removed get_boards
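For reference, a rough sketch of what a get_supported_boards-style helper does (an approximation, not the exact contents of python/tvm/micro/testing/utils.py): read the platform's template project boards.json and return the board-to-properties mapping.

import json
from pathlib import Path

import tvm.micro


def get_supported_boards(platform: str) -> dict:
    # platform is "zephyr" or "arduino"; boards.json ships with the template project.
    template = Path(tvm.micro.get_microtvm_template_projects(platform))
    with open(template / "boards.json") as f:
        return json.load(f)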
)


@pytest.fixture(scope="module")
I'd rather all the microTVM fixtures lived in tests/micro, where they have existed previously. I don't think they can be reused very easily - if someone wanted to do additional testing elsewhere, they wouldn't need test-build-only or similar fixtures.
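For context, a minimal sketch of the plugin layout being discussed (the module path and fixture body are assumptions for illustration): the shared fixtures live in the Python package, and each test tree only registers the plugin instead of carrying its own copy of the conftest.

# python/tvm/micro/testing/pytest_plugin.py (path assumed for illustration)
import pytest


@pytest.fixture(scope="module")
def board(request):
    return request.config.getoption("--board")

Each test directory then only needs a one-line registration:

# tests/micro/zephyr/conftest.py
pytest_plugins = ["tvm.micro.testing.pytest_plugin"]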
        required=True,
        choices=ARDUINO_BOARDS.keys(),
        help="Arduino board for tests.",
    )
    parser.addoption(
        "--arduino-cli-cmd",
Is it worth abstracting this parameter to "build tool path"? I could be convinced either way.
I think this way is clearer, especially when used in tvmc.
Yeah, you're right - "build tool" is just too abstract.
 # Since these tests are sequential, we'll use the same project/workspace
 # directory for all tests in this file
 @pytest.fixture(scope="module")
-def workspace_dir(request, board):
+def workflow_workspace_dir(request, board):
Like this change!
tests/micro/zephyr/test_zephyr.py
@@ -89,7 +89,7 @@ def _make_add_sess(temp_dir, model, zephyr_board, west_cmd, build_config, dtype=
 # The same test code can be executed on both the QEMU simulation and on real hardware.
 @tvm.testing.requires_micro
 @pytest.mark.skip_boards(["mps2_an521"])
-def test_add_uint(temp_dir, board, west_cmd, tvm_debug):
+def test_add_uint(workspace_dir, board, west_cmd, tvm_debug):
For this and all the other tests - why do we have to use workspace_dir? If it's going to be reused every time, it would probably be cleaner to use the Python builtin fixture temp_dir?
workspace_dir is a directory at a custom path inside the TVM workspace, so users can easily access it even when they are testing in Docker. Also, we have tvm-debug, which we can use to keep the project for debugging purposes. I think it's a useful tool for developers.
Huh, I’d forgotten that workspace_dir is kept when debug is passed. That sounds super useful - good call with this!
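A rough sketch of the fixture being described, assuming the tempdir helper imported above and a --tvm-debug option (paths and option name are assumptions; the PR's version may differ): the workspace is created at a custom path under the test tree so it is reachable from outside Docker, and keep_for_debug preserves it when the flag is set.

import datetime
import pathlib

import pytest

from tvm.contrib.utils import tempdir


@pytest.fixture
def workspace_dir(request, board):
    # Keep the directory when the debug flag is passed; option name assumed.
    keep = request.config.getoption("--tvm-debug")
    parent = pathlib.Path(__file__).parent / "workspace" / board
    parent.mkdir(parents=True, exist_ok=True)
    board_workspace = parent / datetime.datetime.now().strftime("%Y-%m-%dT%H-%M-%S")
    return tempdir(custom_path=board_workspace, keep_for_debug=keep)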
    return request.config.getoption("--board")


@pytest.fixture(scope="module")
Also, any fixtures that are only based on command line arguments should be session scoped IMO. They can't change during the session, so I'd argue it is more appropriate.
I think that makes sense. I changed them
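Sketched, the change is just the scope argument on the CLI-derived fixtures (option strings are taken from the diffs above; the arduino_cli_cmd fixture name is assumed):

import pytest


@pytest.fixture(scope="session")
def board(request):
    return request.config.getoption("--board")


@pytest.fixture(scope="session")
def arduino_cli_cmd(request):
    return request.config.getoption("--arduino-cli-cmd")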
@guberti thanks for the review!
I addressed your comments. Regarding moving the pytest fixtures into the Python package as a plugin: I think they are useful for reuse in hardware-in-the-loop testing for microTVM. Duplications are always a pain to maintain, and this PR tries to remove those duplications. In addition, the pytest features that we included here are mostly generic for microTVM testing in any environment.
def pytest_addoption(parser):
    """Adds more pytest arguments"""
    parser.addoption(
        "--board",
added more details
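For example, the extra detail could read along these lines (wording illustrative, not the exact text added in the PR):

def pytest_addoption(parser):
    """Adds more pytest arguments"""
    parser.addoption(
        "--board",
        required=True,
        help="microTVM board for tests; must be one of the boards supported by the selected platform.",
    )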
Your reasoning for moving to a pytest plugin makes sense to me - this approach LGTM!
@pytest.fixture(scope="session")
def tvm_debug(request):
nit: I don't love the name tvm_debug if all this flag does is keep the project directory - IMO --keep-project-dir or --preserve-project makes more sense. If it does things besides this, we should document them in the help string.
it used to do other things, but not anymore. I will change the name.
Let me correct myself - it is also used in the project generation config to show more logs in the build/test process, so I will change the description.
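So the help string would need to cover both behaviors; something along these lines (flag name and wording are illustrative, especially since the flag is being renamed):

def pytest_addoption(parser):
    parser.addoption(
        "--tvm-debug",
        action="store_true",
        default=False,
        help="If set, enable additional logging in the generated project's build/test "
        "steps and keep the test workspace/project directory after the run.",
    )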
overall lgtm, left a couple of comments around naming
* Refactor micro test fixtures
* fix error
* fix scope
* address @guberti comments
* fix help message
* rename tvm_debug and added .gitignore
* fix bug
* fix bug
This PR refactors the pytest fixtures that were duplicated between multiple directories. With this change, we can reuse the same fixtures in external repositories.
cc @alanmacd @gromero @guberti