
[feat] Introduce permanent result storage and support performance results comparisons between runs #3227

Merged (69 commits) on Aug 28, 2024

Conversation

@vkarak (Contributor) commented Jul 1, 2024

This PR enables performance result comparison natively in ReFrame.

New functionality

The --performance-report option is extended to accept an optional specifier that controls how the performance results of the current run are compared to past results.

A new --performance-compare option allows ReFrame to compare the performance results of two past time periods.

The syntax of both options is very similar:

--performance-report=<target_period>/<aggr_spec>/<extra_cols>
--performance-compare=<base_period>/<target_period>/<aggr_spec>/<extra_cols>

Each time period is specified in the following simplified syntax:

<time_period> ::= ("now" | <timestamp>) ":" ("now" | <timestamp>) (("+" | "-") [0-9]+ <unit>)?
<timestamp> ::= ([a-z] | [A-Z] | [0-9])+
<unit> ::= "h" | "m" | "s" | "d" | "w"

Timestamps must follow one of the formats below:

%Y%m%d
%Y%m%dT%H%M
%Y%m%dT%H%M%S
%Y%m%dT%H%M%S%z

A time period can also be specified implicitly through a session UUID.
The session UUID is essentially the start timestamp of the run session in the format %Y%m%dT%H%M%S%z; it is assigned automatically by ReFrame and is always printed at the end of the run session.
To use a session UUID instead of a timestamp, specify it as ^<session_uuid>, in which case the test cases of that particular session will be used.

NOTE: Test cases are also assigned a UUID, which is essentially an extension of the session UUID: <session_uuid>:<run_index>:<test_index>, where run_index and test_index are the positions of the test case in the run/testcases matrix of the session.
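
To make the period grammar concrete, here is a rough sketch of how such a specification could be resolved into a pair of datetime objects. The helper names and semantics are illustrative assumptions, not ReFrame's actual parser; in particular, it applies the optional shift to the end point only and glosses over %z offsets:

import re
from datetime import datetime, timedelta

# Timestamp formats listed above (illustrative helper, not ReFrame's actual API)
_TS_FORMATS = ('%Y%m%d', '%Y%m%dT%H%M', '%Y%m%dT%H%M%S', '%Y%m%dT%H%M%S%z')
_UNITS = {'s': 'seconds', 'm': 'minutes', 'h': 'hours', 'd': 'days', 'w': 'weeks'}


def _parse_point(spec):
    '''Resolve "now" or a timestamp string into a datetime.'''
    if spec == 'now':
        return datetime.now()

    for fmt in _TS_FORMATS:
        try:
            return datetime.strptime(spec, fmt)
        except ValueError:
            continue

    raise ValueError(f'invalid timestamp: {spec!r}')


def parse_time_period(spec):
    '''Resolve "<start>:<end>[(+|-)<N><unit>]" into a (start, end) pair.'''
    m = re.fullmatch(r'(?P<start>\w+):(?P<end>\w+)'
                     r'(?:(?P<sign>[+-])(?P<num>[0-9]+)(?P<unit>[hmsdw]))?', spec)
    if m is None:
        raise ValueError(f'invalid time period: {spec!r}')

    start, end = _parse_point(m.group('start')), _parse_point(m.group('end'))
    if m.group('sign'):
        delta = timedelta(**{_UNITS[m.group('unit')]: int(m.group('num'))})
        end = end + delta if m.group('sign') == '+' else end - delta

    return start, end


print(parse_time_period('now:now-7d'))
print(parse_time_period('20240701:20240801T1200'))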

An aggregation specifier follows the syntax:

<aggr_spec> ::= <aggr_op> ":" <extra_group_by>
<aggr_op> ::= "first" | "last" | "mean" | "median" | "min" | "max"
<extra_group_by> ::= ("+" <column_name>)*
<column_name> ::= ([a-z] | [A-Z] | [0-9])+

Finally, the extra columns can be specified with the following syntax:

<extra_columns> ::= ("+" <column_name>)*
<column_name> ::= ([a-z] | [A-Z] | [0-9])+

Note that the above EBNF forms are simplified in terms of the accepted column names and/or timestamps.
Note also that the accepted column names are the same as the log format specifiers, but without the check_ prefix.

Results are always aggregated by test name, performance variable name and performance variable unit, and this cannot be changed. Users can only provide additional columns to group by.

Similarly, the output will contain all the group-by columns plus any additional columns specified by the user. For non-aggregated columns, the unique values across all samples are combined into a single string using the | separator.
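
To illustrate the aggregation semantics, here is a minimal sketch that assumes each test case is a flat dict carrying at least the keys name, pvar, pval and punit; the record layout and function names are illustrative, not ReFrame's internal API:

import statistics
from collections import defaultdict

# Aggregation functions named in <aggr_op>; 'first'/'last' act on insertion order.
_AGGR_FNS = {
    'first': lambda vals: vals[0],
    'last': lambda vals: vals[-1],
    'mean': statistics.mean,
    'median': statistics.median,
    'min': min,
    'max': max,
}


def aggregate_perf(testcases, aggr_op, extra_group_by=(), extra_cols=()):
    '''Group test cases and aggregate their performance values.'''
    groups = defaultdict(list)
    for tc in testcases:
        # Fixed grouping keys plus any user-requested extra group-by columns
        key = (tc['name'], tc['pvar'], tc['punit'],
               *(tc[col] for col in extra_group_by))
        groups[key].append(tc)

    table = [['name', 'pvar', 'pval', 'punit', *extra_group_by, *extra_cols]]
    for key, tcs in groups.items():
        pval = _AGGR_FNS[aggr_op]([tc['pval'] for tc in tcs])
        row = [key[0], key[1], pval, key[2], *key[3:]]
        for col in extra_cols:
            # Non-aggregated columns: combine unique values with the '|' separator
            row.append('|'.join(sorted({str(tc[col]) for tc in tcs})))

        table.append(row)

    return table

For instance, aggregate_perf(cases, 'mean', extra_group_by=('system', 'partition', 'environ'), extra_cols=('result',)) would roughly correspond to the aggregation requested by the first example below.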

Examples

  • Compare the current run with the mean performance of the last 7 days, also grouping by system, partition and environment, and also listing the test result:
--performance-report=now:now-7d/mean:+system+partition+environment/+result
  • Compare the median performance of two sessions, group the results by nodelist, and also print the test result:
--performance-compare=^2df155f4-a6bc-443a-ab95-2b37b4b1b2e7/^d13d0498-ceee-4df2-971c-8a844af71b8a/median:+job_nodelist/+result

Key design concepts

After the test run finishes and the JSON run report is generated, ReFrame stores the run results in a database. The full JSON report blob is stored and the test cases of the run are indexed by their job_completion_time_unix.

Querying results is purposefully kept simple and is fulfilled by two API calls that fetch all the test cases that have run in a certain period of time or in a specific session. Even querying by a session UUID goes through this path, by first obtaining the time span of the session.

The obtained test cases are then grouped by the specified attributes and their performance values are aggregated using the aggregation function.

Finally, a 2D array with the table data is returned to the caller.
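
The following is a minimal sketch of this storage scheme with sqlite3, assuming one table for the JSON session blobs and one index table for the test cases keyed on job_completion_time_unix; the schema and the report key names are assumptions, not the actual results.db layout:

import json
import sqlite3

# Illustrative schema only; the actual results.db layout may differ.
_SCHEMA = '''
CREATE TABLE IF NOT EXISTS sessions (
    uuid TEXT PRIMARY KEY,
    json_blob TEXT
);
CREATE TABLE IF NOT EXISTS testcases (
    session_uuid TEXT REFERENCES sessions(uuid),
    run_index INTEGER,
    test_index INTEGER,
    job_completion_time_unix REAL
);
CREATE INDEX IF NOT EXISTS idx_completion_time
    ON testcases(job_completion_time_unix);
'''


def store_report(conn, report):
    '''Store the full JSON report and index its test cases by completion time.'''
    # The report key names below are assumptions based on the description above.
    uuid = report['session_info']['uuid']
    conn.execute('INSERT INTO sessions VALUES (?, ?)', (uuid, json.dumps(report)))
    for i, run in enumerate(report['runs']):
        for j, tc in enumerate(run['testcases']):
            conn.execute('INSERT INTO testcases VALUES (?, ?, ?, ?)',
                         (uuid, i, j, tc['job_completion_time_unix']))

    conn.commit()


def fetch_testcases(conn, ts_start, ts_end):
    '''Return (session_uuid, run_index, test_index) of all test cases that
    completed within [ts_start, ts_end] (UNIX timestamps).'''
    cur = conn.execute(
        'SELECT session_uuid, run_index, test_index FROM testcases '
        'WHERE job_completion_time_unix BETWEEN ? AND ?', (ts_start, ts_end)
    )
    return cur.fetchall()


conn = sqlite3.connect('results.db')
conn.executescript(_SCHEMA)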

New options for querying session and test case data:

  • --delete-stored-session <uuid>: delete a stored session by its UUID
  • --describe-stored-session <uuid>: return a detailed JSON record about a specific session
  • --describe-stored-testcases <session_uuid>|<testcase_uuid>|<time_period>: return a detailed JSON record about a specific test case or about the test cases of a specific session or time period
  • --list-stored-sessions: list brief information about all stored sessions
  • --list-stored-testcases <session_uuid>|<testcase_uuid>|<time_period>: list brief information about a specific test case or about the test cases of a specific session or time period

Implementation details

The database file is placed inside the report file folder and is named results.db; this location cannot currently be changed.

The report format has also changed, so its data version is bumped to 4.0. The check_vars and check_params are flattened and stored inside each test case.

Code organization

The existing reporting code is considerably refactored and new code is added. Here is a summary of the structure:

  • reframe.frontend.statistics is removed and TestStats is stripped of any reporting functionality. Its definition is moved to reframe.frontend.executors and its only purpose is to hold statistics of the run as ReFrame runs the tests.
  • reframe.frontend.printer takes over all responsibility for formatting and presenting the various reports (failure report, retry report, performance report, etc.).
  • reframe.frontend.reporting holds everything related to generating a run report and restoring a session from a run report. It also contains all the analytics functionality described above.
  • reframe.frontend.reporting.storage provides an abstraction to the backend storage for storing run reports. It essentially provides the basic API for querying the test cases that have run in a certain period of time.

Finally, this PR fixes some coding style issues to make flake8 happy again.

Other changes

  • The default format of --timestamp-dirs now uses %Y instead of %y, which was a typo.
  • Reports are only generated if non-empty.
  • Some general coding style fixes.
  • Fixed buggy behaviour of --timestamp-dirs that did not respect the respective environment variable. This was due to a limitation of our ArgumentParser implementation, which we had to extend to support optional arguments with default values (i.e., `nargs='?', const='value'`); see the small argparse sketch after this list.
  • Support for always using unqualified hostnames in the local scheduler through a new configuration parameter. This option enables grouping of performance results by nodelist even when the same test has run both through a scheduler (which usually uses unqualified names) and locally.
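
For context, here is a minimal sketch of the `nargs='?', const=...` pattern using the standard library's argparse (ReFrame wraps this in its own ArgumentParser; the const/default values below are purely illustrative):

import argparse

parser = argparse.ArgumentParser()

# An optional argument that may also be given without a value:
#   (not given)              -> the default is used
#   --timestamp-dirs         -> the const is used
#   --timestamp-dirs=%Y%m%d  -> the given value is used
parser.add_argument('--timestamp-dirs', nargs='?',
                    const='%Y%m%dT%H%M%S', default=None)

print(parser.parse_args([]))                           # Namespace(timestamp_dirs=None)
print(parser.parse_args(['--timestamp-dirs']))         # Namespace(timestamp_dirs='%Y%m%dT%H%M%S')
print(parser.parse_args(['--timestamp-dirs=%Y%m%d']))  # Namespace(timestamp_dirs='%Y%m%d')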

Todos

  • Add unit tests
  • Better handling of user errors when an invalid comparison spec is passed (currently, a stack trace is printed)
  • Add docs and adapt the tutorial
  • Make optional the creation of JSON report files (now everything is stored in the DB and the session JSON can be easily retrieved with --describe-stored-session).
  • Use separate configuration parameter for controlling the location of the results DB
  • Introduce a DB schema version similarly to the report data version to avoid surprises
  • Support "weeks" for period shortcuts as well
  • Do not generate a report if no test cases were run
  • Remove the ^ from session UUID specs in the command-line options
  • Allow --list-stored-sessions to accept an optional argument
  • Job completion time for compile-only tests is reported wrongly
  • Add unit tests for testcase filtering on test names
  • Expand docs about testcase filtering

Open issues to discuss

  • Add an option to not store the JSON reports at all, since these are now stored in the database? This would require us to add an option to list and show old reports from the command line.
    • This is now an option (it defaults to generating the files).
  • We likely need an option to quickly list the stored sessions by UUID, so as to be able to easily select two sessions.
    • See the added options above.

Closes #2733.

@vkarak vkarak added this to the ReFrame 4.7 milestone Jul 1, 2024
@vkarak vkarak requested review from ekouts, victorusu and teojgo July 1, 2024 16:55
@vkarak vkarak self-assigned this Jul 1, 2024
@pep8speaks commented Jul 1, 2024

Hello @vkarak, Thank you for updating!

Cheers! There are no PEP8 issues in this Pull Request! Do see the ReFrame Coding Style Guide

Comment last updated at 2024-08-28 13:30:22 UTC

@ekouts (Contributor) commented Jul 3, 2024

Still testing the PR and going through it. Just some preliminary comments regarding the open issues for discussion:

  • Add the option to not store the json reports at all as now these are stored in the database? This would require us to add an option to list and show old reports from the command line.
  • We likely need an option to list quickly the stored sessions by UUID so as to be able to easily select two sessions.

I think both are good ideas. The second one in particular is useful, so that you don't have to go through the report manually to find it.

I would like to add some more items to discuss:

  • As far as I can tell, testcases that are generated by --distribute are treated as different tests. I think it would be good to be able to aggregate the performance results of testcases that run on different nodes (but the same partition).

    Results are always aggregated by test name, performance variable name and performance variable unit and this cannot be changed.

    Since the variable has a "special" name it shouldn't be too hard to be able to ignore this, right?

  • Would it make sense to give users the option to remove a certain UUID from the database? If you have some "invalid" results, then the only option now is to remove the whole database or do it outside of reframe. If we don't want to support it through reframe, maybe we can have a small explanation of how to manually "change" the db?

@ekouts (Contributor) left a comment


Some general issues:

  • Not sure how we would properly address this, but let's say you are comparing two runs and using max as the aggregator. What would be the way to find the UUID of the session? Currently you have to look in the report manually, right?
  • Same issue of the --distribute comes up with --repeat.
  • In the performance report you now see only one entry for each test (in case of retries). This is different behaviour from before, and I think some people may prefer to see the individual results.
  • We are no longer printing the title
    =======================
    PERFORMANCE REPORT
    ----------------------------
    
  • I tried running two reframe runs in parallel and the database can get silently corrupted. Maybe not an immediate issue, but good to keep in mind.
    From the first run I got:
    ┍━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━┯━━━━━━━━━━┯━━━━━━━━━┯━━━━━━━━━┯━━━━━━━━━━━━━━━━━┯━━━━━━━━━━┑
    │ name                  │ pvar       │     pval │ punit   │ pdiff   │ job_nodelist    │ result   │
    ┝━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━┿━━━━━━━━━━┿━━━━━━━━━┿━━━━━━━━━┿━━━━━━━━━━━━━━━━━┿━━━━━━━━━━┥
    │ Foorun %$repeat_no=1  │ extract_bw │ 0.175468 │ MB/s    │ -63.21% │ vpn-110.cscs.ch │ fail     │
    ├───────────────────────┼────────────┼──────────┼─────────┼─────────┼─────────────────┼──────────┤
    │ Foorun %$repeat_no=0  │ extract_bw │ 2.33529  │ MB/s    │ -60.05% │ vpn-110.cscs.ch │ fail     │
    ├───────────────────────┼────────────┼──────────┼─────────┼─────────┼─────────────────┼──────────┤
    │ Foorun2 %$repeat_no=1 │ copy_bw    │ 2.33805  │ MB/s    │ -74.89% │ vpn-110.cscs.ch │ fail     │
    ├───────────────────────┼────────────┼──────────┼─────────┼─────────┼─────────────────┼──────────┤
    │ Foorun2 %$repeat_no=0 │ copy_bw    │ 3.36739  │ MB/s    │ -23.22% │ vpn-110.cscs.ch │ fail     │
    ┕━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━┷━━━━━━━━━━┷━━━━━━━━━┷━━━━━━━━━┷━━━━━━━━━━━━━━━━━┷━━━━━━━━━━┙
    Current session stored with UUID: 20240703T161944
    (printed the run_report as well: run-report-601.json)
    
    The second:
    ┍━━━━━━━━━┯━━━━━━━━━━━━┯━━━━━━━━━━┯━━━━━━━━━┯━━━━━━━━━━┯━━━━━━━━━━━━━━━━━┯━━━━━━━━━━┑
    │ name    │ pvar       │     pval │ punit   │ pdiff    │ job_nodelist    │ result   │
    ┝━━━━━━━━━┿━━━━━━━━━━━━┿━━━━━━━━━━┿━━━━━━━━━┿━━━━━━━━━━┿━━━━━━━━━━━━━━━━━┿━━━━━━━━━━┥
    │ Foorun2 │ copy_bw    │ 0.387238 │ MB/s    │ -95.21%  │ vpn-110.cscs.ch │ fail     │
    ├─────────┼────────────┼──────────┼─────────┼──────────┼─────────────────┼──────────┤
    │ Foorun  │ extract_bw │ 5.07053  │ MB/s    │ +521.14% │ vpn-110.cscs.ch │ pass     │
    ┕━━━━━━━━━┷━━━━━━━━━━━━┷━━━━━━━━━━┷━━━━━━━━━┷━━━━━━━━━━┷━━━━━━━━━━━━━━━━━┷━━━━━━━━━━┙
    Current session stored with UUID: 20240703T161944
    run report file: run-report-600.json
    
    And then when I run with --performance-compare="^20240703T161944/^20240703T161944/mean:+system+partition+environ/+result" I get:
    ┍━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━┯━━━━━━━━━━┯━━━━━━━━━┯━━━━━━━━━┯━━━━━━━━━━┯━━━━━━━━━━━━━┯━━━━━━━━━━━┯━━━━━━━━━━┑
    │ name                 │ pvar       │     pval │ punit   │ pdiff   │ system   │ partition   │ environ   │ result   │
    ┝━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━┿━━━━━━━━━━┿━━━━━━━━━┿━━━━━━━━━┿━━━━━━━━━━┿━━━━━━━━━━━━━┿━━━━━━━━━━━┿━━━━━━━━━━┥
    │ Foorun2              │ copy_bw    │ 0.387238 │ MB/s    │ +0.00%  │ generic  │ default     │ builtin   │ fail     │
    ├──────────────────────┼────────────┼──────────┼─────────┼─────────┼──────────┼─────────────┼───────────┼──────────┤
    │ Foorun               │ extract_bw │ 5.07053  │ MB/s    │ +0.00%  │ generic  │ default     │ builtin   │ pass     │
    ├──────────────────────┼────────────┼──────────┼─────────┼─────────┼──────────┼─────────────┼───────────┼──────────┤
    │ Foorun %$repeat_no=1 │ extract_bw │ 0.175468 │ MB/s    │ +0.00%  │ generic  │ default     │ builtin   │ fail     │
    ┕━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━┷━━━━━━━━━━┷━━━━━━━━━┷━━━━━━━━━┷━━━━━━━━━━┷━━━━━━━━━━━━━┷━━━━━━━━━━━┷━━━━━━━━━━┙
    

@vkarak (Contributor, Author) commented Jul 8, 2024

As far as I can tell, testcases that are generated by --distribute are treated as different tests. I think it would be good to be able to aggregate the performance results of testcases that are run in different nodes (but same partitions).

I think the --distribute option has two problems:

  • The parameters added by the test generation mechanism alter the test name; the same problem is shared with all other test generation options.
  • For the --distribute option, we also modify the base test name by adding the partition name to it, which also makes it different from the same test run on the same partition and node.

I think the solution to this naming problem should probably be independent of how we treat it in the results DB, but I agree that it needs fixing in this context.

Would it make sense to give users the option to remove a certain UUID from the database? If you have some "invalid" results, then the only option now is to remove the whole database or do it outside of reframe. If we don't want to support it through reframe, maybe we can have a small explanation of how to manually "change" the db?

I think it makes sense in general.

This is now possible with the --delete-stored-session option.

Not sure how we would properly address this but let's say you are comparing two runs and using max as aggregator. What would be the way to find the uuid of the session? Currently you have to manually look in the report, right?

If you add +session_uuid to the extra columns argument, it should show all the unique session UUIDs that the aggregation came from. The same happens for any column that you select. I have tried this with test attributes but not with the session UUID, so if it doesn't work, I should have a look at it.

In the performance report now you see only one entry for each test (in case of retries). This is a different behaviour than the past one and I think some people may prefer to see the individual results.

You should still be getting different entries now, as each repeated test has a different name (which is something we want to change, though). But in any case, I don't believe we should offer this sort of customization for the performance report at the moment, given that the raw performance results are available in the JSON reports, in the database and in the perflogs.

Nonetheless, I believe you can still get the same result if you add +pval to the extra columns (I haven't tried it).

Maybe we could get this by grouping also by +run_index (again we should try that as I haven't :-) )

Grouping by +runid gives the expected result now.

We are not printing anymore the title

I have fixed that but didn't push yet :-)

Fixed.

I tried running two reframe runs in parallel and the database can get silently corrupted. Maybe not an immediate issue, but good to keep in mind.

Interesting, I'll have a 👀. I thought that sqlite3 handled multiple processes... Will try to reproduce.

Fixed, we are now using file locks to guard against race conditions.
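
For reference, a minimal sketch of such a guard, using a POSIX fcntl lock file next to the database (illustrative only; not necessarily what ReFrame actually does):

import fcntl
import sqlite3
from contextlib import contextmanager


@contextmanager
def db_file_lock(lockfile='results.db.lock'):
    '''Serialize writers via an exclusive advisory file lock (POSIX only).'''
    with open(lockfile, 'w') as fp:
        fcntl.flock(fp, fcntl.LOCK_EX)   # blocks until the lock is acquired
        try:
            yield
        finally:
            fcntl.flock(fp, fcntl.LOCK_UN)


def store_blob(json_blob, dbfile='results.db'):
    # Hypothetical single-table write, guarded by the lock
    with db_file_lock():
        with sqlite3.connect(dbfile) as conn:
            conn.execute('CREATE TABLE IF NOT EXISTS sessions (json_blob TEXT)')
            conn.execute('INSERT INTO sessions VALUES (?)', (json_blob,))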

@ekouts (Contributor) commented Jul 9, 2024

If you add +session_uuid to the extra columns argument, it should show all the unique session UUIDs that the aggregation came from. The same happens for any column that you select. I have tried this with test attributes but not with the session UUID, so if it doesn't work, I should have a look at it.

Maybe we could get this by grouping also by +run_index (again we should try that as I haven't :-) )

Hm I am getting a key error for both +run_index and +session_uuid. If you try let me know.

@vkarak force-pushed the feat/compare-past-results branch from 7be3dd2 to e9f34ef on July 9, 2024 16:32
@vkarak (Contributor, Author) commented Jul 9, 2024

Hm I am getting a key error for both +run_index and +session_uuid. If you try let me know.

Indeed, since these refer only to test case attributes at the moment.

@vkarak (Contributor, Author) commented Jul 10, 2024

Not sure how we would properly address this but let's say you are comparing two runs and using max as aggregator. What would be the way to find the uuid of the session? Currently you have to manually look in the report, right?

A follow-up comment regarding this: currently, extra columns are aggregated only for the base period and not for the target one. I see it as analyzing the current period and comparing it to a target one, so the extra information should refer to the base one. Users could simply swap the period arguments of --performance-compare to get this extra information for the target period.

@vkarak force-pushed the feat/compare-past-results branch from 63285ba to fa078f4 on July 11, 2024 06:59
@vkarak (Contributor, Author) commented Jul 11, 2024

Hm I am getting a key error for both +run_index and +session_uuid. If you try let me know.

Now this is supported with +run_index and +session_uuid.

@vkarak vkarak changed the title [feat] Support performance results comparisons [feat] Introduce permanent result storage and support performance results comparisons between current and past runs Jul 12, 2024
@ekouts (Contributor) left a comment


I would like to test it a bit more before we merge, but here are a couple of small comments.

@vkarak force-pushed the feat/compare-past-results branch from 97e1719 to d2e0844 on August 12, 2024 09:36
@vkarak force-pushed the feat/compare-past-results branch from c2a35ce to dd37f86 on August 23, 2024 08:45

Successfully merging this pull request may close these issues.

Comparing performance reports across different runs
5 participants