Improve tests performance #1027
Great initiative! 🙌 I share with you a couple of learnings in the hope it helps. I think what's good about the options proposed is that they are not mutually exclusive.

Tests parallelisation

Seems to me one of the most impactful in terms of raw performance, and it scales horizontally. However, I'm still blocked in trying to make it work, as there's some black magic:
To make it work, we need to find a way to tell … At this point I'm not yet sure. There are other options like https://github.com/browsertron/pytest-parallel that could be worth a try.

Parameters caching

Parameters seem to be loaded just once per test suite. If that's the case, caching could modestly improve performance by a fixed amount, independent of the number of tests to be run, at load time (right after launching …). If it's not, then it could actually be interesting. Still, big questions for me would then be:
Native YAML parser

Currently, if you have … Performance gain is substantial, but, as with the caching, by an almost fixed amount, as those seem to be operations that are run just once per test suite. However, having a native strategy as a default could indeed simplify things.

Install libyaml in CI

It seems to already be the case, at least for Core: we're using Docker's Python 3.7 container, which depends on buildpack-deps, which installs libyaml.

All other suggestions to explore are welcome!

Off the top of my head, some things we've at some point considered, that would of course need further exploring. All of them are based on the hypothesis that the most expensive operation is …
Hope it helps! 😃
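As a complement to the parameters caching idea above, here is a minimal sketch of what such a cache could look like, assuming a `load_parameters` function that does the expensive YAML parsing; the function name, directory layout and cache path are hypothetical, not OpenFisca's actual API:

```python
# Hypothetical sketch of parameters caching: serialise the parsed parameter
# tree so later test runs can skip the YAML parsing entirely.
import os
import pickle

CACHE_PATH = ".cache/parameters.pkl"


def newest_source_mtime(parameters_dir):
    """Return the most recent modification time of any parameter file."""
    return max(
        os.path.getmtime(os.path.join(root, name))
        for root, _, files in os.walk(parameters_dir)
        for name in files
    )


def load_parameters_cached(parameters_dir, load_parameters):
    """Load parameters from the pickle cache, rebuilding it when stale."""
    if (
        os.path.exists(CACHE_PATH)
        and os.path.getmtime(CACHE_PATH) >= newest_source_mtime(parameters_dir)
    ):
        with open(CACHE_PATH, "rb") as cache:
            return pickle.load(cache)

    parameters = load_parameters(parameters_dir)  # the expensive YAML parsing
    os.makedirs(os.path.dirname(CACHE_PATH), exist_ok=True)
    with open(CACHE_PATH, "wb") as cache:
        pickle.dump(parameters, cache)
    return parameters
```

Questions such a sketch leaves open include cache invalidation, where to store the cache, and whether the parsed objects pickle cleanly.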
Native YAML parser
- Without CLoader: Average = 12m 19s
- With CLoader: Average = 11m 46s

The outcome is not what I expected, consequently I tested …
I got the following: …
CircleCI parallelism
- With 4 containers: Average = 6m 12s
- With 3 containers: Average = 8m 52s
- With 2 containers: Average = 10m 44s

Please note that these tests ran in almost perfect conditions, with no other builds running simultaneously. Obviously, the more containers we use, the shorter the test time. However, the bandwidth can only be stretched so far, and all the other builds will be waitlisted. A possible high-cost, low-effort solution would be to upgrade the CircleCI plan to add more levels of parallelism.

Tests caching

Inspired by PR #1311. It might be interesting, as a first step, to try caching the tests before caching the parameters, since the parameter usage is more complex than that of the tests. To see if it has any effect, I tested using …
And I got: …
Additional profiling is needed to determine how much testing time is actually spent loading the test files.
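For reference, the container-level parallelism benchmarked above is typically configured along these lines in `.circleci/config.yml`; this is only a sketch, and the job name, Docker image and test paths are illustrative rather than the actual OpenFisca configuration:

```yaml
# Sketch of CircleCI parallelism with timing-based test splitting.
version: 2.1
jobs:
  run_yaml_tests:
    docker:
      - image: python:3.7
    parallelism: 4  # spread the test files across 4 containers
    steps:
      - checkout
      - run:
          name: Run this container's share of the YAML tests
          command: |
            TEST_FILES=$(circleci tests glob "tests/**/*.yaml" | circleci tests split --split-by=timings)
            openfisca test $TEST_FILES
```

Splitting by timings relies on the timing data CircleCI stores from previous runs; when none is available, it falls back to a simpler split.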
Thanks a lot @maukoquiroga for your insights and references, super useful! And thank you so much @HAEKADI, this is great data!

Conclusions
Uncertainties left to explore
I set aside all the calculation optimisations to focus on Core-level actionable improvements, as I believe such solutions forward complexity onto contributors. It is however clear that at some point this will be the way forward, and providing guidelines for performance would be useful. This would need a strong investment in profiling.

Non-technical options
@HAEKADI could you please have a look at these first 4 items? 🙂
How many times parameters are loaded in a test run?

In an …
How many times test files are loaded in a test run?

With …

How much would it cost to add 2 / 4 / 8 / 16 containers from CircleCI?

With our current free plan, we run 4 concurrent job runs. The next tier is the Performance plan, which offers 80 concurrent job runs. With a monthly fee of $30 (minimum), we get 50,000 credits. I have contacted the support team at CircleCI to ask for more details pertaining to our case. I am still awaiting their reply.

How long do the tests take when executed with GitHub Actions?

I've tried running a simple configuration on GitHub Actions to test … I only ran the build and test jobs, since the latter is the most problematic and time-consuming. I did not add the linting jobs. The resulting times:
Average: 12min 12s

GitHub Actions does not keep a history of successful runs, so I can only link the last run. To compare, CircleCI ran the build job in 8m 50s. With the free GitHub Actions plan, we get 20 concurrent jobs. GitHub Actions does not offer a parallelism option like CircleCI's; however, jobs run in parallel by default. Consequently, in order to optimise performance, we need many jobs running simultaneously. Since the YAML tests are the black sheep of the bunch, I used a strategy matrix to divide them into 10 and then 20 jobs (20 being the limit offered by the free plan). The resulting run times:

With 10 jobs:
Average: 3min 5s

Just to compare, on CircleCI the build run took 8min 15s.

With 20 jobs:
Average: 2min 40s

On CircleCI the build run took 7min 49s. The downside of this split is that the YAML tests will not be split based on test duration, which means that some instances run longer than others. CircleCI does store historical data of each test's duration and splits the test files based on running duration. From this initial benchmark, the results are promising. It does seem that GitHub Actions runs faster than CircleCI, and so there is a real decision to be made as to which one should be chosen. GitHub Actions hasn't been around as long as CircleCI, so the level of documentation and the community size are not the same. Obviously, there is a lot to be said for having the code and the CI on the same platform. However, we should keep in mind that with the free plan we are limited to: …
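To illustrate the strategy matrix split described above, a GitHub Actions job dividing the YAML test files across parallel jobs could look roughly like the sketch below; the paths, the group count, the naive modulo split and the test command are hypothetical, not the configuration used for the benchmark:

```yaml
# Sketch of a GitHub Actions strategy matrix splitting the YAML tests.
name: YAML tests
on: push
jobs:
  test-yaml:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        group: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  # 10 parallel jobs
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.7'
      - run: pip install .
      - name: Run this group's share of the YAML test files
        run: |
          # Naive split: each job takes every 10th file, offset by its group.
          FILES=$(find tests -name "*.yaml" | sort | awk "NR % 10 == ${{ matrix.group }}")
          openfisca test $FILES
```

Such a naive split ignores test duration, which is the unevenness noted above; CircleCI's timing-based splitting avoids that.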
Thank you very much @HAEKADI! I updated the table at the beginning of this discussion. Caching parameters could thus save us 4 to 5 seconds out of a 10-minute run in a large codebase such as OpenFisca France. If I understand correctly, parallelisation seems the most relevant option. GitHub Actions would enable more parallelisation for free, but with a loss of history after 90 days and a maximum of 2000 free minutes per month. It is also not extendable beyond 20 workers, even when paying. On the other hand, CircleCI has the advantages of years of accumulated experience and practically no limits to expansion as long as we pay accordingly. Both services seem comparable in terms of raw performance. In order to present the trade-off properly, we need to know:
@HAEKADI can you please collect this information? 🙂
Thank you so much for this detailed, sourced comparison @HAEKADI! This is excellent work and the best decision basis we could wish for 👏 Based on it, the only downside I see with GitHub Actions is that we lose logs after 3 months. We regularly reference build runs in issues or pull requests, and some decisions are taken based on them, most of all regarding performance. I believe it can be acceptable to have only workflow-level metadata (which is kept indefinitely), and to learn to copy-paste into the issue body the subset of the logs a decision is based on. This would allow us to increase parallelism (and thus test speed) fivefold. This would also increase vendor lock-in with GitHub, but that seems acceptable to me, considering we can always revert to some other provider if we ever decide to migrate away from GitHub, and that this is not currently planned. I'll open an RFC for switching from CircleCI to GitHub Actions. Meanwhile, do you feel like investigating the two remaining elements:
If this is not obvious and you need more context, I unfortunately don't have enough technical knowledge to provide much more, so we can also wait for @maukoquiroga to handle this at some point in the future 🙂
The switch to GitHub Actions seems to have alleviated the concern with test speed on CI. Further possible improvements, i.e. changing the Python compiler and interpreter, would have much stronger technical consequences. While they might be considered in the future if deemed necessary, the issue of test performance seems sufficiently under control that no community member has complained about test speed since the implementation of #1030. I will thus close this issue 🙂 Thanks to all the people who made this dramatic improvement a reality 👏
It would be useful for all country packages to improve the speed of tests, and to improve the speed of computations in general. Currently, running all tests in the most complex tax and benefit system (OpenFisca-France) takes 10 to 20 minutes in CI, and users report the same duration locally (#1025). This issue aims at consolidating potential avenues for performance improvement, their associated costs, risks and expected benefits, in order to define the best implementation strategy for a team with limited resources that cannot tackle everything at once.
Install libyaml in CI

Details
Native YAML parser
The “Native YAML parser” option consists of assessing whether libyaml, currently suggested as an optional dependency as it speeds up parameter and test loading by using a native compiled extension instead of the pure Python YAML reader, could be added as a systematic dependency. The last assessment dates back a few years and revealed platform-specific incompatibilities. Maybe this has been fixed upstream. There might also be an alternative option that fits the same purpose with fewer incompatibilities.
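For context, the usual PyYAML pattern for taking advantage of libyaml when it is present is a compiled-loader import with a pure Python fallback. The snippet below is a generic sketch of that pattern, not OpenFisca's actual loading code:

```python
# Generic PyYAML pattern: prefer the libyaml-backed C loader when the
# compiled extension is available, fall back to the pure Python loader.
import yaml

try:
    from yaml import CSafeLoader as Loader  # requires libyaml at build time
except ImportError:
    from yaml import SafeLoader as Loader   # pure Python, noticeably slower


def load_yaml_file(path):
    """Parse a single YAML file with the fastest available safe loader."""
    with open(path) as file:
        return yaml.load(file, Loader=Loader)
```

Making libyaml a systematic dependency would essentially guarantee that the compiled branch is taken on every platform, which is what the CLoader benchmark earlier in this thread measures.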
Strategy & contributions

… the “Install libyaml in CI” options.