perf: memoise DataSerializer.read_file results #543

huonw · 2021-08-20T01:46:02Z

Description

This applies the functools.lru_cache memoising decorator to the low level amber.DataSerializer.read_file method, which dramatically improves performance by avoiding re-reading a single snapshot file dozens of times, thus eliminating the quadratic behaviour observed in #539 in the common case of read-only interaction with amber snapshots.

For the test in #539, with a single snapshot test parametrised with SIZE instances (note that doubling the size now doubles the time):

SIZE	before (1.43) time (s)	after (this PR) time (s)
100	0.09	0.07
500	0.75	0.29
1000	2.42	0.51
2000	8.24	0.98
10000	201	4.64

For our real-world test suite, with 453 snapshots across 25 files:

syrupy version	time (s)
1.4.1 (original)	68
1.4.2	51
1.4.3	35
this PR	30

This has to be on this DataSerializer class method, rather than the higher level methods in AmberSnapshotExtension because the methods there are instance methods, with a self argument, and it's called on different instances (and thus end up as different lru_cache entries).

This optimisation is limited to only reading the amber file types, but I think images (and other single-file snapshots) will not have the same quadratic issues, and writing is arguably much less common than reading.

Related Issues

Snapshot discovery and reading takes quadratic time #539

Checklist

This PR has sufficient documentation.
This PR has sufficient test coverage.
This PR title satisfies semantic convention.

Additional Comments

No additional comments.

noahnu · 2021-08-20T03:18:51Z

@all-contributors add @huonw for code, bug

allcontributors · 2021-08-20T03:18:59Z

@noahnu

I've put up a pull request to add @huonw! 🎉

## [1.4.4](v1.4.3...v1.4.4) (2021-08-20) ### Performance Improvements * memoise DataSerializer.read_file results ([#543](#543)) ([df5b516](df5b516))

tophat-opensource-bot · 2021-08-20T03:21:27Z

🎉 This PR is included in version 1.4.4 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

perf: memoise DataSerializer.read_file results

0311150

huonw mentioned this pull request Aug 20, 2021

Snapshot discovery and reading takes quadratic time #539

Closed

isort, 3.6/3.7 support

93e7f52

noahnu approved these changes Aug 20, 2021

View reviewed changes

noahnu merged commit df5b516 into syrupy-project:master Aug 20, 2021

allcontributors bot mentioned this pull request Aug 20, 2021

docs: add huonw as a contributor for code, bug #544

Merged

huonw deleted the perf/539-cache-read branch August 20, 2021 03:19

tophat-opensource-bot pushed a commit that referenced this pull request Aug 20, 2021

chore(release): 1.4.4 [skip ci]

151a859

## [1.4.4](v1.4.3...v1.4.4) (2021-08-20) ### Performance Improvements * memoise DataSerializer.read_file results ([#543](#543)) ([df5b516](df5b516))

tophat-opensource-bot added the released label Aug 20, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: memoise DataSerializer.read_file results #543

perf: memoise DataSerializer.read_file results #543

huonw commented Aug 20, 2021 •

edited

Loading

noahnu commented Aug 20, 2021

allcontributors bot commented Aug 20, 2021

tophat-opensource-bot commented Aug 20, 2021

perf: memoise DataSerializer.read_file results #543

perf: memoise DataSerializer.read_file results #543

Conversation

huonw commented Aug 20, 2021 • edited Loading

Description

Related Issues

Checklist

Additional Comments

noahnu commented Aug 20, 2021

allcontributors bot commented Aug 20, 2021

tophat-opensource-bot commented Aug 20, 2021

huonw commented Aug 20, 2021 •

edited

Loading