Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize startup time using lazy imports #543

Closed
wants to merge 9 commits into from

Conversation

osma
Copy link
Member

@osma osma commented Dec 17, 2021

Fixes #514

The goal of this PR is to reduce CLI startup time by avoiding useless work, especially imports that are not necessary for the requested operation.

It makes the following changes to the import statements within the Annif codebase:

  • complete rewrite of annif/backend/__init__.py; the end result is that backends (and the libraries they require, e.g. fasttext, omikuji and tensorflow) are only imported when they are actually used
  • avoid importing NLTK, sklearn, joblib and NumPy unless actually required, by moving import statements inside functions and methods

I tried to craft the changes to have minimal impact on the code so I only chose to make imports local in cases where there were very few uses within the same module. For example annif.suggestion uses NumPy a lot, so I didn't try to use local imports within that module, but instead avoided importing the whole annif.suggestion module in other parts of the codebase.

Startup time for simple commands such as annif --help and annif --version has been reduced by more than two thirds.

Before:

$ time annif --version
0.56.0.dev0

real	0m4,052s
user	0m4,001s
sys	0m0,568s

After:

$ time annif --version
0.56.0.dev0

real	0m1,302s
user	0m1,250s
sys	0m0,050s

As explained in #514, I also used tuna to visualize where the remaining import time is spent after this PR:

image

The main culprits are now connexion (with most of the time spent initializing openapi_spec_validator!) and flask. Those are core libraries and I don't think we can avoid importing them even for the simplest CLI commands.

@osma osma added this to the 0.56 milestone Dec 17, 2021
@codecov
Copy link

codecov bot commented Dec 17, 2021

Codecov Report

Merging #543 (01f068f) into master (fbd1f92) will decrease coverage by 0.11%.
The diff coverage is 91.17%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #543      +/-   ##
==========================================
- Coverage   99.48%   99.37%   -0.12%     
==========================================
  Files          80       80              
  Lines        5282     5296      +14     
==========================================
+ Hits         5255     5263       +8     
- Misses         27       33       +6     
Impacted Files Coverage Δ
annif/project.py 99.39% <ø> (-0.01%) ⬇️
annif/backend/__init__.py 88.67% <88.46%> (-11.33%) ⬇️
annif/analyzer/analyzer.py 100.00% <100.00%> (ø)
annif/analyzer/snowball.py 100.00% <100.00%> (ø)
annif/cli.py 99.62% <100.00%> (+<0.01%) ⬆️
annif/corpus/skos.py 98.21% <100.00%> (+0.06%) ⬆️
annif/corpus/subject.py 100.00% <100.00%> (ø)
annif/rest.py 100.00% <100.00%> (ø)
annif/util.py 98.24% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fbd1f92...01f068f. Read the comment docs.

@sonarcloud
Copy link

sonarcloud bot commented Dec 17, 2021

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@osma
Copy link
Member Author

osma commented Dec 17, 2021

I also looked at the option of implementing truly lazy imports, for example as described in the importlib standard module. But that didn't seem very appealing - it takes a nontrivial amount of code and can be hard to debug if the imports fail for some reason. So I chose to use local imports instead, which are easier to understand if not always so elegant.

@osma
Copy link
Member Author

osma commented Dec 17, 2021

Looking at CI test results and especially Codecov, I think we need additional tests for the ImportError/ValueError clauses in annif/backend/__init__.py

@osma
Copy link
Member Author

osma commented Dec 20, 2021

I will try again with a less invasive approach that doesn't try to avoid importing NumPy. It's not worth complicating the code with many local imports just to speed up the startup by less than 0.1 seconds.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize startup time with lazy imports
1 participant