Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve load time #85

Merged
merged 11 commits into from
Jul 23, 2022
Merged

Improve load time #85

merged 11 commits into from
Jul 23, 2022

Conversation

banesullivan
Copy link
Owner

Helps #79 to speed up scooby import time since distutils is imported for a lesser used method get_standard_lib_modules()

We can remove/deprecate in a follow-up PR. This change speeds up the import time from 0.378s to 0.044s for me

cc @prisae

@banesullivan
Copy link
Owner Author

I lied. This doesn't do anything after merging with main .... will fix

@banesullivan banesullivan marked this pull request as draft July 21, 2022 21:31
@banesullivan banesullivan requested a review from prisae July 21, 2022 21:41
@banesullivan banesullivan marked this pull request as ready for review July 21, 2022 21:41
@banesullivan
Copy link
Owner Author

@prisae, some of the logic with psutil and mkl in knowledge.py also slows down the import a good bit but not nearly as much

scooby/knowledge.py Outdated Show resolved Hide resolved
@prisae
Copy link
Collaborator

prisae commented Jul 22, 2022

Thanks @banesullivan for jumping quick on this! Yes, now that distutils is gone the picture looks very different.
2022-07-22-02

Zooming in to scooby:
2022-07-22-03

I pushed to also lazy-import multiprocessing, resulting in
2022-07-22-04

Overall, this is a massive difference. The big pieces are now:

  • psutil, the "elephant": However, it is an optional import and required in two places, for the RAM and for the filesystem type.
  • platform: is required in several places over report.py and knowledge.py.
  • pathlib and mkl could be changed relatively easy.

These changes would be already very small compared to the changed achieved so far. I wonder if we should do them (hide all the imports in the functions), or leave it as is for know. It might be worth holding on for a while, there are "things in the works" I think:

@codecov-commenter
Copy link

codecov-commenter commented Jul 22, 2022

Codecov Report

Merging #85 (7d62f3f) into main (999d9ca) will increase coverage by 0.08%.
The diff coverage is 79.45%.

@@            Coverage Diff             @@
##             main      #85      +/-   ##
==========================================
+ Coverage   87.15%   87.23%   +0.08%     
==========================================
  Files           4        4              
  Lines         366      384      +18     
==========================================
+ Hits          319      335      +16     
- Misses         47       49       +2     

@prisae
Copy link
Collaborator

prisae commented Jul 22, 2022

Okay, I also made psutil, mkl and numexpr a lazy load. It is not necessarily "beautiful", but effective. Maybe we should revisit these things if lazy-loading becomes more embedded within Python itself.

The scooby load time is now down to 22% of the whole load process, so 3/4 are out of our hands, which I think is more than sufficient. The speed-up of the pure scooby-share we achieved with this is almost 20x (from ~0.288 s to 0.015s)!
2022-07-22-05

@prisae
Copy link
Collaborator

prisae commented Jul 22, 2022

This brings an interesting question. Should we add a benchmark testing load time, so we do not accidentality introduce a slow load at a later point?

Not sure how to do that exactly. Here a very ugly way, probably very error prone, but at least an idea ;-) It asserts that the import time is less than 0.1 seconds (maybe we could aim at 0.05 s).

import subprocess

out = subprocess.run(
    ["python", "-X", "importtime", "-c", "import scooby"], capture_output=True
)
itime = int(out.stderr.decode("utf-8").split("\n")[-2].split('|')[1].strip())
assert itime < 100_000  # maybe even 50_000?

Maybe a bit better (although not sure if cross-platform)

import subprocess

out = subprocess.run(
    ["time", "-f", "%U", "python", "-c", "import scooby"], capture_output=True
)
assert float(out.stderr.decode("utf-8")[:-1]) < 0.1

@prisae
Copy link
Collaborator

prisae commented Jul 22, 2022

It is fantastic, thanks @banesullivan - in my application the load time spent on scooby went from 0.077s (5%!) to 0.001s (irrelevant), which is fantastic!

(Note the difference, not from 0.288s to 0.015s as reported here; I assume when importing a package together with many other there are some shared benefits, so importing scooby alone is different than importing scooby within another package.)

@prisae prisae mentioned this pull request Jul 22, 2022
3 tasks
@prisae
Copy link
Collaborator

prisae commented Jul 22, 2022

I just realized that your solution here @banesullivan will get into merge conflict with the solution of @akaszynski in #83 - can we get first #83 in?

@akaszynski
Copy link
Collaborator

I just realized that your solution here @banesullivan will get into merge conflict with the solution of @akaszynski in #83 - can we get first #83 in?

Give me a minute.

@prisae prisae changed the title lazy import of disutils lazy import of distutils Jul 23, 2022
@prisae prisae changed the title lazy import of distutils Improve load time Jul 23, 2022
@prisae prisae mentioned this pull request Jul 23, 2022
@akaszynski
Copy link
Collaborator

@prisae, I'll leave it to you to resolve the conflicts. Consider this approved.

@akaszynski
Copy link
Collaborator

This brings an interesting question. Should we add a benchmark testing load time, so we do not accidentality introduce a slow load at a later point?

Not sure how to do that exactly. Here a very ugly way, probably very error prone, but at least an idea ;-) It asserts that the import time is less than 0.1 seconds (maybe we could aim at 0.05 s).

import subprocess

out = subprocess.run(
    ["python", "-X", "importtime", "-c", "import scooby"], capture_output=True
)
itime = int(out.stderr.decode("utf-8").split("\n")[-2].split('|')[1].strip())
assert itime < 100_000  # maybe even 50_000?

Maybe a bit better (although not sure if cross-platform)

import subprocess

out = subprocess.run(
    ["time", "-f", "%U", "python", "-c", "import scooby"], capture_output=True
)
assert float(out.stderr.decode("utf-8")[:-1]) < 0.1

Enforcement of execution times is a bit hard, especially on CI where we have no control over the hardware.

I honestly wish we could record the total number of cpu instructions rather than execution time. That's (probably) less variable.


Regardless, I see no reason why you couldn't warn for execution times. Just use CI/CD as the worst case.

@prisae
Copy link
Collaborator

prisae commented Jul 23, 2022

Good point @akaszynski - but this was indeed my main idea. Just to have a simply test that would warn you if you add by accident a new dependency that takes, say 1s to load. There should go a flag up somewhere.

@akaszynski
Copy link
Collaborator

Good point @akaszynski - but this was indeed my main idea. Just to have a simply test that would warn you if you add by accident a new dependency that takes, say 1s to load. There should go a flag up somewhere.

1s is worse than even the original implementation. I'd say tack on 50% to the import time on our CI/CD with this implementation and then warn if we exceed.

Add more lazy imports; down to 3 ms / ~7%.
This is basically 1/100 of before.
@prisae
Copy link
Collaborator

prisae commented Jul 23, 2022

Down to roughly 3 ms - I think that is good, given that scooby should be a minimalistic library without footprint/burden on the importing package!

2022-07-23-01

@prisae
Copy link
Collaborator

prisae commented Jul 23, 2022

I will add now a small test and double-check everything before merging.

@prisae
Copy link
Collaborator

prisae commented Jul 23, 2022

OK, CI/CD is quite slower than local. Imports took between 0.08s and 0.11s; I set the test for now to 0.15s. That should hopefully work as red flag. If it starts to fail we have to either increase the limit or search for the culprit.

@prisae
Copy link
Collaborator

prisae commented Jul 23, 2022

Thanks both for the feedback! Would anyone mind making a release (@banesullivan, @akaszynski) - or tell me, what I would have to do for a release? I don't think I have ever done one for scooby.

@prisae prisae merged commit 48d0a7f into main Jul 23, 2022
@prisae prisae deleted the faster-import branch July 23, 2022 15:15
@akaszynski
Copy link
Collaborator

Thanks both for the feedback! Would anyone mind making a release (@banesullivan, @akaszynski) - or tell me, what I would have to do for a release? I don't think I have ever done one for scooby.

I can do it. We don't actually document our release approach, but you can glance over https://github.com/pyvista/pyvista/blob/main/CONTRIBUTING.rst to see how we do it for PyVista. I'm just going to implement that.

@banesullivan, I hope you're fine with this. I never really liked having a non-dev version on main.

@Erotemic
Copy link

FWIW because this is Python 3.7+ Using module level __getattr__ would have also been a valid way to do module lazy loading.

@prisae
Copy link
Collaborator

prisae commented Sep 16, 2022

Thanks @Erotemic! Would you mind giving an example how one could use __getattr__ in that context?

@banesullivan
Copy link
Owner Author

Perhaps something like this? https://anvil.works/blog/lazy-modules

If so, (I also did not read that in detail) I don't see why you'd do this... having from package import module_or_func nested in a function for lazy loading seems more than sufficient to me.

@Erotemic
Copy link

@prisae

def __getattr__(key):
    if key == 'platform':
        import platform
        return platform
    else:
        raise AttributeError(key)

Including the above code in the module would allow you to just access platform as an attribute.

There is good discussion of how this can be used more broadly in a scikit-image PR.

@banesullivan I think the current way is also sufficient. It just looks weird to me to include the () at the end of what looks like a module name, which is why I brought it up. Either solution is valid. Contrary to popular belief, there does not need to be only one obvious way to do something.

@prisae
Copy link
Collaborator

prisae commented Sep 16, 2022

That is interesting @Erotemic and good to know - A lot seems to go on at the moment with regards to lazy loading modules in the scientific python stack. I also like the way that SciPy will load newly all submodules only "on-demand", but do not have to be imported explicitly any longer.

@Erotemic
Copy link

FYI: I have a package mkinit that makes it really easy to expose your entire top-level API via lazy or explicit imports. It works by statically parsing your code and then autogenerating boilerplate for __init__.py. The lazy variant for a package looks like this:

    def lazy_import(module_name, submodules, submod_attrs):
        """
        Boilerplate to define PEP 562 __getattr__ for lazy import
        https://www.python.org/dev/peps/pep-0562/
        """
        import importlib
        import os
        name_to_submod = {
            func: mod for mod, funcs in submod_attrs.items()
            for func in funcs
        }

        def __getattr__(name):
            if name in submodules:
                attr = importlib.import_module(
                    '{module_name}.{name}'.format(
                        module_name=module_name, name=name)
                )
            elif name in name_to_submod:
                submodname = name_to_submod[name]
                module = importlib.import_module(
                    '{module_name}.{submodname}'.format(
                        module_name=module_name, submodname=submodname)
                )
                attr = getattr(module, name)
            else:
                raise AttributeError(
                    'No {module_name} attribute {name}'.format(
                        module_name=module_name, name=name))
            globals()[name] = attr
            return attr

        if os.environ.get('EAGER_IMPORT', ''):
            for name in submodules:
                __getattr__(name)

            for attrs in submod_attrs.values():
                for attr in attrs:
                    __getattr__(attr)
        return __getattr__

    
    __getattr__ = lazy_import(
        __name__,
        submodules={
            'submod',
            'subpkg',
        },
        submod_attrs={
            'submod': [
                'submod_func',
            ],
            'subpkg': [
                'nested',
                'nested_func',
            ],
        },
    )

    def __dir__():
        return __all__

    __all__ = ['nested', 'nested_func', 'submod', 'submod_func', 'subpkg']

Which is the lazy version of:

    from mkinit_demo_pkg import submod
    from mkinit_demo_pkg import subpkg

    from mkinit_demo_pkg.submod import (submod_func,)
    from mkinit_demo_pkg.subpkg import (nested, nested_func,)

    __all__ = ['nested', 'nested_func', 'submod', 'submod_func', 'subpkg']

Now you could maintain this yourself. Or you could just run mkinit mkinit_demo_pkg --recursive --lazy --write to regenerate it if there is an API change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants