Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce single file format overhead by lazily importing modules and loading .gitignore / --exclude #3211

Merged
merged 3 commits into from
Aug 27, 2022

Conversation

ichard26
Copy link
Collaborator

@ichard26 ichard26 commented Aug 5, 2022

Description

`black.reformat_many` depends on a lot of slow-to-import modules. When
formatting simply a single file, the time paid to import those modules
is totally wasted. So I moved `black.reformat_many` and its helpers
to `black.concurrency` which is now *only* imported if there's more
than one file to reformat. This way, running Black over a single file
is snappier

Here are the numbers before and after this patch running `python -m
black --version`:

- interpreted: 411 ms +- 9 ms -> 342 ms +- 7 ms: 1.20x faster
- compiled: 365 ms +- 15 ms -> 304 ms +- 7 ms: 1.20x faster
Loading .gitignore and compiling the exclude regex can take more than
15ms. We shouldn't and don't need to pay this cost if we're simply
formatting files given on the command line directly.

I would've loved to lazily import pathspec, but the patch won't be clean
until the file collection and discovery logic is refactored first.

Co-authored-by: Fabio Zadrozny <fabiofz@gmail.com>

Fixes #2564 and closes #2656.

Checklist - did you ...

  • Add a CHANGELOG entry if necessary?
  • Add / update tests if necessary?
  • Add new / update outdated documentation? -> n/a

@ichard26 ichard26 added the C: performance Black is too slow. Or too fast. label Aug 5, 2022
@ichard26
Copy link
Collaborator Author

ichard26 commented Aug 5, 2022

For what it is worth, I'd prefer a rebase merge for this PR given the commits are mostly independent (other than the changelog entry).

@ichard26 ichard26 marked this pull request as draft August 5, 2022 21:34
@ichard26
Copy link
Collaborator Author

ichard26 commented Aug 5, 2022

Coolio, diff-shades is failing and I don't know why at all ...

@ichard26 ichard26 marked this pull request as draft August 5, 2022 21:34
@ichard26 ichard26 marked this pull request as draft August 5, 2022 21:34
@ichard26 ichard26 marked this pull request as draft August 5, 2022 21:34
@ichard26 ichard26 marked this pull request as draft August 5, 2022 21:34
@ichard26 ichard26 marked this pull request as draft August 5, 2022 21:34
@ichard26 ichard26 marked this pull request as draft August 5, 2022 21:34
@ichard26 ichard26 marked this pull request as draft August 5, 2022 21:34
@ichard26 ichard26 marked this pull request as draft August 5, 2022 21:34
@ichard26 ichard26 marked this pull request as draft August 5, 2022 21:34
@ichard26 ichard26 marked this pull request as draft August 5, 2022 21:34
@ichard26 ichard26 marked this pull request as draft August 5, 2022 21:34
Copy link

@Doyhenc99 Doyhenc99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadang bib

src/black/__init__.py Outdated Show resolved Hide resolved
src/black/__init__.py Outdated Show resolved Hide resolved
ichard26 and others added 3 commits August 13, 2022 13:49
`black.reformat_many` depends on a lot of slow-to-import modules. When
formatting simply a single file, the time paid to import those modules
is totally wasted. So I moved `black.reformat_many` and its helpers
to `black.concurrency` which is now *only* imported if there's more
than one file to reformat. This way, running Black over a single file
is snappier

Here are the numbers before and after this patch running `python -m
black --version`:

- interpreted: 411 ms +- 9 ms -> 342 ms +- 7 ms: 1.20x faster
- compiled: 365 ms +- 15 ms -> 304 ms +- 7 ms: 1.20x faster

Co-authored-by: Fabio Zadrozny <fabiofz@gmail.com>
Loading .gitignore and compiling the exclude regex can take more than
15ms. We shouldn't and don't need to pay this cost if we're simply
formatting files given on the command line directly.

I would've loved to lazily import pathspec, but the patch won't be clean
until the file collection and discovery logic is refactored first.

Co-authored-by: Fabio Zadrozny <fabiofz@gmail.com>
os.cpu_count() can return None (sounds like a super arcane edge case
though) so the type annotation for the `workers` parameter of
`black.main` is wrong. This *could* technically cause a runtime
TypeError since it'd trip one of mypyc's runtime type checks so we
might as well fix it.

Reading the documentation (and cross-checking with the source code),
you are actually allowed to pass None as `max_workers` to
`concurrent.futures.ProcessPoolExecutor`. If it is None, the pool
initializer will simply call os.cpu_count() [^1] (defaulting to 1 if it
returns None [^2]). It'll even round down the worker count to a level
that's safe for Windows.

... so theoretically we don't even need to call os.cpu_count()
ourselves, but our Windows limit is 60 (unlike the stdlib's 61) and I'd
prefer not accidentally reintroducing a crash on machines with many,
many CPU cores.

[^1]: https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor
[^2]: https://github.com/python/cpython/blob/a372a7d65320396d44e8beb976e3a6c382963d4e/Lib/concurrent/futures/process.py#L600
@ichard26 ichard26 force-pushed the reduce-single-file-format-overhead branch from aafd5b9 to 2c0d427 Compare August 13, 2022 17:50
@ichard26 ichard26 requested a review from JelleZijlstra August 13, 2022 17:50
@ichard26 ichard26 marked this pull request as ready for review August 13, 2022 17:50
@ichard26
Copy link
Collaborator Author

PTAL. Please note I'd prefer a rebase merge here if you approve and chose to merge right after.

workers: Optional[int],
) -> None:
"""Reformat multiple files using a ProcessPoolExecutor."""
maybe_install_uvloop()
Copy link
Collaborator Author

@ichard26 ichard26 Aug 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose to keep the helper function instead of always attempting an import of uvloop in the module scope just as insurance if we ever accidentally break the invariant black.concurrency should only be imported via black.main if there's more than one file to format.

@github-actions
Copy link

diff-shades reports zero changes comparing this PR (2c0d427) to main (4ebf14d).


What is this? | Workflow run | diff-shades documentation

@ichard26
Copy link
Collaborator Author

This works on Windows with PyInstaller (just tested locally) so I'm merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: performance Black is too slow. Or too fast.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Black slow startup to format a single file
3 participants