Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix formatting performance problems with .formatter.exs in subdirectories #609

Merged
merged 4 commits into from
Dec 4, 2021

Conversation

jonleighton
Copy link
Contributor

When dealing with .formatter.exs files that are in subdirectories of the
main project, the behaviour was incorrect:
#361

And also caused performance problems:
#381

A good example is a project I currently work on. We have a top-level
.formatter.exs, as well as a test/.formatter.exs.

The one in test/ has the :inputs option set to ["**/*.{ex,exs}"].
Before this commit, should_format?/3 would take this pattern and
concatenate it with the project directory. This would result in
Path.wildcard/2 globbing for all source files in the entire project, rather
than just under test/. This causes formatting to take about 5 seconds on
our project.

This commit fixes the problem by taking the file to be formatted, and
finding the .formatter.exs file that exists in the closest parent
directory (so long as that directory is still within the project root). We
then perform the Path.wildcard/2 relative to that directory.

I’ve removed the hack that was in place to support umbrella projects,
because this is a more general solution to the problem.

On our project, I am now able to format a file in 100ms or so, which isn’t
a bad speed-up!

This function is really an implementation detail, so let’s test the
public API.

The function also relies on interaction with the filesystem, so testing
without fixtures is a bad idea because it will lead to incorrect
results.
@jonleighton
Copy link
Contributor Author

Thanks for the review @lukaszsamson 🙇

When dealing with `.formatter.exs` files that are in subdirectories of
the main project, the behaviour was incorrect:
elixir-lsp#361

And also caused performance problems:
elixir-lsp#381

A good example is a project I currently work on. We have a top-level
`.formatter.exs`, as well as a `test/.formatter.exs`.

The one in `test/` has the `:inputs` option set to `["**/*.{ex,exs}"]`.
Before this commit, should_format?/3 would take this pattern and
concatenate it with the project directory. This would result in
Path.wildcard/2 globbing for all source files in the entire project,
rather than just under test/. This causes formatting to take about 5
seconds on our project.

This commit fixes the problem by taking the file to be formatted, and
finding the `.formatter.exs` file that exists in the closest parent
directory (so long as that directory is still within the project root).
We then perform the Path.wildcard/2 relative to that directory.

I’ve removed the hack that was in place to support umbrella projects,
because this is a more general solution to the problem.

On our project, I am now able to format a file in 100ms or so, which
isn’t a bad speed-up!
@jonleighton jonleighton force-pushed the fix-formatter-performance branch from 7580f18 to 053539d Compare October 2, 2021 01:36
@jonleighton
Copy link
Contributor Author

I've updated the branch to address your comments @lukaszsamson

The call to Path.wildcard/2 can cause use to traverse a large number of
files, which can be very slow depending on the project and the globs
used:

elixir-lsp#381

I wrote a library that implements a glob parser and allows us to check
whether a glob matches a path without touching the filesystem:

https://github.com/jonleighton/path_glob

This commit switches out the Path.wildcard/2 call to use this library.
@jonleighton jonleighton changed the title Check .formatter.exs :inputs relative to its directory Fix formatting performance problems with .formatter.exs in subdirectories Oct 3, 2021
@jonleighton
Copy link
Contributor Author

Ok, I've added one more commit which avoids the filesystem traversal entirely:

  • I wrote a library called path_glob that implements matching for globs without interacting with the filesystem
  • I've implemented it by writing a parser with nimble_parsec, so I believe it to be fairly robust. I wrote a lot of test cases and fixed lots of edge cases.
  • That said, it's a new library so more testing/review is welcome. But I'm pretty sure it will handle the most common use cases without trouble.
  • The new commit just drops in this library.

If there is a concern about maintenance, I'd be happy for the library to be hosted under the elixir-lsp org.

@arnodirlam
Copy link

Thank you for all the effort you already put into this proper solution, @jonleighton!

Looking forward to having this released, and have migrations formatted on Save with the phoenix default .formatter.exs files (#361) 🙌 🚀

axelson added a commit to axelson/mix_task_archive_deps that referenced this pull request Oct 29, 2021
Specifically will be used to not distribute nimble_parsec after merging
elixir-lsp/elixir-ls#609
@axelson
Copy link
Member

axelson commented Oct 29, 2021

Thanks! This approach looks like a serious improvement over what we were doing previously 🎉

I haven't taken an in-depth look at the code yet (but I plan to soon!) but we will want to vendor path_glob (similar to what we did with jason) in case a user tries to use a different version of path_glob in their project. And since I'd like to avoid vendoring nimble_parsec as well (since it is larger and more widely used) I'm submitting a couple PR's:

@jonleighton
Copy link
Contributor Author

jonleighton commented Oct 30, 2021

@axelson Thanks for that, I've merged your PR and released v0.1.1 with it.

I'll just note for completeness that I had a chat with José Valim about adding a Path.match?() function to Elixir's standard library. He was keen on the idea but said that because Path.wildcard() is built on Erlang's filelib, it would need to be added to Erlang first. It turns out that there is a private function in filelib that provides a parse tree for a glob expression so it wouldn't require implementing a parser, just converting the parse tree into a regex. (I didn't realise this when writing path_glob... but it was a fun exploration of nimble_parsec anyway.)

I put that in the "maybe one day" basket. I don't think I'm going to make time to do that in the near future but it would be a good addition to the standard library I think and I understand why he doesn't want to just include/maintain a whole new parser in Elixir. In any case, elixir_ls couldn't use it until the minimum Elixir version supported includes this function.

axelson added a commit to elixir-lsp/mix_task_archive_deps that referenced this pull request Nov 18, 2021
Specifically will be used to not distribute nimble_parsec after merging
elixir-lsp/elixir-ls#609
@axelson
Copy link
Member

axelson commented Nov 18, 2021

@jonleighton Thanks for that write-up and additional detail. I just opened #628 once we've tested and merged that I think this PR should be good to go!

Copy link
Member

@axelson axelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Thanks! ❤️

@axelson axelson merged commit befc60e into elixir-lsp:master Dec 4, 2021
lukebakken pushed a commit to rabbitmq/mix_task_archive_deps that referenced this pull request Dec 13, 2022
Specifically will be used to not distribute nimble_parsec after merging
elixir-lsp/elixir-ls#609

Update workflow
lukebakken pushed a commit to rabbitmq/mix_task_archive_deps that referenced this pull request Dec 13, 2022
Specifically will be used to not distribute nimble_parsec after merging
elixir-lsp/elixir-ls#609

Update workflow
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants