Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache data fetched from GitHub between builds #12

Open
squidfunk opened this issue May 28, 2022 · 20 comments
Open

Cache data fetched from GitHub between builds #12

squidfunk opened this issue May 28, 2022 · 20 comments

Comments

@squidfunk
Copy link
Collaborator

First of all, great plugin! I'm evaluating adding native support for this plugin to Material for MkDocs. The main challenge is the increase in build times – they are pretty significant. For instance, the build time for Material for MkDocs went up from 10s to 77s. Would it be possible to cache results?

The plugin would only need to cache the list of contributors per file + the information obtained for each contributor. If this data is normalized, fetching contributor information should only be necessary when there's a new contributor that hasn't been seen before. Other than that, the data is pretty much static – login, name, avatar URL. The name could change, but I'd suspect this is rarely the case. Also, for a page, the data would only need to be re-fetched when the commit SHA changed.

Happy to help.

@squidfunk squidfunk changed the title Cache results between builds Cache data fetched from GitHub between builds May 28, 2022
@ojacques
Copy link
Collaborator

@squidfunk, I hate to hijack this issue, but performance is one big reason I had to fork this project to create my own (I really dislike forking, but had to).
The fork adds cache, tries to get as much info as possible from local git repo and tries hard to reduce hits on GitHub's API. It also uses GitHub's GraphQL API for efficiency.

https://github.com/ojacques/mkdocs-git-committers-plugin-2

I have happily used it with mkdocs-material, although an old version. Happy to help. Also happy to get my changes in this upstream repo.

@squidfunk
Copy link
Collaborator Author

@ojacques thanks! I've test-driven your project, but unfortuntely it produced inconsistent results for me. For example, my account (@squidfunk) was not correctly determined (login was empty). Maybe you can try it on the Material for MkDocs repository and track it down. I'm currently working on a tighter GitHub integration for this plugin and the git-revision-date(-localized) plugin.

@ojacques
Copy link
Collaborator

Maybe you can try it on the Material for MkDocs repository and track it down. I'm currently working on a tighter GitHub integration for this plugin and the git-revision-date(-localized) plugin.

Yes, I'll have a look. If this could be part of material proper, it would help me a ton. The short video is awesome (although I loved having contributors on top and not at the bottom).

@ojacques
Copy link
Collaborator

@squidfunk - after testing against mkdocs-material, I pushed a new version (0.4.0), which:

  • handles more (all?) cases
  • remove duplicated authors even if GIT authentication was different

It does not cache between builds though, but makes sure to only query info for a user once. To be precise, it will try first to get the info with the email then the name if not found, then the GitHub username if still not found.

Caching between builds is not there yet, but I can work on it by saving the dictionary file at the end of site generation and attempting to load it if it exists at the beginning.

@ojacques
Copy link
Collaborator

Caching between builds is not there yet, but I can work on it by saving the dictionary file at the end of site generation and attempting to load it if it exists at the beginning.

Done in 0.4.1. Although I'm not sure this is the best way/place to save this cache file. There is a risk for it to be committed if not added as part of .gitignore.

But hey, mkdocs-material build with authors is now 7 seconds for me, after the first run!

@squidfunk
Copy link
Collaborator Author

Awesome, thanks for investing the time! I'll check it out the next time I work on this enhancement. I just put files to cache into a folder called .cache when I write plugins, e.g. for the built-in privacy plugin.

Yes, I'll have a look. If this could be part of material proper, it would help me a ton. The short video is awesome (although I loved having contributors on top and not at the bottom).

You will be able to easily change the location by overriding the content partial which controls the content area of the document. You'd just need to move the section to the top.

@squidfunk
Copy link
Collaborator Author

squidfunk commented Jun 18, 2022

I can confirm that the build time went waaaaay down, awesome job 🚀 I'll now consider it safe to recommend this plugin! The improved git plugin integration will be released in the coming weeks, I'll put a link to your plugin in our documentation.

One more thing: could we maybe make the cache folder configurable? I'd like to keep my .gitignore as short as possible, which is why I'd want to save authors.json to .cache. Could we maybe add a config option for that?

@byrnereese sorry for hi-jacking this issue, we should really have discussed this over at the fork.

Edit: Material for MkDocs' plugins will now write their caches into subfolders. Maybe this is also something to consider for the mkdocs-git-committers-plugin. We're currently using:

  • .cache/plugins/privacy
  • .cache/plugins/social

We recommend caching on our docs, which makes use of the .cache folder. To limit confusion, it might be a good idea to settle on that name, but it's certainly not necessary. If a setting would be available, we can recommend setting it to .cache/plugin/git-committers on our documentation 😊

@byrnereese
Copy link
Owner

Why don't you issue a PR so we can stay in sync. No need to fork if we are all aligned.

@ojacques
Copy link
Collaborator

@byrnereese Absolutely. I will issue a PR against this repo. It may be big / hard to review. I apologize in advance.
Also, I got contributions lately against my fork improving this even further.
@squidfunk - I will expose the cache location as config, defaulting to the one you suggested above.

@squidfunk
Copy link
Collaborator Author

squidfunk commented Jun 22, 2022

@byrnereese your plugin looks a little unmaintained, as there are three open issues without any answers and there's not much commit activity. I guess this is also why the fork exists. If we could gravitate towards one solution, that would be awesome, but we'd need to be able to push out bug fixes fast.

I'd expect some increase on this or @ojacques' repository, depending on what we recommend using on our docs, so the moment this natively is supported by Material for MkDocs, there's also likely to be more exposure for your plugins, more users, and with that potentially more bugs and edge cases. The question is whether you're up for maintaining it.

However, consolidation of efforts would be something to strive for.

@byrnereese
Copy link
Owner

I can certainly help when I am able. Happy to make you maintainer on this repository if that would give you more access, control and autonomy.

This is embarrassing, but for some reason I don't always get notified via email of new issues or PRs. Never figured out why. I will look into that so that I can be more aware of issues that come in.

@squidfunk
Copy link
Collaborator Author

@byrnereese you can customize notification settings per repository here:

Bildschirmfoto 2022-06-23 um 07 56 48

@squidfunk
Copy link
Collaborator Author

@ojacques @byrnereese insiders-4.19.0 adds native support for the git-authors and git-committers plugins. We currently recommend using @ojacques' fork because it implements caching. Otherwise, build times would skyrocket. If in the future, you both decide to merge efforts, please consider PR'ing a change to our documentation 😊 Here's how it looks:

Bildschirmfoto 2022-06-24 um 15 33 13

@ojacques
Copy link
Collaborator

This is so cool! Getting contributors listed in pages has been a big incentive for readers to be turned into contributors.

I will look at the first issue reported and continue to submit changes back to upstream.

@byrnereese - once / if you agree, happy to be made a maintainer of this repo and continue working with you.

@timvink
Copy link
Contributor

timvink commented Jun 24, 2022

Awesome work @squidfunk and @ojacques , really cool stuff!

@byrnereese
Copy link
Owner

@ojacques I have not been a good partner in maintaining this project and would like to finally resolve the fact that we have outstanding features in a fork. I have made you a collaborator on this project, and can give you rights to the pypi project as well. Let me know how best to proceed - as I would love to fold your many improvements into the plugin.

@ojacques
Copy link
Collaborator

Thanks @byrnereese! Although fulfilling the same goal, the plugin has now a very different implementation, directly getting contributors from GitHub (although not through an API). Along the way, I am not sure I did not broke other usage of this plugin (for example, it will not work with anything other than GitHub).
We will have to think how to best merge the two, without breaking others.

@byrnereese
Copy link
Owner

@ojacques I think that is a fine direction to be honest. My plugin similarly was tied to github - although it also leveraged plain git. But the linking to people's profiles was very much github-centric. Would it make sense to do a hard merge? I think the risk of breaking people is minor as the utility of the plugin is specifically for mkdocs. But I think it would be good to canonize one of these plugins as the official one.

@timvink
Copy link
Contributor

timvink commented Apr 12, 2023

While we're discussing the set of git-based mkdocs plugins, it might also be relevant to discuss 'plain git'.

I feel there's potential to build a standard, lower-level mkdocs-git-info-plugin that adds meta data to page.file objects that other plugins can use (if you add an on_startup() method the plugin state is preserved and accessible for other plugins). There is a surprising amount of edge cases in retrieving git info, as I've learned from maintaining mkdocs-git-revision-date-localized-plugin and mkdocs-git-authors-plugin. It could also take care of caching and such.

I've started an implementation at https://github.com/timvink/mkdocs-git-info-plugin that has some basic working code, but it definitely needs a lot more time and careful design before I can integrate into my existing plugins.

@byrnereese
Copy link
Owner

@timvink I am happy to lend a hand in getting the plugin fleshed out more. What specifically will this plugin do and make available to mkdocs developers?

Taking a step back, perhaps what all of us on this thread can do is come up with a logic structure for these plugins so that they are designed to work in a standalone manner or in concert with one another.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants