Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent loading all GH data on reset #245

Merged
merged 5 commits into from
Feb 17, 2023
Merged

Conversation

dunkmann00
Copy link
Contributor

The way this was setup, when running build or serve with --watch, inspect was called and was forcing everything to load. This could cause a very large number of API calls to GitHub. With these changes GitHubMetadata will no longer trigger all of those calls on reset.

The other scenario where a bunch of calls were being forced to be made was when putting the github key with any sub-keys in _config.yml. This went through every key in GitHubMetadata and loaded it, once again potentially making a very large number of calls. These changes take care of this scenario as well.

I was running into this problem when trying to make changes to a theme using this. Every time I ran tests on the theme it would incur a bunch of API calls and I would quickly get a rate limit error and have to wait some time for it to go away. I saw this was originally brought up as a problem in #237, there was a fix but it didn't totally fix it so this bug was brought up again in #242. I hope these changes are acceptable because it would be really nice to get rid of this problem.

Let me know if there is anything else I can do to help fix this!

@dunkmann00
Copy link
Contributor Author

Hello! Has anyone had a chance to take a look at this?

@parkr I see you have historically been the one to work on this repo...my apologies in advance if you no longer do!

@ashmaroli ashmaroli requested a review from parkr January 22, 2023 16:57
lib/jekyll-github-metadata/repository.rb Show resolved Hide resolved
@@ -10,6 +10,11 @@ class MetadataDrop < Jekyll::Drops::Drop

mutable true

def initialize(obj, original_config = nil)
super(obj)
@original_config = original_config
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This second argument doesn't seem to do anything? Or is the issue that #inspect is being called during logging and #inspect is therefore triggering all the API calls?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there are two places that will always trigger loading of all the keys. The first place is the part you commented on below, when the Drop is merged with site.config. But another spot is when when are locally running serve, a change is made, and the site it rebuilt. This is the behavior described in #242.

That is happening because jekyll-watch reprocesses the site each time it observes a change:
https://github.com/jekyll/jekyll-watch/blob/9c9d2ab3c995784343c7791bc6f02f7c6b2dc1e5/lib/jekyll/watcher.rb#L64 ->
https://github.com/jekyll/jekyll-watch/blob/9c9d2ab3c995784343c7791bc6f02f7c6b2dc1e5/lib/jekyll/watcher.rb#L128

This then gets into Jekyll site.process where reset is called:
https://github.com/jekyll/jekyll/blob/ff60d51c9b59d3bb5cc89f6a4f493602b42c5cac/lib/jekyll/site.rb#L77

In reset, Jekyll::Cache.clear_if_config_changed is called:
https://github.com/jekyll/jekyll/blob/ff60d51c9b59d3bb5cc89f6a4f493602b42c5cac/lib/jekyll/site.rb#L118

And in clear_if_config_changed is where config.inspect is getting called:
https://github.com/jekyll/jekyll/blob/ff60d51c9b59d3bb5cc89f6a4f493602b42c5cac/lib/jekyll/cache.rb#L35

And that is the second location that forces loading of all the keys.

So by storing @original_config, we can always pass that back out whenever inspect is called:
https://github.com/dunkmann00/github-metadata/blob/8819b50b5ab143889de75f05204df0cd33e88376/lib/jekyll-github-metadata/metadata_drop.rb#L28

This then avoids all the API calls.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @dunkmann00.
What version of this plugin are you using? To my knowledge, the issue with Jekyll Cache calling config.inspect was fixed once the plugin stopped depending on site.config["github"] from v2.15.0 via a771d0d.
Therefore, only the Drop being cast into a Hash should trigger the resolution of all drop keys.

Just asking for clarity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @ashmaroli! I am using 2.15.0. The change via the commit you reference fixes the problem on the initial load and render of the site. However, when the site is rebuilt automatically while running jekyll serve, the problem presents itself again.

The original fix waits to inject site.config["github"] until after process and therefore reset is called. But when jekyll-watch notices a change and calls process, site.config["github"] has already been injected. Since it is already injected, we see the original problem occurring once again.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    Jekyll::Hooks.register :site, :pre_render do |_site, payload|
      SiteGitHubMunger.global_munger.inject_metadata!(payload)
    end

Is there a post_render hook where we could un-inject this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥳 That seems like a great idea! I think at one point this thought crossed my mind but I didn't understand enough of how the library worked so I didn't pursue it. But looking again this totally seems doable. I'll update the PR with this change.

@@ -35,14 +35,14 @@ def github_namespace
when nil
drop
when Hash
Jekyll::Utils.deep_merge_hashes(drop, site.config["github"])
drop.merge(site.config["github"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The real fix seems to be this change, since #deep_merge_hashes uses merge! which resolves all the keys in the Drop. Perhaps instead of this, we could modify the Drop to fall back to make a copy of the original site.config["github"] and use it instead of the computed Drop value. So when a user puts site.github.repo in their Liquid template, it queries the Drop, which first looks at site.config["github"]["repo"], then at Drop#repo as a fallback. Drop#repo would then make the necessary API calls only if called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe? But that's mostly what is happening with this approach, or so I am pretty sure.

Drop.merge uses merge! ultimately as well, so you are getting the same merge behavior, but instead of running it on every key from the drop, it gets run on every key from site.config["github"].

And that merge puts everything in the mutations hash, and checks that first when the drop is queried.

@dunkmann00 dunkmann00 requested a review from parkr January 26, 2023 21:29
The way this was setup, when running `build` or `serve` with `--watch`,
`inspect` was called and was forcing everything to load. This could
cause a very large number of API calls to GitHub. We now uninject the
MetadataDrop after rendering is complete. With this change
`GitHubMetadata` will no longer trigger all of those calls on reset.

The other scenario where a bunch of calls were being forced to be made
was when putting the `github` key with any sub-keys in `_config.yml`.
This went through every key in `GitHubMetadata` and loaded it, once
again potentially making a very large number of calls. These changes
take care of this scenario as well.
@parkr
Copy link
Member

parkr commented Feb 5, 2023

Running the tests. This LGTM otherwise.

In your tests locally, it prevents the issue?

@dunkmann00
Copy link
Contributor Author

@parkr Yep, when running jekyll serve --verbose I can see from the Github Metadata log message it is only loading things that are actually used. And this is true when there are github variables set in _config.yml and also when the page is regenerated after it detects a change.

Copy link
Member

@parkr parkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok last question – can we add a test for this? It looks like the code for the --watch flag calls site.process each time: https://github.com/jekyll/jekyll-watch/blob/9c9d2ab3c995784343c7791bc6f02f7c6b2dc1e5/lib/jekyll/watcher.rb#L128

@dunkmann00
Copy link
Contributor Author

Sure I'd be happy to add a test. I'm not sure how to test this though? What did you have in mind?

@parkr
Copy link
Member

parkr commented Feb 11, 2023

Thanks for jumping through these hoops, a test will be really helpful if we can set it up properly. I'd want to:

  1. Run site.process once on the fixture site
  2. Assert that each API call happens, and that it happens once (or the proper # of times)
  3. Run site.process a second time, asserting that no more API calls occur.

@dunkmann00
Copy link
Contributor Author

I'm not very familiar with ruby testing and I can't find anything similar to what you're asking in other tests to use as a kind of guide. I can try to put something together but I really don't know how to make a test to do this. Would this test go in site_github_munger_spec.rb? I think it needs its own context? How do I go about intercepting API calls to count them?

@parkr
Copy link
Member

parkr commented Feb 15, 2023

Run process twice and use expect_api_call time ensure things are called once. https://github.com/jekyll/github-metadata/blob/main/spec/spec_helpers/web_mock_helper.rb

I'm on mobile now but can take a closer look soon!

This commit adds a new fixture site that doesn't load all of
`site.github`. It just loads a few items. We then check that those items
are loaded once after the first and second times the site is processed.

We also use a new spec helper function `not_expect_api_call` to make
sure other API calls are not made. This test checks a few enpoints that
get loaded when the entire `site.github` is loaded. Before adding the
uninject code, these API endpoints would have gotten called, despite not
being used by the fixture site. With the uninject changes, they are no
longer called.
@dunkmann00
Copy link
Contributor Author

Okay made some good progress!

I realized that the tests you were describing weren't going to completely test this new functionality because the point of this isn't to make sure API calls only happen once. That was already happening, so that is useful to test but not the whole picture. The point of these changes was to make sure we don't load everything just because github was used in _config.yml or reset is triggered while jekyll serve is running.

So in this new commit I:

  1. Create a new fixture site that only uses a few site.github attributes
  2. Process the site and check that those endpoints are loaded once
  3. Process the site again and check that those endpoints are still only loaded once
  4. Using a new web_mock_helper.rb function, not_expect_api_call, we check to make sure other endpoints that should not have been called are not called

Let me know what you think.

Copy link
Member

@parkr parkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done, @dunkmann00! I have one comment requesting a comment and another asking your thoughts about ensuring the HTML output is correct after the second site.process call. Other than those two things, this LGTM to ship!

lib/jekyll-github-metadata/repository.rb Show resolved Hide resolved
spec/site_github_munger_spec.rb Show resolved Hide resolved
spec/site_github_munger_spec.rb Outdated Show resolved Hide resolved
This adds a check to make sure the files in the rendered site match
fixture files that are known to have the correct `site.github` data,
after running `site.process` twice.

This also adds code comments explaining the rationale and purpose
behind the different parts of the `uninject` test.
@dunkmann00
Copy link
Contributor Author

@parkr I made those changes! I also added another fixture site, this time it is the rendered version of the new fixture site. I used it for checking if the rendered site is correct.

@ashmaroli ashmaroli dismissed their stale review February 17, 2023 17:15

Not relevant..

@parkr
Copy link
Member

parkr commented Feb 17, 2023

@jekyllbot: merge +minor

(I'm going minor here as a reminder to bump the minor version. It's a bug fix in a way but it's also a slight change in behavior that folks might find surprising.)

@jekyllbot jekyllbot merged commit 2c4f37f into jekyll:main Feb 17, 2023
jekyllbot added a commit that referenced this pull request Feb 17, 2023
@dunkmann00
Copy link
Contributor Author

Awesome! Thanks @parkr and @ashmaroli for helping with and reviewing this! And for all that you do for Jekyll ❤️

@dunkmann00 dunkmann00 deleted the repos-fix branch February 17, 2023 18:06
@parkr
Copy link
Member

parkr commented Feb 17, 2023

Thanks for working through this with us!

@jekyll jekyll locked and limited conversation to collaborators Feb 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants