Prevent loading all GH data on reset #245

dunkmann00 · 2023-01-07T01:30:51Z

The way this was setup, when running build or serve with --watch, inspect was called and was forcing everything to load. This could cause a very large number of API calls to GitHub. With these changes GitHubMetadata will no longer trigger all of those calls on reset.

The other scenario where a bunch of calls were being forced to be made was when putting the github key with any sub-keys in _config.yml. This went through every key in GitHubMetadata and loaded it, once again potentially making a very large number of calls. These changes take care of this scenario as well.

I was running into this problem when trying to make changes to a theme using this. Every time I ran tests on the theme it would incur a bunch of API calls and I would quickly get a rate limit error and have to wait some time for it to go away. I saw this was originally brought up as a problem in #237, there was a fix but it didn't totally fix it so this bug was brought up again in #242. I hope these changes are acceptable because it would be really nice to get rid of this problem.

Let me know if there is anything else I can do to help fix this!

dunkmann00 · 2023-01-22T15:32:02Z

Hello! Has anyone had a chance to take a look at this?

@parkr I see you have historically been the one to work on this repo...my apologies in advance if you no longer do!

lib/jekyll-github-metadata/repository.rb

parkr · 2023-01-22T04:23:37Z

lib/jekyll-github-metadata/metadata_drop.rb

@@ -10,6 +10,11 @@ class MetadataDrop < Jekyll::Drops::Drop

      mutable true

+      def initialize(obj, original_config = nil)
+        super(obj)
+        @original_config = original_config


This second argument doesn't seem to do anything? Or is the issue that #inspect is being called during logging and #inspect is therefore triggering all the API calls?

So there are two places that will always trigger loading of all the keys. The first place is the part you commented on below, when the Drop is merged with site.config. But another spot is when when are locally running serve, a change is made, and the site it rebuilt. This is the behavior described in #242.

That is happening because jekyll-watch reprocesses the site each time it observes a change:
https://github.com/jekyll/jekyll-watch/blob/9c9d2ab3c995784343c7791bc6f02f7c6b2dc1e5/lib/jekyll/watcher.rb#L64 ->
https://github.com/jekyll/jekyll-watch/blob/9c9d2ab3c995784343c7791bc6f02f7c6b2dc1e5/lib/jekyll/watcher.rb#L128

This then gets into Jekyll site.process where reset is called:
https://github.com/jekyll/jekyll/blob/ff60d51c9b59d3bb5cc89f6a4f493602b42c5cac/lib/jekyll/site.rb#L77

In reset, Jekyll::Cache.clear_if_config_changed is called:
https://github.com/jekyll/jekyll/blob/ff60d51c9b59d3bb5cc89f6a4f493602b42c5cac/lib/jekyll/site.rb#L118

And in clear_if_config_changed is where config.inspect is getting called:
https://github.com/jekyll/jekyll/blob/ff60d51c9b59d3bb5cc89f6a4f493602b42c5cac/lib/jekyll/cache.rb#L35

And that is the second location that forces loading of all the keys.

So by storing @original_config, we can always pass that back out whenever inspect is called:
https://github.com/dunkmann00/github-metadata/blob/8819b50b5ab143889de75f05204df0cd33e88376/lib/jekyll-github-metadata/metadata_drop.rb#L28

This then avoids all the API calls.

Hello @dunkmann00.
What version of this plugin are you using? To my knowledge, the issue with Jekyll Cache calling config.inspect was fixed once the plugin stopped depending on site.config["github"] from v2.15.0 via a771d0d.
Therefore, only the Drop being cast into a Hash should trigger the resolution of all drop keys.

Just asking for clarity.

Hello @ashmaroli! I am using 2.15.0. The change via the commit you reference fixes the problem on the initial load and render of the site. However, when the site is rebuilt automatically while running jekyll serve, the problem presents itself again.

The original fix waits to inject site.config["github"] until after process and therefore reset is called. But when jekyll-watch notices a change and calls process, site.config["github"] has already been injected. Since it is already injected, we see the original problem occurring once again.

Jekyll::Hooks.register :site, :pre_render do |_site, payload| SiteGitHubMunger.global_munger.inject_metadata!(payload) end

Is there a post_render hook where we could un-inject this?

🥳 That seems like a great idea! I think at one point this thought crossed my mind but I didn't understand enough of how the library worked so I didn't pursue it. But looking again this totally seems doable. I'll update the PR with this change.

parkr · 2023-01-23T05:28:05Z

lib/jekyll-github-metadata/site_github_munger.rb

@@ -35,14 +35,14 @@ def github_namespace
        when nil
          drop
        when Hash
-          Jekyll::Utils.deep_merge_hashes(drop, site.config["github"])
+          drop.merge(site.config["github"])


The real fix seems to be this change, since #deep_merge_hashes uses merge! which resolves all the keys in the Drop. Perhaps instead of this, we could modify the Drop to fall back to make a copy of the original site.config["github"] and use it instead of the computed Drop value. So when a user puts site.github.repo in their Liquid template, it queries the Drop, which first looks at site.config["github"]["repo"], then at Drop#repo as a fallback. Drop#repo would then make the necessary API calls only if called.

Maybe? But that's mostly what is happening with this approach, or so I am pretty sure.

Drop.merge uses merge! ultimately as well, so you are getting the same merge behavior, but instead of running it on every key from the drop, it gets run on every key from site.config["github"].

And that merge puts everything in the mutations hash, and checks that first when the drop is queried.

The way this was setup, when running `build` or `serve` with `--watch`, `inspect` was called and was forcing everything to load. This could cause a very large number of API calls to GitHub. We now uninject the MetadataDrop after rendering is complete. With this change `GitHubMetadata` will no longer trigger all of those calls on reset. The other scenario where a bunch of calls were being forced to be made was when putting the `github` key with any sub-keys in `_config.yml`. This went through every key in `GitHubMetadata` and loaded it, once again potentially making a very large number of calls. These changes take care of this scenario as well.

parkr · 2023-02-05T15:47:27Z

Running the tests. This LGTM otherwise.

In your tests locally, it prevents the issue?

dunkmann00 · 2023-02-06T00:27:16Z

@parkr Yep, when running jekyll serve --verbose I can see from the Github Metadata log message it is only loading things that are actually used. And this is true when there are github variables set in _config.yml and also when the page is regenerated after it detects a change.

parkr

Ok last question – can we add a test for this? It looks like the code for the --watch flag calls site.process each time: https://github.com/jekyll/jekyll-watch/blob/9c9d2ab3c995784343c7791bc6f02f7c6b2dc1e5/lib/jekyll/watcher.rb#L128

dunkmann00 · 2023-02-11T00:42:34Z

Sure I'd be happy to add a test. I'm not sure how to test this though? What did you have in mind?

parkr · 2023-02-11T16:59:11Z

Thanks for jumping through these hoops, a test will be really helpful if we can set it up properly. I'd want to:

Run site.process once on the fixture site
Assert that each API call happens, and that it happens once (or the proper # of times)
Run site.process a second time, asserting that no more API calls occur.

dunkmann00 · 2023-02-13T21:43:17Z

I'm not very familiar with ruby testing and I can't find anything similar to what you're asking in other tests to use as a kind of guide. I can try to put something together but I really don't know how to make a test to do this. Would this test go in site_github_munger_spec.rb? I think it needs its own context? How do I go about intercepting API calls to count them?

parkr · 2023-02-15T01:50:17Z

Run process twice and use expect_api_call time ensure things are called once. https://github.com/jekyll/github-metadata/blob/main/spec/spec_helpers/web_mock_helper.rb

I'm on mobile now but can take a closer look soon!

This commit adds a new fixture site that doesn't load all of `site.github`. It just loads a few items. We then check that those items are loaded once after the first and second times the site is processed. We also use a new spec helper function `not_expect_api_call` to make sure other API calls are not made. This test checks a few enpoints that get loaded when the entire `site.github` is loaded. Before adding the uninject code, these API endpoints would have gotten called, despite not being used by the fixture site. With the uninject changes, they are no longer called.

dunkmann00 · 2023-02-16T05:00:41Z

Okay made some good progress!

I realized that the tests you were describing weren't going to completely test this new functionality because the point of this isn't to make sure API calls only happen once. That was already happening, so that is useful to test but not the whole picture. The point of these changes was to make sure we don't load everything just because github was used in _config.yml or reset is triggered while jekyll serve is running.

So in this new commit I:

Create a new fixture site that only uses a few site.github attributes
Process the site and check that those endpoints are loaded once
Process the site again and check that those endpoints are still only loaded once
Using a new web_mock_helper.rb function, not_expect_api_call, we check to make sure other endpoints that should not have been called are not called

Let me know what you think.

parkr

Nicely done, @dunkmann00! I have one comment requesting a comment and another asking your thoughts about ensuring the HTML output is correct after the second site.process call. Other than those two things, this LGTM to ship!

lib/jekyll-github-metadata/repository.rb

spec/site_github_munger_spec.rb

This adds a check to make sure the files in the rendered site match fixture files that are known to have the correct `site.github` data, after running `site.process` twice. This also adds code comments explaining the rationale and purpose behind the different parts of the `uninject` test.

dunkmann00 · 2023-02-16T20:54:20Z

@parkr I made those changes! I also added another fixture site, this time it is the rendered version of the new fixture site. I used it for checking if the rendered site is correct.

spec/test-site-uninject-rendered/index.html

Not relevant..

parkr · 2023-02-17T17:46:41Z

@jekyllbot: merge +minor

(I'm going minor here as a reminder to bump the minor version. It's a bug fix in a way but it's also a slight change in behavior that folks might find surprising.)

dunkmann00 · 2023-02-17T18:06:00Z

Awesome! Thanks @parkr and @ashmaroli for helping with and reviewing this! And for all that you do for Jekyll ❤️

parkr · 2023-02-17T18:22:34Z

Thanks for working through this with us!

Make sure repos is always an array

9fa02ad

ashmaroli requested a review from parkr January 22, 2023 16:57

parkr reviewed Jan 23, 2023

View reviewed changes

dunkmann00 requested a review from parkr January 26, 2023 21:29

dunkmann00 force-pushed the repos-fix branch from 8819b50 to 2c265a5 Compare January 31, 2023 21:37

parkr reviewed Feb 6, 2023

View reviewed changes

dunkmann00 force-pushed the repos-fix branch from 20bbc52 to eced31d Compare February 16, 2023 04:45

parkr approved these changes Feb 16, 2023

View reviewed changes

lib/jekyll-github-metadata/repository.rb Show resolved Hide resolved

spec/site_github_munger_spec.rb Show resolved Hide resolved

spec/site_github_munger_spec.rb Outdated Show resolved Hide resolved

Fix for Rubocop

39e44c3

This was referenced Feb 16, 2023

Allow Rubocop versions above 1.18.x? jekyll/rubocop-jekyll#23

Closed

Release v0.13.0 jekyll/rubocop-jekyll#30

Closed

ashmaroli previously requested changes Feb 17, 2023

View reviewed changes

spec/test-site-uninject-rendered/index.html Show resolved Hide resolved

jekyllbot merged commit 2c4f37f into jekyll:main Feb 17, 2023

jekyllbot added the enhancement label Feb 17, 2023

jekyllbot added a commit that referenced this pull request Feb 17, 2023

Update history to reflect merge of #245 [ci skip]

388a7fe

dunkmann00 deleted the repos-fix branch February 17, 2023 18:06

jekyll locked and limited conversation to collaborators Feb 17, 2024

jekyllbot added the frozen-due-to-age label Feb 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prevent loading all GH data on reset #245

Prevent loading all GH data on reset #245

dunkmann00 commented Jan 7, 2023

dunkmann00 commented Jan 22, 2023

parkr Jan 22, 2023

dunkmann00 Jan 23, 2023

ashmaroli Jan 23, 2023

dunkmann00 Jan 23, 2023

parkr Jan 30, 2023

dunkmann00 Jan 31, 2023

parkr Jan 23, 2023

dunkmann00 Jan 23, 2023

parkr commented Feb 5, 2023

dunkmann00 commented Feb 6, 2023

parkr left a comment

dunkmann00 commented Feb 11, 2023

parkr commented Feb 11, 2023

dunkmann00 commented Feb 13, 2023

parkr commented Feb 15, 2023

dunkmann00 commented Feb 16, 2023

parkr left a comment

dunkmann00 commented Feb 16, 2023

parkr commented Feb 17, 2023

dunkmann00 commented Feb 17, 2023

parkr commented Feb 17, 2023

Prevent loading all GH data on reset #245

Prevent loading all GH data on reset #245

Conversation

dunkmann00 commented Jan 7, 2023

dunkmann00 commented Jan 22, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

parkr commented Feb 5, 2023

dunkmann00 commented Feb 6, 2023

parkr left a comment

Choose a reason for hiding this comment

dunkmann00 commented Feb 11, 2023

parkr commented Feb 11, 2023

dunkmann00 commented Feb 13, 2023

parkr commented Feb 15, 2023

dunkmann00 commented Feb 16, 2023

parkr left a comment

Choose a reason for hiding this comment

dunkmann00 commented Feb 16, 2023

parkr commented Feb 17, 2023

dunkmann00 commented Feb 17, 2023

parkr commented Feb 17, 2023