-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent loading all GH data on reset #245
Conversation
Hello! Has anyone had a chance to take a look at this? @parkr I see you have historically been the one to work on this repo...my apologies in advance if you no longer do! |
@@ -10,6 +10,11 @@ class MetadataDrop < Jekyll::Drops::Drop | |||
|
|||
mutable true | |||
|
|||
def initialize(obj, original_config = nil) | |||
super(obj) | |||
@original_config = original_config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This second argument doesn't seem to do anything? Or is the issue that #inspect
is being called during logging and #inspect
is therefore triggering all the API calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there are two places that will always trigger loading of all the keys. The first place is the part you commented on below, when the Drop
is merged with site.config
. But another spot is when when are locally running serve
, a change is made, and the site it rebuilt. This is the behavior described in #242.
That is happening because jekyll-watch
reprocesses the site each time it observes a change:
https://github.com/jekyll/jekyll-watch/blob/9c9d2ab3c995784343c7791bc6f02f7c6b2dc1e5/lib/jekyll/watcher.rb#L64 ->
https://github.com/jekyll/jekyll-watch/blob/9c9d2ab3c995784343c7791bc6f02f7c6b2dc1e5/lib/jekyll/watcher.rb#L128
This then gets into Jekyll
site.process
where reset
is called:
https://github.com/jekyll/jekyll/blob/ff60d51c9b59d3bb5cc89f6a4f493602b42c5cac/lib/jekyll/site.rb#L77
In reset
, Jekyll::Cache.clear_if_config_changed
is called:
https://github.com/jekyll/jekyll/blob/ff60d51c9b59d3bb5cc89f6a4f493602b42c5cac/lib/jekyll/site.rb#L118
And in clear_if_config_changed
is where config.inspect
is getting called:
https://github.com/jekyll/jekyll/blob/ff60d51c9b59d3bb5cc89f6a4f493602b42c5cac/lib/jekyll/cache.rb#L35
And that is the second location that forces loading of all the keys.
So by storing @original_config
, we can always pass that back out whenever inspect
is called:
https://github.com/dunkmann00/github-metadata/blob/8819b50b5ab143889de75f05204df0cd33e88376/lib/jekyll-github-metadata/metadata_drop.rb#L28
This then avoids all the API calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @dunkmann00.
What version of this plugin are you using? To my knowledge, the issue with Jekyll Cache calling config.inspect
was fixed once the plugin stopped depending on site.config["github"]
from v2.15.0 via a771d0d.
Therefore, only the Drop being cast into a Hash should trigger the resolution of all drop keys.
Just asking for clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @ashmaroli! I am using 2.15.0
. The change via the commit you reference fixes the problem on the initial load and render of the site. However, when the site is rebuilt automatically while running jekyll serve
, the problem presents itself again.
The original fix waits to inject site.config["github"]
until after process
and therefore reset
is called. But when jekyll-watch
notices a change and calls process
, site.config["github"]
has already been injected. Since it is already injected, we see the original problem occurring once again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jekyll::Hooks.register :site, :pre_render do |_site, payload|
SiteGitHubMunger.global_munger.inject_metadata!(payload)
end
Is there a post_render hook where we could un-inject this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥳 That seems like a great idea! I think at one point this thought crossed my mind but I didn't understand enough of how the library worked so I didn't pursue it. But looking again this totally seems doable. I'll update the PR with this change.
@@ -35,14 +35,14 @@ def github_namespace | |||
when nil | |||
drop | |||
when Hash | |||
Jekyll::Utils.deep_merge_hashes(drop, site.config["github"]) | |||
drop.merge(site.config["github"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The real fix seems to be this change, since #deep_merge_hashes
uses merge!
which resolves all the keys in the Drop
. Perhaps instead of this, we could modify the Drop
to fall back to make a copy of the original site.config["github"]
and use it instead of the computed Drop value. So when a user puts site.github.repo
in their Liquid template, it queries the Drop, which first looks at site.config["github"]["repo"]
, then at Drop#repo
as a fallback. Drop#repo
would then make the necessary API calls only if called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe? But that's mostly what is happening with this approach, or so I am pretty sure.
Drop.merge
uses merge! ultimately as well, so you are getting the same merge behavior, but instead of running it on every key from the drop, it gets run on every key from site.config["github"]
.
And that merge puts everything in the mutations hash, and checks that first when the drop is queried.
The way this was setup, when running `build` or `serve` with `--watch`, `inspect` was called and was forcing everything to load. This could cause a very large number of API calls to GitHub. We now uninject the MetadataDrop after rendering is complete. With this change `GitHubMetadata` will no longer trigger all of those calls on reset. The other scenario where a bunch of calls were being forced to be made was when putting the `github` key with any sub-keys in `_config.yml`. This went through every key in `GitHubMetadata` and loaded it, once again potentially making a very large number of calls. These changes take care of this scenario as well.
Running the tests. This LGTM otherwise. In your tests locally, it prevents the issue? |
@parkr Yep, when running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok last question – can we add a test for this? It looks like the code for the --watch
flag calls site.process
each time: https://github.com/jekyll/jekyll-watch/blob/9c9d2ab3c995784343c7791bc6f02f7c6b2dc1e5/lib/jekyll/watcher.rb#L128
Sure I'd be happy to add a test. I'm not sure how to test this though? What did you have in mind? |
Thanks for jumping through these hoops, a test will be really helpful if we can set it up properly. I'd want to:
|
I'm not very familiar with ruby testing and I can't find anything similar to what you're asking in other tests to use as a kind of guide. I can try to put something together but I really don't know how to make a test to do this. Would this test go in |
Run process twice and use I'm on mobile now but can take a closer look soon! |
This commit adds a new fixture site that doesn't load all of `site.github`. It just loads a few items. We then check that those items are loaded once after the first and second times the site is processed. We also use a new spec helper function `not_expect_api_call` to make sure other API calls are not made. This test checks a few enpoints that get loaded when the entire `site.github` is loaded. Before adding the uninject code, these API endpoints would have gotten called, despite not being used by the fixture site. With the uninject changes, they are no longer called.
20bbc52
to
eced31d
Compare
Okay made some good progress! I realized that the tests you were describing weren't going to completely test this new functionality because the point of this isn't to make sure API calls only happen once. That was already happening, so that is useful to test but not the whole picture. The point of these changes was to make sure we don't load everything just because So in this new commit I:
Let me know what you think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done, @dunkmann00! I have one comment requesting a comment and another asking your thoughts about ensuring the HTML output is correct after the second site.process
call. Other than those two things, this LGTM to ship!
This adds a check to make sure the files in the rendered site match fixture files that are known to have the correct `site.github` data, after running `site.process` twice. This also adds code comments explaining the rationale and purpose behind the different parts of the `uninject` test.
@parkr I made those changes! I also added another fixture site, this time it is the rendered version of the new fixture site. I used it for checking if the rendered site is correct. |
@jekyllbot: merge +minor (I'm going minor here as a reminder to bump the minor version. It's a bug fix in a way but it's also a slight change in behavior that folks might find surprising.) |
Awesome! Thanks @parkr and @ashmaroli for helping with and reviewing this! And for all that you do for Jekyll ❤️ |
Thanks for working through this with us! |
The way this was setup, when running
build
orserve
with--watch
,inspect
was called and was forcing everything to load. This could cause a very large number of API calls to GitHub. With these changesGitHubMetadata
will no longer trigger all of those calls on reset.The other scenario where a bunch of calls were being forced to be made was when putting the
github
key with any sub-keys in_config.yml
. This went through every key inGitHubMetadata
and loaded it, once again potentially making a very large number of calls. These changes take care of this scenario as well.I was running into this problem when trying to make changes to a theme using this. Every time I ran tests on the theme it would incur a bunch of API calls and I would quickly get a rate limit error and have to wait some time for it to go away. I saw this was originally brought up as a problem in #237, there was a fix but it didn't totally fix it so this bug was brought up again in #242. I hope these changes are acceptable because it would be really nice to get rid of this problem.
Let me know if there is anything else I can do to help fix this!