Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

document.open() URL updates can cause recursive loading problems on history traversal #6556

Open
domenic opened this issue Apr 5, 2021 · 7 comments · May be fixed by #6649
Open

document.open() URL updates can cause recursive loading problems on history traversal #6556

domenic opened this issue Apr 5, 2021 · 7 comments · May be fixed by #6649

Comments

@domenic
Copy link
Member

domenic commented Apr 5, 2021

In #3885 we noticed that, back then, document.open() had the strange behavior that it updated document's URL to the entry document's URL, but it did not update the corresponding session history entry (SHE). This meant that there could be situations where the document's URL is not equal to its corresponding SHE's URL, which is very strange and led to some bad behaviors detailed in that issue.

/cc @rniwa @TimothyGu @smaug---- who were involved at the time.

We changed the spec in #3946. At the time we believed Safari matched the updated spec, and Firefox and Chromium were happy to follow the updated spec.

Chromium only recently implemented this (/cc @natechapin). @csreis found that this causes a problem, however, in crbug.com/1189026:

  1. Visit a page with an iframe
  2. frames[0].document.write("foo") will update that iframe's URL to the outer page
  3. Navigate the tab (outer frame) to a new URL
  4. Go back.

The result is that you load the outer page, with itself in an iframe. In Chromium this is particularly bad because our recursive iframe prevention doesn't work on history traversals, and also somehow our bfcache doesn't kick in. You can thus set up scenarios where for every history traversal that visits the entry, you get yet another nested iframe.

Note that Firefox and Safari do not seem to implement this part of the spec; they restore the original URL, not the URL that was set by document.open() updating the document's URL/SHE's URL. So this means our testing of Safari back in 2018 was incomplete.

This seems like an unintended, but somewhat predictable, consequence of our efforts to align document's URL with SHE's URL. Ultimately if you set an iframe's SHE's URL to the entry document's URL, you are setting yourself up for recursion upon history traversal (or session restore, e.g. closing and reopening the tab).

So, we have a few choices on where to go from here:

  • Accept this problem as the cost of keeping SHE's URL and document's URL in sync. We're not yet sure if this is feasible in Chromium; in particular, the change hasn't hit stable yet, and once it does, we might be forced to mitigate (somehow) if these recursive frames break enough sites.

  • Accept this problem as the cost of keeping SHE's URL and document's URL in sync, but beef up recursive frame detection in Chromium. (And possibly in other browsers, if they have similar limitations.) It's not clear this would be much better; in particular recursive frame detection just refuses to load the iframe, which might be just as breaking to sites.

  • Revert back to a world where SHE's URL is not always equal to the corresponding document's URL. Maybe try to solve some of the issues identified in document.open() can make a document's URL go out of sync with its history entry #3885 in another way.

@domenic
Copy link
Member Author

domenic commented Apr 7, 2021

Another idea:

  • Stop document.open() from running the URL and history change steps entirely. Instead, introduce some "base URL override" concept which document.open() sets, which only affects base URL computation. (Kind of an "invisible <base>".)

This is based on the guess that the reason document.open() modifies the URL is so that when you do

<iframe></iframe>
<script>
frames[0].document.write('<img src="foo.jpg">');
</script>

this resolves foo.jpg relative to the caller URL, instead of relative to about:blank. We could make this case work by only monkeying with the base URL computation for the iframe document, instead of overwriting the document's URL itself.

Not sure if this is a good idea...

@csreis
Copy link

csreis commented Apr 7, 2021

I would be a big fan of not updating the document's URL at all after document.open. In this case where one frame calls document.open on another frame, it seems like the target frame is never actually expected to load the calling frame's URL, either at the time or when you leave and come back. That has always made the URL change puzzling to me.

Your suggestion to somehow override the base URL might be a nice way to solve the relative URL loading case you mention. I agree that there may be other implications we haven't considered yet, though.

Do others know if there are additional reasons the URL changes?

@domenic
Copy link
Member Author

domenic commented Apr 8, 2021

I agree that there may be other implications we haven't considered yet, though.

Do others know if there are additional reasons the URL changes?

So this largely comes down to "when does the spec check URL vs. base URL". The short answer is: it uses base URL to resolve relative URLs, and uses URL otherwise.

Here are some examples:

  • location.href uses the document's URL. So, with the current spec, after document.write()ing into the iframe, iframe.contentWindow.location.href will equal to the caller URL. With the base URL-change spec, it will stay on its original value (e.g. about:blank).

  • history.pushState() uses the document's URL to determine if the URL change is allowed. So, with the current spec, after document.write()-ing into the iframe, it can call history.pushState("/foo.html") and that will work. With the base URL-change spec, it would not work, since "/foo.html" is not same-origin to about:blank.

  • <meta refresh="5"> (with no URL=) uses the URL. So, with the current spec, inserting <meta refresh="5"> would refresh the iframe to the caller's URL after 5 seconds. With the base URL-change spec, it would refresh the iframe to its original URL after 5 seconds.

  • <form> with no action="" uses the URL. So, with the current spec, inserting <form> and submitting it would submit to the caller's URL. With the base URL-change spec, it would submit to the iframe's original URL.

  • COOP reporting uses the document's URL. So, with the current spec, a document.write()-en iframe accessing a popup that COOP prevents access to would generate a COOP report with the caller's URL as the culprit. With the base URL-change spec, it would generate a COOP report with the iframe's original URL.

My tentative conclusion is that changing to a base URL-override model would have potential compat issues, but perhaps they wouldn't be terrible, and so they might be worth the benefit of having a more reasonable model.

@csreis
Copy link

csreis commented Apr 9, 2021

Thanks, and hmm. I could imagine getting bug reports for the pushState and form submission cases. It seems plausible that sites are using pushState within document.write content, and that pages might put a form in an about:blank iframe and expect to receive the form submission. Maybe it's possible to collect data on whether that's actually happening?

The other cases seem less concerning at first glance. location.href seems more sensible to keep as the original URL, at least to me (and it seems somewhat unlikely that sites would depend on it changing after document.open). I would be surprised if authors used document.write to use meta refresh to intentionally load the caller's URL. Not sure whether the COOP reporting change matters or not; it doesn't seem like there's an obvious answer for which of the two URLs should be reported, so it may not matter.

@domenic
Copy link
Member Author

domenic commented May 4, 2021

We've run into a compat problem with this change and are working on a revert. Given the lack of engagement here probably the best path is to just revert to the previous spec and let the history entry's URL and the document's URL get out of sync.

What are the consequences of this? Here's what I've found so far:

  • Usages of session history entry's URL:

    • It's of course used in "traverse the history"

    • We should add a general warning that this can get out of sync with the corresponding document's URL, and probably discourage specs from using it except in the specific context of session history traversal.

    • history.pushState() and history.replaceState() with no URL use the current session history entry's URL (not the document's URL). We should test which browsers do and either update the spec to use document URL, or add a note that we're violating the above guidance.

    • This may impact Should navigating to the current URL preserve history.state? #6213

  • Reloads

    • Reloads generally seem to be specified by calling the navigate algorithm with the document's current URL, and historyHandling set to "reload".

    • This means that we'll create a new document by fetching the document's current URL, but then overwrite that document's URL with the session history entry's URL. I.e. the mismatch will carry over.

    • Maybe this is how browser behave, maybe not. To be tested.

@domenic
Copy link
Member Author

domenic commented May 4, 2021

OK, I wrote some tests:

domenic added a commit to web-platform-tests/wpt that referenced this issue May 4, 2021
domenic added a commit that referenced this issue May 4, 2021
Closes #6556. In particular, reverts document.open() to only update the document's URL, and not the session history entry's URL, like it did before ae7cf0c. Now that they can mismatch, we need to audit the cases where this might be important, which leads to the following changes:

* Changes location.reload() to reload the current session history entry's URL, instead of the document's URL. This ensures that post-document.open() reload behavior is aligned with WebKit and Gecko, as tested by https://wpt.fyi/results/html/webappapis/dynamic-markup-insertion/opening-the-input-stream/reload.window.html.

* Changes history.pushState()/history.replaceState() with no URL argument to default to the document's URL, instead of the current session history's URL. This ensures that post-document.open() pushState()/replaceState() behavior is aligned with all engines, as tested by web-platform-tests/wpt#28826.

This also modernizes and makes a bit more precise the location.reload() method steps. The user-initiated reload steps remain vague; #6600 will tackle those.
@domenic domenic linked a pull request May 4, 2021 that will close this issue
3 tasks
@hsivonen
Copy link
Member

hsivonen commented Sep 7, 2022

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

3 participants