Meta: Improve splitting of multipage output #2552
Woah, this is a big change and will cause a lot of incoming links to redirect. I'm not against it necessarily, but my original thinking was that we'd have special dev-edition-only splits, since the dev edition is less about "load as much as we can without hanging your browser" (my interpretation of today's multipage) and more about "give me a reasonable split as if this were a book chapter" (see today's https://developers.whatwg.org/). What do you think? So excited you're taking on the overall issue, BTW!!
Whichever way we go, I do think that we might want to ape the table of contents at https://developers.whatwg.org/ more closely, e.g. keeping "Introduction" as a single page.
I haven't checked how developers.whatwg.org currently splits, but here's my knee-jerk reaction.
@@ -24788,7 +24788,7 @@ interface <dfn>HTMLModElement</dfn> : <span>HTMLElement</span> {
- <h3 split-filename="embedded-content" id="embedded-content">Embedded content</h3>
+ <h3 split-filename="img-picture-source" id="embedded-content">Embedded content</h3>
I'd prefer to not change this one.
OK, I've restored it.
@@ -101535,7 +101537,7 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {
  <div w-nodev>
- <h4><dfn>Tokenization</dfn></h4>
+ <h4 split-filename="tokenization"><dfn>Tokenization</dfn></h4>
Do we really want to split up the parsing section? Having named character references split out seems OK but I think the rest should probably be one page, so it's easier to find things.
OK, yeah, agreed. I've changed it back. For the parsing section, I think it makes sense to keep it in one file, even though that file ends up being very large.
@@ -31531,7 +31532,7 @@ interface <dfn>HTMLTrackElement</dfn> : <span>HTMLElement</span> {
  </div>
  <!--TOPIC:Video and Audio-->
- <h4>Media elements</h4>
+ <h4 split-filename="media">Media elements</h4>
Isn't this better together with video-and-audio?
That's another case where the file size is excessive with them combined.
Ok
@@ -25611,7 +25611,7 @@ the time Maria had stuck her tongue out...</p></pre>
  </div>
- <h4>Images</h4>
+ <h4 split-filename="images">Images</h4>
I think this is better together with embedded-content.
Why? With those two together in the same file, the size is 1MB+. Do we not care about that?
That's still about half of the average web page. 😁
I guess we do care but a conflicting goal is to have related things together. I could go either way here though.
Yeah, it's not good to add disruption unless there's real value to it. In this case, after spending time looking at the existing splits, it's clear to me at least that they're not adequate and never really have been, at least not if our goal here is to optimize for the needs of the people choosing to read and use the multipage spec.
It seems strange to me to have the dev spec split differently from the full spec, but if that's what we want, I am happy to hack something into wattsi to handle it.
Yeah, I personally think that’s not a super-admirable choice we’ve been making for the readers of the spec.
I guess, having seen a lot of hours put into producing the dev version in the past, but not very many people really making it their version of choice (most instead just read the full spec), I'm at the point where I would sort of prefer to make the full spec more usable for everybody, by improving the information design as we did in the link-element section and by trying harder not to end up with multiple 1MB+ files among the multipage split files, instead of trying to solve some problems for some people by making a secondary version.
Can you say more? To me most implementers and readers of the full spec are better served by single-page, but that hangs browsers, so it's only usable by those who are willing to take a hit (e.g. people like us who leave it open in a tab pretty constantly). For that audience, the multipage is then basically "singlepage split minimally so that it doesn't hang browsers". So from this point of view, fewer splits are better.
Well, let's definitely discuss a bit more. It may be my perspective on the multipage is off. To me the dev edition is a pretty different beast. It's meant to be more like an online "book", IMO; something like https://doc.rust-lang.org/book/. So to me the splits there represent logical "chapters". Indeed, I have fond memories of reading some HTML 4.01 reference books; if we had a nice developer edition of the spec at that time, I'm sure I would have loved that. Whereas, per my above perspective, the splits on the multipage version are more just about mitigating the file size. I'm curious if anyone else sees things that way.
I think there's room for both. I remember the dev edition being very active, especially around the time I was getting into HTML. It was definitely where I linked people for many years, until I came to realize it was lagging. It contains just the right content for a web developer to really read it straight through (again, like a book), especially in parts like the HTML element definitions, where it hides away all the implementation algorithms but gives central focus to the semantics and examples and such. I think there's a good niche for us to work on here, where we make the dev edition something we're proud to point people to and that people reference often. But yes, we should definitely make the original spec better too!! The link element PR is a great example of that. And the fact that we're building from the same source should help benefit both goals at once. That leaves us with the issue discussed above: IMO, 1MB+ files are not really a problem for original-spec multipage readers. But I'm curious to hear what others think.
My main perspective on multipage at the moment is that I like that it's stable (the redirects tend to break somehow), and also that I mostly don't use it because dfn.js doesn't work across pages.
Yeah, agreed that's a big priority. I'll make time this weekend to look at finally implementing it.
We seem to be drawing our own conclusions about who the readers of the spec are and what they want, without any clear evidence to support those conclusions. Lacking clear evidence, I believe it would be friendlier and more accommodating to all readers (in the potentially very broad set of readers we have for the spec) to make the spec follow some general usability best practices for web documents. We know one common best practice is to keep the file sizes of documents low where possible, for a number of reasons, including making the documents more usable on mobile and, in general, reducing page-load times instead of making readers wait. And we know another common best practice is, when possible, to split documents into logical, somewhat discrete chunks that are easier to consume than larger documents with disparate parts.
I think there are a lot more readers of the spec than just us, the implementers, and other people we know and interact with. I personally think many other people would be much better off reading MDN than trying to make their way through the spec to understand the requirements, but from what I can see, no matter how great the information on MDN is, there are still a lot of people who turn to reading the spec for information. Those are people we should be considering, trying to make the user experience as good as we possibly can for that broad range of readers.
I don’t know what evidence we have that the audience of people who see multipage as being "singlepage split minimally so that it doesn't hang browsers" is bigger or more important than the audience of general readers who want the spec to be more aligned with common best practices for usability of web documents.
I think it's reasonable not to see it as a problem, but I wonder what the cutoff point on file size actually is. Clearly we consider a file size of 8MB too big not to provide an alternative for, because otherwise we wouldn't be investing in making the multipage version. That said, I'm not sure what should otherwise be considered too big in multipage. But the split in this PR is proof that, if we want, we can get the spec into logical, somewhat discrete chunks that are easier to consume, with most files below 500KB. Below are the file-size results after the splits in this PR.
FWIW, I think you have convinced me, but I also think that maybe we should prioritize fragment-links.js and dfn.js over the splitting disruption, since they'll become even more of a bottleneck.
I want to respond to the rest of the thread soon, but before I forget,
I don't think this is super-accurate. We had a brief one-week-ish period where they were broken as we were re-jiggering fragment-links.json, which may have damaged your trust in the redirect system, but apart from that period it seems 100% reliable to me.
Maybe it's just something in Firefox. For instance, https://html.spec.whatwg.org/multipage/#the-worker's-lifetime doesn't work. It's probably some interaction between the address bar doing weird things with special URL code points and the lookup script not decoding them, or some such.
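The suspected failure mode can be illustrated with a small, hypothetical Python sketch; it is not the actual lookup script. The dictionary stands in for fragment-links.json (its real shape is assumed here), and both entries are illustrative. The lookup tries the fragment exactly as received and then percent-decoded, so an apostrophe that arrives as `%27` still resolves.

```python
from typing import Optional
from urllib.parse import unquote

# Hypothetical stand-in for fragment-links.json: fragment ID -> multipage
# filename (without ".html"). Both entries are illustrative assumptions.
FRAGMENT_LINKS = {
    "the-worker's-lifetime": "workers",
    "attr-fe-disabled": "form-control-infrastructure",
}

BASE = "https://html.spec.whatwg.org/multipage/"


def redirect_target(fragment: str, links: dict[str, str]) -> Optional[str]:
    """Map a singlepage-style fragment to its multipage URL, or None."""
    # Try the fragment exactly as received first, then percent-decoded,
    # in case the browser hands over "%27" instead of an apostrophe.
    for candidate in (fragment, unquote(fragment)):
        page = links.get(candidate)
        if page is not None:
            return f"{BASE}{page}.html#{candidate}"
    return None
```

With only the raw lookup, `redirect_target("the-worker%27s-lifetime", FRAGMENT_LINKS)` would miss; the decoded second attempt is what finds the entry.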
Yeah, if we are going to have multipage at all, I think the lack of dfn.js in multipage is a bigger pain point for readers who are trying to actually use multipage as an information tool, closer to the way they can use the single-page version. And thinking about the fact that the even-more-common use case for the single-page version is full-text search of the whole spec, I realized it would be pretty useful to also add some kind of full-text search utility to the multipage output. So #2565 is an attempt at doing that.
I guess I am also convinced of the general idea of splitting more. But let me suggest a slightly less aggressive split. I will do so by uploading a patch onto this branch, since that's an easier way to illustrate things, but please feel free to debate it.
With my patch, here is the split:
LGTM
Branch updated from 2120ca3 to 91a5084.
Some offline discussion with @annevk concluded that we should not merge this until we have multipage dfn.js working (which is in progress!! whatwg/wattsi#46), so I'm tagging it as "do not merge yet".
Now that we've got multipage dfn.js merged, it seems like this is ready to merge too?
The current link (https://html.spec.whatwg.org/multipage/forms.html#attr-input-disabled) does not point to an anchor that exists on the page. The term 'disabled' does not appear within `forms.html` at all. I think the correct destination should be https://html.spec.whatwg.org/multipage/form-control-infrastructure.html#attr-fe-disabled. I think this has been incorrect since the multipage splitting was changed in whatwg/html#2552 when `forms.html` was split up and `form-control-infrastructure.html` was introduced.
This addresses whatwg/wattsi#27.

wattsi already supports splitting out a separate multipage file from a heading element at any level, for any heading element with a `split-filename` attribute. So this change adds the `split-filename` attribute to a bunch more headings, in the interest of making the splits more logical and usable, and also in the interest of reducing the file size of some of the larger splits (before this change, the forms.html file was 1.1MB, and there were several other files that were larger than 500KB).
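As a rough illustration of the splitting rule described above, here is a minimal Python sketch. It is not wattsi's actual implementation. It cuts a source string into pages at every heading that carries `split-filename`, then flags pages over a size budget. It assumes double-quoted attribute values, that each page runs until the next marked heading, and that content before the first marked heading goes to a page named `index`. The default budget of 500,000 bytes roughly mirrors the 500KB figure used in this thread.

```python
import re

# Opening <h1>..<h6> tag carrying a split-filename attribute.
# Assumption: the attribute value is double-quoted, as in the diffs above.
SPLIT_HEADING = re.compile(r'<h[1-6]\b[^>]*\bsplit-filename="([^"]+)"[^>]*>')


def split_pages(source: str, default: str = "index") -> dict[str, str]:
    """Cut `source` into pages at each heading that has split-filename.

    Content before the first marked heading goes to `default`; each page
    runs from its marked heading up to (not including) the next one.
    Headings without the attribute stay inside the current page.
    """
    pages: dict[str, str] = {}
    name, start = default, 0
    for match in SPLIT_HEADING.finditer(source):
        pages[name] = pages.get(name, "") + source[start:match.start()]
        name, start = match.group(1), match.start()
    pages[name] = pages.get(name, "") + source[start:]
    return pages


def oversized(pages: dict[str, str], limit: int = 500_000) -> list[str]:
    """Return, sorted, the names of pages whose UTF-8 size exceeds `limit` bytes."""
    return sorted(
        name for name, html in pages.items() if len(html.encode("utf-8")) > limit
    )
```

Because an unmarked heading such as `<h4>Images</h4>` stays inside the preceding page, adding the attribute to more headings is enough, under this model, to break a large file into smaller ones.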