x/net/html: add Node methods Ancestors, ChildNodes, Descendants #62113
Comments
CC @neild |
I've been using an external library with this, and it's nice. It's also handy to have Node.Children(), which only returns the immediate children of a Node. |
We should be careful with names here and stick to the DOM as closely as possible (if only to avoid confusing anyone hopping back and forth between Go and JS). Specifically, https://developer.mozilla.org/en-US/docs/Web/API/Element/children is all child nodes that are elements. It sounds like this would be closer to https://developer.mozilla.org/en-US/docs/Web/API/Node/childNodes I would be happy to have both |
Reading the docs, it seems like the difference between Another iterator I use is |
Sorry, for being unclear. I have no issue with |
I opened golang/net#215. |
Change https://go.dev/cl/594195 mentions this issue: |
This proposal has been added to the active column of the proposals project |
To bikeshed a little:
|
In JavaScript, this is called "closest" and takes a query selector. Having it as an iterator lets you build other tools on top of it that take selectors and return the first matching node or all matching nodes. I don't think x/net/html needs to be that kind of opinionated library, but it's nice to have the obvious iterators already written in it, so you don't have to jump back and forth as much. |
The APIs being suggested are:
I see the distinction in Mozilla docs about Children vs ChildNodes, but really, we already called the field 'FirstChild' not 'FirstChildNode', so we don't need to introduce 'Nodes' in the name. It should be Children. But Children is immediate children (not grandchildren etc), while what is called Parents is all ancestors (not just the immediate parent), and that inconsistency is more confusing. So Parents should probably be Ancestors. And then All can be Descendants. So that would be:
Thoughts? |
Modulo casing, |
@jimmyfrasche To be clear, are you suggesting that |
Correct. |
I think for some reason "child nodes" to me is less ambiguous and just obviously direct children, while "children" is ambiguous and could be just direct children or it could be all descendants. I don't know why I feel this way, since semantically it's not very different. So I am also in favor of Ancestors(), ChildNodes(), Descendants(). |
Change https://go.dev/cl/606595 mentions this issue: |
@alandonovan started a separate CL, I think because he didn't notice that I had already opened golang/net#215. I noticed one important implementation difference which is worth discussing. I have
Alan has
MDN says In my experience working with node iterators in both JavaScript and Go, starting with the node itself is more consistent and useful than behavior which starts with the parent. I also return empty sequences when doing nil.ChildNodes() or nil.Descendants(), whereas Alan panics. I think an empty sequence is more useful. |
Indeed; sorry about that. Though perhaps at least it raised some useful questions.
"Ancestors" in its ordinary meaning is not a reflexive relation, so I think the reflexive behavior could potentially be confusing. Perhaps the behavior is right but the name is not; but I don't have a better naming suggestion. We could amend the doc to:
Is there precedent for treating a nil *Node in this way? None of the existing three methods on *Node accept nil, but nor do they demonstrate a position on whether to quietly swallow nil. My inclination would be to panic. |
As I think about it more, I think panicking makes sense because it's like a dereference and usually a sign of a programming error. Returning an empty sequence does make filtering code simpler because you don't have to do "if n != nil { filter(n.Whatever()) }", but it also seems like it could cover up things where n is nil but you didn't realize it. |
Can you give some examples of when it's useful, just to help us understand this intuition? Thanks! It looks like Dom Children does not include the current node, so would you argue for Descendants not being reflexive but Ancestors being reflexive? |
I've used el.closest to, for example, get labels on a node and its parents to report to analytics. So you might click on a button and it sees For |
I changed my project to panic on iterating nil Nodes and the tests still passed with very minor changes (some ToString functions needed a check to return ""), so I think it's fine in the real world. |
I think it's easier just to say none of the three are reflexive.
// Closest traverses the node and its parents until it finds a node that matches.
func Closest(n *html.Node, match func(*html.Node) bool) *html.Node {
	if match(n) {
		return n
	}
	for p := range n.Ancestors() {
		if match(p) {
			return p
		}
	}
	return nil
} |
Great, making them non-reflexive seems fine. So it seems like we are back to #62113 (comment): Ancestors, Children (all kinds of nodes, not just element nodes), and Descendants. Do I have that right? |
I strongly object to I do agree that it is a bad name—but it is simply the name of the thing being named here. Giving this a name that exists in the same domain but means a related but different thing is going to cause confusion. It would be better to drop the iterator entirely than to name it
|
I don't feel as strongly about it, but I do still feel that "ChildNodes" is more obviously just direct children to me, for some reason. Either is fine though. |
At the very least, there should be something in the docs but even that's hard to articulate. You'd have to find some way to say that "Go(Children) = DOM(childNodes); if you want DOM(children), use Go(Children) then xiter.Filter out the nodes that aren't elements". |
The symmetry of the Go names is nice, but the DOM is the spec here, and Children does correspond exactly to childNodes. So I would be ok with ChildNodes, Ancestors, Descendants, all non-reflexive. |
In addition to what @adonovan said, given that the DOM has both
Where none of the functions return |
I think you meant to say ChildNodes here. Otherwise yes. |
Whoops, yes. Edited my above post. |
Have all remaining concerns about this proposal been addressed? The proposal is to add the following methods to x/net/html:
func (n *Node) Ancestors() iter.Seq[*Node]
func (n *Node) ChildNodes() iter.Seq[*Node]
func (n *Node) Descendants() iter.Seq[*Node]
Where none of the methods return n itself and all panic if n == nil. |
Do we need to put in a recursion limit to protect against malicious HTML or just assume it will crash on its own? |
The recursion in |
Based on the discussion above, this proposal seems like a likely accept. The proposal is to add the following methods to x/net/html:
func (n *Node) Ancestors() iter.Seq[*Node]
func (n *Node) ChildNodes() iter.Seq[*Node]
func (n *Node) Descendants() iter.Seq[*Node]
Where none of the methods return n itself and all panic if n == nil. |
Seems like the CL got merged before the waiting period was over. |
Sorry, that was probably my fault. I didn't double check to see whether the proposal was complete. If there are complications with the proposal we will revert. |
No change in consensus, so accepted. 🎉 The proposal is to add the following methods to x/net/html:
func (n *Node) Ancestors() iter.Seq[*Node]
func (n *Node) ChildNodes() iter.Seq[*Node]
func (n *Node) Descendants() iter.Seq[*Node]
Where none of the methods return n itself and all panic if n == nil. |
Proposal is accepted and was already implemented, so closing. |
This is assuming #61405 is accepted.
Iterating through all the children of an html.Node is tedious. The current code has this example of a recursive function printing out links:
It would be much nicer with an iterator: