Feat: depth limited refs -r #5337
Note: I am still missing some sharness tests. I will add them once we have a final approach.
core/commands/refs.go
Outdated
// true otherwise. The second return argument indicates whether the Cid was seen
// before.
func (rw *RefWriter) visit(c *cid.Cid, depth int) (bool, bool) {
	if rw.seen == nil {
I believe disabling unique by default is a memory optimization. If it's disabled, we should avoid using memory linear in the number of keys. We should probably have a check for the Unique flag up top and, in that case, only check the depth (don't store anything).
It may also be simpler to have one boolean indicate that we should continue traversing and the other indicate that we should return the CID to the user. That may simplify some of the other logic. It'll also allow us to say "return this but don't traverse it", which, unless I'm mistaken, should save us a bit of work (possibly saving us from fetching a node we don't need to traverse).
I believe this command came before the EnumerateChildren functions. We could probably re-use that logic (although we should probably use the non-async one unless the user passes some |
0a0ae24
to
f2c84c3
Compare
core/commands/refs.go
Outdated
return 0, nil
// visit returns two values:
// - first indicates if we should keep traversing the DAG.
// - second indicates if the given Cid should be printed to the user.
"indicates" is ambiguous. In the first case, it means "is true" and, in the second case, "is false".
Personally, I'd rather:
- Say "is true" (explicitly).
- Invert the second case (i.e., true means print).
core/commands/refs.go
Outdated
nc := n.Cid()

var count int
for i, ng := range ipld.GetDAG(rw.Ctx, rw.DAG, n) {
lc := n.Links()[i].Cid
if rw.skip(lc) {
continue
unexplored, written := rw.visit(lc, depth+1) // The children are at depth+1
"written" isn't quite correct. Really, it means "don't write" (we may not have already written it, unless I'm mis-reading the code).
Note: if we invert this case, this'll obviously become "shouldWrite" or something like that. I'm just dropping a comment here so we don't miss it.
core/commands/refs.go
Outdated
// We do not track a set of visited nodes in this case.
// We do not print anything too deep though.
if !rw.Unique {
	return !overMaxDepth, overMaxDepth
Don't we want to not explore nodes at max depth as well?
overMaxDepth is > MaxDepth, so we explore at maxDepth.
Sorry, I think I misread the double negation. Right now I think it works as it should, i.e. --max-depth 1 prints just the direct children of the given CID, without fetching them (as they're links from the parent and we know we can't go deeper). Thus, items at maxdepth are never fetched with dag.Get(), but they are visited and potentially printed.
core/commands/refs.go
Outdated
// Never explore over max-depth. Never print nodes over
// max depth.
if overMaxDepth {
	return false, false
Ditto about exploring nodes at max depth.
Also, shouldn't this check be higher up? No need to do anything if we're over the max depth.
Yes, correct.
}

if !recursive {
	maxDepth = 1 // write only direct refs
Should this be 0? If I'm not mistaken, this patch will currently explore two levels when recursive is disabled. If that's the case, can we also write a sharness test?
I will write extra sharness tests. But refs, as it is, prints the references from the root, that is, it prints things with maxDepth = 1. Root is 0, its children are 1. Refs doesn't print the root CID. That's why the existing tests pass: behaviour hasn't changed.
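To make the depth numbering concrete, here is a minimal sketch over a toy in-memory DAG. The dag type and the refs function are hypothetical stand-ins, not the code in core/commands/refs.go; only the convention comes from this discussion (root at depth 0 and never listed, its children at depth 1, a negative limit meaning "unlimited").

```go
package main

import "fmt"

// dag is a toy adjacency list standing in for a real DAG service (assumption).
type dag map[string][]string

// refs lists the links reachable from root, down to maxDepth.
// The root is at depth 0 and is never listed; its children are at depth 1.
// A negative maxDepth means "no limit". Duplicates are not removed.
func refs(d dag, root string, maxDepth int) []string {
	var out []string
	var walk func(node string, depth int)
	walk = func(node string, depth int) {
		if maxDepth >= 0 && depth >= maxDepth {
			return // the children of this node would be over max depth
		}
		for _, child := range d[node] {
			out = append(out, child)
			walk(child, depth+1) // the children are at depth+1
		}
	}
	walk(root, 0)
	return out
}

func main() {
	d := dag{"root": {"a", "b"}, "a": {"c"}, "c": {"d"}}
	fmt.Println(refs(d, "root", 1))  // [a b]
	fmt.Println(refs(d, "root", 2))  // [a c b]
	fmt.Println(refs(d, "root", -1)) // [a c d b]
}
```

With this numbering, a limit of 1 reproduces the non-recursive behaviour (direct children only), and a limit of 0 would list nothing.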
I see. This looks correct.
We can implement new features later (e.g., when we need them).
core/commands/refs.go
Outdated
rw.seen = cid.NewSet()
// Never explore over max-depth. Never print nodes over
// max depth.
if overMaxDepth {
I'd put this just below if !rw.Unique {..}
core/commands/refs.go
Outdated
// - We saw it higher (smaller depth) in the DAG (means we must have
//   explored deep enough before)
if ok && (rw.MaxDepth < 0 || oldDepth <= depth) {
	return false, true
Shouldn't this be return false, false?
Sorry, the docs are not clear. true means that the CID was printed before, which is the case here. I will take @Stebalien's proposal, though, and invert it.
4d49f37
to
6052f33
Compare
I have clarified the docs and comments, inverted the second return value, and moved a check up. As said, it does print items at MaxDepth, but not "over max depth". If we're good with the code so far, I'll proceed to create some sharness tests (I have already done a fair amount of manual testing).
Cluster would benefit from async, faster |
Sorry, it's late. It does dag.Get() items at MaxDepth. But I'm thinking this is what we want, because we want to use refs -r not only to print but to fetch blocks. So if it's printed it should be fetched (?)
I was under the impression that we didn't get all nodes (e.g., we don't need to fetch raw leaves as we know they won't have links). However, it turns out this isn't the case. Given how we tend to use this, I think it's reasonable to actually fetch all the nodes. In the future, we can add an option that avoids this.
Actually, I'm pretty sure it'll fetch the children as well.
6e0f86d
to
ac5bea2
Compare
@Stebalien ok, another round. I realized I was ignoring the optimization that justified using promises: we do not need to do
So if I'm not wrong (again) this should:
fingers crossed |
ac5bea2
to
d1be006
Compare
core/commands/refs.go
Outdated
return count, err
// Avoid "Get()" on the node. We did a Get on it before
// (we printed it) and must not go deeper. This is an
// optimization for pruned branches.
This comment is technically incorrect, unless I'm mistaken. We have already printed the node and/or we've pruned the branch.
I'm going to rewrite it but I don't fully understand and I think it's AND only. We have already printed the node AND we've pruned the branch.
If we have not printed the node, we need to Get() it because it means we haven't seen it before, even if we are not going deeper (because we hit the MaxDepth for example).
If we are going deeper but we already printed the node, we need to Get() it to be able to make the recursive call (I think this is the case when, given a depth limit, we encounter an already explored branch higher in the tree, thus we can explore it deeper despite part of it already being printed).
So it has to be !shouldPrint && !goDeeper.
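The three cases above can be spelled out as a small truth table. This is only an illustrative sketch: skipGet is a hypothetical helper name, and shouldPrint and goDeeper are assumed to be the two values returned by visit after the inversion discussed earlier.

```go
package main

import "fmt"

// skipGet reports whether the Get() on a node can be avoided.
// It is only safe to skip when the node was already printed
// (shouldPrint == false) AND the branch is pruned (goDeeper == false).
func skipGet(shouldPrint, goDeeper bool) bool {
	return !shouldPrint && !goDeeper
}

func main() {
	for _, shouldPrint := range []bool{false, true} {
		for _, goDeeper := range []bool{false, true} {
			fmt.Printf("shouldPrint=%-5v goDeeper=%-5v skipGet=%v\n",
				shouldPrint, goDeeper, skipGet(shouldPrint, goDeeper))
		}
	}
}
```

Only the first row (already printed and pruned) skips the fetch; in every other combination the node is needed, either to print it or to recurse into it.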
core/commands/refs.go
Outdated
}

// We must write it because it's new, or go deeper. In any case
s/write/get
core/commands/refs.go
Outdated
nd, err := ng.Get(rw.Ctx)
if err != nil {
	return count, err
}
count++
Shouldn't we only count it if we print it?
Sure, sgtm
// or is lower than last time.
// We print if it was not seen.
rw.seen[key] = depth
return !atMaxDepth, !ok
The comments in this function are awesome.
❤️
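For readers following along without the diff, here is a rough sketch of how the fragments quoted in this review could fit together. It is pieced together from those fragments and is not the merged code: refWriter is a simplified stand-in for RefWriter, keys are plain strings instead of Cids, and the named results goDeeper and shouldPrint are assumptions taken from the discussion above.

```go
package main

import "fmt"

// refWriter is a hypothetical, stripped-down stand-in for RefWriter.
type refWriter struct {
	Unique   bool
	MaxDepth int            // negative means "no limit"
	seen     map[string]int // key -> smallest depth at which it was seen
}

// visit returns two values:
// - goDeeper is true if we should keep traversing the DAG from this key.
// - shouldPrint is true if the key should be printed to the user.
func (rw *refWriter) visit(key string, depth int) (goDeeper, shouldPrint bool) {
	atMaxDepth := rw.MaxDepth >= 0 && depth == rw.MaxDepth
	overMaxDepth := rw.MaxDepth >= 0 && depth > rw.MaxDepth

	// Never explore or print anything over max depth.
	if overMaxDepth {
		return false, false
	}
	// Without Unique we store nothing: always print, stop at the limit.
	if !rw.Unique {
		return !atMaxDepth, true
	}
	if rw.seen == nil {
		rw.seen = make(map[string]int)
	}
	oldDepth, ok := rw.seen[key]
	// Prune: seen before, and either depth is unlimited or we saw it
	// at a smaller or equal depth (so it was explored deep enough).
	if ok && (rw.MaxDepth < 0 || oldDepth <= depth) {
		return false, false
	}
	// Not seen, or seen only deeper than now: record the new depth.
	rw.seen[key] = depth
	return !atMaxDepth, !ok
}

func main() {
	rw := &refWriter{Unique: true, MaxDepth: 2}
	fmt.Println(rw.visit("x", 2)) // false true: new, but at the depth limit
	fmt.Println(rw.visit("x", 1)) // true false: printed already, now higher up
	fmt.Println(rw.visit("x", 2)) // false false: pruned
}
```

The second call shows why the depth is stored rather than a plain set: a key first met at the limit has to be re-explored when it turns up closer to the root.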
@Stebalien another round. I think I'll write the sharness tests next.
LGTM! (modulo tests).
LGTM too (-tests)
ae896f4
to
35a02ff
Compare
This adds --max-depth to the "refs" commands and allows limiting the fetching of refs per depth. Other than that, it works as before.
Note that clever branch pruning is only made when the --unique flag is passed. Otherwise, we re-explore branches to the given depth.
This means that --unique costs memory, but may save time when the DAGs contain the same sub-DAGs in several places (especially if they are big). On the other hand, not using --unique saves memory but may involve re-exploring large sub-DAGs.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>
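The memory/time trade-off described in the commit message can be illustrated with a toy example. This is a sketch under stated assumptions: the dag type and countVisits are hypothetical, there is no depth limit, and a plain set is used for pruning.

```go
package main

import "fmt"

// dag is a toy adjacency list standing in for a real DAG (assumption).
type dag map[string][]string

// countVisits walks the DAG from root and returns how many links were
// followed and how many keys had to be kept in memory.
// With unique set, already-seen nodes are pruned at the cost of storing them.
func countVisits(d dag, root string, unique bool) (visits, stored int) {
	seen := map[string]bool{}
	var walk func(node string)
	walk = func(node string) {
		for _, child := range d[node] {
			if unique {
				if seen[child] {
					continue // pruned: this sub-DAG was explored before
				}
				seen[child] = true
			}
			visits++
			walk(child)
		}
	}
	walk(root)
	return visits, len(seen)
}

func main() {
	// "shared" appears under both "a" and "b".
	d := dag{
		"root":   {"a", "b"},
		"a":      {"shared"},
		"b":      {"shared"},
		"shared": {"x", "y", "z"},
	}
	fmt.Println(countVisits(d, "root", true))  // 6 6
	fmt.Println(countVisits(d, "root", false)) // 10 0
}
```

Here the unique walk stores six keys and follows six links, while the non-unique walk stores nothing but follows ten, because the shared sub-DAG is explored twice.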
35a02ff
to
fe89e2e
Compare
Thanks @Stebalien @magik6k. I have added some sharness tests now (last commit).
Jenkins passes and @magik6k has already reviewed (modulo tests). 🚅
This adds --max-depth to the "refs" commands and allows limiting
the fetching of refs per depth. Other than that, it works as before.
Note that clever branch pruning is only made when the --unique flag
is passed. Otherwise, we re-explore branches to the given depth.
First minimal approach. I wonder if we could utilize the EnumerateChildren functions here instead, @Stebalien? Doing the printing inside the custom visit function. I'm not sure if there are reasons why the DAG traversal logic was re-implemented separately for this command.
Also, now or later, I would like to do an --async version of this, so it would be very easy if we re-use EnumerateChildrenAsync().