-
-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix fresh_seq on lazy_sequence #263
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, Ambrose. Intricate changes, so I've been quite particular about these. The sequence stuff is very easy to mess up, so we need to be extra careful with it.
@@ -62,18 +70,16 @@ namespace jank::runtime::obj | |||
|
|||
lazy_sequence_ptr lazy_sequence::next_in_place() | |||
{ | |||
resolve_seq(); | |||
if(sequence) | |||
// sequence is non-nullptr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments are documentation, so please always use /* */
and proper capitalization and grammar. See the deleted comment on line 68 for an example.
@@ -62,18 +70,16 @@ namespace jank::runtime::obj | |||
|
|||
lazy_sequence_ptr lazy_sequence::next_in_place() | |||
{ | |||
resolve_seq(); | |||
if(sequence) | |||
// sequence is non-nullptr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's clarify this comment. I'm a fan of explaining the WHY things clearly, in documentation: https://github.com/jank-lang/jank/blob/main/compiler%2Bruntime/src/cpp/jank/codegen/llvm_processor.cpp#L129
/* We know that `sequence` is non-null here because we can only use `next_in_place`
* on a fresh sequence and we'll return null from `fresh_seq` if we don't have a valid
* `sequence`. However, outside of this fn, `sequence` may definitely be null. */
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick tangent:
Is it important that lazy_sequence::fresh_seq
returns a lazy_sequence_ptr
? For example, if we returned (a properly nulled) runtime::fresh_seq(sequence)
, we'd only need to visit 1 sequence on each next_in_place
call instead of also the lazy_sequence
itself.
Then this code we're explaining would be unreachable (if we copied the same approach to next
and seq
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, where is the second visit coming from? I'm not seeing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the current implementation, say we run this:
(loop [c (fresh-seq (lazy-seq [...]))]
(recur (next-in-place c)))
On the second line, we first visit lazy_sequence::next_in_place
, which then recursively visits persistent_vector::next_in_place
. This happens every time around the loop AFAICT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this isn't really a problem in a C++ for
loop using templates, because you directly call ->next_in_place()
on the known sequence type, eliminating the "first" visit. AFAICT this only works if fresh_seq
and next_in_place
have a concrete return type, which conflicts with my idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That being said, it looks like there are some places that visit twice because it doesn't call the member functions directly. e.g., changing to it->next_in_place()
here would skip straight to lazy_sequence::next_in_place()
when iterating a lazy-seq instead of having to visit.
jank/compiler+runtime/src/cpp/jank/hash.cpp
Line 171 in fe6f1dd
for(auto it(fresh_seq(typed_sequence)); it != nullptr; it = next_in_place(it)) |
Maybe seq/next
is more relevant to my observation than the in-place versions, as it's more common in Clojure code, and hence less likely to be able to infer the sequence type to call a member function directly.
For example, if lazy sequence implemented seq/next like this:
object_ptr lazy_sequence::seq() const
{
resolve_seq();
return sequence;
}
object_ptr lazy_sequence::next() const
{
resolve_seq();
return runtime::next(sequence));
}
then code like this wouldn't need to do the double visit at (next c)
because the lazy-seq is gone.
(defn foo [c]
(recur (next c))
(foo (lazy-seq [...]))
I'm not sure whether that would affect C++ template code, but presumably you'd reach for the in-place operations instead.
Even with the optimal iterator, you'd still need 1 visit_seqable
at every step to visit the vector seq if s
is a (lazy-seq [...])
:
for(auto i(s->fresh_seq()); i != nullptr; i = i->next_in_place())
I wonder if tightening the scope of the in-place operations would yield even better results for the optimal C++ case, similar in spirit to transients. For example, if the return of fresh_seq
and next_in_place
were always a non-empty concrete data structure like SeqIterator
, but in this case only exposed member functions first
, next-in-place
, and seq!
that produced a seq (analogous to persistent!
).
;; clojure pseudocode for C++ classes
(deftype SeqIterator [^:unsynchronized-mutable s next-specialized]
(first [_] (first s))
(next-in-place [this] (when-some [s' (next-specialized s)]
(set! s s')
this))
(seq! [_] s))
(deftype LazySeq [sequence]
;; sequence->next-in-place-fn is a `visit_seqable` call that returns
;; [](T sequence) { sequence->next_in_place()};
;; for the given sequence type. This caches the member function
;; so we don't need to visit during next-in-place.
(fresh-seq [_] (->SeqIterator sequence (sequence->next-in-place-fn sequence))))
All this of course assumes this is a problem worth solving, which I'm not sure :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can pick this up another time (or maybe this is what you had originally with whatever sequenceptr
was). It's a pretty big change.
Back to your feedback, I went with a different approach for the documentation, more in line with the rest of jank (for example persistent_string_sequence::next_in_place
).
- I went into depth about ownership in the documentation of
sequenceable_in_place
and generally the optimizations you can use innext_in_place
and why. - I added an assertion that
sequence
is non-empty. - I removed all comments in
lazy_sequence::next_in_place
.
It felt off to specifically explain (1) in any one next_in_place
implementation. Is this something you'd accept?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #265 for the concern about extra visiting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That being said, it looks like there are some places that visit twice because it doesn't call the member functions directly. e.g., changing to
it->next_in_place()
here would skip straight tolazy_sequence::next_in_place()
when iterating a lazy-seq instead of having to visit.
There's no visit in that case. We have an overload for next_in_place
which does the job without visiting: https://github.com/jank-lang/jank/blob/main/compiler%2Bruntime/include/cpp/jank/runtime/core/seq.hpp#L41
// seq is consumed, no need to update | ||
//sequence = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not leave commented code in to explain why we don't need it. The comment alone will suffice, if we make it clear enough. This comment is nowhere near clear enough, since the removal of this one line has huge implications.
It's safe to not set sequence
to null, we're saying, because we know that we're in a fresh sequence. Nobody else has a pointer to this sequence. Thus, if we fully consume sequence
and we're done with our current lazy seq, we return null and expect that this sequence will never be used again.
Closes #262
fresh_seq
now returnsnullptr
iflazy-seq
is empty.with-meta
also needed to change since it didn't handlefresh_seq
returningnullptr
.Perf: Takes ownership of inner sequence in the result of
fresh_seq
to enable callingnext_in_place
on the inner seq.I tweaked the
timeout-minutes
to 35m after seeing this build fail to see if it was a transient issue. The next build succeeded in 28m.