Fix fresh_seq on lazy_sequence #263

frenchy64 · 2025-02-12T05:34:51Z

Closes #262

fresh_seq now returns nullptr if lazy-seq is empty.

with-meta also needed to change since it didn't handle fresh_seq returning nullptr.

Perf: Takes ownership of inner sequence in the result of fresh_seq to enable calling next_in_place on the inner seq.

I tweaked the timeout-minutes to 35m after seeing this build fail to see if it was a transient issue. The next build succeeded in 28m.

$ ./build/jank repl
Bottom of clojure.core
clojure.core=> (reduce (fn [a b] :c) :a (lazy-seq))
:a
clojure.core=> (reduce (fn [a b] :c) :a (map identity []))
:a
clojure.core=> (with-meta (lazy-seq) {})
()

jeaye

Nice work, Ambrose. Intricate changes, so I've been quite particular about these. The sequence stuff is very easy to mess up, so we need to be extra careful with it.

jeaye · 2025-02-12T21:25:49Z

compiler+runtime/src/cpp/jank/runtime/obj/lazy_sequence.cpp

@@ -62,18 +70,16 @@ namespace jank::runtime::obj

  lazy_sequence_ptr lazy_sequence::next_in_place()
  {
-    resolve_seq();
-    if(sequence)
+    // sequence is non-nullptr


Comments are documentation, so please always use /* */ and proper capitalization and grammar. See the deleted comment on line 68 for an example.

jeaye · 2025-02-12T21:40:14Z

compiler+runtime/src/cpp/jank/runtime/obj/lazy_sequence.cpp

@@ -62,18 +70,16 @@ namespace jank::runtime::obj

  lazy_sequence_ptr lazy_sequence::next_in_place()
  {
-    resolve_seq();
-    if(sequence)
+    // sequence is non-nullptr


Let's clarify this comment. I'm a fan of explaining the WHY things clearly, in documentation: https://github.com/jank-lang/jank/blob/main/compiler%2Bruntime/src/cpp/jank/codegen/llvm_processor.cpp#L129

/* We know that `sequence` is non-null here because we can only use `next_in_place` * on a fresh sequence and we'll return null from `fresh_seq` if we don't have a valid * `sequence`. However, outside of this fn, `sequence` may definitely be null. */

Quick tangent:

Is it important that lazy_sequence::fresh_seq returns a lazy_sequence_ptr? For example, if we returned (a properly nulled) runtime::fresh_seq(sequence), we'd only need to visit 1 sequence on each next_in_place call instead of also the lazy_sequence itself.

Then this code we're explaining would be unreachable (if we copied the same approach to next and seq).

Hm, where is the second visit coming from? I'm not seeing it.

With the current implementation, say we run this:

(loop [c (fresh-seq (lazy-seq [...]))] (recur (next-in-place c)))

On the second line, we first visit lazy_sequence::next_in_place, which then recursively visits persistent_vector::next_in_place. This happens every time around the loop AFAICT.

I guess this isn't really a problem in a C++ for loop using templates, because you directly call ->next_in_place() on the known sequence type, eliminating the "first" visit. AFAICT this only works if fresh_seq and next_in_place have a concrete return type, which conflicts with my idea.

That being said, it looks like there are some places that visit twice because it doesn't call the member functions directly. e.g., changing to it->next_in_place() here would skip straight to lazy_sequence::next_in_place() when iterating a lazy-seq instead of having to visit.

jank/compiler+runtime/src/cpp/jank/hash.cpp

Line 171 in fe6f1dd

for(auto it(fresh_seq(typed_sequence)); it != nullptr; it = next_in_place(it))

Maybe seq/next is more relevant to my observation than the in-place versions, as it's more common in Clojure code, and hence less likely to be able to infer the sequence type to call a member function directly.

For example, if lazy sequence implemented seq/next like this:

object_ptr lazy_sequence::seq() const { resolve_seq(); return sequence; } object_ptr lazy_sequence::next() const { resolve_seq(); return runtime::next(sequence)); }

then code like this wouldn't need to do the double visit at (next c) because the lazy-seq is gone.

(defn foo [c] (recur (next c)) (foo (lazy-seq [...]))

I'm not sure whether that would affect C++ template code, but presumably you'd reach for the in-place operations instead.

Even with the optimal iterator, you'd still need 1 visit_seqable at every step to visit the vector seq if s is a (lazy-seq [...]):

for(auto i(s->fresh_seq()); i != nullptr; i = i->next_in_place())

I wonder if tightening the scope of the in-place operations would yield even better results for the optimal C++ case, similar in spirit to transients. For example, if the return of fresh_seq and next_in_place were always a non-empty concrete data structure like SeqIterator, but in this case only exposed member functions first, next-in-place, and seq! that produced a seq (analogous to persistent!).

;; clojure pseudocode for C++ classes (deftype SeqIterator [^:unsynchronized-mutable s next-specialized] (first [_] (first s)) (next-in-place [this] (when-some [s' (next-specialized s)] (set! s s') this)) (seq! [_] s)) (deftype LazySeq [sequence] ;; sequence->next-in-place-fn is a `visit_seqable` call that returns ;; [](T sequence) { sequence->next_in_place()}; ;; for the given sequence type. This caches the member function ;; so we don't need to visit during next-in-place. (fresh-seq [_] (->SeqIterator sequence (sequence->next-in-place-fn sequence))))

All this of course assumes this is a problem worth solving, which I'm not sure :)

We can pick this up another time (or maybe this is what you had originally with whatever sequenceptr was). It's a pretty big change.

Back to your feedback, I went with a different approach for the documentation, more in line with the rest of jank (for example persistent_string_sequence::next_in_place).

I went into depth about ownership in the documentation of sequenceable_in_place and generally the optimizations you can use in next_in_place and why.

I added an assertion that sequence is non-empty.

I removed all comments in lazy_sequence::next_in_place.

It felt off to specifically explain (1) in any one next_in_place implementation. Is this something you'd accept?

Opened #265 for the concern about extra visiting.

That being said, it looks like there are some places that visit twice because it doesn't call the member functions directly. e.g., changing to it->next_in_place() here would skip straight to lazy_sequence::next_in_place() when iterating a lazy-seq instead of having to visit.

There's no visit in that case. We have an overload for next_in_place which does the job without visiting: https://github.com/jank-lang/jank/blob/main/compiler%2Bruntime/include/cpp/jank/runtime/core/seq.hpp#L41

jeaye · 2025-02-12T21:50:23Z

compiler+runtime/src/cpp/jank/runtime/obj/lazy_sequence.cpp

+      // seq is consumed, no need to update
+      //sequence = nullptr;


Let's not leave commented code in to explain why we don't need it. The comment alone will suffice, if we make it clear enough. This comment is nowhere near clear enough, since the removal of this one line has huge implications.

It's safe to not set sequence to null, we're saying, because we know that we're in a fresh sequence. Nobody else has a pointer to this sequence. Thus, if we fully consume sequence and we're done with our current lazy seq, we return null and expect that this sequence will never be used again.

compiler+runtime/src/cpp/jank/runtime/obj/lazy_sequence.cpp

frenchy64 added 6 commits February 11, 2025 23:33

Fix fresh_seq on lazy_sequence

e66afe5

wip

c0e26bc

wip

4e7f565

fix

a2ff9db

fmt

c92544e

wip

a42dab1

frenchy64 marked this pull request as ready for review February 12, 2025 06:29

frenchy64 changed the title ~~WIP: Fix fresh_seq on lazy_sequence~~ Fix fresh_seq on lazy_sequence Feb 12, 2025

more generous timeout

ac6c3d5

jeaye requested changes Feb 12, 2025

View reviewed changes

frenchy64 added 7 commits February 12, 2025 17:19

don't need fresh seq

a5ceda0

add sequenceable_in_place documentation

467fd2f

[skip ci] undo

65ebb73

[skip ci] tweak

08945ab

[skip ci] fresh seqs also created by seq and next

3bcd239

Merge branch 'main' into issue-262-empty-reduce

54ef812

fresh_seq on non-empty seq is non-empty

19c9837

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix fresh_seq on lazy_sequence #263

Fix fresh_seq on lazy_sequence #263

frenchy64 commented Feb 12, 2025 •

edited

Loading

jeaye left a comment

jeaye Feb 12, 2025

jeaye Feb 12, 2025

frenchy64 Feb 13, 2025 •

edited

Loading

jeaye Feb 13, 2025

frenchy64 Feb 13, 2025 •

edited

Loading

frenchy64 Feb 13, 2025

frenchy64 Feb 13, 2025 •

edited

Loading

frenchy64 Feb 13, 2025

frenchy64 Feb 13, 2025

jeaye Feb 13, 2025

jeaye Feb 12, 2025

Fix fresh_seq on lazy_sequence #263

Are you sure you want to change the base?

Fix fresh_seq on lazy_sequence #263

Conversation

frenchy64 commented Feb 12, 2025 • edited Loading

jeaye left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

frenchy64 Feb 13, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

frenchy64 Feb 13, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

frenchy64 Feb 13, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

frenchy64 commented Feb 12, 2025 •

edited

Loading

frenchy64 Feb 13, 2025 •

edited

Loading

frenchy64 Feb 13, 2025 •

edited

Loading

frenchy64 Feb 13, 2025 •

edited

Loading