Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix fresh_seq on lazy_sequence #263

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

frenchy64
Copy link
Contributor

@frenchy64 frenchy64 commented Feb 12, 2025

Closes #262

fresh_seq now returns nullptr if lazy-seq is empty.

with-meta also needed to change since it didn't handle fresh_seq returning nullptr.

Perf: Takes ownership of inner sequence in the result of fresh_seq to enable calling next_in_place on the inner seq.

I tweaked the timeout-minutes to 35m after seeing this build fail to see if it was a transient issue. The next build succeeded in 28m.

$ ./build/jank repl
Bottom of clojure.core
clojure.core=> (reduce (fn [a b] :c) :a (lazy-seq))
:a
clojure.core=> (reduce (fn [a b] :c) :a (map identity []))
:a
clojure.core=> (with-meta (lazy-seq) {})
()

@frenchy64 frenchy64 marked this pull request as ready for review February 12, 2025 06:29
@frenchy64 frenchy64 changed the title WIP: Fix fresh_seq on lazy_sequence Fix fresh_seq on lazy_sequence Feb 12, 2025
Copy link
Member

@jeaye jeaye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, Ambrose. Intricate changes, so I've been quite particular about these. The sequence stuff is very easy to mess up, so we need to be extra careful with it.

@@ -62,18 +70,16 @@ namespace jank::runtime::obj

lazy_sequence_ptr lazy_sequence::next_in_place()
{
resolve_seq();
if(sequence)
// sequence is non-nullptr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments are documentation, so please always use /* */ and proper capitalization and grammar. See the deleted comment on line 68 for an example.

@@ -62,18 +70,16 @@ namespace jank::runtime::obj

lazy_sequence_ptr lazy_sequence::next_in_place()
{
resolve_seq();
if(sequence)
// sequence is non-nullptr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's clarify this comment. I'm a fan of explaining the WHY things clearly, in documentation: https://github.com/jank-lang/jank/blob/main/compiler%2Bruntime/src/cpp/jank/codegen/llvm_processor.cpp#L129

/* We know that `sequence` is non-null here because we can only use `next_in_place`
 * on a fresh sequence and we'll return null from `fresh_seq` if we don't have a valid
 * `sequence`. However, outside of this fn, `sequence` may definitely be null. */

Copy link
Contributor Author

@frenchy64 frenchy64 Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick tangent:

Is it important that lazy_sequence::fresh_seq returns a lazy_sequence_ptr? For example, if we returned (a properly nulled) runtime::fresh_seq(sequence), we'd only need to visit 1 sequence on each next_in_place call instead of also the lazy_sequence itself.

Then this code we're explaining would be unreachable (if we copied the same approach to next and seq).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, where is the second visit coming from? I'm not seeing it.

Copy link
Contributor Author

@frenchy64 frenchy64 Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the current implementation, say we run this:

(loop [c (fresh-seq (lazy-seq [...]))]
  (recur (next-in-place c)))

On the second line, we first visit lazy_sequence::next_in_place, which then recursively visits persistent_vector::next_in_place. This happens every time around the loop AFAICT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this isn't really a problem in a C++ for loop using templates, because you directly call ->next_in_place() on the known sequence type, eliminating the "first" visit. AFAICT this only works if fresh_seq and next_in_place have a concrete return type, which conflicts with my idea.

Copy link
Contributor Author

@frenchy64 frenchy64 Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That being said, it looks like there are some places that visit twice because it doesn't call the member functions directly. e.g., changing to it->next_in_place() here would skip straight to lazy_sequence::next_in_place() when iterating a lazy-seq instead of having to visit.

for(auto it(fresh_seq(typed_sequence)); it != nullptr; it = next_in_place(it))

Maybe seq/next is more relevant to my observation than the in-place versions, as it's more common in Clojure code, and hence less likely to be able to infer the sequence type to call a member function directly.

For example, if lazy sequence implemented seq/next like this:

  object_ptr lazy_sequence::seq() const
  {
    resolve_seq();
    return sequence;
  }

  object_ptr lazy_sequence::next() const
  {
    resolve_seq();
    return runtime::next(sequence));
  }

then code like this wouldn't need to do the double visit at (next c) because the lazy-seq is gone.

(defn foo [c]
  (recur (next c))

(foo (lazy-seq [...]))

I'm not sure whether that would affect C++ template code, but presumably you'd reach for the in-place operations instead.

Even with the optimal iterator, you'd still need 1 visit_seqable at every step to visit the vector seq if s is a (lazy-seq [...]):

    for(auto i(s->fresh_seq()); i != nullptr; i = i->next_in_place())

I wonder if tightening the scope of the in-place operations would yield even better results for the optimal C++ case, similar in spirit to transients. For example, if the return of fresh_seq and next_in_place were always a non-empty concrete data structure like SeqIterator, but in this case only exposed member functions first, next-in-place, and seq! that produced a seq (analogous to persistent!).

  ;; clojure pseudocode for C++ classes
  (deftype SeqIterator [^:unsynchronized-mutable s next-specialized]
    (first [_] (first s))
    (next-in-place [this] (when-some [s' (next-specialized s)] 
                            (set! s s')
                            this))
    (seq! [_] s))
  
  (deftype LazySeq [sequence]
    ;; sequence->next-in-place-fn is a `visit_seqable` call that returns
    ;;   [](T sequence) { sequence->next_in_place()};
    ;; for the given sequence type. This caches the member function
    ;; so we don't need to visit during next-in-place.
    (fresh-seq [_] (->SeqIterator sequence (sequence->next-in-place-fn sequence))))

All this of course assumes this is a problem worth solving, which I'm not sure :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can pick this up another time (or maybe this is what you had originally with whatever sequenceptr was). It's a pretty big change.

Back to your feedback, I went with a different approach for the documentation, more in line with the rest of jank (for example persistent_string_sequence::next_in_place).

  1. I went into depth about ownership in the documentation of sequenceable_in_place and generally the optimizations you can use in next_in_place and why.
  2. I added an assertion that sequence is non-empty.
  3. I removed all comments in lazy_sequence::next_in_place.

It felt off to specifically explain (1) in any one next_in_place implementation. Is this something you'd accept?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #265 for the concern about extra visiting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That being said, it looks like there are some places that visit twice because it doesn't call the member functions directly. e.g., changing to it->next_in_place() here would skip straight to lazy_sequence::next_in_place() when iterating a lazy-seq instead of having to visit.

There's no visit in that case. We have an overload for next_in_place which does the job without visiting: https://github.com/jank-lang/jank/blob/main/compiler%2Bruntime/include/cpp/jank/runtime/core/seq.hpp#L41

Comment on lines 77 to 78
// seq is consumed, no need to update
//sequence = nullptr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not leave commented code in to explain why we don't need it. The comment alone will suffice, if we make it clear enough. This comment is nowhere near clear enough, since the removal of this one line has huge implications.

It's safe to not set sequence to null, we're saying, because we know that we're in a fresh sequence. Nobody else has a pointer to this sequence. Thus, if we fully consume sequence and we're done with our current lazy seq, we return null and expect that this sequence will never be used again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Empty sequence reduces as if '(nil)
2 participants