Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the annotated join method and dispatch #51914

Merged
merged 1 commit into from
Feb 11, 2024

Conversation

tecosaur
Copy link
Contributor

With the initial implementation, join could work for AnnotatedStrings, however only when the eltype of the iterator or delim was itself a Annotated{String,Char}. This was better than nothing, but seems inconsistent in the face of mixed iterators.

Furthermore, the implementation of an annotated join was far from optimised, relying on zipping and then calling annotatedstring(xs...). By contrast, the non-annotated implementation relies on printing to IO and even has manually defined alternative methods for optional arguments to minimise code generation.

With the advent of AnnotatedIOBuffer and _isannotated, we can improve on both those aspects. The new AnnotatedIOBuffer type allows us to re-use the optimised join(::IO, ...) methods, and we can more reliably dispatch to them with _isannotated. Since this is a type-based decision, the Julia compiler is kind enough to work out which branch is taken at compile-time, making this zero-overhead in the unannotated case.


Depends on #51806 and #51807

@tecosaur tecosaur added the strings "Strings!" label Oct 28, 2023
@tecosaur tecosaur force-pushed the nicer-annotated-join branch from 454e7e9 to 7e55049 Compare February 1, 2024 18:31
@tecosaur
Copy link
Contributor Author

tecosaur commented Feb 1, 2024

Now that the prerequisite PRs have been merged 🥳, we can now think about merging this one.

@tecosaur tecosaur added the awaiting review PR is complete and seems ready to merge. Has tests and news/compat if needed. CI failures unrelated. label Feb 1, 2024
@oscardssmith
Copy link
Member

looks good to me

@tecosaur tecosaur force-pushed the nicer-annotated-join branch from 7e55049 to 4dd043e Compare February 4, 2024 15:39
@tecosaur
Copy link
Contributor Author

tecosaur commented Feb 4, 2024

Huh, I see CI actually wasn't happy. I pulled the latest changes in master and tested locally, and it works. Also tweaked an existing test to cover one of the new behaviours from this change.

CI should be happy now 🤞, and I presume this would be good to merge then?

@tecosaur
Copy link
Contributor Author

tecosaur commented Feb 5, 2024

I'm a bit confused by the fact that CI is showing errors that I don't see locally, namely:

Error in testset strings/annotated:
Test Failed at /cache/build/tester-amdci5-15/julialang/julia-master/julia-4dd043e483/share/julia/test/strings/annotated.jl:94
  Expression: join([str1, str1], ' ') == Base.AnnotatedString("test test", [(1:4, :label => 5), (6:9, :label => 5)])
   Evaluated: "test test" == "test test"
Error in testset strings/annotated:
Test Failed at /cache/build/tester-amdci5-15/julialang/julia-master/julia-4dd043e483/share/julia/test/strings/annotated.jl:98
  Expression: join([str1, str1], Base.AnnotatedString(" ", [(1:1, :label => 2)])) == Base.AnnotatedString("test test", [(1:4, :label => 5), (5:5, :label => 2), (6:9, :label => 5)])
   Evaluated: "test test" == "test test"

when if I do ./julia test/runtests.jl strings/annotated I see


Test          (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
strings/annotated  (1) |        started at 2024-02-05T12:48:01.929
strings/annotated  (1) |     2.73 |   0.04 |  1.3 |     367.04 |   665.82

Test Summary: | Pass  Total  Time
  Overall     |  476    476  4.1s

@vtjnash
Copy link
Member

vtjnash commented Feb 5, 2024

Error in testset strings/annotated:
Test Failed at /cache/build/tester-amdci5-15/julialang/julia-master/julia-4dd043e483/share/julia/test/strings/annotated.jl:94
  Expression: join([str1, str1], ' ') == Base.AnnotatedString("test test", [(1:4, :label => 5), (6:9, :label => 5)])
   Evaluated: "test test" == "test test"

This seems somewhat a lack of a complete show method being implemented. The show for annotated string should probably do something like wrapping the printing in a call to Base.AnnotatedString (e.g. use the default printing) or re-format it using the StyledStrings.jl styled"{:}" printing

@vtjnash vtjnash removed the awaiting review PR is complete and seems ready to merge. Has tests and news/compat if needed. CI failures unrelated. label Feb 5, 2024
@tecosaur
Copy link
Contributor Author

tecosaur commented Feb 6, 2024

This seems somewhat a lack of a complete show method being implemented. The show for annotated string should probably do something like wrapping the printing in a call to Base.AnnotatedString (e.g. use the default printing) or re-format it using the StyledStrings.jl styled"{:}" printing

When you're using the REPL, and styling attributes, I find that the current show works fairly well,

image

but yes, this is annoying when looking at it in CI. For CI alone, displaying the value like the constructor (Base.AnnotatedString("test test", [(1:4, :label => 5), (6:9, :label => 5)])) may be the most helpful.

@tecosaur
Copy link
Contributor Author

tecosaur commented Feb 8, 2024

Would having a different show form for AnnotatedStrings within tests be a bit silly?

@tecosaur
Copy link
Contributor Author

For now I'll temporarily change the show method in this PR to get some more useful CI output.

@tecosaur tecosaur force-pushed the nicer-annotated-join branch from 4dd043e to 239087c Compare February 10, 2024 19:11
@vtjnash vtjnash marked this pull request as draft February 10, 2024 19:54
@tecosaur
Copy link
Contributor Author

This is stange. After adding the logging, it's not complaining about the difference anymore (I just see doctest failures resulting from the log-more-info change). Was it fixed by rebasing to a newer master?

With the initial implementation, join could work for AnnotatedStrings,
however only when the eltype of the iterator or delim was itself a
Annotated{String,Char}. This was better than nothing, but seems
inconsistent in the face of mixed iterators.

Furthermore, the implementation of an annotated join was far from
optimised, relying on zipping and then calling annotatedstring(xs...).
By contrast, the non-annotated implementation relies on printing to IO
and even has manually defined alternative methods for optional arguments
to minimise code generation.

With the advent of AnnotatedIOBuffer and _isannotated, we can improve on
both those aspects. The new AnnotatedIOBuffer type allows us to re-use
the optimised join(::IO, ...) methods, and we can more reliably dispatch
to them with _isannotated. Since this is a type-based decision, the
Julia compiler is kind enough to work out which branch is taken at
compile-time, making this zero-overhead in the unannotated case.
@tecosaur tecosaur force-pushed the nicer-annotated-join branch from 239087c to 024af3e Compare February 11, 2024 06:47
@tecosaur
Copy link
Contributor Author

Tests have successfully run on everything but the Mingw32 Windows builds. I think this is good to go now.

@tecosaur tecosaur marked this pull request as ready for review February 11, 2024 09:41
@vtjnash vtjnash merged commit 002f07a into JuliaLang:master Feb 11, 2024
7 checks passed
vtjnash added a commit that referenced this pull request Feb 13, 2024
@tecosaur tecosaur deleted the nicer-annotated-join branch August 11, 2024 04:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
strings "Strings!"
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants