Add a concept of memory-backed contiguous collection in Base #54581

Open
jakobnissen opened this issue May 26, 2024 · 9 comments
Labels
arrays [a, r, r, a, y, s] collections Data structures holding multiple items, e.g. sets

Comments

@jakobnissen
Contributor

jakobnissen commented May 26, 2024

I propose the creation of a new concrete (parametric) type in Base that corresponds to memory-backed contiguous arrays, and to implement Base methods that specialize on these properties in terms of this type.

Edit: I've made a proof-of-concept package here: https://github.com/BioJulia/MemoryViews.jl

Motivation

In various places in Base and elsewhere, Julia has methods that can operate on any contiguous memory region.
Let's call this type of data a dense array.
Examples include:

  • Search methods which work by ccalling memchr, as in string/search.jl
  • CRC32c
  • Various IO methods, e.g. read!(::IO, A::AbstractArray), which calls unsafe_read.

This is currently handled inconsistently in Base:

  • string/search.jl uses the internal ByteArray union type to cover dense arrays of bytes
  • CRC32c uses Union{Array{UInt8},FastContiguousSubArray{UInt8,N,<:Array{UInt8}} where N}
  • read! dispatches on AbstractArray, and checks isbitstype(eltype(A)) && _checkcontiguous(Bool, A).

There are several issues with this handling:

First, all the approaches above fail to cover some important types. I've attempted to address this ad-hoc in #47693, #53178 and #54579. However, this ad-hoc patching is deeply unsatisfying, and I'm certain I have missed several types. In fact, I'm starting to think it's literally impossible to cover all permutations of views, reinterpretarrays, codeunits, memory and vector using a Union.
The practical implication is that many function calls needlessly miss the fast path, while at the same time the code is harder to reason about, because these ad-hoc implementations use large, complex unions in their signatures and are inconsistent with each other.

The fact that this code is duplicated thrice in Base, and that all three places lacked important methods, suggests to me that there is a need for a better (internal) API for writing methods for dense arrays. That was also my experience when making those three PRs: "Surely, there must be a better way to do this". The main usability issue is that, in the absence of any API covering dense arrays, it's up to the implementer of every method to make sure they've covered non-obvious types like CodeUnits{UInt8, SubString{String}} and SubArray{UInt8, 1, Vector{UInt8}, Tuple{Base.OneTo{Int64}}, true}. Unsurprisingly, people don't do this correctly.

There are also some other, minor issues with the existing approaches. Namely, they cause unnecessary specialization, as there is no reason to compile two methods for e.g. String and Vector{UInt8} if both just operate on a slice of bytes in memory. Also, it's more difficult to introspect and reason about methods whose signature includes a "big union".

Why not use DenseArray?

Because it doesn't cover the correct types:

  • It doesn't cover SubArray, despite many subarrays being dense vectors.
  • Currently, we incorrectly have CodeUnits <: DenseVector (CodeUnits <: DenseVector, but does not fulfill its only criteria #53996). If this is fixed, then e.g. CodeUnits{T, String} will also not be covered by DenseArray.
  • Users may create a new subtype <: DenseArray without implementing methods like pointer, or indeed without even realizing that subtyping DenseArray requires that their new type must be memory-backed and contiguous.

Also, reinterpret may create dense arrays that are not DenseArrays; however, my proposed implementation doesn't handle this either.
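A quick REPL check illustrates the SubArray and reinterpret points: both values below are memory-backed and contiguous, yet neither is a DenseArray, so a method dispatching on DenseArray would reject them. The variable names are arbitrary.

```julia
parent_bytes = UInt8[1, 2, 3, 4]
v = view(parent_bytes, 2:3)         # contiguous, memory-backed SubArray
r = reinterpret(UInt8, UInt32[1])   # memory-backed ReinterpretArray

v isa DenseArray                          # false, although v is contiguous in memory
r isa DenseArray                          # false
pointer(v) == pointer(parent_bytes) + 1   # true: v starts one byte into its parent
```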

Proposal

I have cooked up a proof-of-concept package here: https://github.com/BioJulia/MemoryViews.jl, which has more details.

I propose creating a new, internal Base type with the following layout:

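```julia
# M is enforced to be :immutable or :mutable
struct MemView{T, M} <: DenseVector{T}
    ref::MemoryRef{T}
    len::Int
end

# Aliases assumed by the constructors below
const MutableMemView{T} = MemView{T, :mutable}
const ImmutableMemView{T} = MemView{T, :immutable}
```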

These types can be constructed from the various contiguous, dense arrays:

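```julia
MemView(A::Array{T}) where {T} = MutableMemView{T}(A.ref, length(A))
# Memory has no `ref` field, so the MemoryRef is constructed explicitly
MemView(A::Memory{T}) where {T} = MutableMemView{T}(MemoryRef(A), length(A))

function MemView(s::String)
    ImmutableMemView{UInt8}(MemoryRef(unsafe_wrap(Memory{UInt8}, s)), ncodeunits(s))
end

MemView(s::SubString) = MemView(parent(s))[only(parentindices(s))]
```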

[etc...]

Optimised methods can be implemented in terms of MemView, when it's not possible to write optimised methods generic to AbstractArray, and where Union{Vector, Memory} etc. is too restrictive:

```julia
function some_function_working_on_memory(mem::MemView{<:Union{UInt8, Int8}})
    # the optimised function here, maybe calling ccall or intrinsics
end
```

This would only need to be compiled once, and is conceptually easier to understand than working with a huge union.

Further, my proposal implements a trait tentatively called MemKind, which is IsMemory if the type is semantically equivalent to its MemView. For example, codeunits are semantically equivalent to MemView (both being dense arrays of bytes), whereas strings have a MemView method but are not semantically equivalent to an array of bytes.
The purpose of MemKind is that, for any type implementing it, methods can immediately convert the argument to a MemView and then pass it on to the low-level implementation function.

The surface level API would be along the lines below, using a simple hypothetical find_first_zero function.

```julia
find_first_zero(x) = find_first_zero(MemKind(x), x) # dispatch using MemKind trait

function find_first_zero(::MemKind, x) # generic fallback
    for (k,v) in pairs(x)
        iszero(v) && return k
    end
    nothing
end

find_first_zero(::IsMemory{<:MemView{<:Union{UInt8, Int8}}}, x) = find_first_zero(MemView(x))

function find_first_zero(mem::MemView{<:Union{UInt8, Int8}})
    # call memchr on the underlying memory
end
```
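The trait-dispatch pattern can be tried out today in a self-contained sketch. Assumptions: memkind, NotMemory and IsDenseBytes are hypothetical stand-ins for MemKind and IsMemory (not the MemoryViews.jl API), and because MemView is not a Base type, the fast path takes a pointer to the opted-in container directly instead of converting it first. Only two concrete byte containers opt in here.

```julia
# Hypothetical names throughout; this only mimics the MemKind idea.
abstract type MemKindSketch end
struct NotMemory <: MemKindSketch end     # default: no memory semantics
struct IsDenseBytes <: MemKindSketch end  # opted in: dense, memory-backed bytes

# The trait is opt-in: every type is NotMemory unless it says otherwise.
memkind(::Any) = NotMemory()
memkind(::Vector{UInt8}) = IsDenseBytes()
memkind(::SubArray{UInt8, 1, Vector{UInt8}, <:Tuple{AbstractUnitRange{Int}}, true}) = IsDenseBytes()

# Entry point: dispatch on the trait value rather than on a big Union.
find_first_zero(x) = find_first_zero(memkind(x), x)

# Generic fallback for any collection supporting `pairs`.
function find_first_zero(::NotMemory, x)
    for (k, v) in pairs(x)
        iszero(v) && return k
    end
    return nothing
end

# One fast path shared by every opted-in type: scan the bytes with memchr.
function find_first_zero(::IsDenseBytes, x)
    n = length(x)
    n == 0 && return nothing
    GC.@preserve x begin
        p = pointer(x)
        q = ccall(:memchr, Ptr{UInt8}, (Ptr{UInt8}, Cint, Csize_t), p, 0, n)
        q == C_NULL ? nothing : Int(q - p) + firstindex(x)
    end
end
```

For example, `find_first_zero(UInt8[5, 0, 7])` takes the memchr path and returns 2, while `find_first_zero([3, 0, 1])` uses the generic loop because `Vector{Int}` never opted in.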

This proposal has two important features:

  1. At a high level, dispatching on the MemKind trait is more accurate at selecting dense arrays than dispatching on big unions. It both correctly includes complex nested types such as views of codeunits of substrings, while also rejecting incorrect types, such as an incorrectly implemented DenseArray.
  2. At a low level, it is easier to reason about the behaviour (and safety) of low-level methods operating on memory, if it's implemented in terms of a single, concrete MemView type. It's also better for compile times and sysimg size if the methods for many different types all dispatch to the same single implementation.

Alternatives

  1. Instead of creating concrete types, the different parts in Base could more consistently use Base._checkcontiguous and isbitstype, perhaps extracted into a function is_memory_backed_contiguous. Base could then be reviewed for places where memory-backed contiguous types are used, and methods could be added to "funnel" all the compatible types into these methods.
    Importantly, any alternative approach should include an internal API, such that it makes it easy to write a method that will correctly dispatch if, and only if, the argument is a dense array - which is not the status quo.

  2. MemView could, instead of being backed by Memory, be backed by a simple pointer. In that case, MemView would be much like Random.UnsafeView. The novelty of this proposal would be in the MemKind trait, mostly. See more here.
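Alternative 2 can be sketched without Memory at all. This assumes a hypothetical ByteSpan type holding only a raw pointer and a length. Since it does not keep its owner alive, the equally hypothetical with_bytes helper roots the owner with GC.@preserve and only lets the span exist for the duration of a callback. Several input types funnel into the one concrete type, and a single count_byte method serves all of them.

```julia
# Hypothetical pointer-backed byte view; ByteSpan, with_bytes and count_byte
# are illustrative names, not Base API.
struct ByteSpan <: DenseVector{UInt8}
    ptr::Ptr{UInt8}
    len::Int
end

Base.size(s::ByteSpan) = (s.len,)
Base.IndexStyle(::Type{ByteSpan}) = IndexLinear()
Base.pointer(s::ByteSpan) = s.ptr
function Base.getindex(s::ByteSpan, i::Int)
    @boundscheck checkbounds(s, i)
    return unsafe_load(s.ptr, i)
end

# The span holds no reference to its owner, so it must never escape `f`:
# each method roots the owner for exactly the duration of the call.
with_bytes(f, v::Vector{UInt8}) = GC.@preserve v f(ByteSpan(pointer(v), length(v)))
with_bytes(f, s::Union{String, SubString{String}}) =
    GC.@preserve s f(ByteSpan(pointer(s), ncodeunits(s)))

# One implementation serves every input type funnelled into ByteSpan.
count_byte(s::ByteSpan, b::UInt8) = count(==(b), s)
```

For instance, `with_bytes(s -> count_byte(s, UInt8('a')), "banana")` counts 3. The callback style is what makes a pointer-only view tolerable: a ByteSpan that escaped `f` would dangle once its owner is collected.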

@oscardssmith
Member

To me, a type seems like the wrong option, since there can be cases where a type does or doesn't meet the requirements depending on its parameters. As such, I think the alternative solution is probably the better answer.

@jakobnissen
Contributor Author

Maybe I misunderstand, but you could very well have a method that converts MyType{Float32} to MemoryView, but not MyType{Int32}, or what have you.
The advantages of a single type still remain: Fewer code specializations, easier reasoning about the core implementation due to restricted types, and the "for free" genericness of the function (i.e. the user doesn't need to remember to cover view of codeunits of substrings of String in their signature)

@Seelengrab
Contributor

I think the idea/concept of a memory-backed, contiguous collection type is good in principle, but I'd rather see it implemented through a trait (or, if we ever get it, multiple abstract subtyping) than a bespoke struct that is converted to. My reasoning for this is that I don't think it's good to convert to/from such a type, effectively erasing the type information & invariants of the existing instances in that collection. In the proposed design, nothing stops me from converting e.g. a Vector{UInt64} to a MemoryView{UInt8} and then converting that to something entirely different - surely that's not intended, since the MemoryView is only supposed to be a (temporarily) different view of the existing valid data?

@ericphanson
Copy link
Contributor

I like the proposed approach. I think N “convert generic object to a concrete tightly constrained one” methods followed by M “operate on concrete object” specialized implementations is a better pattern than M functions that “operate on generic objects while making sure to specially handle any of N cases”. In some sense, we get the “intermediate abstraction” benefit of turning N*M into N+M.

Additionally, it orthogonalizes concerns, which makes each function easier to test and to verify for correctness.

@jakobnissen
Contributor Author

jakobnissen commented May 29, 2024

Thanks for your feedback. I've made a proof of concept package here: https://github.com/jakobnissen/MemViews.jl. I had to work a little bit back and forth with it to get an API I liked using, but I must say I like how it turned out.

@Seelengrab to respond to your concerns:

I'd rather see it implemented through a trait

A major part of my motivation is to consolidate the concrete representations of multiple types, to make things easier to reason about, which is important when you want to do low-level stuff, including working with pointers or calling into C. This is not practically possible using an abstract interface, because in these circumstances, you do really need to know the concrete memory layout of your types.

My reasoning for this is that I don't think it's good to convert to/from such a type, effectively erasing the type information & invariants of the existing instances in that collection.

The idea is that the author of a new implementation explicitly calls MemView(x) if x can be operated on in terms of memory. It's completely opt-in: your type won't unintentionally dispatch to a mem view method just because you implement MemView(x::MyType). That is, if your type has invariants that make it invalid to represent it as memory, don't implement it.

In my proof-of-concept package, you can also opt in with the trait MemKind if you do want methods to be able to treat your type as semantically equivalent to a memory view. This is conceptually similar to implementing AsRef<[T]> in Rust. Types such as strings, which can be represented by memory views but are not semantically equivalent to their own memory, would not implement MemKind, but would implement the MemView constructor.

In the proposed design, nothing stops me from converting e.g. a Vector{UInt64} to a MemoryView{UInt8} and then converting that to something entirely different

You can't convert between element types (e.g. Vector{A} to MemView{B} unless A === B). This is not easy to do.

@Seelengrab
Contributor

Seelengrab commented May 29, 2024

You can't convert between element types (e.g. Vector{A} to MemView{B} unless A === B). This is not easy to do.

I'm not sure I understand the example in your OP with String then - how else other than with unsafe_wrap are you supposed to get that changed eltype? If you don't need to/want to change the element type, surely the existing ccall/convert infrastructure can do that conversion already. For the Julia-side of this, I'd again prefer the traits approach over forcing the use of unsafe_* to create differently typed views into the underlying memory, complicating lifetime analysis immensely. Concretely, I think the comment from your Proof-Of-Concept:

https://github.com/jakobnissen/MemViews.jl/blob/0fbcf578a85a9f370d8a2fdc64a46fedacea83be/src/construction.jl#L10-L12

must be answered with "yes" because the string is not necessarily GC grounded for the entire duration the MemoryView exists. I don't see how this can be done generically for other types either, so I don't see how MemoryView can be done safely at all.

@nhz2
Contributor

nhz2 commented May 29, 2024

Why is the use of unsafe_wrap in the following safe from GC?

```julia
StringMemory(n::Integer) = unsafe_wrap(Memory{UInt8}, _string_n(n))
```

It seems very similar to https://github.com/jakobnissen/MemViews.jl/blob/0fbcf578a85a9f370d8a2fdc64a46fedacea83be/src/construction.jl#L10-L12

@vtjnash
Member

vtjnash commented May 29, 2024

Memory can hold an internal reference to certain kinds of object, such as String

@Seelengrab
Copy link
Contributor

Memory can hold an internal reference to certain kinds of object, such as String

So if I'm understanding correctly, this only works/is safe because it's special cased for String? In other words, for other objects this is properly unsafe without keeping an actual reference to the object alive too?
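One way to avoid relying on special cases is a view that stores the owning object itself, so the GC keeps the memory alive for as long as the view is reachable, and raw pointers only exist inside GC.@preserve. The sketch below assumes a hypothetical RootedBytes type; it is neither how Memory special-cases String nor the MemoryViews.jl implementation.

```julia
# Hypothetical "rooted" byte view: storing the owner keeps its memory reachable.
struct RootedBytes{O}
    owner::O      # keeps the underlying memory alive
    offset::Int   # byte offset of the first element within the owner
    len::Int
end

RootedBytes(v::Vector{UInt8}) = RootedBytes(v, 0, length(v))
RootedBytes(s::String) = RootedBytes(s, 0, ncodeunits(s))

Base.length(r::RootedBytes) = r.len

# Slicing returns a new view sharing the same owner; nothing is copied.
function Base.getindex(r::RootedBytes, rng::UnitRange{Int})
    checkbounds(1:r.len, rng)
    return RootedBytes(r.owner, r.offset + first(rng) - 1, length(rng))
end

# A raw pointer is only created while the owner is explicitly preserved.
function Base.getindex(r::RootedBytes, i::Int)
    checkbounds(1:r.len, i)
    owner = r.owner
    return GC.@preserve owner unsafe_load(pointer(owner) + r.offset, i)
end
```

The cost of this design is that the view's type carries the owner type `O`, so methods specialize per owner again, which is part of what the single concrete MemView type in the proposal tries to avoid.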

@nsajko nsajko added arrays [a, r, r, a, y, s] collections Data structures holding multiple items, e.g. sets labels May 31, 2024
LilithHafner pushed a commit that referenced this issue Jul 9, 2024
Previously, this method hit the slow generic AbstractArray fallback.

Closes #55079

This is an ad-hoc bandaid that really ought to be fixed by resolving
#54581.
KristofferC pushed a commit that referenced this issue Jul 12, 2024
(cherry picked from commit ec90012)
vtjnash pushed a commit that referenced this issue Sep 10, 2024
This was originally intended as a targeted fix to #54578, but I ran into
a bunch of smaller issues with this code that also needed to be solved
and it turned out to be difficult to fix them with small, trivial PRs.

I would also like to refactor this whole file, but I want these
correctness fixes to be merged first, because a larger refactoring has
higher risk of getting stuck without getting reviewed and merged.

## Larger things that need decisions
* The internal union `Base.ByteArray` has been deleted. Instead, the
unions `DenseInt8` and `DenseUInt8` have been added. These more
comprehensively cover the types that were meant, e.g. `Memory{UInt8}` was
incorrectly not covered by the former. As stated in the TODO, the
concept of a "memory backed dense byte array" is needed throughout
Julia, so this ideally needs to be implemented as a single type and used
throughout Base. The fix here is a decent temporary solution. See #53178
#54581
* The `findall` docstring between two arrays was incorrectly not
attached to the method - now it is. **Note that this change _changes_
the documentation** since it includes a docstring that was previously
missed. Hence, it's an API addition.
* Added a new minimal `testhelpers/OffsetDenseArrays.jl` which provides a
`DenseVector` with offset axes for testing purposes.

## Trivial fixes
* `findfirst(==(Int8(-1)), [0xff])` and similar findlast, findnext and
findprev are no longer buggy, see #54578
* `findfirst([0x0ff], Int8[-1])` is similarly no longer buggy, see
#54578
* `findnext(==('\xa6'), "æ", 1)` and `findprev(==('\xa6'), "æa", 2)` no
longer incorrectly throw an error
* The byte-oriented find* functions now work correctly with offset
arrays
* Fixed incorrect use of `GC.@preserve`, where the pointer was taken
before the preserve block.
* More of the optimised string methods now also apply to
`SubString{String}`


Closes #54578
Co-authored-by: Martin Holters <martin.holters@hsu-hh.de>
kshyatt pushed a commit that referenced this issue Sep 12, 2024
udesou added a commit to mmtk/julia that referenced this issue Oct 14, 2024
* Improve type-stability in SymTridiagonal triu!/tril! (#55646)

Changing the final `elseif` branch to an `else` makes it clear that the
method definitely returns a value, and the returned type is now a
`Tridiagonal` instead of a `Union{Nothing, Tridiagonal}`

* Reuse size-check function from `lacpy!` in `copytrito!` (#55664)

Since there is a size-check function in `lacpy!` that does the same
thing, we may reuse it instead of duplicating the check

* Update calling-c-and-fortran-code.md: fix ccall parameters (not a tuple) (#55665)

* Allow exact redefinition for types with recursive supertype reference (#55380)

This PR allows redefining a type when the new type is exactly identical
to the previous one (like #17618, #20592 and #21024), even if the type
has a reference to itself in its supertype. That particular case used to
error (issue #54757), whereas with this PR:
```julia
julia> struct Rec <: AbstractVector{Rec} end

julia> struct Rec <: AbstractVector{Rec} end # this used to error

julia>
```


Fix #54757 by implementing the solution proposed there. Hence, this
should also fix downstream Revise bug
https://github.com/timholy/Revise.jl/issues/813.

---------

Co-authored-by: N5N3 <2642243996@qq.com>

* Reroute Symmetric/Hermitian + Diagonal through triangular (#55605)

This should fix the `Diagonal`-related issue from
https://github.com/JuliaLang/julia/issues/55590, although the
`SymTridiagonal` one still remains.
```julia
julia> using LinearAlgebra

julia> a = Matrix{BigFloat}(undef, 2,2)
2×2 Matrix{BigFloat}:
 #undef  #undef
 #undef  #undef

julia> a[1] = 1; a[3] = 1; a[4] = 1
1

julia> a = Hermitian(a)
2×2 Hermitian{BigFloat, Matrix{BigFloat}}:
 1.0  1.0
 1.0  1.0

julia> b = Symmetric(a)
2×2 Symmetric{BigFloat, Matrix{BigFloat}}:
 1.0  1.0
 1.0  1.0

julia> c = Diagonal([1,1])
2×2 Diagonal{Int64, Vector{Int64}}:
 1  ⋅
 ⋅  1

julia> a+c
2×2 Hermitian{BigFloat, Matrix{BigFloat}}:
 2.0  1.0
 1.0  2.0

julia> b+c
2×2 Symmetric{BigFloat, Matrix{BigFloat}}:
 2.0  1.0
 1.0  2.0
```

* inference: check argtype compatibility in `abstract_call_opaque_closure` (#55672)

* Forward istriu/istril for triangular to parent (#55663)

* win: move stack_overflow_warning to the backtrace fiber (#55640)

There is not enough stack space remaining after a stack overflow on
Windows to allocate the 4k page used by `write` to call the WriteFile
syscall. This causes it to hard-crash. But we can simply run this on the
altstack implementation, where there is plenty of space.

* Check if ct is not null before doing is_addr_on_stack in the macos signal handler. (#55603)

Before the check, we used to segfault while segfaulting, and hang.

---------

Co-authored-by: Jameson Nash <vtjnash@gmail.com>

* Profile.print: color Base/Core & packages. Make paths clickable (#55335)

Updated
## This PR
![Screenshot 2024-09-02 at 1 47 23 PM](https://github.com/user-attachments/assets/1264e623-70b2-462a-a595-1db2985caf64)


## master
![Screenshot 2024-09-02 at 1 49 42 PM](https://github.com/user-attachments/assets/14d62fe1-c317-4df5-86e9-7c555f9ab6f1)



Todo:
- [ ] ~Maybe drop the `@` prefix when coloring it, given it's obviously
special when colored~ If someone copy-pasted the profile into an issue
this would make it confusing.
- [ ] Figure out why `Profile.print(format=:flat)` is truncating before
the terminal width is used up
- [x] Make filepaths terminal links (even if they're truncated)

* better signal handling (#55623)

Instead of relying on creating a fake stack frame, and hoping that no
signals are delivered and no kernel bugs, accidental gc_collect calls, or
other issues occur during the delivery and execution of these calls, use
the ability we added recently to emulate a longjmp into a unw_context to
eliminate any window in which invalid states could exist.

Secondly, when calling jl_exit_thread0_cb, we used to completely smash
the unwind info (with CFI_NOUNWIND), but this makes core files from
SIGQUIT much less helpful, so we now have a `fake_stack_pop` function
which contains the necessary CFI directives such that a minimal unwind
from the debugger will likely still succeed up into the frames that were
removed. We cannot do this perfectly on AArch64, since that platform's
DWARF spec lacks the ability to do so. On other platforms, this should
be possible to implement exactly (subject to libunwind implementation
quality). Thus, this is currently only fully implemented for x86_64 on
Apple Darwin.

* fix `exct` for mismatched opaque closure call

* improve `exct` modeling for opaque closure calls

* fix `nothrow` modeling for `invoke` calls

* improve `exct` modeling for `invoke` calls

* show a bit more detail when finished precompiling (#55660)

* subtype: minor clean up for fast path for lhs union and rhs typevar (#55645)

Follow up #55413.
The error pattern mentioned in
https://github.com/JuliaLang/julia/pull/55413#issuecomment-2288384468
cares about `∃y`'s ub in env rather than its original ub.
So it seems more robust to check the bounds in env directly.
The equivalent typevar propagation is lifted from `subtype_var` for the
same reason.

* Adding `JL_DATA_TYPE` annotation to `_jl_globalref_t` (#55684)

`_jl_globalref_t` seems to be allocated in the heap, and there is an
object `jl_globalref_type` which indicates that it is in fact, a data
type, thus it should be annotated with `JL_DATA_TYPE`??

* Make GEP when loading the PTLS an inbounds one. (#55682)

Non-inbounds GEPs should only be used when doing pointer arithmetic, i.e.
Ptr or MemoryRef boundscheck.
Found when auditing non inbounds GEPs for
https://github.com/JuliaLang/julia/pull/55681

* codegen: make boundscheck GEP not be inbounds while the load GEP is inbounds (#55681)

Avoids undefined behavior on the boundschecking arithmetic, which is
correct only assuming overflow follows unsigned arithmetic wrap around
rules.

Also add names to the Memory related LLVM instructions to aid debugging

Closes: https://github.com/JuliaLang/julia/pull/55674

* Make `rename` public (#55652)

Fixes #41584. Follow up of #55503

I think `rename` is a very useful low-level file system operation. Many
other programming languages have this function, so it is useful when
porting IO code to Julia.

One use case is to improve the Zarr.jl package to be more compatible
with zarr-python.

https://github.com/zarr-developers/zarr-python/blob/0b5483a7958e2ae5512a14eb424a84b2a75dd727/src/zarr/v2/storage.py#L994
uses the `os.replace` function. It would be nice to be able to directly
use `Base.rename` as a replacement for `os.replace` to ensure
compatibility.

Another use case is writing a safe zip file extractor in pure Julia.
https://github.com/madler/sunzip/blob/34107fa9e2a2e36e7e72725dc4c58c9ad6179898/sunzip.c#L365
uses the `rename` function to do this in C.

Lastly in
https://github.com/medyan-dev/MEDYANSimRunner.jl/blob/67d5b42cc599670486d5d640260a95e951091f7a/src/file-saving.jl#L83
I am using `ccall(:jl_fs_rename` to save files, because I have large
numbers of Julia processes creating and reading these files at the same
time on a distributed file system on a cluster, so I don't want data to
become corrupted if one of the nodes crashes (which happens fairly
regularly). However `jl_fs_rename` is not public, and might break in a
future release.

This PR also adds a note to `mv` comparing it to the `mv` command,
similar to the note on the `cp` function.
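The crash-safe saving described above is commonly done by writing to a temporary file and then renaming it over the target, so readers see either the old or the new contents, never a partial file. A minimal sketch, assuming a hypothetical atomic_write helper and a `.tmp` suffix; it calls Base.rename only when the running Julia defines it, and otherwise falls back to mv, which is not atomic.

```julia
# Hypothetical helper illustrating the write-then-rename pattern.
function atomic_write(path::AbstractString, data)
    tmp = path * ".tmp"          # hypothetical temp-file naming scheme
    write(tmp, data)             # a crash here leaves `path` untouched
    if isdefined(Base, :rename)
        Base.rename(tmp, path)   # replaces `path` in one filesystem operation
    else
        mv(tmp, path; force=true)  # fallback: removes `path` first, so not atomic
    end
    return path
end
```

The temporary file must live on the same filesystem as the target for a rename to be a single operation, which is why it is created next to `path` rather than in `tempdir()`.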

* contrib: include private libdir in `ldflags` on macOS (#55687)

The private libdir is used on macOS, so it needs to be included in our
`ldflags`

* Profile.print: Shorten C paths too (#55683)

* [LLVMLibUnwindJLL] Update llvmlibunwind to 14.0.6 (#48140)

* Add `JL_DATA_TYPE` for `jl_line_info_node_t` and `jl_code_info_t` (#55698)

* Canonicalize names of nested functions by keeping a more fine grained counter -- per (module, method name) pair (#53719)

As mentioned in https://github.com/JuliaLang/julia/pull/53716, we've
been noticing that `precompile` statements lists from one version of our
codebase often don't apply cleanly in a slightly different version.

That's because a lot of nested and anonymous function names have a
global numeric suffix which is incremented every time a new name is
generated, and these numeric suffixes are not very stable across
codebase changes.

To solve this, this PR makes the numeric suffixes a bit more fine
grained: every pair of (module, top-level/outermost function name) will
have its own counter, which should make nested function names a bit more
stable across different versions.

This PR applies @JeffBezanson's idea of making the symbol name changes
directly in `current-julia-module-counter`.

Here is an example:

```Julia
julia> function foo(x)
           function bar(y)
               return x + y
           end
       end
foo (generic function with 1 method)

julia> f = foo(42)
(::var"#bar#foo##0"{Int64}) (generic function with 1 method)
```

* Use `uv_available_parallelism` inside `jl_effective_threads` (#55592)

* [LinearAlgebra] Initialise number of BLAS threads with `jl_effective_threads` (#55574)

This is a safer estimate than `Sys.CPU_THREADS` to avoid oversubscribing
the machine when running distributed applications, or when the Julia
process is constrained by external controls (`taskset`, `cgroups`,
etc.).

Fix #55572

* Artifacts: Improve type-stability (#55707)

This improves Artifacts.jl to make `artifact"..."` fully type-stable, so
that it can be used with `--trim`.

This is a requirement for JLL support w/ trimmed executables.

Dependent on https://github.com/JuliaLang/julia/pull/55016

---------

Co-authored-by: Gabriel Baraldi <baraldigabriel@gmail.com>

* Remove redundant conversion in structured matrix broadcasting (#55695)

The additional construction is unnecessary, as we are already
constructing a `Matrix`.
Performance:
```julia
julia> using LinearAlgebra

julia> U = UpperTriangular(rand(1000,1000));

julia> L = LowerTriangular(rand(1000,1000));

julia> @btime $U .+ $L;
  1.956 ms (6 allocations: 15.26 MiB) # nightly
  1.421 ms (3 allocations: 7.63 MiB) # This PR
```

* [Profile] fix threading issue (#55704)

I forgot about the existence of threads, so had hard-coded this to only
support one thread. Clearly that is not sufficient though, so use the
semaphore here as it is intended to be used.

Fixes #55703

---------

Co-authored-by: Ian Butterworth <i.r.butterworth@gmail.com>

* delete flaky ranges/`TwicePrecision` test (#55712)

Fixes #55710

* Avoid stack overflow in triangular eigvecs (#55497)

This fixes a stack overflow in 
```julia
julia> using LinearAlgebra, StaticArrays

julia> U = UpperTriangular(SMatrix{2,2}(1:4))
2×2 UpperTriangular{Int64, SMatrix{2, 2, Int64, 4}} with indices SOneTo(2)×SOneTo(2):
 1  3
 ⋅  4

julia> eigvecs(U)
Warning: detected a stack overflow; program state may be corrupted, so further execution might be unreliable.
ERROR: StackOverflowError:
Stacktrace:
 [1] eigvecs(A::UpperTriangular{Float32, SMatrix{2, 2, Float32, 4}}) (repeats 79984 times)
   @ LinearAlgebra ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/LinearAlgebra/src/triangular.jl:2749
```
After this,
```julia
julia> eigvecs(U)
2×2 Matrix{Float32}:
 1.0  1.0
 0.0  1.0
```

* builtins: add `Core.throw_methoderror` (#55705)

This allows us to simulate/mark calls that are known-to-fail.

Required for https://github.com/JuliaLang/julia/pull/54972/

* Small missing tests for Irrationals (#55657)

Looks like a bunch of methods for `Irrational`s are tested but not
picked up by coverage...

* Implement faster thread local rng for scheduler (#55501)

Implement optimal uniform random number generator using the method
proposed in https://github.com/swiftlang/swift/pull/39143 based on
OpenSSL's implementation of it in
https://github.com/openssl/openssl/blob/1d2cbd9b5a126189d5e9bc78a3bdb9709427d02b/crypto/rand/rand_uniform.c#L13-L99

This PR also fixes some bugs found while developing it. This is a
replacement for https://github.com/JuliaLang/julia/pull/50203 and fixes
the issues found by @IanButterworth with both rngs

C rng
<img width="1011" alt="image"
src="https://github.com/user-attachments/assets/0dd9d5f2-17ef-4a70-b275-1d12692be060">

New scheduler rng
<img width="985" alt="image"
src="https://github.com/user-attachments/assets/4abd0a57-a1d9-46ec-99a5-535f366ecafa">

~On my benchmarks the julia implementation seems to be almost 50% faster
than the current implementation.~
With Oscar's suggestion of removing the debiasing, this is now almost 5x
faster than the original implementation, and almost fully branchless.

We might want to backport the two previous commits since they
technically fix bugs.
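For readers unfamiliar with the technique, the core step shared by multiply-based range reduction can be sketched as follows. This is a generic multiply-shift reduction, not the code from the PR or from OpenSSL; reduce_range and rand_upto are hypothetical names, and with no debiasing step the result is very slightly biased whenever `n` does not divide 2^32.

```julia
using Random

# Map a uniformly random 32-bit word `x` into 0:n-1 by taking the high
# 32 bits of the 64-bit product x * n: no division and no branches.
reduce_range(x::UInt32, n::UInt32) = UInt32((UInt64(x) * UInt64(n)) >> 32)

# Draw a value in 1:n from any AbstractRNG (hypothetical helper).
rand_upto(rng::AbstractRNG, n::Integer) = Int(reduce_range(rand(rng, UInt32), UInt32(n))) + 1
```

Because the output depends on the high bits of the product, `0x80000000` (half of the 32-bit range) maps to 5 when `n = 10`, and the largest word maps to 9.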

---------

Co-authored-by: Valentin Churavy <vchuravy@users.noreply.github.com>

* Add precompile signatures to Markdown to reduce latency. (#55715)

Fixes #55706, which is seemingly a 4472x regression, not just 16x (my
first guess, based on CondaPkg). This also fixes or greatly mitigates
https://github.com/JuliaPy/CondaPkg.jl/issues/145, and a large part of
the 3x regression for PythonCall.

---------

Co-authored-by: Kristoffer Carlsson <kcarlsson89@gmail.com>

* Fix invalidations for FileIO (#55593)

Fixes https://github.com/JuliaIO/FileIO.jl/issues/396

* Fix various issues with PGO+LTO makefile (#55581)

This fixes various issues with the PGO+LTO makefile
- `USECCACHE` doesn't work, throwing an error at
https://github.com/JuliaLang/julia/blob/eb5587dac02d1f6edf486a71b95149139cc5d9f7/Make.inc#L734
This is because setting `CC` and `CCX` by passing them as arguments to
`make` prevents `Make.inc` from prepending these variables with `ccache`
as `Make.inc` doesn't use override. To workaround this I instead set
`USECLANG` and add the toolchain to the `PATH`.
- To deal with similar issues for the other make flags, I pass them as
environment variables which can be edited in `Make.inc`.
- I add a way to build in one go by creating the `all` target, now you
can just run `make` and a PGO+LTO build that profiles Julia's build will
be generated.
- I work around `PROFRAW_FILES` not being reevaluated after `stage1`
builds. This caused the generation of `PROFILE_FILE` to run an outdated
command if `stage1` was built and affected the profraw files, which
matters when building in one go.
- I add a way to run rules like `binary-dist` which are not defined in
this makefile with the correct toolchain which for example prevents
`make binary-dist` from unnecessarily rebuilding `sys.ji`.
- Include `-Wl,--undefined-version` until
https://github.com/JuliaLang/julia/issues/54533 gets fixed.

These changes need to be copied to the PGO+LTO+BOLT makefile, and some to
the BOLT makefile, in a later PR.

---------

Co-authored-by: Zentrik <Zentrik@users.noreply.github.com>

* Fix `pkgdir` for extensions (#55720)

Fixes https://github.com/JuliaLang/julia/issues/55719

---------

Co-authored-by: Max Horn <241512+fingolfin@users.noreply.github.com>

* Avoid materializing arrays in bidiag matmul (#55450)

Currently, small `Bidiagonal`/`Tridiagonal` matrices are materialized in
matrix multiplications, but this is wasteful and unnecessary. This PR
changes this to use a naive matrix multiplication for small matrices,
and fall back to the banded multiplication for larger ones.
Multiplication by a `Bidiagonal` falls back to a banded matrix
multiplication for all sizes in the current implementation, and iterates
in a cache-friendly manner for the non-`Bidiagonal` matrix.

In certain cases, the matrices were being materialized if the
non-structured matrix was small, even if the structured matrix was
large. This is changed as well in this PR.

Some improvements in performance:
```julia
julia> B = Bidiagonal(rand(3), rand(2), :U); A = rand(size(B)...); C = similar(A);

julia> @btime mul!($C, $A, $B);
  193.152 ns (6 allocations: 352 bytes) # nightly v"1.12.0-DEV.1034"
  18.826 ns (0 allocations: 0 bytes) # This PR

julia> T = Tridiagonal(rand(99), rand(100), rand(99)); A = rand(2, size(T,2)); C = similar(A);

julia> @btime mul!($C, $A, $T);
  9.398 μs (8 allocations: 79.94 KiB) # nightly
  416.407 ns (0 allocations: 0 bytes) # This PR

julia> B = Bidiagonal(rand(300), rand(299), :U); A = rand(20000, size(B,2)); C = similar(A);

julia> @btime mul!($C, $A, $B);
  33.395 ms (0 allocations: 0 bytes) # nightly
  6.695 ms (0 allocations: 0 bytes) # This PR (cache-friendly)
```
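To illustrate multiplying without materializing, here is a minimal sketch (a hypothetical `mul_upper_bidiag!`, not the PR's `_bibimul!`) that computes `C = A * B` for a dense `A` and an upper `Bidiagonal` `B`, reading only the two stored bands through the `dv`/`ev` fields. Column `j` of the product is `A[:, j] * dv[j] + A[:, j-1] * ev[j-1]`, and going column by column keeps the inner loop contiguous in memory.

```julia
using LinearAlgebra

# Hypothetical helper: C = A * B for dense A and upper-bidiagonal B,
# without converting B to a dense matrix.
function mul_upper_bidiag!(C::AbstractMatrix, A::AbstractMatrix, B::Bidiagonal)
    B.uplo == 'U' || throw(ArgumentError("sketch only handles upper Bidiagonal"))
    m, n = size(A)
    size(B) == (n, n) && size(C) == (m, n) ||
        throw(DimensionMismatch("incompatible sizes"))
    dv, ev = B.dv, B.ev
    for j in 1:n
        # Column j of B has dv[j] on the diagonal and ev[j-1] just above it.
        for i in 1:m
            c = A[i, j] * dv[j]
            if j > 1
                c += A[i, j-1] * ev[j-1]
            end
            C[i, j] = c
        end
    end
    return C
end
```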

Closes https://github.com/JuliaLang/julia/pull/55414

---------

Co-authored-by: Daniel Karrasch <daniel.karrasch@posteo.de>

* Fix `@time_imports` extension recognition (#55718)

* drop typed GEP calls (#55708)

Now that we use LLVM 18, and almost have LLVM 19 support, clean up and
remove the LLVM 15/16 typed pointer support. LLVM now slightly prefers that
we rewrite our complex GEPs as a simple `emit_ptrgep` call instead,
which is also much simpler for Julia to emit.

* minor fixup for JuliaLang/julia#55705 (#55726)

* [REPL] prevent silent hang if precompile script async blocks fail (#55685)

* Various fixes to byte / bytearray search (#54579)

This was originally intended as a targeted fix to #54578, but I ran into
a bunch of smaller issues with this code that also needed to be solved
and it turned out to be difficult to fix them with small, trivial PRs.

I would also like to refactor this whole file, but I want these
correctness fixes to be merged first, because a larger refactoring has
higher risk of getting stuck without getting reviewed and merged.

## Larger things that need decisions
* The internal union `Base.ByteArray` has been deleted. Instead, the
unions `DenseInt8` and `DenseUInt8` have been added. These more
comprehensively cover the intended types; e.g. `Memory{UInt8}` was
incorrectly not covered by the former. As stated in the TODO, the
concept of a "memory-backed dense byte array" is needed throughout
Julia, so this ideally needs to be implemented as a single type and used
throughout Base. The fix here is a decent temporary solution. See #53178
and #54581.
* The `findall` docstring between two arrays was incorrectly not
attached to the method - now it is. **Note that this change _changes_
the documentation** since it includes a docstring that was previously
missed. Hence, it's an API addition.
* Added a new minimal `testhelpers/OffsetDenseArrays.jl`, which provides a
`DenseVector` with offset axes for testing purposes.

## Trivial fixes
* `findfirst(==(Int8(-1)), [0xff])` and the corresponding `findlast`,
`findnext` and `findprev` calls are no longer buggy, see #54578
* `findfirst([0x0ff], Int8[-1])` is similarly no longer buggy, see
#54578
* `findnext(==('\xa6'), "æ", 1)` and `findprev(==('\xa6'), "æa", 2)` no
longer incorrectly throw an error
* The byte-oriented find* functions now work correctly with offset
arrays
* Fixed incorrect use of `GC.@preserve`, where the pointer was taken
before the preserve block.
* More of the optimised string methods now also apply to
`SubString{String}`
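The `Int8(-1)` vs `0xff` bug comes down to comparing bit patterns instead of values: a `memchr`-style search looks for a single byte, but `Int8(-1) == 0xff` is `false` in Julia. A minimal sketch of the guard such a search needs, using a hypothetical `find_byte_value` with a plain loop standing in for `memchr`:

```julia
# Hypothetical helper: first index in a byte vector whose element equals `x`
# by value (as `==` does), not by bit pattern.
function find_byte_value(x::Integer, v::AbstractVector{UInt8})
    # A needle outside 0:255 can never equal a UInt8, even if its low
    # byte matches (Int8(-1) has bit pattern 0xff but value -1).
    0 <= x <= 255 || return nothing
    b = UInt8(x)
    # eachindex respects offset axes, unlike 1:length(v).
    for i in eachindex(v)
        v[i] == b && return i
    end
    return nothing
end
```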


Closes #54578
Co-authored-by: Martin Holters <martin.holters@hsu-hh.de>

* codegen: deduplicate code for calling a specsig (#55728)

I am tired of having 3 gratuitously different versions of this code to
maintain.

* Fix "Various fixes to byte / bytearray search"  (#55734)

Fixes the conflict between #54593 and #54579:
`_search` returns `nothing` instead of zero as a sentinel in #54579.

* Fix `make binary-dist` when using `USE_BINARYBUILDER_LLVM=0` (#55731)

`make binary-dist` expects lld to be in usr/tools but it ends up in
usr/bin so I copied it into usr/tools. Should fix the scheduled source
tests which currently fail at linking.

I think this is also broken with `USE_BINARYBUILDER_LLVM=0` and
`BUILD_LLD=0`, maybe
https://github.com/JuliaLang/julia/commit/ceaeb7b71bc76afaca2f3b80998164a47e30ce33
is the fix?

---------

Co-authored-by: Zentrik <Zentrik@users.noreply.github.com>

* Precompile the `@time_imports` printing so it doesn't confuse reports (#55729)

Makes functions for the report printing that can be precompiled into the
sysimage.

* codegen: some cleanup of layout computations (#55730)

Change Alloca to take an explicit alignment, rather than relying on LLVM
to guess our intended alignment from the DataLayout.

Eventually we should try to change this code to just get all layout data
from julia queries (jl_field_offset, julia_alignment, etc.) instead of
relying on creating an LLVM element type for memory and inspecting it
(CountTrackedPointers, DataLayout, and so on).

* Add some loading / LazyArtifacts precompiles to the sysimage (#55740)

Fixes https://github.com/JuliaLang/julia/issues/55725

These help LazyArtifacts mainly but seem beneficial for the sysimage.

* Update stable version number in readme to v1.10.5 (#55742)

* Add `invokelatest` barrier to `string(...)` in `@assert` (#55739)

This change protects `@assert` from invalidations to `Base.string(...)`
by adding an `invokelatest` barrier.

A common source of invalidations right now is `print(io,
join(args...))`. The problem is:
1. Inference concludes that `join(::Any...)` returns
`Union{String,AnnotatedString}`
2. The `print` call is union-split to `String` and `AnnotatedString`
3. This code is now invalidated when StyledStrings defines `print(io,
::AnnotatedString)`

The invalidation chain for `@assert` is similar: `@assert 1 == 1` calls
into `string(::Expr)`, which calls into `print(io, join(args::Any...))`.
Unfortunately, that leads to the invalidation of almost all `@assert`s
without an explicit error message.

Similar to
https://github.com/JuliaLang/julia/pull/55583#issuecomment-2308969806

* Don't show string concatenation error hint with zero arg `+` (#55749)

Closes #55745

* Don't leave trailing whitespace when printing do-block expr (#55738)

Before, when printing a `do`-block, we'd print a white-space after `do`
even if no arguments follow. Now we don't print that space.

---------

Co-authored-by: Lilith Orion Hafner <lilithhafner@gmail.com>

* Don't pass lSystem to the linker since macos always links it (#55722)

This stops it complaining about duplicated libs.

For libunwind there isn't much we can do, because it's part of libSystem
and we also need our own.

* define `numerator` and `denominator` for `Complex` (#55694)

Fixes #55693

* More testsets for SubString and a few missing tests (#55656)

Co-authored-by: Simeon David Schaub <simeon@schaub.rocks>

* Reorganize search tests into testsets (#55658)

Some of these tests are nearly 10 years old! Organized some of them into
testsets just in case one breaks in the future, should make it easier to
find the problem.

---------

Co-authored-by: Simeon David Schaub <simeon@schaub.rocks>

* fix #45494, error in ssa conversion with complex type decl (#55744)

We were missing a call to `renumber-assigned-ssavalues` in the case
where the declared type is used to assert the type of a value taken from
a closure box.

* Revert "Avoid materializing arrays in bidiag matmul" (#55737)

Reverts JuliaLang/julia#55450. @jishnub suggested reverting this PR to
fix #55727.

* Add a docs section about loading/precomp/ttfx time tuning (#55569)

* Add compat entry for `Base.donotdelete` (#55773)

* REPL: precompile in its own module because Main is closed. Add check for unexpected errors. (#55759)

* Try to put back previously flakey addmul tests (#55775)

Partial revert of #50071, inspired by conversation in
https://github.com/JuliaLang/julia/issues/49966#issuecomment-2350935477

Ran the tests 100 times to make sure we're not putting back
something that's still flaky.

Closes #49966

* Print results of `runtests` with `printstyled` (#55780)

This ensures escape characters are used only if `stdout` can accept
them.

* move null check in `unsafe_convert` of RefValue (#55766)

LLVM can optimize out this check but our optimizer can't, so this leads
to smaller IR in most cases.

* Fix hang in tmerge_types_slow (#55757)

Fixes https://github.com/JuliaLang/julia/issues/55751

Co-authored-by: Jameson Nash <jameson@juliacomputing.com>

* trace-compile: color recompilation yellow (#55763)

Marks recompilation of a method that produced a `precompile` statement
as yellow, or if color isn't supported adds a trailing comment: `#
recompilation`.

The coloring matches the `@time_imports` coloring. The screenshot is an excerpt of the output of
```
% ./julia --start=no --trace-compile=stderr --trace-compile-timing -e "using InteractiveUtils; @time @time_imports using Plots"
```
![Screenshot 2024-09-13 at 5 04
24 PM](https://github.com/user-attachments/assets/85bd99e0-586e-4070-994f-2d845be0d9e7)

* Use PrecompileTools mechanics to compile REPL (#55782)

Fixes https://github.com/JuliaLang/julia/issues/55778

Based on discussion here
https://github.com/JuliaLang/julia/issues/55778#issuecomment-2352428043

With this `?reinterpret` feels instant, with only these precompiles at
the start.
![Screenshot 2024-09-16 at 9 49
39 AM](https://github.com/user-attachments/assets/20dc016d-c6f7-4870-acd7-0e795dcf541b)

* use `inferencebarrier` instead of `invokelatest` for 1-arg `@assert` (#55783)

This version would be better as per this comment:
<https://github.com/JuliaLang/julia/pull/55739#pullrequestreview-2304360447>
I confirmed this still allows us to avoid invalidations reported at
JuliaLang/julia#55583.

* Inline statically known method errors. (#54972)

This replaces the `Expr(:call, ...)` with a call to a new builtin,
`Core.throw_methoderror`.

This is useful because it makes very clear if something is a static
method error or a plain dynamic dispatch that always errors.
Tools such as AllocCheck or juliac can notice that this is not a genuine
dynamic dispatch, and prevent it from becoming a false positive
compile-time error.

Dependent on https://github.com/JuliaLang/julia/pull/55705

---------

Co-authored-by: Cody Tapscott <topolarity@tapscott.me>

* Fix shell `cd` error when working dir has been deleted (#41244)

Root cause: if the current directory has been deleted, `pwd()` throws an
`IOError`: `pwd(): no such file or directory (ENOENT)`.

---------

Co-authored-by: Ian Butterworth <i.r.butterworth@gmail.com>

* codegen: fix bits compare for UnionAll (#55770)

Fixes #55768 in two parts: one is making the type computation in
emit_bits_compare agree with the parent function and two is not using
the optimized egal code for UnionAll kinds, which is different from how
the egal code itself works for kinds.

* use libuv to measure maxrss (#55806)

Libuv has a wrapper around rusage on Unix (and its equivalent on
Windows).

We should probably use it.

* REPL: use atreplinit to change the active module during precompilation (#55805)

* 🤖 [master] Bump the Pkg stdlib from 299a35610 to 308f9d32f (#55808)

* Improve codegen for `Core.throw_methoderror` and `Core.current_scope` (#55803)

This slightly improves our (LLVM) codegen for `Core.throw_methoderror`
and `Core.current_scope`

```julia
julia> foo() = Core.current_scope()
julia> bar() = Core.throw_methoderror(+, nothing)
```

Before:
```llvm
; Function Signature: foo()
define nonnull ptr @julia_foo_2488() #0 {
top:
  %0 = call ptr @jl_get_builtin_fptr(ptr nonnull @"+Core.#current_scope#2491.jit")
  %Builtin_ret = call nonnull ptr %0(ptr nonnull @"jl_global#2492.jit", ptr null, i32 0)
  ret ptr %Builtin_ret
}
; Function Signature: bar()
define void @julia_bar_589() #0 {
top:
  %jlcallframe1 = alloca [2 x ptr], align 8
  %0 = call ptr @jl_get_builtin_fptr(ptr nonnull @"+Core.#throw_methoderror#591.jit")
  %jl_nothing = load ptr, ptr @jl_nothing, align 8
  store ptr @"jl_global#593.jit", ptr %jlcallframe1, align 8
  %1 = getelementptr inbounds ptr, ptr %jlcallframe1, i64 1
  store ptr %jl_nothing, ptr %1, align 8
  %Builtin_ret = call nonnull ptr %0(ptr nonnull @"jl_global#592.jit", ptr nonnull %jlcallframe1, i32 2)
  call void @llvm.trap()
  unreachable
}
```

After:
```llvm
; Function Signature: foo()
define nonnull ptr @julia_foo_713() #0 {
top:
  %thread_ptr = call ptr asm "movq %fs:0, $0", "=r"() #5
  %tls_ppgcstack = getelementptr inbounds i8, ptr %thread_ptr, i64 -8
  %tls_pgcstack = load ptr, ptr %tls_ppgcstack, align 8
  %current_scope = getelementptr inbounds i8, ptr %tls_pgcstack, i64 -72
  %0 = load ptr, ptr %current_scope, align 8
  ret ptr %0
}
; Function Signature: bar()
define void @julia_bar_1581() #0 {
top:
  %jlcallframe1 = alloca [2 x ptr], align 8
  %jl_nothing = load ptr, ptr @jl_nothing, align 8
  store ptr @"jl_global#1583.jit", ptr %jlcallframe1, align 8
  %0 = getelementptr inbounds ptr, ptr %jlcallframe1, i64 1
  store ptr %jl_nothing, ptr %0, align 8
  %jl_f_throw_methoderror_ret = call nonnull ptr @jl_f_throw_methoderror(ptr null, ptr nonnull %jlcallframe1, i32 2)
  call void @llvm.trap()
  unreachable
}
```

* a minor improvement for EA-based `:effect_free`-ness refinement (#55796)

* fix #52986, regression in `@doc` of macro without REPL loaded (#55795)

fix #52986

* Assume that docstring code with no lang is julia (#55465)

* Broadcast binary ops involving strided triangular (#55798)

Currently, we evaluate expressions like `(A::UpperTriangular) +
(B::UpperTriangular)` using broadcasting if both `A` and `B` have
strided parents, and forward the summation to the parents otherwise.
This PR changes this to use broadcasting if either of the two has a
strided parent. This avoids accessing the parent corresponding to the
structural zero elements, as the index might not be initialized.

Fixes https://github.com/JuliaLang/julia/issues/55590

This isn't a general fix, as we still sum the parents if neither is
strided. However, it will address common cases.

This also improves performance, as we only need to loop over one half:
```julia
julia> using LinearAlgebra

julia> U = UpperTriangular(zeros(100,100));

julia> B = Bidiagonal(zeros(100), zeros(99), :U);

julia> @btime $U + $B;
  35.530 μs (4 allocations: 78.22 KiB) # nightly
  13.441 μs (4 allocations: 78.22 KiB) # This PR
```
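Broadcasting is safer because it reads only the structurally nonzero half. A minimal sketch of that idea, with a hypothetical `add_upper` (not the broadcasting machinery the PR uses), shows that an `UpperTriangular` whose parent has `#undef` entries below the diagonal can be added without ever touching them:

```julia
using LinearAlgebra

# Hypothetical helper: add two upper-triangular wrappers while reading only
# the structurally nonzero (upper) entries of their parents.
function add_upper(A::UpperTriangular, B::UpperTriangular)
    n = size(A, 1)
    size(B, 1) == n || throw(DimensionMismatch("sizes differ"))
    T = promote_type(eltype(A), eltype(B))
    C = zeros(T, n, n)              # lower triangle stays as explicit zeros
    PA, PB = parent(A), parent(B)
    for j in 1:n, i in 1:j          # column-major, upper part only
        C[i, j] = PA[i, j] + PB[i, j]
    end
    return UpperTriangular(C)
end
```

Summing the parents instead would read `PA[2, 1]`, which throws an `UndefRefError` for a `Matrix{BigFloat}` allocated with `undef`.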

* Reland " Avoid materializing arrays in bidiag matmul #55450" (#55777)

This relands #55450 and adds tests for the failing case noted in
https://github.com/JuliaLang/julia/issues/55727. The `addmul` tests that
were failing earlier pass with this change.

The issue in the earlier PR was that we were not exiting quickly for
`iszero(alpha)` in `_bibimul!` for small matrices, and were computing
the result as `C .= A * B * alpha + C * beta`. The problem with this is
that if `A * B` contains `NaN`s, this propagates to `C` even if `alpha
=== 0.0`. This is fixed now, and the result is only computed if
`!iszero(alpha)`.
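The `NaN` issue follows from IEEE arithmetic: `NaN * 0.0` is `NaN`, so forming `A * B * alpha` and relying on `alpha == 0` to cancel it does not work. A minimal sketch of the early-exit pattern, using a hypothetical `addmul_sketch!` that mirrors the `C = A*B*alpha + C*beta` contract of 5-argument `mul!`. Treating `beta == 0` as "overwrite `C`" is an assumption borrowed from the BLAS convention.

```julia
using LinearAlgebra

# Hypothetical helper with the same contract as 5-arg mul!:
# C = A * B * alpha + C * beta.
function addmul_sketch!(C, A, B, alpha::Number, beta::Number)
    if iszero(alpha)
        # Never form A * B: any NaN/Inf in it would survive `* 0.0`.
        if iszero(beta)
            fill!(C, zero(eltype(C)))   # assumed BLAS-style overwrite
        else
            C .*= beta
        end
        return C
    end
    C .= (A * B) .* alpha .+ C .* beta
    return C
end
```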

* move the test case added in #50174 to test/core.jl (#55811)

Also renames the test function to avoid a name collision.

* [Random] Avoid conversion to `Float32` in `Float16` sampler (#55819)

* simplify the fields of `UnionSplitInfo` (#55815)

xref:
<https://github.com/JuliaLang/julia/pull/54972#discussion_r1766187771>

* Add errorhint for nonexisting fields and properties (#55165)

I played a bit with error hints and crafted this:
```julia
julia> (1+2im).real
ERROR: FieldError: type Complex has no field real, available fields: `re`, `im`

julia> nothing.xy
ERROR: FieldError: type Nothing has no field xy; Nothing has no fields at all.

julia> svd(rand(2,2)).VV
ERROR: FieldError: type SVD has no field VV, available fields: `U`, `S`, `Vt`
Available properties: `V`
```
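The hint text is built from the type's field names. A minimal sketch of that lookup, using a hypothetical `field_hint` helper (not the PR's implementation, and not hooked into the error-hint machinery):

```julia
# Hypothetical helper: build a FieldError-style hint listing the fields of T.
function field_hint(T::DataType, name::Symbol)
    fields = fieldnames(T)
    tname = nameof(T)
    if isempty(fields)
        return "type $tname has no field $name; $tname has no fields at all."
    end
    listed = join(("`$f`" for f in fields), ", ")
    return "type $tname has no field $name, available fields: $listed"
end
```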

---------

Co-authored-by: Lilith Orion Hafner <lilithhafner@gmail.com>

* Improve printing of several arguments (#55754)

Following a discussion on
[Discourse](https://discourse.julialang.org/t/string-optimisation-in-julia/119301/10?u=gdalle),
this PR tries to improve `print` (and variants) for more than one
argument.
The idea is that `for` is type-unstable over the tuple `args`, while
`foreach` unrolls.

---------

Co-authored-by: Steven G. Johnson <stevenj@mit.edu>

* Markdown: support `parse(::AbstractString)` (#55747)

`Markdown.parse` is documented to accept `AbstractString` but it was
implemented by calling `IOBuffer` on the string argument. `IOBuffer`,
however, is documented only for `String` arguments.

This commit changes the current `parse(::AbstractString)` to
`parse(::String)` and implements `parse(::AbstractString)` by converting
the argument to `String`.

Now, even `LazyString`s can be parsed to Markdown representation.
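The same convert-first strategy works in user code. A minimal sketch, assuming the `lazy"..."` string macro (Julia 1.8+); `parse_markdown` is a hypothetical name:

```julia
using Markdown

# Hypothetical wrapper: accept any AbstractString by converting it to a
# String first, the strategy the commit describes for Markdown.parse.
parse_markdown(s::AbstractString) = Markdown.parse(String(s))

md = parse_markdown(lazy"# Title from a LazyString")
```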

Fixes #55732

* better error for esc outside of macro expansion (#55797)

fixes #55788

---------

Co-authored-by: Jeff Bezanson <jeff.bezanson@gmail.com>

* allow kronecker product between recursive triangular matrices (#55527)

Using the recently introduced recursive `zero` I can remove the
specialization to `<:Number` as @dkarrasch wanted to do in #54413.

---------

Co-authored-by: Jishnu Bhattacharya <jishnub.github@gmail.com>

* [Dates] Make test more robust against non-UTC timezones (#55829)

`%M` is the format specifier for the minutes, not the month (which
should be `%m`), and it was used twice.

Also, on macOS `Libc.strptime` internally calls `mktime` which depends
on the local timezone. We now temporarily set `TZ=UTC` to avoid
depending on the local timezone.

Fix #55827.

* 🤖 [master] Bump the Pkg stdlib from 308f9d32f to ef9f76c17 (#55838)

* lmul!/rmul! for banded matrices (#55823)

This adds fast methods for `lmul!` and `rmul!` between banded matrices
and numbers.
Performance impact:
```julia
julia> T = Tridiagonal(rand(999), rand(1000), rand(999));

julia> @btime rmul!($T, 0.2);
  4.686 ms (0 allocations: 0 bytes) # nightly v"1.12.0-DEV.1225"
  669.355 ns (0 allocations: 0 bytes) # this PR
```
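Scaling a `Tridiagonal` by a number only needs to touch its three stored bands (the `dl`, `d` and `du` fields), i.e. O(n) work instead of O(n²). A minimal sketch with a hypothetical `scale_bands!`; rejecting non-finite factors is an assumption, not taken from the PR, motivated by `0 * Inf` being `NaN`, which the implicit zeros outside the bands cannot represent.

```julia
using LinearAlgebra

# Hypothetical helper: T .= T * x for a Tridiagonal T, touching only the bands.
function scale_bands!(T::Tridiagonal, x::Number)
    isfinite(x) || throw(ArgumentError("non-finite factor would turn off-band zeros into NaN"))
    T.dl .*= x
    T.d  .*= x
    T.du .*= x
    return T
end
```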

* Specialize indexing triangular matrices with BandIndex (#55644)

With this, certain indexing operations involving a `BandIndex` may be
evaluated as constants. This isn't used directly presently, but might
allow for more performant broadcasting in the future.
With this,
```julia
julia> n = 3; T = Tridiagonal(rand(n-1), rand(n), rand(n-1));

julia> @code_warntype ((T,j) -> UpperTriangular(T)[LinearAlgebra.BandIndex(2,j)])(T, 1)
MethodInstance for (::var"#17#18")(::Tridiagonal{Float64, Vector{Float64}}, ::Int64)
  from (::var"#17#18")(T, j) @ Main REPL[12]:1
Arguments
  #self#::Core.Const(var"#17#18"())
  T::Tridiagonal{Float64, Vector{Float64}}
  j::Int64
Body::Float64
1 ─ %1 = Main.UpperTriangular(T)::UpperTriangular{Float64, Tridiagonal{Float64, Vector{Float64}}}
│   %2 = LinearAlgebra.BandIndex::Core.Const(LinearAlgebra.BandIndex)
│   %3 = (%2)(2, j)::Core.PartialStruct(LinearAlgebra.BandIndex, Any[Core.Const(2), Int64])
│   %4 = Base.getindex(%1, %3)::Core.Const(0.0)
└──      return %4
```
The indexing operation may be evaluated at compile-time, as the band
index is constant-propagated.

* Replace regex package module checks with actual code checks (#55824)

Fixes https://github.com/JuliaLang/julia/issues/55792
Replaces https://github.com/JuliaLang/julia/pull/55822
Improves what https://github.com/JuliaLang/julia/pull/51635 was trying
to do

i.e.
```
ERROR: LoadError: `using/import Printf` outside of a Module detected. Importing a package outside of a module is not allowed during package precompilation.
```

* fall back to slower stat filesize if optimized filesize fails (#55641)

* Use "index" instead of "subscript" to refer to indexing in performance tips (#55846)

* privatize annotated string API, take two (#55845)

https://github.com/JuliaLang/julia/pull/55453 is stuck on StyledStrings
and Base documentation being entangled and there isn't a good way to
have the documentation of Base types / methods live in an stdlib. This
is a stop gap solution to finally be able to move forwards with 1.11.

* 🤖 [master] Bump the Downloads stdlib from 1061ecc to 89d3c7d (#55854)

Stdlib: Downloads
URL: https://github.com/JuliaLang/Downloads.jl.git
Stdlib branch: master
Julia branch: master
Old commit: 1061ecc
New commit: 89d3c7d
Julia version: 1.12.0-DEV
Downloads version: 1.6.0(It's okay that it doesn't match)
Bump invoked by: @KristofferC
Powered by:
[BumpStdlibs.jl](https://github.com/JuliaLang/BumpStdlibs.jl)

Diff:
https://github.com/JuliaLang/Downloads.jl/compare/1061ecc377a053fce0df94e1a19e5260f7c030f5...89d3c7dded535a77551e763a437a6d31e4d9bf84

```
$ git log --oneline 1061ecc..89d3c7d
89d3c7d fix cancelling upload requests (#259)
df33406 gracefully cancel a request (#256)
```

Co-authored-by: Dilum Aluthge <dilum@aluthge.com>

* docs: Small edits to noteworthy differences (#55852)

- The first line edit changes it so that the Julia example goes first,
not the Python example, keeping with the general flow of the lines
above.
- The second adds a "the" that is missing.

* Add filesystem func to transform a path to a URI (#55454)

In a few places across Base and the stdlib, we emit paths that we like
people to be able to click on in their terminal and editor. Up to this
point, we have relied on auto-filepath detection, but this does not
allow for alternative link text, such as contracted paths.

Doing so (via OSC 8 terminal links for example) requires filepath URI
encoding.

This functionality was previously part of a PR modifying stacktrace
printing (#51816), but after that became held up for unrelated reasons
and another PR appeared that would benefit from this utility (#55335),
I've split out this functionality so it can be used before the
stacktrace printing PR is resolved.
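A minimal sketch of the two pieces involved, assuming an absolute POSIX path and using hypothetical helpers `path_to_file_uri` and `osc8_link` (not the function this PR adds to Base). Bytes outside the RFC 3986 unreserved set, other than `/`, are percent-encoded, and the OSC 8 hyperlink is `ESC ] 8 ; ; URI ESC \` + text + `ESC ] 8 ; ; ESC \`.

```julia
# Hypothetical helper: percent-encode an absolute POSIX path into a file:// URI.
# Windows drive letters and hostnames are deliberately not handled.
function path_to_file_uri(path::AbstractString)
    io = IOBuffer()
    print(io, "file://")
    for b in codeunits(String(path))
        c = Char(b)
        if isascii(c) && (isletter(c) || isdigit(c) || c in "-._~/")
            write(io, b)
        else
            # Encode each UTF-8 byte separately, e.g. ' ' -> %20.
            print(io, '%', uppercase(string(b, base = 16, pad = 2)))
        end
    end
    return String(take!(io))
end

# Hypothetical helper: wrap `text` in an OSC 8 hyperlink pointing at `uri`.
osc8_link(uri::AbstractString, text::AbstractString) =
    string("\e]8;;", uri, "\e\\", text, "\e]8;;\e\\")
```

With these, a contracted path such as `~/foo.jl` can be shown as link text while the link target stays the full, encoded path.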

* constrain the path argument of `include` functions to `AbstractString` (#55466)

Each `Module` defined with `module` automatically gets an `include`
function with two methods. Each of those two methods takes a file path
as its last argument. Even though the path argument is unconstrained by
dispatch, it's documented as constrained with `::AbstractString`:

https://docs.julialang.org/en/v1.11-dev/base/base/#include

Furthermore, I think that any invocation of `include` with a
non-`AbstractString` path will necessarily throw a `MethodError`
eventually. Thus this change should be harmless.

Adding the type constraint to the path argument is an improvement
because any possible exception would be thrown earlier than before.

Apart from modules defined with `module`, the same issue is present with
the anonymous modules created by `evalfile`, which is also addressed.

Sidenote: `evalfile` seems to be completely untested apart from the test
added here.

Co-authored-by: Florian <florian.atteneder@gmail.com>

* Mmap: fix grow! for non file IOs (#55849)

Fixes https://github.com/JuliaLang/julia/issues/54203
Requires #55641

Based on
https://github.com/JuliaLang/julia/pull/55641#issuecomment-2334162489
cc. @JakeZw @ronisbr

---------

Co-authored-by: Jameson Nash <vtjnash@gmail.com>

* codegen: split gc roots from other bits on stack (#55767)

In order to help avoid memory provenance issues, and better utilize
stack space (somewhat), and use FCA less, change the preferred
representation of an immutable object to be a pair of
`<packed-data,roots>` values. This packing requires some care at the
boundaries and if the expected field alignment exceeds that of a
pointer. The change is expected to eventually make codegen more flexible
at representing unions of values with both bits and pointer regions.

Eventually we can also have someone improve the late-gc-lowering pass to
take advantage of this increased information accuracy, but currently it
will not be any better than before at laying out the frame.

* Refactoring to be considered before adding MMTk

* Removing jl_gc_notify_image_load, since it's a new function and not part of the refactoring

* Moving gc_enable code to gc-common.c

* Addressing PR comments

* Push resolution of merge conflict

* Removing jl_gc_mark_queue_obj_explicit extern definition from scheduler.c

* Don't need the getter function since it's possible to use jl_small_typeof directly

* WIP: Adding support for MMTk/Immix

* Refactoring to be considered before adding MMTk

* Adding fastpath allocation

* Fixing removed newlines

* Refactoring to be considered before adding MMTk

* Adding a few comments; Moving some functions to be closer together

* Fixing merge conflicts

* Applying changes from refactoring before adding MMTk

* Update TaskLocalRNG docstring according to #49110 (#55863)

Since #49110, which is included in 1.10 and 1.11, spawning a task no
longer advances the parent task's RNG state, so this statement in the
docs was incorrect.
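A minimal check of the corrected statement, assuming Julia 1.10 or later: after reseeding the task-local RNG, spawning a task leaves the parent's next draw unchanged.

```julia
using Random

Random.seed!(1234)
expected = rand()            # first draw after seeding, no task spawned

Random.seed!(1234)
child = fetch(Threads.@spawn rand())   # the child draws from its own RNG
after_spawn = rand()         # the parent's stream should not have advanced

println(expected == after_spawn)
```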

* Root globals in toplevel exprs (#54433)

This fixes #54422. The code here assumed that top-level exprs are always
rooted, but I don't see that referenced or guaranteed anywhere else, so
we now conservatively root all objects that show up in code.

* codegen: fix alignment typos (#55880)

So easy to type jl_datatype_align to get the natural alignment instead
of julia_alignment to get the actual alignment. This should fix the
Revise workload.

Change is visible with
```
julia> code_llvm(Random.XoshiroSimd.forkRand, (Random.TaskLocalRNG, Base.Val{8}))
```

* Fix some corner cases of `isapprox` with unsigned integers (#55828)

* 🤖 [master] Bump the Pkg stdlib from ef9f76c17 to 51d4910c1 (#55896)

* Profile: fix order of fields in heapsnapshot & improve formatting (#55890)

* Profile: Improve generation of clickable terminal links (#55857)

* inference: add missing `TypeVar` handling for `instanceof_tfunc` (#55884)

I thought these sort of problems had been addressed by d60f92c, but it
seems some were missed. Specifically, `t.a` and `t.b` from `t::Union`
could be `TypeVar`, and if they are passed to a subroutine or recursed
without being unwrapped or rewrapped, errors like JuliaLang/julia#55882
could occur.

This commit resolves the issue by calling `unwraptv` in the `Union`
handling within `instanceof_tfunc`. I also found a similar issue inside
`nfields_tfunc`, so that has also been fixed, and test cases have been
added. While I haven't been able to make up a test case specifically for
the fix in `instanceof_tfunc`, I have confirmed that this commit
certainly fixes the issue reported in JuliaLang/julia#55882.

- fixes JuliaLang/julia#55882

* Install terminfo data under /usr/share/julia (#55881)

Just like all other libraries, we don't want internal Julia files to
mess with system files.

Introduced by https://github.com/JuliaLang/julia/pull/55411.

* expose metric to report reasons why full GCs were triggered (#55826)

Additional GC observability tool.

This will help us to diagnose why some of our servers are triggering so
many full GCs in certain circumstances.

* Revert "Improve printing of several arguments" (#55894)

Reverts JuliaLang/julia#55754 as it overrode some performance heuristics
which appeared to be giving a significant gain/loss in performance:
Closes https://github.com/JuliaLang/julia/issues/55893

* Do not trigger deprecation warnings in `Test.detect_ambiguities` and `Test.detect_unbound_args` (#55869)

#55868

* do not intentionally suppress errors in precompile script from being reported or failing the result (#55909)

It was slightly annoying that the build was set up to succeed if this
step failed, so I removed the error suppression and fixed up the script
slightly.

* Remove eigvecs method for SymTridiagonal (#55903)

The fallback method does the same, so this specialized method isn't
necessary

* add --trim option for generating smaller binaries (#55047)

This adds a command line option `--trim` that builds images where code
is only included if it is statically reachable from methods marked using
the new function `entrypoint`. Compile-time errors are given for call
sites that are too dynamic to allow trimming the call graph (however
there is an `unsafe` option if you want to try building anyway to see
what happens).

The PR has two other components. One is changes to Base that generally
allow more code to be compiled in this mode. These changes will either
be merged in separate PRs or moved to a separate part of the workflow
(where we will build a custom system image for this purpose). The branch
is set up this way to make it easy to check out and try the
functionality.

The other component is everything in the `juliac/` directory, which
implements a compiler driver script based on this new option, along with
some examples and tests. This will eventually become a package "app"
that depends on PackageCompiler and provides a CLI for all of this
stuff, so it will not be merged here. To try an example:

```
julia contrib/juliac.jl --output-exe hello --trim test/trimming/hello.jl
```

When stripped the resulting executable is currently about 900kb on my
machine.

Also includes a lot of work by @topolarity

---------

Co-authored-by: Gabriel Baraldi <baraldigabriel@gmail.com>
Co-authored-by: Tim Holy <tim.holy@gmail.com>
Co-authored-by: Cody Tapscott <topolarity@tapscott.me>

* fix rawbigints OOB issues (#55917)

Fixes issues introduced in #50691 and found in #55906:
* use `@inbounds` and `@boundscheck` macros in rawbigints, for catching
OOB with `--check-bounds=yes`
* fix OOB in `truncate`
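A minimal sketch of the pattern this commit adopts, with a hypothetical `limb` accessor: the check sits in a `@boundscheck` block, so callers can elide it with `@inbounds`, yet running with `--check-bounds=yes` forces it back on, which is how out-of-bounds accesses get caught. `Base.@propagate_inbounds` lets the caller's `@inbounds` reach the inner check.

```julia
# Hypothetical accessor for the limbs of a raw big integer.
Base.@propagate_inbounds function limb(v::Vector{UInt64}, i::Int)
    @boundscheck checkbounds(v, i)   # elided under a caller's @inbounds,
                                     # unless --check-bounds=yes is set
    return @inbounds v[i]
end

# The caller promises every index is valid, so the check may be elided.
function sum_limbs(v::Vector{UInt64})
    s = zero(UInt64)
    for i in eachindex(v)
        s += @inbounds limb(v, i)
    end
    return s
end
```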

* prevent loading other extensions when precompiling an extension (#55589)

The current way of loading extensions when precompiling an extension
very easily leads to cycles. For example, if you have more than one
extension and you happen to transitively depend on the triggers of one
of your extensions you will immediately hit a cycle where the extensions
will try to load each other indefinitely. This is an issue because you
cannot directly influence your transitive dependency graph so from this
p.o.v the current system of loading extension is "unsound".

The test added here checks this scenario and we can now precompile and
load it without any warnings or issues.

Would have made https://github.com/JuliaLang/julia/issues/55517 a
non-issue.

Fixes https://github.com/JuliaLang/julia/issues/55557

---------

Co-authored-by: KristofferC <kristoffer.carlsson@juliacomputing.com>

* TOML: Avoid type-pirating `Base.TOML.Parser` (#55892)

Since stdlibs can be duplicated but Base never is, `Base.require_stdlib`
makes type piracy even more complicated than it normally would be.

To adapt, this changes `TOML.Parser` to be a type defined by the TOML
stdlib, so that we can define methods on it without committing
type piracy and avoid problems like Pkg.jl#4017.

Resolves
https://github.com/JuliaLang/Pkg.jl/issues/4017#issuecomment-2377589989
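
The sketch below illustrates the general idea only. It is not the actual TOML change, and `MiniBase`, `MiniTOML` and their functions are hypothetical names. A module that extends functions on a type owned by someone else, such as a Base type, commits piracy. If the module defines its own type wrapping the base one, its methods stay local to it, even when the module is loaded more than once.

```julia
# Stand-in for Base: owns a parser type that other modules should not extend.
module MiniBase
struct Parser
    str::String
end
parse_raw(p::Parser) = split(p.str, '=')
end

# Stand-in for the stdlib: owns its own `Parser`, so methods defined on it
# are not type piracy.
module MiniTOML
import ..MiniBase

struct Parser
    inner::MiniBase.Parser
end
Parser(str::AbstractString) = Parser(MiniBase.Parser(String(str)))

# Extending Base.show for our own type is fine; extending it for
# MiniBase.Parser from here would be piracy.
Base.show(io::IO, p::Parser) = print(io, "MiniTOML.Parser(", repr(p.inner.str), ")")

parse_pair(p::Parser) = Tuple(strip.(MiniBase.parse_raw(p.inner)))
end
```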

* [FileWatching] fix PollingFileWatcher design and add workaround for a stat bug

What started as an innocent fix for a stat bug on Apple (#48667) turned
into a full-blown investigation into the design problems with the libuv
backend for PollingFileWatcher, and into writing my own implementation of it
instead, which could avoid those single-threaded concurrency bugs.

* [FileWatching] fix FileMonitor similarly and improve pidfile reliability

Previously pidfile used the same poll_interval as sleep to detect if
this code made any concurrency mistakes, but we do not really need to do
that once FileMonitor is fixed to be reliable in the presence of
parallel concurrency (instead of using watch_file).

* [FileWatching] reorganize file and add docs

* Add `--trace-dispatch` (#55848)

* relocation: account for trailing path separator in depot paths (#55355)

Fixes #55340
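
The commit message gives no details about the fix. The following is only a sketch of the general pitfall, using a hypothetical `replace_depot` helper. If a depot path is configured with a trailing separator, naive prefix matching can leave a doubled or missing separator. Normalizing the depot path first makes `"/opt/depot"` and `"/opt/depot/"` behave the same.

```julia
# Hypothetical helper: rewrite `path` relative to `depot` as "@depot<sep>...".
# Returns `nothing` if `path` does not live inside `depot`.
function replace_depot(path::AbstractString, depot::AbstractString)
    d = rstrip(depot, ('/', '\\'))   # drop any trailing separators
    # Require a separator right after the depot prefix, so that
    # "/opt/depot2/x" is not mistaken for a file inside "/opt/depot".
    for sep in ('/', '\\')
        prefix = string(d, sep)
        if startswith(path, prefix)
            return string("@depot", sep, path[ncodeunits(prefix)+1:end])
        end
    end
    return nothing
end
```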

* change compiler to be stackless (#55575)

This change ensures the compiler uses very little stack, so it runs
much more reliably with any system stack size and at much greater
recursion depths. It could also now be extended to easily add various
forms of pausable/resumable inference, since there is no implicit state
on the stack: everything is local and explicit now.

Before, fewer than 900 frames would crash in less than a second:
```
$ time ./julia -e 'f(::Val{N}) where {N} = N <= 0 ? 0 : f(Val(N - 1)); f(Val(1000))'
Warning: detected a stack overflow; program state may be corrupted, so further execution might be unreliable.
Internal error: during type inference of
f(Base.Val{1000})
Encountered stack overflow.
This might be caused by recursion over very long tuples or argument lists.

[23763] signal 6: Abort trap: 6
in expression starting at none:1
__pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
Allocations: 1 (Pool: 1; Big: 0); GC: 0
Abort trap: 6

real	0m0.233s
user	0m0.165s
sys	0m0.049s
```

Now: it is effectively unlimited, as long as you are willing to wait for
it:
```
$ time ./julia -e 'f(::Val{N}) where {N} = N <= 0 ? 0 : f(Val(N - 1)); f(Val(50000))'
info: inference of f(Base.Val{50000}) from f(Base.Val{N}) where {N} exceeding 2500 frames (may be slow).
info: inference of f(Base.Val{50000}) from f(Base.Val{N}) where {N} exceeding 5000 frames (may be slow).
info: inference of f(Base.Val{50000}) from f(Base.Val{N}) where {N} exceeding 10000 frames (may be slow).
info: inference of f(Base.Val{50000}) from f(Base.Val{N}) where {N} exceeding 20000 frames (may be slow).
info: inference of f(Base.Val{50000}) from f(Base.Val{N}) where {N} exceeding 40000 frames (may be slow).
real	7m4.988s

$ time ./julia -e 'f(::Val{N}) where {N} = N <= 0 ? 0 : f(Val(N - 1)); f(Val(1000))'
real	0m0.214s
user	0m0.164s
sys	0m0.044s

$ time ./julia -e '@noinline f(::Val{N}) where {N} = N <= 0 ? GC.safepoint() : f(Val(N - 1)); f(Val(5000))'
info: inference of f(Base.Val{5000}) from f(Base.Val{N}) where {N} exceeding 2500 frames (may be slow).
info: inference of f(Base.Val{5000}) from f(Base.Val{N}) where {N} exceeding 5000 frames (may be slow).
real	0m8.609s
user	0m8.358s
sys	0m0.240s
```
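
As a loose analogy only, here is the general technique of replacing recursion with an explicit, heap-allocated work stack. This does not reproduce how inference itself was restructured. `Tree`, `maxdepth_recursive`, `maxdepth_explicit` and `chain` are hypothetical names. Both depth functions compute the same result, but only the recursive one is limited by the native stack size.

```julia
# A simple tree; a long chain of single children stands in for deep recursion.
struct Tree
    children::Vector{Tree}
end

# Recursive version: every level of nesting consumes a native stack frame.
maxdepth_recursive(t::Tree) =
    1 + maximum(maxdepth_recursive, t.children; init = 0)

# Explicit-stack version: pending work lives in a heap-allocated Vector, so
# the nesting depth is limited by memory rather than by the system stack.
function maxdepth_explicit(root::Tree)
    stack = [(root, 1)]          # (node, depth of that node)
    best = 0
    while !isempty(stack)
        node, d = pop!(stack)
        best = max(best, d)
        for c in node.children
            push!(stack, (c, d + 1))
        end
    end
    return best
end

# Build a chain of `n` nested nodes without recursion.
function chain(n::Integer)
    t = Tree(Tree[])
    for _ in 2:n
        t = Tree([t])
    end
    return t
end
```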

* optimizer: simplify the finalizer inlining pass a bit (#55934)

Minor adjustments have been made to the algorithm of the finalizer
inlining pass. Previously, it required the finalizer registration to
dominate all uses, but this is unnecessary as long as the finalizer
inlining point dominates all the uses, so the check has been relaxed.
Other minor fixes, of low importance, have been made as well.

* Limit `@inbounds` to indexing in the dual-iterator branch in `copyto_unaliased!` (#55919)

This simplifies the `copyto_unaliased!` implementation where the source
and destination have different `IndexStyle`s, and limits the `@inbounds`
to only the indexing operation. In particular, the iteration over
`eachindex(dest)` is not marked as `@inbounds` anymore. This seems to
help with performance when the destination uses Cartesian indexing.
Reduced implementation of the branch:
```julia
function copyto_proposed!(dest, src)
    axes(dest) == axes(src) || throw(ArgumentError("incompatible sizes"))
    iterdest, itersrc = eachindex(dest), eachindex(src)
    for (destind, srcind) in zip(iterdest, itersrc)
        @inbounds dest[destind] = src[srcind]
    end
    dest
end

function copyto_current!(dest, src)
    axes(dest) == axes(src) || throw(ArgumentError("incompatible sizes"))
    iterdest, itersrc = eachindex(dest), eachindex(src)
    ret = iterate(iterdest)
    @inbounds for a in src
        idx, state = ret::NTuple{2,Any}
        dest[idx] = a
        ret = iterate(iterdest, state)
    end
    dest
end

function copyto_current_limitinbounds!(dest, src)
    axes(dest) == axes(src) || throw(ArgumentError("incompatible sizes"))
    iterdest, itersrc = eachindex(dest), eachindex(src)
    ret = iterate(iterdest)
    for isrc in itersrc
        idx, state = ret::NTuple{2,Any}
        @inbounds dest[idx] = src[isrc]
        ret = iterate(iterdest, state)
    end
    dest
end
```
```julia
julia> a = zeros(40000,4000); b = rand(size(a)...);

julia> av = view(a, UnitRange.(axes(a))...);

julia> @btime copyto_current!($av, $b);
  617.704 ms (0 allocations: 0 bytes)

julia> @btime copyto_current_limitinbounds!($av, $b);
  304.146 ms (0 allocations: 0 bytes)

julia> @btime copyto_proposed!($av, $b);
  240.217 ms (0 allocations: 0 bytes)

julia> versioninfo()
Julia Version 1.12.0-DEV.1260
Commit 4a4ca9c8152 (2024-09-28 01:49 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i5-10310U CPU @ 1.70GHz
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, skylake)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
  JULIA_EDITOR = subl
```
I'm not quite certain why the proposed implementation here
(`copyto_proposed!`) is even faster than
`copyto_current_limitinbounds!`. In any case, `copyto_proposed!` is
easier to read, so I'm not complaining.

This fixes https://github.com/JuliaLang/julia/issues/53158

* Strong zero in Diagonal triple multiplication (#55927)

Currently, triple multiplication with a `LinearAlgebra.BandedMatrix`
sandwiched between two `Diagonal`s isn't associative, as this is
implemented using broadcasting, which doesn't assume a strong zero,
whereas the two-term matrix multiplication does.
```julia
julia> D = Diagonal(StepRangeLen(NaN, 0, 3));

julia> B = Bidiagonal(1:3, 1:2, :U);

julia> D * B * D
3×3 Matrix{Float64}:
 NaN  NaN  NaN
 NaN  NaN  NaN
 NaN  NaN  NaN

julia> (D * B) * D
3×3 Bidiagonal{Float64, Vector{Float64}}:
 NaN    NaN       ⋅ 
    ⋅   NaN    NaN
    ⋅      ⋅   NaN

julia> D * (B * D)
3×3 Bidiagonal{Float64, Vector{Float64}}:
 NaN    NaN       ⋅ 
    ⋅   NaN    NaN
    ⋅      ⋅   NaN
```
This PR ensures that the 3-term multiplication is evaluated as a
sequence of two-term multiplications, which fixes this issue. This also
improves performance, as only the bands need to be evaluated now.
```julia
julia> D = Diagonal(1:1000); B = Bidiagonal(1:1000, 1:999, :U);

julia> @btime $D * $B * $D;
  656.364 μs (11 allocations: 7.63 MiB) # v"1.12.0-DEV.1262"
  2.483 μs (12 allocations: 31.50 KiB) # This PR
```
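
Here is a minimal sketch of evaluating `D1 * B * D2` band by band for an upper `Bidiagonal` `B`. `triple_bidiag` is a hypothetical helper, not the code added by the PR, and it only handles `uplo == 'U'`. Entry `(i, j)` of the product is `D1[i,i] * B[i,j] * D2[j,j]`. Computing only the stored bands means structural zeros never meet the `NaN`s on the diagonals, which matches two successive two-term products.

```julia
using LinearAlgebra

# Hypothetical helper: D1 * B * D2 for diagonal D1, D2 and upper bidiagonal B.
# Only the main diagonal (j == i) and superdiagonal (j == i + 1) are nonzero.
function triple_bidiag(D1::Diagonal, B::Bidiagonal, D2::Diagonal)
    B.uplo == 'U' || throw(ArgumentError("sketch only handles upper bidiagonal B"))
    d1, d2 = D1.diag, D2.diag
    dv = d1 .* B.dv .* d2                    # main diagonal
    ev = d1[1:end-1] .* B.ev .* d2[2:end]    # superdiagonal
    return Bidiagonal(dv, ev, :U)
end
```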

* Fix dispatch on `alg` in Float16 Hermitian eigen (#55928)

Currently,
```julia
julia> using LinearAlgebra

julia> A = Hermitian(reshape(Float16[1:16;], 4, 4));

julia> eigen(A).values |> typeof
Vector{Float16} (alias for Array{Float16, 1})

julia> eigen(A, LinearAlgebra.QRIteration()).values |> typeof
Vector{Float32} (alias for Array{Float32, 1})
```
This PR moves the specialization on the `eltype` to an internal method,
so that firstly all `alg`s dispatch to that method, and secondly, there
are no ambiguities introduced by specializing the top-level `eigen`. The
latter currently causes test failures in `StaticArrays`
(https://github.com/JuliaArrays/StaticArrays.jl/actions/runs/11092206012/job/30816955210?pr=1279),
and should be fixed by this PR.
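
Here is a minimal sketch of the dispatch pattern described above. It uses hypothetical names (`myeigvals`, `_myeigvals`, `_compute`, `AlgA`, `AlgB`) rather than the actual LinearAlgebra internals. Every public call forwards to one internal function, and only that function specializes on the element type. As a result, every algorithm gets the `Float16` handling, and the public signature stays free of eltype-based ambiguities.

```julia
using LinearAlgebra

# Hypothetical algorithm tags standing in for real algorithm types.
struct AlgA end
struct AlgB end

# Public entry point: no element-type specialization here.
myeigvals(A::Hermitian, alg = AlgA()) = _myeigvals(A, alg)

# Internal function: the only place that specializes on the element type.
_myeigvals(A::Hermitian, alg) = _compute(A, alg)
function _myeigvals(A::Hermitian{Float16}, alg)
    # Compute in Float32 and convert back, whatever `alg` is.
    A32 = Hermitian(Float32.(parent(A)), Symbol(A.uplo))
    return Float16.(_compute(A32, alg))
end

# Stand-in "algorithms": both call eigvals, returning different orders.
_compute(A::Hermitian, ::AlgA) = eigvals(A)
_compute(A::Hermitian, ::AlgB) = reverse(eigvals(A))
```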

* Remove specialized `ishermitian` method for `Diagonal{<:Real}` (#55948)

The fallback method for `Diagonal{<:Number}` handles this already by
checking that the `diag` is real, so we don't need this additional
specialization.
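
The property the generic method relies on can be stated in one line. A diagonal matrix equals its conjugate transpose when every diagonal entry has zero imaginary part, and for `Real` entries that always holds. `diag_is_hermitian` below is a hypothetical restatement for illustration, not the LinearAlgebra source.

```julia
using LinearAlgebra

# For numbers, isreal(x) checks that the imaginary part is zero, which is
# exactly the condition for a diagonal matrix to equal its adjoint.
diag_is_hermitian(D::Diagonal{<:Number}) = all(isreal, D.diag)
```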

* Fix logic in `?` docstring example (#55945)

* fix `unwrap_macrocalls` (#55950)

The implementation of `unwrap_macrocalls` has assumed that what
`:macrocall` wraps is always an `Expr` object, but that is not
necessarily correct:
```julia
julia> Base.@assume_effects :nothrow @show 42
ERROR: LoadError: TypeError: in typeassert, expected Expr, got a value of type Int64
Stacktrace:
 [1] unwrap_macrocalls(ex::Expr)
   @ Base ./expr.jl:906
 [2] var"@assume_effects"(__source__::LineNumberNode, __module__::Module, args::Vararg{Any})
   @ Base ./expr.jl:756
in expression starting at REPL[1]:1
```
This commit addresses this issue.
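
Here is a minimal sketch of an unwrapping loop that tolerates non-`Expr` payloads. `unwrap_macrocalls_sketch` is a hypothetical name, and the fixed Base function may differ in detail. The loop peels nested `:macrocall` layers and stops as soon as the wrapped value is not itself a macro call. That value may be an `Expr`, a `Symbol`, or a literal such as `42`.

```julia
# Peel nested macro calls such as `@a @b ex` down to `ex`.
# The args of a :macrocall Expr are (macro name, LineNumberNode or nothing,
# arguments...); the last argument may be any value, not only an Expr.
function unwrap_macrocalls_sketch(@nospecialize ex)
    inner = ex
    while inner isa Expr && inner.head === :macrocall && length(inner.args) >= 3
        inner = inner.args[end]
    end
    return inner
end
```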

* make faster BigFloats (#55906)

We can coalesce the two allocations required by the MPFR BigFloat API
design into one allocation, hopefully giving an easy performance boost.
It would have been slightly easier and more efficient if MPFR BigFloat
was already a VLA instead of containing a pointer here, but that does
not prevent the optimization.

* Add propagate_inbounds_meta to atomic genericmemory ops (#55902)

`memoryref(mem, i)` will otherwise emit a boundscheck.

```
; │ @ /home/vchuravy/WorkstealingQueues/src/CLL.jl:53 within `setindex_atomic!` @ genericmemory.jl:329
; │┌ @ boot.jl:545 within `memoryref`
    %ptls_field = getelementptr inbounds i8, ptr %tls_pgcstack, i64 16
    %ptls_load = load ptr, ptr %ptls_field, align 8
    %"box::GenericMemoryRef" = call noalias nonnull align 8 dereferenceable(32) ptr @ijl_gc_small_alloc(ptr %ptls_load, i32 552, i32 32, i64 23456076646928) #9
    %"box::GenericMemoryRef.tag_addr" = getelementptr inbounds i64, ptr %"box::GenericMemoryRef", i64 -1
    store atomic i64 23456076646928, ptr %"box::GenericMemoryRef.tag_addr" unordered, align 8
    store ptr %memoryref_data, ptr %"box::GenericMemoryRef", align 8
    %.repack8 = getelementptr inbounds { ptr, ptr }, ptr %"box::GenericMemoryRef", i64 0, i32 1
    store ptr %memoryref_mem, ptr %.repack8, align 8
    call void @ijl_bounds_error_int(ptr nonnull %"box::GenericMemoryRef", i64 %7)
    unreachable
```

For the Julia code:

```julia
function Base.setindex_atomic!(buf::WSBuffer{T}, order::Symbol, val::T, idx::Int64) where T
    @inbounds Base.setindex_atomic!(buf.buffer, order, val,((idx - 1) & buf.mask) + 1)
end
```

from
https://github.com/gbaraldi/WorkstealingQueues.jl/blob/0ebc57237cf0c90feedf99e4338577d04b67805b/src/CLL.jl#L41
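
The sketch below shows the general mechanism the fix relies on, using a hypothetical `RingBuffer` type rather than the WorkstealingQueues.jl code. When a wrapper is annotated with `Base.@propagate_inbounds`, an `@inbounds` at the call site reaches the indexing inside it. The inner bounds check can then be elided, just as if the caller had indexed the storage directly.

```julia
# Hypothetical power-of-two ring buffer indexed through a wraparound mask.
struct RingBuffer{T}
    data::Vector{T}
    mask::Int
end
function RingBuffer{T}(n::Int) where {T}
    ispow2(n) || throw(ArgumentError("capacity must be a power of two"))
    return RingBuffer{T}(Vector{T}(undef, n), n - 1)
end

# Without @propagate_inbounds, a caller's @inbounds would stop at these
# methods and the inner b.data[...] accesses would still be bounds checked.
Base.@propagate_inbounds function Base.getindex(b::RingBuffer, i::Int)
    return b.data[((i - 1) & b.mask) + 1]
end
Base.@propagate_inbounds function Base.setindex!(b::RingBuffer, v, i::Int)
    b.data[((i - 1) & b.mask) + 1] = v
    return b
end

function fill_ring!(b::RingBuffer, vals)
    for (k, v) in enumerate(vals)
        @inbounds b[k] = v   # the masked index is always within 1:length(b.data)
    end
    return b
end
```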

* fix rounding mode in construction of `BigFloat` from pi (#55911)

The default argument of the method was outdated, reading the global
default rounding directly, bypassing the `ScopedValue` stuff.

* fix `nonsetable_type_hint_handler` (#55962)

The current implementation is wrong, causing it to display inappropriate
hints like the following:
```julia
julia> s = Some("foo");

julia> s[] = "bar"
ERROR: MethodError: no method matching setindex!(::Some{String}, ::String)
The function `setindex!` exists, but no method is defined for this combination of argument types.
You attempted to index the type String, rather than an instance of the type. Make sure you create the type using its constructor: d = String([...]) rather than d = String
Stacktrace:
 [1] top-level scope
   @ REPL[2]:1
```

* REPL: make UndefVarError aware of imported modules (#55932)

* fix test/staged.jl (#55967)

In particular, the implementation of `overdub_generator54341` was
dangerous. This fixes it up.

* Explicitly store a module's location (#55963)

Revise wants to know what file a module's `module` definition is in.
Currently it does this by looking at the source location for the
implicitly generated `eval` method. This is terrible for two reasons:

1. The method may not exist if the module is a baremodule (which is not
particularly common, which is probably why we haven't seen it).
2. The fact that the implicitly generated `eval` method has this
location information is an implementation detail that I'd like to get
rid of (#55949).

This PR adds explicit file/line info to `Module`, so that Revise doesn't
have to use the hack anymore.

* mergewith: add single argument example to docstring (#55964)

I ran into this edge case and thought it should be documented.

---------

Co-authored-by: Lilith Orion Hafner <lilithhafner@gmail.com>
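
The message does not say which single-argument case the new docstring example covers. The snippet below therefore only illustrates two related behaviours of the `mergewith` family. With a single dictionary, `mergewith` returns a merged copy, since there is nothing to combine. With only the combining function, `mergewith!(+)` acts as a curried function that is handy in reductions.

```julia
d1 = Dict(:a => 1, :b => 2)
d2 = Dict(:b => 10, :c => 3)

# One dictionary: nothing to combine, so the result is an equal copy.
single = mergewith(+, d1)

# Curried form with a reduction; the accumulator is a fresh Dict, so
# neither d1 nor d2 is mutated.
total = foldl(mergewith!(+), [d1, d2]; init = Dict{Symbol,Int}())
```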

* [build] avoid libedit linkage and align libccalllazy* SONAMEs (#55968)

While building 1.11.0-rc4 in Homebrew[^1] in preparation for the 1.11.0
release (and to confirm that it builds successfully on Sequoia), I noticed
some odd linkage in our Linux builds, including:

1. LLVM libraries were linking to `libedit.so`, e.g.
    ```
    Dynamic Section:
      NEEDED       libedit.so.0
      NEEDED       libz.so.1
      NEEDED       libzstd.so.1
      NEEDED       libstdc++.so.6
      NEEDED       libm.so.6
      NEEDED       libgcc_s.so.1
      NEEDED       libc.so.6
      NEEDED       ld-linux-x86-64.so.2
      SONAME       libLLVM-16jl.so
    ```
    CMakeCache.txt showed
    ```
    //Use libedit if available.
    LLVM_ENABLE_LIBEDIT:BOOL=ON
    ```
    This might be overriding `HAVE_LIBEDIT` at
    https://github.com/JuliaLang/llvm-project/blob/julia-release/16.x/llvm/cmake/config-ix.cmake#L222-L225,
    so I just added `LLVM_ENABLE_LIBEDIT`.

2. I wasn't sure whether there was a reason for this, but `libccalllazy*` had
    mismatched SONAMEs:
    ```console
    ❯ objdump -p lib/julia/libccalllazy* | rg '\.so'
    lib/julia/libccalllazybar.so:	file format elf64-x86-64
      NEEDED       ccalllazyfoo.so
      SONAME       ccalllazybar.so
    lib/julia/libccalllazyfoo.so:	file format elf64-x86-64
      SONAME       ccalllazyfoo.so
    ```
    I've modified this, but can drop the change if it was intentional.

---

[^1]: https://github.com/Homebrew/homebrew-core/pull/192116

* Add missing `copy!(::AbstractMatrix, ::UniformScaling)` method (#55970)

Hi everyone! First PR to Julia here.

It was noticed in a Slack thread yesterday
that `copy!(A, I)` doesn't work, but `copyto!(A, I)` does. This PR adds
the missing method for `copy!(::AbstractMatrix, ::UniformScaling)`,
which simply defers to `copyto!`, along with corresponding tests.

I added a `compat` notice for Julia 1.12.
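
The change has the shape sketched below. A hypothetical name is used so the snippet also runs on Julia versions without the new method. The method simply forwards to `copyto!`, which already knows how to write a `UniformScaling` into a matrix. On Julia 1.12 and later, `copy!(A, I)` can be called directly.

```julia
using LinearAlgebra

# Hypothetical stand-in for the new method: copy! with a UniformScaling
# source just defers to copyto!, which writes λ on the diagonal and zero
# everywhere else.
my_copy!(A::AbstractMatrix, J::UniformScaling) = copyto!(A, J)

A = fill(7.0, 3, 3)
my_copy!(A, 2I)
```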

---------

Co-authored-by: Lilith Orion Hafner <lilithhafner@gmail.com>

* Add forward progress update to NEWS.md (#54089)

Closes #40009, which was left open because of the "needs news" tag.

---------

Co-authored-by: Ian Butterworth <i.r.butterworth@gmail.com>

* Fix an intermittent test failure in `core` test (#55973)

The test wants to assert that `Module` is not resolved in `Main`, but
other tests do resolve this identifier, so the test can fail depending
on test order (and I've been seeing such failures on CI recently). Fix
that by running the test in a fresh subprocess.
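
Here is a minimal sketch of the isolation technique. It relies only on `Base.julia_cmd()` and ordinary `Cmd` interpolation. The check runs in a new Julia process, so whatever other tests did to `Main` in the parent process cannot affect the result. `run_isolated` and `leaked_binding` are hypothetical names, and the `-e` snippet is a placeholder, not the actual `core` test.

```julia
# Run `code` in a fresh Julia process and return what it printed.
# Base.julia_cmd() reproduces the current executable and its key flags.
function run_isolated(code::AbstractString)
    cmd = `$(Base.julia_cmd()) --startup-file=no -e $code`
    return readchomp(cmd)
end

# The parent process defines a binding in Main ...
Core.eval(Main, :(leaked_binding = 1))
# ... which the fresh process does not see.
isolated_result = run_isolated("print(isdefined(Main, :leaked_binding))")
```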

* fix comma logic in time_print (#55977)

Minor formatting fix

* optimizer: fix up the inlining algorithm to use correct `nargs`/`isva` (#55976)

It appears that inlining.jl was not updated in JuliaLang/julia#54341.
Specifically, using `nargs`/`isva` from `mi.def::Method` in
`ir_prepare_inlining!` causes the following error to occur:
```julia
function generate_lambda_ex(world::UInt, source::LineNumberNode,
                            argnames, spnames, @nospecialize body)
    stub = Core.GeneratedFunctionStub(identity, Core.svec(argnames...), Core.svec(spnames...))
    return stub(world, source, body)
end
function overdubbee54341(a, b)
    return a + b
end
const overdubee_codeinfo54341 = code_lowered(overdubbee54341, Tuple{Any, Any})[1]
function overdub_generator54341(world::UInt, source::LineNumberNode, selftype, fargtypes)
    if length(fargtypes) != 2
        return generate_lambda_ex(world, source,
            (:overdub54341, :args), (), :(error("Wrong number of arguments")))
    else
        return copy(overdubee_codeinfo54341)
    end
end
@eval function overdub54341(args...)
    $(Expr(:meta, :generated, overdub_generator54341))
    $(Expr(:meta, :generated_only))
end
topfunc(x) = overdub54341(x, 2)
```
```julia
julia> topfunc(1)
Internal error: during type inference of
topfunc(Int64)
Encountered unexpected error in runtime:
BoundsError(a=Array{Any, 1}(dims=(2,), mem=Memory{Any}(8, 0x10632e780)[SSAValue(2), SSAValue(3), #<null>, #<null>, #<null>, #<null>, #<null>, #<null>]), i=(3,))
throw_boundserror at ./essentials.jl:14
getindex at ./essentials.jl:909 [inlined]
ssa_substitute_op! at ./compiler/ssair/inlining.jl:1798
ssa_substitute_op! at ./compiler/ssair/inlining.jl:1852
ir_inline_item! at ./compiler/ssair/inlining.jl:386
...
```

This commit updates the abstract interpretation and inlining algorithm
to use the `nargs`/`isva` values held by `CodeInfo`. Similar
modifications have also been made to EscapeAnalysis.jl.

@nanosoldier `runbenchmarks("inference", vs=":master")`

* Add `.zed` directory to `.gitignore` (#55974)

Like the `.vscode` config directory, the `.zed` directory can be
ignored as well.

* typeintersect: reduce unneeded allocations from `merge_env`

`merge_env` and `final_merge_env` can be skipped for emptiness tests,
or when we know there is only one valid Union state.

* typeintersect: trunc env before nested `intersect_all` if valid.

This only covers the simplest cases. In the future, we might want a full dependence analysis to keep the env length minimal.

* `@time` actually fix time report commas & add tests (#55982)

https://github.com/JuliaLang/julia/pull/55977 looked simple but wasn't
quite right because of a bad pattern in the lock conflicts report
section.

So this fixes that and adds tests.
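
The commas in question separate the optional parts of the `@time` report. Below is a minimal sketch of a pattern that avoids stray or missing commas: collect only the parts that apply, then `join` them, instead of deciding per part whether a separator is needed. The `join_report` helper and the part strings are hypothetical and do not reproduce the actual `time_print` code or output format.

```julia
# Hypothetical helper: empty parts are dropped before joining, so a separator
# can never end up doubled, leading, or trailing.
function join_report(head::AbstractString, parts::AbstractVector{<:AbstractString})
    present = filter(!isempty, parts)
    isempty(present) && return head
    return string(head, " (", join(present, ", "), ")")
end
```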

* adjust EA to JuliaLang/julia#52527 (#55986)

`Ent…

7 participants