diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json
index d7d8dcb..60e06ac 100644
--- a/dev/.documenter-siteinfo.json
+++ b/dev/.documenter-siteinfo.json
@@ -1 +1 @@
-{"documenter":{"julia_version":"1.12.0-DEV.766","generation_timestamp":"2024-06-22T10:05:04","documenter_version":"1.4.1"}}
\ No newline at end of file
+{"documenter":{"julia_version":"1.12.0-DEV.766","generation_timestamp":"2024-06-22T10:34:08","documenter_version":"1.4.1"}}
\ No newline at end of file
diff --git a/dev/base/index.html b/dev/base/index.html
index a3c2b52..acdca1f 100644
--- a/dev/base/index.html
+++ b/dev/base/index.html
@@ -1,2 +1,2 @@
-MemViews in Base · MemViews.jl

MemViews.jl

It is my hope that MemViews, or something like MemViews, will eventually be moved into Base Julia. This is because Base Julia, too, includes code that uses the concept of a memory-backed array. However, Base currently lacks any kind of interface and internal API to handle memory-backed objects.

See the related issue on JuliaLang/julia.

What's wrong with SubArrays of Memory as memory views?

SubArray is generic over too much, which makes it hard to reason about, and hard to uphold its guarantees.

First, it's generic over the array type, meaning it may be backed by Memory or Vector, but also UnitRange or Base.LogRange (bitstypes, so not backed by memory), BitMatrix (memory-backed, but elements are stored packed), OffsetArrays, CodeUnits (memory-backed but immutable) and many more. What can you do with the underlying array, generally speaking? Take a pointer to it? No. Assume one-based indexing? No. Assume a stride of one? No. Assume mutability? No.

Second, it's generic over the index type. It may be UnitRange{Int}, of course, but also Base.OneTo{UInt16}, StepRange{BigInteger}, CartesianIndices (which is itself generic over the indexes), or Colon. Can you define the subset of these types which indicate dense indices? I can't.

Third, it's multidimensional. It may collect to a Vector or Matrix.

This is not a design flaw of SubArray - it's a perfectly fine design choice, which enables SubArray to be extremely flexible and broadly useful. Unfortunately, it also makes it nearly impossible to write robust, low-level code using SubArray, because it's almost impossible not to violate the assumptions of some subset of SubArray's many concrete types. Practically speaking, what happens is that methods taking SubArray fall back to only assuming what can be assumed about AbstractArray - which may be inefficient, and buggy (as the recurring bugs due to the assumption of one-based indexing have taught us).

In contrast, a MemView{T} is always represented by exactly a MemoryRef{T} and an Int as length. You know exactly what you get.

Design decisions

Mutability

Mutable and immutable memory views are statically distinguished, such that users can write methods that only take mutable memory views. This will statically prevent users from accidentally mutating e.g. strings.

MemKind

The MemKind trait is used because, for some types (currently, strings), the compiler may not be able to optimise away a MemView that is constructed only for dispatch purposes.

MemKind could be replaced with a function that returned nothing, or the correct MemView type directly, but it's nicer to dispatch on ::MemKind than on ::Union{Nothing, Type{<:MemView}}.

Limitations

  • Currently, MemView does not make use of Core.GenericMemory's additional parameters, such as atomicity or address space. This may easily be added with a GenericMemView type, similar to Memory / GenericMemory.

  • I can't figure out how to support reinterpreted arrays. Any way I can think of doing so will significantly complicate MemView, which takes away some of the appeal of this type's simplicity. It's possible that reinterpreted arrays are so outside Julia's ordinary memory management that this simply can't be done.

  • Currently, Strings are not backed by Memory in Julia. Therefore, creating a MemView of a string requires heap-allocating a new Memory pointing to the existing memory of the string. This can be fixed if String is re-implemented to be backed by Memory, but I don't know enough details about the implementation of String to know if this is practical.

Alternative proposal

In examples/alternative.jl, there is an implementation where a MemView is just a pointer and a length. This makes it nearly identical to Random.UnsafeView. However, compared to UnsafeView, this proposal has:

  • The MemKind trait, useful to control dispatch to functions that can treat arrays as being memory
  • The distinction between mutable and immutable memory views

Overall, I like the alternative proposal less. Raw pointers are bad for safety and ergonomics, and they interact less nicely with the Julia runtime. Also, the existing GenericMemoryRef is essentially perfect for this purpose.

Advantages

  • Pointer-based memviews are cheaper to construct, and do not allocate for strings, unlike Memory. Perhaps in the future, strings too will be backed by Memory.
  • Their interaction with the GC is simpler (as there is no interaction)

Disadvantages

  • While some low-level methods using MemView will just forward to calling external libraries where using a pointer is fine, many will be written in pure Julia. There, it's less nice to have raw pointers.
  • Code using pointer-based memviews must make sure to only have the views exist inside GC.@preserve blocks, which is annoying and will almost certainly be violated accidentally somewhere
  • We can't take advantage of the existing Memory infrastructure, e.g. having a GenericMemoryRef which supports atomic memory.
+MemViews in Base · MemViews.jl

MemViews.jl

It is my hope that MemViews, or something like MemViews, will eventually be moved into Base Julia. This is because Base Julia, too, includes code that uses the concept of a memory-backed array. However, Base currently lacks any kind of interface and internal API to handle memory-backed objects.

See the related issue on JuliaLang/julia.

What's wrong with SubArrays of Memory as memory views?

SubArray is generic over too much, which makes it hard to reason about, and hard to uphold its guarantees.

First, it's generic over the array type, meaning it may be backed by Memory or Vector, but also UnitRange or Base.LogRange (bitstypes, so not backed by memory), BitMatrix (memory-backed, but elements are stored packed), OffsetArrays, CodeUnits (memory-backed but immutable) and many more. What can you do with the underlying array, generally speaking? Take a pointer to it? No. Assume one-based indexing? No. Assume a stride of one? No. Assume mutability? No.

Second, it's generic over the index type. It may be UnitRange{Int}, of course, but also Base.OneTo{UInt16}, StepRange{BigInteger}, CartesianIndices (which is itself generic over the indexes), or Colon. Can you define the subset of these types which indicate dense indices? I can't.

Third, it's multidimensional. It may collect to a Vector or Matrix.
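To make the three points above concrete, here is a small sketch that uses only functions from Base. The variable names and the helper can_write_first are made up for this illustration. All four values are SubArrays, yet they differ in stride, dimensionality, memory layout and mutability:

```julia
# Four views that are all SubArrays, yet share almost no properties.
dense   = view([1, 2, 3, 4, 5, 6], 2:4)      # Vector parent, contiguous elements
strided = view([1, 2, 3, 4, 5, 6], 1:2:5)    # Vector parent, stride of two
packed  = view(trues(3, 3), :, 1:2)          # BitMatrix parent, bit-packed, two-dimensional
bytes   = view(codeunits("hello"), 1:3)      # CodeUnits parent, cannot be written to

# Every one of them matches a method that takes `::SubArray`.
all_subarrays = all(v -> v isa SubArray, (dense, strided, packed, bytes))

# Hypothetical helper: does writing to the first element succeed?
function can_write_first(v)
    try
        v[begin] = v[begin]
        return true
    catch
        return false
    end
end
```

A method with the signature f(::SubArray) has to cope with all four, so in practice it can rely on little more than the AbstractArray interface.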

This is not a design flaw of SubArray - it's a perfectly fine design choice, which enables SubArray to be extremely flexible and broadly useful. Unfortunately, it also makes it nearly impossible to write robust, low-level code using SubArray, because it's almost impossible not to violate the assumptions of some subset of SubArray's many concrete types. Practically speaking, what happens is that methods taking SubArray fall back to only assuming what can be assumed about AbstractArray - which may be inefficient, and buggy (as the recurring bugs due to the assumption of one-based indexing have taught us).

In contrast, a MemView{T} is always represented by exactly a MemoryRef{T} and an Int as length. You know exactly what you get.

Design decisions

Mutability

Mutable and immutable memory views are statically distinguished, such that users can write methods that only take mutable memory views. This will statically prevent users from accidentally mutating e.g. strings.
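The idea can be sketched with a toy type that carries its mutability as a type parameter. This is an illustration of the pattern only, not the package's implementation: ToyView, ToyMutable, ToyImmutable, zero_out! and total are hypothetical names, and the toy wraps a copied Vector instead of viewing existing memory.

```julia
# Marker types for the mutability parameter (hypothetical names).
struct ToyMutable end
struct ToyImmutable end

# Toy stand-in for a memory view. `M` is a phantom parameter that only
# records whether writing is allowed; a real view would not own a Vector.
struct ToyView{T, M}
    data::Vector{T}
end

# Mutating methods accept only the mutable flavour.
function zero_out!(v::ToyView{T, ToyMutable}) where {T}
    fill!(v.data, zero(T))
    return v
end

# Read-only methods accept both flavours.
total(v::ToyView) = sum(v.data)

numbers = ToyView{Int, ToyMutable}([1, 2, 3])
text    = ToyView{UInt8, ToyImmutable}(Vector{UInt8}("abc"))
```

Calling zero_out!(numbers) works, while zero_out!(text) fails with a MethodError because no method matches, so the mistake is caught by dispatch before any byte is written.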

MemKind

The MemKind trait is used because, for some types (currently, strings), the compiler may not be able to optimise away a MemView that is constructed only for dispatch purposes.

MemKind could be replaced with a function that returned nothing, or the correct MemView type directly, but it's nicer to dispatch on ::MemKind than on ::Union{Nothing, Type{<:MemView}}.
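A minimal sketch of this kind of trait dispatch is shown below. It assumes nothing about the package's actual definitions: ToyKind, ToyIsMemory, ToyNotMemory, toy_kind, toy_bytes and byte_sum are hypothetical names chosen for the example. The point is that the trait is computed from the argument's type alone, so no view needs to be constructed just to pick a method.

```julia
# Hypothetical trait types, standing in for the idea behind MemKind.
abstract type ToyKind end
struct ToyIsMemory <: ToyKind end
struct ToyNotMemory <: ToyKind end

# The trait function only inspects the type of its argument.
toy_kind(::Any) = ToyNotMemory()
toy_kind(::Union{String, SubString{String}, Vector{UInt8}}) = ToyIsMemory()

# Hypothetical accessor for the bytes of a memory-like object.
toy_bytes(x::AbstractString) = codeunits(x)
toy_bytes(x::Vector{UInt8}) = x

# The public entry point forwards to a method selected by the trait.
byte_sum(x) = byte_sum(toy_kind(x), x)
byte_sum(::ToyIsMemory, x) = (:memory, sum(Int, toy_bytes(x)))
byte_sum(::ToyNotMemory, x) = (:generic, sum(Int, x))
```

Here byte_sum("abc") takes the memory path, while byte_sum(1:3) takes the generic path. Dispatching on a small set of trait types like this reads more cleanly than branching on a Union{Nothing, Type} return value.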

Limitations

  • Currently, MemView does not make use of Core.GenericMemory's additional parameters, such as atomicity or address space. This may easily be added with a GenericMemView type, similar to Memory / GenericMemory.

  • I can't figure out how to support reinterpreted arrays. Any way I can think of doing so will significantly complicate MemView, which takes away some of the appeal of this type's simplicity. It's possible that reinterpreted arrays are so outside Julia's ordinary memory management that this simply can't be done.

  • Currently, Strings are not backed by Memory in Julia. Therefore, creating a MemView of a string requires heap-allocating a new Memory pointing to the existing memory of the string. This can be fixed if String is re-implemented to be backed by Memory, but I don't know enough details about the implementation of String to know if this is practical.

Alternative proposal

In examples/alternative.jl, there is an implementation where a MemView is just a pointer and a length. This makes it nearly identical to Random.UnsafeView. However, compared to UnsafeView, this proposal has:

  • The MemKind trait, useful to control dispatch to functions that can treat arrays as being memory
  • The distinction between mutable and immutable memory views

Overall, I like the alternative proposal less. Raw pointers are bad for safety and ergonomics, and they interact less nicely with the Julia runtime. Also, the existing GenericMemoryRef is essentially perfect for this purpose.

Advantages

  • Pointer-based memviews are cheaper to construct, and do not allocate for strings, unlike Memory. Perhaps in the future, strings too will be backed by Memory.
  • Their interaction with the GC is simpler (as there is no interaction)

Disadvantages

  • While some low-level methods using MemView will just forward to calling external libraries where using a pointer is fine, many will be written in pure Julia. There, it's less nice to have raw pointers.
  • Code using pointer-based memviews must make sure to only have the views exist inside GC.@preserve blocks, which is annoying and will almost certainly be violated accidentally somewhere
  • We can't take advantage of the existing Memory infrastructure, e.g. having a GenericMemoryRef which supports atomic memory.
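
The GC.@preserve requirement mentioned in the disadvantages can be illustrated with plain Base functions. The function name sum_bytes is hypothetical and the code is not taken from examples/alternative.jl. It only shows the discipline that pointer-based code has to follow:

```julia
# Sum the bytes of a string through a raw pointer.
function sum_bytes(s::String)
    total = 0
    # The pointer is only valid while `s` is guaranteed to be alive,
    # so every use of it must stay inside the GC.@preserve block.
    GC.@preserve s begin
        p = pointer(s)                  # Ptr{UInt8} to the string's data
        for i in 1:ncodeunits(s)
            total += unsafe_load(p, i)  # unsafe_load uses one-based offsets
        end
    end
    return total
end
```

If the pointer were returned from the function or stored in a struct, nothing would stop it from outliving s, which is the kind of accidental violation the list above warns about.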
diff --git a/dev/index.html b/dev/index.html
index e6b9696..ab8c8ba 100644
--- a/dev/index.html
+++ b/dev/index.html
@@ -32,4 +32,4 @@
 # output
 3
-1
+1
diff --git a/dev/interfaces/index.html b/dev/interfaces/index.html
index fb5b690..3d031a0 100644
--- a/dev/interfaces/index.html
+++ b/dev/interfaces/index.html
@@ -29,4 +29,4 @@
 # we want to treat strings as if they are.
 function my_hash(x::Union{String, SubString{String}})
     my_hash(MemView(x))
-end
+end