Experimental: Add Slice.literal for numeric slice constants #13716

Merged on Sep 24, 2023 (3 commits)

@HertzDevil HertzDevil commented Jul 29, 2023

Inspired by #13339, this PR adds a Slice.literal method that constructs Slice constants of numeric elements in a program's read-only data section directly. String-like literals such as those suggested in #2886 can be implemented on top of this with user-land macros or new literal expansions. Resolves part of #5792 (there is no way to create read-only Hashes from these literals). Does not affect #2485 and #9486 (these use cases require a mutable data structure).
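
As a rough sketch of the user-land macro route for string-like literals, something along these lines might work. The macro name `ascii_bytes` is hypothetical and not part of this PR. The sketch assumes an ASCII-only string, since each character's codepoint is emitted as one `UInt8` element, and it is only reasonable for short strings because the macro still interpolates one node per byte:

```crystal
# Hypothetical helper, not part of this PR: turns an ASCII string literal
# into a read-only Slice(UInt8) built with Slice.literal.
# Assumption: every character is ASCII, so each codepoint fits into UInt8.
macro ascii_bytes(str)
  ::Slice(UInt8).literal({{ str.chars.map { |c| c.ord }.splat }})
end

GREETING = ascii_bytes("hello")

puts GREETING.size         # number of bytes in the literal
puts String.new(GREETING)  # copies the bytes back into a String
```

A non-ASCII character would produce a number outside `UInt8`'s range and should be rejected at compile time under the rules described in this PR.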

The call:

Slice(Int32).literal(12, 34, 56, 78)

expands to:

# internal constant whose initializer is filled out during codegen
$Slice:0 : Int32[4] = ...

::Slice(Int32).new(pointerof($Slice:0.@buffer), 4, read_only: true)

Currently Slice(T).literal is marked as experimental and offers only a minimally sufficient API. T must be explicitly specified, must be one of the primitive number types, and cannot be a union. All elements must be number literals fitting into T's range.
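
A minimal usage sketch under these restrictions, assuming a compiler that includes this PR (the constant name `PRIMES` is only for illustration, and the compiler may warn that the method is experimental):

```crystal
# T is given explicitly and every argument is a number literal in Int32's range.
PRIMES = Slice(Int32).literal(2, 3, 5, 7, 11)

puts PRIMES.size        # 5
puts PRIMES.sum         # 28
puts PRIMES.read_only?  # true

# The slice is constructed with read_only: true, so a write raises at run time
# before any memory is touched.
write_succeeded = begin
  PRIMES[0] = 1
  true
rescue
  false
end
puts write_succeeded # false

# According to the restrictions above, each of these is rejected at compile time:
#   Slice.literal(1, 2, 3)              # T is not specified
#   Slice(Int32 | Int64).literal(1, 2)  # T is a union
#   Slice(UInt8).literal(256)           # 256 does not fit into UInt8
#   Slice(Int32).literal(some_variable) # not a number literal
```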

The literals are implemented via a new @[Primitive(:slice_literal)] which performs the above transformation in the semantic phase; the codegen phase never sees the slice_literal primitive, and only needs to define the global LLVM symbols corresponding to those slice contents. The above will emit:

@"$Slice:0" = internal constant [4 x i32] [i32 12, i32 34, i32 56, i32 78]
; ...
call %"Slice(Int32)" @"*Slice(T)::new:read_only<Pointer(Int32), Int32, Bool>:Slice(Int32)"(ptr @"$Slice:0", i32 4, i1 true)

One goal of such literals is to reduce compilation times of huge constant arrays without resorting to workarounds like #5792 (comment). The following snippet generates two files, one with a 50,000-element array literal and another with a 50,000-element Slice.literal. The literals are written out in full rather than as macro for-loops, so that macro expansion overhead is not measured:

File.open("empty.cr", "w") { }
File.open("cache_array.cr", "w") do |f|
  f << "DATA = [\n"
  50000.times do |i|
    f << "  #{i},\n"
  end
  f << "]\n\n"
  f << "puts DATA.sum\n"
end
File.open("cache_literal.cr", "w") do |f|
  f << "DATA = Slice(Int32).literal(\n"
  50000.times do |i|
    f << "  #{i},\n"
  end
  f << ")\n\n"
  f << "puts DATA.sum\n"
end

The two files are compiled with and without release mode, and the four sets of --stats timings, each relative to an empty source file compiled with the same settings, are shown below. Only compilation phases with significant time changes are shown. All times were collected on an Apple M2 with a non-release build of the compiler with this PR applied.

| Phase | cache_array.cr w/o --release | cache_literal.cr w/o --release | cache_array.cr with --release | cache_literal.cr with --release |
|---|---|---|---|---|
| Parse | 0.0499 | 0.0523 | 0.0497 | 0.0524 |
| Semantic (top level) | 0.0153 | 0.0088 | 0.0119 | 0.0155 |
| Semantic (type declarations) | 0.0019 | 0.0013 | 0.0016 | 0.0013 |
| Semantic (cvars initializers) | 0.0240 | 0.0259 | 0.0259 | 0.0266 |
| Semantic (main) | 0.2261 | 0.0752 | 0.2294 | 0.0757 |
| Codegen (crystal) | 0.3040 | 0.0976 | 0.2872 | 0.0110 |
| Codegen (bc+obj) | 2.7044 | 0.0110 | 6.0686 | 0.0390 |
| Codegen (linking) | 0.0068 | 0.0004 | 0.0114 | 0.0013 |
| dsymutil | 0.0001 | -0.0005 | 0.0165 | 0.0005 |

From this we could conclude:

  • Parsing takes a tad longer, most likely because Calls are more complex than ArrayLiterals.
  • The main phase is faster because it doesn't allocate O(number of elements) AST nodes, which the literal expander does for array literals. This saves memory too. (For the same reason, Slice.literal must not be a user-land macro, as that would also interpolate O(n) nodes.)
  • The Crystal codegen phase is a lot faster because there are now zero allocas per element.
  • The bc+obj phase is so fast that non-release mode cache_literal.cr is almost indistinguishable from an empty file. This is again due to eliminating the allocas.
  • The improvements in the linking and dsymutil phases are probably related to the allocas as well. Presumably some bad inlining makes release mode cache_array.cr far slower than the other three builds.
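
To illustrate where the O(number of elements) nodes come from, here is a simplified sketch of what the literal expander roughly turns an array literal into. This reflects my understanding of the compiler rather than anything stated in this PR. The variable names are made up, and the exact expansion may differ between compiler versions:

```crystal
# What the programmer writes:
literal = [10, 20, 30]

# Roughly what the compiler expands it to (simplified; names are hypothetical).
# Array.unsafe_build is an undocumented helper that allocates the buffer and
# sets the size without initializing the elements.
expanded = ::Array(typeof(10, 20, 30)).unsafe_build(3)
buffer = expanded.to_unsafe
buffer[0] = 10 # one assignment node per element ...
buffer[1] = 20
buffer[2] = 30 # ... so 50,000 elements mean 50,000 of these

puts literal == expanded
```

With Slice.literal the element values go straight into a single constant, so no per-element assignment is generated.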

As a side effect, Slice(T) is now available in the empty prelude because of the primitive. It is not a built-in type yet; in particular, @[Primitive(:slice_literal)] does not assume any specific layout from Slice, since only the contents are stored in read-only memory, not the Slices themselves. (Contrast with #12020 where String's layout is hardcoded in multiple places.) It does expect a specific constructor signature though, and one such as below is required for an empty prelude:

struct Slice(T)
  def initialize(pointer : T*, size : Int32, *, read_only : Bool = false)
  end
end

The interpreter does not support these literals yet.

HertzDevil commented Jul 31, 2023

Times on my x86-64 Debian machine, also relative to an empty source:

| Phase | cache_array.cr w/o --release | cache_literal.cr w/o --release | cache_array.cr with --release | cache_literal.cr with --release |
|---|---|---|---|---|
| Semantic (main) | 0.2906 | 0.0898 | 0.2871 | 0.0907 |
| Codegen (crystal) | 0.2525 | 0.0057 | 0.3459 | -0.0010 |
| Codegen (bc+obj) | 2.6796 | 0.0023 | 56.4698 | 0.0396 |
| Codegen (linking) | 0.0218 | -0.0054 | 0.1281 | -0.0076 |

Times on x86-64 Windows:

| Phase | cache_array.cr w/o --release | cache_literal.cr w/o --release | cache_array.cr with --release | cache_literal.cr with --release |
|---|---|---|---|---|
| Semantic (main) | 0.3507 | 0.1808 | 0.3712 | 0.1610 |
| Codegen (crystal) | 0.7778 | 0.0200 | 0.7368 | 0.0502 |
| Codegen (bc+obj) | 3.2093 | 0.0800 | 91.9952 | 0.0285 |
| Codegen (linking) | -0.0270 | -0.0085 | 0.2107 | 0.1789 |

@straight-shoota straight-shoota left a comment

LGTM 🚀

@HertzDevil HertzDevil marked this pull request as ready for review September 20, 2023 13:58
@straight-shoota

I'm wondering if .literal is actually a good name because it's more than just defining a literal in code. It's also putting the contents in the data section. So it's fundamentally different from Regex.literal for example.
Maybe Slice.const would be a viable alternative?
I'm happy with merging it as is (it's marked as experimental anyway) with the option of changing the name later if there's support for the idea.

@straight-shoota straight-shoota merged commit 4beaf27 into crystal-lang:master Sep 24, 2023
53 checks passed
@HertzDevil HertzDevil deleted the feature/slice-literal branch September 25, 2023 12:29
Blacksmoke16 pushed a commit to Blacksmoke16/crystal that referenced this pull request Dec 11, 2023