
Emit LLVM bitcode without using LLVM #19031

Merged: 57 commits merged into ziglang:master on Feb 24, 2024

Conversation

antlilja (Contributor)

This PR should close #13265

With this PR, bitcode is created through Builder.zig; this bitcode is then parsed into a module by LLVM via LLVMParseBitcodeInContext2 and compiled into object files as before. The LLVM DIBuilder has been replaced by a system in Builder.zig. This PR also removes all uses of the LLVM library inside Builder.zig, along with many of the bindings in bindings.zig and zig_llvm.h/.cpp.
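
For readers unfamiliar with the LLVM-C side of this hand-off, below is a minimal sketch of the parsing step. The extern declarations and the parseBitcode wrapper are illustrative, not the actual code in bindings.zig or llvm.zig; only LLVMCreateMemoryBufferWithMemoryRange and LLVMParseBitcodeInContext2 are real LLVM-C entry points.

    const std = @import("std");

    // Opaque LLVM-C handles; the compiler wraps these in its own bindings.
    const LLVMContextRef = ?*opaque {};
    const LLVMMemoryBufferRef = ?*opaque {};
    const LLVMModuleRef = ?*opaque {};
    const LLVMBool = c_int;

    extern fn LLVMCreateMemoryBufferWithMemoryRange(
        data: [*]const u8,
        len: usize,
        name: [*:0]const u8,
        requires_null_terminator: LLVMBool,
    ) LLVMMemoryBufferRef;

    extern fn LLVMParseBitcodeInContext2(
        context: LLVMContextRef,
        mem_buf: LLVMMemoryBufferRef,
        out_module: *LLVMModuleRef,
    ) LLVMBool;

    /// Hand a Builder-produced bitcode blob to LLVM; the returned module can
    /// then go through LLVM's optimization passes and object-file emission
    /// as before. (Disposal of the memory buffer is elided for brevity.)
    fn parseBitcode(context: LLVMContextRef, bitcode: []const u32) !LLVMModuleRef {
        const bytes = std.mem.sliceAsBytes(bitcode);
        const buf = LLVMCreateMemoryBufferWithMemoryRange(bytes.ptr, bytes.len, "bitcode", 0);
        var module: LLVMModuleRef = null;
        // Returns nonzero on failure.
        if (LLVMParseBitcodeInContext2(context, buf, &module) != 0) return error.ParseFailed;
        return module;
    }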

Performance

Here are some perf stats from a ReleaseSafe build of the compiler compiling itself (on the llvm-bc branch) with an empty cache:

llvm-bc branch (ReleaseSafe):

Performance counter stats for 'zig build -Dstatic-llvm --zig-lib-dir lib --search-prefix zig-bootstrap/release/x86_64-linux-musl-native -Dno-langref -Dno-autodocs -Dtarget=x86_64-linux-musl -p stage3 install':

        209,102.81 msec task-clock:u                     #    2.236 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
         8,347,566      page-faults:u                    #   39.921 K/sec
   787,734,595,025      cycles:u                         #    3.767 GHz
     3,345,576,458      stalled-cycles-frontend:u        #    0.42% frontend cycles idle
    10,671,110,379      stalled-cycles-backend:u         #    1.35% backend cycles idle
   970,481,802,496      instructions:u                   #    1.23  insn per cycle
                                                  #    0.01  stalled cycles per insn
   195,371,479,830      branches:u                       #  934.332 M/sec
     6,637,680,252      branch-misses:u                  #    3.40% of all branches

      93.498796731 seconds time elapsed

     172.877783000 seconds user
      36.207457000 seconds sys

master branch (ReleaseSafe):

 Performance counter stats for 'zig build -Dstatic-llvm --zig-lib-dir lib --search-prefix zig-bootstrap/release/x86_64-linux-musl-native -Dno-langref -Dno-autodocs -Dtarget=x86_64-linux-musl -p stage3 install':

        220,426.12 msec task-clock:u                     #    2.026 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
         9,726,043      page-faults:u                    #   44.124 K/sec
   821,802,723,056      cycles:u                         #    3.728 GHz
     3,671,828,684      stalled-cycles-frontend:u        #    0.45% frontend cycles idle
     9,380,976,638      stalled-cycles-backend:u         #    1.14% backend cycles idle
 1,025,703,403,709      instructions:u                   #    1.25  insn per cycle
                                                  #    0.01  stalled cycles per insn
   206,035,478,692      branches:u                       #  934.714 M/sec
     6,999,582,116      branch-misses:u                  #    3.40% of all branches

     108.792765074 seconds time elapsed

     179.451902000 seconds user
      40.951977000 seconds sys

antlilja and others added 27 commits on February 21, 2024 16:24

The LLVM bitcode format requires all type references in structs to refer to earlier-defined types. We make sure types are ordered in the builder itself, to avoid iterating the types multiple times and changing the values of type indices (illustrated in the sketch after this commit list).

value_indices keeps track of the value index of each instruction in the function (i.e., it skips instructions which do not have a result).

* Added missing legacy field (unused_algebra)
* Made the struct the correct size (u32 -> u8)

This fixes a problem where empty strings were not emitted as null. It should also produce a smaller strtab, since all metadata strings were previously emitted in both the strtab and the strings block inside the metadata block.

The bitcode abbrev was missing the subrange code.
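
To illustrate the type-ordering constraint from the first commit message above, here is a standalone sketch; the Type representation and internType helper are hypothetical and are not Builder.zig's API. Interning types depth-first guarantees that a struct's element types receive smaller indices than the struct itself, so the emitted type table never needs a forward reference.

    const std = @import("std");

    // Hypothetical type representation, purely for illustration.
    const Type = union(enum) {
        int: u16, // bit width
        @"struct": []const *const Type,
    };

    /// Intern types depth-first: a struct's element types are assigned
    /// indices before the struct itself, so writing the table in index order
    /// never produces a forward reference. Recursive types are ignored here
    /// for simplicity.
    fn internType(
        table: *std.ArrayList(*const Type),
        indices: *std.AutoHashMap(*const Type, u32),
        ty: *const Type,
    ) std.mem.Allocator.Error!u32 {
        if (indices.get(ty)) |existing| return existing;
        switch (ty.*) {
            .int => {},
            .@"struct" => |fields| {
                for (fields) |field| {
                    _ = try internType(table, indices, field);
                }
            },
        }
        const idx: u32 = @intCast(table.items.len);
        try table.append(ty);
        try indices.put(ty, idx);
        return idx;
    }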
andrewrk (Member)

Nice work, @antlilja! I'm looking forward to merging this.

By the way, did you take any peak RSS measurements with these changes? I'm curious to know how it affected memory usage.

andrewrk enabled auto-merge February 24, 2024 06:16
jacobly0 disabled auto-merge February 24, 2024 12:24
jacobly0 (Member) commented Feb 24, 2024

I also have an idea for a new debug location system which should have a less clunky interface than my last one, without over-emitting debug locations, and a plan for how to deal with these "phantom" debug locations when emitting text IR. I'll ping you when it's done.

I have a fix for this, but it can emit the same debug location up to twice in LLVM IR, so I'll wait to see what your plan is.

jacobly0 merged commit b344ff0 into ziglang:master Feb 24, 2024
10 checks passed
antlilja (Contributor, Author)

Nice work, @antlilja! I'm looking forward to merging this.

By the way, did you take any peak RSS measurements with these changes? I'm curious to know how it affected memory usage.

I haven't, but I'll definitely take a look at that as well when benchmarking some improvements to debug locations.

jacobly0 (Member)

I realize these aren't completely comparable, but I compared the compiler compiling itself to unoptimized bitcode with an empty cache, from before the LLVM rewrite started to after this change lands:

3bada8e: (before rewrite)
13.08s
2.214172 GB RSS

edb6486: (after rewrite)
19.49s
2.836292 GB RSS

edb6486 + the following patch:
13.87s
1.0562 GB RSS

diff --git a/src/codegen/llvm.zig b/src/codegen/llvm.zig
index 5ea749d6d9..4eee469cc6 100644
--- a/src/codegen/llvm.zig
+++ b/src/codegen/llvm.zig
@@ -1187,14 +1187,8 @@ pub const Object = struct {
             }
         }
 
-        var bitcode_arena_allocator = std.heap.ArenaAllocator.init(
-            std.heap.page_allocator,
-        );
-        errdefer bitcode_arena_allocator.deinit();
-
-        const bitcode = try self.builder.toBitcode(
-            bitcode_arena_allocator.allocator(),
-        );
+        const bitcode = try self.builder.toBitcode(self.gpa);
+        defer self.gpa.free(bitcode);
 
         if (options.pre_bc_path) |path| {
             var file = try std.fs.cwd().createFile(path, .{});
@@ -1250,7 +1244,6 @@ pub const Object = struct {
 
             break :blk module;
         };
-        bitcode_arena_allocator.deinit();
 
         const target_triple_sentinel =
             try self.gpa.dupeZ(u8, self.builder.target_triple.slice(&self.builder).?);
diff --git a/src/codegen/llvm/Builder.zig b/src/codegen/llvm/Builder.zig
index a5aeb7dee3..38903c6289 100644
--- a/src/codegen/llvm/Builder.zig
+++ b/src/codegen/llvm/Builder.zig
@@ -14945,7 +14945,7 @@ pub fn toBitcode(self: *Builder, allocator: Allocator) bitcode_writer.Error![]co
         try strtab_block.end();
     }
 
-    return bitcode.toSlice();
+    return bitcode.toOwnedSlice();
 }
 
 const Allocator = std.mem.Allocator;
diff --git a/src/codegen/llvm/bitcode_writer.zig b/src/codegen/llvm/bitcode_writer.zig
index bfb406d087..d48a92dd40 100644
--- a/src/codegen/llvm/bitcode_writer.zig
+++ b/src/codegen/llvm/bitcode_writer.zig
@@ -40,9 +40,9 @@ pub fn BitcodeWriter(comptime types: []const type) type {
             self.buffer.deinit();
         }
 
-        pub fn toSlice(self: BcWriter) []const u32 {
+        pub fn toOwnedSlice(self: *BcWriter) Error![]const u32 {
             std.debug.assert(self.bit_count == 0);
-            return self.buffer.items;
+            return self.buffer.toOwnedSlice();
         }
 
         pub fn length(self: BcWriter) usize {

antlilja (Contributor, Author)

I realize these aren't completely comparable, but I compared the compiler compiling itself to unoptimized bitcode with an empty cache from before the llvm rewrite started to after this change lands:

3bada8e: (before rewrite) 13.08s 2.214172 GB RSS

edb6486: (after rewrite) 19.49s 2.836292 GB RSS

edb6486 + the following patch: 13.87s 1.0562 GB RSS

Yeah, now that I'm thinking about it, it's much more reasonable not to use an arena here, as it will definitely over-allocate.
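
A standalone sketch of the over-allocation being discussed (illustrative, not the compiler's code): when the growing buffer cannot be resized in place, the arena keeps every superseded buffer alive until deinit(), while a general-purpose allocator frees it immediately and toOwnedSlice() hands back exactly the used memory.

    const std = @import("std");

    pub fn main() !void {
        // Arena case: growth reallocations that cannot happen in place leave
        // the old, smaller buffers pinned until arena.deinit(), so peak
        // memory approaches the sum of all intermediate buffers.
        var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
        defer arena.deinit();
        var in_arena = std.ArrayList(u32).init(arena.allocator());
        var i: u32 = 0;
        while (i < 1 << 20) : (i += 1) try in_arena.append(i);

        // General-purpose case: each growth frees the previous buffer, and
        // toOwnedSlice() returns exactly the used memory for the caller to
        // free, mirroring the `self.gpa` version in the patch above.
        var gpa = std.heap.GeneralPurposeAllocator(.{}){};
        defer _ = gpa.deinit();
        var in_gpa = std.ArrayList(u32).init(gpa.allocator());
        i = 0;
        while (i < 1 << 20) : (i += 1) try in_gpa.append(i);
        const owned = try in_gpa.toOwnedSlice();
        defer gpa.allocator().free(owned);

        std.debug.print("owned {d} words\n", .{owned.len});
    }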

andrewrk (Member) commented Feb 24, 2024

Here is my measurement of the impact of this change building the self-hosted compiler:

Benchmark 1 (3 runs): master/zig build-exe ...
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          85.7s  ± 2.98s     82.2s  … 87.7s           0 ( 0%)        0%
  peak_rss           4.71GB ±  763KB    4.71GB … 4.71GB          0 ( 0%)        0%
  cpu_cycles          330G  ± 2.28G      328G  …  332G           0 ( 0%)        0%
  instructions        460G  ±  303M      460G  …  460G           0 ( 0%)        0%
  cache_references   23.4G  ±  184M     23.3G  … 23.6G           0 ( 0%)        0%
  cache_misses       2.01G  ±  206M     1.78G  … 2.19G           0 ( 0%)        0%
  branch_misses      2.22G  ± 2.09M     2.22G  … 2.23G           0 ( 0%)        0%
Benchmark 2 (3 runs): this-pr/zig build-exe ...
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           109s  ± 2.25s      106s  …  110s           0 ( 0%)        💩+ 26.9% ±  7.0%
  peak_rss           4.92GB ±  544KB    4.92GB … 4.92GB          0 ( 0%)        💩+  4.5% ±  0.0%
  cpu_cycles          409G  ± 3.65G      405G  …  412G           0 ( 0%)        💩+ 23.7% ±  2.1%
  instructions        592G  ±  283M      592G  …  592G           0 ( 0%)        💩+ 28.7% ±  0.1%
  cache_references   26.1G  ± 30.6M     26.1G  … 26.1G           0 ( 0%)        💩+ 11.4% ±  1.3%
  cache_misses       3.53G  ±  141M     3.38G  … 3.66G           0 ( 0%)        💩+ 75.3% ± 19.9%
  branch_misses      2.54G  ± 1.52M     2.53G  … 2.54G           0 ( 0%)        💩+ 14.0% ±  0.2%

@antlilja did you perhaps get your branches mixed up when measuring?

antlilja (Contributor, Author)

@antlilja did you perhaps get your branches mixed up when measuring?

Yeah, it's definitely possible I mixed something up or did something weird when switching between branches. That seems like the most reasonable explanation; there haven't been any changes since that benchmark that should have this huge an impact. The only major thing that's different is how we're emitting debug locations, but that's definitely not the cause of a difference this large. My bad.

andrewrk (Member) commented Feb 24, 2024

Well, I'm just glad we caught it now. Thanks for all your efforts on this branch.

Here's a new data point that includes #19069 vs old master:

Benchmark 1 (3 runs): 0.12.0-dev.2931+8d651f512/zig build-exe ...
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          84.6s  ± 2.89s     81.3s  … 86.7s           0 ( 0%)        0%
  peak_rss           4.68GB ± 70.2MB    4.64GB … 4.76GB          0 ( 0%)        0%
  cpu_cycles          330G  ± 6.41G      327G  …  338G           0 ( 0%)        0%
  instructions        464G  ± 12.6G      456G  …  478G           0 ( 0%)        0%
  cache_references   23.9G  ±  427M     23.6G  … 24.4G           0 ( 0%)        0%
  cache_misses       1.78G  ±  239M     1.51G  … 1.93G           0 ( 0%)        0%
  branch_misses      2.25G  ± 66.6M     2.21G  … 2.32G           0 ( 0%)        0%
Benchmark 2 (3 runs): master+19069/zig build-exe ...
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           105s  ± 4.07s      101s  …  109s           0 ( 0%)        💩+ 24.5% ±  9.5%
  peak_rss           4.92GB ±  141KB    4.92GB … 4.92GB          0 ( 0%)        💩+  5.0% ±  2.4%
  cpu_cycles          410G  ± 4.33G      405G  …  414G           0 ( 0%)        💩+ 24.1% ±  3.8%
  instructions        596G  ±  200M      596G  …  596G           0 ( 0%)        💩+ 28.6% ±  4.4%
  cache_references   26.3G  ± 42.8M     26.2G  … 26.3G           0 ( 0%)        💩+ 10.0% ±  2.9%
  cache_misses       3.24G  ±  303M     3.05G  … 3.59G           0 ( 0%)        💩+ 81.8% ± 34.6%
  branch_misses      2.56G  ± 1.66M     2.56G  … 2.56G           0 ( 0%)        💩+ 14.1% ±  4.8%

Increased peak RSS? I'm not sure how that happened. Now I'm starting to doubt my methodology. Why are my results so different than both @jacobly0 and @antlilja?

Jarred-Sumner (Contributor)

I wonder if toOwnedSlice is increasing memory usage, since it potentially re-allocates the shrunken array before freeing the old one? Or, if an arena allocator is in use when toOwnedSlice is called, or it's used on ArrayLists that expand a lot, that could also cause this.

antlilja (Contributor, Author)

Increased peak RSS? I'm not sure how that happened. Now I'm starting to doubt my methodology. Why are my results so different than both @jacobly0 and @antlilja?

Hmm, maybe those results differ from @jacobly0 because he only does the bitcode emission without actually compiling to a binary? But maybe I'm misinterpreting this line:

... I compared the compiler compiling itself to unoptimized bitcode with an empty cache ...

I'll do some benchmarks as a sanity check.

andrewrk (Member)

It's looking like, while the time and memory spent in Zig land are reduced, the time and memory spent in C++ land are increased even more. That's too bad. Looks like my prediction was very wrong.

That being said, it does not change the plan. This is a large step towards the long-term goal of eliminating the library dependency on LLVM and providing fast compilation speed by avoiding LLVM altogether.

andrewrk (Member)

2 more data points:

Hello World (-ODebug):

Benchmark 1 (5 runs): zig-dev build-exe hello.zig
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          1.04s  ± 19.6ms    1.03s  … 1.08s           0 ( 0%)        0%
  peak_rss            174MB ±  304KB     174MB …  175MB          0 ( 0%)        0%
  cpu_cycles         4.34G  ± 22.1M     4.31G  … 4.37G           0 ( 0%)        0%
  instructions       6.17G  ±  993K     6.17G  … 6.17G           0 ( 0%)        0%
  cache_references    279M  ± 2.07M      276M  …  281M           0 ( 0%)        0%
  cache_misses       10.3M  ± 83.3K     10.2M  … 10.4M           0 ( 0%)        0%
  branch_misses      34.7M  ± 42.3K     34.6M  … 34.7M           0 ( 0%)        0%
Benchmark 2 (5 runs): zig build-exe hello.zig
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          1.15s  ± 29.1ms    1.13s  … 1.20s           0 ( 0%)        💩+ 10.1% ±  3.5%
  peak_rss            181MB ±  160KB     180MB …  181MB          0 ( 0%)        💩+  3.7% ±  0.2%
  cpu_cycles         4.73G  ± 18.7M     4.71G  … 4.76G           0 ( 0%)        💩+  9.1% ±  0.7%
  instructions       7.15G  ± 3.55M     7.15G  … 7.16G           0 ( 0%)        💩+ 15.9% ±  0.1%
  cache_references    278M  ± 1.35M      276M  …  279M           0 ( 0%)          -  0.5% ±  0.9%
  cache_misses       10.4M  ±  222K     10.2M  … 10.7M           0 ( 0%)          +  0.8% ±  2.4%
  branch_misses      38.2M  ± 53.4K     38.1M  … 38.2M           0 ( 0%)        💩+ 10.1% ±  0.2%

Hello World (-OReleaseFast):

Benchmark 1 (3 runs): zig-dev build-exe hello.zig -OReleaseFast
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          7.61s  ± 1.60s     6.63s  … 9.45s           0 ( 0%)        0%
  peak_rss            195MB ± 26.6MB     179MB …  225MB          0 ( 0%)        0%
  cpu_cycles         32.7G  ± 7.18G     28.6G  … 41.0G           0 ( 0%)        0%
  instructions       44.7G  ± 9.70G     39.1G  … 55.9G           0 ( 0%)        0%
  cache_references   2.08G  ±  400M     1.84G  … 2.55G           0 ( 0%)        0%
  cache_misses       41.9M  ± 8.42M     36.9M  … 51.6M           0 ( 0%)        0%
  branch_misses       252M  ± 55.6M      220M  …  316M           0 ( 0%)        0%
Benchmark 2 (3 runs): zig build-exe hello.zig -OReleaseFast
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          7.80s  ± 1.50s     6.73s  … 9.51s           0 ( 0%)          +  2.5% ± 46.2%
  peak_rss            197MB ± 23.3MB     182MB …  223MB          0 ( 0%)          +  1.0% ± 29.2%
  cpu_cycles         32.8G  ± 6.73G     28.9G  … 40.6G           0 ( 0%)          +  0.3% ± 48.2%
  instructions       45.8G  ± 9.91G     40.1G  … 57.3G           0 ( 0%)          +  2.6% ± 49.8%
  cache_references   2.04G  ±  397M     1.81G  … 2.50G           0 ( 0%)          -  2.0% ± 43.3%
  cache_misses       39.6M  ± 6.53M     35.0M  … 47.1M           0 ( 0%)          -  5.5% ± 40.8%
  branch_misses       255M  ± 55.0M      223M  …  318M           0 ( 0%)          +  1.0% ± 49.7%

Note that for the near future, the plan is to use LLVM only for release builds, not debug builds, in which case the perf difference in this data point is insignificant.

andrewrk (Member) commented Feb 25, 2024

@Jarred-Sumner

i wonder if toOwnedSlice is increasing memory usage since it potentially re-allocates the shrunk array before freeing the old one? or, if arena allocator is in use when toOwnedSlice is called or on ArrayList that expand a lot that could also cause this

It's 100% guaranteed to reuse the same pointer when shrinking when using the C allocator (zig/lib/std/heap.zig, lines 121 to 123 at 31763d2):

    if (new_len <= buf.len) {
        return true;
    }

However, I did notice a related possible improvement to std.heap.raw_c_allocator: #19073. I expect this to have near-zero effect.
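
Putting the heap.zig excerpt together with the toOwnedSlice question, here is a small sketch (illustrative, not the compiler's code; it must be linked against libc): with std.heap.raw_c_allocator, toOwnedSlice() asks the allocator to shrink the buffer to the used length, the shrink succeeds in place per the lines quoted above, and the same pointer is returned, so no second copy of the buffer is ever live.

    const std = @import("std");

    pub fn main() !void {
        // raw_c_allocator forwards directly to malloc/free; its resize()
        // accepts any shrink, so toOwnedSlice() keeps the original pointer
        // instead of allocating a smaller copy.
        const c_alloc = std.heap.raw_c_allocator;

        var list = std.ArrayList(u32).init(c_alloc);
        try list.appendNTimes(0, 1000);

        const before = list.items.ptr;
        const owned = try list.toOwnedSlice();
        defer c_alloc.free(owned);

        std.debug.print("same pointer after shrink: {}\n", .{before == owned.ptr});
    }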

Jarred-Sumner (Contributor)

Another thought

What if LLVM's implementation uses the shorthand for repeated elements? Is the file size of the .bc files similar between the Zig-generated one and the LLVM-generated one?

antlilja (Contributor, Author)

Another thought

What if LLVM's implementation uses the shorthand for repeated elements? Is the file size of the .bc files similar between the Zig-generated one and the LLVM-generated one?

I have some measurements in my new PR: #19083

Review thread on src/codegen/llvm/Builder.zig:

        self: *Builder,
        elements: []const u32,
    ) Allocator.Error!Metadata {
        try self.ensureUnusedMetadataCapacity(1, Metadata.Expression, elements.len * @sizeOf(u32));
Member:

Sorry for not noticing during the original review, but all of these calls are passing the wrong third argument.

Contributor Author:

Submitted a fix: #20484

Successfully merging this pull request may close these issues: directly output LLVM bitcode rather than using LLVM's IRBuilder API (#13265).