optimize swaps #6710
Conversation
This isn't needed semantically, and it's the wrong case to optimize for.
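For readers without the diff at hand, the check being discussed is a pointer-equality guard in `util::swap`. A minimal sketch in present-day Rust rather than the actual 2013 source (the function names here are made up for illustration) of why the guard is semantically redundant:

```rust
use std::ptr;

// Illustrative sketch only: the shape of a swap that guards against
// `x` and `y` being the same location before copying -- the branch
// this PR argues is unnecessary.
fn swap_with_guard<T>(x: &mut T, y: &mut T) {
    if ptr::eq(&*x, &*y) {
        return; // redundant: the copies below are harmless when x == y
    }
    swap_without_guard(x, y);
}

// Reading both values before writing either back makes the guard
// unnecessary: if both references named the same location, each store
// would just write back the bits that were already there.
fn swap_without_guard<T>(x: &mut T, y: &mut T) {
    let px: *mut T = x;
    let py: *mut T = y;
    unsafe {
        let a = ptr::read(px);
        let b = ptr::read(py);
        ptr::write(px, b);
        ptr::write(py, a);
    }
}
```

In safe code the two `&mut` arguments can never alias anyway, which is another way of seeing that the branch only ever costs and never helps.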
I'm just curious, but is this measurably faster, in the sense that LLVM has actually been seen to generate faster code? I realize that in theory it's an improvement, but I'm curious if there's something currently that actually improves from this.
Yeah, it looked like an OK change to me, but I really should have asked for benchmarks.
@alexcrichton: dropping the equality check is definitely an improvement I can see in the IR. I doubt it would show up in benchmarks for most code, but it's a measurable little improvement for the tree balancing. For very large objects, it would probably be helpful to expose alignment information to LLVM too, but I'll deal with that separately once this lands (we are always passing 1 as the alignment at the moment).
Passing higher alignment values gives the optimization passes more freedom, since they can copy in larger chunks. This change results in rustc outputting the same post-optimization IR as clang for swaps and most copies, apart from the lack of information about padding.

Code snippet:

```rust
#[inline(never)]
fn swap<T>(x: &mut T, y: &mut T) {
    util::swap(x, y);
}
```

Original IR (for `int`):

```llvm
define internal fastcc void @_ZN9swap_283417_a71830ca3ed2d65d3_00E(i64*, i64*) #1 {
static_allocas:
  %2 = icmp eq i64* %0, %1
  br i1 %2, label %_ZN4util9swap_283717_a71830ca3ed2d65d3_00E.exit, label %3

; <label>:3                                       ; preds = %static_allocas
  %4 = load i64* %0, align 1
  %5 = load i64* %1, align 1
  store i64 %5, i64* %0, align 1
  store i64 %4, i64* %1, align 1
  br label %_ZN4util9swap_283717_a71830ca3ed2d65d3_00E.exit

_ZN4util9swap_283717_a71830ca3ed2d65d3_00E.exit:  ; preds = %3, %static_allocas
  ret void
}
```

After #6710:

```llvm
define internal fastcc void @_ZN9swap_283017_a71830ca3ed2d65d3_00E(i64* nocapture, i64* nocapture) #1 {
static_allocas:
  %2 = load i64* %0, align 1
  %3 = load i64* %1, align 1
  store i64 %3, i64* %0, align 1
  store i64 %2, i64* %1, align 1
  ret void
}
```

After this change:

```llvm
define internal fastcc void @_ZN9swap_283017_a71830ca3ed2d65d3_00E(i64* nocapture, i64* nocapture) #1 {
static_allocas:
  %2 = load i64* %0, align 8
  %3 = load i64* %1, align 8
  store i64 %3, i64* %0, align 8
  store i64 %2, i64* %1, align 8
  ret void
}
```

Another example:

```rust
#[inline(never)]
fn set<T>(x: &mut T, y: T) {
    *x = y;
}
```

Before, with `(int, int)` (align 1):

```llvm
define internal fastcc void @_ZN8set_282517_8fa972e3f9e451983_00E({ i64, i64 }* nocapture, { i64, i64 }* nocapture) #1 {
static_allocas:
  %2 = bitcast { i64, i64 }* %1 to i8*
  %3 = bitcast { i64, i64 }* %0 to i8*
  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %2, i64 16, i32 1, i1 false)
  ret void
}
```

After, with `(int, int)` (align 8):

```llvm
define internal fastcc void @_ZN8set_282617_8fa972e3f9e451983_00E({ i64, i64 }* nocapture, { i64, i64 }* nocapture) #1 {
static_allocas:
  %2 = bitcast { i64, i64 }* %1 to i8*
  %3 = bitcast { i64, i64 }* %0 to i8*
  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %2, i64 16, i32 8, i1 false)
  ret void
}
```
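As background on where that alignment number comes from (not part of the patch itself): the value attached to the loads, stores, and `llvm.memcpy` call is the type's ABI alignment, which in present-day Rust is what `std::mem::align_of` reports. A small illustration, with the caveat that the exact values are target-dependent:

```rust
use std::mem;

// Prints the size/alignment pairs corresponding to the IR above on a
// typical 64-bit target (values are platform-dependent).
fn size_and_align<T>(name: &str) {
    println!("{}: size = {}, align = {}", name, mem::size_of::<T>(), mem::align_of::<T>());
}

fn main() {
    size_and_align::<i64>("i64");               // size = 8,  align = 8 on x86_64
    size_and_align::<(i64, i64)>("(i64, i64)"); // size = 16, align = 8 on x86_64
}
```

Forwarding that alignment to LLVM is what lets it lower the 16-byte `memcpy` as a couple of 8-byte moves instead of having to assume byte alignment.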