Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Atomic: The Unexpected Journey. #124

Closed
wants to merge 9 commits into from
Closed

Atomic: The Unexpected Journey. #124

wants to merge 9 commits into from

Conversation

andersio
Copy link
Member

@andersio andersio commented Nov 25, 2016

The benchmark

Source
It simulates the critical path of Signal, which is grabbing the signal state + querying the interrupted.

Compiler Settings: Both the framework and the test are compiled with -Owholemodule. Checked builds.
Test Environment: i5 4258U, macOS 10.12.1, Xcode 8.1, Swift 3.0.1

Final Speedup

4.641x 😦

The optimised Atomic now takes only 0.029 sec (5%) in the synthetic test of the lock-free disposal PR, significantly faster than it was before.

Baseline Performance

master
0.673 sec (2%)

The Journey

Mark AtomicProtocol.value and AtomicProtocol.swap as @inline(__always).
0.434 sec (2%)

Mark Atomic.modify and Atomic.swap as @inline(__always).
0.296 sec (2%)

Mark PosixThreadMutex.lock and PosixThreadMutex.unlock as @inline(__always).
0.223 sec (2%)

Drop defer statements in Atomic.modify and Atomic.swap.
0.212 sec (3%)

Drop precondition in favour of if-fatalError.
0.188 sec (2%)

Embed PosixThreadMutex into Atomic.
0.179 sec (4%)

Implement value and swap in Atomic.
0.149 sec (4%)

Use the non-copying modify instead of withValue in value.get.
0.145 sec (2%)

Misc

Inline modify into swap manually.
0.145 sec (2%)

Closure based _PosixThreadMutex.lock.
0.179 sec (3%)

-Onone
0.479 sec (3%)

Copy link
Contributor

@sharplet sharplet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we accept this PR, need to fix a couple issues.

return try action(&_value)
let returnValue = try action(&_value)
didSetObserver?(_value)
lock.unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without defer, if try action(&_value) throws, the lock will never be unlocked.

swap(newValue)
}
let returnValue = try action(_value)
lock.unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should bring back defer here too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this is an oversight. Thanks for catching this.

@andersio
Copy link
Member Author

andersio commented Nov 25, 2016

The hole @sharplet caught is fixed using do...catch. defer would bump the time up by ~20% due to it being closure based.

P.S. In master, RecursiveAtomic.didSetObserver is called when an error is thrown. Looks like a bug.

Copy link
Contributor

@sharplet sharplet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding didSetObserver, is it possible that the throwing closure can change the inout parameter before throwing, and have that change copied back to the caller? Or does Swift guarantee that in the throwing case the inout parameter won't be changed?

precondition(result == 0, "Failed to destroy mutex with error \(result).")

UnsafeMutableRawPointer(mutex).deallocate(bytes: MemoryLayout<pthread_mutex_t>.size,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think of using a raw pointer here it might make sense to use mutex.deinitialize(1) and mutex.deallocate(1)? I'm not super familiar with the new pointer APIs though.

Copy link
Member Author

@andersio andersio Nov 26, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems UnsafeMutablePointer is enough.

Instances must be aligned to MemoryLayout<Pointee>.alignment, i.e. (UnsafePointer<Int8>(self) - nil) % MemoryLayout<Pointee>.alignment == 0

let value = try action(&_value)
lock.unlock()
return value
} catch let error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can omit let error here, and in the other catch blocks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL

@discardableResult
@inline(__always)
public func swap(_ newValue: Value) -> Value {
return modify { (value: inout Value) in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the closure type annotation necessary here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@andersio
Copy link
Member Author

andersio commented Nov 26, 2016

Hmm. It won't restore automatically. But making it restoring on throw doesn't make sense in performance for the common use cases either. Maybe it should be left as is.

@andersio andersio force-pushed the atomic-perf branch 2 times, most recently from 0a1862a to 43cde25 Compare November 26, 2016 14:59
precondition(result == 0, "Failed to destroy mutex with error \(result).")
mutex.deallocate(capacity: 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://developer.apple.com/reference/swift/unsafemutablepointer/2295090-deallocate:

Precondition: The memory is not initialized.

This is what made me think that perhaps we should be calling mutex.deinitialize() first.

Copy link
Member Author

@andersio andersio Nov 26, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the memory model, it has to be first initialised before being used too. Addressed in 974663c.

}
}

internal final class PosixThreadMutex: NSLocking {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like this is used anywhere? Let's remove it and then remove the underscore from the version we are using.


extension AtomicProtocol {
/// Atomically get or set the value of the variable.
public var value: Value {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we still keep this and swap as part of AtomicProtocol?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The performance is much worse due to the need to query the protocol witness table, even with the @inline(__always) annotation. I have rerun my benchmarks and it is still the case.

I am actually thinking of removing AtomicProtocol, hmm...

Embed PosixThreadMutex into Atomic.
0.179 sec (4%)

Implement value and swap in Atomic.
0.149 sec (4%)

@@ -8,27 +8,37 @@

import Foundation

final class PosixThreadMutex: NSLocking {
private var mutex = pthread_mutex_t()
internal struct PosixThreadMutex {
Copy link
Contributor

@mdiep mdiep Dec 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a struct with an UnsafeMutablePointer instead of a class with a deinit?

Copy link
Member Author

@andersio andersio Dec 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is usually used with an apparent owner. So defining it as a struct gives the flexibility to have it embedded in the containing type, should it be a stored property, and have one less pointer to chase.

Drop precondition in favour of if-fatalError.
0.188 sec (2%)

Embed PosixThreadMutex into Atomic.
0.179 sec (4%)

@andersio
Copy link
Member Author

andersio commented Dec 5, 2016

Related: https://lists.swift.org/pipermail/swift-users/Week-of-Mon-20161205/004147.html

Joe Groff jgroff at apple.com
If you apply them to memory you allocated manually with malloc/free on UnsafeMutablePointer's allocation methods, then yeah, they should work as they do in C. That's the safest way to use these functions today. Passing a Swift var inout to one of these functions does not guarantee that accesses to that var will maintain atomicity, since there may be bridging or reabstracting conversions happening under the hood.

Copy link
Contributor

@mdiep mdiep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good overall. Just needs a couple tweaks.

} catch {
lock.unlock()
throw error
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about keeping the defers?

Removing them seems to have a very minor effect on the speed but makes mistakes more likely.

Copy link
Member Author

@andersio andersio Jan 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How likely are we to change this again though? It are is one of the pieces that perhaps would never be altered otherwise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no way to know. 🙃

@@ -277,7 +378,8 @@ extension AtomicProtocol {
///
/// - returns: The old value.
@discardableResult
public func swap(_ newValue: Value) -> Value {
@inline(__always)
func swap(_ newValue: Value) -> Value {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still public?

Copy link
Member Author

@andersio andersio Jan 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Atomic.swap is. RecursiveAtomic.swap isn't. Fixed.

final class PosixThreadMutex: NSLocking {
private var mutex = pthread_mutex_t()
internal struct PosixThreadMutex {
private let mutex: UnsafeMutablePointer<pthread_mutex_t>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment that explains this?

Any time we choose the non-obvious route to get a speed boost, we should document it.

@@ -131,7 +131,7 @@ internal struct UnsafeAtomicState<State: RawRepresentable>: AtomicStateProtocol
/// The wrapper is a struct so that it can be inlined when it is contained as
/// part of a reference type, and dodges an extra level of calling indirection
/// that would otherwise happen if the wrapper itself is an reference type.
internal struct PosixThreadMutex {
internal struct PthreadMutex {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it is generally called pthread anyway... So let's make it a bit shorter...

@andersio andersio closed this Feb 25, 2017
@andersio andersio deleted the atomic-perf branch February 25, 2017 08:22
This was referenced Feb 25, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants