The ThreadLocalCache implementation is used by the MLIRContext (among
other things) to try to manage thread contention in the StorageUniquers.
There is a bunch of fancy shared pointer/weak pointer setups that
basically keeps everything alive across threads at the right time, but a
huge bottleneck is the `weak_ptr::lock` call inside the `::get` method.
This is because the `lock` method has to hit the atomic refcount several
times, and this is bottlenecking performance across many threads.
However, all this is doing is checking whether the storage is
initialized. Importantly, when the `PerThreadInstance` goes out of
scope, it does not remove all of its associated entries from the
thread-local hash map (it contains dangling `PerThreadInstance *` keys).
The `weak_ptr` also allows the thread local cache to synchronize with
the `PerThreadInstance`'s destruction:
1. if `ThreadLocalCache` destructs, the `weak_ptr`s that reference its
contained values are immediately invalidated
2. if `CacheType` destructs within a thread, any entries still live are
removed from the owning `PerThreadInstance`, and it locks the
`weak_ptr` first to ensure it's kept alive long enough for the
removal.
This PR changes the TLC entries to contain a `shared_ptr<ValueT*>` and a
`weak_ptr<PerInstanceState>`. It gives the `PerInstanceState` entries a
`weak_ptr<ValueT*>` on top of the `unique_ptr<ValueT>`. This enables
`ThreadLocalCache::get` to check if the value is initialized by
dereferencing the `shared_ptr<ValueT*>` and check if the contained
pointer is null. When `PerInstanceState` destructs, the values inside
the TLC are written to nullptr. The TLC uses the
`weak_ptr<PerInstanceState>` to satisfy (2).
(1) is no longer the case. When `ThreadLocalCache` begins destruction,
the `weak_ptr<PerInstanceState>` are invalidated, but not the
`shared_ptr<ValueT*>`. This is OK: because the overall object is being
destroyed, `::get` cannot get called and because the
`shared_ptr<PerInstanceState>` finishes destruction before freeing the
pointer, it cannot get reallocated to another `ThreadLocalCache` during
destruction. I.e. the values inside the TLC associated with a
`PerInstanceState` cannot be read during destruction. The most important
thing is to make sure destruction of the TLC doesn't race with the
destructor of `PerInstanceState`. Because `PerInstanceState` carries
`weak_ptr` references into the TLC, we guarantee to not have any
use-after-frees.