Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Implement ValueTask extensibility #16618

Merged
merged 3 commits into from
Mar 1, 2018

Conversation

stephentoub
Copy link
Member

@stephentoub stephentoub commented Feb 27, 2018

This commit adds support for extending ValueTask<T> with arbitrary backing sources. Prior to this change, ValueTask<T> could wrap a T or a Task<T>; now it can also wrap an IValueTaskSource<T>, which can be implemented by arbitrary objects to be represented by ValueTask<T>. These objects can then be pooled and reused to minimize allocation. The commit also adds a non-generic ValueTask that can represent void-returning operations, including a default synchronous success, Task, and IValueTaskSource. For the non-generic ValueTask, the commit also includes awaiters and async method builders, so it can be both awaited and used as the return type of an async method.

The rest of the changes fall into a few buckets all related to enabling this support:

  • Modifying AsyncTaskMethodBuilder<TResult>.AwaitUnsafeOnCompleted to specially recognize any ValueTask and utilize either the Task or IValueTaskSource that backs it to avoid allocating an Action MoveNext method. If every object awaited in an async method is either a Task/Task<T> or ValueTask/ValueTask<T>, regardless of what the ValueTask/ValueTask<T> wraps, we'll be able to avoid allocating the delegate and only allocate the single state machine object that also serves as the returned object.
  • Changing Stream.WriteAsync to return ValueTask instead of Task. This enables interested overriding stream types to use a reusable/pooled object to avoid WriteAsync allocations.
  • Modifying Stream.CopyToAsync to use the new Memory-based overloads of ReadAsync and WriteAsync. This enables the default CopyToAsync implementation to take advantage of any pooling done by derived streams, but even without pooling to take advantage of synchronously completing ReadAsyncs returning ValueTask<int>s that contained an int rather than an allocated object. (While I was modifying this, I also removed some unnecessary array clearing that we'd added before later deciding it wasn't needed in general.)
  • Modifying StreamReader/Writer to use the new ReadAsync/WriteAsync overloads.

Contributes to https://github.com/dotnet/corefx/issues/27445

@stephentoub stephentoub changed the title [WIP] coreclr changes for ValueTask extensibility Implement ValueTask extensibility Feb 28, 2018
@stephentoub
Copy link
Member Author

destination.Write(buffer, 0, read);
}
}
finally
{
Array.Clear(buffer, 0, highwaterMark); // clear only the most we used
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why you removed this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted in the PR description:
"(While I was modifying this, I also removed some unnecessary array clearing that we'd added before later deciding it wasn't needed in general.)"
We'd initially gone through and adding clearing everywhere we were using array pool. Then we subsequently decided we didn't need to, that we would only explicitly clear in very explicitly high-security code (like when using an array to hold a key in a crypto lib), and we removed some of these and stopped adding more, but we missed some.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah

@@ -1127,7 +1127,7 @@ public override ValueTask<int> ReadAsync(Memory<char> buffer, CancellationToken
{
Debug.Assert(_bytePos == 0, "_bytePos can be non zero only when we are trying to _checkPreamble. Are two threads using this StreamReader at the same time?");

_byteLen = await tmpStream.ReadAsync(tmpByteBuffer, 0, tmpByteBuffer.Length, cancellationToken).ConfigureAwait(false);
_byteLen = await tmpStream.ReadAsync(new Memory<byte>(tmpByteBuffer), cancellationToken).ConfigureAwait(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: tmpByteBuffer.AsMemory() is cleaner (generic-type inference)

here and elsewhere

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tmpByteBuffer.AsMemory() is cleaner (generic-type inference)

I prefer the explicitness.

@@ -963,14 +963,14 @@ private bool HaveWrittenPreamble_Prop
byte[] preamble = encoding.GetPreamble();
if (preamble.Length > 0)
{
await stream.WriteAsync(preamble, 0, preamble.Length, cancellationToken).ConfigureAwait(false);
await stream.WriteAsync(new ReadOnlyMemory<byte>(preamble), cancellationToken).ConfigureAwait(false);
Copy link
Member

@ahsonkhan ahsonkhan Feb 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to explicitly create a ROM here? Would implicit cast work? I don't see an array based stream.WriteAsync overload that has 2 parameters (second one being the cancellation token).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to explicitly create a ROM here?

Being explicit helps me when reading the code to know that I'm using the desired overload. Otherwise I need to start paying attention to how many arguments there are and whether I might be slipping into the wrong one.

/// <summary>Schedules the state machine to proceed to the next action when the specified awaiter completes.</summary>
/// <typeparam name="TAwaiter">The type of the awaiter.</typeparam>
/// <typeparam name="TStateMachine">The type of the state machine.</typeparam>
/// <param name="awaiter">the awaiter</param>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: inconsistent capitalization/use of period in the xml comments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -5,88 +5,212 @@
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Threading.Tasks;
#if !netstandard
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add new line between using System.* directives and Internal.* ones

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok


/// <summary>Gets the result of the ValueTask.</summary>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
[StackTraceHidden]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Why is GetResult marked with the StackTraceHidden attribute?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exception is thrown inside it so looks like gunk in the stack trace?

Copy link
Member

@davidfowl davidfowl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@stephentoub
Copy link
Member Author

stephentoub commented Feb 28, 2018

Open issues to address or decide against before merging:

  • Putting IValueTaskSource-related types in a subnamespace.
  • Using an opaque integer cookie passed into the ValueTask(IValueTaskSource) ctor and then fed to all interface invocations

// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

namespace System.Threading.Tasks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we agreed to put the source APIs in a subnamespace?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #16618 (comment).

We discussed it. There was no conclusion. I'm happy for you to make a call on if and what it should be. My default is to leave them where they are.

Copy link
Member

@KrzysztofCwalina KrzysztofCwalina Feb 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't feel super strongly, so maybe @terrajobst has some input?

public interface IValueTaskSource
{
/// <summary>Gets the status of the current operation.</summary>
ValueTaskSourceStatus Status { get; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we not doing the cookies?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason when I click on the link it does not take me anywhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason when I click on the link it does not take me anywhere.

Hmm, ok, well it's a top-level comment earlier in this PR thread that outlines a few remaining action items to follow-up one and determine whether to address or ignore, one of which was cookies, another of which was namespace.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't feel strongly about whether they are added; but if it were something like CorrelationToken would be better than Cookie

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't feel strongly about whether they are added; but if it were something like CorrelationToken would be better than Cookie

You mean just in terms of name? If we were to do something (I'm still investigating any perf ramifications), I was thinking it would just be something like short token.

@stephentoub
Copy link
Member Author

stephentoub commented Feb 28, 2018

Using an opaque integer cookie passed into the ValueTask(IValueTaskSource) ctor and then fed to all interface invocations

I spent some time today investigating this possibility...

The core idea here is to modify the ValueTask constructor that takes an IValueTaskSource to also take an integral value, e.g.

public ValueTask(IValueTaskSource<T> source, short token);

That token is meaningless to the ValueTask itself, which simply passes that value back to the interface on every call back to the interface:

ValueTaskSourceStatus GetStatus(short token);
void OnCompleted(Action<object> continuation, object state, short token, ValueTaskSourceOnCompletedFlags flags);
TResult GetResult(short token);

There were two reasons raised for considering something like this:

  1. Allowing a single object to support multiple distinct operation types even if they are all non-generic or generic with the same TResult value. The id could be used by the IValueTaskSource implementation to effectively tell itself which kind of operation is being performed, switching on it.
  2. Diagnostics / errors. An implementation could maintain an operation count, and feed a (somewhat) unique value into the ValueTask, which would hold on to it and use it for all subsequent calls back to the IValueTaskSource; if attempts were then made to use that ValueTask after the IValueTaskSource had already moved on to represent another operation, the IValueTaskSource could sometimes detect that and throw an exception.

(1) doesn't seem to me to be a real issue. An implementation can already be used to represent any number of distinct operations, simply by itself maintaining the state that tells it what kind of operation the current one is; it doesn't need to feed that to the ValueTask. The only time that would be useful is in the case where a single object wanted to be able to represent multiple operations outstanding concurrently and with the same TResult value, in which case it would need to be told by the ValueTask which operation it was. But this is incredibly niche, and if you're going to support multiple operations at the same time, you're already going to need to be able to store the state associated with each, state that includes a callback, and object, an ExecutionContext, a scheduler, and so on; at that point you're better off simply using another object. The whole point of this feature is to enable pooling / reuse, and if you're pooling, there's little difference between one larger object and two smaller objects pooled together.

(2) does have some value. The question is what it costs us and whether it’s worth the price:

  • There's a complexity cost, in that the interface needs to have this id threaded through it. Then again, this is already an advanced feature, so making the interface slightly more complicated isn't particularly compelling an argument.
  • There's a size cost; two, actually. There's the impact on ValueTask needing to store this value, and although it's a struct, it will end up getting stored in a lot of async state machines, and so it'll increase the size of those heap allocations, plus it gets passed around and we want to minimize the size of the data passed. With this feature (pre-id), ValueTask<T> is three fields: a T, an object (that can be either a Task or an IValueTaskSource, and a byte used for some bit fields. So, assuming T is a word, with padding on 32-bit, that gives us 30-bits to play with without expanding its size, which means we'd want to limit it to a short (the largest primitive integral type we have that fits in that bound), which somewhat limits the diagnostic value but it's still likely to catch many errors. It’s possible we’d want to add something else to ValueTask<T> in the future, and if we then end up needing more than a few bits for that, we'll be forced to spill over and increase the size further, but it’s not clear that’s a real concern. Of course, it's likely T isn't a word; we expect bool and int to both be very common. On both 32-bit and 64-bit, the token won't change the padded size of the struct containing T==int or T==bool. It will, however, change it for a less common result type, like T==short.
  • There's also a second size cost: the implementation of IValueTaskSource. It also needs to store the value, and it will be a class, which means it'll make that object bigger as well. But if you’re using this IValueTaskSource feature, you’re likely doing so because you’re reusing these objects, and so the amortized cost of that extra space seems not particularly impactful.
  • There's a throughput cost. This is the main one I spent time investigating. I looked at what I think is absolute worst case: an IValueTaskSource that represents an operation that has completed by the time it's awaited, and whose implementation of the interfaces is a nop in one case and does a token check in the other (throwing if the comparison fails, but it never does in my test). I changed the interfaces as described, and implemented a very simple object with both variants of the interface, before and after the token. Await'ing that in a time loop regressed by ~25%. That's big, but it’s also so worst-case as to not be particularly realistic: if you actually have a value synchronously, you should use the ValueTask(TResult) ctor, and even if you don’t, you’re going to be doing a lot more work in general than nop’ing or a single comparison. In a real implementation, obviously there's a lot more going on that will shrink that value, so I also updated the System.Threading.Channels implementation of IValueTaskSource to include the token and the token checks on all operations. There was no measurable regression. An implementation could also choose to ignore the token entirely, just pass 0, not store it, and not validate it; it's entirely up to the IValueTaskSource implementation.

Given that there is diagnostic value here (we won’t catch every misuse, but many of them) and there doesn’t seem to be a meaningful perf cost in a real scenario, I think we should go ahead with this, with the short token addition as shown earlier.

Any concerns? I'm going to proceed with it and assume not.

@KrzysztofCwalina
Copy link
Member

No concerns about the short token from me.

@davidfowl
Copy link
Member

I personally don't think the cookie is important and I'm hoping implementations can just ignore it.

@stephentoub
Copy link
Member Author

I personally don't think the cookie is important and I'm hoping implementations can just ignore it.

It's up to the implementation what to do with it. I do think it adds value, in that it can help to catch misuse, e.g. now if you do this:

ValueTask<int> r = networkStream.ReadAsync(...);
await r;
await r;

you'll get an exception, whereas previously you would have just silently gotten bad behavior if another operation had already been started on the stream.

But an implementation can choose to ignore it. After all, it's the implementation that provides it in the first place.... it's ValueTask that ignores it, and just passes whatever value the API implementation gave it back to the API.

This commit adds support for extending `ValueTask<T>` with arbitrary backing sources.  Prior to this change, `ValueTask<T>` could wrap a `T` or a `Task<T>`; now it can also wrap an `IValueTaskSource<T>`, which can be implemented by arbitrary objects to be represented by `ValueTask<T>`.  These objects can then be pooled and reused to minimize allocation.  The commit also adds a non-generic `ValueTask` that can represent void-returning operations, including a `default` synchronous success, `Task`, and `IValueTaskSource`.  For the non-generic `ValueTask`, the commit also includes awaiters and async method builders, so it can be both awaited and used as the return type of an async method.

The rest of the changes fall into a few buckets all related to enabling this support:
- Modifying `AsyncTaskMethodBuilder<TResult>.AwaitUnsafeOnCompleted` to specially recognize any `ValueTask` and utilize either the `Task` or `IValueTaskSource` that backs it to avoid allocating an Action MoveNext method.  If every object awaited in an async method is either a `Task`/`Task<T>` or `ValueTask`/`ValueTask<T>`, regardless of what the `ValueTask`/`ValueTask<T>` wraps, we'll be able to avoid allocating the delegate and only allocate the single state machine object that also serves as the returned object.
- Changing `Stream.WriteAsync` to return `ValueTask` instead of `Task`.  This enables interested overriding stream types to use a reusable/pooled object to avoid `WriteAsync` allocations.
- Modifying Stream.CopyToAsync to use the new `Memory`-based overloads of `ReadAsync` and `WriteAsync`.  This enables the default `CopyToAsync` implementation to take advantage of any pooling done by derived streams, but even without pooling to take advantage of synchronously completing `ReadAsync`s returning `ValueTask<int>`s that contained an `int` rather than an allocated object. (While I was modifying this, I also removed some unnecessary array clearing that we'd added before later deciding it wasn't needed in general.)
- Modifying StreamReader/Writer to use the new `ReadAsync`/`WriteAsync` overloads.
Group these more advanced types into a subnamespace.
@stephentoub
Copy link
Member Author

I believe I've addressed all of the remaining feedback, including adding in the cookie/token and moving the types to a subnamespace. I plan to merge this once CI is green. Thanks.

@ahsonkhan
Copy link
Member

@dotnet-bot test Ubuntu x64 Checked Innerloop Build and Test
@dotnet-bot test Tizen armel Cross Checked Innerloop Build and Test

Previous failed CI legs:
https://ci.dot.net/job/dotnet_coreclr/job/master/job/x64_checked_ubuntu_innerloop_flow_prtest/1230/
https://ci.dot.net/job/dotnet_coreclr/job/master/job/armel_cross_checked_tizen_innerloop_prtest/1250/

@marksmeltzer
Copy link

marksmeltzer commented Mar 1, 2018

Firstly, I will be one of the consumers of this API.

Secondly, I love the concept.

Thirdly, I'm not sure I love the specific implementation discussed above...

Interfaces are nice, but due to the higher dispatch cost, I'm thinking this could be a good case where the capability expressed by IValueTaskSource is better expressed as an abstract class BaseValueTaskSource (viz., not every point of extensibility needs to be exposed as an interface -- especially the performance sensitive cases.)

I understand that .NET Core is probably going to get smarter inlining before the big brother .NET framework, and I know that interface inlining is one of the things on the drawing board... However, this is a case where even a comprehensive effort on the inling effort isn't likely to succeed in getting these dispatches inlined... Moreover, for cases where this interface were implemented on a struct, we'll likely get the boxing interface penalties as well.

Since the TPL, async, etc., affect literally almost everything in .NET these days, I highly recommend against doing anything that adds processing overhead.

For these advanced cases, I would not mind at all creating a type that extends BaseValueTaskSource and does what I need it to do. Moreover, since I'm making a class anyway, I absolutely do not mind adding whatever tracking I want to have in my BaseValueTaskSource implementation. (If I want a state counter, I'll add it.)

In fact, you could ship a BaseTokenValueTaskSource implementation that simply exposes the short Token read/write property, and an abstract Execute( ref short token ). In the implementation, have the Execute() implemented as a sealed overload and grab the current Token, call Execute( ref token ), and then store the result back in Token. Etc., etc.: all of the relevant APIs on BaseValueTaskSource could be overriden to pass the current token along to the BaseTokenValueTaskSource abstract implementation. (Bonus: your current code would be pretty trivial to update on this route.)

I would, again, however, stress that as much as possible be implemented as non-virtual. Again, this flexibility is awesome, but the added costs of the inevitable failed branch prediction and the dispatching will add up when multiplied by large scale apps where almost everything is async nowadays. However, I am less concerned about exposing virtual members on BaseValueTaskSource as appropriate -- as those costs will only apply to cases where BaseValueTaskSource is actually used.

@marksmeltzer
Copy link

I would think the refactoring effort to move from the current IValueTaskSource approach to the suggested BaseValueTaskSource + BaseTokenValueTaskSource approach would be relatively minor. You'd get the functionality you want, and reduce much of the implied runtime costs.

@marksmeltzer
Copy link

After reviewing the actual interfaces, here's the code for what I had in mind:

    /// <summary>Represents an object that can be wrapped by a <see cref="ValueTask" />.</summary>
    public abstract class BaseValueTaskSource
    {
        #region Abstract Methods (Public)

        /// <summary>Gets the result of the <see cref="IValueTaskSource" />.</summary>
        public abstract void GetResult();

        /// <summary>Gets the status of the current operation.</summary>
        public abstract ValueTaskSourceStatus GetStatus();

        /// <summary>Schedules the continuation action for this <see cref="IValueTaskSource" />.</summary>
        /// <param name="continuation">The continuation to invoke when the operation has completed.</param>
        /// <param name="state">The state object to pass to <paramref name="continuation" /> when it's invoked.</param>
        /// <param name="flags">The flags describing the behavior of the continuation.</param>
        public abstract void OnCompleted( Action<object> continuation, object state, ValueTaskSourceOnCompletedFlags flags );

        #endregion
    }

    /// <summary>Represents an object that can be wrapped by a <see cref="ValueTask" />.</summary>
    public abstract class BaseTokenValueTaskSource<TToken> : BaseValueTaskSource
    {
        #region Constructors

        protected BaseTokenValueTaskSource( TToken token )
        {
            Token = token;
        }

        #endregion

        #region Properties (Public)

        public TToken Token { get; }

        #endregion

        #region Abstract Methods (Protected)

        /// <summary>Gets the result of the <see cref="IValueTaskSource" />.</summary>
        /// <param name="token">Opaque value that was provided to the <see cref="ValueTask" />'s constructor.</param>
        protected abstract void GetResult( TToken token );

        /// <summary>Gets the status of the current operation.</summary>
        /// <param name="token">Opaque value that was provided to the <see cref="ValueTask" />'s constructor.</param>
        protected abstract ValueTaskSourceStatus GetStatus( TToken token );

        /// <summary>Schedules the continuation action for this <see cref="IValueTaskSource" />.</summary>
        /// <param name="continuation">The continuation to invoke when the operation has completed.</param>
        /// <param name="state">The state object to pass to <paramref name="continuation" /> when it's invoked.</param>
        /// <param name="token">Opaque value that was provided to the <see cref="ValueTask" />'s constructor.</param>
        /// <param name="flags">The flags describing the behavior of the continuation.</param>
        protected abstract void OnCompleted(
            Action<object> continuation,
            object state,
            TToken token,
            ValueTaskSourceOnCompletedFlags flags );

        #endregion

        #region Methods (Public)

        /// <inheritdoc />
        public sealed override void GetResult( )
        {
            GetResult( Token );
        }

        /// <inheritdoc />
        public sealed override ValueTaskSourceStatus GetStatus( )
        {
            return GetStatus( Token );
        }

        /// <inheritdoc />
        public sealed override void OnCompleted( Action<object> continuation, object state, ValueTaskSourceOnCompletedFlags flags )
        {
            OnCompleted( continuation, state, Token, flags );
        }

        #endregion
    }

    /// <summary>Represents an object that can be wrapped by a <see cref="ValueTask{TResult}" />.</summary>
    /// <typeparam name="TResult">Specifies the type of data returned from the object.</typeparam>
    public abstract class BaseValueTaskSource<TResult>
    {
        #region Abstract Methods (Public)

        /// <summary>Gets the result of the <see cref="!:IValueTaskSource{TResult}" />.</summary>
        public abstract TResult GetResult( );

        /// <summary>Gets the status of the current operation.</summary>
        public abstract ValueTaskSourceStatus GetStatus( );

        /// <summary>Schedules the continuation action for this <see cref="!:IValueTaskSource{TResult}" />.</summary>
        /// <param name="continuation">The continuation to invoke when the operation has completed.</param>
        /// <param name="state">The state object to pass to <paramref name="continuation" /> when it's invoked.</param>
        /// <param name="flags">The flags describing the behavior of the continuation.</param>
        public abstract void OnCompleted( Action<object> continuation, object state, ValueTaskSourceOnCompletedFlags flags );

        #endregion
    }

    /// <summary>Represents an object that can be wrapped by a <see cref="ValueTask{TResult}" />.</summary>
    /// <typeparam name="TResult">Specifies the type of data returned from the object.</typeparam>
    /// <typeparam name="TToken">Specified the type of the token that is associated with the value task source.</typeparam>
    public abstract class BaseTokenValueTaskSource<TToken, TResult> : BaseValueTaskSource<TResult>
    {
        #region Constructors

        protected BaseTokenValueTaskSource( TToken token )
        {
            Token = token;
        }

        #endregion

        #region Properties (Public)

        public TToken Token { get; }

        #endregion

        #region Abstract Methods (Protected)

        /// <summary>Gets the result of the <see cref="!:IValueTaskSource{TResult}" />.</summary>
        /// <param name="token">Opaque value that was provided to the <see cref="!:ValueTask" />'s constructor.</param>
        protected abstract TResult GetResult( TToken token );

        /// <summary>Gets the status of the current operation.</summary>
        /// <param name="token">Opaque value that was provided to the <see cref="!:ValueTask" />'s constructor.</param>
        protected abstract ValueTaskSourceStatus GetStatus( TToken token );

        /// <summary>Schedules the continuation action for this <see cref="!:IValueTaskSource{TResult}" />.</summary>
        /// <param name="continuation">The continuation to invoke when the operation has completed.</param>
        /// <param name="state">The state object to pass to <paramref name="continuation" /> when it's invoked.</param>
        /// <param name="token">Opaque value that was provided to the <see cref="!:ValueTask" />'s constructor.</param>
        /// <param name="flags">The flags describing the behavior of the continuation.</param>
        protected abstract void OnCompleted(
            Action<object> continuation,
            object state,
            TToken token,
            ValueTaskSourceOnCompletedFlags flags );

        #endregion

        #region Methods (Public)

        /// <inheritdoc />
        public sealed override TResult GetResult()
        {
            return GetResult( Token );
        }

        /// <inheritdoc />
        public sealed override ValueTaskSourceStatus GetStatus()
        {
            return GetStatus( Token );
        }

        /// <inheritdoc />
        public sealed override void OnCompleted( Action<object> continuation, object state, ValueTaskSourceOnCompletedFlags flags )
        {
            OnCompleted( continuation, state, Token, flags );
        }

        #endregion
    }

This would allow removing the token from the ValueTask<TResult>and moving it to the value source by using BaseTokenValueTaskSource<short,TResult> or BaseTokenValueTaskSource<short>.

Feel free to use that code verbatim if it helps 😀

@marksmeltzer
Copy link

NOTE: even if there is good reason to keep the interfaces (I would actually not mind having a dual approach and supporting both internally), using a specific struct TokenValueSource<TToken,TResult> that implements IValueTaskSource<TResult> should get the desired outcome without adding the token to the ValueTask directly. There's definitely a boxing penalty for this compromise, but it probably wouldn't hurt overmuch for the specific intended use cases.

If the performance were bad with specific cases using the suggested TokenValueSource<TToken,TResult> approach, modifying those cases to use BaseTokenValueSource<TToken,TResult> instead would likely resolve the observed penalty.

@stephentoub stephentoub merged commit 57eda74 into dotnet:master Mar 1, 2018
@stephentoub
Copy link
Member Author

even if there is good reason to keep the interfaces

There is. We want to be able to use any object rather than one with a specific base type. For example, in Sockets we want to use an object derived from SocketAsyncEventArgs. And in pipelines, we want to use the same object to support multiple operations, all with different return types. An abstract base class would make such things impossible without more allocations, and avoiding allocations is a key aspect of this feature.

(I would actually not mind having a dual approach and supporting both internally),

If there's a proven need, it can be considered for the future.

@stephentoub stephentoub deleted the valuetaskextensibility branch March 1, 2018 11:21
@marksmeltzer
Copy link

marksmeltzer commented Mar 1, 2018

The base class approach can definitely be added later, and I'm positive it would help in some cases.

In the framework I support, I've got a custom task class that I'm already using for something similar to this (thus I've already got the allocation cost), and I had to go through great lengths to avoid the double allocate of passing an Action delegate instance to create the Task<T>. To avoid the cost of allocating the delegate, I've got a static readonly delegate to a static method, I use the Task<T>'s state to hold my task instance, and the static method simply casts the state back to my task and calls the Execute on my task class. That is rather convoluted, but avoids the double allocation cost.

I like the idea of modifying my code (down the road) to use this new mechanism instead for those cases. It's cleaner.

My bigger concern is adding fields to the core task related structures like ValueTask. I see your point about the SocketAsyncEventArgs quandary. I've got a lot of rather nasty code (callback hell) that hits the SocketAsyncEventArgs APIs... it would be wonderful to get async support down at that level of the .NET APIs as efficiently as possible. And I see what you're trying to do there: pass some state around via the token in ValueTask without any additional allocation costs. (We all want the sockets APIs to be as fast as possible.)

I could see the token helping with a few edge cases, but I cannot think of much use for it offhand in the APIs that I support. I would have to do a deep dive through my code to see if the token would add any value or assist in eliminating any allocations...

Did you happen to test something similar to using a token "box" like I posted above which uses a struct to store the token and the struct implements the IValueTaskSource interface? .NET and .NET Core currently cause some boxing/unboxing operations when invoking interface members on boxed structs. The rumor is that that will be improved sooner than later. (So, the additional cost for those cases would theoretically be reduced once those runtime improvements are implemented.)

In my personal experience, I have repeatedly seen that the cost of allocating temporary purpose built helper classes to avoid interface dispatch costs is often much lower than my very "clever" interface-based code. In fact, that's one of my goto, tried and true, optimization strategies. I've seen code that was as optimized as possible, using interfaces, get roughly a 30% speed bump by refactoring to a base class. (The payoff is going to be higher in cases where the code is hitting multiple interface members repeatedly in a tight loop.)

dotnet-bot pushed a commit to dotnet/corefx that referenced this pull request Mar 1, 2018
…ensibility

Implement ValueTask extensibility

Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
@stephentoub
Copy link
Member Author

I could see the token helping with a few edge cases, but I cannot think of much use for it offhand in the APIs that I support.

The primary purpose is to enable an IValueTaskSource implementation to catch misuse. If you're pooling / reusing these across multiple operations, which is the primary use case, then it's possible that a previous call that has a ValueTask could still be holding on to an IValueTaskSource instance that's already being used for another operation. If someone then tried to access the original ValueTask, they'd be accessing the wrong operation. The token allows for the IValueTaskSource to fail operations through the original ValueTask rather than silently getting corrupted or returning incorrect results.

I'm sure inventive folks will come up with other uses for the token, but the above is the primary motivating use case.

Did you happen to test something similar to using a token "box" like I posted above which uses a struct to store the token and the struct implements the IValueTaskSource interface?

I don't understand the suggestion. The whole point here is avoiding allocation; if you implement IValueTaskSource on a struct and pass that struct to the ValueTask, that's going to allocate the box. At that point there's no value to using a struct, or to even having the token: just allocate a class that implements the interface and pass that in, there's no potential for reuse as you're not pooling the object.

stephentoub added a commit to dotnet/corefx that referenced this pull request Mar 1, 2018
…ensibility

Implement ValueTask extensibility

Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
@benaadams
Copy link
Member

is better expressed as an abstract class BaseValueTaskSource

Then you can't have a ReadAsync and WriteAsync on same source (as the base class can only be inherited once) so would need three classes to create a duplex source; however only need one via interfaces (using explicit implementations)?

@marksmeltzer
Copy link

Sorry about my off-base comments above, I took the time to read all of the code and look at the implementation for ValueTask. The way it's implemented, with the flags enum stored as a byte, the space for short token pretty much comes for free due to the packing rules. So, yeah, it definitely doesn't hurt to do something useful with that unused dataspace since its allocated and moving anyway.

Moreover, that should actually leave an extra byte to work with... if you can think of something useful to do with it. Would the token be any more useful it was a quasi-24 bit int instead of a 16 bit short? In that case, the flags could be stored in the upper byte and stripped off to get the int token. You'd get a range of 0 - 16,777,215 on 24 bits going that route instead only 0 - 65,536 with the current 16 bits (or rather -32,768 - 32,767).

I would personally find that additional range very useful -- that's a pretty big number for runtime operation numbers, version numbers, etc.) There's the question of what to do with negative numbers (uint?)... although there's a bit twiddle for that (http://graphics.stanford.edu/~seander/bithacks.html#FixedSignExtend).

Likely it is just fine the way it is, I just thought I would mention the possibility of doing something useful with that extra byte while it's still hot on the press...

(1) Is there any chance that this will make it into 2.1? (I'm excited to play with this new feature in production code.)

(2) Any thoughts at all on when this feature might bubble up to full .NET? My main project is cross platform... so the platform deltas can get a bit painful to straddle. Regardless, I'll take wins where I can. (I'm guessing a long time...)

(3) Do you any place you could point me towards to review the scenarios using the token on the ValueTask that you described? Like for SocketAsyncEventArgs, etc.? I didn't see any cases in this commit where the token is actually used, and I'd like to see some examples of what you described above.

@stephentoub
Copy link
Member Author

Moreover, that should actually leave an extra byte to work with

It depends on the size of T. If T is a bool, for example, which will actually be very common, then on 32-bit we'd have a 64-bit type with 32-bits for the object and 32-bits for the bool+byte+short, with basically no space left over.

Is there any chance that this will make it into 2.1?

Yes

Any thoughts at all on when this feature might bubble up to full .NET?

Actually built in, I'm not sure. But the plan is all of this ships as part of the System.Threading.Tasks.Extensions NuGet package as well, so you can just reference that from your .NET Framework application. There are some optimizations in the coreclr implementation that aren't in that out-of-band implementation, but the core pieces are there.

Do you any place you could point me towards to review the scenarios using the token on the ValueTask that you described? Like for SocketAsyncEventArgs, etc.? I didn't see any cases in this commit where the token is actually used, and I'd like to see some examples of what you described above.

See the corresponding PR in corefx:
dotnet/corefx#27497

@marksmeltzer
Copy link

marksmeltzer commented Mar 1, 2018

Sidebar...

While we're talking about reducing allocations for Async and Task implementations, I thought I'd share a concrete example of how to avoid allocating a delegate instance for the continuations and/or the initial tasks.

Have a look at my sample changes to the FileStream.WriteAsyncInternal() method to remove the allocation of the delegate entirely. This strategy is cumbersome, but works beautifully when the task's state is sufficient to feed the continuation (and as long as the setting up the continuation state doesn't cause an additional allocation -- like boxing; otherwise it ends up being the same difference as creating a capturing delegate).

https://github.com/marksmeltzer/coreclr/blame/patch-2/src/mscorlib/shared/System/IO/FileStream.Unix.cs#L683

I've got two commits on that branch: the first shows the strategy implemented using pure static method (good) and the second shows the static method deferring to the instance method (better).

    private ValueTask WriteAsyncInternal(ReadOnlyMemory<byte> source, CancellationToken cancellationToken)
    {
       ..... //removed for simplicity

       // Otherwise, issue the whole request asynchronously.
        _asyncState.ReadOnlyMemory = source;
        return new ValueTask( waitTask.ContinueWith( _staticContinueWriteAsyncInternalDelegate, this, CancellationToken.None, TaskContinuationOptions.DenyChildAttach, TaskScheduler.Default ) );
    }

    private static readonly Action<Task, object> _staticContinueWriteAsyncInternalDelegate = ContinueWriteAsyncInternal;

    private static void ContinueWriteAsyncInternal( Task t, object s )
    {
        Debug.Assert( t.Status == TaskStatus.RanToCompletion );
        var thisRef = ( FileStream )s;
        thisRef.ContinueWriteAsyncInternal( );
    }

    private void ContinueWriteAsyncInternal( )
    {
        // The options available on Unix for writing asynchronously to an arbitrary file 
        // handle typically amount to just using another thread to do the synchronous write, 
        // which is exactly  what this implementation does. This does mean there are subtle
        // differences in certain FileStream behaviors between Windows and Unix when multiple 
        // asynchronous operations are issued against the stream to execute concurrently; on 
        // Unix the operations will be serialized due to the usage of a semaphore, but the 
        // position/length information won't be updated until after the write has completed, 
        // whereas on Windows it may happen before the write has completed.

        try
        {
            ReadOnlyMemory<byte> readOnlyMemory = _asyncState.ReadOnlyMemory;
            _asyncState.ReadOnlyMemory = default( ReadOnlyMemory<byte> );
            WriteSpan( readOnlyMemory.Span );
        }
        finally { _asyncState.Release( ); }
    }

@marksmeltzer
Copy link

marksmeltzer commented Mar 1, 2018

Moreover, that should actually leave an extra byte to work with

It depends on the size of T. If T is a bool, for example, which will actually be very common, then on 32-bit we'd have a 64-bit type with 32-bits for the object and 32-bits for the bool+byte+short, with basically no space left over.

That math makes sense for ValueTask<T>. I was specifically thinking of and referring to ValueTask in my comment, which does have an extra byte. The impact on ValueTask<T> is interesting... definitely an empirical question. In my own extensive asynchronous codebase, I found exactly one use of Task<bool> and zero Task<byte> uses... I might have some that happen to return an byte-length enum.

My guess is that much more often that not, we'd see ValueTask<T> instances occupying > 64 bits. If you have access to a GitHub or NuGet datamine tool to get some actual usage stats though, your perspective would be better than mine.

This is my last comment on this topic. I just wanted to raise the possibility and let you think on it.

@stephentoub
Copy link
Member Author

Sidebar...

Let's avoid continuing an unrelated conversation on this PR. Thanks.

I thought I'd share a concrete example of how to avoid allocating a delegate instance for the continuations

There's no per-operation delegate allocation there. I assume you're referring to this code:

return new ValueTask(waitTask.ContinueWith((t, s) =>
{
// The options available on Unix for writing asynchronously to an arbitrary file
// handle typically amount to just using another thread to do the synchronous write,
// which is exactly what this implementation does. This does mean there are subtle
// differences in certain FileStream behaviors between Windows and Unix when multiple
// asynchronous operations are issued against the stream to execute concurrently; on
// Unix the operations will be serialized due to the usage of a semaphore, but the
// position/length information won't be updated until after the write has completed,
// whereas on Windows it may happen before the write has completed.
Debug.Assert(t.Status == TaskStatus.RanToCompletion);
var thisRef = (FileStream)s;
try
{
ReadOnlyMemory<byte> readOnlyMemory = thisRef._asyncState.ReadOnlyMemory;
thisRef._asyncState.ReadOnlyMemory = default(ReadOnlyMemory<byte>);
thisRef.WriteSpan(readOnlyMemory.Span);
}
finally { thisRef._asyncState.Release(); }
}, this, CancellationToken.None, TaskContinuationOptions.DenyChildAttach, TaskScheduler.Default));

The lambda explicitly avoids closing over any state, and as a result the C# compiler will cache the delegate for it and reuse that same delegate object over and over and over. All of the state is passed into the object as a state argument.

@marksmeltzer
Copy link

The lambda explicitly avoids closing over any state, and as a result the C# compiler will cache the delegate for it and reuse that same delegate object over and over and over. All of the state is passed into the object as a state argument.

Well, if the compiler has gotten smart enough to do that... Hurray! (It does make perfect sense to do that automatically though.) I really want to believe you on that 😁 I'm distrusting by nature though so I'll have to check some IL to confirm.

The delegation to the instance method can have a bit of perf kick though due to the field locality. That might be the source of the bulk of the gains that I've measured by implementing this strategy. So perhaps not a total waste of time.

Let's avoid continuing an unrelated conversation on this PR. Thanks.

Not too unrelated as it pertained to specific code in the commit, but I do think we've hammered this nail enough. Thanks for the info about the C# compiler & delegate caching. I definitely need to read up on that.

dotnet-bot pushed a commit to dotnet/corert that referenced this pull request Mar 1, 2018
…ensibility

Implement ValueTask extensibility

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
stephentoub added a commit to dotnet/corert that referenced this pull request Mar 1, 2018
…ensibility

Implement ValueTask extensibility

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
dotnet-bot pushed a commit to dotnet/corert that referenced this pull request Mar 1, 2018
…ensibility

Implement ValueTask extensibility

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
@marksmeltzer
Copy link

marksmeltzer commented Mar 3, 2018

@stephentoub, I hate to beat a horse to death here, but I wanted to clarify that while you're right about the C# compiler generating a field for us at compile time to store stateless delegates, it isn't exactly the same as doing it by hand (e.g. the strategy I outlined in my "sidebar" above).

Consider these two C# statements. The first using my static readonly callback strategy:

private static readonly Action<ContextBoundActionAsyncTask> _staticExecuteCallbackDelegate = ExecuteCallback;

public override void Start( object state )
{
    StartAsyncDetached( _staticExecuteCallbackDelegate, this );
}

And the second uses the stateless delegate strategy:

public override void Start( object state )
{
    StartAsyncDetached( s => ExecuteCallback(s), this );
}

The IL for the first is very simple:

.method public hidebysig virtual instance void 
  Start(
    object state
  ) cil managed 
{
  .custom instance void [mscorlib]System.Diagnostics.DebuggerStepThroughAttribute::.ctor() 
    = (01 00 00 00 )
  .maxstack 8

  // [115 13 - 115 14]
  IL_0000: nop          

  // [116 17 - 116 76]
  IL_0001: ldsfld       class [mscorlib]System.Action`1<class LivingAgile.Core.AsyncUtil/ContextBoundActionAsyncTask> LivingAgile.Core.AsyncUtil/ContextBoundActionAsyncTask::_staticExecuteCallbackDelegate
  IL_0006: ldarg.0      // this
  IL_0007: call         void LivingAgile.Core.AsyncUtil::StartAsyncDetached<class LivingAgile.Core.AsyncUtil/ContextBoundActionAsyncTask>(class [mscorlib]System.Action`1<!!0/*class LivingAgile.Core.AsyncUtil/ContextBoundActionAsyncTask*/>, !!0/*class LivingAgile.Core.AsyncUtil/ContextBoundActionAsyncTask*/)
  IL_000c: nop          

  // [128 13 - 128 14]
  IL_000d: ret          

} // end of method ContextBoundActionAsyncTask::Start

The IL for the stateless delegate method is more complex:

.method public hidebysig virtual instance void 
  Start(
    object state
  ) cil managed 
{
  .custom instance void [mscorlib]System.Diagnostics.DebuggerStepThroughAttribute::.ctor() 
    = (01 00 00 00 )
  .maxstack 8

  // [115 13 - 115 14]
  IL_0000: nop          

  // [116 17 - 116 69]
  IL_0001: ldsfld       class [mscorlib]System.Action`1<class LivingAgile.Core.AsyncUtil/ContextBoundActionAsyncTask> LivingAgile.Core.AsyncUtil/ContextBoundActionAsyncTask/'<>c'::'<>9__7_0'
  IL_0006: dup          
  IL_0007: brtrue.s     IL_0020
  IL_0009: pop          
  IL_000a: ldsfld       class LivingAgile.Core.AsyncUtil/ContextBoundActionAsyncTask/'<>c' LivingAgile.Core.AsyncUtil/ContextBoundActionAsyncTask/'<>c'::'<>9'
  IL_000f: ldftn        instance void LivingAgile.Core.AsyncUtil/ContextBoundActionAsyncTask/'<>c'::'<Start>b__7_0'(class LivingAgile.Core.AsyncUtil/ContextBoundActionAsyncTask)
  IL_0015: newobj       instance void class [mscorlib]System.Action`1<class LivingAgile.Core.AsyncUtil/ContextBoundActionAsyncTask>::.ctor(object, native int)
  IL_001a: dup          
  IL_001b: stsfld       class [mscorlib]System.Action`1<class LivingAgile.Core.AsyncUtil/ContextBoundActionAsyncTask> LivingAgile.Core.AsyncUtil/ContextBoundActionAsyncTask/'<>c'::'<>9__7_0'
  IL_0020: ldarg.0      // this
  IL_0021: call         void LivingAgile.Core.AsyncUtil::StartAsyncDetached<class LivingAgile.Core.AsyncUtil/ContextBoundActionAsyncTask>(class [mscorlib]System.Action`1<!!0/*class LivingAgile.Core.AsyncUtil/ContextBoundActionAsyncTask*/>, !!0/*class LivingAgile.Core.AsyncUtil/ContextBoundActionAsyncTask*/)
  IL_0026: nop          

  // [128 13 - 128 14]
  IL_0027: ret          

} // end of method ContextBoundActionAsyncTask::Start

In the second method, where the compiler creates a delegate class, and a backing field, it is checking to see if the field is non-null, and if it is null, it creates the delegate and stores it, and then otherwise executes the stored delegate.

That branching does have a cost. The branch predict should essentially always get it right after the field is populated. So on heavy use code-paths, this method is probably okay. But there is an implied, even if minor, cost to letting the compiler "do it for us" here. (I suppose the counterargument is that a future version of the compiler might "do it better", but it is what it is for now).

I will tell you one benefit of this investigation for me: my Resharper configuration was complaining that s => ExecuteCallback(s) can be replaced with the method group (e.g. StartAsyncDetached( ExecuteCallback, this ), and letting it do the simplification to the method group ALWAYS allocates a delegate:

public override void Start( object state )
{
    StartAsyncDetached( ExecuteCallback, this );
}

produces this horrid IL:

.method public hidebysig virtual instance void 
  Start(
    object state
  ) cil managed 
{
  .custom instance void [mscorlib]System.Diagnostics.DebuggerStepThroughAttribute::.ctor() 
    = (01 00 00 00 )
  .maxstack 8

  // [115 5 - 115 6]
  IL_0000: nop          

  // [116 9 - 116 53]
  IL_0001: ldnull       
  IL_0002: ldftn        void LivingAgile.Core.AsyncUtil/ContextBoundActionAsyncTask::ExecuteCallback(class LivingAgile.Core.AsyncUtil/ContextBoundActionAsyncTask)
  IL_0008: newobj       instance void class [mscorlib]System.Action`1<class LivingAgile.Core.AsyncUtil/ContextBoundActionAsyncTask>::.ctor(object, native int)
  IL_000d: ldarg.0      // this
  IL_000e: call         void LivingAgile.Core.AsyncUtil::StartAsyncDetached<class LivingAgile.Core.AsyncUtil/ContextBoundActionAsyncTask>(class [mscorlib]System.Action`1<!!0/*class LivingAgile.Core.AsyncUtil/ContextBoundActionAsyncTask*/>, !!0/*class LivingAgile.Core.AsyncUtil/ContextBoundActionAsyncTask*/)
  IL_0013: nop          

  // [128 5 - 128 6]
  IL_0014: ret          

} // end of method ContextBoundActionAsyncTask::Start

I know in the past I've always let Resharper make that substitution for the method group for me, and now it is littered all through my codebase. I had assumed (incorrectly) that it was allocating always anyway, and the method group substitution reads better.

I would argue that the compiler should handle the method group case using the same strategy as the stateless delegate. I think I'll try to find the place to add an issue for the C# compiler team. If they make this change, it would apply to old frameworks too!

(ps. I've pinged Jetbrains on Twitter to see if they can help us poor saps who've made this blunder... https://twitter.com/marksmeltzer/status/969762966655582208)

@svick
Copy link

svick commented Mar 3, 2018

@marksmeltzer

I would argue that the compiler should handle the method group case using the same strategy as the stateless delegate. I think I'll try to find the place to add an issue for the C# compiler team. If they make this change, it would apply to old frameworks too!

That's dotnet/roslyn#5835, which has a PR, dotnet/roslyn#6642, that has been open for several years, with last activity several months ago. I'm not sure what's currently blocking it.

@stephentoub
Copy link
Member Author

@benaadams, this change significantly impacted the await code paths for value tasks, which you'd previously looked at improving. Might be worth another look if you have the time 😄

kbaladurin pushed a commit to kbaladurin/corert that referenced this pull request Mar 15, 2018
…ensibility

Implement ValueTask extensibility

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants