API Proposal: ShiftLeft, ShiftRight for Vector<T> #20510

mjmckp · 2017-03-07T22:15:08Z

Updated Proposal

#20510 (comment)

Original Proposal

This is a proposal of the ideas discussed in issue #20147.

There are a few SIMD operations on System.Numerics.Vector<T> missing that are required to implement vectorized versions of log, exp, sin, and cos.

There is a C implementation of these methods using AVX intrinsics here, which I have translated into a C# implemention using Vector<T> here. In the implementation, I have added fake versions of the missing intrinsics.

@mellinoe suggested creating two separate issues for the missing methods (the other issue is #20509), so the purpose of this issue is to propose the addition of the following methods:

public static Vector<short> ShiftLeft(Vector<short> x, int n)
public static Vector<int> ShiftLeft(Vector<int> x, int n)
public static Vector<long> ShiftLeft(Vector<long> x, int n)
public static Vector<short> ShiftRight(Vector<short> x, int n)
public static Vector<int> ShiftRight(Vector<int> x, int n)
public static Vector<long> ShiftRight(Vector<long> x, int n)

which map directly to the following AVX2 intrinsics:

_mm256_slli_epi16
_mm256_slli_epi32
_mm256_slli_epi64
_mm256_srli_epi16
_mm256_srli_epi32
_mm256_srli_epi64

respectively.

The text was updated successfully, but these errors were encountered:

mellinoe · 2017-03-08T01:41:59Z

@CarolEidt @sivarv

The main conceptual problem with these methods is that, at least if our past investigations are valid, it is not feasible for the JIT to produce good SIMD instructions for these in all cases, because the second parameter is not necessarily a constant. It can do so if the methods are called with a constant parameter, but this cannot be guaranteed in C#. There is a "pit of failure" so to speak, where the performance of the method drops off a cliff if used improperly. For example, consider the following:

result = ShiftLeft(v, 5); // Constant parameter
result = ShiftLeft(v, this.Settings.ShiftAmount); // Parameter from an indirect value

The second call cannot be properly optimized because this.Settings.ShiftAmount is not a compile-time constant. The first call CAN be, because 5 is a compile-time constant. There is no way to enforce or communicate that the second usage is degenerate / invalid and will be many times slower than the first usage.

The most obvious workaround for this: bake specific constants into the method signature, e.g. force them to be a part of the compile-time signature:

public static Vector<T> ShiftLeftOne(Vector<T> v);
public static Vector<T> ShiftLeftTwo(Vector<T> v);
public static Vector<T> ShiftLeftThree(Vector<T> v);
public static Vector<T> ShiftLeftFour(Vector<T> v);
...

sivarv · 2017-03-08T01:44:31Z

CC @russellhadley

jackmott · 2017-03-22T03:45:38Z

What about use of an enum?

enum ShiftAmount : byte/int { One = 1, Two, Three, ... };
public static Vector<T> ShiftLeft(Vector<T> v, ShiftAmount a);

// called like:

var result = ShiftLeft(v,ShiftAmount.One);
var result = ShiftLeft(v,ShiftAmount.Two);
var result = ShiftLeft(v,ShiftAmount.Three);

mellinoe · 2017-03-22T04:37:00Z

The problem is that the value cannot be guaranteed to be a compile-time constant. There is no way to guarantee that in C#, except to make the constant a part of the method signature (i.e. the name of the method). Enums don't help in that regard.

jackmott · 2017-03-22T05:01:33Z

Of course, sorry. So the C# compiler seems to have the facility to identify if something is a compile time constant or not, as evidenced by error messages on optional parameters on functions.

Are there any other motivations for semantics to constrain an input parameter to be a compile time constant? Or would that definitely be a no-go?

svick · 2017-03-22T14:07:32Z

Could including a Roslyn analyzer in the NuGet package be a solution?

That way, if you write ShiftLeft(v, notAConstant) in C# or VB, you get a warning in VS.

Caveats:

Not sure what would happen in e.g. VS Code or dotnet build, but those could probably be improved to support analyzers, if they don't already.
The warning won't show up in other languages, like F#.

jackmott · 2017-04-04T17:11:01Z

Is NEON the only architecture where this needs to be a constant?

mjmckp · 2017-05-10T22:34:32Z

Out of interest: Why doesn't the JIT support immediate operands?

mellinoe · 2017-05-10T22:55:41Z

@svick Including a Roslyn analyzer is a potential option -- it's the only one we've thought of that could help here, TBH.

@jackmott I'm not very familiar with NEON. The discussion above certainly applies to SSE/AVX instruction families.

@mjmckp It's not that the JIT "doesn't support immediate operands". The problem is that, when writing C# code, there is no way to enforce that you call a function with an immediate operand; that just isn't a concept available in the language or type system. When the JIT encounters such a call, every option at that point is sub-optimal. The JIT could either:

Generate an exception telling you that you messed up (not a good experience).
Generate a very slow version of the "intrinsic".

Like I said above, I think the Roslyn analyzer is the best option that we've thought of. We have never shipped such an analyzer with a BCL package, though, so I'm not sure exactly what it would look like.

mjmckp · 2017-05-10T23:11:01Z

@mellinoe I guess what I meant was: Why can't the JIT look at each invocation of the method, and either 1) generate a fixed instruction which contains the immediate operand when the argument is a literal value, or, 2) generate a dynamic instruction which contains the runtime value of the argument? I'm sure there's a perfectly good reason why it can't, I'm just curious!

mellinoe · 2017-05-10T23:20:55Z

@mjmckp It should be able to, but the "dynamic instruction" version will be many times slower, in theory. If we hit that path, we've essentially "failed" to give you an intrinsic operation. We'd like to design something where it is obvious when you make that mistake, and how to easily fix it.

4creators · 2017-06-23T22:27:42Z

I think there is a solution to that problem - code has to talk to compiler (Jit):

[AttributeUsage(AttributeTargets.Parameter)]
internal class JitIntrinsicParamAttribute : Attribute 
{
    public JitIntrinsicParamAttribute(ParamType type) { Type = type; }
    public IntrinsicParamType Type { get; private set; }    
}

internal enum IntrinsicParamType 
{ 
    Unknown,
    Immediate 
}

public static Vector<T> ShiftLeft(Vector<T> x, [JitIntrinsicParam(IntrinsicParamType.Immediate)] int n);

but on the other end compiler has to listen and should understand above code - without Jit changes implementation would be very difficult.

Encountering above attribute compiler should implement rewriting transform which would ensure that shift operation receives argument as an immediate value disregard of what we pass to method invocation.

The above pattern can be further generalized to include other difficult to implement corner cases.

mellinoe · 2017-06-23T23:28:11Z

@4creators You can still pass a non-constant value to that method. The problem isn't that the JIT can't detect this case (it can), but that there is no way to gracefully recover at that point. Ideally, your code would not compile if you passed a non-constant parameter to the function, but C#/IL does not allow that in any way.

svick · 2017-06-24T12:35:50Z

I think using an attribute is a good idea. I would mean that:

Future versions of compilers (including compilers for languages that don't support Roslyn analyzers) could detect the attribute and produce the error themselves.
The analyzer wouldn't have to change whenever a new method with immediate parameters is added, since it could detect the attribute.

Though, what would happen if the compiler and the analyzer produced the same error? Is there some way to avoid producing two errors in that case?

4creators · 2017-06-24T14:21:21Z

@mellinoe My proposal essentially allows for non-constant values to be passed to intrinsic method which uses processor instruction requiring immediate operand.

Intel definition of one of the shift intrinsics is as follows:

_mm256_slli_epi16/32/64
Logical shift of word/doubleword/quadword elements to left according to specified number. 
The corresponding Intel® AVX2 instruction is VPSLLW, VPSLLD, or VPSLLQ*.

VPSLLW 128 bit Intel instruction definition is as follows:

VPSLLW xmm1, xmm2, xmm3/m128 -> Shift words in xmm2 left by amount specified in xmm3/m128 while shifting in 0s.
VPSLLW xmm1, xmm2, imm8 -> Shift words in xmm2 left by imm8 while shifting in 0s.

VPSLLW 256 bit instruction definition is as follows:

VPSLLW ymm1, ymm2, xmm3/m128 -> Shift words in ymm2 left by amount specified in xmm3/m128 while shifting in 0s.
VPSLLW ymm1, ymm2, imm8 -> Shift words in ymm2 left by imm8 while shifting in 0s.

Using above information one can notice that shift instructions accept not only imm8 operand which has to be a compile time constant as it can be a value of the xmm register which is passed as operand to one of the forms of VPSLLW or m128 memory location.

Going further the only significant gurantee which Jit compiler has to provide is that register stores value of parameter and not it's address. In my opinion (however I do not know RyuJit internals good enough to be any authority on that) it does allow to pass a non-constant value to that instruction and the compiler (RyuJit) must ensure during all compilation passes and in particular during register allocation, instruction selection and instruction scheduling that passed value is stored in appropriate register.

What is even more important based on available VPSLL*, VPSRL* we can add additional Vector<T> shift functions overloads and operators as follows:

public static Vector<T> ShiftLeft(Vector<T> x, Vector<U> n);
public static Vector<T> ShiftRight(Vector<T> x, Vector<U> n);
public static operator Vector<T> <<(Vector<T> x, int n);
public static operator Vector<T> <<(Vector<T> x, Vector<U> n);
public static operator Vector<T> >>(Vector<T> x, int n);
public static operator Vector<T> >>(Vector<T> x, Vector<U> n);
public static operator Vector<T> <<=(Vector<T> x, int n);
public static operator Vector<T> <<=(Vector<T> x, Vector<U> n);
public static operator Vector<T> >>=(Vector<T> x, int n);
public static operator Vector<T> >>=(Vector<T> x, Vector<U> n);

@svick I agree that attributes could be used during Roslyn compilation and by Roslyn analyzers but due to flexibility which we have on processor instruction level at least with x86 ISA this would not be required.

4creators · 2017-06-24T23:03:44Z

After some experiments I have arrived at the following solution for enforcing passing compile time constant to basic function form, however, this would require a change to Attribute syntax. Snippet which compiles is below, unfortunately current Attribute syntax is a bit problematic here bcs one can still pass a non-constant value to ConstByteAttribute constructor (it's impossible if we use attribute inside square brackets):

namespace System.Numerics.Experimental
{
    [AttributeUsage(AttributeTargets.All, Inherited = false, AllowMultiple = false)]
    public class ConstByteAttribute : Attribute
    {
        public ConstByteAttribute(byte value)
        {
            Value = value;
        }
        public byte Value
        {
            get;
            private set;
        }
    }

    public class Vector<T>
    {
        // API
        public static Vector<T> ShiftLeft(Vector<T> x, ConstByteAttribute constShift)
        {
            throw new NotImplementedException();
        }

        // API usage
        public static Vector<T> ShiftLeftOne(Vector<T> x)
        {
            return ShiftLeft(x, new ConstByteAttribute(1));
        }
    } 
}

One obvious change to C# which would be helpful for the above code would be an ability to use type parametrized attributes -> generic attributes providing that we could put on them constraints in form of value types. It would be a modification of the proposal dotnet/csharplang#569

    [AttributeUsage(AttributeTargets.All, Inherited = false, AllowMultiple = false)]
    public class ConstAttribute<T> : Attribute where T : byte, sbyte 
    {
        public ConstAttribute(T value)
        {
            Value = value;
        }
        public T Value
        {
            get;
            private set;
        }
    }

Other changes could include implicit or explicit conversion operators form T to ConstAttribute and from
ConstAttribute to T. Using this pattern we could get the following syntax of calling methods with compile time constants:

    public class Vector<T>
    {
        // API
        public static Vector<T> ShiftLeft(Vector<T> x, ConstByteAttribute constShift)
        {
            throw new NotImplementedException();
        }

        // API usage
        public static Vector<T> ShiftLeftOne(Vector<T> x)
        {
            return ShiftLeft(x, (Const)1);
        }
    }

Finally we have a new very useful language feature -> compile time constants (or perhaps even another class of constants -> scoped constants) which are enforced using type system (very similar to C++).

mellinoe · 2017-06-24T23:28:08Z

@4creators I don't really understand what you're proposing, nor how it would solve the issue at hand. Like you have said, there is no way to ensure that someone passes a constant value to the attribute constructor, so it is no better than just passing a value in directly. I may be missing the reason you are using an Attribute type, though. You're not using it as an attribute at all, just a regular method parameter.

I also believe you have misunderstood the behavior of the shift instructions above, but I am not an expert in that particular instruction. The documentation just seems to indicate that the shift amount is specified by the 1-byte immediate operand.

Realistically, we will probably do the following:

[Short term] Add a Roslyn analyzer which will cause compilation errors due to improper usage (probably driven by a marker attribute on parameters that must be constant)
[Long term] Petition the C# language team to add first-class support for this in the language (or in the Roslyn compiler, at least).

4creators · 2017-06-25T00:28:07Z

@mellinoe It is misunderstending - in my first post I have used attribute to mark passed value to enable Jit support which as You have said is available:

The problem isn't that the JIT can't detect this case (it can)`

therefore attribute use is not necessary anymore for all cases where we do not have to use constants - I thought at that time of writing that it would mean always for shift intrinsics. I have not stated that clearly in the second post discussing processor instructions. This part was edited and set in bold.

What I wanted to stress in second post discussing processor instructions is that we do not have to use immediate value as an argument to shift instruction. In the x86 ISA there are almost always defined two other VPSLL*, VPSRL* instructions which use xmm register or directly memory location:

VPSLLW ymm1, ymm2, xmm3/m128 -> encoded as VEX.NDS.256.66.0F.WIG F1 /r

For this instruction variant operand in which one passes shift value is a xmm3 register or m128 memory location. The instruction is encoded at a binary level differently from instruction which uses immediate (compile time constant) value.

VPSLLW ymm1, ymm2, imm8 -> encoded as VEX.NDD.256.66.0F.WIG 71 /6 ib

Altogether AVX2 defines 18 256bit shift instructions on words, double words and quadwords out of which only 6 require immediate operand (constant), 6 can use xmm register and 6 can use memory location. AVX defines 18 128bit shift instructions with exactly same groups as AVX2. Exception to this schema is at the level of SSE2 instructions where there are only 10 128bit two operand instructions and only 4 may use xmm or m128 memory operands.

My third post, however, is an extension of argument from the first post on attribute usage but this time in a completely new context i.e. we can implement 32 Vector shift instrinsics without constants using available AVX2, AVX and SSE instructions but still there are 12 corner cases where we would need compile time constants. Actually I have noticed that we have couple of uncovered shift cases while counting x86 ISA instructions over last hour.

Uncovered cases - no hardware support - comprise processors which do not support AVX/AVX2 ISA and these are pre Sandy Bridge (no AVX) Intel processors plus lower classes of even currently available processors i.e. currently produced Intel Pentium G and lower (but not Intel Pentium D). When we go lower on product ladder than even currently produced low power processor would not support SSE.

saucecontrol · 2017-07-02T19:26:52Z

You can still pass a non-constant value to that method. The problem isn't that the JIT can't detect this case (it can), but that there is no way to gracefully recover at that point. Ideally, your code would not compile if you passed a non-constant parameter to the function, but C#/IL does not allow that in any way.

Wouldn't the recovery option just be to use the IL implementation rather than the instrinsic?

mellinoe · 2017-07-02T21:50:09Z

Wouldn't the recovery option just be to use the IL implementation rather than the instrinsic?

Yes -- that is likely an order of magnitude slower.

saucecontrol · 2017-07-02T22:43:56Z

Right, but it would still work. I think at some point, we have to acknowledge that Vector is there to support advanced performance scenarios, and developers using it should have a basic understanding of the implementation and restrictions.

I honestly don't see how this is any worse than the current state following the addition of the Narrow/Widen/Convert methods on Vector<T>. We used to know whether an operation was intrinsic based on the value of Vector.IsHardwareAccelerated

It started out as:
If the hardware supports it and you're on RyuJit, it's intrinsic.
But now it's:
If the hardware supports it AND the RyuJit version you're on has the implementation, it's intrinsic.
Could be:
If the hardware supports it AND the RyuJit version you're on has the implementation AND you use it correctly, it's intrinsic.

And I think the 'using it correctly' restriction is kind of implicit in Vector operations anyway, because if you use them when they're not supported (for whatever reason), it's always much slower than a scalar implementation would be.

mellinoe · 2017-07-03T17:34:05Z

I'm in agreement with you -- the Vector API's are for advanced users, and we can only do so much hand-holding; it is ultimately up to the user to make sure they understand what they are doing.

Like I said above, and after discussing this a lot, we are leaning towards throwing an exception if a non-const value is given. I don't think that using the IL fallback path is a good idea. The JIT is able to detect when an invalid value is passed in and generate an exception instead. We will also look into a Roslyn analyzer and/or a Roslyn compiler feature that lets us specify these parameter constraints so that errors can be caught at compile time rather than runtime.

saucecontrol · 2017-07-03T17:49:38Z

Oh yeah, that makes perfect sense. Since a non-immediate argument is never correct, throwing an exception is probably the right thing to do. I thought the hold-up was that throwing a runtime exception wasn't acceptable and that catching it at compile time wasn't possible.

pentp · 2017-07-03T19:23:19Z

What's wrong with non-immediate shift counts? From @4creators comment and Intel documentation it looks like all of these instructions have both int imm8 and __m128i count variants (excluding the non-packed shifts and byte shifts that are not relevant here).

mellinoe · 2017-07-03T19:28:33Z

The discussion above is just about how to handle hardware instructions which require immediate operands. If there's an instruction which doesn't take an immediate operand (and there is in this case), then there is no problem. It's just something to keep in mind when looking at these sorts of things.

saucecontrol · 2017-07-03T20:11:32Z

Yeah, I got mixed up there. I was thinking about shuffles because that's what I'm more interested in, and they were part of the conversation.

Having thought more about it, though, I'm not sure a runtime exception would be great for those cases where an immediate is needed for the intrinsic and a variable is passed to the method. It would be possible (however unlikely) for someone to write bad code, test it with the legacy JITs and have it work with the IL implementation, only to have it fail later in a RyuJit environment. Seems like the best option is still to fall back to the slow code for consistency.

mellinoe · 2017-07-03T20:52:27Z

If we include a Roslyn analyzer with the library (or land a built-in compiler feature), then the user would also have to actively ignore the build-time warning/error about misusing the API. I think I'm okay with throwing a runtime error in that case.

4creators · 2017-07-15T00:29:01Z

I have created a C# const keyword extended usage proposal as a one of the possible solutions to enforcing compile time constant usage in some methods overloads. I think it's clean, intuitively easy to understand since it is well known keyword used in the very same way but in new context. Should be easy to implement as well.

jkotas · 2017-07-16T18:43:21Z

I think I'm okay with throwing a runtime error in that case.

I do not think that it is a good idea to throw exceptions as performance warnings when the JIT is not able to perform given optimization. There are number of cases where the slow non-interpreted implementation is perfectly acceptable. For example, profilers and other similar diagnostic tools that instrument IL can modify the IL such that the JIT may not be able to detect the constant.

It is not unusual to see order of magnitude slow down in a performance sensitive methods when the expected optimization is not performed for them, nothing specific to SIMD. For example, MethodImplOptions.AggressiveInlining may not be honored for various reasons. We do not throw exceptions and crash the program when that happens. Instead, we have ETW events that you can use to find out when the expected optimization did not happen.

redknightlois · 2017-07-17T01:17:03Z

@jkotas @mellinoe probably I am not understanding the root of the issue of immediate access instructions. But given that the latency for the immediate is 1 and the other case (register) is 3 in the worst case (on most relevant architectures at least), there is not such a big difference between one or the other. If the JIT can see the constant, fine, you issue the immediate instruction; if not, well you just issue the register one. The difference is for most practical purposes so small (given the alternative of not having it) that it feels like a non-issue for every AVX supported platform. With aggressive inlining set on the function itself, the JIT will constant propagate and catch most of the constant uses anyways. If you really really have an issue there, its because you know what you are doing anyways; and you can definitely fix it if possible.

Even in the cases where it is just not supported, the straightforward (naive) version is instruction translate to an extra table-jump + lots of instructions + a far jump back per call; definitely not the same as having support but its doable on non AVX[1|2] without issuing a call instruction (which we should avoid like the plague). And I bet there are probably better algorithms for it anyways.

I can understand the "pit of success" drive, but in the end the problem is an abstraction one. IMHO the general issue here is that probably we are not going to use this anyways in non AVX platforms (I know I wouldn't use it if I cannot know in advance). The Vector<T> abstract definition feels just not good enough for a general purpose API given the broad support necessary and the weak ability to figure out if platforms abide to the performance needed constraints. Then it begs the question, why put resources into rubber stamp a weak implementation on an abstraction that is not playing nice with it, instead of tackle the most general issue once and for all? I know that asking the same question again and again gets old, but I still cannot see a path that leads to success that doesn't have that pre-requisite built into it. Needless to say I am here hoping that I find myself being wrong.

4creators · 2017-07-17T14:58:15Z

😆 Seems that our discussion is misplaced because of errors in Intel Architecture
Instruction Set Extensions Programming Reference edition August 2015 319433-023 which is the most recent reference published by Intel comprising SSE/AVX/AVX2 instruction set extensions. When compared to Intel Architecture Instruction Set Extensions Programming Reference edition February 2012 319433-012A all SSE shift instructions have equivalents accepting register/memory operands in 2012 instead of immediate operand. Link to 2012 edition is broken as one has to add .pdf extension to download file so many of us would use 2015 edition as the newest and most authoritative one and skip download of the old one from broken link. After verifying it with Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 2 (2A, 2B, 2C & 2D): Instruction Set Reference, A-Z edition July 2017 325462-063US there are indeed 56 logical shift instructions with all having equivalents accepting besides imm8 operand xmm or m128 operands (register and memory respectively).

@mellinoe @jkotas Perhaps it would be prudent to ask Intel for official position on that error putting behind the question MSFT weight to settle problem once for all as we have wasted quite a bit of time for non-issue.

Shift Logical Left Instructions

PSLLW xmm1, xmm2
PSLLW xmm1, m128
PSLLW xmm1, imm8
PSLLD xmm1, xmm2
PSLLD xmm1, m128
PSLLD xmm1, imm8
PSLLQ xmm1, xmm2
PSLLQ xmm1, m128
PSLLQ xmm1, imm8
VPSLLW xmm1, xmm2, xmm3
VPSLLW xmm1, xmm2, m128
VPSLLW xmm1, xmm2, imm8
VPSLLD xmm1, xmm2, xmm3
VPSLLD xmm1, xmm2, m128
VPSLLD xmm1, xmm2, imm8
VPSLLQ xmm1, xmm2, xmm3
VPSLLQ xmm1, xmm2, m128
VPSLLQ xmm1, xmm2, imm8
VPSLLW ymm1, ymm2, xmm3
VPSLLW ymm1, ymm2, m128
VPSLLW ymm1, ymm2, imm8
VPSLLD ymm1, ymm2, xmm3
VPSLLD ymm1, ymm2, m128
VPSLLD ymm1, ymm2, imm8
VPSLLQ ymm1, ymm2, xmm3
VPSLLQ ymm1, ymm2, m128
VPSLLQ ymm1, ymm2, imm8

Shift Logical Right Instructions

PSRLW xmm1, xmm2
PSRLW xmm1, m128
PSRLW xmm1, imm8
PSRLD xmm1, xmm2
PSRLD xmm1, m128
PSRLD xmm1, imm8
PSRLQ xmm1, xmm2
PSRLQ xmm1, m128
PSRLQ xmm1, imm8
VPSRLW xmm1, xmm2, xmm3
VPSRLW xmm1, xmm2, m128
VPSRLW xmm1, xmm2, imm8
VPSRLD xmm1, xmm2, xmm3
VPSRLD xmm1, xmm2, m128
VPSRLD xmm1, xmm2, imm8
VPSRLQ xmm1, xmm2, xmm3
VPSRLQ xmm1, xmm2, m128
VPSRLQ xmm1, xmm2, imm8
VPSRLW ymm1, ymm2, xmm3
VPSRLW ymm1, ymm2, m128
VPSRLW ymm1, ymm2, imm8
VPSRLD ymm1, ymm2, xmm3
VPSRLD ymm1, ymm2, m128
VPSRLD ymm1, ymm2, imm8
VPSRLQ ymm1, ymm2, xmm3
VPSRLQ ymm1, ymm2, m128
VPSRLQ ymm1, ymm2, imm8

tannergooding · 2017-07-17T16:53:00Z

I believe, for a compiler feature, adding a modreq isliteral flag to the generated IL is the best route.

The runtime would still likely need to validate the parameter is a literal and throw if it is not (for manually implemented IL, or a compiler that ignores the modreq flag anyways).

mellinoe · 2017-07-17T20:17:43Z

There are number of cases where the slow non-interpreted implementation is perfectly acceptable. For example, profilers and other similar diagnostic tools that instrument IL can modify the IL such that the JIT may not be able to detect the constant.

@jkotas Thanks for bringing that up; we hadn't considered such a scenario in our discussions. In general, I don't really have a strong preference for one option or the other, because I am very hopeful that we will be able to rely on a compiler feature to make the difference more-or-less irrelevant.

tannergooding · 2017-07-31T18:24:38Z

https://github.com/dotnet/designs/issues/13

fiigii · 2017-08-03T23:10:00Z

Intel hardware intrinsic API proposal has been opened at dotnet/corefx#22940

vermorel · 2019-03-09T08:27:21Z

It has been nearly 2 years. Any progress on ShiftRight and ShiftLeft? I would really love to this as part of Vector<T>.

redknightlois · 2019-03-09T18:45:04Z

You dont need it anymore. You can implement them directly using Hardware Intrinsics. 3.0 will support the whole SSE, AVX, AVX2 standard. 2.1 already support a bunch.

mjmckp · 2019-03-09T21:09:13Z

This isn't much help to those constrained to use .NET framework, though.

tannergooding · 2021-01-15T17:17:47Z

I'm fine with taking a proposal for ShiftLeft and ShiftRight additions to Vector<T>. Given how we implemented HWIntrinsics and the proposed analyzer: #33771, I think the overall implementation will work out.

@mjmckp could you please update the top post to follow our API proposal template? https://github.com/dotnet/runtime/issues/new?assignees=&labels=api-suggestion&template=02_api_proposal.md
I think the currently proposed shape is good, but would recommend also proposing overloads for the unsigned integer types.

Once that is done, I can mark this as api-ready-for-review. However, it is worth noting that full framework will never have this be optimized as it isn't taking runtime changes (this likely would have been the case 3 years ago when it was propoosed as well). It may also not get this functionality at all as I don't believe we are shipping any OOB changes for System.Numerics.Vectors anymore

wstaelens · 2021-06-09T14:38:36Z

Any news regarding Vector, << and >> ? 😄
Could optimize a lot of calculations by using a Vector4 and bitshifting << 8 >> 13 etc...

tannergooding · 2021-06-09T15:42:50Z

Given the OP hasn't responded, if someone wants to spend the time to write up the proposal following our template, then this can move forward. Our template is: https://github.com/dotnet/runtime/issues/new?assignees=&labels=api-suggestion&template=02_api_proposal.md

The interested party can either open a new issue or post it here and I can update the original post.
-- I'm a bit pre-occupied with other work right now, and will get to it after .NET 6 finishes locking down if someone else doesn't do it sooner.

tannergooding · 2021-08-10T03:54:58Z

Backround and motivation

Vector<T> exposes a common abstraction for SIMD acceleration across various hardware platforms and architectures. It exposes many core operations today but notably does not include support for shifting.

API Proposal

namespace System.Numerics
{
    public static class Vector
    {
        public static Vector<byte> ShiftLeft(Vector<byte> x, int n);
        public static Vector<short> ShiftLeft(Vector<short> x, int n);
        public static Vector<int> ShiftLeft(Vector<int> x, int n);
        public static Vector<long> ShiftLeft(Vector<long> x, int n);
        public static Vector<nint> ShiftLeft(Vector<nint> x, int n);
        public static Vector<sbyte> ShiftLeft(Vector<sbyte> x, int n);
        public static Vector<ushort> ShiftLeft(Vector<ushort> x, int n);
        public static Vector<uint> ShiftLeft(Vector<uint> x, int n);
        public static Vector<ulong> ShiftLeft(Vector<ulong> x, int n);

        public static Vector<byte> ShiftRightLogical(Vector<byte> x, int n);
        public static Vector<short> ShiftRightLogical(Vector<short> x, int n);
        public static Vector<int> ShiftRightLogical(Vector<int> x, int n);
        public static Vector<long> ShiftRightLogical(Vector<long> x, int n);
        public static Vector<nint> ShiftRightLogical(Vector<nint> x, int n);
        public static Vector<sbyte> ShiftRightLogical(Vector<sbyte> x, int n);
        public static Vector<ushort> ShiftRightLogical(Vector<ushort> x, int n);
        public static Vector<uint> ShiftRightLogical(Vector<uint> x, int n);
        public static Vector<ulong> ShiftRightLogical(Vector<ulong> x, int n);

        public static Vector<short> ShiftRightArithmetic(Vector<short> x, int n);
        public static Vector<int> ShiftRightArithmetic(Vector<int> x, int n);
        public static Vector<long> ShiftRightArithmetic(Vector<long> x, int n);
        public static Vector<nint> ShiftRightArithmetic(Vector<nint> x, int n);
        public static Vector<sbyte> ShiftRightArithmetic(Vector<sbyte> x, int n);
    }
}

Additional Notes

It would likely be beneficial to expose shift operators on Vector<T> where right shift defaults to arithmetic for signed integers and logical for unsigned integers (dotnet/csharplang#4682). There is a potential issue with float and double where they would either need to throw or be treated as int and long, respectively.

The RHS of the shift operator would ideally be byte to simplify conversion to the hardware instructions or T to simplify usage with non-constants inputs. However, C# currently requires the RHS to be int and this is the most natural type for general purpose C# code (dotnet/csharplang#4666).

Variants could be exposed which support "variable shift" (that is where each element is shift by a different amount). However, this isn't available on all hardware and may be a perf pit for some users.

rickbrew · 2021-08-10T03:58:01Z

Vector<T> supporting bit shift operations will open up a whole lot of ARM64 performance. Right now I only have the resources for 2 code paths (one SIMD, one scalar), so I usually can't use Vector<T> and must use SSE/AVX intrinsics. Without bit-shifts, Vector<T> is unfortunately pretty useless for most image processing code.

bartonjs · 2021-08-10T18:38:01Z

Looks good as proposed, but we normalized to the current parameter names (value and shiftCount, respectively).

namespace System.Numerics
{
    public static class Vector
    {
        public static Vector<byte> ShiftLeft(Vector<byte> value, int shiftCount);
        public static Vector<short> ShiftLeft(Vector<short> value, int shiftCount);
        public static Vector<int> ShiftLeft(Vector<int> value, int shiftCount);
        public static Vector<long> ShiftLeft(Vector<long> value, int shiftCount);
        public static Vector<nint> ShiftLeft(Vector<nint> value, int shiftCount);
        public static Vector<sbyte> ShiftLeft(Vector<sbyte> value, int shiftCount);
        public static Vector<ushort> ShiftLeft(Vector<ushort> value, int shiftCount);
        public static Vector<uint> ShiftLeft(Vector<uint> value, int shiftCount);
        public static Vector<ulong> ShiftLeft(Vector<ulong> value, int shiftCount);
        public static Vector<nuint> ShiftLeft(Vector<nuint> value, int shiftCount);

        public static Vector<byte> ShiftRightLogical(Vector<byte> value, int shiftCount);
        public static Vector<short> ShiftRightLogical(Vector<short> value, int shiftCount);
        public static Vector<int> ShiftRightLogical(Vector<int> value, int shiftCount);
        public static Vector<long> ShiftRightLogical(Vector<long> value, int shiftCount);
        public static Vector<nint> ShiftRightLogical(Vector<nint> value, int shiftCount);
        public static Vector<sbyte> ShiftRightLogical(Vector<sbyte> value, int shiftCount);
        public static Vector<ushort> ShiftRightLogical(Vector<ushort> value, int shiftCount);
        public static Vector<uint> ShiftRightLogical(Vector<uint> value, int shiftCount);
        public static Vector<ulong> ShiftRightLogical(Vector<ulong> value, int shiftCount);
        public static Vector<nuint> ShiftRightLogical(Vector<nuint> value, int shiftCount);

        public static Vector<short> ShiftRightArithmetic(Vector<short> value, int shiftCount);
        public static Vector<int> ShiftRightArithmetic(Vector<int> value, int shiftCount);
        public static Vector<long> ShiftRightArithmetic(Vector<long> value, int shiftCount);
        public static Vector<nint> ShiftRightArithmetic(Vector<nint> value, int shiftCount);
        public static Vector<sbyte> ShiftRightArithmetic(Vector<sbyte> value, int shiftCount);
    }
}

msftgits transferred this issue from dotnet/corefx Jan 31, 2020

msftgits added this to the Future milestone Jan 31, 2020

maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020

tannergooding removed the untriaged New issue has not been triaged by the area owner label Jul 6, 2020

tannergooding added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed enhancement Product code improvement that does NOT require public API changes/additions labels Aug 10, 2021

bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Aug 10, 2021

WhiteBlackGoose mentioned this issue Nov 3, 2021

SIMD Math tracking issue dotnet/Silk.NET#666

Open

tannergooding mentioned this issue Jan 5, 2022

Implement the last of the approved cross platform hardware intrinsics, except shuffle #63414

Merged

ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 5, 2022

tannergooding closed this as completed in #63414 Jan 13, 2022

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 13, 2022

ghost locked as resolved and limited conversation to collaborators Feb 12, 2022

API Proposal: ShiftLeft, ShiftRight for Vector<T> #20510

API Proposal: ShiftLeft, ShiftRight for Vector<T> #20510

Comments

mjmckp commented Mar 7, 2017 • edited by tannergooding Loading

Updated Proposal

Original Proposal

mellinoe commented Mar 8, 2017

sivarv commented Mar 8, 2017

jackmott commented Mar 22, 2017 • edited Loading

mellinoe commented Mar 22, 2017

jackmott commented Mar 22, 2017

svick commented Mar 22, 2017

jackmott commented Apr 4, 2017

mjmckp commented May 10, 2017

mellinoe commented May 10, 2017

mjmckp commented May 10, 2017

mellinoe commented May 10, 2017

4creators commented Jun 23, 2017 • edited Loading

mellinoe commented Jun 23, 2017 • edited Loading

svick commented Jun 24, 2017

4creators commented Jun 24, 2017 • edited Loading

4creators commented Jun 24, 2017 • edited Loading

mellinoe commented Jun 24, 2017 • edited Loading

4creators commented Jun 25, 2017 • edited Loading

saucecontrol commented Jul 2, 2017

mellinoe commented Jul 2, 2017 • edited Loading

saucecontrol commented Jul 2, 2017 • edited Loading

mellinoe commented Jul 3, 2017

saucecontrol commented Jul 3, 2017

pentp commented Jul 3, 2017

mellinoe commented Jul 3, 2017

saucecontrol commented Jul 3, 2017

mellinoe commented Jul 3, 2017

4creators commented Jul 15, 2017 • edited Loading

jkotas commented Jul 16, 2017

redknightlois commented Jul 17, 2017 • edited Loading

4creators commented Jul 17, 2017

tannergooding commented Jul 17, 2017

mellinoe commented Jul 17, 2017

tannergooding commented Jul 31, 2017

fiigii commented Aug 3, 2017

vermorel commented Mar 9, 2019

redknightlois commented Mar 9, 2019

mjmckp commented Mar 9, 2019

tannergooding commented Jan 15, 2021 • edited Loading

wstaelens commented Jun 9, 2021

tannergooding commented Jun 9, 2021

tannergooding commented Aug 10, 2021

Backround and motivation

API Proposal

Additional Notes

rickbrew commented Aug 10, 2021

bartonjs commented Aug 10, 2021

mjmckp commented Mar 7, 2017 •

edited by tannergooding

Loading

jackmott commented Mar 22, 2017 •

edited

Loading

4creators commented Jun 23, 2017 •

edited

Loading

mellinoe commented Jun 23, 2017 •

edited

Loading

4creators commented Jun 24, 2017 •

edited

Loading

4creators commented Jun 24, 2017 •

edited

Loading

mellinoe commented Jun 24, 2017 •

edited

Loading

4creators commented Jun 25, 2017 •

edited

Loading

mellinoe commented Jul 2, 2017 •

edited

Loading

saucecontrol commented Jul 2, 2017 •

edited

Loading

4creators commented Jul 15, 2017 •

edited

Loading

redknightlois commented Jul 17, 2017 •

edited

Loading

tannergooding commented Jan 15, 2021 •

edited

Loading