Update loops in CpuMath to be more efficient #1177

jwood803 · 2018-10-06T12:10:47Z

Fixes issue #835

Ivanidzo4ka · 2018-10-07T06:10:08Z

Did you run any benchmarks to make sure it brings speed improvements?

danmoseley · 2018-10-11T17:12:42Z

@jwood803 the end to end benchmarks are documented here. They take a bit of time to run.

There are micro benchmarks for CPUMath specifically which might be best to start with. They are here

danmoseley · 2018-10-11T17:14:47Z

Oh, I just realized this is a duplicate of #994. That was not linked properly to #835.

As you see in #994, it is unfortunately not at all clear that this will actually help. See comment.

@tannergooding perhaps we should close #835 without action.

tannergooding · 2018-10-11T17:17:15Z

src/Microsoft.ML.CpuMath/AvxIntrinsics.cs

@@ -431,7 +431,7 @@ public static unsafe void AddScalarU(float scalar, Span<float> dst)

                Vector128<float> scalarVector128 = Sse.SetAllVector128(scalar);

-                if (pDstCurrent + 4 <= pDstEnd)
+                if (pDstCurrent <= pDstEnd - 4)


Creating a temp for pDstEnd - 4 is probably better, as that would better ensure that the loop isn't recomputing pDstEnd - 4 before each comparison.

tannergooding · 2018-10-11T17:20:14Z

@tannergooding perhaps we should close #835 without action.

I don't believe so. Both PRs have the same problem, which is that each iteration of the loop may do a subtraction/addition as part of the comparison.

As commented on #1177 (comment), creating a temp var pLoopEnd = pDstEnd - 4; and then using it in the comparison (while (pDstCurrent <= pLoopEnd)) will help ensure that the subtraction happens once, rather than once per loop iteration.
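A minimal sketch of the hoisting described above, for illustration only. It is not the ML.NET code: it uses plain scalar adds instead of the SSE/AVX intrinsics, and the class name LoopBoundSketch and method name AddScalar are hypothetical. It assumes the project is compiled with unsafe code enabled. The empty-span guard is an addition of this sketch: as far as I understand, pinning an empty span yields a null pointer, so subtracting 4 from it would wrap around and the comparison would no longer protect the loop.

```csharp
using System;

public static class LoopBoundSketch
{
    // Hypothetical scalar stand-in for AddScalarU: adds `scalar` to every element of dst.
    public static unsafe void AddScalar(float scalar, Span<float> dst)
    {
        // Assumption of this sketch: guard the empty case so that pDstEnd - 4
        // is never computed from a null pointer.
        if (dst.IsEmpty)
        {
            return;
        }

        fixed (float* pdst = dst)
        {
            float* pDstEnd = pdst + dst.Length;
            float* pDstCurrent = pdst;

            // Written once, outside the loop. The type is float*, not int.
            float* pLoopEnd = pDstEnd - 4;

            // Main loop: handles 4 floats per iteration, comparing against the temp.
            while (pDstCurrent <= pLoopEnd)
            {
                pDstCurrent[0] += scalar;
                pDstCurrent[1] += scalar;
                pDstCurrent[2] += scalar;
                pDstCurrent[3] += scalar;
                pDstCurrent += 4;
            }

            // Tail loop: handles the 0 to 3 elements that are left over.
            while (pDstCurrent < pDstEnd)
            {
                *pDstCurrent += scalar;
                pDstCurrent++;
            }
        }
    }
}
```

Whether this beats the original pDstCurrent + 4 <= pDstEnd form depends on what the JIT already does with the loop condition, so it is worth confirming with the CpuMath micro benchmarks rather than assuming a gain.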

jwood803 · 2018-10-15T23:25:04Z

@danmosemsft Thanks a lot for letting me know about the benchmarks! I ran it two different times with the PredictionEngine benchmark. Below are the results. So far it looks like the changes provide a bit better performance.

Without changes:

With changes:

tannergooding · 2018-10-24T15:09:51Z

src/Microsoft.ML.CpuMath/SseIntrinsics.cs

@@ -417,7 +417,7 @@ public static unsafe void AddScalarU(float scalar, Span<float> dst)

                Vector128<float> scalarVector = Sse.SetAllVector128(scalar);

-                while (pDstCurrent + 4 <= pDstEnd)
+                while (pDstCurrent <= pDstEnd - 4)


Could you update the SSE case as well?

tannergooding · 2018-10-24T15:10:51Z

src/Microsoft.ML.CpuMath/AvxIntrinsics.cs

@@ -22,6 +22,8 @@ internal static class AvxIntrinsics

        private const int Vector256Alignment = 32;

+        private const int destinationEnd = pDstEnd - 4;


Where is pDstEnd defined? I wouldn't expect that this could be a constant.

danmoseley · 2018-10-29T17:14:31Z

@tannergooding does this look OK now?

Nice perf improvement @jwood803 thank you!

tannergooding · 2018-10-29T17:35:20Z

src/Microsoft.ML.CpuMath/AvxIntrinsics.cs

@@ -417,6 +417,8 @@ public static unsafe void AddScalarU(float scalar, Span<float> dst)
            {
                float* pDstEnd = pdst + dst.Length;
                float* pDstCurrent = pdst;
+                int destinationEnd = pDstEnd - 4;
+


nit: stray newline here

This should be updated to remove the extra new line.

tannergooding

LGTM

eerhardt · 2018-11-01T14:28:01Z

This was a bad change/merge

Error CS0266: Cannot implicitly convert type 'float*' to 'int'. An explicit conversion exists (are you missing a cast?)
Project: Microsoft.ML.CpuMath(netcoreapp3.0)
File: F:\git\machinelearning2\src\Microsoft.ML.CpuMath\SseIntrinsics.cs, line 758

                float* pDstEnd = pdst + dst.Length;
                float* pDstCurrent = pdst;
                int destinationEnd = pDstEnd - 4;

pDstEnd is a float*. Subtracting 4 still gives you a float*, which is not convertible to int.

The Intrinsics/netcoreapp3.0 build is broken because of this.

@shauheen @danmosemsft - maybe now would be a good time to get a netcoreapp3.0 CI leg running.

I've opened #1495 for the build break.
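To make the typing rule behind CS0266 concrete, here is a small sketch. The class PointerTypeSketch and method Split are hypothetical and not part of ML.NET, and the code assumes unsafe code is enabled. It shows that pointer minus integer stays a pointer, while pointer minus pointer gives an element count of type long.

```csharp
using System;

public static class PointerTypeSketch
{
    // Returns how many full blocks of 4 floats a main loop would process,
    // and how many elements would be left for a tail loop.
    public static unsafe (long Blocks, long Remainder) Split(float[] data)
    {
        // Assumption of this sketch: skip the empty case so no pointer
        // arithmetic is done on a null pointer.
        if (data.Length == 0)
        {
            return (0, 0);
        }

        fixed (float* pdst = data)
        {
            float* pDstEnd = pdst + data.Length;

            // int destinationEnd = pDstEnd - 4;  // does not compile: CS0266
            float* pLoopEnd = pDstEnd - 4;        // float* minus int is still float*

            float* pCurrent = pdst;
            long blocks = 0;
            while (pCurrent <= pLoopEnd)
            {
                blocks++;
                pCurrent += 4;
            }

            // float* minus float* is a long: the distance measured in elements.
            long remainder = pDstEnd - pCurrent;
            return (blocks, remainder);
        }
    }
}
```

For a 7-element array this yields one full block and three leftover elements.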

Update loops to be more efficient

828b9e0

sfilipi requested a review from tannergooding October 8, 2018 16:30

tannergooding reviewed Oct 11, 2018

Make temp variable for loop

1893294

tannergooding mentioned this pull request Oct 22, 2018

Made loop bound checking in hardware intrinsics more efficient #994

Closed

tannergooding reviewed Oct 24, 2018

Update SSE case and where loop variable gets declared

2b03984

danmoseley approved these changes Oct 29, 2018

tannergooding reviewed Oct 29, 2018

tannergooding approved these changes Oct 29, 2018

Remove extra new line

a100d66

shauheen merged commit 71c9ff3 into dotnet:master Oct 31, 2018

jwood803 deleted the cpu-loops branch October 31, 2018 23:56

eerhardt mentioned this pull request Nov 1, 2018

the netcoreapp3.0 build is broken #1495

Closed

ghost locked as resolved and limited conversation to collaborators Mar 28, 2022
