Regression in non-pipelined Platform benchmarks #33669

Closed
adamsitnik opened this issue Mar 17, 2020 · 25 comments

Labels: area-System.Net.Sockets, os-linux (Linux OS, any supported distro), tenet-performance (Performance related issue)
Milestone: 5.0

@adamsitnik
Member

It looks like I've introduced a regression in the non-pipelined platform benchmarks in #2346.

[screenshot]

@sebastienros noticed this immediately; I am creating the issue with a huge delay (it was impossible to gather traces on the TE machines for a while).

@adamsitnik adamsitnik added area-System.Net.Sockets os-linux Linux OS (any supported distro) tenet-performance Performance related issue labels Mar 17, 2020
@adamsitnik adamsitnik self-assigned this Mar 17, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Mar 17, 2020
@adamsitnik
Member Author

It looks like most of the time is spent waiting on a lock:

[screenshot]

@stephentoub stephentoub removed the untriaged New issue has not been triaged by the area owner label Mar 17, 2020
@stephentoub stephentoub added this to the 5.0 milestone Mar 17, 2020
@adamsitnik
Member Author

adamsitnik commented Mar 17, 2020

It looks like, only for the non-pipelined version of the platform benchmarks, the epoll thread spends most of its time in the EnsureThreadRequested method, which is part of thread pool work enqueuing:

[screenshot]

Edit: it's not the only thread responsible for enqueuing work on the thread pool:

[screenshot]

@adamsitnik
Member Author

@halter73 could you briefly describe how the non-pipelined platform code works? What I am interested in is: who spawns the task for a new connection, and for how long does it execute?

I can see that it's configured here:

https://github.com/aspnet/Benchmarks/blob/master/src/BenchmarksApps/Kestrel/PlatformBenchmarks/Program.cs#L34-L39

And the actual work is done here:

https://github.com/aspnet/Benchmarks/blob/5c25b87d9c5c46be0c7a8b3a1428b53c2b4205e5/src/BenchmarksApps/Kestrel/PlatformBenchmarks/BenchmarkApplication.HttpConnection.cs#L22-L79

@halter73
Member

Kestrel's ConnectionDispatcher executes the ConnectionDelegate defined by HttpApplication.ExecuteAsync, which is defined in the Benchmarks repo. This is what ultimately calls BenchmarkApplication.ExecuteAsync.

Does that answer your question?

@adamsitnik
Member Author

Does that answer your question?

Yes, thank you very much!

I have one more question left: do I understand correctly that BenchmarkApplication.ExecuteAsync is executed for all the incoming requests on a given connection, which means that it's a long-running task?

I am trying to understand why, after my change, the epoll_thread spends so much time scheduling new work on the ThreadPool (and only for the platform benchmarks).

// EnsureThreadRequested: request one more worker thread, as long as fewer than
// Environment.ProcessorCount thread requests are already outstanding.
int count = numOutstandingThreadRequests;
while (count < Environment.ProcessorCount)
{
    int prev = Interlocked.CompareExchange(ref numOutstandingThreadRequests, count + 1, count);
    if (prev == count)
    {
        ThreadPool.RequestWorkerThread();
        break;
    }
    count = prev;
}

@halter73
Member

I have one more question left: do I understand correctly that BenchmarkApplication.ExecuteAsync is executed for all the incoming requests on a given connection, which means that it's a long-running task?

Correct.
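
To illustrate the shape (a minimal sketch with hypothetical names, not the actual Kestrel or Benchmarks source): the delegate runs once per accepted connection and keeps looping over requests until the client disconnects, which is why it behaves like a long-running task.

using System.Buffers;
using System.IO.Pipelines;
using System.Threading.Tasks;

// Hypothetical sketch, not the actual source: one invocation per accepted connection.
internal static class ConnectionLoopSketch
{
    public static async Task ExecuteAsync(PipeReader input, PipeWriter output)
    {
        while (true)
        {
            ReadResult result = await input.ReadAsync();
            ReadOnlySequence<byte> buffer = result.Buffer;

            // Parse and answer every complete request currently in the buffer
            // (parsing and response writing omitted; see the links above).

            input.AdvanceTo(buffer.Start, buffer.End);

            if (result.IsCompleted)
            {
                break; // the client closed the connection; the long-running task ends here
            }

            await output.FlushAsync(); // the FlushAsync discussed later in this thread
        }
    }
}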

I am trying to understand why, after my change, the epoll_thread spends so much time scheduling new work on the ThreadPool (and only for the platform benchmarks).

Json resulting in more dispatching than pipelined plaintext makes sense, since there are more requests per read in the pipelined plaintext case. I'm not sure why platform is dispatching more than middleware, though. Middleware is basically built on top of the platform, so that doesn't make much sense to me yet.

@sebastienros
Member

Side note: Plaintext NP is now slower than Json (NP). There might be something to dig into here too.
And there is a clear regression on Plaintext NP from 3.1 to 5.0 (probably the same commit we are talking about here), but not on Json, which is faster.

You might want to create a brand new Plaintext and Json app, completely independent from the current Benchmarks app, to be sure we are really comparing the Json serialization as the single change.

@adamsitnik
Member Author

Minor update on what I've learned so far:

The main difference between the platform and non-platform benchmarks is that the platform benchmarks use FlushAsync:

https://github.com/aspnet/Benchmarks/blob/826544381cd670e2a5f4b0e78f35bed6355449b0/src/BenchmarksApps/Kestrel/PlatformBenchmarks/BenchmarkApplication.HttpConnection.cs#L162

which internally schedules a job on the ThreadPool and competes with the epoll_thread for ThreadPool resources:

scheduler.UnsafeSchedule(completion, completionData.CompletionState);

While the non-platform benchmarks use a simple socketStream.WriteAsync:

https://github.com/aspnet/Benchmarks/blob/aa253347ad9a41365446d5706e81c7eb166e61d9/src/Benchmarks/Middleware/PlaintextMiddleware.cs#L41

which, in the case of TechEmpower, always executes the fast path (the write is non-blocking and nothing is added to the socket queue; it's a very small write, 131 bytes for the JSON platform benchmark):

if (_sendQueue.IsReady(this, out observedSequenceNumber) &&
    SocketPal.TryCompleteSendTo(_socket, buffer.Span, ref offset, ref count, flags, socketAddress, socketAddressLen, ref bytesSent, out errorCode))
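
To illustrate the mechanism being described (a hedged sketch, not the actual Kestrel transport code): the PipeScheduler a pipe is created with decides whether flush/read continuations are queued as thread pool work items or run inline on the thread that completed the I/O.

using System.IO.Pipelines;

// Hedged illustration only. PipeScheduler.ThreadPool dispatches continuations as
// thread pool work items (the enqueue the epoll thread competes with), while
// PipeScheduler.Inline runs them on the completing thread, like the Write fast path.
var options = new PipeOptions(
    readerScheduler: PipeScheduler.ThreadPool, // continuations go through the thread pool
    writerScheduler: PipeScheduler.Inline,     // continuations run inline
    useSynchronizationContext: false);

var pipe = new Pipe(options);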

Plaintext (the default, pipelined version) has not regressed (on this hardware) because with pipelining enabled the flushing is much less frequent (16× less frequent with the current settings).

The problem is gone when we replace FlushAsync with socketStream.Write, but only temporarily.

It still does not answer the question of why replacing a few epoll threads that were enqueueing work on the ThreadPool with a single one leads to a degradation of enqueueing performance.

@stephentoub @kouvel have you ever faced a similar problem?

@kouvel
Member

kouvel commented Mar 19, 2020

It still does not answer the question of why replacing a few epoll threads that were enqueueing work on the ThreadPool with a single one leads to a degradation of enqueueing performance.

A possibility is that more epoll threads are more frequently able to keep the thread pool busy enough for that path to go down the fast path:

int count = numOutstandingThreadRequests;
while (count < Environment.ProcessorCount)
{
    int prev = Interlocked.CompareExchange(ref numOutstandingThreadRequests, count + 1, count);
    if (prev == count)
    {
        ThreadPool.RequestWorkerThread();
        break;
    }
    count = prev;
}

One epoll thread may not be queuing things fast enough, causing EnsureThreadRequested to go down the slow path. Based on the time spent under that method, most of it is spent under RequestWorkerThread, which is the slow path where at least one thread had not found work in the thread pool. Hence the time spent spin-waiting for a short time in CLRLifoSemaphore::Wait, which means that for about 20% of the time a thread pool thread runs out of work for a very short duration.

There is also a possibility that the epoll thread is getting starved a bit by the spin-waiting, though if that were removed it would likely translate into higher context-switch CPU time. COMPlus_ThreadPool_UnfairSemaphoreSpinLimit=0 disables the spin-waiting there; it may be interesting to try, to see if it noticeably affects the time spent under EnsureThreadRequested on the epoll thread. If there is an effect then maybe it could use an earlier sleep.

@kouvel
Member

kouvel commented Mar 20, 2020

Oops, I copied the wrong lines above; I meant this (edited above as well):

int count = numOutstandingThreadRequests;
while (count < Environment.ProcessorCount)
{
    int prev = Interlocked.CompareExchange(ref numOutstandingThreadRequests, count + 1, count);
    if (prev == count)
    {
        ThreadPool.RequestWorkerThread();
        break;
    }
    count = prev;
}

@adamsitnik
Member Author

COMPlus_ThreadPool_UnfairSemaphoreSpinLimit=0 disables the spin-waiting there; it may be interesting to try

I've tried that. With this change, only 11% of the epoll thread time is spent in EnsureThreadRequested, but the difference is spent in futex-related methods and the overall RPS does not improve.

[screenshot]

@adamsitnik
Member Author

adamsitnik commented Mar 20, 2020

Based on the time spent under that method, most of it is spent under RequestWorkerThread, which is the slow path where at least one thread had not found work in the thread pool.

This great insight made me try reducing the number of min and max threads in the ThreadPool. When I set the values to <19, 20> for JSON and <15, 16> for Plaintext, the problem is gone and RPS is back to normal.

[screenshot]
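
For reference, a minimal sketch of how such limits can be applied at startup. The env-var names below are assumptions for illustration; the real benchmark app wires this up differently.

using System;
using System.Threading;

// Hypothetical sketch: clamp the thread pool to an explicit <min, max> pair
// (e.g. <19, 20> for JSON, <15, 16> for Plaintext on this machine).
if (int.TryParse(Environment.GetEnvironmentVariable("minThreadCount"), out int min) &&
    int.TryParse(Environment.GetEnvironmentVariable("maxThreadCount"), out int max) &&
    min > 0 && max >= min)
{
    // When lowering both limits, shrink the minimum first so the new maximum is
    // never below the current minimum (SetMaxThreads rejects such a request).
    ThreadPool.SetMinThreads(min, min);
    ThreadPool.SetMaxThreads(max, max);
}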

Without it, there are on average 62 threads in the thread pool (Environment.ProcessorCount returns 28), and as we can see in the histogram below, more than 10 of them are almost never busy:

 + Thread (32551)	   3.6	    19,447	 9999990_5999999999999999999993__	     	  0.131	 21,322.505
 + Thread (32609)	   3.3	    17,951	 9988980_4897899987899999988992__	     	  2.155	 21,335.517
 + Thread (32595)	   2.7	    14,806	 9971_00_5791599987899799978990__	     	  1.061	 21,201.385
 + Thread (32600)	   2.7	    14,585	 9988990_47988989756993___39991__	     	  0.154	 21,201.386
 + Thread (32608)	   2.6	    14,395	 8988980_579789984268989996___0__	     	  0.047	 21,201.386
 + Thread (32606)	   2.6	    14,197	 6879990__69788997779989975___0__	     	  0.091	 21,201.385
 + Thread (32543)	   2.5	    13,691	 8968990_478159897657989995___0__	     	  0.136	 21,201.386
 + Thread (32628)	   2.5	    13,549	 8988990__2088889679995___39992__	     	  2.138	 21,335.518
 + Thread (32635)	   2.5	    13,341	 1_18980_58689999544__599989990__	     	  0.030	 21,201.386
 + Thread (32593)	   2.5	    13,332	 1_11_00_5898869953699999979992__	     	  3.039	 21,345.529
 + Thread (32589)	   2.4	    13,210	 8980_00_579789987669999996___0__	     	  2.026	 21,201.385
 + Thread (32592)	   2.4	    13,141	 8968990__1074___45699897988992__	     	  1.018	 21,334.517
 + Thread (32598)	   2.4	    13,047	 8798990__5815999547994___39992__	     	  0.121	 21,335.518
 + Thread (32588)	   2.3	    12,741	 1_11_00__591599977699799989992__	     	  0.060	 21,333.515
 + Thread (32602)	   2.3	    12,705	 1_18970_56984___23899898979991__	     	  0.012	 21,201.385
 + Thread (32641)	   2.3	    12,401	 8887990_53088999774__58985___2__	     	  0.042	 21,335.518
 + Thread (32549)	   2.3	    12,320	 1_11_00_5892688744799678989992__	     	  0.125	 21,334.516
 + Thread (32596)	   2.3	    12,249	 8881_00_440899984489989987___0__	     	  0.037	 21,198.381
 + Thread (32605)	   2.2	    12,023	 1_18990__10889997669879996___2__	     	  2.031	 21,335.530
 + Thread (32626)	   2.2	    11,949	 8980_00__1088999533__599999891__	     	      0	 21,335.518
 + Thread (32643)	   2.2	    11,914	 8981_00__1088999533__589999972__	     	  1.092	 21,335.518
 + Thread (32638)	   2.2	    11,786	 0_19990_48815999515994___49991__	     	  1.164	 21,201.386
 + Thread (32546)	   2.1	    11,650	 1_11_00_58925999683__599989992__	     	  0.067	 21,335.518
 + Thread (32629)	   2.1	    11,394	 8988990_44011___5779989996___2__	     	  1.101	 21,335.518
 + Thread (32599)	   2.1	    11,198	 1_10_00_54088899883__599979992__	     	  1.122	 21,335.518
 + Thread (32640)	   2.0	    10,988	 7988890__5921___322__598989892__	     	  0.110	 21,335.517
 + Thread (32633)	   2.0	    10,677	 0_11_00_58811___48899899979991__	     	 13.113	 21,201.386
 + Thread (32590)	   1.9	    10,424	 1_10_00_39874___22598899869992__	     	  0.024	 21,335.518
 + Thread (32632)	   1.9	    10,173	 8999990_55077999882__1___2___2__	     	  0.077	 21,335.516
 + Thread (32597)	   1.9	    10,079	 8880_00_48984___457994___39990__	     	  1.008	 21,201.386
 + Thread (32603)	   1.8	    10,044	 1_10_00_48921___22689999889990__	     	  0.096	 21,201.387
 + Thread (32594)	   1.8	    10,016	 9988590_57811___574__59995___2__	     	  3.086	 21,335.518
 + Thread (32634)	   1.8	     9,759	 1_17990__5874___2379988996___2__	     	  2.044	 21,334.517
 + Thread (32610)	   1.8	     9,716	 8981_00__2098999863__1___49990__	     	  4.040	 21,201.385
 + Thread (32604)	   1.8	     9,660	 1_17990__1177999871__1___49992__	     	  0.116	 21,335.518
 + Thread (32636)	   1.8	     9,533	 8999990_44084___583__1___39772__	     	  1.077	 21,335.518
 + Thread (32611)	   1.7	     9,312	 8990_00__5925999512__1___49992__	     	  0.105	 21,335.518
 + Thread (32607)	   1.7	     9,210	 6999890__5915999762__1___2___0__	     	  1.013	 21,201.386
 + Thread (32591)	   1.6	     8,493	 1_28990__1010___211__599879992__	     	  0.164	 21,335.517
 + Thread (32550)	   1.5	     8,416	 8988970_54111___462__1___39981__	     	  1.048	 21,201.386
 + Thread (32637)	   1.5	     8,372	 8988990_54121___578695___2___0__	     	  0.083	 21,200.397
 + Thread (32642)	   1.5	     8,266	 8980_00__2015999437993___3___2__	     	  2.107	 21,335.518
 + Thread (32625)	   1.4	     7,560	 8891_00_44084___583__1___49991__	     	  0.017	 21,201.386
 + Thread (32630)	   1.3	     7,236	 1_19990__1094___233__59996___0__	     	  1.072	 21,201.385
 + Thread (32631)	   1.3	     7,115	 1_16990__5883___347994___1___2__	     	      1	 21,345.540
 + Thread (32587)	   1.3	     7,070	 9997990_58921___312__0___2___2__	     	  0.053	 21,335.517
 + Thread (32639)	   1.3	     6,994	 0_18990_44084___476963___1___2__	     	  0.100	 21,335.517
 + Thread (32601)	   1.1	     5,889	 1_10_00__6925998762__1___2___2__	     	  0.007	 21,335.519
 + Thread (32627)	   1.0	     5,664	 1_11_00__5675___588993___1___0__	     	  0.159	 21,201.386
 + Thread (32415)	   0.0	         2	 ___________________________o____	     	19,740.961	 19,996.211
 + Thread (32389)	   0.0	         1	 __________o_____________________	     	7,602.992	  7,603.992
 + Thread (32392)	   0.0	         1	 __________o_____________________	     	7,604.994	  7,605.994
 + Thread (32411)	   0.0	         1	 __________o_____________________	     	7,604.995	  7,605.995
 + Thread (32414)	   0.0	         1	 __________o_____________________	     	7,615.004	  7,616.004
 + Thread (32396)	   0.0	         1	 __________o_____________________	     	7,615.004	  7,616.004
 + Thread (32402)	   0.0	         1	 __________o_____________________	     	7,615.004	  7,616.004
 + Thread (32410)	   0.0	         1	 __________o_____________________	     	7,615.004	  7,616.004
 + Thread (32401)	   0.0	         1	 __________o_____________________	     	7,615.004	  7,616.004
 + Thread (32400)	   0.0	         1	 __________o_____________________	     	7,615.005	  7,616.005
 + Thread (32394)	   0.0	         1	 __________o_____________________	     	7,615.005	  7,616.005
 + Thread (32544)	   0.0	         1	 _______________________o________	     	17,020.276	 17,021.276
 + Thread (348)  	   0.0	         1	 _____________________________o__	     	21,389.574	 21,390.574

@kouvel what conditions need to be met by the ThreadPool to allocate a new thread?

@kouvel
Member

kouvel commented Mar 20, 2020

Hill climbing is probably the main cause of the additional threads. It monitors the throughput of work items and occasionally tries a higher or lower thread count (with proc-count threads as the minimum by default). If it sees a correlation between increasing thread count and increasing throughput, then it settles on a higher thread count and tries again from there.

Hill climbing takes the CPU utilization of the process into account, but the current thresholds are pretty high (it does not increase the thread count if process CPU utilization is >= 95% over ~500 ms). The 13 threads that are barely used are probably from spikes in thread-count changes from hill climbing; given how little they are used, they wouldn't be contributing much to throughput in either direction.

Could you try with COMPlus_HillClimbing_Disable=1 to see what would happen just from disabling hill climbing without changing the default thread limits?

From a brief look, hill climbing may not be taking active requests for threads into account. In bursty cases like this, work item throughput would drop frequently just because the thread pool is out of work, and hill climbing may be reacting to this by increasing the thread count. The thread count change events (ThreadPoolWorkerThreadAdjustment/Adjustment, Sample, Stats) would show when thread-count changes occur and which throughput samples hill climbing sees, which could provide more info.
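
For anyone who wants to watch those adjustment events in-process, here is a hedged sketch; it assumes the "Microsoft-Windows-DotNETRuntime" provider and its Threading keyword (0x10000) surface the ThreadPoolWorkerThreadAdjustment* events to an EventListener.

using System;
using System.Diagnostics.Tracing;

// Hedged sketch: print hill-climbing adjustment events as they happen.
internal sealed class ThreadPoolAdjustmentListener : EventListener
{
    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        if (eventSource.Name == "Microsoft-Windows-DotNETRuntime")
        {
            // 0x10000 is assumed to be the Threading keyword, which covers the
            // ThreadPoolWorkerThreadAdjustment* events mentioned above.
            EnableEvents(eventSource, EventLevel.Informational, (EventKeywords)0x10000);
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        string name = eventData.EventName;
        if (name != null && name.StartsWith("ThreadPoolWorkerThreadAdjustment") && eventData.Payload != null)
        {
            // The payload carries the sampled throughput and the new worker-thread count.
            Console.WriteLine($"{name}: {string.Join(", ", eventData.Payload)}");
        }
    }
}

Keeping one instance alive for the process lifetime (e.g. constructing it at startup) should be enough to start receiving the events.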

It may also help to allow hill climbing to go below proc-count threads by default, but from experiments so far it doesn't look like it works very well for that; it needs more investigation.

COMPlus_ThreadPool_UnfairSemaphoreSpinLimit=0
With this change, only 11% of the epoll thread time is spent in EnsureThreadRequested, but the difference is spent in futex-related methods and the overall RPS does not improve.

It still looks like it helps to decrease the time in that method. The PAL's full wait-for-semaphore is a bit slow, and an earlier sleep could make the spin-waiting less intrusive without incurring the full expense of waiting on the semaphore. I'll do some experiments.

@adamsitnik
Member Author

adamsitnik commented Mar 20, 2020

Could you try with COMPlus_HillClimbing_Disable=1 to see what would happen just from disabling hill climbing without changing the default thread limits?

With this setting we have 33 threads in the ThreadPool; however, it's still "too many" and the RPS is still not good enough:

Thread (7133)  3.6	17,678	  9999__499999999999999999999992__	      0.132	 19,480.384
Thread (7195)  3.5	17,339	  99990_499999999999899999998993__	      0.141	 19,492.398
Thread (7186)  3.5	17,306	  99990_499999899999999999999993__	      0.101	 19,492.396
Thread (7197)  3.5	17,271	  99980_499999999999999999999993__	      0.113	 19,492.395
Thread (7190)  3.5	17,261	  99990_489999989998999999999993__	      0.086	 19,492.395
Thread (7194)  3.5	17,238	  99890_499999998999899999999991__	      0.022	 19,492.544
Thread (7200)  3.5	17,234	  99990_399999999999999899999993__	      0.015	 19,492.395
Thread (7206)  3.5	17,222	  89990_499979999999999999999993__	      0.028	 19,491.394
Thread (7204)  3.5	17,202	  99980_499999999999999989989993__	      0.092	 19,492.396
Thread (7196)  3.5	17,190	  99990_489999999999999999989993__	      0.078	 19,492.396
Thread (7199)  3.5	17,155	  98990_499999999989999999899993__	      0.118	 19,492.395
Thread (7187)  3.5	17,148	  98890_499998899999999999999993__	          0	 19,492.395
Thread (7132)  3.5	17,103	  99990_399999999998999989889993__	      0.007	 19,492.397
Thread (7127)  3.5	17,099	  99890_398989999999999999999993__	      0.071	 19,492.396
Thread (7205)  3.5	17,093	  97990_499999998999998999999993__	      0.049	 19,492.396
Thread (7202)  3.5	17,074	  99990_499999789999999988998992__	      0.106	 19,491.394
Thread (7191)  3.5	17,072	  99980_499799999999999999998991__	      0.096	 19,492.395
Thread (7129)  3.4	17,059	  99990_499989999999978999998993__	      0.060	 19,502.406
Thread (7207)  3.4	17,036	  98990_499999999999999989999793__	      0.081	 19,492.395
Thread (7193)  3.4	17,035	  99980_489999899999999999899993__	      0.128	 19,492.395
Thread (7184)  3.4	16,998	  99990_499999887999999899999993__	      0.033	 19,491.543
Thread (7188)  3.4	16,996	  89970_399999999999999999999983__	      0.123	 19,492.398
Thread (7192)  3.4	16,918	  99990_388799999999889999999993__	      0.044	 19,492.394
Thread (7185)  3.4	16,866	  89990_399999999869999988999993__	      2.115	 19,492.395
Thread (7131)  3.4	16,792	  99890_499998999879899889999993__	      0.054	 19,492.396
Thread (7198)  3.4	16,783	  99990_499999989999998997989973__	      2.029	 19,492.397
Thread (7189)  3.4	16,581	  89990_498999989798998899789973__	      0.038	 19,492.739
Thread (7183)  3.3	16,455	  99990_298799999999899788999693__	      0.137	 19,492.396
Thread (7203)  3.3	16,358	  78990_499997999799999999788773__	      0.066	 19,491.394
Thread (7401)  0.0	     4	  _____________________oo_________	 14,094.079	 15,238.205
Thread (7128)  0.0	     2	  ___o__________________o_________	  2,653.745	 14,656.637
Thread (6972)  0.0	     1	  _______o________________________	  4,679.806	  4,680.806
Thread (6979)  0.0	     1	  _______o________________________	  4,863.988	  4,864.988

@kouvel
Member

kouvel commented Mar 21, 2020

How much effect does it have on RPS? It sounds like starving the epoll thread might be a reason why more threads are currently necessary. It totally makes sense that fewer threads would work better in this system; we'd have to figure out how to determine that ideal number.

@kouvel
Member

kouvel commented Mar 21, 2020

There may be better solutions. Ideally, in the current system, I would like to see that one epoll thread is effective enough for perf (or very few on very large systems). The indication that it doesn't seem to be enough in some cases points to other issues. I don't think those would be unsolvable, but more investigation may be necessary to determine what the causes are and how best to resolve them. There may also be very different long-term alternatives, but those are beyond this scope.

@adamsitnik
Member Author

adamsitnik commented Apr 6, 2020

@kouvel as agreed offline, I've prepared a repro.

First of all, we need a modified version of System.Net.Sockets.dll that allows configuring the minimum number of open socket connections required to allocate a new epoll thread:

https://github.com/adamsitnik/runtime/blob/636cc62615c79d33a67a301d551e093498d1f97b/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEngine.Unix.cs#L108

This env var must be set to some value using the BenchmarksDriver command-line argument. Example: --env "MinHandles=32"
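
Conceptually the knob works roughly like this (a hypothetical sketch; the real change lives in the SocketAsyncEngine.Unix.cs link above):

using System;

// Hypothetical sketch, not the actual SocketAsyncEngine change: allow one epoll
// engine/thread per "MinHandles" concurrently registered sockets.
internal static class EpollEngineSizingSketch
{
    private static readonly int s_minHandlesPerEngine =
        int.TryParse(Environment.GetEnvironmentVariable("MinHandles"), out int v) && v > 0
            ? v
            : 32; // assumed fallback, for illustration only

    public static int MaxEnginesFor(int registeredSocketCount) =>
        Math.Max(1, (registeredSocketCount + s_minHandlesPerEngine - 1) / s_minHandlesPerEngine);
}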

The second thing is a modified JSON Platform benchmark that allows setting the maximum number of threads in the Thread Pool:

https://github.com/adamsitnik/runtime/blob/636cc62615c79d33a67a301d551e093498d1f97b/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEngine.Unix.cs#L108

This is configurable via the optional maxThreadCount env var. Example: --env "maxThreadCount=24"

I've created a new branch of my fork (with a copy of the modified System.Net.Sockets.dll) to make it easier for you to run. The default settings:

git clone https://github.com/adamsitnik/Benchmarks-1.git repro
cd repro
git checkout threadPoolRepro
cd src\BenchmarksDriver
dotnet run -- --jobs ..\BenchmarksApps\Kestrel\PlatformBenchmarks\benchmarks.json.json --scenario "JsonPlatform"   --server "$secret1" --client "$secret2" --display-output --collect-trace --output-file ".\System.Net.Sockets.dll" --env "MinHandles=32"

With the default settings from above (256 connections, create an epoll thread for every 32 connections, don't touch the ThreadPool) I am getting an RPS of 1,119,517.

The best config that I was able to find was:

--env "MinHandles=120" --env "maxThreadCount=24"

Which gives a very impressive 1,249,282 RPS and puts us very close to our goal ;)

I am going to send you an email with both trace files and the secret names of the machines.

@adamsitnik
Member Author

Edit: important note: the machine has 28 logical cores (14 physical).

@kouvel
Member

kouvel commented Apr 9, 2020

Thanks Adam. I found the threadPoolRepro branch in a different repo (https://github.com/adamsitnik/Benchmarks-1.git). I think you must have used 512 connections for those machines, because with 256 connections I see much lower numbers and a ~16% regression in throughput with the best config you mentioned compared to the default. I'm getting similar numbers to what you mentioned for 512 connections, so I'm able to repro what you're seeing now.

@kouvel
Member

kouvel commented Apr 9, 2020

@adamsitnik, would you be able to summarize the changes in your threadPoolRepro branch, which seems to include a bunch of changes to the platform benchmarks? I'm seeing some rather large improvements and regressions with those changes, depending on config, compared to running without the changes with the same configs.

@adamsitnik
Member Author

I found the threadPoolRepro branch in a different repo

I have no idea how I could have provided a link to the wrong repo... Apologies for that!
I am glad that you got it working!

would you be able to summarize the changes in your threadPoolRepro branch

To make a long story short, I am planning to create a very simple Kestrel transport that has fewer features but is much faster due to the lack of the extra overhead.

I've just sent a PR with what is hopefully the final version of it: aspnet/Benchmarks#1480

I am going to forward you an email with a conversation about the overhead that I am talking about.

@karelz
Member

karelz commented May 7, 2020

@adamsitnik wasn't your change reverted?

@adamsitnik
Member Author

wasn't your change reverted?

@karelz it was, but now we want to bring it back in #35800, and understanding the reason behind this regression was required to do that.

@adamsitnik
Member Author

Having said that, I am going to close this issue when #35800 gets merged.

@karelz
Member

karelz commented May 8, 2020

And #35800 was merged, so closing ... I hope you didn't want to wait for more confirmation from official perf lab runs ...

@karelz karelz closed this as completed May 8, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020