Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async opening of connections in parallel is slow/blocking #601

Open
OQO opened this issue Jun 12, 2020 · 26 comments
Open

Async opening of connections in parallel is slow/blocking #601

OQO opened this issue Jun 12, 2020 · 26 comments
Labels
📈 Performance Issues that are targeted to performance improvements.

Comments

@OQO
Copy link

OQO commented Jun 12, 2020

Describe the bug

Multiple opens of connections are very slow and seem to be bocking. In pooled mode it seems only a very limited amount of new connections is being opened concurrently (in example below it seems only 1 seems to be opened up at a time). In non-pooled mode more connections are opened concurrently (in example below 8 connections seem to be opened at a time).

Issue has big impact when connecting to SQL Azure from location with higher latency connection. Issue does not have measurable impact when connecting to SQL Azure when in the same datacenter due to super fast connection times.

This issue has a great impact on startup time of our servers, since it takes up to several minutes before our web servers can handle all the requests due to the slow opening of the SQL connections.

To reproduce

using Microsoft.Data.SqlClient;
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading.Tasks;

namespace ConsoleExperiments
{
    class Program
    {

        static void Main(string[] args)
        {
            TestConnections().Wait();
        }

        static async Task TestConnections()
        {
            string connectionString = "Server=tcp:XXXXXX.database.windows.net,1433;Initial Catalog=des;Persist Security Info=False;User ID=XXXXXXX;Password=XXXXXXX;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;";

            Console.WriteLine("WARM UP");
            await MeasureSingleConnectionAndReuse(connectionString);

            ClearPools();
            await MeasureSingleConnectionAndReuse(connectionString);

            ClearPools();
            await MeasureSingleConnectionAndReuse(connectionString);

            Console.WriteLine("\n\nCONCURRENT POOLED CONNECTIONS");
            ClearPools();
            MeasureParallelConnections(connectionString);

            Console.WriteLine("\n\nCONCURRENT NON-POOLED CONNECTIONS");
            ClearPools();
            MeasureParallelConnections(connectionString + "Pooling=false;");


            Console.WriteLine("\nTesting finished");
            Console.ReadLine();
        }

        private static void ClearPools()
        {
            SqlConnection.ClearAllPools();
            Console.WriteLine("ALL POOLS CLEARED");
        }
        static ConcurrentDictionary<Guid, object> _connectionIDs = new ConcurrentDictionary<Guid, object>();
        private static void MeasureParallelConnections(string connectionString)
        {
            Console.WriteLine("Start delay      OpenAsync time     Connection ID                        ReusedFromPool");
            Stopwatch sw = new Stopwatch();
            sw.Start();
            int numOpens = 100;
            Task[] tasks = new Task[numOpens];
            Stopwatch start = new Stopwatch();
            start.Start();
            for (int i = 0; i < numOpens; i++)
            {
                tasks[i] = MeasureSingleConnection(i, start, connectionString);
            }
            Task.WaitAll(tasks);
            Console.WriteLine($"{sw.Elapsed} {numOpens} connections opened in paralel");
        }

        private static async Task MeasureSingleConnection(int index, Stopwatch start, string connectionString)
        {
            TimeSpan startDelay = start.Elapsed;
            Stopwatch sw = new Stopwatch();
            using (SqlConnection connection = new SqlConnection(connectionString))
            {
                sw.Start();
                await connection.OpenAsync();
                Console.WriteLine($"{startDelay} {sw.Elapsed} {index} {connection.ClientConnectionId} {IsReuse(connection)}");
                await Task.Delay(4000);
            }
        }

        private static async Task MeasureSingleConnectionAndReuse(string connectionString)
        {
            Stopwatch sw = new Stopwatch();
            using (SqlConnection connection = new SqlConnection(connectionString))
            {
                sw.Start();
                await connection.OpenAsync();
                Console.WriteLine($"{sw.Elapsed} {connection.ClientConnectionId} {IsReuse(connection)} Single open time ");
            }
            using (SqlConnection connection = new SqlConnection(connectionString))
            {
                sw.Restart();
                await connection.OpenAsync();
                Console.WriteLine($"{sw.Elapsed} {connection.ClientConnectionId} {IsReuse(connection)} Single open time with one previously opened connection");
            }
        }

        private static bool IsReuse(SqlConnection connection)
        {
            return !_connectionIDs.TryAdd(connection.ClientConnectionId, null);
        }
    }
}

Expected behavior

The code first connects sequentially three times to get a good base line of the OpenAsync times of a new connection and a reused connection from the pool.

Then it opens 100 connections in parallel first in pooled mode, then in non-pooled mode. It keeps every connection open for 4 seconds to simulate a slow query (note that even with lower query times the same problems are observed). Note that number of threads of application stays low the whole time, so this is not a thread pool issue.

In the parallel runs we log the time since the start of the experiment to ensure that delays are not due to slow thread pool scheduling. All these numbers are very low and show that all OpenAsync calls are more or less done at the same time. Note that CPU usage is also extremely low during these tests.

The second displayed time shows the time it took till for OpenAsync to return.

Note that doing the total test of 100 queries in non-pooled mode is on average twice as fast as in pooled mode which is very unexpected.

Even in non-pooled mode looking at the times it seems connections are created in batches (in this case of eight) and block the creation of new connections.

In pooled mode only one new connection to SQL server seems to get opened at a time.

In pooled mode we observe one worker thread at a time working (and mostly waiting/blocking) opening new connections. In pooled mode we observe 8 threads at a time working on new connections (and mostly waiting/blocking). It seems that the code internally for OpenAsync for opening connections seem to be blocking worker threads instead of using async to wait for the network. Note however, that this seems not to be the root cause of the slowness in opening new connections. Otherwise we would see an attempt at using more threads to open connections.

Further technical details

Microsoft.Data.SqlClient version: 2.0.0-preview4.20142.4 (same problem on 1.1.3 and on System.Data.SqlClient)
.NET target: .net Core 3.1 (Same issue on regular .net)
SQL Server version: SQL Azure
Operating system: Windows 10 (But also on Windows Server editions and on Ubuntu)

Additional context
Results obtained from test run:

WARM UP
00:00:01.3203266 3ed47bf5-3135-458c-afbb-7c1d8a4fed9d False Single open time
00:00:00.0021387 3ed47bf5-3135-458c-afbb-7c1d8a4fed9d True Single open time with one previously opened connection
ALL POOLS CLEARED
00:00:00.8309554 e40ebf65-e6ac-4a4f-a4ce-343cd5f8f10c False Single open time
00:00:00.0001948 e40ebf65-e6ac-4a4f-a4ce-343cd5f8f10c True Single open time with one previously opened connection
ALL POOLS CLEARED
00:00:00.8375330 ba33b7fe-ab77-43c3-abb5-8e7627f5d496 False Single open time
00:00:00.0001340 ba33b7fe-ab77-43c3-abb5-8e7627f5d496 True Single open time with one previously opened connection


CONCURRENT POOLED CONNECTIONS
ALL POOLS CLEARED
Start delay      OpenAsync time     Connection ID                        ReusedFromPool
00:00:00.0009171 00:00:00.9431287 0 3f18705a-c52d-4afd-b84f-eb0d6802d41a False
00:00:00.0072154 00:00:01.7782236 1 7385c47a-7372-46ef-b9d0-c4d3078b6e71 False
00:00:00.0076698 00:00:02.5752513 2 d9408f86-55ee-4ac1-94f8-7a63b6aa2392 False
00:00:00.0080704 00:00:03.3927741 3 c1d8e999-c036-4bb4-b4e6-482804b05143 False
00:00:00.0085352 00:00:04.3548366 4 957de7e5-a666-4410-924a-6423fb234cb4 False
00:00:00.0090255 00:00:05.1882539 5 5eeb97b1-608c-4d3f-a5e6-5a11fee50156 False
00:00:00.0094626 00:00:05.1882544 6 3f18705a-c52d-4afd-b84f-eb0d6802d41a True
00:00:00.0098549 00:00:06.1260157 7 94da8202-814b-4dd4-95c6-c0114e28bb33 False
00:00:00.0101717 00:00:06.1256924 8 7385c47a-7372-46ef-b9d0-c4d3078b6e71 True
00:00:00.0104807 00:00:06.9705199 9 d859272d-c6c3-4792-b8ab-fee5b1b6d35d False
00:00:00.0107941 00:00:06.9703312 10 d9408f86-55ee-4ac1-94f8-7a63b6aa2392 True
00:00:00.0111062 00:00:07.9178724 11 9b44a56e-fa2d-4ac2-a625-bcf65871fb06 False
00:00:00.0113950 00:00:07.9175811 12 c1d8e999-c036-4bb4-b4e6-482804b05143 True
00:00:00.0116639 00:00:08.7822981 13 6baa5a1d-3b90-4987-a15e-fce475baf607 False
00:00:00.0119990 00:00:08.7820060 14 957de7e5-a666-4410-924a-6423fb234cb4 True
00:00:00.0123118 00:00:09.6348765 15 815f751b-ba0c-479d-ab8a-327ea3e078b0 False
00:00:00.0125002 00:00:09.6347587 16 3f18705a-c52d-4afd-b84f-eb0d6802d41a True
00:00:00.0127372 00:00:09.6346398 17 5eeb97b1-608c-4d3f-a5e6-5a11fee50156 True
00:00:00.0129625 00:00:10.4972251 18 b08a8c47-5721-48ad-82c1-6191b5c1e588 False
00:00:00.0134389 00:00:10.4968212 20 94da8202-814b-4dd4-95c6-c0114e28bb33 True
00:00:00.0132040 00:00:10.4970294 19 7385c47a-7372-46ef-b9d0-c4d3078b6e71 True
00:00:00.0136632 00:00:11.3346273 21 df93b12f-ee15-4425-aa81-bfd9a2aaba5a False
00:00:00.0139049 00:00:11.3345546 22 d9408f86-55ee-4ac1-94f8-7a63b6aa2392 True
00:00:00.0141698 00:00:11.3345670 23 d859272d-c6c3-4792-b8ab-fee5b1b6d35d True
00:00:00.0144175 00:00:12.1801269 24 012e06ab-1fd1-4611-b428-880e490747b4 False
00:00:00.0146370 00:00:12.1799529 25 9b44a56e-fa2d-4ac2-a625-bcf65871fb06 True
00:00:00.0148358 00:00:12.1799797 26 c1d8e999-c036-4bb4-b4e6-482804b05143 True
00:00:00.0150640 00:00:12.9915829 27 c8b47e2a-124c-4e85-a649-ee43c070029f False
00:00:00.0152143 00:00:12.9914348 28 6baa5a1d-3b90-4987-a15e-fce475baf607 True
00:00:00.0153475 00:00:12.9913237 29 957de7e5-a666-4410-924a-6423fb234cb4 True
00:00:00.0155914 00:00:13.8347511 31 5eeb97b1-608c-4d3f-a5e6-5a11fee50156 True
00:00:00.0157942 00:00:13.8345878 32 3f18705a-c52d-4afd-b84f-eb0d6802d41a True
00:00:00.0159306 00:00:13.8348588 33 815f751b-ba0c-479d-ab8a-327ea3e078b0 True
00:00:00.0154739 00:00:13.8349024 30 740c33a5-85a7-4c97-82dd-f7339679fcfe False
00:00:00.0161629 00:00:14.6559129 34 7419a61b-42c5-40aa-810a-02b770e69378 False
00:00:00.0164252 00:00:14.6557047 36 94da8202-814b-4dd4-95c6-c0114e28bb33 True
00:00:00.0165175 00:00:14.6556305 37 7385c47a-7372-46ef-b9d0-c4d3078b6e71 True
00:00:00.0162897 00:00:14.6558366 35 b08a8c47-5721-48ad-82c1-6191b5c1e588 True
00:00:00.0166906 00:00:15.4744707 38 35e77023-4eec-4421-992e-06f984fb4eda False
00:00:00.0170624 00:00:15.4743611 41 d859272d-c6c3-4792-b8ab-fee5b1b6d35d True
00:00:00.0169470 00:00:15.4743077 40 d9408f86-55ee-4ac1-94f8-7a63b6aa2392 True
00:00:00.0168105 00:00:15.4744503 39 df93b12f-ee15-4425-aa81-bfd9a2aaba5a True
00:00:00.0172026 00:00:16.2950864 42 5e254da9-f6e0-480d-a7a1-113b050c6f27 False
00:00:00.0173236 00:00:16.2949979 43 c1d8e999-c036-4bb4-b4e6-482804b05143 True
00:00:00.0175217 00:00:16.2951648 45 9b44a56e-fa2d-4ac2-a625-bcf65871fb06 True
00:00:00.0174389 00:00:16.2949583 44 012e06ab-1fd1-4611-b428-880e490747b4 True
00:00:00.0176272 00:00:17.1048935 46 ed96c994-ccc9-47fc-84fc-efb6e71147ca False
00:00:00.0177680 00:00:17.1048173 47 6baa5a1d-3b90-4987-a15e-fce475baf607 True
00:00:00.0178830 00:00:17.1047111 48 957de7e5-a666-4410-924a-6423fb234cb4 True
00:00:00.0180121 00:00:17.1046429 49 c8b47e2a-124c-4e85-a649-ee43c070029f True
00:00:00.0181488 00:00:17.9378530 50 e064dc06-71ce-453c-b9a3-0250c7e1be2a False
00:00:00.0183879 00:00:17.9376187 52 3f18705a-c52d-4afd-b84f-eb0d6802d41a True
00:00:00.0182909 00:00:17.9377202 51 815f751b-ba0c-479d-ab8a-327ea3e078b0 True
00:00:00.0184989 00:00:17.9375345 53 740c33a5-85a7-4c97-82dd-f7339679fcfe True
00:00:00.0186764 00:00:17.9374013 54 5eeb97b1-608c-4d3f-a5e6-5a11fee50156 True
00:00:00.0188466 00:00:18.7644839 55 a482b6ee-2c2f-42ad-9d7e-e56846c14d61 False
00:00:00.0190005 00:00:18.7643761 56 b08a8c47-5721-48ad-82c1-6191b5c1e588 True
00:00:00.0191638 00:00:18.7642665 57 7385c47a-7372-46ef-b9d0-c4d3078b6e71 True
00:00:00.0193381 00:00:18.7641476 58 94da8202-814b-4dd4-95c6-c0114e28bb33 True
00:00:00.0195314 00:00:18.7640385 59 7419a61b-42c5-40aa-810a-02b770e69378 True
00:00:00.0196506 00:00:19.5878196 60 8fa4dd6d-9501-4d3d-b7c9-84f20d2e31e0 False
00:00:00.0198747 00:00:19.5876152 61 35e77023-4eec-4421-992e-06f984fb4eda True
00:00:00.0200245 00:00:19.5875333 62 df93b12f-ee15-4425-aa81-bfd9a2aaba5a True
00:00:00.0201936 00:00:19.5873959 63 d9408f86-55ee-4ac1-94f8-7a63b6aa2392 True
00:00:00.0203734 00:00:19.5872999 64 d859272d-c6c3-4792-b8ab-fee5b1b6d35d True
00:00:00.0205397 00:00:20.4272665 65 56b40e9a-6034-4007-b285-23c96fe3293e False
00:00:00.0207900 00:00:20.4270913 66 9b44a56e-fa2d-4ac2-a625-bcf65871fb06 True
00:00:00.0209187 00:00:20.4269767 67 c1d8e999-c036-4bb4-b4e6-482804b05143 True
00:00:00.0210743 00:00:20.4268443 68 012e06ab-1fd1-4611-b428-880e490747b4 True
00:00:00.0211555 00:00:20.4267739 69 5e254da9-f6e0-480d-a7a1-113b050c6f27 True
00:00:00.0212630 00:00:21.2580633 70 b3141e98-6467-4919-ad90-1bb122a4d775 False
00:00:00.0214264 00:00:21.2579739 72 c8b47e2a-124c-4e85-a649-ee43c070029f True
00:00:00.0213442 00:00:21.2580963 71 6baa5a1d-3b90-4987-a15e-fce475baf607 True
00:00:00.0215609 00:00:21.2579599 73 ed96c994-ccc9-47fc-84fc-efb6e71147ca True
00:00:00.0217174 00:00:21.2579228 74 957de7e5-a666-4410-924a-6423fb234cb4 True
00:00:00.0219972 00:00:22.0892099 77 740c33a5-85a7-4c97-82dd-f7339679fcfe True
00:00:00.0222874 00:00:22.0890948 80 5eeb97b1-608c-4d3f-a5e6-5a11fee50156 True
00:00:00.0218242 00:00:22.0893555 75 6b26fd72-0c0f-4d67-866b-e16832d1a020 False
00:00:00.0220810 00:00:22.0891502 78 e064dc06-71ce-453c-b9a3-0250c7e1be2a True
00:00:00.0222042 00:00:22.0890623 79 3f18705a-c52d-4afd-b84f-eb0d6802d41a True
00:00:00.0219162 00:00:22.0892959 76 815f751b-ba0c-479d-ab8a-327ea3e078b0 True
00:00:00.0223719 00:00:22.9019644 81 53dea494-8023-442e-b35b-2886a087512a False
00:00:00.0228857 00:00:22.9017196 86 a482b6ee-2c2f-42ad-9d7e-e56846c14d61 True
00:00:00.0228039 00:00:22.9018486 85 7419a61b-42c5-40aa-810a-02b770e69378 True
00:00:00.0226841 00:00:22.9018725 84 94da8202-814b-4dd4-95c6-c0114e28bb33 True
00:00:00.0225875 00:00:22.9019097 83 7385c47a-7372-46ef-b9d0-c4d3078b6e71 True
00:00:00.0224887 00:00:22.9019135 82 b08a8c47-5721-48ad-82c1-6191b5c1e588 True
00:00:00.0229687 00:00:23.7220373 87 4712191e-622b-427f-9b78-712b3485800a False
00:00:00.0233975 00:00:23.7217751 92 8fa4dd6d-9501-4d3d-b7c9-84f20d2e31e0 True
00:00:00.0230647 00:00:23.7220062 88 df93b12f-ee15-4425-aa81-bfd9a2aaba5a True
00:00:00.0232251 00:00:23.7218647 90 d9408f86-55ee-4ac1-94f8-7a63b6aa2392 True
00:00:00.0231460 00:00:23.7219399 89 d859272d-c6c3-4792-b8ab-fee5b1b6d35d True
00:00:00.0233148 00:00:23.7218031 91 35e77023-4eec-4421-992e-06f984fb4eda True
00:00:00.0234788 00:00:25.0500933 93 a9a0b6a9-73f2-4c91-95af-09fbd3d946f5 False
00:00:00.0236017 00:00:25.0500057 94 012e06ab-1fd1-4611-b428-880e490747b4 True
00:00:00.0237120 00:00:25.0499216 95 9b44a56e-fa2d-4ac2-a625-bcf65871fb06 True
00:00:00.0237925 00:00:25.0498670 96 c1d8e999-c036-4bb4-b4e6-482804b05143 True
00:00:00.0239113 00:00:25.0498248 97 5e254da9-f6e0-480d-a7a1-113b050c6f27 True
00:00:00.0240025 00:00:25.0497718 98 56b40e9a-6034-4007-b285-23c96fe3293e True
00:00:00.0240937 00:00:25.8741867 99 09335432-62e5-4b66-bebd-24bc0d2831d2 False
00:00:29.9078228 100 connections opened in paralel


CONCURRENT NON-POOLED CONNECTIONS
ALL POOLS CLEARED
Start delay      OpenAsync time     Connection ID                        ReusedFromPool
00:00:00.0082260 00:00:00.8791363 1 8731f258-004c-497f-b2a4-25552e26d729 False
00:00:00.0350304 00:00:00.8652540 3 17f15f45-b019-429c-9d3f-8899fe1df403 False
00:00:00.0371347 00:00:00.8631625 5 4e49a0b4-c08b-4a32-ba41-4089b1bead79 False
00:00:00.0360206 00:00:00.8642774 4 e7465ab5-8c7f-4c0e-bcdd-d6013976c179 False
00:00:00.0000371 00:00:00.8998700 0 b5e8c122-4062-4355-82fd-eda3fae4e523 False
00:00:00.0200965 00:00:00.8982651 2 a8e63ae5-3dbe-4069-8a67-c2995a76fc2f False
00:00:00.0571568 00:00:00.8762416 6 271a82a0-9ddc-406f-a2a6-98e13a57b27b False
00:00:00.0582054 00:00:00.8899584 7 b6bfeb52-20e9-48ee-a563-1ff8c8cbaebd False
00:00:00.0684170 00:00:01.6434896 9 0dc9ede3-b4e9-417c-91c3-90b98823c7fa False
00:00:00.0688354 00:00:01.6937596 12 bae994c0-9125-4827-b77a-139d56f11f89 False
00:00:00.0680801 00:00:01.7104839 8 f3f6092c-4819-4c98-b612-c3b36acd9e8b False
00:00:00.0689648 00:00:01.7096002 13 1edda644-fc71-4b37-9199-187d305c2085 False
00:00:00.0691544 00:00:01.7206482 14 a887b006-c62c-4427-9bc0-d4ea12869c27 False
00:00:00.0693677 00:00:01.7278080 15 3beae8d2-1021-4a09-9bfc-84ece5f22f95 False
00:00:00.0685392 00:00:01.7524461 10 e4e1dc62-5da7-42c7-b672-a966f0e3834d False
00:00:00.0686659 00:00:01.8065612 11 72c8444f-b9df-4e0a-b06a-328ddc24c1d6 False
00:00:00.0697225 00:00:02.5167275 17 ee4c4888-269b-406e-8e81-2624b815c17f False
00:00:00.0702812 00:00:02.5473149 20 53c6083b-0211-4b0d-9d9e-906906281f1f False
00:00:00.0695300 00:00:02.5659749 16 abcec0f6-6385-47b9-b6b7-8c3d234b093d False
00:00:00.0704230 00:00:02.5745920 21 cc2c0bcc-bb38-4d84-a6b6-667780e4c7f2 False
00:00:00.0707054 00:00:02.5826280 22 88cb769e-b322-492b-a338-9c6e2039bf85 False
00:00:00.0698824 00:00:02.6065776 18 d7fdb271-d8e7-4a84-947e-ac95a81ff6f3 False
00:00:00.0708796 00:00:02.6286011 23 04c3acd2-51c1-49dd-8f68-414082606868 False
00:00:00.0700641 00:00:02.6421916 19 a6e22387-1ca4-4325-b6bd-1d032a760087 False
00:00:00.0715125 00:00:03.3698606 28 ef539a49-720d-48ba-8e70-6c046f61fd89 False
00:00:00.0716264 00:00:03.3854168 29 b2d50bb7-0aa8-42de-afca-18b5498eda9d False
00:00:00.0710162 00:00:03.4017447 24 80ed7be9-099e-458a-9842-c1136c30db0d False
00:00:00.0718106 00:00:03.4272883 30 e258160b-e0de-45cd-ba77-1b2af817328f False
00:00:00.0719921 00:00:03.4531244 31 ab0fd968-301f-4d97-9482-e9b831048b4d False
00:00:00.0712771 00:00:03.4631496 26 3fb9ac94-71bd-4bfa-9269-baa97eb91ee6 False
00:00:00.0714134 00:00:03.4958048 27 4707197c-bafa-4937-9564-79b568949fc9 False
00:00:00.0711336 00:00:04.0101721 25 e16b12ac-5601-41ee-9c04-2950a252ddc8 False
00:00:00.0728221 00:00:04.2175383 36 0c303f53-1535-41ee-bdfd-d153671c2ee4 False
00:00:00.0721636 00:00:04.2292704 32 f41fd3cc-7c54-4d90-8c16-a375c608c85d False
00:00:00.0730081 00:00:04.2558787 37 366255a8-220e-40f6-9839-5731de3a5c6d False
00:00:00.0731411 00:00:04.2677209 38 c1e20e7d-783c-4792-8287-b93c339f5ef3 False
00:00:00.0724948 00:00:04.2981276 34 ee8f7d85-e807-4eee-bea1-c01d9cecabfd False
00:00:00.0732596 00:00:04.3059342 39 ae67dc4d-93f0-4799-8074-b572ff5501fe False
00:00:00.0726470 00:00:04.3136311 35 e365336b-822d-4fc6-a610-35a26861d71a False
00:00:00.0722916 00:00:05.0003683 33 04899942-88f2-4338-a0fb-cbddc52cdd28 False
00:00:00.0733867 00:00:05.0761585 40 5b1f2a6f-416c-40a7-9f31-3ea4e8b5740c False
00:00:00.0737738 00:00:05.0860370 44 9ae397b5-33da-45bb-b14b-19f561696a09 False
00:00:00.0738788 00:00:05.1092445 45 22989d51-2269-44d8-9bf7-8a9acdb40b84 False
00:00:00.0735618 00:00:05.1281717 42 3f3b39b0-5dc3-4441-b713-20b0fc7d6ffe False
00:00:00.0740926 00:00:05.1407761 47 75329416-2c67-42d7-a277-7a1e18872811 False
00:00:00.0739899 00:00:05.2262708 46 0f7e6536-2b91-483e-a299-59da9f18a676 False
00:00:00.0736598 00:00:05.2556036 43 bb42232d-305c-4715-a6d0-2b205333de6c False
00:00:00.0734775 00:00:05.8277152 41 d8a03135-1c93-4a84-b5b2-6ebf102de91e False
00:00:00.0741782 00:00:05.8981467 48 feed50c7-9557-40c2-8fbd-d0b72628d42d False
00:00:00.0746467 00:00:05.9351707 52 7657291b-8495-4de1-a0df-678e7ffcef51 False
00:00:00.0747517 00:00:05.9628818 53 4238c7e7-18dc-4833-95f7-abbdc425baf3 False
00:00:00.0744323 00:00:05.9667434 50 270c0faa-78c3-45b0-9940-bcc38521ea68 False
00:00:00.0749549 00:00:05.9916126 55 c71185e0-df71-4433-a99b-8b8f2283b5f5 False
00:00:00.0748360 00:00:06.0382909 54 a34dc629-6580-42ec-a03c-6a66e5ffe449 False
00:00:00.0745356 00:00:06.1885920 51 9d1a89d0-c294-4fec-a515-718edb776bc2 False
00:00:00.0742898 00:00:06.6767112 49 6c46ab13-6e7c-4949-99cb-449ecc76c435 False
00:00:00.0750552 00:00:06.7550748 56 5ab576c2-48f3-4e31-a234-977bbb54315d False
00:00:00.0754456 00:00:06.7788956 60 24b4f30b-bae6-4a2d-aeed-9d440f988928 False
00:00:00.0755614 00:00:06.8270143 61 b4047183-4979-43d9-83c0-176e5b538e70 False
00:00:00.0752463 00:00:06.8366775 58 00f80cc6-c9b1-4b8b-a9dc-9db1b1c97210 False
00:00:00.0757391 00:00:06.8500023 63 1310b2bb-3ef1-48b7-8a24-c27b7ee63395 False
00:00:00.0756542 00:00:06.9057249 62 596a137e-df75-44bd-86db-34c999110b92 False
00:00:00.0753376 00:00:07.1438823 59 c875a1f0-9bc2-44b1-9dc5-2ba30469aa2d False
00:00:00.0751607 00:00:07.5257331 57 58356ba6-ea73-452d-aa44-f3dc607d8a6e False
00:00:00.0758308 00:00:07.6171905 64 9346588d-70fc-403c-a420-55d83ecd3cbd False
00:00:00.0762123 00:00:07.6339721 68 41f8d753-e182-4781-942f-b0485955e882 False
00:00:00.0762974 00:00:07.6729452 69 7144364b-7b02-4283-9d94-7704da06114b False
00:00:00.0760340 00:00:07.6812526 66 e47960ce-2aac-4ee8-8875-9540a34cbfb6 False
00:00:00.0764745 00:00:07.6915261 71 f5a7d10d-a935-4638-b9d2-65e653db11a8 False
00:00:00.0763879 00:00:07.7451479 70 d2cebd2b-fc1d-41c5-8a57-3eff6ced30ad False
00:00:00.0761178 00:00:07.9741511 67 fb2c0d36-2b56-4c31-9af6-8e1f04cc89b2 False
00:00:00.0759295 00:00:08.3640216 65 864ded55-c992-4375-a233-a95260fe75c3 False
00:00:00.0765942 00:00:08.4569460 72 debb7cad-6a0b-43c2-8433-b16f43cf50eb False
00:00:00.0769734 00:00:08.4853892 76 0e8365d0-c847-4266-ae52-3980f141f246 False
00:00:00.0767699 00:00:08.5296376 74 7cb528ae-fd19-468a-9cbe-ed41713d674e False
00:00:00.0772640 00:00:08.5375782 79 d3e56ddd-3157-40ff-85a7-e14694a565ee False
00:00:00.0771791 00:00:08.5838424 78 d886230f-d5e0-438e-88de-7f26cfa702bd False
00:00:00.0770874 00:00:08.6423799 77 473a7f6b-0250-41c1-a22e-ff45fff065c8 False
00:00:00.0768533 00:00:08.8495856 75 0b6836f4-72fb-4619-926a-ab34e5fb3f73 False
00:00:00.0766794 00:00:09.2128967 73 ee72ffeb-af13-4846-af89-bdb2cb827cd2 False
00:00:00.0774130 00:00:09.2726382 80 c84168b8-5f11-4b23-a390-6722cda4cac6 False
00:00:00.0777960 00:00:09.3322906 84 8e339bfd-b0b5-4291-b86a-0e3541d2368a False
00:00:00.0780933 00:00:09.3622697 87 3891adf7-4a86-43b5-bc78-e3b99da7b4c5 False
00:00:00.0776200 00:00:09.3807072 82 e43e999e-d88e-48e0-840e-72db230f8fb3 False
00:00:00.0780089 00:00:09.4039901 86 717eff11-d5d6-4f9e-813a-7ce5b3eda339 False
00:00:00.0778938 00:00:09.5096703 85 a0722782-3818-48d2-9c21-39af8f72d057 False
00:00:00.0777114 00:00:09.6978332 83 456a93a6-3cb7-4ba1-91ab-2990499f828d False
00:00:00.0775323 00:00:10.0408988 81 00dd42c3-f5ec-499a-b257-ca1bd4af9e27 False
00:00:00.0781787 00:00:10.1375742 88 d5a595e3-338b-434b-9950-1580b1fa3517 False
00:00:00.0785687 00:00:10.1639699 92 c79dd1d5-bc64-4a95-9eb9-661b2dcc481f False
00:00:00.0787588 00:00:10.2344570 94 17b13eda-16b0-4bea-8fc7-d437b9783093 False
00:00:00.0783596 00:00:10.3108077 90 3629a7f2-5db9-444c-8fd8-3fe4f2d528b0 False
00:00:00.0788632 00:00:10.3322536 95 243ce086-72f5-4232-8cd4-a2464464daf9 False
00:00:00.0786604 00:00:10.3564750 93 09edbf53-f941-432c-a1a3-ef1052c1a7e7 False
00:00:00.0784644 00:00:10.6122603 91 e7acdcfd-6224-4cc4-b106-0ce55486a017 False
00:00:00.0782738 00:00:10.9160100 89 eaaab7ec-48d0-418d-88ab-f9ddf03e62e2 False
00:00:00.0789701 00:00:10.9910167 96 f96aec13-5c90-4be2-a40c-ce3c470b39a0 False
00:00:00.0791587 00:00:11.1634095 98 a2116011-1a36-453b-8269-ccd09469a7ab False
00:00:00.0792433 00:00:11.4738357 99 23bb5ff1-46ab-4091-8d1f-fde2cff4fb78 False
00:00:00.0790698 00:00:11.7702779 97 64883b3c-9ae4-4d9b-99ea-bf4f5b8f0cd4 False
00:00:15.8613486 100 connections opened in paralel

Testing finished

@OQO
Copy link
Author

OQO commented Jun 13, 2020

@cheenamalhotra @OQO FWIW this serial opening of concurrent connections for the same pool, even in case of multithreaded scenario is intentional and by design of the Connection Pool.

From the past owners of SqlClient I understand that the idea was to not overwhelm a SQL server because of a flood of connections from the single Connection Pool which needs many connections as well as many clients trying to connect to a single SQL server when failover is in progress.
The recommendation is to use MinPoolSize in Connection String for connection pool and allow the pool to warm up to the MinPoolSize after the first connection attempt is made. This is something that can be tried. After that the connection pool will try to maintain MinPoolSize count of connections in the pool.

As of whether the behavior should be changed or not, it may be OK to relax the requirement for Azure SQL DBs, but it may be good to keep this behavior for on-prem SQL servers.
There are scenarios where a server fails over and multiple clients with connection pools are trying to login to the same server. In those cases, we may not want the server overwhelmed by a slew of connections coming in.

@saurabh500 Thank you for your comment on issue #408, I replied here to be able to refer to the code above. I do understand your point, there definitely are some valid concerns that warrant rate limiting connections.

But I don't think they fully apply for our case. The main problem for us is that we are opening connections over a high latency connection to SQL server (in this case SQL Azure with a round trip latency of over 100ms between SQL Azure and our web server). So the huge majority of the waiting is done due to the latency of the network connection and not due to any load on the SQL server. In our scenario opening a single connection takes nearly one second (see above).

This means that while our web server has been optimized to start (or restart) in less than 3 seconds, it takes several minutes till it is able to open sufficient connections to serve the users who have connected in the meantime. During this time only roughly a single connection per second is being opened!

Of course SQL Azure is happy to open easily 10-100 times more connections per second, as is quickly proven by connecting to SQL Azure from within the same datacenter.

Maybe the rate limiting should be on the number of connections opened, and should be less strict on the number of connections being in the process of being opened concurrently.

Note that even in non-pooled mode only 8 connections at a time seem to be getting opened so we did not find a good workaround. So in our scenario in non-pooled mode we can only open about 8 connections per second!

@OQO
Copy link
Author

OQO commented Jun 15, 2020

This is another case where a user is asking to create n (where n>30) connections instantly. Unless those are n connections to n different database servers it seems like something of an antipattern. What can a single process be doing where it's capable of feeding 50 queries at once?

@Wraith2 I saw your comment on #408. I answered it here to be able to relate to the example above.

In our application we have hundreds of users on a single server and we are able to feed 10-20 queries at once.

However, for our servers that are in locations with high network latency between SQL Azure and our web servers we need many more connections to keep the same throughput.

Let's assume a network round trip network latency of 2ms in a local data center and a query time of 2ms. So total time from the client perspective is 4ms. Let's now assume a location with a round trip network latency of 78ms. In this case the total time from the client perspective for the query is 80ms. So in order to achieve the same amount of queries executed per second on SQL server we need 20 times as many connections because the huge majority of the time is spent waiting due to network latency.

So in order to get the same throughput in high latency environments one needs a significant larger number of connections between the application server and database server. So where in a low latency environment 10 connections might suffice, one might need 200 connections in a higher latency environment to achieve the same throughput. (Note that these number of open connections do definitely not overwhelm SQL Azure.)

So that is why it would be great: 1. open new connections at a much faster rate. 2. open the connection in a truly async way (no thread blocking).

@JRahnama
Copy link
Contributor

@OQO I was able to reproduce the issue. I took the 4 seconds task delay out and replace it with actual query the result was better after, but as you mentioned the results of comparison between pooled connections and non pooled ones are not as we expect them to be. We will look into the issue and will response back soon.

@JRahnama
Copy link
Contributor

JRahnama commented Jul 8, 2020

@OQO I have a question to understand better,

Why are we clearing the pool each time?
image

clearing the pool forces the driver to create another pool each time and that is time consuming. If we only clear the pool before the non pooled connection as below:
image

this will make a huge difference in re using pooled connection.

Is there any specific reason behind clearing the pool?

@OQO
Copy link
Author

OQO commented Jul 8, 2020

In this test we want to compare two scenarios of opening multiple connections with high latency network at application startup: 1. pooled and 2. non-pooled.

In order to compare these two scenarios we reset the connection pools between the experiments so that we can have a fair comparison. If you would like you could just make separate executables to run the two different scenarios (Results will be the same)

The WARM UP scenario just serves to measure a baseline so that we know how long it takes to set up a single connection. Opening a single connection is done multiple times, since the first time is always slower due to the loading of the required assemblies etc. doing it then twice more shows a stable time so that proves that our measurements are no longer influenced by loading assemblies etc.

Hope this clarifies it!

@JRahnama
Copy link
Contributor

JRahnama commented Jul 8, 2020

@OQO I made some changes and it seems the code runs fine. Can you kindly make below changes and test your application again:

  1. Add a MinPoolSize of 100 to your connection string. If you add any number greater than 100 you need to change
    MaxPoolSize as well. Its default value is 100 and cannot be less than the value of MinPoolSize.
  2. For pooled connection please add Pooling=true to your connection string and Pooling=false to your non-pooled connections
  3. Replace the Task.Delay(4000) with an actual sqlcommand. I believe this is the main reason of the issue. Each call is waiting for four seconds and no pool is created in that period of time. Each connection tries to create a new pool and causes the delay.

Here is the result I got with this run:

WARM UP
ALL POOLS CLEARED
00:00:00.5658719 8d64ca61-e6c0-40bc-bfff-e4065c04441a False Single open time
00:00:00.0002643 8d64ca61-e6c0-40bc-bfff-e4065c04441a True Single open time with one previously opened connection
ALL POOLS CLEARED
00:00:00.2208180 159fd79a-fa22-49d1-9221-ecbaab3aa1e7 False Single open time
00:00:00.0000798 159fd79a-fa22-49d1-9221-ecbaab3aa1e7 True Single open time with one previously opened connection
ALL POOLS CLEARED
00:00:00.2175882 ee30b57f-4d74-44e2-a646-3b91fb907a85 False Single open time
00:00:00.0000638 ee30b57f-4d74-44e2-a646-3b91fb907a85 True Single open time with one previously opened connection


CONCURRENT POOLED CONNECTIONS
ALL POOLS CLEARED
Start delay      OpenAsync time     Connection ID                        ReusedFromPool
00:00:00.0009645 00:00:00.2125541 0 a44afca6-84ee-4739-8257-841eb336eeac False
00:00:00.0017354 00:00:00.2848127 1 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0018036 00:00:00.3237833 2 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0018484 00:00:00.3504944 3 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0018931 00:00:00.3738208 4 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0019455 00:00:00.3961843 5 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0019828 00:00:00.4185117 6 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0020194 00:00:00.4248015 7 970418af-0377-4baa-9e43-8fbb7ec057db False
00:00:00.0020619 00:00:00.4423684 8 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0020953 00:00:00.4667225 9 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0021357 00:00:00.4712881 10 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0021622 00:00:00.4897947 11 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0022053 00:00:00.4959876 12 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0022395 00:00:00.5156536 13 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0022780 00:00:00.5196160 14 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0023019 00:00:00.5383633 15 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0023279 00:00:00.5430932 16 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0023658 00:00:00.5626749 17 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0023884 00:00:00.5665512 18 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0024116 00:00:00.5885240 19 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0024405 00:00:00.5936663 20 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0024678 00:00:00.6114261 21 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0024879 00:00:00.6146811 22 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0025107 00:00:00.6344081 23 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0025318 00:00:00.6348700 24 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0025567 00:00:00.6463720 25 d0132087-2ff8-47d0-a4ea-c9637237352e False
00:00:00.0025823 00:00:00.6579774 26 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0026027 00:00:00.6583110 27 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0026106 00:00:00.6817343 28 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0026181 00:00:00.6817323 29 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0026230 00:00:00.6969113 30 d0132087-2ff8-47d0-a4ea-c9637237352e True
00:00:00.0026322 00:00:00.7081931 32 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0026276 00:00:00.7081975 31 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0026390 00:00:00.7218787 33 d0132087-2ff8-47d0-a4ea-c9637237352e True
00:00:00.0026626 00:00:00.7320723 34 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0026719 00:00:00.7323766 35 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0026768 00:00:00.7468168 36 d0132087-2ff8-47d0-a4ea-c9637237352e True
00:00:00.0026813 00:00:00.7527362 37 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0026898 00:00:00.7527289 38 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0026948 00:00:00.7720170 39 d0132087-2ff8-47d0-a4ea-c9637237352e True
00:00:00.0026995 00:00:00.7720194 40 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0027043 00:00:00.7769340 41 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0027124 00:00:00.7910900 42 d0132087-2ff8-47d0-a4ea-c9637237352e True
00:00:00.0027169 00:00:00.7969304 43 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0027212 00:00:00.8008876 44 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0027285 00:00:00.8155257 45 d0132087-2ff8-47d0-a4ea-c9637237352e True
00:00:00.0027334 00:00:00.8204605 46 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0027379 00:00:00.8252616 47 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0027448 00:00:00.8348439 48 d0132087-2ff8-47d0-a4ea-c9637237352e True
00:00:00.0027497 00:00:00.8453721 49 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0027543 00:00:00.8456617 50 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0027639 00:00:00.8542243 51 d0132087-2ff8-47d0-a4ea-c9637237352e True
00:00:00.0027686 00:00:00.8685864 52 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0027730 00:00:00.8896550 53 8d19776e-66f4-4c27-bb2d-1ae783aa2298 False
00:00:00.0027797 00:00:00.8902260 54 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0027844 00:00:00.8902921 55 d0132087-2ff8-47d0-a4ea-c9637237352e True
00:00:00.0027889 00:00:00.8931239 56 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0027945 00:00:00.9141225 57 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0028015 00:00:00.9181820 58 d0132087-2ff8-47d0-a4ea-c9637237352e True
00:00:00.0028059 00:00:00.9238160 59 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0028102 00:00:00.9365150 60 8d19776e-66f4-4c27-bb2d-1ae783aa2298 True
00:00:00.0028169 00:00:00.9369906 61 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0028213 00:00:00.9413880 62 d0132087-2ff8-47d0-a4ea-c9637237352e True
00:00:00.0028257 00:00:00.9464264 63 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0028345 00:00:00.9601610 64 8d19776e-66f4-4c27-bb2d-1ae783aa2298 True
00:00:00.0028392 00:00:00.9603352 65 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0028435 00:00:00.9643761 66 d0132087-2ff8-47d0-a4ea-c9637237352e True
00:00:00.0028501 00:00:00.9696277 67 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0028547 00:00:00.9835216 68 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0028592 00:00:00.9836422 69 8d19776e-66f4-4c27-bb2d-1ae783aa2298 True
00:00:00.0028636 00:00:00.9877402 70 d0132087-2ff8-47d0-a4ea-c9637237352e True
00:00:00.0028701 00:00:00.9932701 71 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0028744 00:00:01.0080811 72 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0028790 00:00:01.0080823 73 8d19776e-66f4-4c27-bb2d-1ae783aa2298 True
00:00:00.0028862 00:00:01.0129749 74 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0028908 00:00:01.0129850 75 d0132087-2ff8-47d0-a4ea-c9637237352e True
00:00:00.0028951 00:00:01.0315222 76 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0029020 00:00:01.0388836 77 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0029066 00:00:01.0389158 78 d0132087-2ff8-47d0-a4ea-c9637237352e True
00:00:00.0029109 00:00:01.0389803 79 8d19776e-66f4-4c27-bb2d-1ae783aa2298 True
00:00:00.0029177 00:00:01.0549180 80 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0029221 00:00:01.0601843 81 8d19776e-66f4-4c27-bb2d-1ae783aa2298 True
00:00:00.0029264 00:00:01.0601964 82 d0132087-2ff8-47d0-a4ea-c9637237352e True
00:00:00.0029329 00:00:01.0604757 83 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0029376 00:00:01.0782453 84 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0029421 00:00:01.0840301 85 d0132087-2ff8-47d0-a4ea-c9637237352e True
00:00:00.0029465 00:00:01.0841778 86 8d19776e-66f4-4c27-bb2d-1ae783aa2298 True
00:00:00.0029532 00:00:01.0842424 87 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0029576 00:00:01.1016888 88 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0029618 00:00:01.1079855 89 8d19776e-66f4-4c27-bb2d-1ae783aa2298 True
00:00:00.0029703 00:00:01.1080267 90 d0132087-2ff8-47d0-a4ea-c9637237352e True
00:00:00.0029747 00:00:01.1083378 91 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0029792 00:00:01.1098207 92 43aeffaa-e3e7-4043-b447-bb25dc788010 False
00:00:00.0029887 00:00:01.1296268 93 a44afca6-84ee-4739-8257-841eb336eeac True
00:00:00.0029936 00:00:01.1309511 94 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0029980 00:00:01.1309694 95 d0132087-2ff8-47d0-a4ea-c9637237352e True
00:00:00.0030047 00:00:01.1311000 96 8d19776e-66f4-4c27-bb2d-1ae783aa2298 True
00:00:00.0030112 00:00:01.1599673 97 43aeffaa-e3e7-4043-b447-bb25dc788010 True
00:00:00.0030199 00:00:01.1603473 98 970418af-0377-4baa-9e43-8fbb7ec057db True
00:00:00.0030275 00:00:01.1603798 99 d0132087-2ff8-47d0-a4ea-c9637237352e True
**00:00:01.1877216 100 connections opened in paralel**


CONCURRENT NON-POOLED CONNECTIONS
ALL POOLS CLEARED
Start delay      OpenAsync time     Connection ID                        ReusedFromPool
00:00:00.0015674 00:00:00.2354028 5 d92f7623-d11c-4756-8b61-0c711fdf4975 False
00:00:00.0009907 00:00:00.2382715 1 bb7b5599-60ce-4033-8111-1861ff79d672 False
00:00:00.0020465 00:00:00.2372214 7 003ab253-e0e5-43e1-bccf-8c5133363531 False
00:00:00.0013089 00:00:00.2538747 3 e6df8b89-442b-4556-bbaf-0c7ccb1499eb False
00:00:00.0000114 00:00:00.2581702 0 ebc4b086-dafb-40ef-a887-318523adab84 False
00:00:00.0014228 00:00:00.2552560 4 82eabb5f-bd9e-4e9b-8a76-3b570d3bc650 False
00:00:00.0011820 00:00:00.2571401 2 96e79b10-b3e8-4c6c-b553-554f45618091 False
00:00:00.0018006 00:00:00.2593064 6 45306a7c-5b39-43dd-b324-6b3264e23c04 False
00:00:00.0023857 00:00:00.4769152 15 06c3a19e-3b77-48f7-9faa-760b3dbcff63 False
00:00:00.0023181 00:00:00.4769820 9 1a863657-57c9-4e69-a8af-538ae328530f False
00:00:00.0023703 00:00:00.4769531 13 a6a6b455-d98a-4dad-bd19-76485b524545 False
00:00:00.0023606 00:00:00.4940845 12 0507227b-0329-4dc6-a06c-185deabce77d False
00:00:00.0022457 00:00:00.5097288 8 1bb0f6cc-cbad-4efc-9e3f-0e01d16c308f False
00:00:00.0023550 00:00:00.5096994 11 0b02b77e-f986-42b6-8426-e5b9303b53f2 False
00:00:00.0023755 00:00:00.5096468 14 1936a29d-f247-47bb-8035-831a106b3c7b False
00:00:00.0023310 00:00:00.5096935 10 a7f258af-f472-4be0-ade0-d62575f4aae0 False
00:00:00.0024017 00:00:00.7236330 17 24a34a68-101d-4303-ae33-d14b97838c24 False
00:00:00.0024706 00:00:00.7235693 23 d453b0df-1c49-49ef-b161-a5111c4011b4 False
00:00:00.0024483 00:00:00.7305378 21 b51018ac-adfc-4c61-91a6-d916bea8e527 False
00:00:00.0024390 00:00:00.7305450 20 6b423024-a9fb-4f11-bcfb-f569358a0be4 False
00:00:00.0024614 00:00:00.7351283 22 d865acf7-6ebf-4812-86fc-c19dd9d41762 False
00:00:00.0023940 00:00:00.7487760 16 28e4e049-d093-40ba-8da7-ee02725cc422 False
00:00:00.0024251 00:00:00.7702248 19 2a428dba-8ee1-416a-aeaa-70aa026baf0b False
00:00:00.0024160 00:00:00.7702481 18 61bc0db2-56a1-491b-b2cd-cc3ab51a76e7 False
00:00:00.0024939 00:00:00.9610591 25 78c8060c-29cb-4acc-bd3f-b3c547268b10 False
00:00:00.0024792 00:00:00.9974713 24 eb253aa2-fea8-4de7-8814-c2d408eb8c69 False
00:00:00.0025433 00:00:00.9974117 29 5ae87421-2cd9-4fd6-9fba-1be5bf315098 False
00:00:00.0025530 00:00:00.9974126 30 4739909f-6746-43df-8fbe-5b37b5390fa7 False
00:00:00.0025031 00:00:01.0001629 26 ef72c499-e79c-45fe-9f26-5e1d240595c4 False
00:00:00.0025615 00:00:01.0001019 31 45336018-44f6-4504-9199-9f93188d7b93 False
00:00:00.0025252 00:00:01.0001413 28 9de54941-ae39-4feb-ad28-7f4a8999b3de False
00:00:00.0025165 00:00:01.0103940 27 a6ab2921-f1c7-4bfe-aceb-2d799f3137bc False
00:00:00.0045902 00:00:01.1909512 33 79aa6621-c586-4061-8235-dc7ac7ebe6d6 False
00:00:00.0025758 00:00:01.2261350 32 a6165f5f-c104-4e5d-9104-03813f720780 False
00:00:00.0046033 00:00:01.2303122 34 6d9e3125-ffac-4ab5-a354-de331ce9e977 False
00:00:00.0046173 00:00:01.2363577 38 72da110e-56b5-40e4-8afe-07819566ea16 False
00:00:00.0046107 00:00:01.2395185 36 2cc8323c-c257-4fc2-bf40-56369e988bfa False
00:00:00.0046137 00:00:01.2396061 37 6f0ae447-c035-4942-bf9a-38e57aa4b5dc False
00:00:00.0046075 00:00:01.2396379 35 bb667ac4-28c1-4de6-ab41-d0af57e1007b False
00:00:00.0046203 00:00:01.2413661 39 7cc3c4a6-fcf8-4bab-b782-49ab3df03b01 False
00:00:00.0046382 00:00:01.4420154 41 e437847b-3b5d-497c-b507-a7f567974cb4 False
00:00:00.0046452 00:00:01.4873462 43 9c15de44-8527-49e5-aaa9-20025f96a029 False
00:00:00.0046545 00:00:01.5010750 46 786d5cee-c6c3-4973-97b5-19c0fed29ce7 False
00:00:00.0046516 00:00:01.5010777 45 c38c2445-7440-4e91-bb13-70cdd803f097 False
00:00:00.0046578 00:00:01.5203254 47 894a1316-4c6a-4850-9c2f-adf3f0537022 False
00:00:00.0046348 00:00:01.5203843 40 30a9d3a6-8e8d-4581-a642-e3f14381f848 False
00:00:00.0046482 00:00:01.5203790 44 3944eee8-e15b-4e88-a34e-720a051ffeeb False
00:00:00.0046416 00:00:01.5204084 42 4eb494d6-ee6c-42d5-b757-956f4f2ca738 False
00:00:00.0046640 00:00:01.6672600 49 4399663a-b8d8-4f57-a459-7a35b7461058 False
00:00:00.0046761 00:00:01.7286690 53 f2fb7fe6-e48d-447c-8c03-e01d35f984aa False
00:00:00.0046699 00:00:01.7331905 51 8fd6f99f-a940-49aa-bdce-3cdcb06afc9a False
00:00:00.0046799 00:00:01.7425553 54 061b8f50-1982-48a0-85b2-dab312239720 False
00:00:00.0046608 00:00:01.7425760 48 696583b9-32ad-4cc0-b81e-dc05554f9082 False
00:00:00.0046670 00:00:01.7573964 50 7752cef0-e845-40d1-8156-153f606c7e22 False
00:00:00.0046731 00:00:01.7574359 52 9d7f1003-1219-410b-af8a-2819fe17524e False
00:00:00.0046827 00:00:01.7590414 55 9d662432-076c-4bed-9d44-caf64730b69b False
00:00:00.0046890 00:00:01.9044821 57 294bf930-b718-4f1e-9ddd-5fe27c5c1755 False
00:00:00.0046861 00:00:01.9502193 56 61eb9896-0866-49bc-854b-72b00b26ab12 False
00:00:00.0046955 00:00:01.9502205 59 9169839e-798d-4d59-bf1a-90d638284d8a False
00:00:00.0047021 00:00:01.9548800 61 342f55be-db4c-4536-b06b-0312e7e20e42 False
00:00:00.0046987 00:00:01.9880930 60 acbbe8a4-1198-4f0a-b6b1-72495979c0d9 False
00:00:00.0047051 00:00:01.9894790 62 8dde242e-7374-43ff-bc38-78ba0ae81b51 False
00:00:00.0046921 00:00:01.9894926 58 37304eca-cfdb-4632-8a41-19e4ee1e7d1e False
00:00:00.0047084 00:00:02.0105511 63 ef2c4809-e7fc-4519-bef1-df956ed90d5c False
00:00:00.0047145 00:00:02.1298016 65 0ccb1fa8-ff69-4aa0-9f8e-858b966def6a False
00:00:00.0047270 00:00:02.1762375 69 864af172-f2a2-451e-9577-8f7f65abda60 False
00:00:00.0047207 00:00:02.1762405 67 4ca92397-9645-4ded-abdf-d7df0948b993 False
00:00:00.0047115 00:00:02.1781056 64 5ef06898-fc92-4e16-a71b-09098544cb15 False
00:00:00.0047241 00:00:02.2222019 68 5746a3aa-905d-478d-8356-3eaa1434e5c9 False
00:00:00.0047178 00:00:02.2246495 66 cabaefb4-cdfd-4a97-a8cd-25c12ecb54fd False
00:00:00.0047334 00:00:02.2247876 70 5d8234bb-2512-492c-a73b-d671a9c5defe False
00:00:00.0047366 00:00:02.2444297 71 a22fc356-c720-4e19-bbeb-eff4b509c688 False
00:00:00.0047430 00:00:02.3842036 73 66c58acf-3d8b-420e-8be5-890ebfee31eb False
00:00:00.0047395 00:00:02.4170934 72 6226c851-0d7e-46ec-8c70-59b28cd12d84 False
00:00:00.0047492 00:00:02.4170869 75 76ab8982-54e3-47c6-9d8f-3696f7af240b False
00:00:00.0047556 00:00:02.4299743 77 c82e3e18-b172-4ba4-ac52-9d31e976817a False
00:00:00.0047522 00:00:02.4900592 76 b5cea1e3-e446-438d-9814-a4434e568724 False
00:00:00.0047459 00:00:02.4900696 74 da3ca4b5-4ee7-4450-b4e9-bbf9968cba1f False
00:00:00.0047587 00:00:02.4943586 78 c39960cd-9fc8-4e18-a8cc-7bc6e3da27b9 False
00:00:00.0047616 00:00:02.4961259 79 abcc79da-8323-4a61-adae-860045aaee78 False
00:00:00.0047684 00:00:02.6059913 81 2b45005c-c76a-43a3-af8c-2cdf85169388 False
00:00:00.0047654 00:00:02.6463052 80 6120f7f6-c536-472a-b22d-75c7517d3dfb False
00:00:00.0047748 00:00:02.6462959 83 35514aa4-7904-4983-93d0-92a99234320a False
00:00:00.0047812 00:00:02.6500396 85 bdb0ea1e-cf3b-40c5-82ee-086f2d37ecbd False
00:00:00.0047718 00:00:02.7420578 82 373b4495-f496-4e0f-95a5-331c5491826f False
00:00:00.0047843 00:00:02.7420438 86 aec4b195-6bc8-46e0-9803-894d31aafbe1 False
00:00:00.0047878 00:00:02.7424678 87 97678582-f555-49fa-8e8c-f8b7613c4b0b False
00:00:00.0047781 00:00:02.7424993 84 7c4225df-8595-4415-88e3-e31adb2136dc False
00:00:00.0047953 00:00:02.8467297 89 cad5d133-d07d-44f8-871b-1bfdfe1f1d3e False
00:00:00.0048016 00:00:02.9017044 91 71852685-b78f-4164-8163-e6d081761432 False
00:00:00.0047908 00:00:02.9194366 88 a4762e2d-28c5-46f1-a92d-96f1b7493c65 False
00:00:00.0048078 00:00:02.9379883 93 5088e61a-02e7-460a-873a-004be744f13a False
00:00:00.0048108 00:00:02.9667535 94 ef8481e9-cf5b-42b3-9c04-f6556827e8a9 False
00:00:00.0047982 00:00:02.9667635 90 cba61b80-9aea-44d7-901f-0b872fd3d83f False
00:00:00.0048045 00:00:02.9927552 92 4b9ae25c-68d3-4b4e-b83a-cba66dd1120b False
00:00:00.0048137 00:00:02.9927616 95 47958f45-b577-475d-8091-a2f7a397a9f4 False
00:00:00.0048201 00:00:03.0725529 97 84671af9-a0bc-4ef0-b2e9-81db210336d6 False
00:00:00.0048173 00:00:03.1463661 96 173161e1-c3bc-4c10-b5b5-eea77e25d6b2 False
00:00:00.0048284 00:00:03.1463578 99 a8e0b072-1946-41c5-af7a-7ff1e2b1d12f False
00:00:00.0048256 00:00:03.1864995 98 7fe96a38-4865-493d-8534-a77a013c8cc7 False
**00:00:03.2365981 100 connections opened in paralel**

Testing finished

@OQO
Copy link
Author

OQO commented Jul 14, 2020

This is the updated version of the test program based on your feedback. Min and max pool sizes have been set. And a query is executed on the database. The query consists of a wait so that we can reliably test the connection issues. Tests are run with query time between .5 seconds and 3.5 seconds.

Please set lowLatencyConnectionString to a database close to you, and set highLatencyConnectionString to a database far away. In the experiments run below opening a single connection to the close Azure database takes around 200ms, and opening a single new connection to an Azure database further away takes a little over 1000ms.

Note that in your last experiment above you are connecting to a database over a network connection with a low latency. If you are in the USA, try connecting to a database in Asia East for example to check the high latency scenario.

using Microsoft.Data.SqlClient;
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;
using System.Transactions;

namespace ConsoleExperiments
{
    class Program
    {

        static void Main(string[] args)
        {
            TestConnections().Wait();
        }

        static async Task TestConnections()
        {
            // Low latency connection
            string lowLatencyConnectionString = "Server=tcp:XXXX.database.windows.net,1433;Initial Catalog=XXXXs;Persist Security Info=False;User ID=XXXX;Password=XXXX!;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;";
            // High latency connection
            string highLatencyConnectionString = "Server=tcp:XXXX.database.windows.net,1433;Initial Catalog=XXXX;Persist Security Info=False;User ID=XXXXr;Password=XXXX;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;";


            string[] connectionStrings = new string[] { lowLatencyConnectionString, highLatencyConnectionString };

            string connectionType = "LowLatency";
            string csv = "";
            foreach (string connectionString in connectionStrings)
            {
                TimeSpan queryTime = TimeSpan.FromSeconds(0.5);
                for (int i = 0; i < 7; i++)
                {
                    (TimeSpan pooledTime, TimeSpan nonPooledTime) = await SingleRun(connectionString, queryTime);
                    csv += $"{connectionType},{queryTime.TotalSeconds},{pooledTime.TotalSeconds},{nonPooledTime.TotalSeconds}{Environment.NewLine}";
                    queryTime += TimeSpan.FromSeconds(0.5);
                }
                connectionType = "HighLatency";
            }
            File.WriteAllText("ConnectionTest.csv", csv);
            Console.WriteLine("\nTesting finished");
            Console.ReadLine();
        }

        private static async Task<(TimeSpan pooledTime, TimeSpan nonPooledTime)> SingleRun(string connectionString, TimeSpan queryTime)
        {
            connectionString += "Min Pool Size=200;Max Pool Size=500;";
            Console.WriteLine("WARM UP");
            await MeasureSingleConnectionAndReuse(connectionString);
            ClearPools();

            await MeasureSingleConnectionAndReuse(connectionString);
            ClearPools();

            await MeasureSingleConnectionAndReuse(connectionString);
            ClearPools();

            Console.WriteLine("\n\nCONCURRENT POOLED CONNECTIONS");
            TimeSpan pooledTime = MeasureParallelConnections(connectionString + "Pooling=true;", queryTime);
            ClearPools();

            Console.WriteLine("\n\nCONCURRENT NON-POOLED CONNECTIONS");
            TimeSpan nonPooledTime = MeasureParallelConnections(connectionString + "Pooling=false;", queryTime);
            ClearPools();
            return (pooledTime, nonPooledTime);
        }

        private static void ClearPools()
        {
            SqlConnection.ClearAllPools();
            Console.WriteLine("ALL POOLS CLEARED");
        }
        static ConcurrentDictionary<Guid, object> _connectionIDs = new ConcurrentDictionary<Guid, object>();
        private static TimeSpan MeasureParallelConnections(string connectionString, TimeSpan queryTime)
        {
            Console.WriteLine("Start delay      OpenAsync time     Connection ID                        ReusedFromPool");
            Stopwatch sw = new Stopwatch();
            sw.Start();
            int numOpens = 100;
            Task[] tasks = new Task[numOpens];
            Stopwatch start = new Stopwatch();
            start.Start();
            for (int i = 0; i < numOpens; i++)
            {
                tasks[i] = MeasureSingleConnection(i, start, connectionString, queryTime);
            }
            Task.WaitAll(tasks);
            start.Stop();
            Console.WriteLine($"{sw.Elapsed} {numOpens} connections opened in paralel");
            return start.Elapsed;
        }

        private static async Task MeasureSingleConnection(int index, Stopwatch start, string connectionString, TimeSpan queryTime)
        {
            TimeSpan startDelay = start.Elapsed;
            Stopwatch sw = new Stopwatch();
            using (SqlConnection connection = new SqlConnection(connectionString))
            {
                sw.Start();
                await connection.OpenAsync();
                Console.WriteLine($"{startDelay} {sw.Elapsed} {index} {connection.ClientConnectionId} {IsReuse(connection)}");
                //await Task.Delay(4000);
                ExecuteQuery(connection, queryTime);
            }
        }

        private static async Task MeasureSingleConnectionAndReuse(string connectionString)
        {
            Stopwatch sw = new Stopwatch();
            using (SqlConnection connection = new SqlConnection(connectionString))
            {
                sw.Start();
                await connection.OpenAsync();
                Console.WriteLine($"{sw.Elapsed} {connection.ClientConnectionId} {IsReuse(connection)} Single open time ");
            }
            using (SqlConnection connection = new SqlConnection(connectionString))
            {
                sw.Restart();
                await connection.OpenAsync();
                Console.WriteLine($"{sw.Elapsed} {connection.ClientConnectionId} {IsReuse(connection)} Single open time with one previously opened connection");
            }
        }

        private static bool IsReuse(SqlConnection connection)
        {
            return !_connectionIDs.TryAdd(connection.ClientConnectionId, null);
        }

        private static void ExecuteQuery(SqlConnection connection, TimeSpan queryTime)
        {
            SqlCommand command = connection.CreateCommand();
            command.CommandText = $"WAITFOR DELAY '{queryTime:hh\\:mm\\:ss\\:fff}';";
            command.ExecuteNonQuery();
        }
    }
}

@OQO
Copy link
Author

OQO commented Jul 14, 2020

Here are the results of a single run (Please run it multiple times to get smoother curves). (Low latency network connection opening of single connection is around 200ms and High latency network connection opening of single connection is around 1000ms)

It still shows that the non pooled approach is still faster in almost all scenarios. Only opening one connection at a time when in pooled mode when connecting to a database over a high latency connection is shown to be significantly slower than using the non pooled mode!

As a separate second issue note also that in both modes the opening of connections is done by blocking threads, and not by awaiting the network traffic! This needlessly blocks threads instead of awaiting them internally. Let me know if you would like me to file a separate issue for this.

image
image

@JRahnama
Copy link
Contributor

JRahnama commented Jul 14, 2020

@OQO Thanks for the update. Just a quick test recommendation:
By design, when pooling is enabled the best practice is to create one single connection first. That will create the pool and after go for the rest. By doing this you create a pool based on MinPoolSize first and then it goes through all other connections. we call this pool warm up. Can you try this scenario please
Thanks.

@JRahnama
Copy link
Contributor

JRahnama commented Jul 15, 2020

There is another interesting issue in here. This only happens on delays. Right now the server Waits for the queryTime. When I change the query to a buffer, for example

 var buffer = new object[100];
command.CommandText = $"SELECT * FROM <tableeName>";
using SqlDataReader reader = command.ExecuteReader();
            while (reader.Read())
            {
                reader.GetValues(buffer);
            }

code runs again as expected. I will look into the part that a single connection is made and pool is created properly, but a clarifying question, we have to put the delay either in the code task or in the server?

I will take a deeper look tomorrow.

@OQO
Copy link
Author

OQO commented Jul 15, 2020

You can also change the query time to 50 milliseconds or similar value. This will lead to the same results as the query you are putting in instead of waiting. I prefer to put wait times in so that we get more reliable query times during testing.

The problem observed in the graphs only occur if we keep the connections busy. If the connections are only busy with queries for very short amount of times, then they can be reused more quickly, and there will be no problems. However, in many real life scenarios (Like in ours)queries take more than 500ms to complete.

@mheskamp
Copy link

mheskamp commented Nov 10, 2021

Thanks to @OQO for raising this issue and the comment

As a separate second issue note also that in both modes the opening of connections is done by blocking threads, and not by awaiting the network traffic! This needlessly blocks threads instead of awaiting them internally. Let me know if you would like me to file a separate issue for this.

We ran into an issue in production when switching from System.Data.SqlClient to Microsoft.Data.SqlClient with upgrading to netcore3.1 where we had some sync over async code that we have not had the time to switch over completely yet (bad I know). The end result was that when a new server came up under load, a whole bunch of threads got blocked waiting on the connections to be created for the pool and then our the threads consumed continued to increase on our servers and never recovered (partly also because our sync over async increased the threads blocked). We are working on async all the way, but even as we're making progress can still see this issue. We used the dotnet counters to look at the sql connections and the active connections ended up pegged at the max and not freed up. The retrieved/released number dropped really low when we saw this problem as well.

Thanks to @JRahnama for the suggestion to warm up the connection pools with the MinPoolSize - that resolved this issue for us. We're now testing out what those numbers should be in our multi-tenant environment and how it affects our SQL servers.

erikbra added a commit to erikbra/grate that referenced this issue Dec 17, 2021
* Turned OFF connection pooling on SQL Server connection string for tests
  - Found several discussions on this, e.g. dotnet/SqlClient#601
  - Tried turning off pooling, and it solved the problem...
* Upgraded SQL server docker image used in test from 2017-latest to
  latest (2019 at the time) - didn't make any difference, just to stay
  current.
erikbra added a commit to erikbra/grate that referenced this issue Dec 17, 2021
* Turned OFF connection pooling on SQL Server connection string for tests
  - Found several discussions on this, e.g. dotnet/SqlClient#601
  - Tried turning off pooling, and it solved the problem...
* Upgraded SQL server docker image used in test from 2017-latest to
  latest (2019 at the time) - didn't make any difference, just to stay
  current.
erikbra added a commit to erikbra/grate that referenced this issue Dec 17, 2021
* Turned OFF connection pooling on SQL Server connection string for tests
  - Found several discussions on this, e.g. dotnet/SqlClient#601
  - Tried turning off pooling, and it solved the problem...
* Upgraded SQL server docker image used in test from 2017-latest to
  latest (2019 at the time) - didn't make any difference, just to stay
  current.
@benrr101
Copy link
Contributor

benrr101 commented Mar 6, 2024

It looks like this bug might be significantly improved by #2384 ?

@OQO
Copy link
Author

OQO commented Mar 7, 2024

@benrr101 It definitely sounds very promising to resolve the blocking part of async opening of a connection, Looking forward to test it once it is available.

Note that the changes in #2384 will not solve the other core issue of the connection pool being too conservative in opening new connections for high latency connection scenarios between client and SQL server.

@saurabh500
Copy link
Contributor

saurabh500 commented May 18, 2024

I am putting some debugging notes here.

I looked into the code for DbConnectionPool for async scenarios. I put a bunch of stopwatches inside the code to understand what is going on

I had set out to get the following answers:

  1. Is SqlClient connectionPool code blocking? i.e. calling connection.OpenAsync() should take no time at all. However await connection.OpenAsync() should take time since it requires TCP connection and then requires pre-login and login to happen.
  2. What is the significant bottleneck inside SqlClient where the time is taken to open the connection esp in async scenarios.

Before I explain my findings so far, here is how the connection establishment part of connection pool works

  1. If a new connection is needed one connection is established at a time, no matter sync or async.

  2. When connection is requested from the pool, we wait on any of the 3 semaphores to be signaled. Creation/Pool/Error. Creation semaphore handle means that there is no available connection in the pool, so the pool goes ahead and create a new one. Pool semaphore means that we have connections in the pool, retrieve and reuse. Error semaphore means that there was an error that was occurred while establishing a connection.

  3. When a sync request arrives, we wait for any of these semaphores to be triggered. Hence multiple sync requests will wait on these semaphores and will get their chance at new connection or reusing a connection. The wait on semaphore is equivalent to the timeout for the connection.

  4. In case of an async connection, we wait for 0 millis on these semaphores, so that the connection.OpenAsync is not a blocking call. If there is an available connection in the pool, the connection is returned immediately, else we create a queue for these requests for connections. A new thread is spun up, which processes the request for the objects in this queue.

  5. The new thread uses the logic for Sync open, by invoking the semaphore wait for the time out period of the connection string and acquire a connection. After processing all the requests, the thread terminates. A new thread is spun up if another batch of requests come in.

Findings

  1. The repro posted here, works quite well when it comes to sync scenarios. however when it comes to async, we observe performance bottlenecks.

  2. In case of the repro above, all the requests are being processed by the same background thread and are drained from the queue.

  3. I then decided to see if there are any bottlenecks while this thread processes all the connections. It turns out that each request is processed by the background thread in the same amount of time, which is taken by sync execution.

After a request is processed, the thread sets the Task completion next.Completion.TrySetResult(connection) and lets the result propagate to the caller.

According to this investigation, I don't see that the connection pool as something blocking, but likely there is a context switch causing the delay in the propagation of this result from the Thread which is opening the connections to the thread which has called await connection.OpenAsync()

After this investigation, I know that the connections are drained very quickly (I tried this with an Azure SQL DB instance on East US with my client in Redmond) to simulate the high latency scenario. The DB is a free tier DB to make sure that we dont get any preferential treatment from the server with resource governance.

image

Ask:

Any suggestion for any debugging techniques to further investigate the context switch? I tried to take perfview traces and got lost, had no idea where to investigate. It was quite easy to get lost.

Are there better techniques in Modern C# to make sure that we can avoid the expense of context switching between threads. I understand that writing the Code as async/await and making sure that we address async scenarios nicely and gracefully, makes sense, but I was wondering if there are cheaper ways of trying to optimize the connection pool in the context of my investigation, before a more expensive engineering exercise is taken on this.

Leaving some references here in code, about the spinning up of the background thread to process the connection requests in async Open

if (_pendingOpensWaiting == 0)
{
Thread waitOpenThread = new Thread(WaitForPendingOpen);
waitOpenThread.IsBackground = true;
waitOpenThread.Start();
}

Few edits:

  1. My attempt with Sync Open() had a bug. I was opening the connection on the same thread which was meant to be scheduled. Hence it was a wrong way of simulating the issue. I ran connection.Open() in a task, and then awaited the task.
  2. With sync also, the connection pool becomes a bottleneck.
  3. To get higher throughput from the pool, we truly need to parallelize the opening of connections, rather waiting on the semaphores to be freed. This applies to both sync and async and has lot more benefits for the async cases because thats where true parallelization benefits will show impact.

@David-Engel
Copy link
Contributor

Ask:

Any suggestion for any debugging techniques to further investigate the context switch? I tried to take perfview traces and got lost, had no idea where to investigate. It was quite easy to get lost.

Are there better techniques in Modern C# to make sure that we can avoid the expense of context switching between threads. I understand that writing the Code as async/await and making sure that we address async scenarios nicely and gracefully, makes sense, but I was wondering if there are cheaper ways of trying to optimize the connection pool in the context of my investigation, before a more expensive engineering exercise is taken on this.

@Wraith2 as the person with the most perf experience here. Any thoughts?

  1. To get higher throughput from the pool, we truly need to parallelize the opening of connections, rather waiting on the semaphores to be freed. This applies to both sync and async and has lot more benefits for the async cases because thats where true parallelization benefits will show impact.

@saurabh500
I agree that we should add parallelization to the pool. We've had customers ask for this who deploy massive servers that can handle just about any amount of parallelization we can throw at them. I've wondered if we just want to "open the floodgates" and not serialize pooled connection creation anymore. Or some balanced or configurable approach in between.

@saurabh500
Copy link
Contributor

saurabh500 commented May 22, 2024

Just a correction, I dont think I need to debug the context switch. Sorry I should have mentioned this earlier.
I realized its the algorithm, and not the implementation that is problematic.

@David-Engel Re: "Open the floodgates"

My take is that we make the connection pool configurable, by default one connection open at a time. But this behavior can be configured by the app developer.
Anyone who needs a higher throughput, opts-in to the parallel opens, rather that everyone getting parallel opens.

I know of both kinds of users, the ones who are happy with the current behavior, and we wouldn't want want any side effects of behavior change on their SQL connectivity.
At the same time, we have users who definitely could benefit from parallelization of connection open, and then would need an opt-in way of opening connections.

I still don't have a point of view on whether this opt-in switch will be a number to say how many connections open in parallel vs a boolean which says, "just open what I am asking for"
Something to ponder upon for me. However thoughts / suggestions are welcome.

@Wraith2
Copy link
Contributor

Wraith2 commented May 22, 2024

@Wraith2 as the person with the most perf experience here. Any thoughts?

  1. To get higher throughput from the pool, we truly need to parallelize the opening of connections, rather waiting on the semaphores to be freed. This applies to both sync and async and has lot more benefits for the async cases because thats where true parallelization benefits will show impact.

I've tried to understand the pooling implementation a few times and always given up. It's very complex and confusing. My general understanding matches the description @saurabh500 gives in #601 (comment) earlier.

Overall I agree that:

  1. maintaining the current behaviour by default is desirable and minimizing any change in observed behaviour is required
  2. providing an opt-in truly parallel connection open capability is desirable for modern high resource workloads

@roji has mentioned a few times in the past that npgsql moved to a lock free pool implementation and had good speed increases. This might be worth investigating.

Engineering a new pool entirely and switching to it with a strategy pattern approach is likely to be simpler than trying to carefully tweak new behaviours onto the existing pool implementation.

@roji
Copy link
Member

roji commented May 22, 2024

@roji has mentioned a few times in the past that npgsql moved to a lock free pool implementation and had good speed increases. This might be worth investigating.

Yeah, we had a custom lock-free implementation which itself wasn't trivial; but we switched to a much simpler approach based on System.IO.Channels, where the "idle connector list" is simply a Channel (aka queue) where renting out a connector is implemented as dequeueing, and returning a connection to the pool is enqueueing. It really isn't very complex and performs well.

Engineering a new pool entirely and switching to it with a strategy pattern approach is likely to be simpler than trying to carefully tweak new behaviours onto the existing pool implementation.

I tend to agree... The only possible problem is various "special" behaviors implemented in the current pool that would need to be carried over (or consciously dropped) - that would require at least some level of understanding of how/what the current pool implementation does...

@Wraith2
Copy link
Contributor

Wraith2 commented May 22, 2024

the only possible problem is various "special" behaviors implemented in the current pool that would need to be carried over (or consciously dropped) - that would require at least some level of understanding of how/what the current pool implementation does...

Or.. a decision and documentation that the new pool strategy does not get configured by the old pool configuration
parameters, then we can do whatever we want without worrying about legacy stuff.

@saurabh500
Copy link
Contributor

@Wraith2 can you chime in with some examples of what is considered "special" behavior?

@saurabh500
Copy link
Contributor

@roji we have been looking into Channels as an option for pools as well (inspired by our conversations).

We will start posting some design thoughts for discussion as we form points of view.

@Wraith2
Copy link
Contributor

Wraith2 commented May 23, 2024

@Wraith2 can you chime in with some examples of what is considered "special" behavior?

Not really, roji mentioned them i was just responding. My pool configuration tends to start and end with size.

@vonzshik
Copy link

Yeah, we had a custom lock-free implementation which itself wasn't trivial; but we switched to a much simpler approach based on System.IO.Channels, where the "idle connector list" is simply a Channel (aka queue) where renting out a connector is implemented as dequeueing, and returning a connection to the pool is enqueueing. It really isn't very complex and performs well.

Channels aren't going to help here because the problem isn't how connections are pooled, but how they're opened.

internal bool TryGetConnection(DbConnection owningObject, TaskCompletionSource<DbConnectionInternal> retry, DbConnectionOptions userOptions, out DbConnectionInternal connection)
{
uint waitForMultipleObjectsTimeout = 0;
bool allowCreate = false;
if (retry == null)
{
waitForMultipleObjectsTimeout = (uint)CreationTimeout;
// Set the wait timeout to INFINITE (-1) if the SQL connection timeout is 0 (== infinite)
if (waitForMultipleObjectsTimeout == 0)
waitForMultipleObjectsTimeout = unchecked((uint)Timeout.Infinite);
allowCreate = true;
}
if (_state != State.Running)
{
SqlClientEventSource.Log.TryPoolerTraceEvent("<prov.DbConnectionPool.GetConnection|RES|CPOOL> {0}, DbConnectionInternal State != Running.", ObjectID);
connection = null;
return true;
}
bool onlyOneCheckConnection = true;
if (TryGetConnection(owningObject, waitForMultipleObjectsTimeout, allowCreate, onlyOneCheckConnection, userOptions, out connection))

The main takeaway here is allowCreate flag which allows to open a physical connection immediately, instead of queuing it on a background thread. And it's set if and only if retry argument is not null. The name retry is very misleading, because it has nothing to do with a retry, instead it's just a way for OpenAsync to pass a TaskCompletionSource so it can receive that physical connection from background thread.

Async open:

TaskCompletionSource<DbConnectionInternal> completion = new TaskCompletionSource<DbConnectionInternal>(transaction);
TaskCompletionSource<object> result = new TaskCompletionSource<object>(state: this);
if (s_diagnosticListener.IsEnabled(SqlClientDiagnosticListenerExtensions.SqlAfterOpenConnection) ||
s_diagnosticListener.IsEnabled(SqlClientDiagnosticListenerExtensions.SqlErrorOpenConnection))
{
result.Task.ContinueWith(
continuationAction: s_openAsyncComplete,
state: operationId, // connection is passed in TaskCompletionSource async state
scheduler: TaskScheduler.Default
);
}
if (cancellationToken.IsCancellationRequested)
{
result.SetCanceled();
return result.Task;
}
bool completed;
try
{
completed = TryOpen(completion);

Sync open:

if (!(IsProviderRetriable ? TryOpenWithRetry(null, overrides) : TryOpen(null, overrides)))

I have no idea why someone would want for synchronous open to open physical connections concurrently, but for async open sequentially, but that's the current behavior.

@roji
Copy link
Member

roji commented May 27, 2024

@roji we have been looking into Channels as an option for pools as well (inspired by our conversations).

You're welcome to take a look at Npgsql's implementation here.

Of course, as @vonzshik points out, the actual pool implementation is only part of this larger puzzle, and the problem specifically tracked by this issue may be outside of the pool per se.

@ajcvickers
Copy link

Nothing much to add here, but great to see this happening!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
📈 Performance Issues that are targeted to performance improvements.
Projects
None yet
Development

No branches or pull requests