
Support for streaming to a MySQL parameter from System.IO.Stream? #943

Closed

kendallb opened this issue Feb 13, 2021 · 49 comments

@kendallb

I'm working on porting Rebus to MySqlConnector. Rebus supports streaming from a source directly into the database via a database parameter, but this appears not to be supported by MySqlConnector. Is this a limitation of MySqlConnector, or a limitation of the MySQL wire protocol itself? Streaming the result set works; I just cannot find a way to stream data on insert.

System.NotSupportedException : Parameter type MemoryStream (DbType: Binary) not currently supported. Value: System.IO.MemoryStream

I am using it like this (stolen directly from the SQL Server version):

command.Parameters.Add("data", MySqlDbType.VarBinary, MathUtil.GetNextPowerOfTwo((int)source.Length)).Value = source;
@kendallb

I got it working by simply turning the stream into a byte array and submitting that, but that is not particularly ideal if the data is large.

@bgrainger

is this a limitation of MySqlConnector, or a limitation of the MySQL wire protocol itself?

A little bit of both. 😀

MySqlConnector doesn't have support for it, hence the exception.

It could certainly be added (as a convenience method for this use case). However, the MySQL text protocol requires the entire INSERT SQL statement to be sent as a length-prefixed packet, so MySqlConnector would have to convert the entire Stream to an in-memory blob; it couldn't really be streamed. (Various reserved bytes in the Stream have to be escaped, so the length can't be known ahead of time just by checking Stream.Length. MySqlConnector could read the Stream twice, once to compute the length and once to send it, but that might be inefficient, and isn't even supported by forward-only Streams.)
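
To make the length problem concrete, here is a minimal sketch (not MySqlConnector code) of the first pass such a double-read would need; the exact set of escaped bytes is an assumption based on MySQL string-literal escaping rules:

using System.IO;

// Hypothetical helper: scan the Stream once to compute the escaped length;
// a second pass (requiring a seekable Stream) would actually write the bytes.
static long GetEscapedLength(Stream stream)
{
    long length = 0;
    int b;
    while ((b = stream.ReadByte()) != -1)
    {
        // NUL, LF, CR, ", ', \ and Ctrl-Z each become a two-byte escape sequence
        length += b is 0x00 or 0x0A or 0x0D or 0x22 or 0x27 or 0x5C or 0x1A ? 2 : 1;
    }
    return length;
}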

Still, this is probably worth adding, if only for API compatibility with other ADO.NET providers, even if it doesn't support actual streaming (at least initially).

@kendallb

Right, it would be nice to support it just to make it more compatible with other ADO.NET connectors like SQL Server, even if under the hood it just does what I did and converts the stream into a byte array. That does mean it would crash out if someone tries to insert a massive blob, but most folks don't have the max packet size for MySQL set to a massive number anyway (in our case it's 16MB for production), so unless the MySQL wire protocol supported proper streaming, it would still be limited to whatever that size is. Clearly SQL Server supports massive blobs, as the Rebus unit test for the code in question inserts a 100MB file, which naturally crapped itself on my test system with a 16MB max packet size :)

At the end of the day folks shouldn't be inserting massive blobs into MySQL anyway, as it's the wrong tool for the job. If you need to store massive blobs, they should be stored in a storage bucket somewhere (Amazon S3, Google, etc.) with a link to the resource stored in MySQL, so I don't think it's a big limitation.

But it would be nice, when porting code like this, for it to just 'work', at least until you actually try to insert something massive :)

@bgrainger

That does mean it would crash out if someone tries to insert a massive blob, but most folks don't have the max packet size for MySQL set to a massive number anyway (in our case it's 16MB for production)

That's a good point: usually you're quite limited in the maximum blob size you can insert, due to this limit.

@bgrainger

Updated ADO.NET tests; it appears that SqlClient is the only other library that supports using a Stream as a parameter value.

Npgsql has decided not to support it (and throws a NotSupportedException): npgsql/npgsql#550

Most other libraries "succeed" but write "System.IO.MemoryStream" to the database column, introducing silent data corruption.

@bgrainger bgrainger self-assigned this Feb 14, 2021
@kendallb

Yeah, it's six of one, half a dozen of the other. When you get a nice exception, at least you know stuff is not working and can fix it. In some ways I suppose having the developer specifically write the code like I did to turn it into a byte array means you are then fully aware of the consequences (memory usage). That might be better than having someone think they can stream a 1GB file into a database blob, only to find it runs out of memory :)

@bgrainger

I'm adding support for MemoryStream. If TryGetBuffer succeeds, that will be written directly to the packet (which should be efficient). Otherwise, ToArray() will be called, which will duplicate all the data before copying it (inefficient).
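
A rough sketch of that logic (illustrative only, not the actual MySqlConnector internals):

using System;
using System.IO;

// Use the MemoryStream's existing buffer when it's exposed; otherwise copy.
static ArraySegment<byte> GetPayload(MemoryStream stream)
{
    if (stream.TryGetBuffer(out ArraySegment<byte> buffer))
        return buffer;                               // no copy; efficient
    return new ArraySegment<byte>(stream.ToArray()); // duplicates all the data
}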

A different approach (e.g., one that reads repeatedly into a 64KiB buffer) could be implemented, which would work for all Stream types. That could replace the ToArray() code path in the future if there were a pressing need.

(A concern, as discussed above, with supporting any arbitrary Stream is that it might make people think they can stream a very large binary file to MySQL Server, but that's not actually possible.)

@kendallb

Ok, I think supporting any stream should be implemented to make it really useful. A MemoryStream itself is not all that useful, as all the data will already be in memory in that case, so I'm not sure it buys you much other than avoiding the second copy. A lot of the time it's likely to be a real stream, like a file stream or network stream, so the buffering approach would make the most sense to me.

If the buffering approach is implemented, does that mean it would avoid the max packet size issue? Or will that still be a problem? If it can avoid that, then at least it would be possible to stream in large blob data, larger than the max packet size.

@bgrainger bgrainger reopened this Feb 14, 2021
@bgrainger

MemoryStream is in 1.3.0-beta.4: https://www.nuget.org/packages/MySqlConnector/1.3.0-beta.4

MySqlBulkCopy is the closest thing to streaming that MySqlConnector has; the right way to support Stream is probably to allow it to be part of a DataTable that can be loaded through MySqlBulkCopy.
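
As a sketch of what that looks like today (the table and column names here are made up, and MySqlBulkCopy requires AllowLoadLocalInfile=true in the connection string):

using System.Data;
using System.IO;
using MySqlConnector;

// Hypothetical "messages" table with an int id column and a BLOB body column.
using var connection = new MySqlConnection("...;AllowLoadLocalInfile=True");
connection.Open();

var table = new DataTable();
table.Columns.Add("id", typeof(int));
table.Columns.Add("body", typeof(byte[]));
table.Rows.Add(1, File.ReadAllBytes("payload.bin")); // still fully buffered in memory

var bulkCopy = new MySqlBulkCopy(connection) { DestinationTableName = "messages" };
bulkCopy.WriteToServer(table);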

@kendallb

It's crazy, but I have never had a need for the DataTable class myself (we use everything via a micro ORM), so it's not clear to me how you would stream a file off disk, for instance, into a DataTable that can then be streamed to MySQL via the bulk copy class. Unfortunately the SQL Server sample code just shows how to toss data from one table to another using it.

@kendallb

Assuming it's possible to copy the data over the MySQL wire protocol via a 64K buffer and avoid the max packet size issues, I think that's the best approach. If that is not possible, then maybe I need to learn how to use a DataTable and bulk copy in that way for Rebus.

@bgrainger

DataTable just happens to be the current public API for MySqlBulkCopy; it wouldn't be a hard requirement.

@bgrainger

Assuming it's possible to copy the data over the MySQL wire protocol via a 64K buffer and avoid the max packet size issues, I think that's the best approach.

AFAIK that's only possible by first executing a LOAD DATA command, then streaming the data to the server. MySqlConnector exposes that through MySqlBulkLoader (which would be too low-level for this need) and MySqlBulkCopy (which uses it under the hood).

But an INSERT statement can never be streamed that way, so supporting an arbitrary Stream for MySqlParameter.Value would quickly run into hard max_allowed_packet limitations.

@kendallb

Looks like there has to be a way to support it, as it’s officially supported for the C API?

https://dev.mysql.com/doc/c-api/8.0/en/mysql-stmt-send-long-data.html

@bgrainger

bgrainger commented Feb 14, 2021

I don't think so, since from that page:

The max_allowed_packet system variable controls the maximum size of parameter values that can be sent with mysql_stmt_send_long_data().

EDIT: This may just mean the maximum size of each chunk, not the total length?

@kendallb

People say it works for Java and PHP, so it must be possible?

@bgrainger

A good point from that SO question:

Still, even if I manage to INSERT, say, 20MB BLOB with max_allowed_packet=4MB how do I SELECT it back? I don't see how I can do it.

It seems pointless to invent a way of inserting arbitrarily large data that can't be retrieved. Additionally, using MySqlBulkCopy to insert a single row would be a rather awkward API.

So perhaps INSERT should just support arbitrary Stream parameters and it's not going to be that big a problem in practice to buffer them in memory before sending them to the server.

@kendallb

You can already receive large data, as it's possible with the connector to read data into a stream.

@bgrainger

You can already receive large data, as it's possible with the connector to read data into a stream.

No, because there's no way for the server to send it without exceeding max_allowed_packet. The Stream is just an abstraction. (And in MySqlConnector, it just creates a read-only MemoryStream on the data that's already in memory.)
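
That is, the read-side Stream API is a view over data that has already been buffered, e.g.:

// GetStream returns a read-only MemoryStream over the buffered column value;
// 'command' and the target Stream 'destination' are assumed for this sketch.
using var reader = command.ExecuteReader();
while (reader.Read())
{
    using var blob = reader.GetStream(0);
    blob.CopyTo(destination);
}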

@bgrainger

people say it works for Java and PHP, so it must be possible?

I can look into implementing mysql_stmt_send_long_data in MySqlConnector. It only works for the "binary" protocol, so you would have to call MySqlCommand.Prepare in order to use it.
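
If that were implemented, usage would presumably look something like this (a sketch, assuming an open connection and a source stream; the Stream parameter support itself is the hypothetical part):

using MySqlConnector;

// COM_STMT_SEND_LONG_DATA only exists in the binary protocol, so the
// command would have to be prepared before it's executed.
using var command = new MySqlCommand("INSERT INTO t(data) VALUES(@data);", connection);
command.Parameters.AddWithValue("@data", stream); // hypothetical Stream support
command.Prepare();
command.ExecuteNonQuery();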

@kendallb

So this example is pretty pointless then? I never tried it, since I can't get data into the DB, lol.

https://dev.mysql.com/doc/connector-net/en/connector-net-programming-blob-reading.html

@kendallb

Also, as much as I think storing large data in MySQL is silly, in the case of Rebus as a message transport it is quite useful to support data larger than the max packet size, because the point of a message queue is that it's all transient data. It goes in and out really fast, and it makes things so much simpler if it's all just handled in the message structures and APIs, rather than something you need to support with something like a shared storage block.

The next thing I need to figure out is why the message reading via MySQL is so much slower than the SQL Server version. Alas, we still use MySQL 5.7, and it does not support ignoring locked rows like SQL Server does, so their simple approach of deleting the row in the transaction and having other readers simply ignore it can't work; it just results in deadlocks.

@bgrainger

The documentation for COM_STMT_SEND_LONG_DATA: https://dev.mysql.com/doc/internals/en/com-stmt-send-long-data.html

MySqlConnector could determine if a prepared command had a Stream supplied for any parameter. If so, it could send the data from the stream in repeated chunks using COM_STMT_SEND_LONG_DATA. Then it could execute the statement, binding the rest of the parameters.

The implementation problem would be that currently, multiple commands are batched together into one packet. (MySQL doesn't allow multiple statements (e.g., INSERT INTO tbl1 VALUES(...); INSERT INTO tbl2 VALUES(...);) to be prepared, but MySqlConnector permits this by breaking the statement into multiple parts, which are buffered in memory and then all sent together.) To enable data to be streamed, this buffering would have to be undone. (This isn't impossible; it just makes the implementation slightly more difficult than would be ideal.)
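
Conceptually, the chunked upload would look something like this (a sketch of the protocol flow; SendLongDataPacket is a made-up helper, not a MySqlConnector API):

using System;
using System.IO;

// Stream one parameter in 64KiB chunks via COM_STMT_SEND_LONG_DATA, then
// execute the prepared statement with the remaining parameters bound normally.
const int ChunkSize = 64 * 1024;
var buffer = new byte[ChunkSize];
int bytesRead;
while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
{
    // each packet carries the statement id, the parameter index, and one slice of data
    SendLongDataPacket(statementId, parameterIndex, buffer.AsSpan(0, bytesRead));
}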

@kendallb

Interesting. So the solution would be not to do the statement batching when a stream is involved.

Does MySqlConnector only do the splitting if the statements are prepared? I just found something interesting. As I mentioned above, I was looking into how to speed up the performance of Rebus.MySqlConnector compared to SQL Server. The MySqlConnector version was able to insert data into the message queue way faster than SQL Server when using the lease transport, but SQL Server blew the doors off MySQL when it came to pulling the data back out. Part of the problem is the lack of row-level lock skipping for select statements in MySQL 5.7, but that's not all of it. I tried changing things somewhat to speed it up, but did not succeed. Here are some results:

SQL Server:

*** Using NORMAL SQL transport ***
Inserted 1000 messages in 0.1 s - that's 6792.9 msg/s
1000 messages received in 0.1 s - that's 10363.0 msg/s

*** Using LEASE-BASED SQL transport ***
Inserted 1000 messages in 3.3 s - that's 305.7 msg/s
1000 messages received in 0.3 s - that's 2944.6 msg/s

MySQLConnector:

*** Using NORMAL SQL transport ***
Inserted 1000 messages in 0.2 s - that's 6537.2 msg/s
1000 messages received in 5.3 s - that's 188.4 msg/s

*** Using LEASE-BASED SQL transport ***
Inserted 1000 messages in 0.2 s - that's 4059.2 msg/s
1000 messages received in 5.3 s - that's 188.7 msg/s

Now what is super interesting is that I just back-ported the same code to run on the Oracle connector, and it was WAY faster than MySqlConnector for the read operations (not as fast as SQL Server, but significantly faster):

Oracle Connector:

*** Using NORMAL SQL transport ***
Inserted 1000 messages in 0.5 s - that's 2061.7 msg/s
1000 messages received in 0.8 s - that's 1295.1 msg/s

*** Using LEASE-BASED SQL transport ***
Inserted 1000 messages in 0.6 s - that's 1757.0 msg/s
1000 messages received in 0.9 s - that's 1090.4 msg/s

So clearly from the above, the MySqlConnector version is quite a bit faster than the Oracle version for inserts, but for receiving it is not. The big difference is that this library is written to be async all the way through, hence all of the SQL operations in the library are async calls; but as we know, the Oracle connector is not really async at all, it just does sync calls with async semantics. I suspect that is why the inserts are so much faster, since it can send a lot more over due to using async, but for the receives, I wonder if either the async stuff is making it slower, or if it's something to do with the statement batching you mentioned above?

The reason I ask is that the receive operation is implemented as multiple SQL statements that all get sent together in the same transaction, designed in such a way that we pull out the next message, then update it to mark it as being processed (to avoid the row-level locking stuff I mentioned), and then select the data out. Very similar to how SQL Server is done, except that with SQL Server it does not mark it as processing; it deletes it, and the transaction is kept open until the message is successfully consumed (which will not work without row-level lock skipping for selects, which is not in 5.7).

https://github.com/kendallb/Rebus.MySqlConnector/blob/master/Rebus.MySql/MySql/Transport/MySqlTransport.cs#L295

SELECT id INTO @id
FROM {tableName}
WHERE visible < now(6) AND
      expiration > now(6) AND
      processing = 0
ORDER BY priority DESC,
         visible ASC,
         id ASC
LIMIT 1
FOR UPDATE;

SELECT id,
       headers,
       body
FROM {tableName}
WHERE id = @id
LIMIT 1;

UPDATE {tableName}
SET processing = 1
WHERE id = @id;

SET @id = null;

So either this statement splitting stuff you mentioned is what is slowing it down, or it's that this particular operation does not do so well with async?

@kendallb

No, I don't think it's the async stuff. In that test I can easily disable all the async so only one operation runs at a time, and that dramatically slowed down the MySqlConnector version as well as the Oracle one, but the Oracle one was still much faster: 9.3 msg/s for MySqlConnector and 626.4 msg/s for Oracle. So quite a bit slower in both cases.

If you are interested in profiling this to see where it's so slow, the test to run is this one:

https://github.com/kendallb/Rebus.MySqlConnector/blob/master/Rebus.MySql.Tests/Transport/TestMySqlTransportReceivePerformance.cs

You need a database user called mysql with the password mysql that has full access to the rebus% databases (and create a schema called rebus2_test).

@bgrainger

bgrainger commented Feb 15, 2021

MySqlConnector could determine if a prepared command had a Stream supplied for any parameter. If so, it could send the data from the stream in repeated chunks using COM_STMT_SEND_LONG_DATA.

Well, so much for that theory. If you actually try this, you get the exception Parameter of prepared statement which is set through mysql_send_long_data() is longer than 'max_allowed_packet' bytes.

Sample code to implement this is here: https://github.com/bgrainger/MySqlConnector/tree/send-long-data

The only benefit this provides is being able to insert a row whose individual columns are less than max_allowed_packet bytes, but whose total size would exceed max_allowed_packet bytes (by sending the individual columns separately instead of combined into one packet for the whole row).

@bgrainger

If you are interested in profiling this to see where it's so slow

I am. This library should be faster than Oracle's MySQL Connector/NET for all use cases, except that XyzAsync methods do add some overhead when they're actually implemented as async. Thanks for the test case; I will take a look later.

Which version of MySql.Data were you using? 8.0.22/8.0.23 has a severe performance regression when reading rows, so I'd be very surprised if it were the faster library.

@kendallb

8.0.23 was the version I was testing against. I just grabbed the latest one when I back ported the code.

@kendallb

Whelp, it's async overhead. Damn, that was a lot more than I expected. I had a sneaking suspicion it was, and it's one reason we have never done async for SQL programming: I think the overheads just pile up when the operations you are doing run so fast. It only makes sense if the MySQL operation is going to take a while.

*** Using NORMAL SQL transport ***
Inserted 1000 messages in 0.3 s - that's 2993.7 msg/s
1000 messages received in 0.7 s - that's 1524.0 msg/s

Note that it's still way slower than SQL Server, but it is faster than the Oracle connector. The insert speed did drop off to about half, though; inserts generally take longer, so there is a gain to be had from async there. But I am leaving it disabled, because the other upside of not doing any async for the SQL transport in Rebus is that you can better guarantee ordering of results on both inserts and reads. It would be interesting to perhaps do a hybrid, and leave async on for inserts but have it off for reads.

@kendallb

What is interesting, though, is that the SQL Server code I cloned for Rebus was also pure async, and the SQL Server version does not have these problems. It might be well worth taking a closer look at the performance bottlenecks within MySqlConnector when using async, as perhaps that will shed some light on why SQL Server is so much faster here, and perhaps it can be optimized so it's just as fast when doing async?

Here is my feature branch with async removed, to compare:

https://github.com/kendallb/Rebus.MySqlConnector/tree/feature/remove-async

@bgrainger

Thanks, I'll take a look.

@kendallb

Cool. Very curious to see what shakes out. In the short term I might do a version that removes the async for receive and leaves it in for inserts, but either way the SQL Server version blows the doors off both MySQL versions. That could just be a SQL Server vs MySQL performance issue itself (not a connector issue), but there could be other things in there.

It does make me wonder whether the SQL Server connector is tuned in such a way that async is only used where it makes sense and has a performance advantage, and not just all the time?

@bgrainger

That could just be a SQL Server vs MySQL performance issue itself

I'm not aware of head-to-head tests of those databases, but the TechEmpower Framework Benchmarks use PostgreSQL and MySQL (SQL Server isn't supported). The top 43 results are all PostgreSQL; the fastest MySQL client comes in at 41% of the speed of the top pg client. So there may be bottlenecks in the server itself.

It does make me wonder whether the SQL Server connector is tuned in such a way that async is only used where it makes sense and has a performance advantage, and not just all the time?

I'm not sure, but usually the point of async is to use async TCP/IP socket I/O (instead of blocking) so that the current thread can be freed up while waiting for a network response; it's not primarily to improve performance. Since (almost) all DB operations involve client/server I/O, that really does mandate using async all the time.

(There are minor exceptions, such as GetFieldValueAsync: MySqlConnector returns a completed task if you call that API because it doesn't need to be async; using that async API does just introduce overhead.)
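
For example (a sketch, assuming a reader positioned on a row, inside an async method):

// The row is already buffered client-side, so the async call just wraps the
// same value in a completed task and adds a little overhead.
var idSync = reader.GetFieldValue<int>(0);
var idAsync = await reader.GetFieldValueAsync<int>(0);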

@kendallb

For comparison purposes and to determine what the highest potential performance is, I ran the same test with the in memory and file system transports:

*** Using Filesystem transport ***
Inserted 1000 messages in 4.3 s - that's 235.3 msg/s
1000 messages received in 0.3 s - that's 3900.6 msg/s

*** Using Memory transport ***
Inserted 300000 messages in 4.4 s - that's 68040.3 msg/s
300000 messages received in 2.2 s - that's 136568.7 msg/s

So clearly the memory transport is ridiculously fast, which indicates the bottlenecks are all in the transport layers, and the file system transport is super slow to insert (makes sense) and about twice as fast to retrieve as MySQL without async. It does make me wonder how SQL Server is so damn fast, and I suspect the overheads here are MySQL itself. I do not have the query cache enabled in MySQL (for our systems it always just made stuff slower in production), so it's possible that is a big difference between MySQL and SQL Server.

So I suspect the upper bound for performance with MySQL is the non-async version, and the key part is figuring out why using async for receive causes it to be so much slower. Ideally we would leave the async there as well, and just figure out what is making it slow.

Not sure how to profile that myself or I would give it a go.

@kendallb

Oh I guess they killed the query cache for good anyway :)

@bgrainger

I added a console performance test harness app (that runs just that one test) that can be executed under dotTrace.

Unfortunately, comparing an async run to a sync run is almost impossible, due to the callbacks.

I did update MySqlConnector's benchmarks, and while async adds a slight performance overhead, it's nowhere near 2x worse:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-10875H CPU 2.30GHz, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.200-preview.21079.7
  [Host]     : .NET Core 5.0.3 (CoreCLR 5.0.321.7212, CoreFX 5.0.321.7212), X64 RyuJIT
  Job-TQEJBX : .NET Framework 4.8 (4.8.4300.0), X64 RyuJIT
  Job-ISFNWE : .NET Core 5.0.3 (CoreCLR 5.0.321.7212, CoreFX 5.0.321.7212), X64 RyuJIT

| Method | Job | Runtime | Library | Mean | Error | StdDev | StdErr | Median | Min | Q1 | Q3 | Max | Op/s | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OpenFromPoolAsync | Job-TQEJBX | .NET 4.8 | MySql.Data | 1,135.6 μs | 15.97 μs | 13.33 μs | 3.70 μs | 1,134.1 μs | 1,113.9 μs | 1,128.1 μs | 1,138.8 μs | 1,165.1 μs | 880.6 | - | - | - | 704 B |
| OpenFromPoolSync | Job-TQEJBX | .NET 4.8 | MySql.Data | 1,161.3 μs | 23.07 μs | 31.58 μs | 6.19 μs | 1,155.7 μs | 1,121.1 μs | 1,135.8 μs | 1,182.4 μs | 1,234.1 μs | 861.1 | - | - | - | 608 B |
| ExecuteScalarAsync | Job-TQEJBX | .NET 4.8 | MySql.Data | 638.4 μs | 12.61 μs | 24.89 μs | 3.59 μs | 632.1 μs | 606.3 μs | 620.8 μs | 647.4 μs | 705.4 μs | 1,566.5 | - | - | - | 3664 B |
| ExecuteScalarSync | Job-TQEJBX | .NET 4.8 | MySql.Data | 921.8 μs | 61.81 μs | 177.34 μs | 18.19 μs | 925.6 μs | 612.1 μs | 812.3 μs | 1,017.2 μs | 1,348.3 μs | 1,084.9 | - | - | - | 3584 B |
| ReadBlobsAsync | Job-TQEJBX | .NET 4.8 | MySql.Data | 3,743.6 μs | 74.78 μs | 128.99 μs | 20.92 μs | 3,720.4 μs | 3,547.2 μs | 3,637.0 μs | 3,816.1 μs | 4,019.3 μs | 267.1 | 46.8750 | 39.0625 | 39.0625 | 229045 B |
| ReadBlobsSync | Job-TQEJBX | .NET 4.8 | MySql.Data | 3,485.3 μs | 67.70 μs | 101.34 μs | 18.50 μs | 3,476.3 μs | 3,307.2 μs | 3,412.3 μs | 3,528.3 μs | 3,704.7 μs | 286.9 | 46.8750 | 39.0625 | 39.0625 | 229133 B |
| ManyRowsAsync | Job-TQEJBX | .NET 4.8 | MySql.Data | 2,544.7 μs | 50.16 μs | 97.84 μs | 14.27 μs | 2,519.2 μs | 2,429.0 μs | 2,460.2 μs | 2,612.4 μs | 2,804.7 μs | 393.0 | 35.1563 | - | - | 246330 B |
| ManyRowsSync | Job-TQEJBX | .NET 4.8 | MySql.Data | 2,442.6 μs | 42.29 μs | 37.49 μs | 10.02 μs | 2,447.3 μs | 2,362.5 μs | 2,422.7 μs | 2,467.7 μs | 2,502.1 μs | 409.4 | 35.1563 | - | - | 246170 B |
| OpenFromPoolAsync | Job-ISFNWE | .NET Core 5.0 | MySql.Data | 1,249.8 μs | 24.79 μs | 47.17 μs | 7.03 μs | 1,231.4 μs | 1,180.1 μs | 1,215.0 μs | 1,277.2 μs | 1,364.7 μs | 800.1 | - | - | - | 473 B |
| OpenFromPoolSync | Job-ISFNWE | .NET Core 5.0 | MySql.Data | 1,257.8 μs | 21.39 μs | 22.89 μs | 5.39 μs | 1,258.1 μs | 1,228.3 μs | 1,239.4 μs | 1,272.9 μs | 1,309.7 μs | 795.0 | - | - | - | 473 B |
| ExecuteScalarAsync | Job-ISFNWE | .NET Core 5.0 | MySql.Data | 707.0 μs | 13.26 μs | 12.40 μs | 3.20 μs | 709.4 μs | 672.0 μs | 706.7 μs | 712.4 μs | 722.0 μs | 1,414.4 | - | - | - | 3208 B |
| ExecuteScalarSync | Job-ISFNWE | .NET Core 5.0 | MySql.Data | 693.2 μs | 13.66 μs | 16.26 μs | 3.55 μs | 689.0 μs | 669.8 μs | 682.2 μs | 706.3 μs | 727.1 μs | 1,442.5 | - | - | - | 3136 B |
| ReadBlobsAsync | Job-ISFNWE | .NET Core 5.0 | MySql.Data | 3,577.0 μs | 47.62 μs | 42.21 μs | 11.28 μs | 3,581.0 μs | 3,513.4 μs | 3,545.6 μs | 3,599.3 μs | 3,650.3 μs | 279.6 | 46.8750 | 39.0625 | 39.0625 | 228566 B |
| ReadBlobsSync | Job-ISFNWE | .NET Core 5.0 | MySql.Data | 3,704.4 μs | 73.57 μs | 103.14 μs | 19.85 μs | 3,698.1 μs | 3,550.9 μs | 3,623.6 μs | 3,769.1 μs | 3,932.6 μs | 269.9 | 46.8750 | 39.0625 | 39.0625 | 228492 B |
| ManyRowsAsync | Job-ISFNWE | .NET Core 5.0 | MySql.Data | 2,513.5 μs | 49.05 μs | 74.90 μs | 13.45 μs | 2,500.3 μs | 2,418.8 μs | 2,459.0 μs | 2,556.7 μs | 2,694.3 μs | 397.9 | 15.6250 | 3.9063 | - | 149194 B |
| ManyRowsSync | Job-ISFNWE | .NET Core 5.0 | MySql.Data | 2,493.5 μs | 47.79 μs | 117.24 μs | 13.91 μs | 2,452.4 μs | 2,340.2 μs | 2,402.3 μs | 2,558.0 μs | 2,817.5 μs | 401.0 | 15.6250 | 3.9063 | - | 149050 B |
| OpenFromPoolAsync | Job-TQEJBX | .NET 4.8 | MySqlConnector | 691.7 μs | 21.26 μs | 62.03 μs | 6.27 μs | 667.8 μs | 615.8 μs | 638.6 μs | 733.7 μs | 844.7 μs | 1,445.7 | - | - | - | 4912 B |
| OpenFromPoolSync | Job-TQEJBX | .NET 4.8 | MySqlConnector | 620.6 μs | 12.32 μs | 21.90 μs | 3.46 μs | 614.6 μs | 594.4 μs | 604.0 μs | 631.2 μs | 683.6 μs | 1,611.3 | - | - | - | 640 B |
| ExecuteScalarAsync | Job-TQEJBX | .NET 4.8 | MySqlConnector | 686.2 μs | 11.10 μs | 8.66 μs | 2.50 μs | 687.7 μs | 672.5 μs | 678.6 μs | 693.6 μs | 697.8 μs | 1,457.2 | - | - | - | 5808 B |
| ExecuteScalarSync | Job-TQEJBX | .NET 4.8 | MySqlConnector | 660.5 μs | 12.33 μs | 10.93 μs | 2.92 μs | 660.9 μs | 644.5 μs | 652.0 μs | 666.2 μs | 683.8 μs | 1,514.1 | - | - | - | 1336 B |
| ReadBlobsAsync | Job-TQEJBX | .NET 4.8 | MySqlConnector | 3,566.9 μs | 70.47 μs | 58.85 μs | 16.32 μs | 3,587.6 μs | 3,417.8 μs | 3,548.6 μs | 3,605.3 μs | 3,643.4 μs | 280.4 | 50.7813 | 42.9688 | 39.0625 | 246023 B |
| ReadBlobsSync | Job-TQEJBX | .NET 4.8 | MySqlConnector | 3,849.4 μs | 138.49 μs | 403.99 μs | 40.81 μs | 3,708.8 μs | 3,355.5 μs | 3,531.6 μs | 4,076.6 μs | 4,961.3 μs | 259.8 | 46.8750 | 39.0625 | 39.0625 | 226784 B |
| ManyRowsAsync | Job-TQEJBX | .NET 4.8 | MySqlConnector | 1,660.6 μs | 33.46 μs | 93.81 μs | 9.83 μs | 1,656.2 μs | 1,492.4 μs | 1,591.3 μs | 1,708.5 μs | 1,922.1 μs | 602.2 | - | - | - | 6464 B |
| ManyRowsSync | Job-TQEJBX | .NET 4.8 | MySqlConnector | 1,466.5 μs | 30.05 μs | 88.13 μs | 8.86 μs | 1,432.5 μs | 1,359.2 μs | 1,395.9 μs | 1,524.0 μs | 1,675.9 μs | 681.9 | - | - | - | 1744 B |
| OpenFromPoolAsync | Job-ISFNWE | .NET Core 5.0 | MySqlConnector | 616.2 μs | 11.90 μs | 12.73 μs | 3.00 μs | 612.0 μs | 602.2 μs | 607.1 μs | 625.3 μs | 646.4 μs | 1,622.9 | - | - | - | 2272 B |
| OpenFromPoolSync | Job-ISFNWE | .NET Core 5.0 | MySqlConnector | 595.2 μs | 10.80 μs | 10.61 μs | 2.65 μs | 593.7 μs | 579.4 μs | 586.2 μs | 603.6 μs | 611.7 μs | 1,680.1 | - | - | - | 520 B |
| ExecuteScalarAsync | Job-ISFNWE | .NET Core 5.0 | MySqlConnector | 656.9 μs | 9.38 μs | 7.83 μs | 2.17 μs | 658.5 μs | 643.3 μs | 652.0 μs | 661.6 μs | 671.3 μs | 1,522.4 | - | - | - | 3216 B |
| ExecuteScalarSync | Job-ISFNWE | .NET Core 5.0 | MySqlConnector | 689.7 μs | 13.71 μs | 34.14 μs | 4.00 μs | 678.9 μs | 637.7 μs | 666.3 μs | 709.9 μs | 771.2 μs | 1,449.8 | - | - | - | 1280 B |
| ReadBlobsAsync | Job-ISFNWE | .NET Core 5.0 | MySqlConnector | 3,478.1 μs | 50.98 μs | 45.19 μs | 12.08 μs | 3,474.9 μs | 3,408.8 μs | 3,450.2 μs | 3,491.0 μs | 3,588.1 μs | 287.5 | 46.8750 | 39.0625 | 39.0625 | 232421 B |
| ReadBlobsSync | Job-ISFNWE | .NET Core 5.0 | MySqlConnector | 3,571.8 μs | 67.23 μs | 66.03 μs | 16.51 μs | 3,557.6 μs | 3,481.2 μs | 3,521.2 μs | 3,599.7 μs | 3,701.5 μs | 280.0 | 46.8750 | 39.0625 | 39.0625 | 226389 B |
| ManyRowsAsync | Job-ISFNWE | .NET Core 5.0 | MySqlConnector | 1,273.3 μs | 21.72 μs | 16.96 μs | 4.90 μs | 1,277.8 μs | 1,235.6 μs | 1,266.4 μs | 1,284.5 μs | 1,293.7 μs | 785.4 | - | - | - | 3745 B |
| ManyRowsSync | Job-ISFNWE | .NET Core 5.0 | MySqlConnector | 1,195.7 μs | 23.78 μs | 34.85 μs | 6.47 μs | 1,189.0 μs | 1,151.8 μs | 1,169.3 μs | 1,214.0 μs | 1,279.1 μs | 836.3 | - | - | - | 1681 B |

@kendallb

Thanks. I was doing some more testing and noticed that the performance difference is way less if you increase the number of messages, for whatever reason. Testing with 10000 messages it's 232.2 msg/s vs 188 msg/s, so significantly less. At that point the async is adding some overhead, but it's not nearly as much (about 20% or so). I think I will change it to not bother doing async for the actual receive part of the transport, since that is expected to get in and out as fast as possible.

I am also going to check the query explanation to see if perhaps something is not set up correctly with the way I configured the indexes for the queries in question. I wonder if perhaps an index is not being used when I thought it was...

@kendallb

Ok, well, I didn't have the index set up correctly, so I have improved it slightly: at least the SQL EXPLAIN says it is using the index now, but it did not fix much. Clearly the performance difference between async and non-async drops off as the message queue gets bigger and the overhead of the query starts to play a bigger role, but it is still slower than SQL Server. For now I am leaving async off for the reading code and will play with restructuring the queries a bit.

I think the fundamental reason why MySQL is slower than SQL Server in this instance is the way indexing works with timestamp filtering and sorting. It is one of those areas where MySQL seems to fall down, but I don't see any other way to structure the query to be any faster. Might be interesting to see if it's better on MySQL 8.

@kendallb

Ahh, I know what the core problem is. It's the lack of row-level lock skipping in MySQL earlier than 8. When we select out the new message row with this query:

SELECT id,
       headers,
       body
FROM {tableName}
WHERE visible < now(6) AND
      expiration > now(6) AND
      processing = 0
ORDER BY priority DESC,
         visible ASC,
         id ASC
LIMIT 1
FOR UPDATE;

even though I fixed the indexing on the tables and split the query up, it's only a minor performance improvement. Instead, what is happening is that the 20 async tasks that are attempting to read the next message in the queue are all getting transaction-locked on the query: because it's not possible for MySQL to issue a row lock for just the next row, I believe it's locking the entire table. So it's, in essence, a serialization point, and only one thread at a time can actually snag the next message from the queue.

The fix for MySQL 8 is to use SKIP LOCKED for the reads:

https://dev.mysql.com/doc/refman/8.0/en/innodb-locking-reads.html

which I think will then allow other threads to get in and grab the next message. If that actually works, then doing a full async read might actually be a performance benefit, but until then I may as well leave it out. I have to get MySQL 8 installed and test this theory out.
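
As a sketch, the select above would only need SKIP LOCKED appended (MySQL 8.0+ only; tableName is whatever the transport is configured with):

var selectNextMessage = $@"
SELECT id, headers, body
FROM {tableName}
WHERE visible < now(6) AND
      expiration > now(6) AND
      processing = 0
ORDER BY priority DESC, visible ASC, id ASC
LIMIT 1
FOR UPDATE SKIP LOCKED;";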

@kendallb

Well, MySQL 8 is faster overall, but SKIP LOCKED didn't change anything. I am not sure if it's possible with MySQL to avoid a full table lock for that query, and locking is clearly a big part of it.

With MySQL 8 I got some lock exceptions on deletes, and when I changed the receive to retry the operation on deadlocks (where before I would simply return an empty message, which sends the message queue back to sleep), it sped things up quite a bit. For whatever reason the difference between async and non-async is larger with MySQL 8, probably in part because it's faster to process that query, but it's not a whole lot.

MySQL 8 Async:
3000 messages received in 10.4 s - that's 288.6 msg/s

MySQL 8 Non-Async:
3000 messages received in 3.7 s - that's 802.6 msg/s

MySQL 5.7 Async:
3000 messages received in 11.1 s - that's 269.5 msg/s

MySQL 5.7 Non-Async:
3000 messages received in 4.3 s - that's 692.6 msg/s

Anyway, at this point I have fixed the issues with MySQL 8, so I'm probably just gonna roll with what I have, which is to not use async for retrieving the messages and deal with MySQL being slow.

Still way slower than SQL Server. I think that's just how it works.

@bgrainger

Support for MemoryStream was added in 1.3.0.

@kendallb

kendallb commented Mar 7, 2021

Awesome, thanks!

@2mik

2mik commented Sep 27, 2021

Support for MemoryStream was added in 1.3.0.

Is streaming now supported only by MySqlConnector (in memory), or also by the MySQL wire protocol itself?

@kendallb

It is emulated, not in the wire protocol.

@bgrainger

This issue turned into a very wide-ranging discussion, but the original point in #943 (comment) stands: due to the MySQL protocol (length-prefixed packets with a maximum size), it's not possible to stream data when executing a MySqlCommand.

As a workaround, use code similar to the following:

var stream = ...
var bytes = new byte[stream.Length];
stream.ReadExactly(bytes);
command.Parameters.AddWithValue("@data", bytes);

This makes it clear that all data must be buffered in memory before MySqlConnector can process it (as opposed to setting MySqlParameter.Value to a Stream object, which would imply streaming).

@bgrainger bgrainger closed this as not planned Dec 23, 2022