[C#] Added more string read/write methods to the DirectBuffer #729 #845

Merged
merged 9 commits into from
May 25, 2021

Conversation

MFrpurdy
Contributor

Address #729

  • Added DirectBuffer methods for reading and writing strings that need encoding.
  • Added convenience accessors for the various Encodings defined in the schema.
  • Changed the C# Code generator to create read/write methods using the new DirectBuffer methods.
  • Changed the sample to use the new methods.

As far as I can see, the suggestions the ticket creator made require .net48+, since they use the Encoding.GetString() methods. I stayed with the current .net45 version and implemented the 'spirit' of the request.

@mjpt777
Contributor

mjpt777 commented May 11, 2021

How is the character encoding accounted for? ASCII vs UTF-16 for example.

@MFrpurdy
Contributor Author

How is the character encoding accounted for? ASCII vs UTF-16 for example.

In the generated code the Encoding is resolved and available:

public const string ModelCharacterEncoding = "UTF-8";
public static Encoding ModelResolvedCharacterEncoding = Encoding.GetEncoding(ModelCharacterEncoding);

And used in the generated code like this:
return _buffer.GetStringFromBytes(ModelResolvedCharacterEncoding, limit + sizeOfLengthField, dataLength);
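
As a rough illustration of the pattern described above, the sketch below resolves the encoding once into a static field and reuses it on every read. The class ModelFieldSketch and the method ReadModel are hypothetical stand-ins, not the generated code, and the sketch decodes from a plain byte[] with the standard Encoding.GetString(byte[], int, int) overload rather than the DirectBuffer method added in this PR.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for a generated message class; names are illustrative only.
public static class ModelFieldSketch
{
    public const string ModelCharacterEncoding = "UTF-8";

    // Resolved once, when the type is initialised, rather than on every read.
    public static readonly Encoding ModelResolvedCharacterEncoding =
        Encoding.GetEncoding(ModelCharacterEncoding);

    // Decodes `length` bytes starting at `offset` using the resolved encoding.
    public static string ReadModel(byte[] buffer, int offset, int length)
    {
        return ModelResolvedCharacterEncoding.GetString(buffer, offset, length);
    }
}
```

Because the schema's encoding name is turned into an Encoding instance up front, a field declared as ASCII and one declared as UTF-16 would each decode with their own resolved instance.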

@mjpt777
Contributor

mjpt777 commented May 12, 2021

@billsegall Do you have a view on this?

@billsegall
Contributor

@mjpt777 It looks ok but I wanted to find the time to write a couple of benchmark tests so we know it actually fixes any performance issues it is addressing.

@MFrpurdy
Contributor Author

@mjpt777 It looks ok but I wanted to find the time to write a couple of benchmark tests so we know it actually fixes any performance issues it is addressing.

I'm happy to do some benchmarking and publish the results here.

@billsegall
Contributor

@MFrpurdy That would be great, thank you.

@MFrpurdy
Contributor Author

MFrpurdy commented May 19, 2021

CarBenchmark

In order to compare apples to apples, I modified the CarBenchmark to encode from a string to bytes on every call, and to decode to a string.
In the original CarBenchmark the strings are encoded to byte[] once and those same bytes are used in each iteration. Likewise, string values are only decoded to byte[], not to a string.
In use cases like this, where raw bytes are written and read, the existing byte-based methods are much faster.
However, if the use case is to take an arbitrary string and write it to the buffer, or to convert a byte[] read from the buffer to a string, the new methods allocate less (or nothing) and perform faster. I also hope they make the client code look a little neater.

BenchmarkDotNet=v0.12.1, OS=ubuntu 20.04
Intel Core i7-10710U CPU 1.10GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.202
[Host] : .NET Core 3.1.14 (CoreCLR 4.700.21.16201, CoreFX 4.700.21.16208), X64 RyuJIT
DefaultJob : .NET Core 3.1.14 (CoreCLR 4.700.21.16201, CoreFX 4.700.21.16208), X64 RyuJIT

BEFORE - Modified to Encode and Decode From/To Strings
Encoding example:
private static readonly Encoding VehicleCodeEncoding = Encoding.GetEncoding(Car.VehicleCodeCharacterEncoding);
car.Engine.SetManufacturerCode(ManufacturerCodeEncoding.GetBytes("123"), 0);

Decoding example:
length = car.GetManufacturer(_buffer, 0, _buffer.Length);
var usage = ManufacturerEncoding.GetString(_buffer, 0, length);

| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|
| Encode | 408.6 ns | 3.10 ns | 2.75 ns | 0.0458 | - | - | 288 B |
| Decode | 379.6 ns | 1.10 ns | 1.03 ns | 0.0467 | - | - | 296 B |

AFTER
Using the new string methods.

Encoding example:
car.SetVehicleCode("CODE12");

Decoding example:
var actCode = car.GetActivationCode();

| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|
| Encode | 271.0 ns | 1.74 ns | 1.63 ns | - | - | - | - |
| Decode | 329.9 ns | 0.60 ns | 0.50 ns | 0.0467 | - | - | 296 B |
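
One way an encode path can avoid allocating, shown here only as a sketch and not as the actual DirectBuffer implementation, is to encode straight into the destination bytes instead of calling GetBytes(string) and copying the result. The class StringEncodeSketch and both method names are hypothetical, and ASCII is assumed purely for illustration; the Encoding.GetBytes(string, int, int, byte[], int) overload is standard .NET.

```csharp
using System;
using System.Text;

// Hypothetical helper contrasting two ways of writing a string into a byte buffer.
public static class StringEncodeSketch
{
    // ASCII is an assumption made for this sketch only.
    private static readonly Encoding Ascii = Encoding.ASCII;

    // Allocating path: GetBytes(string) returns a new byte[] that is then copied.
    public static int WriteViaTemporaryArray(string value, byte[] buffer, int offset)
    {
        byte[] bytes = Ascii.GetBytes(value);
        Buffer.BlockCopy(bytes, 0, buffer, offset, bytes.Length);
        return bytes.Length;
    }

    // Direct path: encodes into the destination with no intermediate array.
    public static int WriteDirect(string value, byte[] buffer, int offset)
    {
        return Ascii.GetBytes(value, 0, value.Length, buffer, offset);
    }
}
```

Both methods return the number of bytes written and leave identical bytes in the buffer; only the first creates a temporary array per call. Decoding to a System.String always has to allocate the string itself, which is consistent with the Decode row still reporting an allocation.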

@billsegall
Contributor

billsegall commented May 20, 2021

@MFrpurdy I suspect the answer is to document the performance and leave the choice up to the user. I think the CarExample should change as you suggest as it is simpler.

Could you please add the benchmarking code to the PR? I might have a play.

…hmark which encodes and decodes to/from string; and a version which uses the new methods to encode and decode strings
@MFrpurdy
Contributor Author

@billsegall I've added a modified CarBenchmark file that encodes to and from strings using the original methods. I've also added the same CarBenchmark test but using the new methods.

@mjpt777
Contributor

mjpt777 commented May 23, 2021

@billsegall When you are happy with this let me know and I'll merge.

@billsegall
Contributor

I think the simplicity alone is worth it and people can always make performance dependent choices to fit their circumstances.

@rca22

rca22 commented May 25, 2021

I'd just like to comment that my company has been using a version of Rob's changes, and they were critical in terms of usability.
Some strings that you send, you know in advance and can pre-compute the bytes, and use the previous methods, but there are cases where this isn't practical, in which case the new methods are much simpler. Similarly if you ever want to parse an SBE message and turn it into an intermediate C# representation, you'll generally want to convert bytes to System.String before storing it.

Rob's changes cut down the amount of boilerplate code we would otherwise have had to write to pull these strings out of complex messages - and if users want to be super careful about allocations etc., they still have the option of using the previously available methods. You can see that because the encoding is pre-computed as a static variable on assembly loading, the new methods are a bit quicker than code that has to work it out on the fly from the encoding string.
In our experience, because you'll want to compare these values to other strings after you retrieve them, completely avoiding string creation in .NET would be a major exercise (read: total PITA). It should only be undertaken if you really need the latency performance - which of course SBE is designed to allow - but which may not be necessary depending on the application.

@mjpt777 mjpt777 merged commit 7690fae into real-logic:master May 25, 2021
mjpt777 added a commit that referenced this pull request May 25, 2021
@rca22

rca22 commented May 25, 2021

@mjpt777 when does Real Logic next plan to do a release of the tool, for my information? Thanks for getting this merged.

@mjpt777
Contributor

mjpt777 commented May 25, 2021

@rca22 Sometime within the next month.

@rca22

rca22 commented Jun 14, 2021

@mjpt777 thanks very much for doing a release of this and the other changes. Would you mind adding a new package to NuGet? This hasn't been done since 1.20.4.

@mjpt777
Contributor

mjpt777 commented Jun 14, 2021

@mjpt777 thanks very much for doing a release of this and the other changes. Would you mind adding a new package to NuGet? This hasn't been done since 1.20.4.

@billsegall has been doing the NuGet releases.

@billsegall
Contributor

I'll try and freshen a release this week

@billsegall
Contributor

billsegall commented Jun 15, 2021

A new release should now be available at nuget.org
