Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The implementation of ToCamelCase can be optimized #230

Open
wmjordan opened this issue Aug 10, 2020 · 22 comments
Open

The implementation of ToCamelCase can be optimized #230

wmjordan opened this issue Aug 10, 2020 · 22 comments

Comments

@wmjordan
Copy link
Contributor

I am studying the source code today and found that ToCamelCase could be optimized.

The current implementation stack allocates a new char array, works on it and create a string from that char array.

You can use String.Copy to copy an instance of a string, and directly pin the string with a fixed statement and work on that copy. This saves one allocation and quite a few char assignments.

@rpgmaker
Copy link
Owner

Thanks for the idea. Would be nice to see a before and after numbers with an example .

@wmjordan
Copy link
Contributor Author

wmjordan commented Aug 10, 2020

The implementation of that function sounds a little bit weird to me.

ToCamelCase("ABC_CD") => "aBCCD"
ToCamelCase("ABC_cD") => "aBCCD"

Why does it remove the underscore and uppercase the following character?

EDIT:
Oh, it seems that it is supposed to deal with the following situation:

ToCamelCase("Abc_De") => "abcDe"
ToCamelCase("Abc de") => "abcDe"

@rpgmaker
Copy link
Owner

Cool. Np

@rpgmaker
Copy link
Owner

rpgmaker commented Apr 4, 2021

Are intending to still create a pull request for this optimization? Thanks

@wmjordan
Copy link
Contributor Author

wmjordan commented Apr 5, 2021

It is too long time ago that I have forgotten this.

@wmjordan
Copy link
Contributor Author

wmjordan commented Apr 5, 2021

Here's my implementation of ToCamelCase. I am not gonna handle the underscores or spaces.

If the name has two or more characters, and the first 2 or 3 characters are uppercase, it preserves the case, for instance "UI", "XML", "HTTP", etc., otherwise, it changes the first character to lowercase.

static string ToCamelCase(string name) {
	return Char.IsLower(name[0]) ? name
		: name.Length == 1 ? name.ToLowerInvariant()
		: name.Length == 2 && Char.IsUpper(name[1]) ? name
		: name.Length > 2 && Char.IsUpper(name[1]) && Char.IsUpper(name[2]) ? name
		: ToLower(String.Copy(name));

	unsafe string ToLower(string t) {
		fixed(char* p = t) {
			*p = Char.ToLower(t[0]);
		}
		return t;
	}
}

@rpgmaker
Copy link
Owner

rpgmaker commented Apr 5, 2021

Thanks for the solution. I can see how it can integrate with space and underscore.

@wmjordan
Copy link
Contributor Author

wmjordan commented Apr 5, 2021

Perhaps we can check whether the string contains underscore or space, if so, jump to the previous slow path.

@wmjordan
Copy link
Contributor Author

wmjordan commented Apr 5, 2021

I just rechecked the source code, and saw the logic of property name writing in WritePropertiesFor. The ToCamelCase is only called in this place.

name = propertyName
If setting useCamelCase
 name = ToCamelCase(name)
else
 just use the name variable

We can furthermore optimize this to pre-calculate the camel-cased string, and store it to the dynamic assembly, then load the const string according to the corresponding setting like the following pseudo-code.

name = setting useCamelCase ? loadPrecalculatedCamelCaseName : propertyName

Since the camel-cased names are calculated only once in the above solution, there is little impact to the performance. Therefore, your existing function and mine in previous post won't make too much difference.

Here is the code to replace the original lines in NetJSON.cs.

var useCamelCaseLabel = il.DefineLabel();
var endOfCaseLabel = il.DefineLabel();

il.Emit(OpCodes.Ldloc, camelCasing);
il.Emit(OpCodes.Brtrue_S, useCamelCaseLabel); // use short jump to save some bytes

il.Emit(OpCodes.Ldstr, name); // use original name
il.Emit(OpCodes.Stloc, nameLocal);
il.Emit(OpCodes.Br_S, endOfCaseLabel);

il.MarkLabel(useCamelCaseLabel);
il.Emit(OpCodes.Ldstr, ToCamelCase(name)); // precalculate the camelCase name
il.Emit(OpCodes.Stloc, nameLocal);

il.MarkLabel(endOfCaseLabel);

@rpgmaker
Copy link
Owner

rpgmaker commented Apr 5, 2021

The only difference will be startup time for the first call there after it will be same. Thanks for the input so far. Could you still make this into a pull request? And could you run a simple test showing the difference in start up vs current?

Thanks,

@wmjordan
Copy link
Contributor Author

wmjordan commented Apr 5, 2021

The only difference will be startup time for the first call there after it will be same.

It is not. The existing code calls ToCamelCase each time when writing any property if the setting enables using camel case. Consequently it consumes more CPU time and generates more garbage for GC to collect.

My previous post turns the camel-cased property names into IL const strings, so the ToCamelCase will only be called when the IL is generated. We can furthermore optimize the code to compare the result of ToCamelCase and the name and eliminate the check of camelCase setting if the names are the same (input name is already camel-cased).

var useCamelCaseLabel = il.DefineLabel();
var endOfCaseLabel = il.DefineLabel();
var camelName = ToCamelCase(name);

if (camelName != name) {
  il.Emit(OpCodes.Ldloc, camelCasing);
  il.Emit(OpCodes.Brtrue_S, useCamelCaseLabel); // use short jump to save some bytes

  il.Emit(OpCodes.Ldstr, name); // use original name
  il.Emit(OpCodes.Stloc, nameLocal);
  il.Emit(OpCodes.Br_S, endOfCaseLabel);

  il.MarkLabel(useCamelCaseLabel);
  il.Emit(OpCodes.Ldstr, camelName); // load the camelCase name from a const string, instead of calling ToCamelCase
  il.Emit(OpCodes.Stloc, nameLocal);

  il.MarkLabel(endOfCaseLabel);
}
else {
  il.Emit(OpCodes.Ldstr, name); // use original name
  il.Emit(OpCodes.Stloc, nameLocal);
}

Could you still make this into a pull request?

Please replace the existing code with the above code. And remove this line. The rest stuff, for instance, the ToCamelCase function can be left there unchanged.

@rpgmaker
Copy link
Owner

rpgmaker commented Apr 5, 2021

It will if cache the result of camel case using either of our implementation was what I referring to.

Without the caching, you are right it will cost more than just the startup.

@wmjordan
Copy link
Contributor Author

wmjordan commented Apr 5, 2021

In the existing code, Emit(OpCodes.Ldstr, name) will cache the property name anyway.

If we are to enable runtime renaming of property names, we can create a type with string fields which holds property names and we read from those fields instead, when writing out property names.

@rpgmaker
Copy link
Owner

I will see what I can do regarding the caching logic. Was busy in the past week.

@rpgmaker
Copy link
Owner

I will try to get to this soon. Been busy with other stuff.

@rpgmaker
Copy link
Owner

rpgmaker commented Mar 8, 2023

@wmjordan , do we still need to make this change?

@wmjordan
Copy link
Contributor Author

wmjordan commented Mar 8, 2023

I am sorry to be too busy with my own projects too :P

Jump to this issue and use the code to pre-calculate field names if you want to further improve the performance.

@wmjordan
Copy link
Contributor Author

wmjordan commented Mar 8, 2023

Once the field names are pre-calculated, the performance impact of ToLowerCase will be very minor.

@rpgmaker
Copy link
Owner

rpgmaker commented Mar 8, 2023

Awesome. Thanks for the clarification.

@rpgmaker
Copy link
Owner

@wmjordan, do we still want to make this changes?

@wmjordan
Copy link
Contributor Author

Hi, I am too busy with my stuff. I have lost track with this project already.
The code had been presented above. You just need to replace the line with copy and paste.

@rpgmaker
Copy link
Owner

Np. I will take care of it. Let me know if I can contribute anything to your projects. Thanks again for the tip.

Repository owner deleted a comment from tj-appsudo Feb 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants