-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add options to allow unicode character in identifier names #4508
Conversation
…s etc. Added option to allow keeping of underscore characters in class names, methods names, etc. Unicode identifiers are supported by some languages, like Java, but the codegen will remove all non ASCII letters by default. Users might want to separate different parts of names using extended punctuation connector characters, like "_, ‿, ⁀, ⁔, ・, ︳, ︴, ﹍, ﹎, ﹏, _, ・". Underscores cannot be used to separate different parts of the name as they're removed by default by the codegen, the second option allows underscore characters if the user so desires. Both options can be used separately if needed, for example one might allow extended punctuation connector characters, but still disallow underscore.
Thanks for the PR but your commit (as shown in the Commits tab) is not linked to your Github account, which means this PR won't count as your contribution in https://github.com/swagger-api/swagger-codegen/graphs/contributors. Let me know if you need help fixing it. |
I've added the email address in the commits as a secondary email, hopefully that should fix it. If that doesn't work i'll use the script to edit the history, please let me know if I should do that instead. |
Now the commits are correctly linked to your account. Can you elaborate with a use case in which you need to keep the underscore in the property name? As you probably aware, keeping the underscore in the variable name won't conform to C# style guide in terms of variable naming. |
Sure. The reason we need to keep the underscores is to logically separate parts of type names. We're building a proxy server to handle requests to a 3rd party system which has a very weird API and we're building a more RESTful interface for it. The problem is that the 3rd party system types use 2 different names, domain name and type name, and when calling the system we need to know which is which, as the proxy relies on reflection to do it's job (so as not to manually hard-code calls), reflection allows us to be future compatible, any new feature created in the 3rd party system just works without any further changes in the proxy. We've used _ to separate between the domain and type names, but those get removed by the swagger codegen. We'd like to have the option to keep those underscores in certain cases, if the user knows what he's doing and really wants underscores to be present, for any language they might be generating for. The feature has been implemented at the base layer, so going forward the codegen could have different default values, depending on whether underscores conform to each language's style guides or not, or in cases users really want the underscores for other reasons, like the one I mentioned above. Alternatively, some languages (like Java, which we're generating for) allow multiple unicode characters as punctuation connector characters, which would alleviate our need for underscores, as we could use one of these extended connectors to logically separate between domain and type names. Both feature default values are set to false, so they're not activated unless the user really wants to use either of them. And they can be turned on/off independently, which could be used to further configure the generators for all languages that swagger codegen supports, depending on the style guides for those specific languages. |
@eblis thanks for the detailed explanation.
Let's say the name is "domain_type" and it will become "domainType" in Java, "DomainType" in C#. I would like to point out that these are just variable names but not the actual parameter (path, header, form, query) name. When the API client (e.g. C#) sends the request to the RESTful backend, the actual parameter name ( We definitely welcome enhancement to meet different use cases but I would like to point this out to make sure we're on the same page after merging the additional features you've built. |
We are more interested in the variable and class names, as we're using reflection to handle the proxy-ing, and need to know the We can do the logical separation using underscores, but we can also use a unicode extended separator character (e.g. However, I thought that having the option to strip/not strip underscores might be useful for some people (I noticed other reported issues mentioning the fact that underscores get stripped) and also for some languages which may use/approve of this naming styling (C, Python, etc ?) The two features should allow for more customizing options for each supported language, e.g. enable unicode characters for languages that support it, enable underscores for languages (or companies) that use snake case. |
I just noticed the However, I feel like the original pull request is still valid, as you can make use of unicode characters in variable names and allow languages (and companies) that use snake case to generate code using swagger codegen. |
Thanks for the excellent use cases @eblis . I am having a use case where data is coming from the MATLAB type system and the image is already created from that and used from a long time and the response is again send back to the other system to do some analysis. I have to write an application in between that do some transformations and send the data in the same format as it is coming from the first system( snake_case). If I will not have support of snake_case then I need to convert snake_case to camelCase ( already done by swagger codegen , while storing I need to store it in camelCase and while retrieving and send back data to other system , I need to convert it back to snake_case. Which creates complexity while writing code. |
@eblis : Can you provide more insight , how to use KEEP_UNDERSCORES parameter in maven codegen. Shall I use this as a config options. |
Just a note, the KEEP_UNDERSCORES setting will not prevent camelization of the names, as I wasn't really interested in the case that much, I just wanted to keep the underscores in place. I guess it makes more sense to stop camelization if the KEEP_UNDERSCORES setting is enabled, as you probably don't want to have camel case + snake case. The easiest way to use the new settings is to enable them in the configuration options, e.g.: |
…ifier case as is (you probably don't want camel case + snake case, just snake case when KEEP_UNDERSCORES is set). Added unit tests to verify how the case is computed for various scenarios.
I have modified the pull request such that it doesn't camelize the names if KEEP_UNDERSCORES option is selected. The settings acts like a case selector now, if KEEP_UNDERSCORES is |
Thanks @eblis . I am currently using swagger-codegen-maven-plugin-2.2.1 which internally uses swagger codegen -2.2.1 . May I know when a new release will be coming where I can get these above changes merged into master. |
It depends if and when this pull request is accepted, the code from this pull request isn't available in the swagger code. |
@eblis first of all, thank you for your work. The support for unicode identifier is definitely a nice one and we definitely want to include the enhancement in the upcoming release. But for the "underscore" feature, I do have the following feedback:
(ref: If we implement this option for other generators, it should cover what the "KEEP_UNDERSCORE" option provides. Do you agree? What I would propose is that I'll cherry pick your work related to ALLOW_UNICODE_IDENTIFIERS and add the feature to the latest master while we discuss further on the "KEEP_UNDERSCORE" feature. What do you think? |
I agree with both statements.
If swagger already has an option/flag for snake casing we can use that and
make it available for multiple languages (sorry, didn't know about the
feature).
Cherry picking the changes for Unicode support also sounds like a very good
idea. If you have any trouble I can try and modify the original pull
request and only keep the Unicode chances.
Thank you,
Cristi.
…On Wed, Jan 18, 2017, 19:16 wing328 ***@***.***> wrote:
@eblis <https://github.com/eblis> first of all, thank you for your work.
The support for unicode identifier is definitely a nice one and we
definitely want to include the enhancement in the upcoming release.
But for the "underscore" feature, I do have the following feedback:
1.
the switch/flag is done inside the camelize function. IMO, that's not
the best way to do it because a developer will expect the function to
simply convert the input into camel case. A switch/flag inside the function
makes the implementation relatively easy but that will completely change
the behaviour of the function, which can be used elsewhere such as class
naming, function naming, filename naming, parameter naming, etc.
2.
For some generators (e.g. typescript) we do have an option to keep the
original naming, e.g.
modelPropertyNaming
Naming convention for the property: 'camelCase', 'PascalCase', 'snake_case' and 'original', which keeps the original name (Default: camelCase)
(ref: java -jar
modules/swagger-codegen-cli/target/swagger-codegen-cli.jar config-help -l
typescript-node)
If we implement this option for other generators, it should cover what the
"KEEP_UNDERSCORE" option provides. Do you agree?
What I would propose is that I'll cherry pick your work related to
ALLOW_UNICODE_IDENTIFIERS and add the feature to the latest master while we
discuss further on the "KEEP_UNDERSCORE" feature. What do you think?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#4508 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAg-HPtvo4bbKX25Hb0foFDlGYzpmiCdks5rTkjkgaJpZM4Lc5LO>
.
|
…nicode characters in identifiers (removed references to keep underscores).
I've reworked the pull request to only include changes related to unicode characters support. |
@eblis thanks. Looks good to me. I'll perform some tests later and merge if no question/feedback from me or anyone in the community. |
Btw, before I forgot, for your upcoming PRs (not this one), I would recommend creating a new branch for the change as per git best practice. |
@eblis PR merged into master. Thanks for your contribution. |
Thank you !
…On Fri, Jan 20, 2017, 08:42 wing328 ***@***.***> wrote:
@eblis <https://github.com/eblis> PR merged into master. Thanks for your
contribution.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#4508 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAg-HHF2SD7Ne7aRO2Zg0BgGiuWkfr6rks5rUFdYgaJpZM4Lc5LO>
.
|
) * Added option to allow unicode identifiers in class names, method names etc. Added option to allow keeping of underscore characters in class names, methods names, etc. Unicode identifiers are supported by some languages, like Java, but the codegen will remove all non ASCII letters by default. Users might want to separate different parts of names using extended punctuation connector characters, like "_, ‿, ⁀, ⁔, ・, ︳, ︴, ﹍, ﹎, ﹏, _, ・". Underscores cannot be used to separate different parts of the name as they're removed by default by the codegen, the second option allows underscore characters if the user so desires. Both options can be used separately if needed, for example one might allow extended punctuation connector characters, but still disallow underscore. * Added new command line options to all required unit tests. * Added KEEP_UNDERSCORES and ALLOW_UNICODE_IDENTIFIERS to Bash tests. * When KEEP_UNDERSCORES is set don't camelize the names, keep the identifier case as is (you probably don't want camel case + snake case, just snake case when KEEP_UNDERSCORES is set). Added unit tests to verify how the case is computed for various scenarios. * Reworked pull request to only include changes related to supporting unicode characters in identifiers (removed references to keep underscores). * These methods and classes can be static again.
PR checklist
./bin/
to update Petstore sample so that CIs can verify the change. (For instance, only need to run./bin/{LANG}-petstore.sh
and./bin/security/{LANG}-petstore.sh
if updating the {LANG} (e.g. php, ruby, python, etc) code generator or {LANG} client's mustache templates)2.3.0
branch for breaking (non-backward compatible) changes.Description of the PR
Added option to allow unicode characters in class names, method names etc.
Added option to allow keeping of underscore characters in class names, methods names, etc.
Unicode identifiers are supported by some languages, like Java, but the codegen will remove all non ASCII letters by default. Users might want to separate different parts of names using extended punctuation connector characters, like "_, ‿, ⁀, ⁔, ・, ︳, ︴, ﹍, ﹎, ﹏, _, ・"
Underscores cannot be used to separate different parts of the name as they're removed by default by the codegen, the second option allows underscore characters if the user so desires.
Both options can be used separately if needed, for example one might allow extended punctuation connector characters, but still disallow underscore. Sometimes you need to logically separate parts of the identifier names, which isn't supported yet, without these changes.
The default values for both options are set such that the generator will behave as it did before.