Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modernize options #1852

Merged
merged 17 commits into from
Oct 24, 2018

Conversation

mwoehlke-kitware
Copy link
Collaborator

@mwoehlke-kitware mwoehlke-kitware commented Jul 4, 2018

This overhauls how we store and manage options, adding convenience and type safety, and especially, getting us down to one place where options and such are defined. The general idea here is that, rather than a great big list of options in a global struct, each individual option is a separate object with a static initializer to set up its information, including its default value. (For the sake of config handling, there will still likely be a giant register_options sort of thing, but it will be generated by a script.) The benefits of this vs. the current approach are:

  • Options are much more readable. Options are now described fully in options.h, which (not by accident!) is not dissimilar to the output of --update-config-with-doc.

  • Options are type safe and can be accessed directly. It is impossible to use the wrong union member to access an option's value, because there is no more union. The only place a map is needed is for dealing with the config file (reading/updating/documenting). While Add and use option accessors #1945 already achieves this to an extent, it added two more places where options needed to be defined... see next item.

  • Options are defined in exactly one place. Defaults are defined along with the option definition. (This is sort of a lie, but as far as what code developers would ever touch, it is true. The rest is auto-generated.) The (debug) logic to enforce ordering consistency can go away.

  • The logic to parse options from the config or convert them back to strings is be more localized and less monolithic.

Old option declarations existed in as many as five places; the enum value in options.h, the registration (with string literals) in options.cpp, and (in some cases) setting the default value in a different location in options.cpp. #1945 added two more places; the accessor declaration and definition.

The new declaration looks like:

// This is an example option. Here is its description.
// The description can be multiple lines. Line breaks are preserved as-is.
extern option<bool>
sp_example; // = true

The script that generates options.cpp recognizes the optional // = X comment as specifying a default value (X is a C++ expression). The group markers should be obvious, and have the benefit of (by katepart, at least) being recognized region markers.

We still have "accessors" (recall, the whole point of #1945 was to introduce an alternate mechanism for accessing option values that was source-compatible with what we introduce here), but the "accessor" is calling operator() on the static option instance. This no longer returns a reference; assignment is instead done directly (sp_foo = val vs. sp_foo() = val). Additionally, places that take options (e.g. blank_line_set) can now take the option object and retrieve the option name from the same, eliminating the need for macros to pass both the name and accessor.

I think this will also encourage writing better documentation, since the descriptions aren't burdened with indenting, quoting (or escaping), or being highlighted as string literals, and it is less awkward to write multi-line descriptions. (Also, one of the things I'm doing in this PR is thoroughly revising all of the option descriptions for consistency of phrasing and, for options that needed it, clarity.)

Here's a comparison of using an option:

// Old
if (cpd.settings[UO_sp_example].a == AV_REMOVE)

// New
if (options::sp_example() == IARF_REMOVE)

(This last part is already mostly achieved by #1945 and #1974, which were done largely in order to push most of the churn out of this PR. However, there are a few instances that #1974 skipped that are now ported, especially in newline.cpp.)

@mwoehlke-kitware
Copy link
Collaborator Author

Current status:
The script doesn't exist yet, and I've only partly completed proofreading the option documentation and converting the old default value assignment to the new model. Most of the conversion has been automated, but not sufficiently that I can just redo it (plus the proofreading is all manual), so I'll be rebasing by hand to deal with any options that get added between now and whenever this is actually ready. (I plan to review the documentation of each option to tweak the line wrapping, grammar, and general presentation in order to improve consistency and overall "visual quality". I'm also attempting to more clearly mark language-specific options. Some of this can be seen already through careful reading.)

option.cpp will implement stuff for dealing with a single option (e.g. reading from config, converting value to string). options.cpp[.in] will deal with the more general aspects of reading/writing configs. The various enum values will be reexposed under more-or-less their old names (AV_ changes to IARF_; I think the others are all the same) by generated code. The mappings of enums to/from strings will also use generated code (see also OPTVAL_ALIAS).

@mwoehlke-kitware
Copy link
Collaborator Author

p.s. I haven't figured out yet how it's going to work code-wise, but the plan is to have code to add non-"zero" defaults to the option descriptions, which is why I have been removing those from the descriptions as in options.h. (Or... should we just print the defaults for everything?)

@CDanU
Copy link
Collaborator

CDanU commented Jul 4, 2018

should we just print the defaults for everything

Unless it makes it substantially more easy to implement your changes I'd say no, as I hope that at some point in the (far?) future we will not have defaults that are causing changes (see #820).
Having defaults printed in that situation is going to be pointless.

@mwoehlke-kitware
Copy link
Collaborator Author

should we just print the defaults for everything

Unless it makes it substantially more easy to implement your changes I'd say no,

Well, it does 😄, maybe (depending on your definition of "substantially", and especially if I'm trying to be lazy), but...

as I hope that at some point in the (far?) future we will not have defaults that are causing changes.
Having defaults printed in that situation is going to be pointless.

...I hadn't really thought about it like that. In that light, I don't think it's worth dodging the effort to do it properly.

@mwoehlke-kitware
Copy link
Collaborator Author

I'm about a third of the way through the description revisions (specifically, I've pushed up to the start of the "Indenting options" group at L990), and have pushed the updates. Now would be a good time to proof-read what I have, since it is difficult to produce meaningful diffs.

What am I doing?

  • Trying to mark language-specific options, like // (OC) Describe option that only apples to Objective-C.
  • Making sure all descriptions end with a period.
  • Removing use of abberviations (e.g. "paren" → "parenthesis").
  • Reducing redundancy by removing certain detailed explanations from all but the first of a group of (consecutive) related options.
  • Improving consistency of phrasing, especially:
    • Using the form , i.e. 'a' vs. 'b' consistently (as opposed to : 'a' vs 'b').
    • Generally trying to avoid colons in prose.
    • Using as in before a non-comparative example.
    • Using Add or remove space instead of phrases like Control spacing.
    • Using Whether to to describe Boolean options.
    • Consistently formatting blocks where possible option values are enumerated.
  • Limiting lines to about 75 characters (not counting the //), and breaking lines early in some cases to avoid awkwardly-placed breaks.

@mihaipopescu
Copy link
Collaborator

mihaipopescu commented Jul 8, 2018

I like the core idea of this PR but I'm concerned for future downstream or upstream merges. This big change will need to be accompanied by an upgrade guide.

@mwoehlke-kitware
Copy link
Collaborator Author

Question: We have the group "Blank line options" with stuff like "The maximum number of consecutive newlines in a function." But we also have e.g. nl_var_def_blk_end, "The number of newlines after a block of variable definitions not at the top of a function body." which is in the "Newline adding and removing options" group. Would it make more sense if this (and similar) was in "Blank line options"?

@mwoehlke-kitware
Copy link
Collaborator Author

This big change will need to be accompanied by an upgrade guide.

Changes affecting options.{cpp,h}, besides just adding options, will be... fun... Hopefully there won't be any of those.

Just adding options should be obvious enough; as always, "do like existing options". The anatomy of an option is:

// Describe option here.
// Using more than one line is okay.
extern TYPE
my_option; // = default_value

...where TYPE is e.g. option<iarf_e>, bounded_option<signed, -16, 16>, etc. The type and name must be separate lines. The default is a C++ expression, and is optional; if no default, omit the trailing comment. If you mess it up, either make_options.py will fail, or you will likely get a link error (unless the option is never actually used).

Changes elsewhere are easier: sed -r 's/cpd.settings\[UO_(\w+)\]\.\w+/options::\1()/g'. For example:

// Old
if (cpd.settings[UO_sp_after_byref].a == AV_REMOVE)

// New (assumes 'using namespace uncrustify', or more preferably,
// code that is in the uncrustify namespace)
if (options::sp_after_byref() == IARF_REMOVE)

(Also, AV_*IARF_*.)

@CDanU
Copy link
Collaborator

CDanU commented Jul 10, 2018

Would it make more sense if this (and similar) was in "Blank line options"?'

Yes that would be better.

About the anatomy of an option: Is there a way to leverage the compiler/ide/other tools for the default options? With a comment I won't see instantly that I have used a wrong type. Maybe a static_assert can be used somehow?

@mwoehlke-kitware
Copy link
Collaborator Author

Is there a way to leverage the compiler/ide/other tools for the default options? With a comment I won't see instantly that I have used a wrong type. Maybe a static_assert can be used somehow?

Well, you will definitely get a compile error if you mess it up, but in the generated options.cpp. I don't know if there's a good way to get your IDE to show you in options.h if you have a problem. I can think of two possibilities, but both have drawbacks that I'm not sure are worth it:

  • Make the default value part of the template type. This would be the best option, except that it doesn't work for option<string>... at least not trivially.

  • Make the default value be a separate thing after the option, e.g. extern TYPE my_option; T my_option_default = ...;. This is fairly easy to do, but the DRY violations increase the chances of things getting out of sync, at which point you're back to just having a build error, at best. (At worst, the default would be silently ignored, but I'd certainly try to write make_options.py in a way that would avoid that.)

...but from the way you worded your question, maybe the build error is enough? Given the incomplete state of this PR, it may not be clear that the default value eventually becomes real C++ code in options.cpp, not just a string that is parsed at run-time.

It is probably useful to show what an option in options.cpp looks like:

option<lineend_e> newlines = {
  "newlines",
  "u8R__(The type of line endings.)__",
  LE_AUTO // note: this is the default value
}

Note how the default is C++ code, not a string (well, unless the option is a string option 😉). So, if you mess it up, you'll get a build error; you just won't necessarily get your IDE showing you a helpful alert before you build at all.

I'm inclined to punt, at least until this is more functional. I may or may not revisit the first option in this PR, but even if not, we can always revisit at any point in the future. That said, I don't think there is any way to make errors show up earlier (i.e. immediately in the IDE) that doesn't involve some trade-off in code complexity.

@mwoehlke-kitware
Copy link
Collaborator Author

Okay, I think I'm more or less done with updating all the descriptions, so I can start working on actual code. I have a bunch of unorganized changes locally that still need to get sorted, some of which may yet end up squashed into the "rework the options" commit, but I should at least be essentially done with options.h. Locally, I also have options.cpp, option_enum.cpp and option.cpp compiling, though in the last case that is because I've (temporarily) commented out almost all of the original code. Also, this means that all of the initializers of the options are compiling, but my current make_options.py isn't emitting the registration yet (though adding that should be uneventful).

I've punted on several options, however (see #1880, and also the TODOs in options.h); if anyone wants to tackle those, or at least can explain to me what those options do, that would be appreciated! (Note: the plan is to leave the TODOs in the code until they are resolved, which may mean they're in the final form of this PR. One of the niceties of this PR is that it is much less awkward to stick ancillary comments like this in options.h, and make_options.py will ignore them as long as there is a blank line before an actual option descriptions.)

Note: I don't plan to rebase (and incorporate option changes since the start of the branch) until everything else is done.

Would it make more sense if this (and similar) was in "Blank line options"?'

Yes that would be better.

I also haven't done this yet. I'll try to remember to get to it, but it's not the end of the world if I forget...

@mihaipopescu
Copy link
Collaborator

How easy do you think it would be for a fork like ours which has some extra options added on it to merge this refactor? Maybe a small upgrade guide would be useful. Otherwise I'll have to really consider integrating our changes upstream before this... and in general it is a good idea but time is not on our side.

@mwoehlke-kitware
Copy link
Collaborator Author

(Not dead yet 🙂.)

I pushed an update to the branch. It isn't ready yet (and still needs to be rebased to account for option changes since the branch was started), but it's getting a lot closer. In particular, the new option classes are all implemented now, as is the remainder of option.cpp, and I've started working on the rest of the sources. (I have some additional changes locally that are not yet committed, though what's left is largely going to be using namespace uncrustify; and AV_IARF_.)

Meanwhile, if anyone wants to start looking at the new option classes API, or the new code generation scripts, those are believed at this point to be "done".

I also need to fix some bugs that are going to affect styling here (and probably have a discussion about not enforcing many blank lines in class definitions before inline method definitions, because that's ugly, especially in enum_flags.h) and I would love if #1913 can land first. I also currently have not implemented print_options, and I'm rather hoping I don't need to; see #1865.

@mihaipopescu
Copy link
Collaborator

and still needs to be rebased to account for option changes since the branch was started

So you are facing the same problem as our fork will have when I'll want to upstream our new options... unless I'm faster but I don't have enough free time right now to do it.

Ideally, if the refactor can be expressed with some clang refactor calls then it would make it a lot simpler for everyone.

@mwoehlke-kitware
Copy link
Collaborator Author

So you are facing the same problem as our fork will have when I'll want to upstream our new options...

Well, yes. I think I said somewhere (though not on this PR) that I will document what I do when I do the rebase.

Ideally, if the refactor can be expressed with some clang refactor calls then it would make it a lot simpler for everyone.

Heh. Considering that trying to build clang[-format] is one of the reasons I'm so heavily in favor of uncrustify instead, that seems... unlikely 🙂. Maybe I'll take a crack at producing a sed script to try to reproduce what I did originally by hand. It won't be perfect, though; I did a lot of hand clean-up on the descriptions, as noted in earlier comments. Even without the consistency changes, I've been sticking rigorously to a 79 column length limit.

In any case, the best you're likely to get is an approximation of old-style options in the new format, which you'll still have to pull out and reinsert changes by hand. No matter what, it's going to involve diffing your current state against the commit immediately preceding the refactor and more or less reapplying that by hand.

If I do write that script, I might inject a commit that just applies the script with no polishing, so that there is a diff between that and the polished descriptions. That might help...

@mihaipopescu
Copy link
Collaborator

that's probably just travis, tried restating the job.

@mwoehlke-kitware mwoehlke-kitware mentioned this pull request Oct 16, 2018
Start converting options to use a more "modern C++" implementation.
Specifically, make each option a C++ class to hold all of the
information about the option, and to use static initialization to set
the default value. These are strongly typed (no accessing the wrong
member of a union) and all of the information is set up in exactly one
(user editable) place.

Right now, this is ONLY the new declarations, and supporting data types;
much of the actual implementation remains to be done. The actual
initialization will be done by code generated from the declarations
using a Python script, as will conversion of the values between text and
their respective enumerations. As such, THIS COMMIT DOES NOT BUILD.
Create the script to generate the option instance declarations and
registration function from options.h. Refactor some of the options
machinery to work with the new options types, and (for now) comment out
the rest. This gets us to the point where option.cpp and options.cpp
will actually build, although the rest of the code has lots of issues
still.
Create the script to generate the conversions to and from option related
enumerators and strings, and to add aliases for the various enumerators
to the main namespace. Start removing some of the old code that this
replaces.
Change add_keyword to take the tag argument as a std::string rather than
a char*. Since the tag was immediately converted to a std::string
anyway, this is more efficient if the caller already has a std::string.
(The need to pass a char* for logging is a wash, even aside from it
being somewhat pointless to care about logging performance. We need the
std::string anyway, and the cost of calling c_str() on the std::string
is negligible.)
Finish implementing the new option classes, and finish refactoring all
of the code in option.cpp to use the new classes, as well as to use more
C++ (particularly std::string and std::ifstream) where it makes sense to
do so. This gets us to the point that option.cpp compiles, though other
sources still need some tweaking.
Tweak output of option descriptions to prevent outputting trailing space
for blank description lines.
Update expected CLI output to account for new option objects and
documentation updates.
Update some enumeration names to more correctly conform with our naming
convention, which specifies that words should be separated with
underscores. Specifically, this changes 'lineend_e' and 'tokenpos_e' and
some related entities.
Rename some class ctor arguments to avoid shadowing class methods of the
same name. This is somewhat gratuitous and not really "necessary", but
avoids tripping a -Wshadow warning.
Modify make_options.py and make_option_enum.py to explicitly do their
I/O in UTF-8. This should make them "safer", and more importantly, fixes
a bug that caused them to hang under Python 3 because the iteration
sentinel was explicitly bytes, which apparently never matches under
Python 3, which would cause the build to just hang.

Thanks go to Jørgen Ibsen for pointing out the issue!
Also use Unicode arrows in documentation epilogue. This is consistent
with using them in the documentation of individual options.
When building with MSVC, instruct the compiler to use UTF-8 for both the
source and execution character sets. This prevents UTF-8 string literals
from being mangled by the compiler, and allows us to correctly output
non-ASCII text.
Tweak AppVeyor configuration to force Python's output encoding to UTF-8.
This avoids an issue where Python throws an exception trying (and
failing) to coerce the output into CP-1252.
Tweak test_cli_options.py to force consuming uncrustify's output
as raw bytes, rather than text (as Python sometimes tries to perform
codec conversions on the text, which can go sideways), and instead
explicitly convert the output from UTF-8. (This also requires us to
handle newline normalization ourselves.)

Incidentally, with these changes, we are always dealing explicitly in
either raw bytes or "proper" Unicode in a way that is consistent between
Python 2 and Python 3, which means we can get rid of the compatibility
shim to promote from locale-encoded strings when using Python 2.
@mwoehlke-kitware
Copy link
Collaborator Author

Okay, rebased to pick up #2044; hopefully that should fix coveralls...

@mwoehlke-kitware
Copy link
Collaborator Author

@gmaurel, is this okay to merge?

Copy link
Collaborator

@gmaurel gmaurel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, do it.

@gmaurel
Copy link
Collaborator

gmaurel commented Oct 24, 2018

what coverage/coveralls "says" has 2 aspects:
It shows us, what is not used, I will still work on it the next time,
if the % coverage is decreasing is not much important

@mwoehlke-kitware mwoehlke-kitware merged commit 8c69fef into uncrustify:master Oct 24, 2018
@mwoehlke-kitware
Copy link
Collaborator Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Internals Anything not visible to end users, e.g. code refactoring.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants