config: Stop server startup if an unrecognized option is found in a config file #9855
Codecov Report
@@ Coverage Diff @@
## master #9855 +/- ##
================================================
- Coverage 77.5172% 77.2161% -0.3011%
================================================
Files 403 405 +2
Lines 81827 81641 -186
================================================
- Hits 63430 63040 -390
- Misses 13689 13932 +243
+ Partials 4708 4669 -39
For context: this is the mysql server behavior with parsing config files. If an item is discovered that is unknown, an error is returned. As Kolbe pointed out, the logic is:
mind compatibility
@jackysp PTAL
/run-all-tests
It seems that there are some illegal configurations in the test, which cause TiDB to fail to start.
/run-all-tests
/run-all-tests tidb-test=pr/764
/run-unit-tests tidb-test=pr/764
/run-mybatis-tests tidb-test=pr/764
config/config.go
for _, item := range undecoded {
	undecodedItems += fmt.Sprintf(" %s \n", item.String())
}
err = errors.Errorf("config file %s contained unknown configuration options:\n%s", confFile, undecodedItems)
The error generated by errors.Errorf() already has a call stack attached, so we don't need to wrap it in errors.Trace() at line 391.
That's fine with me, but the trace is not my code (it was there before, so that a non-existent config file for example would be traced for some reason).
@@ -371,10 +372,22 @@ func GetGlobalConfig() *Config {

// Load loads config options from a toml file.
func (c *Config) Load(confFile string) error {
When upgrading from an old version of tidb to a newer one, some old config items may no longer be used in the newer version. A rolling upgrade would hit this error in that case, and the rolling upgrade would fail.
How about only checking the config items when -config-check is provided?
I don't like this suggestion. I think the behavior should be that tidb-server simply refuses to start if there are unknown configuration options, for the reasons I provided above.
A couple points:
- Now that there's a --config-check option, it can be a part of the recommended procedure for a rolling upgrade, so that the only people running into the issue you describe are those not following the recommended procedure.
- The benefit of a rolling upgrade is that you never have to take the entire cluster offline. When upgrading the first node, it will refuse to start with this error. That's a fine time to address the problem, apply the change to the config files, get that first node upgraded, and then proceed with the rolling upgrade.
@morgo, your thoughts?
I think because our own CI broke, we may need to ease this change in slowly. Here would be my suggestion:
- Keep --config-check as is, it is independently very useful.
- Change from refusing to start to printing a WARN about each incorrect config item.
- Add a new flag, --config-strict, which defaults to FALSE. We can document that enabling it is a best practice, and we may change the default to TRUE in the future. Enabling it will result in errors for invalid options.
It is not bulletproof of course: because the default log level is INFO, it is possible that the warnings may not be seen.
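The warn-by-default, error-when-strict behavior suggested above could be sketched roughly as follows. This is only an illustration using Go's standard library: the helper names `parseStrict` and `checkUnknownOptions`, the warning text, and the sample option name are assumptions made for the sketch, not code from this PR. Only the `--config-strict` flag name comes from the discussion.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

// parseStrict reads a --config-strict boolean flag (default false) from args.
// A private FlagSet is used so the sketch does not touch global flag state.
func parseStrict(args []string) (bool, error) {
	fs := flag.NewFlagSet("tidb-server", flag.ContinueOnError)
	strict := fs.Bool("config-strict", false, "treat unknown config options as errors")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *strict, nil
}

// checkUnknownOptions decides what to do with unrecognized config items.
// In strict mode it returns an error so that startup can abort; otherwise it
// returns one warning line per item and a nil error.
func checkUnknownOptions(unknown []string, strict bool) (warnings []string, err error) {
	if len(unknown) == 0 {
		return nil, nil
	}
	if strict {
		return nil, fmt.Errorf("unknown configuration options: %s", strings.Join(unknown, ", "))
	}
	for _, name := range unknown {
		warnings = append(warnings, fmt.Sprintf("[WARN] unknown configuration option %q ignored", name))
	}
	return warnings, nil
}

func main() {
	// Hard-coded arguments keep the example deterministic.
	strict, err := parseStrict([]string{"--config-strict=false"})
	if err != nil {
		os.Exit(2)
	}
	// "log.levle" stands in for a mis-spelled option name (hypothetical).
	warnings, err := checkUnknownOptions([]string{"log.levle"}, strict)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, w := range warnings {
		fmt.Fprintln(os.Stderr, w)
	}
}
```

Writing the warnings to stderr here sidesteps the question of log levels, which is one of the open points in this thread rather than a settled design.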
I think the very fact our own CI broke is evidence of the value of this behavior change. Even we weren't keeping track of whether our config files were well-formed!
- Your suggestion above is difficult, because the logging system isn't set up yet when the config file is parsed. Either we have to keep around the list of undecoded items and check them later after logging is possible, or we have to pass some other string/structure/flag around, or we have to simply print something to stderr, or ...? @coocood Morgan mentioned that you may have some thoughts on this?
- I dislike this. Adding this new flag means that people may start relying on it, and then we have to keep it around as a no-op if we later make that behavior the default.
I do agree with Morgan's first and second suggestions. Our users may use an old version of ansible to upgrade, which doesn't pre-check the config using tidb-server --config-check, so we need to keep the cluster stable during a rolling upgrade.
Alright, it sounds like there's a consensus. Can someone offer a suggestion of the best way to technically implement a "warning" before the logging system is set up?
@kolbe I will create a pull request to your branch later.
@winkyao I added some comments on your PR. I re-wrote my own interpretation of how I see this working and pushed it to my branch of my fork, please take a look.
Basically I added a kind of unsettling amount of additional stuff to create a custom error in config.go, then use that in the case of failed config validation. That enables checking for the specific error type in main.go, and if configStrict is not enabled, it'll get the string from the warning and keep that until after logging has been set up.
This is kind of a lot of extra work and adds some ugly things, but the whole idea behind strict config checking (and the --config-strict option) is that this is a temporary situation until we make --config-strict behavior the default (and hopefully only!) behavior of TiDB Server in the future. I think at that point we'd simply remove all this extra stuff with the custom error and strange handling in the server.
Thoughts?
/run-all-tests tidb-test=pr/764
… defer a warning from loadConfig until after logging has been set up. Virtually all of this should be removed after configStrict is made the default behavior of TiDB Server!
Ci failed by
The test case is wrong, you should replace (https://github.com/pingcap/tidb/blob/master/config/config_test.go#L48)
with
LGTM, @zz-jason PTAL
/run-all-tests tidb-test=pr/764
Codecov Report
@@ Coverage Diff @@
## master #9855 +/- ##
================================================
- Coverage 78.0225% 77.5191% -0.5035%
================================================
Files 404 403 -1
Lines 82016 81932 -84
================================================
- Hits 63991 63513 -478
- Misses 13324 13711 +387
- Partials 4701 4708 +7
/run-all-tests tidb-test=pr/764
/run-common-test tidb-test=pr/764
LGTM
What problem does this PR solve?
#7103
What is changed and how it works?
Previously, tidb-server would simply ignore invalid config options encountered in a config.toml file. This can cause big problems if a user puts an option in the wrong section of the config file, or if they mis-spell an option name. This could conceivably lead to data loss (if, for example, data is not written to the correct device), security issues (if listeners are not bound to the correct interface), or administration headaches (if someone puts an option in the wrong section and cannot figure out why it isn't taking effect).
This change uses the metaData returned by toml.DecodeFile to identify any items not decoded. A list of them is made and is returned as an error from Config.Load, which causes server startup to abort.
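As a rough illustration of the second half of that step (turning the list of undecoded items into one error), here is a minimal sketch. It assumes the undecoded keys have already been collected and represents them as plain strings; in the PR they come from the TOML decoder's metadata instead. The function name `unknownOptionsError` and the sample option names are made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// unknownOptionsError builds a single error listing every key that appeared in
// the config file but was not decoded into the config struct.
// It returns nil when there is nothing to report.
func unknownOptionsError(confFile string, undecoded []string) error {
	if len(undecoded) == 0 {
		return nil
	}
	var b strings.Builder
	for _, item := range undecoded {
		// One indented line per unrecognized key.
		fmt.Fprintf(&b, "  %s\n", item)
	}
	return fmt.Errorf("config file %s contained unknown configuration options:\n%s", confFile, b.String())
}

func main() {
	// Both keys are hypothetical mis-spellings used only for demonstration.
	err := unknownOptionsError("config.toml", []string{"performance.max-procz", "log.levle"})
	if err != nil {
		fmt.Print(err)
	}
}
```

Returning nil for an empty list lets the caller pass the result straight back from its load function without an extra length check.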
Check List
Tests
Yes, check added to config/config_test.go to make sure that an unrecognized option in a config file throws the correct error.
Code changes
No
No
No
No
Side effects
No
No
Yes, if someone is using a config file in production that has unrecognized options.
Related changes
No
Yes
tidb-ansible repository: Maybe, if it has a strange config file with invalid options!
Yes