Consider using another data format for configuration files #119
To be honest, I like JSON: its limitations also make it less likely to break. It is sometimes hard for users to write, but I think we should prefer improving the error messages when a configuration is not working over changing the format. For example, if there is no value for process_cmdline and RS realizes that it cannot find the processes, it could write "are you sure you want to use the default value for process_cmdline which is ?" or something like that. [And I will ignore your suggestion of XML -.-]
I would like to move the contest configuration to the database as much as possible. Ideally, all services in a contest should just know how to contact the PostgreSQL database (or some kind of reference service) and then get everything they need from there. So far I have no concrete proposals for that, though.
I think that the config should contain only the system and network configuration (e.g. location of lib, log, cache and run directories, paths of external libraries, URL of the database, listening addresses and ports for AWS and CWS, etc.). Everything else should be moved to the database (e.g. maximum submission limits, etc.). The "rule" should be: cms.conf should be edited only by system administrators, whereas contest administrators should only interact with the database. I expect a dump to contain everything needed to run an identical contest (in every aspect) even on a totally different system (machines, Linux distributions, network topology). I don't think I like the idea of services getting their system configuration by connecting to a central service. That would add another single point of failure in addition to the database, which is against our design goal "no service is fundamental", isn't it?
Uhm, probably both approaches can be useful. I agree with your distinction between things managed by contest administrators and by system administrators. Yet, I think that an (optional?) centralized service for managing the (system) configuration of a lot of machines could save a lot of adrenaline when running a complex contest (in other words: don't over-rely on the system administrator, because they're a SPOF too).
I think system administrators already have their own tools to manage the distribution and synchronization of configuration files and large sets of machines. Tools designed for that purpose, like puppet, ansible, etc., are more powerful and reliable than those we could code ourselves, and administrators are already familiar with them. I fear that it would end up like ResourceService, which basically does what supervisor, systemd, upstart, etc. could do much better.
These kinds of tools are usually very complicated to use in a one-shot environment like the one found at many contests (in our experience too). Moreover, they're usually better suited for rather large and complex setups, where updates are infrequent. This is not our case at all (aside from that, in my experience puppet has no relationship at all with the words "powerful" and "reliable").
BTW: JSON5 could be a nice compromise between usability and simplicity.
I think nowadays the format of choice would be TOML: https://toml.io/en/ But if we'd like to keep JSON while adding comments, there's a recent JSON-with-comments format by Microsoft. It looks less intrusive than JSON5, which would allow, among other things, either single or double quotes; that is probably not an issue that needed solving as urgently as comments. See also: https://github.com/microsoft/node-jsonc-parser
For me, the absence of comments is the second-worst problem. The one that annoys me the most is that trailing commas are illegal (I've lost count of how many times I've been burned by this!).
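Both complaints are easy to reproduce with Python's standard `json` module; the sample documents below are illustrative. A strict parser rejects a trailing comma in an array or an object, and it has no comment syntax either:

```python
import json


def parses(text):
    """Return True if Python's strict json module accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError as err:
        # The error message points at the offending line and column.
        print(f"rejected: {err}")
        return False


print(parses('{"ports": [8888, 8889]}'))        # True
print(parses('{"ports": [8888, 8889,]}'))       # False: trailing comma in array
print(parses('{"port": 8888,}'))                # False: trailing comma in object
print(parses('{"port": 8888 // main port\n}'))  # False: no comment syntax
```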
If CMS were to switch to a standard made for configuration files, like TOML, it would make configuration files a lot more readable and ease configuration significantly. See this snippet for what it may look like. That would require people to rewrite their existing configuration files (or use a script provided for that). It would be interesting to know whether anyone would be affected so much that it would not be worth it.
If CMS does decide to switch to TOML, there would be some things to consider:
TOML supports sections. It would however make the configuration file and the code cleaner, by making the sections syntactically meaningful. The alternative, of course, would be to not have sections and to keep the "section comments" as they are.
Notifying users of the change.
Update script: in forks, there will be keys that are not part of upstream CMS. In case a fork's maintainers merge without adapting the script and the config's data structure, I propose that the script should throw a warning when encountering any unknown keys and then put them in a special …
Also, more technical: …
I have started implementing this in our fork to see how it would work. Should CMS decide that TOML is a good way to go, I would of course continue implementing it based on any feedback and create a pull request at some point.
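A minimal sketch of such a migration step, under several assumptions: the old `cms.conf` is a flat JSON object; `SECTION_OF` is a hypothetical mapping from old keys to new section names (not CMS's real keys); underscore-prefixed keys are treated as the old comment workaround and dropped; and unknown keys trigger a warning and are collected in a catch-all section. The proposal does not say what that section is called, so `unknown` here is only a placeholder. The standard library can read TOML but not write it, so the sketch stops at the nested dict that a TOML writer would then serialize.

```python
import json
import warnings

# Hypothetical mapping from old flat keys to new sections (illustrative only).
SECTION_OF = {
    "database": "database",
    "contest_listen_address": "contest_web_server",
    "contest_listen_port": "contest_web_server",
    "keep_sandbox": "worker",
}

# Placeholder name for the catch-all section holding fork-specific keys.
UNKNOWN_SECTION = "unknown"


def migrate(flat):
    """Group a flat config dict into sections, warning about unknown keys."""
    sectioned = {}
    for key, value in flat.items():
        if key.startswith("_"):
            # Underscore-prefixed keys were the JSON "comment" workaround.
            continue
        section = SECTION_OF.get(key)
        if section is None:
            warnings.warn(f"unknown configuration key {key!r}; "
                          f"moving it to [{UNKNOWN_SECTION}]")
            section = UNKNOWN_SECTION
        sectioned.setdefault(section, {})[key] = value
    return sectioned


old = json.loads('{"_help": "comment", "keep_sandbox": true, "my_fork_option": 3}')
print(migrate(old))
# {'worker': {'keep_sandbox': True}, 'unknown': {'my_fork_option': 3}}
```

Keeping unknown keys rather than dropping them means a fork's settings survive the conversion, and the warning tells its maintainers which keys still need a proper place in the mapping.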
Another candidate is JSON5, which is JSON sans the most annoying features.
JSON5 would certainly be less intrusive, so it could well be the best option if we want to preserve compatibility! But it is still harder to read and write than TOML. And while TOML will be part of Python's standard library from Python 3.11, I don't really know how good the JSON5 libraries are yet. Also, being able to overhaul the config structure while moving to e.g. TOML would be helpful. E.g., there are currently attributes … which seems inconsistent and, even though it is such a little thing, is confusing both in the config and in the Python code. The same holds for other attributes; I remember repeatedly having to double-check what they belong to. These things would be improved by having sections, I think.
Worth noting: from Python 3.11, TOML parsing is part of the standard library (tomllib): https://docs.python.org/3/library/tomllib.html This makes it a more favourable candidate than JSON5, in my opinion, since we wouldn't need to install a third-party library.
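A minimal loading sketch. The file contents are illustrative, and the fallback assumes the third-party `tomli` package would be installed on Python versions older than 3.11 (`tomllib` was derived from `tomli` and exposes the same `load`/`loads` functions). One detail worth knowing: `tomllib.load` only accepts files opened in binary mode.

```python
import os
import sys
import tempfile

if sys.version_info >= (3, 11):
    import tomllib
else:
    # Assumption: the third-party "tomli" package is installed on older Pythons.
    import tomli as tomllib


def load_config(path):
    """Parse a TOML configuration file into a (nested) dict."""
    # tomllib.load() requires a file opened in binary mode ("rb").
    with open(path, "rb") as f:
        return tomllib.load(f)


# Usage: write a small example file, read it back, then clean up.
with tempfile.NamedTemporaryFile("wb", suffix=".toml", delete=False) as tmp:
    tmp.write(b'[database]\nurl = "postgresql://localhost/cmsdb"\n')

config = load_config(tmp.name)
os.remove(tmp.name)
print(config)  # {'database': {'url': 'postgresql://localhost/cmsdb'}}
```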
JSON is cool because it's an intuitive and concise language to describe simple data structures. But I don't think it's suited for configuration files, mainly because it lacks the ability to add comments. We worked around this by using underscore-prefixed keys (and by relying on JSON parsers not raising errors when the same key is defined many times?), but they're quite ugly, long, counter-intuitive and can lead to stupid mistakes (see issue #117). I therefore suggest moving to another format, better suited for that purpose.
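The duplicate-key behaviour can be checked with Python's standard `json` module: when a key repeats, the last value silently wins, which is why repeated comment keys such as `_help` (an illustrative name) "work". The same module's `object_pairs_hook` parameter lets a loader reject duplicates instead, which would catch some of the mistakes this workaround invites. This is a sketch, not how CMS currently loads its configuration.

```python
import json

# Repeated "comment" keys: Python's json keeps only the last value.
text = '{"_help": "first comment", "port": 8888, "_help": "second comment"}'
print(json.loads(text))  # {'_help': 'second comment', 'port': 8888}


def reject_duplicates(pairs):
    """object_pairs_hook that raises on repeated keys instead of overwriting."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key {key!r} in configuration")
        result[key] = value
    return result


try:
    json.loads(text, object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(err)  # duplicate key '_help' in configuration
```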
The INI-style format read by Python's configparser module [1] could be a good candidate, since it's in the standard library and it's a common format (used, among others, by Python's setup.cfg files, git, openssl, etc.). Yet, I'm not sure it supports list values (which we use a lot). Unfortunately there's no formal description of the language, so there's no easy way to find out other than trying...
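On list values: as far as the Python 3 `configparser` module goes, every value is read back as a plain string, with no native list type. A common workaround, sketched here with hypothetical section and key names, is to store a list as a comma-separated or multi-line value and split it in application code. That convention is something CMS would have to implement itself; it is not a feature of the format.

```python
import configparser

# Hypothetical INI-style config; section and key names are illustrative only.
INI_TEXT = """
[contest_web_server]
# Lists must be encoded as strings, e.g. comma-separated...
listen_port = 8888, 8889
# ...or one item per line (indented continuation lines).
listen_address =
    127.0.0.1
    10.0.0.1
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)
section = parser["contest_web_server"]

# Every value comes back as a string; splitting is up to the application.
ports = [int(p) for p in section["listen_port"].split(",")]
addresses = section["listen_address"].split()
print(ports)      # [8888, 8889]
print(addresses)  # ['127.0.0.1', '10.0.0.1']
```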
Another option could be YAML. It's a far more powerful markup language than JSON, we're already using it in YamlImporter, and I also like it aesthetically because it resembles Python (because of the indentation, I guess...).
The final option I see is XML, but I think it adds too much complexity.
What do you think? Do you also feel that JSON is limited? What format would you prefer to use instead?
[1] http://docs.python.org/2/library/configparser.html