Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider using another data format for configuration files #119

Open
lw opened this issue Mar 6, 2013 · 13 comments
Open

Consider using another data format for configuration files #119

lw opened this issue Mar 6, 2013 · 13 comments

Comments

@lw
Copy link
Member

lw commented Mar 6, 2013

JSON is cool because it's an intuitive and concise language to describe simple data structures. But I don't think it's suited for configuration files, mainly because it lacks the ability to add comments. We worked around this by using underscore-prefixed keys (and abusing JSON's flexibility on not giving errors when the same key is defined many times?), but they're quite ugly, long, counter-intuitive and can lead to stupid mistakes (see issue #117). I therefore suggest to move to another format, better suited for that purpose.

This [1] could be a good candidate, since it's in the standard library and it's a common format (used, among others, by Python's setup.cfg files, git, openssl, etc.). Yet, I'm not sure it supports list values (which we're using a lot). Unfortunately there's no formal description of the language, so no easy way to discover it but trying...

Another option could be YAML. It's a far more powerful markup language that JSON, we're already using it in YamlImporter and I also like it aesthetically because it resembles Python (because of the indentation, I guess...).

The final option I see is XML, but I think this is too much complexity.

What do you think? Do you also feel that JSON is limited? What format would you prefer to use instead?

[1] http://docs.python.org/2/library/configparser.html

@stefano-maggiolo
Copy link
Member

To be honest, I like JSON with its limitations that make it also less likely to break. Sometimes for the users is hard to write, but I think we should prefer improving the error messages when a configuration is not working instead of changing the format.

For example, if there is not a value for process_cmdline and RS realize that it cannot find the processes, it could write "are you sure you want to use the default value for process_cmdline which is ?" or something like that.

[And I will ignore your suggestion of XML -.-]

@giomasce
Copy link
Member

giomasce commented Jan 3, 2014

I would like to move the contest configuration to the database as much as possible. Ideally all services in a contest should just know how to contact the PostgreSQL database (or some kind of reference service) and then get everything they need from there.

So far I have no operative proposals for that, though.

@lw
Copy link
Member Author

lw commented Jan 3, 2014

I think that the config should contain only the system and network configuration (e.g. location of lib, log, cache and run directories, paths of external libraries, URL of the database, listening addresses and ports for AWS and CWS, etc.). Everything else should be moved to the database (e.g. maximum submission limits, etc.).

The "rule" should be: cms.conf should be edited only by system administrators, whereas contest administrators should only interact with the database. I expect that a dump should contain everything needed to run an identical contest (in every aspect) even on a totally different system (machines, linux distributions, network topology).

I don't think I like the idea of services getting their system configuration by connecting to a central service. That would mean adding another single point of failure in addition to the database, which is against our design goal "no service is fundamental", isn't it?

@giomasce
Copy link
Member

giomasce commented Jan 3, 2014

Uhm, probably both approaches can be useful. I agree with your distinction between things managed by contest and system admins. Yet, I think that a (optional?) centralized service for managing the (system) configuration of a lot of machines could save a lot of adrenaline when running a complex contest (in other words: don't overrely on the system administrator, because they're a SPOF too).

@lw
Copy link
Member Author

lw commented Jan 3, 2014

I think system administators already have their own tools to manage distribution and synchronization of configuration files and management of large sets of machines. Tools designed for that purpose, like puppet, ansible, etc., are more powerful and reliable than those we could code ourselves, and administrators are already familiar with them.

I fear that it would end like ResourceService, which is basically doing what supervisor, systemd, upstart, etc. could do much better.

@giomasce
Copy link
Member

giomasce commented Jan 3, 2014

This kind of tools is usually very complicated to use in a one-shot environment like one usually finds for many contents (in our experience too). Moreover, they're usually better suited for rather large and complex setups, where updates are rather low frequency. This is not our case at all (aside from that, in my experience puppet has no relationship at all with the words "powerful" and "reliable").

@gollux
Copy link
Contributor

gollux commented Aug 9, 2022

BTW: JSON5 could be a nice compromise between usability and simplicity.

@wil93
Copy link
Member

wil93 commented Aug 9, 2022

I think nowadays the format of choice would be TOML: https://toml.io/en/

But if we'd like to keep the JSON while adding comments, there's this recent one by Microsoft. It looks less intrusive than JSON5 (which would allow, among other things, to use either single or double quotes, which is probably not really an issue that needed to be solved as urgently as the comments). See also: https://github.com/microsoft/node-jsonc-parser

@gollux
Copy link
Contributor

gollux commented Aug 10, 2022

For me, absence of comments is the second worst problem. The one that annoys me the most is illegality of traling commas (I lost count of how many times I was burned by this!).

@magula
Copy link
Contributor

magula commented Jan 24, 2023

If CMS would switch to a standard made for configuration files, like TOML, it would indeed make configuration files a lot more readable and ease configuration significantly. See this snippet for what it may look like.

That would require people to rewrite their existing configuration files (or use a script provided for that). It would be interesting to know if people say they would be affected by that so much it would not be worth it?

If CMS does decide to switch to TOML, there would be some things to consider:

TOML supports sections.
I believe it would make sense to basically transform every (syntactically meaningless) line from cms.conf that is of the form "_section": "Database", to an actual section [Database]. This changes the format of the dictionary resulting from toml.load inside conf.py, so the corresponding arguments would need to be adapted in the python code, which would be some work (a lot more than just dropping in toml for json, in fact). And, perhaps more importantly, it would mean work for anyone with additional arguments in their CMS fork.

It would however make the configuration file and the code cleaner, by making the sections syntactically meaningful.

The alternative, of course, would be to not have sections, and keep the "section comments" as they are.

Notifying users of the change.
There should be a message informing users of the change. This could look like

A configuration file /usr/local/etc/cms.conf containing JSON was found but will not be used, because configuration is now stored in a TOML file. You should rewrite /usr/local/etc/cms.conf to TOML.

Update Script.
The change could come with a script to (heuristically) convert an existing JSON config to a TOML config.

In forks, there will be keys that are not part of upstream CMS. In case a fork's maintainers merge without adapting the script and the config's datastructure: I propose that it should throw a warning when encountering any unknown keys and then put them in a special [stray] section, the content of which would be treated like the attributes of the current JSON config so they can still be accessed the same way they are accessed now. This way, forks might just continue to work even without changes being made by maintainers.

Also, more technical:
Null Values.
TOML doesn't support null or anything equivalent. This is currently the default value for https_certfile and printer, for both of which it should be appropriate to make "" the default and treat this as null.

I have started implementing this in our fork to see how it would work. Should CMS decide TOML would be a good way to go, I would of course continue to implement it based on any feedback and create a pull request at some point.

@gollux
Copy link
Contributor

gollux commented Jan 24, 2023

Another candidate is JSON5, which is JSON sans the most annoying features.

@magula
Copy link
Contributor

magula commented Jan 24, 2023

JSON5 would certainly be less intrusive, so it could well be the best option if we want to preserve compatibility!

But it is still harder to read and write than TOML. And while TOML will be part of Python's standard library from Python 3.11, I don't really know how good the JSON5 libraries are yet?

Also, being able to overhaul the config structure while moving to e.g. TOML would be helpful: E.g., there are currently attributes

  • contest_listen_address, contest_listen_port, cookie_duration (not: contest_cookie_duration) for the ContestWebServer, and
  • admin_listen_address, admin_listen_port, admin_cookie_duration for the AdminWebServer,

which seems inconsistent and, even though being such a little thing, is confusing both in the config and in the Python code. The same holds for other attributes where I remember I have had to double-check what they belong to repeatedly. These things would be made better by having sections, I think.

@wil93
Copy link
Member

wil93 commented Jan 24, 2023

Worth noting that from version 3.11 of Python, toml is part of the standard library: https://docs.python.org/3/library/tomllib.html

This makes it a more favourable candidate than JSON5 in my opinion since we wouldn't need to install a third party library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

6 participants