Be Stricter About Lack of Null/None/Nil #975

Closed
mitsuhiko opened this issue May 21, 2023 · 29 comments

Comments

@mitsuhiko

There are quite a few people who want null/none/nil/unset or anything like this in the language. It's one of the oldest issues (#30) that was opened and closed. The most recent incarnation also did not go very far in terms of getting towards support for it:

#921

I personally try my best to avoid null in my own uses of TOML, but I keep running into other people's TOML-based configurations where the absence of a key is used as an alternative to null. I believe this to be a mistake in designing TOML schemas, because such a value cannot be represented in TOML. This is particularly a problem when TOML files are "merged" or layered: a higher layer cannot express "unset this key" once a lower layer has set it. Since null/none support is unlikely to happen, I wonder if an explicit section could be added to the TOML specification describing why null does not exist, and stating that the absence of a key should not be used to indicate that a value is null.

@arp242
Contributor

arp242 commented Jun 2, 2023

How about:

TOML does not support "null" or "none" values, as their semantics and support across languages are inconsistent. It's generally recommended to avoid interpreting the absence of a key as "null" and to use sentinel values instead, as a key can never be unset after files are merged or layered.

Not entirely sure where to put that though; maybe just a new section before "Filename Extension"?

@mitsuhiko
Author

I would have put it directly into the key/value pairs section. It already has an example on invalid values which is somewhat related.

@eksortso
Contributor

eksortso commented Jun 3, 2023

It's difficult to specify something that shouldn't exist, which is why I've held my tongue. We also seem to discourage adopting into the standard explicit recommendations for configuration designers, though there are exceptions. We also don't mention layering or merging configurations anywhere, but we could note that TOML provides no means to "unset" key/value pairs.

Let me take a stab at this, starting with your suggestion, @arp242. The proposal below follows the guidelines previously mentioned, and would appear in the Key/Value Pair section, per @mitsuhiko's suggestion.

Unspecified values are invalid. TOML does not support "null" or "none" values, and no means exists to "unset" a key defined elsewhere. The use of sentinel values is recommended for meta configuration, the logic for which is necessarily application-specific.

What do you think?

@mitsuhiko
Author

That sounds good to me.

@arp242
Contributor

arp242 commented Jun 3, 2023

It's difficult to specify something that shouldn't exist, which is why I've held my tongue. We also seem to discourage adopting into the standard explicit recommendations for configuration designers, though there are exceptions.

I concur, and this is also why I didn't reply initially, hoping someone else had something insightful to say, but it seems not 😅

I think the issue comes up frequently enough that it's worth at least explicitly mentioning that TOML doesn't support Null values. Your phrasing is better than mine, although I would replace 'and no means exists to "unset" a key' with 'and has no means to "unset" a key', but that's a minor thing.

I considered proposing a new document for this ("FAQ", "Effective TOML", or something like that) but I don't really know what else we could put there.

@ChristianSi
Contributor

ChristianSi commented Jun 4, 2023

This is advice, not spec, and as such I think it should not go anywhere into the spec. It might be good to write something in that regard into a TOML FAQ. It's true that there is no such FAQ yet, but maybe something like "Why is there no None type?" is indeed a good first question to start it.

Besides that, I'm very skeptical about the discussed contents of the answer. As I see it, merging multiple TOML files or somehow modifying TOML data that already exists in memory (by means of a different TOML file) are exotic special cases, not your typical use case. In the typical case, treating None by omitting the whole key/value pair when serializing and treating a missing key/value pair as None when deserializing is exactly the right way of dealing with such data (implicit rather than explicit None). I also believe that the fact that TOML has such an implicit None type is very much the reason that it doesn't have (or need) an explicit one.

Hence I'm opposed to mentioning anything about "sentinel values" except possibly for very specialized use cases. Should we ever arrive at the conclusion that such sentinel values are indeed more frequently needed, that would IMHO provide a good rationale for finally adding an explicit None as the paramount sentinel value.

@ChristianSi
Contributor

Add-on regarding

I also believe that the fact that TOML has such an implicit None type is very much the reason that it doesn't have (or need) an explicit one.

That's indeed the very reason given by @mojombo for closing #30 way back when the Internet was still young more than 10 years ago:

"TOML is intended for configuration, at which point @aaronblohowiak is right: just leave it out."

@slonik-az

In the absence of nil/null/none, how would one indicate missing values in an array?
For instance, something like ary = [1,2,3,None,None,6,7]
There can be legitimate reasons why certain values are not available or optional.

@marzer
Contributor

marzer commented Jul 19, 2023

@slonik-az

how would one indicate missing values in the array?

If you have Nones in the array, they're not missing. None is a value in the way you've represented it in your example. It's occupying elements [3] and [4] of the array. If they were missing values, your array would be [1,2,3,6,7].

If you want to indicate some sort of semantic absence of information intended for those slots, say for some sort of serialization scenario, personally I'd leave that down to whatever application is responsible for validating the input, rather than depend on it being a part of the TOML itself.

Or, if you're suggesting that the schema itself would assume Nones as valid to allow a user to specify those elements being blank, then your schema is poorly designed IMO.

@slonik-az

slonik-az commented Jul 19, 2023

I am talking about optional value types (like Option in Rust). A concrete value can be, let's say, an integer, or None if it is missing or undefined for some reason. But it is essential that it occupies the right slot in the array. Just shrinking the array would not work.

EDIT: Providing more context. I have a library that uses JSON for configuration. It extensively uses json arrays with null values. I am trying to find an equivalent TOML representation that does not look awkward.

@marzer
Contributor

marzer commented Jul 19, 2023

Well then, you can come up with some scheme for that in your application, like using an empty inline table {}. Ultimately it's important to remember that TOML is primarily intended for configuration, and is not a programming language. Best practice for an undefined value in a config is generally to just not define it, i.e. leave it out entirely. TOML is designed around this principle.

@arp242
Contributor

arp242 commented Jul 19, 2023

I'd prefer it if this was kept on-topic, rather than be a generic "how do I do X without nulls?" issue. If any of the previous issues don't answer your question, it's better to comment on one of those (if on-topic) or open a new one.

@eksortso
Contributor

I have a library that uses JSON for configuration. It extensively uses json arrays with null values. I am trying to find an equivalent TOML representation that does not look awkward.

May I ask, is this library publicly available? Using nulls in arrays as a means of specifying configuration defaults seems problematic on the surface if that's the only way to get the defaults. Can you link to the library's documentation? Or is that not possible?

@slonik-az

Well then, you can come up with some scheme for that in your application, like using an empty inline table {}.

This is what I consider an "awkward" hack. Essentially, there is no clean way to build JSON <=> TOML bridge if json objects contain nulls.

@drunkwcodes

VS Code is using JSON for its configs.
I'm wondering if one day it can use TOML instead.

@eksortso
Contributor

VS Code is using JSON for its configs. I'm wondering if one day it can use TOML instead.

I'm thinking they should. Nulls and undefineds are the bane of their existence on so many occasions. They are partly fighting against them in casual use with strict null checking. Not something any configuration standard should ever have to deal with, if they did things right.

Getting these well-informed opinions codified is a struggle.

@eksortso
Contributor

eksortso commented Aug 2, 2023

Personal arrogance aside, I feel that I should step back a moment. Nothing really should be said about NULLs as far as parsing is concerned. As it stands, parsers will produce table-like objects, and the absence of key names (including table names) when expected by applications will always be subject to interpretation by those programs. Saying anything else at the TOML level just mucks up this dynamic.

NULLs just are not simple or obvious. They may be so in programming languages, but they are not so in configurations. So programming languages must carry that weight when they do their parsing.

So let's shift our focus, where we can be much more clear-cut. Maybe we should write something to address TOML emitters. We should make it clear that the presence of a NULL value in an object to be written to TOML must raise an error. It's the responsibility of the emitter to check for this, and although they may provide ways to address NULL values before emission, they certainly cannot handle NULLs without preprocessing. This approach keeps the complications of handling null values out of TOML. And it's not a complex ask; we already strongly imply that output to a file leaves behind a valid TOML document.
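
A pre-emission check along these lines might look like the sketch below. It is not taken from any existing TOML library; `find_nulls` and `check_emittable` are hypothetical names, and the `$`-rooted path notation is just a convenience for the error message.

```python
def find_nulls(value, path="$"):
    """Yield the path of every None inside nested dicts and lists."""
    if value is None:
        yield path
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from find_nulls(item, f"{path}.{key}")
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            yield from find_nulls(item, f"{path}[{index}]")


def check_emittable(data: dict) -> None:
    """Raise before emission if `data` holds a value TOML cannot represent."""
    nulls = list(find_nulls(data))
    if nulls:
        raise ValueError(f"cannot emit null values to TOML: {', '.join(nulls)}")


check_emittable({"a": 1, "b": {"c": [1, 2]}})  # passes silently
```

An emitter could call such a check first and leave any preprocessing of the reported paths to the caller.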

This may leave users who want NULLs in arrays searching for ways to represent them. But arrays can hold any legal type of TOML value. Sentinels can be used. Empty arrays and empty tables can be used. Special Error object tables, describing the errors in the same way that web APIs return them, could be emitted as long as subsequent reads by human readers let them know that their attention is needed.

There's no elegant catch-all solution to representing deliberately missing values, which is why data validation is such a chore sometimes. We can't assume anything at a fundamental level. That's up to users to solve, not the TOML language.

So what do you say? Shall we address proper emitter behavior?

@marzer
Contributor

marzer commented Aug 2, 2023

We should make it clear that the presence of a NULL value in an object to be written to TOML must raise an error.

This would be standardizing existing practice - all the python TOML libraries work this way at the very least. There isn't really any other sensible approach. Putting it in writing makes sense to me.

@drunkwcodes

I'm using pyserde to serialize my dataclass as below.
Right now its target format is JSON, because the dataclass uses None as default values to avoid mutable default values like lists and dicts.

It is usual in Python to use None, and TOML cannot be used in these cases.

I'm designing the metadata format for Mojo Package.
If TOML accepts nulls, it will gain a lot more market share.
And then I can use TOML in Mojo package metadata which is AWESOME!
For readability!

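from dataclasses import dataclass

from serde import serde  # pyserde

# get_user_email_from_git() is the poster's own helper (not shown);
# from its use below it appears to return a (name, email) pair.


@serde
@dataclass
class RingInfo:
    """Conform to Mojo core-metadata."""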

    name: str
    version: str
    metadata_version: str = "0.1"
    # Below are optional
    dynamic: list[str] | None = None
    platforms: list[str] | None = None
    supported_platforms: list[str] | None = None
    summary: str = ""
    description: str = ""
    description_content_type: str = "text/markdown"
    keywords: list[str] | None = None
    home_page: str = ""
    download_url: str = ""
    author: str = get_user_email_from_git()[0]
    author_email: str = get_user_email_from_git()[1]
    maintainer: str = get_user_email_from_git()[0]
    maintainer_email: str = get_user_email_from_git()[1]
    license: str = ""
    classifiers: list[str] | None = None
    requires_dist: list[str] | None = None
    requires_mojo: str = ""
    requires_external: list[str] | None = None
    project_urls: dict[str, str] | None = None
    provides_extra: list[str] | None = None
    provides_dist: list[str] | None = None
    obsoletes_dist: list[str] | None = None
    file_name: str = ""

@ChristianSi
Contributor

ChristianSi commented Aug 5, 2023

@drunkwcodes: I'd say an object like that should simply be serialized by omitting the corresponding key/value pairs when the value is None. That's the general practice (I suppose) and also leads to the shortest and most readable TOML.

That brings me to the earlier question about whether to "address proper emitter behavior". I'd say we sure can, but we must be careful regarding the wording to use. As I see it, None in a list should (by default) be rejected as an error, but for None in a table-like object (one that's serialized to a table) the above approach is certainly reasonable and we must not forbid it.
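
One possible reading of that rule, sketched in Python: drop None-valued keys from table-like objects and reject None inside lists. The `prepare_for_toml` helper and the trimmed-down `PackageInfo` dataclass are hypothetical; the result is a plain dict that could then be handed to whichever TOML writer is in use.

```python
from dataclasses import asdict, dataclass
from typing import Optional


def prepare_for_toml(value):
    """Drop None-valued keys from tables; reject None inside arrays."""
    if isinstance(value, dict):
        return {k: prepare_for_toml(v) for k, v in value.items() if v is not None}
    if isinstance(value, (list, tuple)):
        if any(item is None for item in value):
            raise ValueError("None is not allowed inside a TOML array")
        return [prepare_for_toml(item) for item in value]
    return value


@dataclass
class PackageInfo:  # hypothetical, trimmed-down stand-in for the dataclass above
    name: str
    version: str
    keywords: Optional[list[str]] = None


ready = prepare_for_toml(asdict(PackageInfo(name="ring", version="0.1.0")))
print(ready)  # the "keywords" key is omitted rather than written as null
```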

@drunkwcodes

Thank you for your response. I can see it as a better solution. I have filed an issue with pyserde and am now awaiting their reply.

@Fatal1ty

Fatal1ty commented Aug 5, 2023

@drunkwcodes: I'd say an object like that should simply be serialized by omitting the corresponding key/value pairs when the value is None.

+1 to this, mashumaro does it for TOML since version 3.2 (release notes).

@drunkwcodes

@Fatal1ty Marvelous! I will try it and adopt mashumaro.

@DominoPivot

I do think it'd be neat if the website provided recommendations for good configuration API design, and better alternatives to null. But this very thread makes me doubt we'd ever agree on them. Here's what I would say, which directly contradicts some of the alternatives people have proposed before:

Prefer self-descriptive strings to represent edge cases over arbitrary sentinel values.

indent.size = -1         # BAD (ambiguous)
indent.size = "default"  # GOOD

Prefer boolean values for settings that exclusively take boolean values, and where the meaning of the tokens true and false will be unambiguous.

autosaving.frequency = false    # BAD (ambiguous)
autosaving.frequency = "never"  # OK
autosaving.enabled = false      # BETTER

Prefer strings over booleans for arbitrary choices.

indent.useTabs = false    # BAD
indent.style = "spaces"   # GOOD

player.isLuigi = false      # BAD
player.character = "Mario"  # GOOD

@eksortso
Contributor

@DominoPivot wrote:

I do think it'd be neat if the website provided recommendations for good configuration API design, and better alternatives to null.

That's actually a pretty good idea! The website is a separate project, of course related to this one. And that would be a suitable place to maintain and collaborate on such a guide.

@pradyunsg Remember I was talking about another document for FAQs and advice on using TOML? This is an ideal candidate for a document on the website, and well-suited to it. If it's found to be useful, it could be translated into all the languages that the website supports.

This is broader than the original purpose of this issue; an API recommendations page would cover more topics than simply how to find alternatives to null. But it would be the best place to explain why we oppose null values in the TOML spec in a productive way. And composing such a document may allow the differences in approach that we've expressed here to melt away.

I don't know much about the website, though. But it could use a few more eyeballs.

@pradyunsg
Member

pradyunsg commented Aug 28, 2023

@pradyunsg Remember I was talking about another document for FAQs and advice on using TOML?

Yup! toml-lang/toml.io#70 filed for this. It might not need to be on the website (another .md file in this repository is definitely less work), but let's discuss the details there.


We should make it clear that the presence of a NULL value in an object to be written to TOML must raise an error.

Honestly, that's more library API design than anything else.

If an encoder wants to do this and it makes sense in the context of their library, I don't see why the spec needs to outlaw that. There's little reason to dictate the library APIs here. I would be really surprised if this isn't obvious in a similar manner that a user-defined-object or complex numbers or whatever-cool-type a language has shouldn't be serialised as-is.

A weaker but similar argument applies for arrays too -- I think it's sufficiently obvious, and doesn't need to be called out explicitly.

I don't think it's productive to have API design guidance in the specification. Policing everyone's use of TOML is not a feasible thing, and particularly, I don't think it's a productive thing to be doing.


the website provided recommendations for good configuration API design

I'm not convinced this is a format-specific thing that needs to be coupled with TOML in some way (or is made better by coupling with TOML). There are many resources for API design guidance in the world and there is limited value in having one more place to get that advice -- I'm certainly not interested in having one associated with TOML.

All the examples provided are generic pieces of advice for API design and, as much as I agree with them, they don't belong in a TOML-specific location.


I wonder if an explicit section could be added to the TOML specification describing why null does not exist

Not in the specification, no.

I think it's reasonable to have an FAQ-style entry about "why no null" or something, to provide guidance around it

That's fair, and I agree that we should have this -- albeit separately. I wouldn't mind having a place to put "why certain language design choices were made" information, which can cover this specifically as well as a few other things that have come up.

"Why does TOML not have NULL/None/Nil?" is a good thing to have in an FAQ, where we can discuss how to handle the use cases where people reach for null-ish values. One of the options is omitting keys, as has come up in the discussion already, contrary to what OP is claiming.


I'm inclined to close this out, on the basis that the specification isn't going to change to gain new language around this. However, some of the things I've stated here might warrant more discussion so I'm not gonna click the button just yet. :)

@slonik-az

Any advice on porting to TOML an existing JSON schema that extensively uses null? The library that uses this schema is expected to continue supporting JSON.

@marzer
Contributor

marzer commented Aug 28, 2023

@slonik-az There's been a number of suggestions in this discussion thread already, as well as in other related issues (links above). All broadly boil down to one of:

  1. use a sentinel (e.g. an empty inline table, an empty string, whatever makes sense)
  2. omit the KVP entirely
  3. just keep using JSON

If the application already uses JSON, and depends on JSON-only features, it's not clear what value you'd be gaining out of porting to TOML.

@pradyunsg
Member

OK, it looks like there isn't more discussion to be had here so closing this out. toml-lang/toml.io#70 covers adding an FAQ to the website.
