Skip to content
This repository has been archived by the owner on Oct 15, 2024. It is now read-only.

TOML spacing #3490

Closed
4 of 6 tasks
markus2330 opened this issue Sep 16, 2020 · 17 comments
Closed
4 of 6 tasks

TOML spacing #3490

markus2330 opened this issue Sep 16, 2020 · 17 comments

Comments

@markus2330
Copy link
Contributor

markus2330 commented Sep 16, 2020

I played around with the new TOML plugin and must say it works beyond expectations, it is definitely superior to ini. The only things I noticed so far are some texts in error messages:

  • we should not expect people to know what rhs or lhs is

and spacing:

  • a practical problem is that some INI parsers (in particular everything related to shells) do not accept spaces around =, see also INI: support not having spaces around = #2065. Ideally, the parser would remember what spaces are around = and restore it as it was.
  • a cosmetic problem is that spaces before the key name get removed. Probably nearly nobody would do this, if the example on https://toml.io/en/ would not promote this usage. But as it is, people quickly find out, e.g. Beginners Questions #3472. Also here the ideal solution would be the restore it as it was.

String quoting seems to be a bit buggy (You will probably catch these things with kdb test):

  • path = 'C:\Users\nodejs\templates'in a TOML file gets after roundtrip rewritten to path = "C:\Users\nodejs\templates" which leads to a parse error.

  • Some rare features like 1e06numbers seem to be not supported. (The parser currently thinks it is a bare string.) Imho rather low priority.

Something else:

  • Keys with empty names seem to be currently not supported. In Elektra you can represent them by giving them an empty base name. In TOML you can represent them with
"" = "blank"     # VALID but discouraged
'' = 'blank'     # VALID but discouraged

(Btw. I think this example within the TOML spec is broken as another example says that multiple keys with the same name is not allowed.)

I do not think that any of these problems is urgent enough that it must be fixed for the next release. If I could choose something, I think the error messages are the most important thing. It might be good, however, that the current gaps to the TOML standard are documented in the README.md (and which version it should be compliant to.)

So in general I am very positive about this plugin, it greatly improves Elektra 💖 I'll remove the ini plugin, as I am very positive that soon the toml plugin will cover essentially every use case the ini plugin fulfilled but unlike the ini plugin, it will be properly maintained.

@markus2330 markus2330 changed the title TOML spacing TOML next steps Sep 16, 2020
@bauhaus93
Copy link
Contributor

I played around with the new TOML plugin and must say it works beyond expectations, it is definitely superior to ini. The only things I noticed so far are some texts in error messages

Thanks, I'm glad it works so well! I will look through all error messages and make them more clear.

a practical problem is that some INI parsers (in particular everything related to shells) do not accept spaces around =, see also #2065. Ideally, the parser would remember what spaces are around = and restore it as it was.
a cosmetic problem is that spaces before the key name get removed. Probably nearly nobody would do this, if the example on https://toml.io/en/ would not promote this usage. But as it is, people quickly find out, e.g. #3472. Also here the ideal solution would be the restore it as it was.

I can look into preserving the correct amount of spaces for those. I think I've already an idea on how to solve this without inflating the bison grammar too much/at all (which was my reason for not doing it in the first place).

String quoting seems to be a bit buggy (You will probably catch these things with kdb test)

Yes, strings are currently only written as basic strings. I'll look into this issue in the default storage PR #3478, since the unexpected escaping is causing problems there also.
For solving this, I currently consider 2 possible solutions:

  • Introduce a metakey which hints that the string contains escape sequences. Only write a basic string, when the metakey is present (making literal strings the default for writing). The metakey would be set by the plugin on the reading of basic strings or manually by the user.
  • On writing, check what the "best" type of string would be. Eg. check if the string contains any ' or " characters. Additionally, check if the string contains only valid escape sequences (or none at all). Then, with this parameters, derive the most appropriate type of string. The problem here would be, that if a string contains only valid escape sequences by accident, it may falsely be seen as a basic string (eg. a file path of C:\tmp\name).

Some rare features like 1e06numbers seem to be not supported. (The parser currently thinks it is a bare string.) Imho rather low priority.

I'll look into this issue as well, this should already have been supported.

Keys with empty names seem to be currently not supported. In Elektra you can represent them by giving them an empty base name. In TOML you can represent them with

I'll check this out too. I don't think this case is currently handled at all.

I will look through all this issues, prioritizing the error messages and mentioning the current gaps to the TOML standard in the readme.

@bauhaus93 bauhaus93 mentioned this issue Sep 23, 2020
27 tasks
@markus2330
Copy link
Contributor Author

@bauhaus93 can you update the top-post? Which of these issues are now resolved via the PR that was merged?

@bauhaus93
Copy link
Contributor

@markus2330 I've updated the top post

@markus2330
Copy link
Contributor Author

@kodebach is the point " path = 'C:\Users\nodejs\templates'in a TOML file gets after roundtrip rewritten to path = "C:\Users\nodejs\templates" which leads to a parse error." already fixed?

@kodebach
Copy link
Member

kodebach commented Aug 3, 2021

It should be fixed in #3935. I'll try to finish the PR this week.

The code is already there, if you wanna test it. Documentation and tests are still missing, and the handling of comments will change again.

We do also validate all string values before writing, so that we never create an invalid TOML file. In the case mentioned here, you will get the literal string 'C:\Users\nodejs\templates'. If you remove the tomltype metakey that is used to remember the kind of string, you will get a correctly escaped basic string "C:\\Users\\nodejs\\templates".

Without a tomltype we default to basic strings (or multiline basic strings, if there are 2 or more \n), because those can be used for any valid UTF-8 string. If the string is not UTF-8, we will replace invalid sections with the Replacement Character (U+FFFD) and emit a warning.

If you wanna know more take a look at the code or wait for me to finish the docs.

@stale
Copy link

stale bot commented Aug 11, 2022

I mark this issue stale as it did not have any activity for one year. I'll close it in two weeks if no further activity occurs. If you want it to be alive again, ping the issue by writing a message here or create a new issue with the remainder of this issue.
Thank you for your contributions 💖

@stale stale bot added the stale label Aug 11, 2022
@markus2330
Copy link
Contributor Author

@kodebach which issues here are still open? I remember you fixed many problems with quoting?

@stale stale bot removed the stale label Aug 12, 2022
@kodebach
Copy link
Member

kodebach commented Aug 12, 2022

I'd have to check again for the full details, but:

@markus2330
Copy link
Contributor Author

I'd have to check again for the full details

Thank you! Is it maybe possible to work on these issues for FLOSS or is the whole lexer/parser too complicated for a new-comer?

it seems we support some positions for comments

I think this is okay.

@kodebach
Copy link
Member

I think if someone has prior experience with flex/bison parsing (not unreasonable, considering there are other courses for that), then working on this should be possible. The code is pretty readable IMO and on the TOML side there is a good spec and many examples to show what is allowed and what isn't. There are even some GitHub repos with lots of TOML files for testing parsers.

@markus2330
Copy link
Contributor Author

@kodebach this issue is for everything that is not #2330 (further features etc.)

@kodebach
Copy link
Member

#2330 is closed, did you mean #3910 or maybe #4466?

@markus2330 markus2330 changed the title TOML next steps TOML spacing Sep 28, 2022
@markus2330
Copy link
Contributor Author

I mean #3910.

So actually only the spacing issues are left-over here. I don't think this is crucial for having toml as default storage.

@flo91 flo91 added the lang/c label Oct 5, 2022
@flo91
Copy link
Collaborator

flo91 commented Oct 5, 2022

@markus2330 Maybe an issue for the FLOSS course?

@markus2330
Copy link
Contributor Author

It is a bit difficult to implement and has much lower priority than #3910.

Copy link

github-actions bot commented Feb 3, 2024

I mark this stale as it did not have any activity for one year. I'll close it in two weeks if no further activity occurs. If you want it to be alive again, ping by writing a message here or create a new issue with the remainder of this issue.
Thank you for your contributions 💖

@github-actions github-actions bot added the stale label Feb 3, 2024
Copy link

I closed this now because it has been inactive for more than one year. If I closed it by mistake, please do not hesitate to reopen it or create a new issue with the remainder of this issue.
Thank you for your contributions 💖

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Feb 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants