Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Element Position Support #21

Merged
merged 2 commits into from
Aug 29, 2014
Merged

Element Position Support #21

merged 2 commits into from
Aug 29, 2014

Conversation

eanderton
Copy link
Contributor

So, this change may be a tad controversial. If you want, we can hash out a better way to do all this if the changeset doesn't fit with what you have in mind for go-toml.

The motivation for this change was that there is a real need to be able to trace where a given TOML table, and its associated keys, came from in the source file. With that information, the programmer can inform the user of semantic mistakes made in a given TOML file. As an experiment, I'm using this version of the feature on Grapnel: https://github.com/eanderton/grapnel/blob/master/src/grapnel/rewrite.go#L126 Here, this feature is used to inform the user of where they have made grammatical mistakes in regular expressions and templates in strings, that may be embedded deep within a configuration file.

The first thing I did was modify the parser to record position information from the token stream. This wound up being a parallel structure (PositionMap) that is stored alongside the TomlTree during parsing. After that, I needed a structure to couple those two together for the library user - this became the TomlDocument.

A quirk of the position storage scheme is that it was not possible to modify the TomlTree, nor the storage of key/value data, without invalidating the current API. As a result, it's necessary to use a key's parent TomlTree pointer in order to find the location of that key in the document. The TomlDocument also provides methods to query this position information using string paths, in order to parallel the path navigation available on the TomlTree itself.


TomlDocument provides an optional TOML processing path where position
informaiton is stored alongside a TomlTree.

  • Added Position struct
  • Added TomlDocument struct
  • Revised parser to emit position data
  • Revised LoadFile() and Load() to use revised parser
  • Revised token to use new Position struct
  • Added tests for new functionality
  • Bugfixed table array duplicate key handling
  • Applied gofmt to all code

@pelletier
Copy link
Owner

I would really love to see go-toml offer pinpointing values in the original file!

A quirk of the position storage scheme is that it was not possible to modify the TomlTree, nor the storage of key/value data, without invalidating the current API

I don't fully understand why: the TomlTree object is supposed to be accessed by its methods (Get, SetPath etc.), not directly.

I think we should modify the TomlTree object to attach the position information for each key directly, and provide additional methods, such as GetPosition or GetPositionForPath, to retrieve the position information.

What do you think?

@eanderton
Copy link
Contributor Author

Thomas, that's a great idea. But wouldn't that require the TomlTree to be changed from a map to a struct, in order to contain the necessary position data? If you're okay with changing the API around, I can get to work on doing just this.

@pelletier
Copy link
Owner

It won't change the API: all the methods inside toml.go (the API) will still have the same signatures and behaviors :)

Considering the design, I see two possibilities:

  1. have two maps inside a struct:

    type TomlTree struct {
        values    map[string]interface{}
        positions map[string]Position
    }
  2. one map with a struct wrapping the value and its position:

    type tomlValue struct {
        value    interface{}
        position Position
    }
    
    type TomlTree map[string]interface{}

Personally I prefer the second version. What do you think?

@eanderton
Copy link
Contributor Author

I like that second one too. Upon inspection, it would appear that I've been abusing the API by treating TomlTree like a map. ::oops:: Sure enough, you have Get(), Set(), and Keys() as first-class members of TomlTree itself, so this change makes complete sense.

I gather we'll have to add:

func (TomlTree* t) GetPosition(key string) Position { ... }
func (TomlTree* t) SetPosition(key string, pos Position) { ... }
func (TomlTree* t) GetPathPosition(keys []string) Position { ... }

... to do this right? Let me know if those signatures work for you.

@pelletier
Copy link
Owner

I don't really understand the last one: it takes several keys in input, so I would guess it should return several positions? What's its goal?

Also, I don't see why we would need a SetPosition. Altering a position, from the user's point of view would make little sense in my opinion.

@eanderton
Copy link
Contributor Author

Ah, I see where you're going. Sure! We can drop SetPosition() since it has limited use outside the parser. GetPathPosition() was for symmetry with GetPath(), but I can see how that's not useful considering that you probably want to have a TomlTree instance first.

That leaves GetPosition(). That makes this all nice and easy.

@pelletier
Copy link
Owner

To be honest, I am not sure I should have made GetPath public in the first place. So I don't know if we should keep the symmetry and implement GetPathPosition, or if we should ditch it and eventually deprecate GetPath. What's your opinion?

@eanderton
Copy link
Contributor Author

GetPath has some utility, but I honestly don't use it; I have no clue if anyone else does or not. I honestly think the library would be better served by a more complete implementation, akin to something like XPATH (although that's a bit involved). I've been tempted on more than one occasion to give it support to dive into the TomlTree[] portions that I added, since GetPath can't quite get you everything in the document anymore. I wouldn't miss it if it were dropped.

As for GetPathPosition, let's leave that out for now. With the other pieces in place, it would be trivial to implement (should we want it back), since it would be just a wrapper around GetPath() + GetPosition.

@pelletier
Copy link
Owner

You're right about a more powerful implementation for Get. I have been thinking about globing, but indeed we now need a more complete system.

This was referenced Aug 23, 2014
@pelletier
Copy link
Owner

@eanderton By the way, do you need me to work on something here to split the workload? I don't want to dig in without knowing what's your status on this, so as we don't duplicate effort / introduce errors. :)

@eanderton
Copy link
Contributor Author

I appreciate the desire to not duplicate effort. At the moment, I'm not sure the work is splittable just yet. After I get this feature redesigned, I think we're another design refactoring away from something that will more easily support Query support, per the other issue you added. I should be a few more evenings away from getting this thing done. I'll be in touch.

TomlDocument provides an optional TOML processing path where position
informaiton is stored alongside a TomlTree.
* Added Position struct
* Revised TomlTree to contain position data
* Added tomlValue to bind positions to values
* Revised parser to emit position data
* Revised token to use new Position struct
* Added tests for new functionality
* Bugfixed table array duplicate key handling
* Applied gofmt to all code
@eanderton
Copy link
Contributor Author

Thomas, here is the latest revision for this merge request. TomlDocument has been dropped in favor of a revised TomlTree/tomlValue pair that now contains position information. I found that using only a tomlValue struct complicated code tremendously (extra traversals through tomlValue to get to _TomlTree and []_TomlTree), and applying both the possible revisions we discussed yielded a nice solution.

Most of the changes were in the test matrix. The new layout of TomlTree had a profound impact on some things, mostly due to it no longer being cast-able to a map. Position was also changed to a 1-index instead of a 0-index design, such that an uninitialized Position is now an "invalid" value (0,0); this simplifies return values and object initialization in a few places.

There is some redundant code in toml.go, where walking the node tree is concerned. Since we're going to improve how we query the tree, I figured that this refactoring would be best handled with that goal in mind. If you don't agree, I can always give this another round to consolidate things on this changeset.

One other thing came up: there is a test in the BurntSushi validator that enforces that a key can contain a large set of special values: https://github.com/BurntSushi/toml-test/blob/master/tests/valid/key-special-chars.toml I didn't notice this earlier since the test matrix was using a basic map index (tree[key]) to validate its contents. When I replaced that with Get(), things failed; I then learned why.

~!@#$^&*()_+-`1234567890[]\|/?><.,;: = 1

This already has a profound impact on how Get() behaves: querying keys with embedded '.' chars doesn't work. One must pass the keys as an array to GetPath() as a workaround; this is what the test suite does now to validate TomlTree contents.

@pelletier
Copy link
Owner

This is great Eric!

TomlDocument has been dropped in favor of a revised TomlTree/tomlValue pair that now contains position information

Smart. I missed this possibility when I wrote about the possible solutions last time.

There is some redundant code in toml.go, where walking the node tree is concerned. Since we're going to improve how we query the tree, I figured that this refactoring would be best handled with that goal in mind.

Totally agree! We will design this more carefully and refactor later.

there is a test in the BurntSushi validator that enforces that a key can contain a large set of special values

Ah yes, this test is really pure evil. It almost made me drop the support for dotted paths in Get in the first versions, but I figured it was just twisted: most people won't have dots in their keys, and dotted paths are imho really handy. Good call on using GetPath in the test suite: that's the expected behavior and Get is only a sugarcoat.


Last thing before I merge this: would you mind removing the BSD/Copyright notices in the files headers? I don't want to have some files under a different licence than the whole project, and you are listed in the authors in the readme :)

@eanderton
Copy link
Contributor Author

Done and done. :)

pelletier added a commit that referenced this pull request Aug 29, 2014
@pelletier pelletier merged commit e118479 into pelletier:master Aug 29, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants