Element Position Support #21

eanderton · 2014-08-23T00:12:01Z

So, this change may be a tad controversial. If you want, we can hash out a better way to do all this if the changeset doesn't fit with what you have in mind for go-toml.

The motivation for this change was that there is a real need to be able to trace where a given TOML table, and its associated keys, came from in the source file. With that information, the programmer can inform the user of semantic mistakes made in a given TOML file. As an experiment, I'm using this version of the feature on Grapnel: https://github.com/eanderton/grapnel/blob/master/src/grapnel/rewrite.go#L126 Here, this feature is used to inform the user of where they have made grammatical mistakes in regular expressions and templates in strings, that may be embedded deep within a configuration file.

The first thing I did was modify the parser to record position information from the token stream. This wound up being a parallel structure (PositionMap) that is stored alongside the TomlTree during parsing. After that, I needed a structure to couple those two together for the library user - this became the TomlDocument.

A quirk of the position storage scheme is that it was not possible to modify the TomlTree, nor the storage of key/value data, without invalidating the current API. As a result, it's necessary to use a key's parent TomlTree pointer in order to find the location of that key in the document. The TomlDocument also provides methods to query this position information using string paths, in order to parallel the path navigation available on the TomlTree itself.

TomlDocument provides an optional TOML processing path where position
informaiton is stored alongside a TomlTree.

Added Position struct
Added TomlDocument struct
Revised parser to emit position data
Revised LoadFile() and Load() to use revised parser
Revised token to use new Position struct
Added tests for new functionality
Bugfixed table array duplicate key handling
Applied gofmt to all code

pelletier · 2014-08-23T13:19:59Z

I would really love to see go-toml offer pinpointing values in the original file!

A quirk of the position storage scheme is that it was not possible to modify the TomlTree, nor the storage of key/value data, without invalidating the current API

I don't fully understand why: the TomlTree object is supposed to be accessed by its methods (Get, SetPath etc.), not directly.

I think we should modify the TomlTree object to attach the position information for each key directly, and provide additional methods, such as GetPosition or GetPositionForPath, to retrieve the position information.

What do you think?

eanderton · 2014-08-23T13:59:36Z

Thomas, that's a great idea. But wouldn't that require the TomlTree to be changed from a map to a struct, in order to contain the necessary position data? If you're okay with changing the API around, I can get to work on doing just this.

pelletier · 2014-08-23T15:38:07Z

It won't change the API: all the methods inside toml.go (the API) will still have the same signatures and behaviors :)

Considering the design, I see two possibilities:

have two maps inside a struct:

type TomlTree struct {
    values    map[string]interface{}
    positions map[string]Position
}

one map with a struct wrapping the value and its position:

type tomlValue struct {
    value    interface{}
    position Position
}

type TomlTree map[string]interface{}

Personally I prefer the second version. What do you think?

eanderton · 2014-08-23T16:24:16Z

I like that second one too. Upon inspection, it would appear that I've been abusing the API by treating TomlTree like a map. ::oops:: Sure enough, you have Get(), Set(), and Keys() as first-class members of TomlTree itself, so this change makes complete sense.

I gather we'll have to add:

func (TomlTree* t) GetPosition(key string) Position { ... }
func (TomlTree* t) SetPosition(key string, pos Position) { ... }
func (TomlTree* t) GetPathPosition(keys []string) Position { ... }

... to do this right? Let me know if those signatures work for you.

pelletier · 2014-08-23T16:56:35Z

I don't really understand the last one: it takes several keys in input, so I would guess it should return several positions? What's its goal?

Also, I don't see why we would need a SetPosition. Altering a position, from the user's point of view would make little sense in my opinion.

eanderton · 2014-08-23T17:08:12Z

Ah, I see where you're going. Sure! We can drop SetPosition() since it has limited use outside the parser. GetPathPosition() was for symmetry with GetPath(), but I can see how that's not useful considering that you probably want to have a TomlTree instance first.

That leaves GetPosition(). That makes this all nice and easy.

pelletier · 2014-08-23T17:27:39Z

To be honest, I am not sure I should have made GetPath public in the first place. So I don't know if we should keep the symmetry and implement GetPathPosition, or if we should ditch it and eventually deprecate GetPath. What's your opinion?

eanderton · 2014-08-23T17:32:10Z

GetPath has some utility, but I honestly don't use it; I have no clue if anyone else does or not. I honestly think the library would be better served by a more complete implementation, akin to something like XPATH (although that's a bit involved). I've been tempted on more than one occasion to give it support to dive into the TomlTree[] portions that I added, since GetPath can't quite get you everything in the document anymore. I wouldn't miss it if it were dropped.

As for GetPathPosition, let's leave that out for now. With the other pieces in place, it would be trivial to implement (should we want it back), since it would be just a wrapper around GetPath() + GetPosition.

pelletier · 2014-08-23T17:40:24Z

You're right about a more powerful implementation for Get. I have been thinking about globing, but indeed we now need a more complete system.

pelletier · 2014-08-25T15:38:31Z

@eanderton By the way, do you need me to work on something here to split the workload? I don't want to dig in without knowing what's your status on this, so as we don't duplicate effort / introduce errors. :)

eanderton · 2014-08-25T20:57:05Z

I appreciate the desire to not duplicate effort. At the moment, I'm not sure the work is splittable just yet. After I get this feature redesigned, I think we're another design refactoring away from something that will more easily support Query support, per the other issue you added. I should be a few more evenings away from getting this thing done. I'll be in touch.

TomlDocument provides an optional TOML processing path where position informaiton is stored alongside a TomlTree. * Added Position struct * Revised TomlTree to contain position data * Added tomlValue to bind positions to values * Revised parser to emit position data * Revised token to use new Position struct * Added tests for new functionality * Bugfixed table array duplicate key handling * Applied gofmt to all code

eanderton · 2014-08-27T01:24:41Z

Thomas, here is the latest revision for this merge request. TomlDocument has been dropped in favor of a revised TomlTree/tomlValue pair that now contains position information. I found that using only a tomlValue struct complicated code tremendously (extra traversals through tomlValue to get to _TomlTree and []_TomlTree), and applying both the possible revisions we discussed yielded a nice solution.

Most of the changes were in the test matrix. The new layout of TomlTree had a profound impact on some things, mostly due to it no longer being cast-able to a map. Position was also changed to a 1-index instead of a 0-index design, such that an uninitialized Position is now an "invalid" value (0,0); this simplifies return values and object initialization in a few places.

There is some redundant code in toml.go, where walking the node tree is concerned. Since we're going to improve how we query the tree, I figured that this refactoring would be best handled with that goal in mind. If you don't agree, I can always give this another round to consolidate things on this changeset.

One other thing came up: there is a test in the BurntSushi validator that enforces that a key can contain a large set of special values: https://github.com/BurntSushi/toml-test/blob/master/tests/valid/key-special-chars.toml I didn't notice this earlier since the test matrix was using a basic map index (tree[key]) to validate its contents. When I replaced that with Get(), things failed; I then learned why.

~!@#$^&*()_+-`1234567890[]\|/?><.,;: = 1

This already has a profound impact on how Get() behaves: querying keys with embedded '.' chars doesn't work. One must pass the keys as an array to GetPath() as a workaround; this is what the test suite does now to validate TomlTree contents.

pelletier · 2014-08-27T07:24:21Z

This is great Eric!

TomlDocument has been dropped in favor of a revised TomlTree/tomlValue pair that now contains position information

Smart. I missed this possibility when I wrote about the possible solutions last time.

There is some redundant code in toml.go, where walking the node tree is concerned. Since we're going to improve how we query the tree, I figured that this refactoring would be best handled with that goal in mind.

Totally agree! We will design this more carefully and refactor later.

there is a test in the BurntSushi validator that enforces that a key can contain a large set of special values

Ah yes, this test is really pure evil. It almost made me drop the support for dotted paths in Get in the first versions, but I figured it was just twisted: most people won't have dots in their keys, and dotted paths are imho really handy. Good call on using GetPath in the test suite: that's the expected behavior and Get is only a sugarcoat.

Last thing before I merge this: would you mind removing the BSD/Copyright notices in the files headers? I don't want to have some files under a different licence than the whole project, and you are listed in the authors in the readme :)

eanderton · 2014-08-28T01:18:29Z

Done and done. :)

Element Position Support

This was referenced Aug 23, 2014

Powerful querying interface #23

Merged

Improve code style #22

Closed

eanderton force-pushed the master branch from 8773cb9 to fdb3724 Compare August 27, 2014 00:58

eanderton force-pushed the master branch from fdb3724 to 7c63fff Compare August 27, 2014 01:01

Removed redudant license info

c0c5d65

pelletier added a commit that referenced this pull request Aug 29, 2014

Merge pull request #21 from eanderton/master

e118479

Element Position Support

pelletier merged commit e118479 into pelletier:master Aug 29, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Element Position Support #21

Element Position Support #21

eanderton commented Aug 23, 2014

pelletier commented Aug 23, 2014

eanderton commented Aug 23, 2014

pelletier commented Aug 23, 2014

eanderton commented Aug 23, 2014

pelletier commented Aug 23, 2014

eanderton commented Aug 23, 2014

pelletier commented Aug 23, 2014

eanderton commented Aug 23, 2014

pelletier commented Aug 23, 2014

pelletier commented Aug 25, 2014

eanderton commented Aug 25, 2014

eanderton commented Aug 27, 2014

pelletier commented Aug 27, 2014

eanderton commented Aug 28, 2014

Element Position Support #21

Element Position Support #21

Conversation

eanderton commented Aug 23, 2014

pelletier commented Aug 23, 2014

eanderton commented Aug 23, 2014

pelletier commented Aug 23, 2014

eanderton commented Aug 23, 2014

pelletier commented Aug 23, 2014

eanderton commented Aug 23, 2014

pelletier commented Aug 23, 2014

eanderton commented Aug 23, 2014

pelletier commented Aug 23, 2014

pelletier commented Aug 25, 2014

eanderton commented Aug 25, 2014

eanderton commented Aug 27, 2014

pelletier commented Aug 27, 2014

eanderton commented Aug 28, 2014