Static keys in all json #811

arlimus · 2016-07-04T09:48:51Z

Why? Self-documenting API and data fields.

I recently received the question: "Why do we need filenames and paths in the groups field?", to which I responded: "They are actually IDs; they just happen to be paths right now." Our current output format carries two inherent assumptions that users might not understand: controls and groups are each maps/dicts that connect IDs to their items.

Changes

This MR essentially turns:

      "groups": {
        "controls/meta.rb": {
          "title": "SSH Server Configuration",
          "controls": [
            "ssh-1"
          ]
        },
        "controls/example.rb": {
...

into

      "groups": [
        {
          "id": "controls/meta.rb",
          "title": "SSH Server Configuration",
          "controls": [
            "ssh-1"
          ]
        },
        {
          "id": "controls/example.rb",
...

and also changes:

      "controls": {
        "ssh-1": {
          "title": "Allow only SSH Protocol 2",
...

into

      "controls": [
        {
          "id": "ssh-1",
          "title": "Allow only SSH Protocol 2",
...
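
As a rough illustration of the transformation above, here is a minimal Python sketch. It is not the InSpec implementation (which is Ruby); the helper name `hash_to_array` is made up for this example, and it assumes the item bodies do not already carry an `id` field.

```python
from typing import Any, Dict, List


def hash_to_array(items: Dict[str, Dict[str, Any]], key: str = "id") -> List[Dict[str, Any]]:
    """Turn {"ssh-1": {...}} into [{"id": "ssh-1", ...}], keeping insertion order.

    Assumption: the item bodies do not already contain `key`.
    """
    return [{key: item_id, **body} for item_id, body in items.items()]


# Sample data taken from the snippets above.
old_groups = {
    "controls/meta.rb": {
        "title": "SSH Server Configuration",
        "controls": ["ssh-1"],
    },
}
old_controls = {
    "ssh-1": {"title": "Allow only SSH Protocol 2"},
}

new_groups = hash_to_array(old_groups)
new_controls = hash_to_array(old_controls)
```

The former map key simply becomes a static `id` field, so every entry has the same set of field names.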

Added bonuses

Easier interaction with multiple tools like jq and Elasticsearch
Easier JavaScript interaction

Negative points

Controls and groups are no longer unique by design (with arrays you may have multiple entries with the same ID).
Any processing that looks up control and group IDs will need more work to find the right item (access via a map/dict is simpler), e.g. connecting inspec exec --format json-min with inspec json.
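
To make both negative points concrete: a consumer of the array format would typically rebuild the ID lookup itself and decide what to do about duplicates. A minimal Python sketch, where `index_by_id` is a hypothetical helper and not part of InSpec:

```python
from typing import Any, Dict, List


def index_by_id(items: List[Dict[str, Any]], key: str = "id") -> Dict[str, Dict[str, Any]]:
    """Build an ID -> item lookup from an array, refusing duplicate IDs."""
    index: Dict[str, Dict[str, Any]] = {}
    for item in items:
        item_id = item[key]
        if item_id in index:
            raise ValueError(f"duplicate {key}: {item_id!r}")
        index[item_id] = item
    return index


controls = [
    {"id": "ssh-1", "title": "Allow only SSH Protocol 2"},
]
by_id = index_by_id(controls)
title = by_id["ssh-1"]["title"]
```

This is the extra step the map/dict format gives you for free; the uniqueness check here is one possible policy, and a consumer could equally choose to keep the first or last entry.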

Please join the discussion below

chris-rock · 2016-07-04T11:35:00Z

Thanks @arlimus for bringing up that question. I will try to answer some questions myself; feedback is welcome so that we arrive at the appropriate output.

What is the purpose of the json format?

It is a data-exchange format

Then the next question comes to mind: What are the properties of a data-exchange format?

easy to consume, e.g. a file that is easy to parse
data integrity

From a file-parsing perspective, I see the point that it is easier to iterate over an array of elements; on the other hand, accessing a single element becomes much more difficult. The pros and cons therefore balance out here.

Data integrity in particular is difficult to achieve (see https://en.wikipedia.org/wiki/Data_integrity). By using arrays instead of unique keys, users could manipulate the content and add multiple entries with the same ID (I had a great discussion with @arlimus about that topic). The application would then need to handle all of these edge cases, so extra logic is required on top of the JSON parser. Data integrity is a core property that we built in from the start to ensure the data is not corrupted. It lets us reduce the code that checks the data is correct and leads to simpler and more secure application software. By relying on this safety mechanism of a JSON parser, it's easy to parse the format in all languages; only SHA sums are then required to ensure the data has not been tampered with. Using arrays would lead to more insecure application code.

Therefore I vote against the change.

alexpop · 2016-07-05T12:39:00Z

@arlimus, profiles would also need to be converted to an array if we are to make this change

arlimus · 2016-07-05T13:03:26Z

Great catch Alex, thank you!!

mhedgpeth · 2016-07-05T13:13:56Z

I would vote for data integrity over reportability. I think you keep the core data model as correct as possible and then transform it into easier to read formats for other tools as needed. I agree with @chris-rock against the change.

arlimus · 2016-07-07T00:18:42Z

Let me pick up that data-integrity argument for a sec: I don't believe the Array vs Hash discussion significantly improves or worsens data integrity. It is true that additional entries can be added to arrays, which leads to duplicate IDs in controls. However, this is only a tiny piece of the integrity landscape. For example, take the argument that attackers might manipulate entries via IDs in arrays vs hashes: the scenario would permit manipulation via IDs either way, so entries in the Hash could be manipulated just as much as in the Array (the only difference being that the Hash protects against duplication).

If we want to protect data integrity for profiles and results (and we do, this is planned in the future) we should tackle it with measures that address it (e.g. signatures with integrity information).
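
One related detail on the duplicate-ID point, shown as a small Python sketch: even with the hash format, the serialized JSON text can still contain a repeated key, and the parser decides what happens. The snippet below reflects the behaviour of Python's standard `json` module only; other parsers may differ. The sample document and the `reject_duplicates` hook are made up for illustration.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical tampered document: the same control ID appears twice as a key.
raw = '{"controls": {"ssh-1": {"title": "first"}, "ssh-1": {"title": "second"}}}'

# The default parser accepts the duplicate key and silently keeps the last value.
parsed = json.loads(raw)


def reject_duplicates(pairs: List[Tuple[str, Any]]) -> Dict[str, Any]:
    """object_pairs_hook that raises when a JSON object repeats a key."""
    result: Dict[str, Any] = {}
    for name, value in pairs:
        if name in result:
            raise ValueError(f"duplicate key: {name!r}")
        result[name] = value
    return result


try:
    json.loads(raw, object_pairs_hook=reject_duplicates)
    duplicate_detected = False
except ValueError:
    duplicate_detected = True
```

So uniqueness after parsing is guaranteed by the hash, but detecting that the input was tampered with still needs an explicit check in either format.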

mhedgpeth · 2016-07-07T13:55:55Z

@arlimus I wasn't thinking in terms of attack protection but in terms of ensuring that you have uniqueness of keys, so data integrity of the keys and no duplication. I think you're right that the signing and protection-oriented features are outside of the scope of this discussion.

(1) The field is not yet optimal, the calculations are great! (2) Changing this field should go together with all other breaking json changes, especially if #811 results in a change.

chris-rock · 2016-08-08T10:49:36Z

In order to find a solution for the question, we decided to treat our CLI JSON results similar to API endpoints. In order to increase the quality and stability, we played with the latest version of JSON Schema v4. It is very difficult to verify hash objects, but easy to verify array objects.
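
To sketch why static keys are convenient to verify: with an array of objects, one schema for the items covers every entry, because the field names are fixed. The fragment below is a hypothetical schema in JSON Schema draft-04 style (not InSpec's actual schema), written as a Python dict, together with a tiny hand-rolled checker that understands only the `type`, `required`, `properties` and `items` keywords so the example runs without third-party libraries.

```python
from typing import Any, Dict, List

# Hypothetical schema fragment; field names follow the examples in this thread.
CONTROLS_SCHEMA: Dict[str, Any] = {
    "type": "array",
    "items": {
        "type": "object",
        "required": ["id", "title"],
        "properties": {
            "id": {"type": "string"},
            "title": {"type": "string"},
        },
    },
}

_TYPES = {"array": list, "object": dict, "string": str}


def validate(value: Any, schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Return a list of error strings; supports only a few schema keywords."""
    expected = schema.get("type")
    if expected is not None and not isinstance(value, _TYPES[expected]):
        return [f"{path}: expected {expected}"]
    errors: List[str] = []
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"{path}: missing {name!r}")
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                errors.extend(validate(value[name], sub, f"{path}.{name}"))
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


good = [{"id": "ssh-1", "title": "Allow only SSH Protocol 2"}]
bad = [{"title": "Allow only SSH Protocol 2"}]
good_errors = validate(good, CONTROLS_SCHEMA)
bad_errors = validate(bad, CONTROLS_SCHEMA)
```

In practice a real JSON Schema validator would replace the hand-rolled `validate` function; the point is only that the `id` field can be declared as required like any other property.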

In addition to the schema verification, dealing with arrays is easier in JavaScript and jq.

I have therefore changed my position from Hash to Array.

arlimus · 2016-09-14T14:05:57Z

Decision

Decision for 1.0 JSON changes regarding this discussion:

{
  "version": "0.34.1",
  "profiles": {
    "profile": {
      "controls": {
        "ssh-1": {
~
      "groups": {
        "controls/meta.rb": {
~
        },
~

changes to:

{
  "version": "0.34.1",
  "profiles": [
    {
      "name": "profile",  // only name here
~
      "controls": [
        {
          "id": "control-xyz",
~
      "groups": [
        {
          "id": "controls/meta.rb",
~
        },
~

i.e. all hash-of-hashes change to array-of-hashes
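
For consumers, the new layout can be walked with plain iteration. A minimal Python sketch, assuming only the fields visible in the snippet above (`profiles`, `name`, `controls`, `groups`, `id`); the sample report and the function names are made up for illustration, and everything elided above is left out:

```python
from typing import Any, Dict, List, Tuple

# Sample report in the new array-of-hashes layout (elided fields omitted).
report: Dict[str, Any] = {
    "version": "0.34.1",
    "profiles": [
        {
            "name": "profile",
            "controls": [{"id": "control-xyz"}],
            "groups": [{"id": "controls/meta.rb"}],
        }
    ],
}


def control_ids(data: Dict[str, Any]) -> List[Tuple[str, str]]:
    """List (profile name, control id) pairs across all profiles."""
    return [
        (profile["name"], control["id"])
        for profile in data["profiles"]
        for control in profile["controls"]
    ]


def group_ids(data: Dict[str, Any]) -> List[Tuple[str, str]]:
    """List (profile name, group id) pairs across all profiles."""
    return [
        (profile["name"], group["id"])
        for profile in data["profiles"]
        for group in profile["groups"]
    ]


pairs = control_ids(report)
groups = group_ids(report)
```

Because every level uses the same static field names, the same access pattern works for profiles, controls and groups.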

chris-rock · 2016-09-19T14:57:45Z

Thanks to all, this was a great discussion. We decided to switch to an array-based solution because it makes handling in various languages a lot easier.

💯 @arlimus

arlimus force-pushed the dr/json-arrays branch from e55cedd to b6f7a29 Compare July 4, 2016 10:14

arlimus changed the title ~~static keys in all json~~ DISCUSS static keys in all json Jul 4, 2016

arlimus force-pushed the dr/json-arrays branch from b6f7a29 to 2a8240a Compare July 4, 2016 10:59

arlimus force-pushed the dr/json-arrays branch from 2a8240a to a1c1a7b Compare July 4, 2016 13:13

chris-rock added this to the 1.0.0 milestone Aug 8, 2016

chris-rock mentioned this pull request Aug 8, 2016

Add JSON Schema validation #884

Closed

chris-rock modified the milestones: 1.0.0, 0.31.0 Aug 8, 2016

arlimus mentioned this pull request Aug 8, 2016

revert control_summary field in output #887

Merged

chris-rock modified the milestones: 0.31.0, 0.32.0 Aug 18, 2016

arlimus added the Type: RFC Community survey for a proposal label Aug 22, 2016

arlimus modified the milestones: 1.0.0, 0.32.0 Aug 22, 2016

arlimus modified the milestones: 0.35.0, 1.0.0 Sep 14, 2016

arlimus added the ready label Sep 15, 2016

arlimus self-assigned this Sep 15, 2016

arlimus force-pushed the dr/json-arrays branch 2 times, most recently from 7946a1e to 8e06e8d Compare September 16, 2016 23:20

arlimus changed the title ~~DISCUSS static keys in all json~~ Static keys in all json Sep 17, 2016

chris-rock modified the milestones: 0.35.0, 0.36.0 Sep 19, 2016

static keys in all json

38f2680

arlimus force-pushed the dr/json-arrays branch 3 times, most recently from c34d4ad to 83d4aac Compare September 19, 2016 11:16

adopt new json formatting

6792550

arlimus force-pushed the dr/json-arrays branch from 83d4aac to 6792550 Compare September 19, 2016 11:45

arlimus added in progress and removed ready labels Sep 19, 2016

chris-rock merged commit 8564e2d into master Sep 19, 2016

chris-rock deleted the dr/json-arrays branch September 19, 2016 14:57

chris-rock modified the milestones: 0.36.0, 1.0.0 Sep 21, 2016

arlimus mentioned this pull request Sep 21, 2016

Update to InSpec 1.0 chef-boneyard/audit#98

Closed
