-
Notifications
You must be signed in to change notification settings - Fork 682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Static keys in all json #811
Conversation
Thanks @arlimus for bringing up that question. I try to self-answer some questions and feedback is welcome to get the appropriate output. What is the purpose of the json format? It is a data-exchange format Then the next questions comes into my mind: What are the properties of a data-exchange format?
Form file parsing perspective, I see the point that it is easier to iterate over an array of elements, on the other hand the access of single elements is way more difficult. Therefore the pros and cons are on the same level here. Especially data-integrity is difficult to achieve (see https://en.wikipedia.org/wiki/Data_integrity). By using arrays instead of unique keys, users could manipulate the content and add multiple entries with the same id (had a great discussion with @arlimus about that topic). Now the application needs to handle all edge cases, therefore extra logic is required on top of the json parser. Data-integrity is a core property that we added in the first place, to ensure the data is not corrupted. This allows us to reduce the code that ensures the data is correct and leads to simpler and more secure application software. By relying on this safety mechanism of a json parser, its easy to parse the format in all languages. Now only sha sums are required to ensure the data has not been tampered with. Using arrays, would lead to more insecure application code. Therefore I vote against the change. |
@arlimus, |
Great catch Alex, thank you!! |
I would vote for data integrity over reportability. I think you keep the core data model as correct as possible and then transform it into easier to read formats for other tools as needed. I agree with @chris-rock against the change. |
Let me grab that data-integrity argument for a sec: I don't believe the Array vs Hash discussion helps us in a significant way to improve or deteriorate data integrity. It is indeed true that additional entries can be added to arrays, that essentially lead to duplicate IDs in controls; However this is only a tiny piece of the landscape of integrity. For example, taking the argument that attackers might manipulate via ID's in arrays vs hashes: the scenario would permit manipulation via IDs either way, which could lead to manipulated entries in the Hash, just as much as in the Array (with the only difference being that the array protects against duplication). If we want to protect data integrity for profiles and results (and we do, this is planned in the future) we should tackle it with measures that address it (e.g. signatures with integrity information). |
@arlimus I wasn't thinking in terms of attack protection but in terms of ensuring that you have uniqueness of keys, so data integrity of the keys and no duplication. I think you're right that the signing and protection-oriented features are outside of the scope of this discussion. |
(1) The field is not yet optimal, the calculations are great! (2) Changing this field should go together with all other breaking json changes, especially if #811 results in a change.
In order to find a solution for the question, we decided to treat our CLI JSON results similar to API endpoints. In order to increase the quality and stability, we played with the latest version of JSON Schema v4. It is very difficult to verify hash objects, but easy to verify array objects. In addition to the schema verification, dealing with arrays is easier in Javascript and jq. Therefore I change my opinion from Hash to Array. |
DecisionDecision for 1.0 JSON changes regarding this discussion: {
"version": "0.34.1",
"profiles": {
"profile": {
"controls": {
"ssh-1": {
~
"groups": {
"controls/meta.rb": {
~
},
~ changes to: {
"version": "0.34.1",
"profiles": [
{
"name": "profile", // only name here
~
"controls": [
{
"id": "control-xyz",
~
"groups": [
{
"id": "controls/meta.rb",
~
},
~ i.e. all hash-of-hashes change to array-of-hashes |
7946a1e
to
8e06e8d
Compare
c34d4ad
to
83d4aac
Compare
83d4aac
to
6792550
Compare
Thanks to all, this was a great discussion. We decided to switch to an array based-solution, because it makes handling in various languages a lot easier. 💯 @arlimus |
Why? Self-documenting API and data fields.
I have recently received the question: "Why do we need filenames and paths in those groups field?" to which I responded "They are actually IDs, it's just paths by convenience right now." Our current output format carries 2 inherent assumptions, which users might not understand: controls and groups are actually maps/dicts that connect IDs to their items.
Changes
This MR essentially turns:
into
and also changes:
into
Added bonuses
jq
andelasticsearch
Negative points
inspec exec --format json-min
withinspec json
Please join the discussion below