Suggested improvement to the JSON representation of replica timestamps #146

Open
colin-nolan opened this issue Feb 23, 2016 · 2 comments

@colin-nolan
Contributor

The way in which baton currently represents replica timestamps is rather unwieldy.

It would be better if replica created and modified times were properties of the JSON object representing the replica, i.e.:

{
  "replicates": [
    {
      "number": 0,
      ...,
      "created": "2016-02-22T14:25:26",
      "modified": "2016-02-22T14:25:26"
    },
    {
      "number": 1,
      ...,
      "created": "2016-02-22T14:35:25",
      "modified": "2016-02-22T14:35:25"
    }
  ],
  "collection": "/iplant/home/rods",
  "data_object": "log"
}

Instead, they are currently collected into a single timestamps array on the data object, and each entry is linked back to its replica by number:

{
  "replicates": [
    {
      "number": 0,
      ...
    },
    {
      "number": 1,
      ...
    }
  ],
  "collection": "/iplant/home/rods",
  "data_object": "log",
  "timestamps": [
    {
      "created": "2016-02-22T14:25:26",
      "replicates": 0
    },
    {
      "modified": "2016-02-22T14:25:26",
      "replicates": 0
    },
    {
      "created": "2016-02-22T14:35:25",
      "replicates": 1
    },
    {
      "modified": "2016-02-22T14:35:25",
      "replicates": 1
    }
  ]
}

The current implementation strangely couples the JSON representation of a replica with that of a data object: properties of a replica exist outside of the replica object. Having all timestamps in one array also makes it harder than it should be to get the timestamps for a particular replica.
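
To illustrate that last point, here is a minimal Python sketch of what a client has to do with the current layout to find the timestamps of one replica. The helper name `timestamps_for_replica` is hypothetical and not part of baton, and the sample object only carries the fields shown above (the elided replica fields are left out).

```python
def timestamps_for_replica(data_object, number):
    """Collect created/modified times for one replica from the flat array."""
    found = {}
    for entry in data_object.get("timestamps", []):
        # Each entry holds a single timestamp plus the replica number it
        # belongs to, so every entry has to be inspected.
        if entry.get("replicates") != number:
            continue
        for key in ("created", "modified"):
            if key in entry:
                found[key] = entry[key]
    return found


# Sample data in the current representation (assumed fields only).
data_object = {
    "replicates": [{"number": 0}, {"number": 1}],
    "collection": "/iplant/home/rods",
    "data_object": "log",
    "timestamps": [
        {"created": "2016-02-22T14:25:26", "replicates": 0},
        {"modified": "2016-02-22T14:25:26", "replicates": 0},
        {"created": "2016-02-22T14:35:25", "replicates": 1},
        {"modified": "2016-02-22T14:35:25", "replicates": 1},
    ],
}

replica_1 = timestamps_for_replica(data_object, 1)
```

With the proposed layout the same lookup would reduce to reading "created" and "modified" directly from the replica object.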

@keithj
Contributor

keithj commented Feb 23, 2016

The current structure for timestamps arose from the design goal of representing iRODS' data object abstraction identically in both queries and query results. My expectation was that users should not care about replicates or their individual timestamps, checksums, locations etc. A timestamp would therefore appear to be a property of a data object directly, likewise a checksum. Yes, each replicate has those, but they would be unified and seamlessly presented as data object properties.

It became clear as time went on that the data object abstraction in iRODS is rather leaky. Users needed to know about replicates, their sizes, checksums, locations and timestamps because the system was not managing these things transparently. That's why the replicate reporting is an addendum.

Using the same JSON representation for queries and query results is a primary design goal. It enables the output of one baton program to be the input of another, which is something we use a lot. I'm wary of moving the timestamps from the containing object deeper into the structure because that makes writing timestamp queries more difficult as they would need to use the same JSON structure.

I'm in favour of the idea of streamlining the JSON. However, we must think about preserving the unified query/result representation and backwards compatibility.

@keithj
Contributor

keithj commented Feb 24, 2016

I think that the timestamps can be added safely to replicates as you suggest. The timestamps property will have to remain for the purpose of metadata queries and backwards compatibility.
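
As a rough sketch of what that combined output could look like, the following Python copies each timestamp onto its replicate while leaving the top-level timestamps property untouched. This is only an illustration of the resulting JSON shape, not baton's implementation; the function name `embed_timestamps` is hypothetical and the sample object is limited to the fields shown earlier in this issue.

```python
import copy


def embed_timestamps(data_object):
    """Return a copy with created/modified added to each replicate.

    The top-level "timestamps" array is kept as-is so that existing
    consumers and timestamp queries keep working.
    """
    result = copy.deepcopy(data_object)
    by_number = {rep["number"]: rep for rep in result.get("replicates", [])}
    for entry in result.get("timestamps", []):
        rep = by_number.get(entry.get("replicates"))
        if rep is None:
            continue  # timestamp refers to a replicate that is not listed
        for key in ("created", "modified"):
            if key in entry:
                rep[key] = entry[key]
    return result


# Sample data in the current representation (assumed fields only).
current = {
    "replicates": [{"number": 0}, {"number": 1}],
    "collection": "/iplant/home/rods",
    "data_object": "log",
    "timestamps": [
        {"created": "2016-02-22T14:25:26", "replicates": 0},
        {"modified": "2016-02-22T14:25:26", "replicates": 0},
        {"created": "2016-02-22T14:35:25", "replicates": 1},
        {"modified": "2016-02-22T14:35:25", "replicates": 1},
    ],
}

combined = embed_timestamps(current)
```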
