Support for observational error measurements in data #281

steko · 2016-08-15T19:04:05Z

Hey all, based on a discussion with @danfowler, I'm submitting this proposal to add support for observational error measurements in data, which are common in scientific datasets. I can't draft a full spec at the moment, but I hope others will chime in with comments from their own fields. The examples below come from archaeology.

While the idea came up in the context of Data Packages, JSON Table Schema seems to be the right place to add this kind of support.

Examples

Radiocarbon dates

As can be seen in the Mediterranean Radiocarbon dates dataset (one of the largest open datasets of this kind), a radiocarbon date needs at least two values: the conventional radiocarbon age and its error. While it's common to write 3340 ± 45 in running text, datasets usually record the two values separately. However, the radiocarbon age is meaningless without the attached error.

Neutron activation analysis

Compositional data from INAA (instrumental neutron activation analysis) are expressed in parts per million with an attached measurement error, as can be seen in the Chemical Composition by Neutron Activation Analysis (INAA) of Neo-Assyrian Palace Ware dataset (a rather common case). Here, measurement and error are recorded in a single column, separated by ±.

Existing implicit conventions

Separate columns

id, data, error
0, 34, 0.2

Single column

id, data
0, 34 ± 0.2
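Whatever the schema ends up saying, a reader can normalize both conventions into the same (value, error) pairs. A minimal sketch using only the Python standard library, assuming the column names `data` and `error` from the examples above; the function names are hypothetical and only plain decimals (no scientific notation) are parsed:

```python
import csv
import io
import re
from typing import List, Optional, Tuple

# "<value> ± <error>", also accepting the ASCII spelling "+/-".
# Plain decimals only; scientific notation is not handled in this sketch.
_PM_PATTERN = re.compile(
    r"^\s*([-+]?\d+(?:\.\d+)?)\s*(?:±|\+/-)\s*(\d+(?:\.\d+)?)\s*$"
)


def split_value_error(cell: str) -> Tuple[float, Optional[float]]:
    """Split a cell like '34 ± 0.2' into (34.0, 0.2); a bare number gives (value, None)."""
    match = _PM_PATTERN.match(cell)
    if match:
        return float(match.group(1)), float(match.group(2))
    return float(cell), None


def read_separate_columns(text: str) -> List[Tuple[float, float]]:
    """Convention 1: value and error in their own columns ('data', 'error')."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [(float(row["data"]), float(row["error"])) for row in reader]


def read_single_column(text: str) -> List[Tuple[float, Optional[float]]]:
    """Convention 2: value and error packed into one 'data' column."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [split_value_error(row["data"]) for row in reader]


print(read_separate_columns("id, data, error\n0, 34, 0.2\n"))  # [(34.0, 0.2)]
print(read_single_column("id, data\n0, 34 ± 0.2\n"))           # [(34.0, 0.2)]
```

The single-column form forces every consumer to parse strings, which is one reason to prefer separate columns plus explicit metadata.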

Proposed approach

Add a field descriptor property to JSON Table Schema that explicitly marks the values in one field as linked to another field, e.g.:

{
    "fields": [
      {
        "name": "measurement",
        "title": "The numeric value",
        "type": "number"
      },
      {
        "name": "error",
        "title": "The error attached to the numeric value",
        "type": "number",
        "errorOf": "measurement"
      }
    ]
}
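To make the proposal concrete, here is a minimal Python sketch of how a consumer might resolve `errorOf` into a measurement-to-error mapping and pair values in a row. `errorOf` is only the property proposed above, not part of any published spec, and the function name is hypothetical:

```python
from typing import Any, Dict


def error_links_from_error_of(schema: Dict[str, Any]) -> Dict[str, str]:
    """Map each measurement field to the field holding its error, using the
    proposed (not yet standard) `errorOf` property on the error field."""
    names = {field["name"] for field in schema["fields"]}
    links: Dict[str, str] = {}
    for field in schema["fields"]:
        target = field.get("errorOf")
        if target is None:
            continue
        if target not in names:
            raise ValueError(f"{field['name']!r}: errorOf points to unknown field {target!r}")
        if target in links:
            raise ValueError(f"{target!r} already has error field {links[target]!r}")
        links[target] = field["name"]
    return links


schema = {
    "fields": [
        {"name": "measurement", "title": "The numeric value", "type": "number"},
        {"name": "error", "title": "The error attached to the numeric value",
         "type": "number", "errorOf": "measurement"},
    ]
}
row = {"measurement": 3340, "error": 45}

links = error_links_from_error_of(schema)
print(links)  # {'measurement': 'error'}
for measured, err in links.items():
    print(f"{row[measured]} ± {row[err]}")  # 3340 ± 45
```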

An alternative approach puts the link on the measurement field instead:

{
    "fields": [
      {
        "name": "measurement",
        "title": "The numeric value",
        "type": "number",
        "errorField": "error"
      },
      {
        "name": "error",
        "title": "The error attached to the numeric value",
        "type": "number"
      }
    ]
}
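Since `errorOf` and `errorField` carry the same information from opposite ends, a reader could accept either. A minimal Python sketch (the function name is hypothetical, and both properties are only proposals) that normalizes the two forms into one `{measurement: error field}` mapping and rejects contradictions:

```python
from typing import Any, Dict


def error_links(schema: Dict[str, Any]) -> Dict[str, str]:
    """Return {measurement_field: error_field}, accepting either proposed form:
    `errorField` on the measurement or `errorOf` on the error field."""
    fields = {f["name"]: f for f in schema["fields"]}
    links: Dict[str, str] = {}

    def add(measured: str, error: str) -> None:
        for name in (measured, error):
            if name not in fields:
                raise ValueError(f"unknown field {name!r}")
        existing = links.get(measured)
        if existing is not None and existing != error:
            raise ValueError(
                f"conflicting error fields for {measured!r}: {existing!r} vs {error!r}"
            )
        links[measured] = error

    for name, field in fields.items():
        if "errorField" in field:
            add(name, field["errorField"])
        if "errorOf" in field:
            add(field["errorOf"], name)
    return links


error_field_style = {"fields": [
    {"name": "measurement", "type": "number", "errorField": "error"},
    {"name": "error", "type": "number"},
]}
error_of_style = {"fields": [
    {"name": "measurement", "type": "number"},
    {"name": "error", "type": "number", "errorOf": "measurement"},
]}
print(error_links(error_field_style))  # {'measurement': 'error'}
print(error_links(error_of_style))     # {'measurement': 'error'}
```

One design difference: `errorOf` lets several error columns point at the same measurement, while `errorField` naturally holds only one. This sketch assumes a single error field per measurement.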

This is just a basic description of the issue to get the discussion started, with no presumption of formal correctness nor exhaustive coverage of the various issues in other disciplines.


djvanderlaan · 2016-08-17T14:43:43Z

I am working mainly with statistical output tables (unemployment figures and such), where we sometimes also have the uncertainty. However, it is most often specified as the lower and upper bounds of a confidence interval. We currently encode this in the variable names (e.g. "measurement_lb" and "measurement_ub"), and moving this into the metadata has been on our to-do list for a while. So +1.
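Recovering these links from a naming convention works, but only as long as everyone follows the convention, which is the argument for stating them in the metadata. A hypothetical Python sketch of name-based detection, assuming exactly the `_lb`/`_ub` suffixes mentioned above:

```python
from typing import Dict, List, Tuple


def interval_columns(
    names: List[str], lower: str = "_lb", upper: str = "_ub"
) -> Dict[str, Tuple[str, str]]:
    """Return {measurement: (lower_col, upper_col)} for every column that has
    both a '<name>_lb' and a '<name>_ub' companion."""
    present = set(names)
    pairs: Dict[str, Tuple[str, str]] = {}
    for name in names:
        if name.endswith(lower):
            base = name[: -len(lower)]
            # Require the point estimate too, to avoid matching unrelated '_lb' columns.
            if base in present and base + upper in present:
                pairs[base] = (name, base + upper)
    return pairs


cols = ["region", "measurement", "measurement_lb", "measurement_ub"]
print(interval_columns(cols))  # {'measurement': ('measurement_lb', 'measurement_ub')}
```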

However, I think we need more than errorOf. As mentioned above, we often have a lower and an upper bound. Relative errors (%) are also used. The most flexible option would be to allow arbitrary relations between columns, perhaps something along the lines of:

{
    "fields": [
      {
        "name": "measurement",
        "title": "The numeric value",
        "type": "number"
      },
      {
        "name": "error",
        "title": "The error attached to the numeric value",
        "type": "number",
        "relation" : { "type": "errorOf", "column": "measurement"}
      }
    ]
}

This would also let people specify custom relations, although a list of suggested/default supported relations would be nice.
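To show how a generic tool might consume such `relation` objects, here is a minimal Python sketch that turns them into a (low, high) interval per measurement. Only `errorOf` comes from the proposal; `relativeErrorOf`, `lowerBoundOf` and `upperBoundOf` are hypothetical names for what a default list might contain, and the function name is also hypothetical:

```python
from typing import Any, Dict, Tuple


def intervals(schema: Dict[str, Any], row: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """Compute a (low, high) interval for each measurement referenced by a `relation`.

    Only "errorOf" comes from the proposal; "relativeErrorOf", "lowerBoundOf" and
    "upperBoundOf" are hypothetical names for a suggested default list.
    Unknown (custom) relation types are ignored here.
    """
    bounds: Dict[str, Dict[str, float]] = {}
    for field in schema["fields"]:
        rel = field.get("relation")
        if rel is None:
            continue
        kind, target = rel["type"], rel["column"]
        value = row[field["name"]]
        if kind == "errorOf":  # absolute error: centre ± value
            e = value
        elif kind == "relativeErrorOf":  # error given as a percentage of the centre
            e = abs(row[target]) * value / 100
        elif kind == "lowerBoundOf":
            bounds.setdefault(target, {})["low"] = value
            continue
        elif kind == "upperBoundOf":
            bounds.setdefault(target, {})["high"] = value
            continue
        else:
            continue  # custom relation: leave to application-specific code
        centre = row[target]
        bounds.setdefault(target, {}).update(low=centre - e, high=centre + e)
    return {name: (b["low"], b["high"]) for name, b in bounds.items()
            if "low" in b and "high" in b}


schema = {"fields": [
    {"name": "measurement", "type": "number"},
    {"name": "error", "type": "number",
     "relation": {"type": "errorOf", "column": "measurement"}},
    {"name": "rate", "type": "number"},
    {"name": "rate_lb", "type": "number",
     "relation": {"type": "lowerBoundOf", "column": "rate"}},
    {"name": "rate_ub", "type": "number",
     "relation": {"type": "upperBoundOf", "column": "rate"}},
]}
row = {"measurement": 3340, "error": 45, "rate": 7.1, "rate_lb": 6.8, "rate_ub": 7.4}
print(intervals(schema, row))
# {'measurement': (3295, 3385), 'rate': (6.8, 7.4)}
```

Skipping unknown relation types keeps custom relations harmless for generic tools, which fits the idea of a default list plus user-defined extensions.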

rufuspollock · 2016-08-22T06:42:16Z

@steko @djvanderlaan I think this is a perfect candidate for a "pattern" proposal. A pattern offers a suggested solution to a particular problem (in this case, linking error information to the main measurement) without being a formal spec.

roll added Table Schema backlog labels Aug 16, 2016

danfowler mentioned this issue Aug 25, 2016

Add section and system for user-contributed "patterns" (part of /guides/) frictionlessdata/website-old#78 (closed)

roll removed the backlog label Aug 29, 2016

pwalsh modified the milestone: v1.1 Feb 5, 2017

Stephen-Gates mentioned this issue Feb 2, 2018

Data Summary and Quality description spec #364 (closed)

roll removed this from the v1.1 milestone Apr 14, 2023

roll added Patterns and removed Table Schema labels Jan 3, 2024

roll added Table Schema and removed Recipes labels Apr 12, 2024
