modules: Add JSON output for db.univar and v.db.univar #2386

wenzeslaus · 2022-05-19T21:31:27Z

This adds JSON output for db.univar and v.db.univar.

v.db.univar map=roadsmajor column=SHAPE_LEN format=json -e percentile=80,90,95,99 | jq

{
  "statistics": {
    "n": 355,
    "min": 20.359027,
    "max": 64177.255429,
    "range": 64156.896402,
    "mean": 4934.153557109861,
    "mean_abs": 4934.153557109861,
    "variance": 38328715.30867731,
    "stddev": 6191.0189233015035,
    "coeff_var": 1.254727655238975,
    "sum": 1751624.5127740006,
    "first_quartile": 761.180256,
    "median": 1601.228177,
    "third_quartile": 9527.487778,
    "percentiles": [
      {
        "percentile": 80,
        "value": 11737.529039
      },
      {
        "percentile": 90,
        "value": 13883.001283
      },
      {
        "percentile": 95,
        "value": 14711.484257
      },
      {
        "percentile": 99,
        "value": 14943.396283
      }
    ]
  }
}

The implementation introduces some more duplication (in addition to the existing duplicated code). However, the computations are implemented directly in Python without any libraries, and the implemented method, even if implemented correctly, may need to be replaced by a more standard one. So I'm going with bad, but easier to write, code which brings fewer changes to the current code.

This also adds tests, but the correctness checks against NumPy are limited, especially due to different definitions of quartiles and percentiles.
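To illustrate why quartiles and percentiles do not line up with NumPy, here is a minimal pure-Python sketch of two common definitions: nearest rank, and linear interpolation between the closest ranks (the latter is NumPy's default method for numpy.percentile). Neither function is taken from db.univar, and whether db.univar follows either rule exactly is an assumption left open here.

```python
import math


def percentile_nearest_rank(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of the data at or below it."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values")
    # 1-based rank; p=0 is clamped to the first (smallest) value
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def percentile_linear(values, p):
    """Linear interpolation between closest ranks (NumPy's default 'linear' method)."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values")
    # Fractional 0-based position in the sorted data
    pos = (len(ordered) - 1) * p / 100
    lower = math.floor(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


if __name__ == "__main__":
    data = list(range(1, 11))
    for p in (25, 50, 75):
        print(p, percentile_nearest_rank(data, p), percentile_linear(data, p))
```

On the values 1 to 10 the two rules already disagree: the median is 5 by nearest rank but 5.5 with linear interpolation, so a NumPy-based check only agrees with a given definition for specially chosen data.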

See also #2108.

Fixed values for the tests were obtained from the plain output, so the new JSON output is checked for consistency with the original. The tests that compute values with NumPy would fail with different test data; for example, using n other than 10 makes them fail.
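The idea of checking the JSON output against the plain output can be sketched in plain Python. This is a minimal sketch, not the test code from this PR: the helper names parse_shell_output and assert_json_matches_plain are hypothetical, and it assumes the plain output consists of key=value lines using the same keys as the JSON statistics object (as first_quartile does in the diff in this PR). Because the plain output is printed with %.15g, values are compared with a relative tolerance rather than exact equality.

```python
import json
import math


def parse_shell_output(text):
    """Parse key=value lines (hypothetical plain-output layout) into floats."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        result[key] = float(value)
    return result


def assert_json_matches_plain(json_text, plain_text, rel_tol=1e-12):
    """Check every key of the plain output against the JSON 'statistics' mapping."""
    stats = json.loads(json_text)["statistics"]
    plain = parse_shell_output(plain_text)
    for key, value in plain.items():
        assert key in stats, f"missing key in JSON: {key}"
        # %.15g keeps 15 significant digits, so exact equality is too strict
        assert math.isclose(stats[key], value, rel_tol=rel_tol), (key, stats[key], value)


if __name__ == "__main__":
    # Values taken from the roadsmajor example in this PR description
    stats = {"n": 355, "mean": 4934.153557109861, "median": 1601.228177}
    json_text = json.dumps({"statistics": stats})
    plain_text = "\n".join("%s=%.15g" % (key, value) for key, value in stats.items())
    assert_json_matches_plain(json_text, plain_text)
    print("JSON output matches plain output")
```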

wenzeslaus · 2022-06-01T14:31:45Z

This PR adds JSON output to v.db.univar. Please let me know what you think about the output structure.

The open question is how to represent the user-provided percentiles, which are a list. Do we repeat them in the output? As a list of mappings (as in the description) or as a list of values (as below)? And what about the names (percentile versus percentile value)?

{
  "statistics": {
    "first_quartile": 761.180256,
    "median": 1601.228177,
    "third_quartile": 9527.487778,
    "percentiles_points": [
      80,
      90,
      95,
      99
    ],
    "percentiles_values": [
      11737.529039,
      13883.001283,
      14711.484257,
      14943.396283
    ]
  }
}

Created with:

diff --git a/scripts/db.univar/db.univar.py b/scripts/db.univar/db.univar.py
index d96f168700..278e83deb7 100755
--- a/scripts/db.univar/db.univar.py
+++ b/scripts/db.univar/db.univar.py
@@ -362,10 +362,13 @@ def main():
         result["median"] = q50
         result["third_quartile"] = q75
         if options["percentile"]:
-            percentiles = []
+            percentiles_points = []
+            percentiles_values = []
             for i, one_percentile in enumerate(perc):
-                percentiles.append({"percentile": one_percentile, "value": pval[i]})
-        result["percentiles"] = percentiles
+                percentiles_points.append(one_percentile)
+                percentiles_values.append(pval[i])
+        result["percentiles_points"] = percentiles_points
+        result["percentiles_values"] = percentiles_values
         json.dump({"statistics": result}, sys.stdout)
     else:
         sys.stdout.write("first_quartile=%.15g\n" % q25)
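The two layouts under discussion can both be built from the same pair of index-aligned lists. This is a minimal sketch, assuming perc and pval are plain Python lists aligned by index, as the enumerate loop in the diff implies; the function names as_mappings and as_parallel_lists are hypothetical. The sample numbers are the percentiles from the roadsmajor example in the PR description.

```python
import json


def _check_aligned(points, values):
    # zip() would silently drop extra items, so require equal lengths
    if len(points) != len(values):
        raise ValueError("percentile points and values have different lengths")


def as_mappings(points, values):
    """Original layout: one {"percentile", "value"} object per requested percentile."""
    _check_aligned(points, values)
    return [{"percentile": p, "value": v} for p, v in zip(points, values)]


def as_parallel_lists(points, values):
    """Proposed layout: two lists aligned by index."""
    _check_aligned(points, values)
    return {"percentiles_points": list(points), "percentiles_values": list(values)}


perc = [80, 90, 95, 99]
pval = [11737.529039, 13883.001283, 14711.484257, 14943.396283]

if __name__ == "__main__":
    print(json.dumps(as_mappings(perc, pval)))
    print(json.dumps(as_parallel_lists(perc, pval)))
```

The mapping form is self-describing per item, while the parallel lists keep the values in one flat array that is easy to hand to plotting or array code.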

wenzeslaus · 2022-06-07T21:05:53Z

The current code produces percentiles, which repeats the percentile parameter, and percentile_values, which holds the actual values:

grass8 ~/grassdata/nc_spm_08_grass7/user1/ --exec v.db.univar precip_30ynormals column=annual format=json -e percentile=80,90,95,99 | jq

{
  "statistics": {
    "n": 136,
    "min": 947.42,
    "max": 2329.18,
    "range": 1381.7599999999998,
    "mean": 1289.311470588235,
    "mean_abs": 1289.311470588235,
    "variance": 39430.97231548565,
    "stddev": 198.57233522191768,
    "coeff_var": 0.15401424694633417,
    "sum": 175346.35999999996,
    "first_quartile": 1183.64,
    "median": 1234.44,
    "third_quartile": 1320.8,
    "percentiles": [
      80,
      90,
      95,
      99
    ],
    "percentile_values": [
      1381.76,
      1480.82,
      1661.16,
      2222.5
    ]
  }
}
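A consumer of this output has to pair the two lists back up. Here is a minimal sketch of that, assuming only the key names percentiles and percentile_values shown in the output above; percentile_lookup is a hypothetical helper, not part of GRASS.

```python
import json


def percentile_lookup(statistics):
    """Pair each requested percentile with its computed value."""
    points = statistics["percentiles"]
    values = statistics["percentile_values"]
    if len(points) != len(values):
        raise ValueError("percentiles and percentile_values have different lengths")
    return dict(zip(points, values))


if __name__ == "__main__":
    # Excerpt of the precip_30ynormals output shown above
    text = (
        '{"statistics": {"percentiles": [80, 90, 95, 99], '
        '"percentile_values": [1381.76, 1480.82, 1661.16, 2222.5]}}'
    )
    lookup = percentile_lookup(json.loads(text)["statistics"])
    print(lookup[95])  # 1661.16
```

Note that a dict collapses repeated percentiles (e.g., percentile=90,90) into one entry; a consumer that needs to preserve them should iterate over zip() directly.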

wenzeslaus · 2022-06-07T21:56:59Z

JSON output for v.db.univar has tests and it is ready for feedback, review, or merge.

* Add JSON output to db.univar.
* Add JSON from db.univar to v.db.univar.
* All formats are now handled through the format option.
* Output percentiles as two lists, not a mapping.

Tests:

* Old tests fixed.
* New tests use pytest.
* Fixed values for the tests were obtained from the plain output, so the new JSON output is checked for consistency with the original.
* The tests that compute values with NumPy would fail with different test data; for example, using n other than 10 makes them fail.

modules: Add JSON output for db.univar and v.db.univar

72e60df

wenzeslaus added this to the 8.4.0 milestone May 20, 2022

wenzeslaus added the Python and enhancement labels May 20, 2022

Add tests, fix code

8626eac


wenzeslaus marked this pull request as ready for review June 1, 2022 14:09

wenzeslaus added 2 commits June 7, 2022 16:42

Output percentiles as two lists

45e89c8

Add documentation

a657377

wenzeslaus mentioned this pull request Jun 7, 2022

v.dissolve: Compute attribute aggregate statistics #2388

Merged

wenzeslaus merged commit 120f198 into OSGeo:main Jun 9, 2022

wenzeslaus deleted the json-for-db_univar branch June 9, 2022 16:03

wenzeslaus mentioned this pull request Dec 2, 2022

r.kappa: Add JSON output option #2666

Merged

cwhite911 mentioned this pull request Jun 5, 2023

Add JSON and YAML C library dependency #3020

Open
