
Allow "; " separated Geotrace/shape to prevent loss of all but first coordinate in OData export #300

Closed
florianm opened this issue Oct 20, 2020 · 9 comments



florianm commented Oct 20, 2020

Problem

OData exports only the first coordinate of "; " separated geoshapes (and probably geotraces too).
The RESTful API and the CSV/ZIP export are unaffected.

Data

Original issue with data examples at ropensci/ruODK#95

@TimonWeitkamp found that about half of all submissions to his form contain invalid geotraces which are "; " separated. The records in question were captured in Mozambique and are stored on Timon's ODK Central instance.
There is no telling which record is valid and which is not, so any error handling needs to run on each record.

The exact form is also deployed to https://sandbox.central.getodk.org/#/projects/14/forms/2/submissions but doesn't (yet) show invalid geotraces among the test records captured by me in Perth and by Timon in the Netherlands.
Maybe Timon can figure out how to reproduce the spaced-out geoshapes and submit some to the sandbox?

Approach

Looking at backend's lib/data/json.js:

const pointStrs = text.split(';'); splits geotraces/shapes into points at the ";".
const [ lat, lon, altitude/*, accuracy*/ ] = str.split(/\s+/g).map(parseFloat); maps parseFloat over each point's coordinates.
On "; " separated geoshapes (assuming this pertains also to geotraces), every point string after the first starts with a space, so str.split(/\s+/g) yields an empty leading element and lat parses to NaN. The guard (((lat == null) || (lon == null)) && (pointStrs.length === 1)) return; then aborts the parsing. This would explain the missing coordinates in OData.
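A minimal reproduction of the failure mode described above, as a sketch rather than the actual json.js implementation (the helper name parseTrace and the bail-out behaviour are assumptions for illustration): a leading space in a point string makes split(/\s+/g) emit an empty first element, so lat parses to NaN and only the points before the first bad one survive.

```javascript
// Sketch of the geotrace/geoshape point parsing, simplified from the
// snippets quoted from lib/data/json.js. parseTrace is a hypothetical
// wrapper, not Central's API.
const parseTrace = (text) => {
  const pointStrs = text.split(';');
  const points = [];
  for (const str of pointStrs) {
    // On "; " separated input, str starts with a space for every point
    // after the first, so split(/\s+/g) yields ['', lat, lon, ...] and
    // the destructured lat becomes parseFloat('') === NaN.
    const [ lat, lon, altitude/*, accuracy*/ ] = str.split(/\s+/g).map(parseFloat);
    if (Number.isNaN(lat) || Number.isNaN(lon)) break; // parsing stops here
    points.push([ lon, lat, altitude ]);
  }
  return points;
};
```

Feeding it a "; " separated trace returns only the first coordinate tuple, matching the OData symptom.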

Possible solution

Change

const [ lat, lon, altitude/*, accuracy*/ ] = str.split(/\s+/g).map(parseFloat);

to

const [ lat, lon, altitude/*, accuracy*/ ] = str.trim().split(/\s+/g).map(parseFloat);

(.trim() has to run on the string before .split(), since arrays have no .trim().)

Worked example: https://jsfiddle.net/h52wc8bu/
A quick benchmark says that native trim() significantly outperforms an equivalent regex, and still comfortably outperforms a targeted regex (dropping only leading whitespace).
Is the performance hit ...madness? (check back with the Tech Lead)
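Applied to a sketch of the parsing loop (hypothetical helper name, simplified from json.js rather than the real implementation), the proposed trim makes "; " separated traces parse in full:

```javascript
// Same parsing shape as the json.js snippets above, with the proposed
// fix: trim each point string before splitting on whitespace.
// parseTraceTrimmed is a hypothetical name for illustration.
const parseTraceTrimmed = (text) => {
  const points = [];
  for (const str of text.split(';')) {
    // trim() removes the stray leading space after ";", so the first
    // split element is the latitude again.
    const [ lat, lon, altitude/*, accuracy*/ ] = str.trim().split(/\s+/g).map(parseFloat);
    if (Number.isNaN(lat) || Number.isNaN(lon)) break;
    points.push([ lon, lat, altitude ]);
  }
  return points;
};
```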

@issa-tseng (Member)

i seem to recall a comment elsewhere about this that indicated spaces should not be there after the ;, and the issue should be fixed at the source? is that still the story here?


TimonWeitkamp commented Nov 16, 2020

I just downloaded the data through ruODK again and compared the polygons with the manual CSV download through the central server. The ruODK data still has the space error.

An example:
ruODK: POLYGON ((32.9974803 -19.0138301 655))
csv: -19.0138301 32.9974803 655.0 8.0; -19.0138301 32.9974803 655.0 8.0; -19.0138301 32.9974803 655.0 8.0; -19.013824 32.9974849 658.0 7.333; -19.013824 32.9974849 658.0 7.333; -19.013824 32.9974849 658.0 7.333; -19.0137943 32.9975057 661.0 5.0; -19.0137943 32


florianm commented Nov 16, 2020

Worked example is here (same form that had broken records in the first place, but with freshly captured test data): https://rpubs.com/florian_mayer/ruodk_issue95 (expand code blocks for some context)

Important note: the bug manifests itself in the OData submission endpoint; ruODK receives only the first coordinate tuple of "buggy" geoshapes/traces with extra whitespace. ruODK retrieves the data without problems through the CSV ZIP and the REST submission_get endpoints.

Regardless of where the bug comes from, and whether a future fix will prevent it for future submissions, I see actual production data being impacted, and the suggested fix to ODK Central (.trim()) would make ODK Central robust against this issue.

@lognaturel mentioned on Slack that the source of the error could be ODK Collect pending some investigations.

@TimonWeitkamp have you been able to reproduce the issue, or gotten any further information from the data collectors sending the broken records?


lognaturel commented Nov 16, 2020

I haven't been able to reproduce but @getodk/testers will spend some time with it hopefully soon. Ideally there wouldn't need to be any change to Central backend but if we can't track down how it happens we'll have to discuss adding support for spaces there as a fallback.

@issa-tseng (Member)

have we made a decision here?

@lognaturel (Member)

We’ve now patched Collect but naturally that doesn’t help with data that has already been gathered. If the impact on performance and the level of effort aren’t too large, it would be best to ignore those spaces.


florianm commented May 6, 2021

@issa-tseng any chance of including the .trim() patch in v1.2? This would allow the impacted users to unlock their broken records via the standard Central upgrade process rather than any manual patching.

Seeing that Collect has been fixed, the .trim() could even be dropped a few versions down the line, once those who sent broken geoshapes in the first place have upgraded to the latest Collect.

@issa-tseng (Member)

found a way to do this without adding an operation which makes me feel better about it.
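One way to absorb the stray spaces without adding a per-point operation, sketched here as an assumption about the shape of the patch rather than the actual commit, is to fold the whitespace handling into the point split itself:

```javascript
// Hypothetical variant of the point split from lib/data/json.js:
// splitting on ";" plus any trailing whitespace means each point
// string already starts with its latitude, with no extra trim() call.
const splitPoints = (text) => text.split(/;\s*/g);
```

With this split, both ";" and "; " separated traces produce clean point strings for the existing parseFloat mapping.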


florianm commented May 7, 2021

Oh that's a way nicer fix than trim() - thanks for the patch!
