
Allow "; " separated Geotrace/shape to prevent loss of all but first coordinate in OData export #300

Closed
florianm opened this issue Oct 20, 2020 · 9 comments



florianm commented Oct 20, 2020

Problem

OData exports only the first coordinate of "; " separated geoshapes (and probably geotraces too).
The RESTful API and the CSV/ZIP export are unaffected.

Data

Original issue with data examples at ropensci/ruODK#95

@TimonWeitkamp found that about half of all submissions to his form contain invalid geotraces which are "; " separated. The records in question were captured in Mozambique and are stored on Timon's ODK Central instance.
There is no telling which record is valid and which is not, so any error handling needs to run on each record.

The exact form is also deployed to https://sandbox.central.getodk.org/#/projects/14/forms/2/submissions but doesn't (yet) show invalid geotraces among the test records captured by me in Perth and by Timon in the Netherlands.
Maybe Timon can figure out how to reproduce the spaced-out geoshapes and submit some to the sandbox?

Approach

Looking at backend's lib/data/json.js:

const pointStrs = text.split(';'); splits geotraces/shapes into points at the ";".
const [ lat, lon, altitude/*, accuracy*/ ] = str.split(/\s+/g).map(parseFloat); maps parseFloat over each point's coordinates.
On "; " separated geoshapes (assuming this pertains also to geotraces), every point string after the first starts with a space, so str.split(/\s+/g) yields an empty leading element and lat parses to NaN. The guard (((lat == null) || (lon == null)) && (pointStrs.length === 1)) return; then aborts the parsing. This would explain the missing coordinates in OData.
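A minimal reproduction of the failure mode described above, as a sketch rather than the actual json.js implementation (the helper name parseTrace and the bail-out behaviour are assumptions for illustration): a leading space in a point string makes split(/\s+/g) emit an empty first element, so lat parses to NaN and only the points before the first bad one survive.

```javascript
// Sketch of the geotrace/geoshape point parsing, simplified from the
// snippets quoted from lib/data/json.js. parseTrace is a hypothetical
// wrapper, not Central's API.
const parseTrace = (text) => {
  const pointStrs = text.split(';');
  const points = [];
  for (const str of pointStrs) {
    // On "; " separated input, str starts with a space for every point
    // after the first, so split(/\s+/g) yields ['', lat, lon, ...] and
    // the destructured lat becomes parseFloat('') === NaN.
    const [ lat, lon, altitude/*, accuracy*/ ] = str.split(/\s+/g).map(parseFloat);
    if (Number.isNaN(lat) || Number.isNaN(lon)) break; // parsing stops here
    points.push([ lon, lat, altitude ]);
  }
  return points;
};
```

Feeding it a "; " separated trace returns only the first coordinate tuple, matching the OData symptom.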

Possible solution

Change

const [ lat, lon, altitude/*, accuracy*/ ] = str.split(/\s+/g).map(parseFloat);

to

const [ lat, lon, altitude/*, accuracy*/ ] = str.trim().split(/\s+/g).map(parseFloat);

(.trim() has to run on the string before .split(), since arrays have no .trim().)

Worked example: https://jsfiddle.net/h52wc8bu/
A quick benchmark says that native trim() significantly outperforms an equivalent regex, and still comfortably outperforms a targeted regex (dropping only leading whitespace).
Is the performance hit ...madness? (check back with the Tech Lead)
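Applied to a sketch of the parsing loop (hypothetical helper name, simplified from json.js rather than the real implementation), the proposed trim makes "; " separated traces parse in full:

```javascript
// Same parsing shape as the json.js snippets above, with the proposed
// fix: trim each point string before splitting on whitespace.
// parseTraceTrimmed is a hypothetical name for illustration.
const parseTraceTrimmed = (text) => {
  const points = [];
  for (const str of text.split(';')) {
    // trim() removes the stray leading space after ";", so the first
    // split element is the latitude again.
    const [ lat, lon, altitude/*, accuracy*/ ] = str.trim().split(/\s+/g).map(parseFloat);
    if (Number.isNaN(lat) || Number.isNaN(lon)) break;
    points.push([ lon, lat, altitude ]);
  }
  return points;
};
```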

@issa-tseng (Member)

i seem to recall a comment elsewhere about this that indicated spaces should not be there after the ;, and the issue should be fixed at the source? is that still the story here?


TimonWeitkamp commented Nov 16, 2020

I just downloaded the data through ruODK again and compared the polygons with the manual CSV download through the central server. The ruODK data still has the space error.

An example:
ruODK: POLYGON ((32.9974803 -19.0138301 655))
csv: -19.0138301 32.9974803 655.0 8.0; -19.0138301 32.9974803 655.0 8.0; -19.0138301 32.9974803 655.0 8.0; -19.013824 32.9974849 658.0 7.333; -19.013824 32.9974849 658.0 7.333; -19.013824 32.9974849 658.0 7.333; -19.0137943 32.9975057 661.0 5.0; -19.0137943 32


florianm commented Nov 16, 2020

Worked example is here (same form that had broken records in the first place, but with freshly captured test data): https://rpubs.com/florian_mayer/ruodk_issue95 (expand code blocks for some context)

Important note: the bug manifests itself in the OData submission endpoint; ruODK receives only the first coordinate tuple of "buggy" geoshapes/traces with extra whitespace. ruODK retrieves the data without problems through the CSV ZIP and the REST submission_get endpoints.

Regardless of where the bug comes from, and whether a future fix will prevent it for future submissions, I see actual production data being impacted, and the suggested fix to ODK Central (.trim()) would make ODK Central robust against this issue.

@lognaturel mentioned on Slack that the source of the error could be ODK Collect pending some investigations.

@TimonWeitkamp have you been able to reproduce the issue, or gotten any further information from the data collectors sending the broken records?


lognaturel commented Nov 16, 2020

I haven't been able to reproduce but @getodk/testers will spend some time with it hopefully soon. Ideally there wouldn't need to be any change to Central backend but if we can't track down how it happens we'll have to discuss adding support for spaces there as a fallback.

@issa-tseng (Member)

have we made a decision here?

@lognaturel (Member)

We’ve now patched Collect but naturally that doesn’t help with data that has already been gathered. If the impact on performance and the level of effort aren’t too large, it would be best to ignore those spaces.


florianm commented May 6, 2021

@issa-tseng any chance of including the .trim() patch in v1.2? This would allow the impacted users to unlock their broken records via the standard Central upgrade process rather than any manual patching.

Seeing that Collect has been fixed, the .trim() could even be dropped a few versions down the line, once those who sent broken geoshapes in the first place have upgraded to the latest Collect.

@issa-tseng (Member)

found a way to do this without adding an operation which makes me feel better about it.
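One way to absorb the stray spaces without adding a per-point operation, sketched here as an assumption about the shape of the patch rather than the actual commit, is to fold the whitespace handling into the point split itself:

```javascript
// Hypothetical variant of the point split from lib/data/json.js:
// splitting on ";" plus any trailing whitespace means each point
// string already starts with its latitude, with no extra trim() call.
const splitPoints = (text) => text.split(/;\s*/g);
```

With this split, both ";" and "; " separated traces produce clean point strings for the existing parseFloat mapping.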


florianm commented May 7, 2021

Oh that's a way nicer fix than trim() - thanks for the patch!
