-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow "; " separated Geotrace/shape to prevent loss of all but first coordinate in OData export #300
Comments
i seem to recall a comment elsewhere about this that indicated spaces should not be there after the |
I just downloaded the data through ruODK again and compared the polygons with the manual CSV download through the central server. The ruODK data still has the space error. An example: |
Worked example is here (same form that had broken records in the first place, but with freshly captured test data): https://rpubs.com/florian_mayer/ruodk_issue95 (expand code blocks for some context) Important note: the bug manifests itself in the OData submission endpoint, ruODK receives only the first coordinate tuple of "buggy" geoshapes/traces with extra whitespaces. ruODK exports the data without problem through the CSV ZIP and the REST submission_get endpoints. Regardless of where the bug comes from, and whether a future fix will prevent it for future submissions, I see actual production data being impacted, and the suggested fix to ODK Central (.trim()) would make ODK Central robust against this issue. @lognaturel mentioned on Slack that the source of the error could be ODK Collect pending some investigations. @TimonWeitkamp have you been able to reproduce the issue, or gotten any further information from the data collectors sending the broken records? |
I haven't been able to reproduce but @getodk/testers will spend some time with it hopefully soon. Ideally there wouldn't need to be any change to Central backend but if we can't track down how it happens we'll have to discuss adding support for spaces there as a fallback. |
have we made a decision here? |
We’ve now patched Collect but naturally that doesn’t help with data that has already been gathered. If the impact on performance and the level of effort aren’t too large, it would be best to ignore those spaces. |
@issa-tseng any chance of including the Seeing Collect has been fixed, the |
found a way to do this without adding an operation which makes me feel better about it. |
Oh that's a way nicer fix than |
Problem
OData exports only the first coordinate of "; " separated geoshapes (and probably geotraces to).
The RESTful API and the CSV/ZIP export are unaffected.
Data
Original issue with data examples at ropensci/ruODK#95
@TimonWeitkamp found that about half of all submissions to his form contain invalid geotraces which are "; " separated. The records in question were captured in Mozambique and are stored on Timon's ODK Central instance.
There is no telling which record is valid and which is not, so any error handling needs to run on each record.
The exact form is also deployed to https://sandbox.central.getodk.org/#/projects/14/forms/2/submissions but doesn't (yet) show invalid geotraces among the test records captured by me in Perth and the Netherlands by Timon.
Maybe Timon can figure out how to reproduce the spaced out geoshapes and submit some to the sandbox?
Approach
Looking at backend's lib/data/json.js:
central-backend/lib/data/json.js
Line 247 in e9ffd2c
const pointStrs = text.split(';');
splits geotraces/shapes into points at the ";".const [ lat, lon, altitude/*, accuracy*/ ] = str.split(/\s+/g).map(parseFloat);
maps parseFloat over the coordinates.On "; " separated geoshapes (assume this pertains also to geotraces)
(((lat == null) || (lon == null)) && (pointStrs.length === 1)) return;
yields anull
for lat and aborts the parsing. This would explain the missing coordinates in OData.Possible solution
Change
central-backend/lib/data/json.js
Line 255 in e9ffd2c
to
Worked example: https://jsfiddle.net/h52wc8bu/
A quick benchmark says that native
trim()
outperforms an equivalent regex significantly, and outperforms a targeted regex (just dropping leading whitespace) still comfortably.Is the performance hit ...madness? (check back with the Tech Lead)
The text was updated successfully, but these errors were encountered: