Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add validation for entered coordinate values in Geospatial metadata #10142

Merged
merged 9 commits into from
Nov 30, 2023

Conversation

stevenwinship
Copy link
Contributor

What this PR does / why we need it: Prevents the user input of invalid geospatial bounding box coordinates

Which issue(s) this PR closes: #9547

Closes #9547

Special notes for your reviewer:

Suggestions on how to test this: try adding non numbers/null values and values outside the norm for coordinates

Does this PR introduce a user interface change? If mockups are available, please link/include them here: error messages were added

Is there a release notes update needed for this change?: No

Additional documentation:

@coveralls
Copy link

coveralls commented Nov 27, 2023

Coverage Status

coverage: 20.084% (+0.08%) from 20.007%
when pulling 2d6efc5 on 9547-validate-geospatial-metadata
into 410eb45 on develop.

This comment has been minimized.

@landreev landreev self-assigned this Nov 27, 2023
@scolapasta scolapasta added this to the 6.1 milestone Nov 27, 2023

This comment has been minimized.

@landreev
Copy link
Contributor

@stevenwinship
OK, it is definitely doing the right thing now when all 4 values are entered - great!
It may not be doing what we want when some of the 4 boxes are left empty, however. 
As in, this is accepted:
Screen Shot 2023-11-28 at 8 29 10 AM
... but this is not:
Screen Shot 2023-11-28 at 8 30 15 AM

It is because of this line:
final Float returnVal = value != null ? Float.parseFloat(value) : min; // defaults to min if value is missing

it should not default to min. The idea was to default to the other value; i.e., if only X1 is entered, we wanted to assume that X2=X1 (what's being done in the current indexing code, in other words). So, the method verifyBoundingBoxCoordinatesWithinRange(...) would need to allow, and return null, then the top-level method would substitute this null with the value of the opposite end of the box - if entered...

Thinking about it though, it may have been a misguided idea, to continue allowing users to leave some of the values empty. It may be making the whole thing even more confusing than it already is.

So, I suggest that we handle it as follows:

  1. We require that all 4 values are entered; however,
  2. Let's explicitly allow the case where X1=X2 and/or Y1=Y2.

in other words, the following should be allowed:

Screen Shot 2023-11-28 at 8 34 34 AM

i.e., <= instead of <.

Just to be a boring nerd, the rationale is that the above (which is roughly the spot of our office at IQSS) is a fairly valid "box". Because 42.3755 is not really a fixed point, but rather a longitude coordinate defined to +-0.0001 of a degree of precision - and that makes a fairly sizeable, non-empty chunk of surface.

This comment has been minimized.

This comment has been minimized.

@cmbz cmbz added the Size: 30 A percentage of a sprint. 21 hours. (formerly size:33) label Nov 29, 2023
@landreev
Copy link
Contributor

landreev commented Nov 29, 2023

I checked in a quick change to the geospatial block tsv and .properties. Added extended labels for the bbox fields on the edit form as discussed.
Also added this bit of formatting for the display form.
To show this:
Screen Shot 2023-11-29 at 5 42 54 PM
instead of this:
Screen Shot 2023-11-29 at 5 46 34 PM

Also added a 2 sentence release note telling them to update the block.

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

@landreev landreev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving.

Copy link

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:9547-validate-geospatial-metadata
ghcr.io/gdcc/configbaker:9547-validate-geospatial-metadata

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@scolapasta scolapasta assigned landreev and unassigned stevenwinship Nov 30, 2023
@stevenwinship stevenwinship merged commit 5b3a408 into develop Nov 30, 2023
11 checks passed
@landreev
Copy link
Contributor

@stevenwinship and everybody
I retested extra carefully the scenario where there are some preexisting datasets with invalid bounding box values, and you try to reindex everything. Confirmed that these datasets are now getting indexed properly (minus the bad bboxes that are dropped), instead of failing quietly.

We reindexed our prod. catalogue from scratch after the last deployment (6.0), with a new version of Solr installed (older versions were less picky about these bboxes being correctly defined). We discovered in the process that we had a few thousand automatically-generated datasets that had the North/South values swapped, and they all just quietly disappeared from the search catalog.

I'm going to declare this PR merged and done for real now.

@jggautier
Copy link
Contributor

"Prod. catalogue" is Harvard Dataverse right, and the thousand automatically-generated datasets are datasets published in Harvard Dataverse? If so, what does it mean that datasets disappeared from the catalog? Sounds like they're not searchable anymore 😬

@landreev
Copy link
Contributor

@jggautier There was a very long slack thread about the whole disaster, I'll look it up.

"Prod. catalogue" is Harvard Dataverse right, and the thousand automatically-generated datasets are datasets published in Harvard Dataverse? If so, what does it mean that datasets disappeared from the catalog? Sounds like they're not searchable anymore 😬

@jggautier
Copy link
Contributor

Thanks. And sorry; I try to keep up with the conversation. Sometimes things go over my head

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Size: 30 A percentage of a sprint. 21 hours. (formerly size:33)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add validation for entered coordinate values in Geospatial metadata (bounding boxes, etc)
6 participants