Datasets labeled with "Incomplete metadata" due to invalid geospatial bounding box quietly disappear in search results #10526

Closed
kbrueckmann opened this issue Apr 24, 2024 · 4 comments
Labels
Type: Bug a defect

Comments

@kbrueckmann

What steps does it take to reproduce the issue?
For us, only previously published datasets are affected. An example can be found here:
Dataset: https://heidata.uni-heidelberg.de/dataset.xhtml?persistentId=doi:10.11588/data/10000
Dataverse (the dataset is missing in the listed child datasets): https://heidata.uni-heidelberg.de/dataverse/iwrgraphics
The dataset is still in the database (and can be reached via API), but it disappears from the search results and can only be found by using the direct link/doi.

When does this issue occur?
We encountered the problem with datasets that use the bounding box in the geospatial metadata section. This might be connected to the changes in #10142. When editing the metadata, the following message is shown: "Geographic Bounding Box has invalid coordinates. East must be greater than West and North must be greater than South. Missing values are NOT allowed." This is easy to correct, and afterwards the dataset reappears in the search results.
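The rule quoted in that error message can be written as a small local check. The sketch below is a hypothetical helper (`bbox_problems` is not part of Dataverse) that implements only what the message states: all four coordinates present, East strictly greater than West, and North strictly greater than South. It is not Dataverse's actual validator, which may check more than this (for example, coordinate ranges). Values are taken as strings because that is how metadata values usually arrive; treating non-numeric text as invalid is an assumption.

```python
from typing import Optional


def bbox_problems(west: Optional[str], east: Optional[str],
                  north: Optional[str], south: Optional[str]) -> list[str]:
    """Return the problems found in one bounding box; an empty list means valid.

    Implements only the rule quoted in the error message:
    East > West, North > South, and no missing values.
    """
    raw = {"west": west, "east": east, "north": north, "south": south}
    problems: list[str] = []
    values: dict[str, float] = {}
    for name, text in raw.items():
        # A missing or blank value is reported and skipped.
        if text is None or str(text).strip() == "":
            problems.append(f"{name} is missing")
            continue
        try:
            values[name] = float(text)
        except ValueError:
            # Assumption: non-numeric text is treated as invalid too.
            problems.append(f"{name} is not a number: {text!r}")
    # Ordering checks run only when both sides parsed successfully.
    if "east" in values and "west" in values and not values["east"] > values["west"]:
        problems.append("east must be greater than west")
    if "north" in values and "south" in values and not values["north"] > values["south"]:
        problems.append("north must be greater than south")
    return problems
```

For example, a box with swapped East/West values yields `["east must be greater than west"]`, while a correct box yields an empty list.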

However, we do not know how to find the datasets affected in order to be able to correct them. Is there any feature for this?

To whom does it occur (all users, curators, superusers)?
All users

What did you expect to happen?
The datasets should still be visible in the search results even if parts of their metadata are now invalid. Ideally, we would also receive a notice that they are invalid. We are looking for a way to find all affected datasets.
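Until such a feature exists, one possible workaround is to fetch each dataset's JSON through the native API (for example `GET /api/datasets/:persistentId/?persistentId=doi:...`) and test its bounding boxes locally. The sketch below assumes the usual response layout (`data` → `latestVersion` → `metadataBlocks` → `geospatial` → `fields`) and the child field names listed in `CHILDREN`; both are assumptions to verify against your installation's geospatial block. The function names are hypothetical.

```python
from typing import Any, Iterator

# Assumed field names from the geospatial metadata block; verify against your
# installation. To our knowledge "northLongitude"/"southLongitude" are legacy
# internal names for the north/south *latitudes*.
BOX_FIELD = "geographicBoundingBox"
CHILDREN = {
    "west": "westLongitude",
    "east": "eastLongitude",
    "north": "northLongitude",
    "south": "southLongitude",
}


def bounding_boxes(dataset_json: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield each bounding box of the latest version as {"west": ..., "east": ..., ...}.

    Missing child fields come back as None.
    """
    blocks = (dataset_json.get("data", {})
              .get("latestVersion", {})
              .get("metadataBlocks", {}))
    for field in blocks.get("geospatial", {}).get("fields", []):
        if field.get("typeName") != BOX_FIELD:
            continue
        entries = field.get("value", [])
        if isinstance(entries, dict):  # tolerate a non-multiple compound value
            entries = [entries]
        for entry in entries:
            yield {side: entry.get(name, {}).get("value")
                   for side, name in CHILDREN.items()}


def is_valid(box: dict[str, Any]) -> bool:
    """East > West, North > South, nothing missing (rule from the error message)."""
    try:
        w, e, n, s = (float(box[k]) for k in ("west", "east", "north", "south"))
    except (TypeError, ValueError):
        return False  # None (missing) or non-numeric value
    return e > w and n > s


def invalid_boxes(dataset_json: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the bounding boxes in this dataset that break the rule."""
    return [box for box in bounding_boxes(dataset_json) if not is_valid(box)]
```

Because the affected datasets no longer show up in search, the list of persistent IDs to check would have to come from somewhere other than the search index. Note also that `latestVersion` may be a draft rather than the published version.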

Which version of Dataverse are you using?
6.1

Any related open or closed issues to this bug report?
The question was previously asked by @lmaylein in Issue #10116 and opened here at @pdurbin's request.
#10116 #10142

@kbrueckmann kbrueckmann added the Type: Bug a defect label Apr 24, 2024
@qqmyers
Member

qqmyers commented Apr 24, 2024

FWIW: I don't know of any way to find specifically which datasets have this geo box issue, but the https://guides.dataverse.org/en/latest/admin/solr-search-index.html#index-and-database-consistency status API call writes a list of any datasets that didn't get indexed to the log. That might help if the datasets are not getting indexed at all, as opposed to just being marked as having incomplete metadata or not having the geo box indexed.

@lmaylein
Contributor

@qqmyers Strangely enough, the affected datasets that we know of were not in the index check list. However, we will now correct the datasets we know and then completely re-index everything.

@pdurbin
Member

pdurbin commented Apr 24, 2024

It's really a bummer that it's not easy to find the affected datasets. If you do a complete reindex, are these problematic datasets logged? (It may be better to try this on a test server first.)

I've long dreamed of an API that will exercise our validation rules on a dataset (we use Bean Validation for this). New rules were added in #10142 which is why data that was treated as perfectly fine in 6.0 is now treated as invalid (for good reasons). Obviously, we need a way to know when old data no longer complies with new rules. 😅

@lmaylein
Contributor

lmaylein commented May 2, 2024

No errors were logged during reindexing. I assume that after correcting the metadata, all datasets will now be displayed again.
