Harvest: DDI import appears not to include all fields exported as DDI. #3297
Comments
Needs some further verification about whether or not this import is in scope for Harvesting. I'll take a look through the documentation and see if I can find this.
Looking at "DocumentationAndAccessToSources" from above. From http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/field_level_documentation_files/schemas/codebook_xsd/elements/srcDocu.html:
Removing from 4.5. We'll address this in the near future when our new metadata librarian joins the team. At that point we'll work with her/him to gather more information regarding the importance of including these fields.
If we'd like people to be able to search for harvested datasets to the same extent that they can search for local datasets (and I think we should), then the same searchable metadata fields should be available for both harvested and non-harvested datasets. For example, people who know the title of an article and want to know whether there are any associated datasets can search on the related publication citations of local (non-harvested) datasets, so they should be able to do the same for harvested datasets. This isn't possible if all or some harvested related publication metadata are not searchable (as @kcondon reported). Maybe a first pass could be to identify the metadata fields we think are important for searching by looking at what's been added to the advanced search field, plus any weighting added in Solr so that searches favor certain fields over others, and then make sure those fields are being indexed for harvested metadata.
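A first pass like the one suggested here could be sketched as a simple set comparison. This is only an illustration: the field names and the `missing_search_fields` helper are hypothetical, and the real lists would have to come from the advanced search form, the Solr configuration, and the index entry of a harvested dataset.

```python
# Hypothetical field names, for illustration only. The real list would be
# collected from the advanced search form and the Solr field weighting.
SEARCH_FIELDS = {"title", "authorName", "publicationCitation", "keywordValue"}


def missing_search_fields(search_fields, indexed_fields):
    """Return the search fields that are absent from a record's indexed fields."""
    return sorted(set(search_fields) - set(indexed_fields))


# Fields assumed to be indexed for one harvested dataset (made-up example).
harvested_indexed = {"title", "authorName"}

missing = missing_search_fields(SEARCH_FIELDS, harvested_indexed)
print(missing)
```

Running the same comparison against a local (non-harvested) copy of the dataset would show whether the gap is specific to harvested metadata.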
For this issue and in general, I think it would be helpful to record how Dataverse is mapping DDI Codebook fields to Dataverse fields when dataset metadata is imported. Makes sense to me to do this in the metadata crosswalk.
Can someone confirm by looking at the code that in both cases (harvesting over OAI-PMH and using the API endpoint) Dataverse is mapping DDI to Dataverse in the same way? Is it using the same code to map the fields? If we know that, it'll make documenting how DDI fields are mapped to Dataverse fields on import a lot easier.
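To make the idea of a recorded crosswalk concrete, here is a minimal sketch of a single mapping table that both import paths could share. It is not how Dataverse actually implements the import: the `CROSSWALK` table, the `map_ddi` function, the sample record, and the two Dataverse field names on the right-hand side are assumptions chosen for illustration. Only the DDI Codebook 2.5 namespace and element names are taken from the schema.

```python
import xml.etree.ElementTree as ET

# DDI Codebook 2.5 XML namespace.
NS = {"ddi": "ddi:codebook:2_5"}

# Hypothetical crosswalk: DDI element path (relative to codeBook) -> Dataverse
# field name. The field names are assumptions, not a confirmed mapping.
CROSSWALK = {
    "ddi:stdyDscr/ddi:citation/ddi:titlStmt/ddi:titl": "title",
    "ddi:stdyDscr/ddi:method/ddi:dataColl/ddi:sources/ddi:srcDocu": "accessToSources",
}


def map_ddi(xml_text):
    """Apply the crosswalk to one DDI Codebook document and return field -> values."""
    root = ET.fromstring(xml_text)
    result = {}
    for path, field in CROSSWALK.items():
        values = [
            el.text.strip()
            for el in root.findall(path, NS)
            if el.text and el.text.strip()
        ]
        if values:
            result[field] = values
    return result


# Made-up sample record for illustration.
SAMPLE = """<codeBook xmlns="ddi:codebook:2_5">
  <stdyDscr>
    <citation><titlStmt><titl>Example Study</titl></titlStmt></citation>
    <method><dataColl><sources>
      <srcDocu>Available in the archive reading room.</srcDocu>
    </sources></dataColl></method>
  </stdyDscr>
</codeBook>"""

mapped = map_ddi(SAMPLE)
print(mapped)
```

If harvesting over OAI-PMH and the API endpoint both went through one table like this, documenting the crosswalk would amount to publishing that table.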
@landreev wrote in #4964 that "the only other harvesting format we (theoretically) recognize from a non-Dataverse OAI archive is DDI; in practice, it's extremely unlikely that we'll be able to parse a DDI that's produced by anything other than a Dataverse." This makes me wonder how much we've prioritized harvesting DDI metadata in general. It makes sense that Dataverse would prioritize support for harvesting dataverse_json metadata from Dataverse repositories, at least for Dataverse 4+ repositories. After lots of testing and recent conversations about harvesting, here are the configurations I'm following when creating harvesting clients in Harvard Dataverse:
Of course, harvesting DDI metadata from non-Dataverse repositories would be preferred over Dublin Core, but if DC is the best-supported option for now, there seem to be only two cases where we would recommend that a Dataverse 4+ repository harvest DDI metadata (over OAI-PMH): when harvesting from Dataverse 2-3.x repositories and when harvesting from ICPSR. @kcondon, when you opened this ticket, were your findings the result of harvesting from Dataverse 2-3.x repositories?
Just to reiterate what I said in a comment in #6650: I believe this issue can be considered a duplicate of the above.
@jggautier
I didn't mean we shouldn't try harvesting DDI; we always want to choose more metadata-rich formats whenever possible. Note that what I said above was in the context of harvesting from a "generic archive", defined as a repository that we don't know anything about, aside from the fact that it is OAI-compliant. That is different from harvesting DDI from the above-mentioned ICPSR, Roper and Nesstar archives, for which we do have pre-defined redirect rules. (This is what appears in the "archive type" pull-down menu on the Harvesting Clients page; "Nesstar" isn't shown there because it's not supported for active harvesting, but it is defined for displaying legacy Nesstar records.) Hope this makes sense. Otherwise, please let me know.
That makes sense. Thanks @landreev! When we spoke today we agreed that this issue can be closed, so I'm closing it. #6650 addresses all of the fields mentioned in this issue's original comment. Some geospatial metadata won't be mapped on import, possibly because Codebook has no DDI elements for them (e.g. State, City). But I think that could be its own issue (that is, more research or confirming with the DDI technical committee that there's really no way to express geospatial metadata like State and City, or asking the DDI folks to consider adding a way, since they're actually in the middle of updates to Codebook).
Searching on fields in the DDI export of a dataset mostly works, but some fields are apparently not imported.
Fields that appear in the DDI export but are not searchable or not imported include: