-
Notifications
You must be signed in to change notification settings - Fork 493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Production Place repeatable and facetable, make Series repeatable #7810
Conversation
Merge from original repo
Turned allowmultiples facetable for Citation Metadata field Production Place from FALSE into TRUE as we'd like to be able to add multiple production places and to be able to filter dataset based on this metadata field.
Turning Citation Metadata section Series from allowmultiples=FALSE to allowmultiples=TRUE.
@philippconzett just to be sure this is also changing series, is that intended? |
@scolapasta Yes, that's right. I tried to split this into two PRs, but I don't know how to do that :-/ |
@djbrooke Could we get this PR merged before the next release? |
Hi @philippconzett - there's the change on line 69, to series, that I thought you did not want to include in this PR: Let me know if you'd like me to work on this. I can make changes to the PR so that only your desired changes are included. |
Hi @djbrooke. Actually, I wanted Series to be made repeatable. I probably should have created this as a separate PR, but I didn't know how to remove it from this one. If it's OK, please merge the change together with the Production Place change. Thanks! |
Hi @philippconzett, about Production Place, in your datasets' metadata, is each Production Place associated with the metadata entered in Producer fields? I'm wondering if it would make sense to make Production Place a child field of the Producer fields. I know this isn't how the two elements, Production Place and Producer, are organized in DDI Codebook 2 and I think that's why in the Citation metadata block they're treated as two separate fields. I'm thinking if reorganizing the fields makes sense, we can later ask the DDI folks to incorporate that change in a new version of their schema. |
Thanks @philippconzett, I've updated the title to reflect that you want both of these changes. Thanks @jggautier for jumping in! |
@djbrooke Thanks for changing the title. |
Thanks @philippconzett. That makes sense. This makes me think that "location" in the description doesn't always mean a geographical location. For some datasets, people may need to include the name of an organization. @djbrooke you mentioned yesterday that you might have questions? |
@jggautier thanks, no questions at the moment! I'll update from dev and add a release note about updating citation.csv. |
OK - all set. I will keep this assigned to me because I'll need to update from develop once we cut 5.8. |
Thanks @kcondon @philippconzett we'll take care of the remaining release notes updates on this side. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "series" subfields should be updated in schema.xml. We plan to give more guidance about this in #8248.
@@ -359,7 +359,7 @@ | |||
<field name="responseRate" type="text_en" multiValued="false" stored="true" indexed="true"/> | |||
<field name="samplingErrorEstimates" type="text_en" multiValued="false" stored="true" indexed="true"/> | |||
<field name="samplingProcedure" type="text_en" multiValued="false" stored="true" indexed="true"/> | |||
<field name="series" type="text_en" multiValued="false" stored="true" indexed="true"/> | |||
<field name="series" type="text_en" multiValued="true" stored="true" indexed="true"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want the subfields of "series" to be multivalued as well. Please check out this diff:
- <field name="series" type="text_en" multiValued="false" stored="true" indexed="true"/>
- <field name="seriesInformation" type="text_en" multiValued="false" stored="true" indexed="true"/>
- <field name="seriesName" type="text_en" multiValued="false" stored="true" indexed="true"/>
+ <field name="series" type="text_en" multiValued="true" stored="true" indexed="true"/>
+ <field name="seriesInformation" type="text_en" multiValued="true" stored="true" indexed="true"/>
+ <field name="seriesName" type="text_en" multiValued="true" stored="true" indexed="true"/>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the subfields of series in 4509dea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or wouldn't it make more sense to change Series to a complex field, such as Author? I guess this would mean:
series
allowmultiples: TRUE
seriesName
allowmultiples: FALSE
seriesInformation
allowmultiples: FALSE
When I load the TSV in this pull request and do the manual process I used to do I get a bigger "diff" below than I expect. Here's the resulting schema.xml (renamed so I can upload it): schema.xml.txt The diff:
|
Updating from the API output picked up the missing subfields of "series" which is a compound field.
In 4509dea I updated schema.xml because the subfields of "series" had not been made multivalued. In #8248 we plan to explain the recommended way for developers to update schema.xml when making changes to metadata blocks we ship with the app, such as citation. In short, you should use the output from http://localhost:8080/api/admin/index/solr/schema I've been poking a Solr and search-related stuff on this pull request. Somewhat surprisingly, it seems like a full reindex may not be required. I was able to create a dataset under "develop" and still have it's newly multivalued fields show up as facets once I deployed the new code. Once I had the new code deployed, however, I noticed some bugs. Here's what I posted in Slack: DDI export fails (and maybe other exports?) if you specify multiple production places. “More than one series title found” is logged over and more if you specify more than series. It seems to be used in the citation. Basically, there is hard-coded logic around these fields in the Java code. Changing the TSV is not enough. A rule of thumb might be… if you’re messing with citation fields, go in DatasetFieldConstant and do a “find usages” for the fields you’re changing and try to figure out the impact. |
Hi @philippconzett, as you can see in the note from @pdurbin, during review and testing we realized that these changes will take more work to implement than just updates to the .tsv and accompanying release notes and Solr schema changes. I'll need to look at our other work in progress to see if this is something that we can prioritize, but I don't believe we'll be able to work on the necessary code changes here before the end of the year. |
This applies the change at the installation level, so it's not possible for managers of collections within a Dataverse installation to control the number of values allowed for a field. Are there collection managers within Dataverse installations who want to make sure that depositors can enter only one Production Place? Should the same question be investigated for other fields we consider making repeatable, including Collection Mode (although that field's already been made repeatable), before solutions are proposed and implemented? |
@philippconzett heads up that there are merge conflicts in this PR. Do you need a hand resolving them? |
@philippconzett and I just had a call about this PR and we're closing it in favor of these: |
What this PR does / why we need it:
Data contained in a dataset may have been produced at multiple places. Making the field Production Place repeatable will make it possible to reflect this fact in the dataset metadata. Making the field facetable will allow us to customize Dataverse collections more appropriately.
Which issue(s) this PR closes:
None, to my knowledge.