Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Production Place repeatable and facetable, make Series repeatable #7810

Closed
wants to merge 12 commits into from

Conversation

philippconzett
Copy link
Contributor

What this PR does / why we need it:
Data contained in a dataset may have been produced at multiple places. Making the field Production Place repeatable will make it possible to reflect this fact in the dataset metadata. Making the field facetable will allow us to customize Dataverse collections more appropriately.

Which issue(s) this PR closes:
None, to my knowledge.

Merge from original repo
Turned allowmultiples facetable for Citation Metadata field Production Place from FALSE into TRUE as we'd like to be able to add multiple production places and to be able to filter dataset based on this metadata field.
Turning Citation Metadata section Series from allowmultiples=FALSE to allowmultiples=TRUE.
@scolapasta
Copy link
Contributor

@philippconzett just to be sure this is also changing series, is that intended?

@philippconzett
Copy link
Contributor Author

@scolapasta Yes, that's right. I tried to split this into two PRs, but I don't know how to do that :-/

@philippconzett
Copy link
Contributor Author

@djbrooke Could we get this PR merged before the next release?

@djbrooke
Copy link
Contributor

djbrooke commented Nov 2, 2021

Hi @philippconzett - there's the change on line 69, to series, that I thought you did not want to include in this PR:

https://github.com/IQSS/dataverse/pull/7810/files#diff-9a87ef9131a9320671e94ee2b06991ee64d8ae37ddad2bff052215c55e2ddb10R69

Let me know if you'd like me to work on this. I can make changes to the PR so that only your desired changes are included.

@djbrooke djbrooke self-assigned this Nov 2, 2021
@philippconzett
Copy link
Contributor Author

Hi @djbrooke. Actually, I wanted Series to be made repeatable. I probably should have created this as a separate PR, but I didn't know how to remove it from this one. If it's OK, please merge the change together with the Production Place change. Thanks!

@jggautier
Copy link
Contributor

jggautier commented Nov 3, 2021

Hi @philippconzett, about Production Place, in your datasets' metadata, is each Production Place associated with the metadata entered in Producer fields? I'm wondering if it would make sense to make Production Place a child field of the Producer fields.

I know this isn't how the two elements, Production Place and Producer, are organized in DDI Codebook 2 and I think that's why in the Citation metadata block they're treated as two separate fields. I'm thinking if reorganizing the fields makes sense, we can later ask the DDI folks to incorporate that change in a new version of their schema.

@djbrooke djbrooke changed the title Make Production Place repeatable and facetable Make Production Place repeatable and facetable, make series repeatable Nov 3, 2021
@djbrooke djbrooke changed the title Make Production Place repeatable and facetable, make series repeatable Make Production Place repeatable and facetable, make Series repeatable Nov 3, 2021
@djbrooke
Copy link
Contributor

djbrooke commented Nov 3, 2021

Thanks @philippconzett, I've updated the title to reflect that you want both of these changes.

Thanks @jggautier for jumping in!

@philippconzett
Copy link
Contributor Author

@djbrooke Thanks for changing the title.
@jggautier Thanks for raising the issue. I'm not sure whether Production Place should be a sub-field of Producer. Production Place has the following description: "The location where the data collection and any other related materials were produced." I think for many scientific fields where data are collected and sometimes also processed offsite/outside campus, Production Place is not identical with the location of the research organization. Also, one dataset may have been produced at different locations. So, I think we should keep Production Place and Producer separate, but still make Production Place repeatable. Thanks.

@jggautier
Copy link
Contributor

jggautier commented Nov 4, 2021

Thanks @philippconzett. That makes sense. This makes me think that "location" in the description doesn't always mean a geographical location. For some datasets, people may need to include the name of an organization.

@djbrooke you mentioned yesterday that you might have questions?

@djbrooke
Copy link
Contributor

djbrooke commented Nov 5, 2021

@jggautier thanks, no questions at the moment! I'll update from dev and add a release note about updating citation.csv.

@djbrooke
Copy link
Contributor

djbrooke commented Nov 5, 2021

OK - all set. I will keep this assigned to me because I'll need to update from develop once we cut 5.8.

@djbrooke djbrooke removed their assignment Nov 5, 2021
@djbrooke djbrooke self-assigned this Nov 17, 2021
@kcondon kcondon removed their assignment Nov 18, 2021
@djbrooke
Copy link
Contributor

Thanks @kcondon

@philippconzett we'll take care of the remaining release notes updates on this side. Thanks!

Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "series" subfields should be updated in schema.xml. We plan to give more guidance about this in #8248.

@@ -359,7 +359,7 @@
<field name="responseRate" type="text_en" multiValued="false" stored="true" indexed="true"/>
<field name="samplingErrorEstimates" type="text_en" multiValued="false" stored="true" indexed="true"/>
<field name="samplingProcedure" type="text_en" multiValued="false" stored="true" indexed="true"/>
<field name="series" type="text_en" multiValued="false" stored="true" indexed="true"/>
<field name="series" type="text_en" multiValued="true" stored="true" indexed="true"/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want the subfields of "series" to be multivalued as well. Please check out this diff:

-    <field name="series" type="text_en" multiValued="false" stored="true" indexed="true"/>
-    <field name="seriesInformation" type="text_en" multiValued="false" stored="true" indexed="true"/>
-    <field name="seriesName" type="text_en" multiValued="false" stored="true" indexed="true"/>
+    <field name="series" type="text_en" multiValued="true" stored="true" indexed="true"/>
+    <field name="seriesInformation" type="text_en" multiValued="true" stored="true" indexed="true"/>
+    <field name="seriesName" type="text_en" multiValued="true" stored="true" indexed="true"/>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the subfields of series in 4509dea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or wouldn't it make more sense to change Series to a complex field, such as Author? I guess this would mean:

series
allowmultiples: TRUE

seriesName
allowmultiples: FALSE

seriesInformation
allowmultiples: FALSE

@pdurbin pdurbin removed their assignment Nov 19, 2021
@pdurbin
Copy link
Member

pdurbin commented Nov 19, 2021

When I load the TSV in this pull request and do the manual process I used to do I get a bigger "diff" below than I expect.

Here's the resulting schema.xml (renamed so I can upload it): schema.xml.txt

The diff:

diff --git a/conf/solr/8.8.1/schema.xml b/conf/solr/8.8.1/schema.xml
index 1718962bfb..0a4e6f3d96 100644
--- a/conf/solr/8.8.1/schema.xml
+++ b/conf/solr/8.8.1/schema.xml
@@ -342,7 +342,7 @@
     <field name="producerName" type="text_en" multiValued="true" stored="true" indexed="true"/>
     <field name="producerURL" type="text_en" multiValued="true" stored="true" indexed="true"/>
     <field name="productionDate" type="text_en" multiValued="false" stored="true" indexed="true"/>
-    <field name="productionPlace" type="text_en" multiValued="false" stored="true" indexed="true"/>
+    <field name="productionPlace" type="text_en" multiValued="true" stored="true" indexed="true"/>
     <field name="publication" type="text_en" multiValued="true" stored="true" indexed="true"/>
     <field name="publicationCitation" type="text_en" multiValued="true" stored="true" indexed="true"/>
     <field name="publicationIDNumber" type="text_en" multiValued="true" stored="true" indexed="true"/>
@@ -359,9 +359,9 @@
     <field name="responseRate" type="text_en" multiValued="false" stored="true" indexed="true"/>
     <field name="samplingErrorEstimates" type="text_en" multiValued="false" stored="true" indexed="true"/>
     <field name="samplingProcedure" type="text_en" multiValued="false" stored="true" indexed="true"/>
-    <field name="series" type="text_en" multiValued="false" stored="true" indexed="true"/>
-    <field name="seriesInformation" type="text_en" multiValued="false" stored="true" indexed="true"/>
-    <field name="seriesName" type="text_en" multiValued="false" stored="true" indexed="true"/>
+    <field name="series" type="text_en" multiValued="true" stored="true" indexed="true"/>
+    <field name="seriesInformation" type="text_en" multiValued="true" stored="true" indexed="true"/>
+    <field name="seriesName" type="text_en" multiValued="true" stored="true" indexed="true"/>
     <field name="socialScienceNotes" type="text_en" multiValued="false" stored="true" indexed="true"/>
     <field name="socialScienceNotesSubject" type="text_en" multiValued="false" stored="true" indexed="true"/>
     <field name="socialScienceNotesText" type="text_en" multiValued="false" stored="true" indexed="true"/>
@@ -376,13 +376,13 @@
     <field name="studyAssayOrganism" type="text_en" multiValued="true" stored="true" indexed="true"/>
     <field name="studyAssayOtherMeasurmentType" type="text_en" multiValued="true" stored="true" indexed="true"/>
     <field name="studyAssayOtherOrganism" type="text_en" multiValued="true" stored="true" indexed="true"/>
-    <field name="studyAssayPlatform" type="text_en" multiValued="true" stored="true" indexed="true"/>
     <field name="studyAssayOtherPlatform" type="text_en" multiValued="true" stored="true" indexed="true"/>
-    <field name="studyAssayTechnologyType" type="text_en" multiValued="true" stored="true" indexed="true"/>
     <field name="studyAssayOtherTechnologyType" type="text_en" multiValued="true" stored="true" indexed="true"/>
+    <field name="studyAssayPlatform" type="text_en" multiValued="true" stored="true" indexed="true"/>
+    <field name="studyAssayTechnologyType" type="text_en" multiValued="true" stored="true" indexed="true"/>
     <field name="studyDesignType" type="text_en" multiValued="true" stored="true" indexed="true"/>
-    <field name="studyOtherDesignType" type="text_en" multiValued="true" stored="true" indexed="true"/>
     <field name="studyFactorType" type="text_en" multiValued="true" stored="true" indexed="true"/>
+    <field name="studyOtherDesignType" type="text_en" multiValued="true" stored="true" indexed="true"/>
     <field name="studyOtherFactorType" type="text_en" multiValued="true" stored="true" indexed="true"/>
     <field name="subject" type="text_en" multiValued="true" stored="true" indexed="true"/>
     <field name="subtitle" type="text_en" multiValued="false" stored="true" indexed="true"/>
@@ -613,13 +613,13 @@
     <copyField source="studyAssayOrganism" dest="_text_" maxChars="3000"/>
     <copyField source="studyAssayOtherMeasurmentType" dest="_text_" maxChars="3000"/>
     <copyField source="studyAssayOtherOrganism" dest="_text_" maxChars="3000"/>
-    <copyField source="studyAssayPlatform" dest="_text_" maxChars="3000"/>
     <copyField source="studyAssayOtherPlatform" dest="_text_" maxChars="3000"/>
-    <copyField source="studyAssayTechnologyType" dest="_text_" maxChars="3000"/>
     <copyField source="studyAssayOtherTechnologyType" dest="_text_" maxChars="3000"/>
+    <copyField source="studyAssayPlatform" dest="_text_" maxChars="3000"/>
+    <copyField source="studyAssayTechnologyType" dest="_text_" maxChars="3000"/>
     <copyField source="studyDesignType" dest="_text_" maxChars="3000"/>
-    <copyField source="studyOtherDesignType" dest="_text_" maxChars="3000"/>
     <copyField source="studyFactorType" dest="_text_" maxChars="3000"/>
+    <copyField source="studyOtherDesignType" dest="_text_" maxChars="3000"/>
     <copyField source="studyOtherFactorType" dest="_text_" maxChars="3000"/>
     <copyField source="subject" dest="_text_" maxChars="3000"/>
     <copyField source="subtitle" dest="_text_" maxChars="3000"/>

Updating from the API output picked up the missing subfields of "series"
which is a compound field.
@pdurbin pdurbin self-assigned this Nov 22, 2021
@pdurbin
Copy link
Member

pdurbin commented Nov 22, 2021

In 4509dea I updated schema.xml because the subfields of "series" had not been made multivalued. In #8248 we plan to explain the recommended way for developers to update schema.xml when making changes to metadata blocks we ship with the app, such as citation. In short, you should use the output from http://localhost:8080/api/admin/index/solr/schema

I've been poking a Solr and search-related stuff on this pull request. Somewhat surprisingly, it seems like a full reindex may not be required. I was able to create a dataset under "develop" and still have it's newly multivalued fields show up as facets once I deployed the new code.

Once I had the new code deployed, however, I noticed some bugs. Here's what I posted in Slack:

DDI export fails (and maybe other exports?) if you specify multiple production places.

“More than one series title found” is logged over and more if you specify more than series. It seems to be used in the citation.

Basically, there is hard-coded logic around these fields in the Java code. Changing the TSV is not enough.

A rule of thumb might be… if you’re messing with citation fields, go in DatasetFieldConstant and do a “find usages” for the fields you’re changing and try to figure out the impact.

@djbrooke
Copy link
Contributor

Hi @philippconzett, as you can see in the note from @pdurbin, during review and testing we realized that these changes will take more work to implement than just updates to the .tsv and accompanying release notes and Solr schema changes. I'll need to look at our other work in progress to see if this is something that we can prioritize, but I don't believe we'll be able to work on the necessary code changes here before the end of the year.

@pdurbin
Copy link
Member

pdurbin commented Mar 16, 2022

I just wanted to mention that we recently gained some experience changing allowMultiples from false to true with "Collection Mode" (collectionMode) in PR #8473. One thing to look out for is to make sure DDI import continues to work. See PR #8491 and PR #8483.

@jggautier
Copy link
Contributor

jggautier commented Mar 17, 2022

This applies the change at the installation level, so it's not possible for managers of collections within a Dataverse installation to control the number of values allowed for a field. Are there collection managers within Dataverse installations who want to make sure that depositors can enter only one Production Place?

Should the same question be investigated for other fields we consider making repeatable, including Collection Mode (although that field's already been made repeatable), before solutions are proposed and implemented?

@pdurbin
Copy link
Member

pdurbin commented Oct 5, 2022

@philippconzett heads up that there are merge conflicts in this PR. Do you need a hand resolving them?

@pdurbin pdurbin assigned philippconzett and unassigned djbrooke Oct 5, 2022
@mreekie mreekie added the bk2211 label Nov 1, 2022
@pdurbin
Copy link
Member

pdurbin commented Jan 4, 2023

@philippconzett and I just had a call about this PR and we're closing it in favor of these:

@pdurbin pdurbin closed this Jan 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants