dataset types (software, workflow, etc.) - initial support #10694

Merged: 56 commits, Sep 6, 2024

pdurbin
Member

@pdurbin pdurbin commented Jul 17, 2024

What this PR does / why we need it:

This PR provides initial support for dataset types (part of IQSS/dataverse-pm#307):

  • a place for dataset types in the database
  • ability to use the types via API on create
  • send to DataCite "Software" or "Workflow"
  • various APIs (see how to test)
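
As a rough illustration of the "use the types via API on create" item, the sketch below adds a type to a dataset-creation payload before it is sent. It assumes the type is passed as a top-level `datasetType` key next to `datasetVersion`; that key placement and the helper name `with_dataset_type` are assumptions for illustration, so the linked docs remain the authoritative reference.

```python
import json


def with_dataset_type(dataset_json, dataset_type="software"):
    """Return a copy of a dataset-creation payload with a dataset type set.

    Assumption: the type is a top-level "datasetType" key in the JSON.
    """
    payload = dict(dataset_json)  # shallow copy so the caller's dict is untouched
    payload["datasetType"] = dataset_type
    return payload


# Placeholder payload; a real one would carry full metadata blocks.
base = {"datasetVersion": {"metadataBlocks": {}}}
payload = with_dataset_type(base, "software")
print(json.dumps(payload, indent=2))
```

Leaving the key out entirely would, per the discussion in this PR, result in the default type of "dataset".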

A good entry point for the docs is https://dataverse-guide--10694.org.readthedocs.build/en/10694/user/dataset-management.html#dataset-types

This pull request also allows the status of feature flags to be listed via API. See https://dataverse-guide--10694.org.readthedocs.build/en/10694/api/native-api.html#list-all-feature-flags

Which issue(s) this PR closes:

Special notes for your reviewer:

I followed Proposal: Supporting Multiple Dataset Types in Dataverse as best I could, but I was also influenced by discussions at tech hours.

The only failing test is Shellspec, but it should be fixed by #10682.

Suggestions on how to test this:

Make sure Jenkins is passing.

As of this writing (2024-07-31) Jenkins does not have the dataset types feature flag on. Turning this on would test the new feature. (The feature flag was removed.)

Test all APIs:

Test publishing to DataCite to ensure that the correct type is sent.
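
When checking what "the correct type" means, it can help to think of a small lookup from the dataset type name to DataCite's `resourceTypeGeneral` value. This is a minimal sketch, not the code in this PR: the names `RESOURCE_TYPE_GENERAL` and `resource_type_general` are hypothetical, and the fallback to "Dataset" for any other type is an assumption.

```python
# Hypothetical lookup table; "Dataset", "Software" and "Workflow" are
# values from DataCite's resourceTypeGeneral controlled vocabulary.
RESOURCE_TYPE_GENERAL = {
    "dataset": "Dataset",
    "software": "Software",
    "workflow": "Workflow",
}


def resource_type_general(dataset_type_name):
    """Map a dataset type name to the value expected in the DataCite record.

    Assumption: unknown types fall back to "Dataset".
    """
    return RESOURCE_TYPE_GENERAL.get(dataset_type_name.lower(), "Dataset")


for name in ("dataset", "software", "workflow"):
    print(name, "->", resource_type_general(name))
```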

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Yes, there is a new "Dataset Type" facet:

Screenshot 2024-07-31 at 2 56 34 PM

Also, when you publish a dataset of type software to DataCite, it will show as such in Fabrica. In the example below, look for "Software" next to the name (pyDataverse):

Screenshot 2024-07-29 at 5 16 31 PM

Is there a release notes update needed for this change?:

Yes, included.

Additional documentation:

Included, see especially:

@coveralls

coveralls commented Jul 17, 2024

Coverage Status

coverage: 20.735% (-0.02%) from 20.755%
when pulling b7b9b7d on 10517-dataset-types
into 8fd8c18 on develop.

@cmbz

cmbz commented Jul 18, 2024

2024/07/18 - 6.4 proposal request from @poikilotherm

@pdurbin pdurbin added the Champion: pdurbin Championed by @pdurbin for inclusion in the next release label Jul 19, 2024

@pdurbin
Member Author

pdurbin commented Jul 23, 2024

At tech hours today, I gave a demo of dataset types as of cfac9dc. Here's a screenshot where you can see a new facet called "Dataset Type" and a couple examples of "software" and "workflow":

Screenshot 2024-07-23 at 4 09 21 PM

I pointed out that I've already completed a good number of the tasks in the issue at #10517, with the exception of this one:

  • Send an appropriate type to DataCite (maybe also other PID providers). Done in 8593d32.

However, sending different info to DataCite should probably wait until we merge the following pull request by @qqmyers, to avoid merge conflicts and extra effort:

In addition, based on feedback at tech hours, I plan to focus on the following:

  • Remove the ability to import software, etc. using DDI. That way we stop abusing the dataKind field. Done in 3aab5c0.
  • Make the baseType field non-nullable and create a Flyway script that populates it with "dataset" for existing datasets. Done in c8adf25.
  • All new datasets get a type of "dataset" unless something else is specified at create time. Done in c8adf25.
  • Remove the _s suffix from datasetType_s and define the field in our Solr schema file.
  • Make sure we can show non-English values in the UI for "software" and "workflow". Done in 067d416.
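
The defaulting rule in the list above (new datasets get "dataset" unless another type is given at create time) can be sketched as follows. This is only an illustration under stated assumptions: the `datasetType` key, the `KNOWN_TYPES` set, and the choice to reject unknown types with an error are all hypothetical and not taken from the PR's code.

```python
DEFAULT_DATASET_TYPE = "dataset"

# Hypothetical set of types; in the PR these live in the database.
KNOWN_TYPES = frozenset({"dataset", "software", "workflow"})


def resolve_dataset_type(create_json, known_types=KNOWN_TYPES):
    """Pick the type for a new dataset from the creation JSON.

    Falls back to the default when no type is specified.
    Assumption: an unrecognized type is rejected rather than ignored.
    """
    requested = create_json.get("datasetType")
    if requested is None:
        return DEFAULT_DATASET_TYPE
    if requested not in known_types:
        raise ValueError(f"unknown dataset type: {requested!r}")
    return requested


print(resolve_dataset_type({}))
print(resolve_dataset_type({"datasetType": "workflow"}))
```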

We also discussed the following ideas, but I don't consider any of these blockers for moving this pull request forward:

  • Talk with @jggautier and @dliburd about the possibility of icons for software and workflows, to be shown on search cards in the screenshot above, for example.
  • Think more about whether dataset types (software, workflow, etc.) should be defined in the database rather than being hard-coded in an enum as they are now. Done in c8adf25.
  • In Proposal: Supporting Multiple Dataset Types in Dataverse there is a diagram from @poikilotherm and language around extending types through inheritance. We haven't forgotten about this but for now we're following the "incremental development" section of the proposal (adding a single column). We discussed a base type of "text" and how "articles" and "posters" could extend it.

I'll go ahead and mention members (I can find) of the old Software, Workflows & Containers Working Group (not already mentioned above) in case they'd like to see this update on software datasets: @atrisovic @doigl @kmika11 @4tikhonov

@jggautier
Contributor

jggautier commented Jul 24, 2024

@pdurbin a while back we saw a Dataverse installation that customized those search cards to show the word "Dataset" when the item was a dataset. I wonder if that might be a better approach. Less work to figure out which icons to use for each type of object and maybe clearer for folks searching for data. Although then we have to think about internationalization.

World Agroforestry at https://data.worldagroforestry.org does this. And I think I've seen another installation that does this. Can't find it but it was a bit different visually.

Edit: For the sake of transparency I should say that I haven't really followed this proposal, so sorry if what I mentioned was already mentioned or isn't in scope 😬

Maybe someday but we're not confident about which field to use and
we're not even sure if there is any interest in this because DDI
usually represents data, not software or workflows.
@jggautier
Contributor

Hey all. @scolapasta thought it would be good to have a call about the idea of icons or otherwise indicating on the search page what type of research object a user is looking at and suggested @qqmyers be on the call and that I check with @pdurbin.

@qqmyers is out this week and after this week I'll be out until August 12. @pdurbin, @scolapasta, @qqmyers and @dliburd, would you be interested in a call on or after August 12 about the idea of icons or otherwise indicating on the search page what type of research object a user is looking at? If so, I could send a poll to see what times work best.

I'm writing here so that there's a record of it that everyone else involved can see and so that it's close to more information about this effort.

pdurbin added 2 commits July 26, 2024 16:22
Also populate a few dataset types in database, with "dataset"
being the default. Add default type to existing datasets.

Also APIs for managing dataset types.
We bumped our db migration script to .2

Conflicts:
src/main/resources/db/migration/V6.3.0.1.sql
src/test/java/edu/harvard/iq/dataverse/api/UtilIT.java

@pdurbin pdurbin removed the Champion: pdurbin Championed by @pdurbin for inclusion in the next release label Jul 29, 2024

Before this commit, the facet looked like this...

Dataset Type
(3) Dataset
(2) software
(1) workflow

... that is, "Dataset" was capitalized but "software" and
"workflow" were not. This commit fixes this, making all types
capitalized, and it makes the values translatable in other
languages. However, it does nothing to address some confusion
that Search API users will feel. They'll get back the capitalized
values but will need to pass in the lower case version (in English)
to narrow their search results.
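
A minimal sketch of how a Search API client might cope with the mismatch described above: it lowercases the capitalized value it got back before building the filter query. It assumes the Solr field is named `datasetType` (without the `_s` suffix) and that the filter is passed via the `fq` parameter of `/api/search`; the helper names and the example host are hypothetical. As the commit message notes, this only works for the English values, so translated labels would need a separate reverse lookup.

```python
from urllib.parse import urlencode


def dataset_type_filter_query(display_value):
    """Turn a capitalized facet label such as "Software" into a filter query."""
    return f"datasetType:{display_value.strip().lower()}"


def search_url(base_url, query, display_value):
    """Build a Search API URL narrowed to one dataset type."""
    params = {"q": query, "fq": dataset_type_filter_query(display_value)}
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


# Hypothetical host, for illustration only.
print(search_url("https://demo.example.org", "*", "Software"))
```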

Also add upgrade instructions for Solr.

Note that the change from "software" to "Software" should
have been included in the last commit about capitalization.
@sekmiller
Contributor

It doesn't seem to me that a Feature Flag is being used for this feature. What am I missing?

@qqmyers
Member

qqmyers commented Aug 29, 2024

It no longer is, but it did originally so some convenience methods that were added to see flags are still here. That code now could be in a separate PR, but seemed small enough to just review/let through in this.

@sekmiller
Contributor

sekmiller commented Aug 30, 2024

Minor thing: when you click on the Dataset Type facet, the name probably comes from the database table rather than the bundle, so I'm seeing the lowercase name, and you wouldn't see any translation provided.
Screen Shot 2024-08-30 at 10 12 20 AM

Looks OK in the facet (added one of my own)

Screen Shot 2024-08-30 at 10 15 14 AM

@sekmiller
Contributor

Also advocating putting some kind of indicator of Dataset Type in the UI - tag on the card and/or dataset page.

@sekmiller
Contributor

There are a couple of newly added unused imports in Dataverses.java.

@pdurbin
Member Author

pdurbin commented Sep 4, 2024

As discussed with @sekmiller I did add some tests in 673d775 to assert that the API properly returns the value from the bundle such as "Software" (capital S) rather than the value from the database such as "software" (lower case).
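
To make the bundle-versus-database distinction concrete, here is a rough sketch of the lookup being asserted. It is not the test added in 673d775: the bundle keys (`example.datasetType.*`), the dictionary standing in for a properties bundle, and the fallback to the database name are all hypothetical.

```python
# Hypothetical stand-in for a localized properties bundle.
# The key format is invented for this example.
BUNDLE_EN = {
    "example.datasetType.dataset": "Dataset",
    "example.datasetType.software": "Software",
    "example.datasetType.workflow": "Workflow",
}


def display_name(db_name, bundle):
    """Return the bundle label for a type, or the database name if none exists.

    Assumption: falling back to the raw database value is acceptable.
    """
    return bundle.get(f"example.datasetType.{db_name}", db_name)


print(display_name("software", BUNDLE_EN))
```

A type added by an installation without a matching bundle entry would show its database name, which is the lowercase behavior described in the earlier review comment.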

I suspect the facets not showing the correct values could have something to do with this issue:

My hope is that after merging the fix for that issue...

... the JSF facets will show the correct values, the ones from the bundle.

We also talked a bit about how a visual indicator that you are looking at a dataset would be nice, but we'll defer this to a future pull request. (This was also mentioned in a previous comment, how we'd like to pull in Julian and Dwayne.)

Finally, we should probably simply close this issue:

I'll check with @cmbz about this at standup. @scolapasta and @qqmyers decided we didn't need this after all, so I removed it.

@pdurbin
Member Author

pdurbin commented Sep 4, 2024

My hope is that after merging the fix for that issue...

From a quick test (making a test branch, merging that PR into this one), I'm hopeful that it will be a good fix. I'm now seeing the correct values from the bundle in the facets.

Screenshot 2024-09-04 at 10 28 59 AM

github-actions bot commented Sep 4, 2024

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:10517-dataset-types
ghcr.io/gdcc/configbaker:10517-dataset-types

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@sekmiller sekmiller merged commit 4143031 into develop Sep 6, 2024
24 of 25 checks passed
@pdurbin pdurbin added this to the 6.4 milestone Sep 6, 2024
@pdurbin
Member Author

pdurbin commented Oct 4, 2024

Just to close the loop on the discussion above, I retested now that #10158 has been merged and, as we suspected, when you click the "Software" facet...

Screenshot 2024-10-04 at 12 12 49 PM

... it now correctly says "Dataset Type: Software" (uppercase) instead of "Dataset Type: software" (lowercase):

Screenshot 2024-10-04 at 12 12 59 PM

@pdurbin pdurbin deleted the 10517-dataset-types branch October 4, 2024 16:15
Labels
FY25 Sprint 2, FY25 Sprint 3, FY25 Sprint 4, FY25 Sprint 5, GREI Year 3, GREI 6 Connect Digital Objects, Size: 50 (a percentage of a sprint; 35 hours)
Projects
Status: Done 🧹
Development

Successfully merging this pull request may close these issues.

  • Expose if feature flags are enabled or disabled via API
  • Create the datasetType class and add UI facet
7 participants