Add a FCREPO/Solr import plugin to help with Islandora 7 migrations #21

DiegoPino · 2021-04-15T17:59:24Z

What is this?

We got a lot of requests for this, so it's about time. This issue explains the design and how I plan on doing this. Happy to get feedback, feature requests, questions, etc.

How?

The idea here is to add another AMI Source Plugin that can deal with data coming directly from Solr (but not limited to it; this is just a start).
The plugin will integrate with the current setup form the same way Google Sheets and CSV work right now and will provide the following options:

1. Server URL to your core
2. A predefined Islandora Profile with an advanced override
2.1. Collection PID/Top Object PID (e.g. book) to import from (one at a time)
2.2. Filter CMODELS
2.3. Binary Datastream(s) per CMODEL to fetch files from
2.3.1 (NEW). HOCR and other derived datastreams could be marked as SBFlavors and go into Solr directly. We can start with HOCR first and then think about how that fits the others. I wonder if we could pass the responsibility (a 'does it apply' method) to each Strawberry Runners Plugin or have one special plugin that automatically takes ingested files matching certain criteria and does the work without AMI level settings. Thanks @noahwsmith
2.4. Automatically build the remote URL for datastream fetching
2.5. Offset and Number of Objects to fetch (this will be per Top object, because why would you want to limit the number of pages per Book?)
3. Advanced Override will include:
3.1. Select Membership relations (default profile provides the most common fields already)
3.2. Do a shallow import (default profile is deep import)
3.3. Custom filter (as a comma separated list of fields/values)
3.4. What fields to return (default profile is fgs_, datastream data based on selected DSIDs, mods_)
3.5. Make Compounds/books a single ADO or multiple ADOs. Default is (guess what) a single ADO.
3.6. Use PID (if UUID based) as UUID for new ADO.
3.7. Use PID (if UUID based or numeric) as UUID seed for UUIDv5 for new ADO (hashes the PID, so every time we ingest the same set we get the same ADO UUID; cool, right?)
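Options 3.6 and 3.7 could look roughly like the following. This is only a sketch in Python (not the plugin's actual code): the namespace UUID, the function name `ado_uuid_for_pid`, and the `reuse_uuid_pid` flag are made up for illustration. The only thing it relies on is that UUIDv5 is a deterministic hash of a namespace plus a name, so the same PID always gives the same UUID.

```python
import uuid

# Hypothetical fixed namespace. Any UUID works, as long as it never changes
# between import runs; otherwise the derived ADO UUIDs change too.
ASSUMED_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/islandora7")


def ado_uuid_for_pid(pid: str, reuse_uuid_pid: bool = True) -> str:
    """Return a UUID string for a new ADO, derived from an Islandora 7 PID."""
    # A PID looks like "namespace:local-id"; keep the part after the first colon.
    local_id = pid.split(":", 1)[-1]
    if reuse_uuid_pid:
        # Option 3.6: if the local id already is a UUID, reuse it as-is.
        try:
            return str(uuid.UUID(local_id))
        except ValueError:
            pass  # not a UUID (e.g. numeric), fall through to hashing
    # Option 3.7: hash the full PID into a UUIDv5, stable across re-ingests.
    return str(uuid.uuid5(ASSUMED_NAMESPACE, pid))
```

Calling `ado_uuid_for_pid("islandora:123")` twice returns the same value, which is what would let a repeated ingest of the same set target the same ADOs.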

Any other feature you feel is missing here? Concerns?

Note: Mapping/etc will be the same as in any other AMI setup

Output CSV will already contain "documents, images, label, UUID and parentship relations computed for you" columns processed by the plugin

Future work: make file URIs/URLs/Paths computable via a template. This gives @giancarlobi full control to override the paths for each file and use already existing local paths.

@dmer @patdunlavey @giancarlobi @alliomeria please add your suggestions/comments here. Thanks

noahwsmith · 2021-04-16T13:46:53Z

This would be magic. Should I interpret 2.3 as being able to move OCR/HOCR without the need to regenerate?

DiegoPino · 2021-04-16T14:05:19Z

Oh, sure! HOCR (good catch) should become Strawberry Flavor Data Sources too => Solr Docs. I will work on designing that part too. Added a comment as 2.3.1 for that case.

patdunlavey · 2021-04-16T14:15:52Z

@DiegoPino If this could be delivered quickly, it could change our whole migration ballgame for CAR. In particular, the ability to avoid the export step from their old system would make it hugely more efficient, both in the bulk migration of already existing content and in the final period prior to cutover, when we need to update the new site with the most recent additions and changes from the old one. It could cut down our content freeze to literally nothing. So, yes, huge!
Am I right that, for binary file fetching, it would be essentially repository platform agnostic (e.g. fedora)? Where would the logic reside for mapping solr field value(s) (e.g. PID plus datastream ID) to a file uri? In twig?

DiegoPino · 2021-04-16T14:20:05Z

@patdunlavey in the 'default profile' or simple mapping we will assume a normal /datastream/DSID/download endpoint and build it for the user, but the advanced option will provide you with a for-this-purpose-only twig template to mangle/transform your endpoints the way you need/want. It will require docs.
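To make the default case concrete, here is a small sketch in Python, not the plugin code. It assumes the usual Islandora 7 path layout `/islandora/object/{PID}/datastream/{DSID}/download`; the function name, the `template` parameter, and the example host are hypothetical. The `template` argument stands in for the advanced override (which would be twig in the real thing).

```python
from urllib.parse import quote

# Assumed default layout of an Islandora 7 datastream download endpoint.
DEFAULT_TEMPLATE = "{base}/islandora/object/{pid}/datastream/{dsid}/download"


def datastream_download_url(base_url: str, pid: str, dsid: str,
                            template: str = DEFAULT_TEMPLATE) -> str:
    """Build the remote URL a file would be fetched from."""
    return template.format(
        base=base_url.rstrip("/"),   # avoid a double slash after the host
        pid=quote(pid, safe=""),     # "islandora:123" becomes "islandora%3A123"
        dsid=quote(dsid, safe=""),
    )
```

For example, `datastream_download_url("https://old.example.org/", "islandora:123", "OBJ")` builds the OBJ download URL for that object, and passing a different `template` string mimics what the override would let you do.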

How fast can I do this? Send coffee!! I already started. This goes well with the amount of time I'm already devoting to AMI, so it should not be slow (but at least 1-2 weeks until it is fully working).

patdunlavey · 2021-04-16T16:31:33Z

I spoke quickly, but perhaps not too quickly, about the applicability of this to the CAR migration use case. The islandora solr instance does not have a lot of the data that we need to migrate (workflow metadata, mostly). These are on the drupal side. However, it seems like, if we add search_api_solr to the drupal site (either on the same core as islandora, or on a separate core or solr container) and index all the drupal fields, we could use the ami solr source. Can you think of any reason why not @DiegoPino ?

DiegoPino · 2021-04-16T16:43:51Z

@patdunlavey I do not see why not, but this would be a separate plugin profile/work or Advanced Setup, I guess, since Solr fields/data for Drupal will differ vastly from what I can expect from a Fedora GSearch driven config. If you manage to index all your D7 fields in Solr we can do some testing. I would need that to be something you drive and document using the features we provide here. Hope that makes sense. Let's start with Solr indexing from D7 and we go from there.

DiegoPino · 2021-04-16T16:52:37Z

Note: @patdunlavey I would also go with a different core. Your Islandora core will confuse the hell out of D7's Search API.

DiegoPino · 2021-04-16T16:53:33Z

Also, is this implying you also want a D8 migrator? For Islandora 8? 👀 Because if this works for your D7 it will also work for the other one.

patdunlavey · 2021-04-16T21:54:40Z

Thanks @DiegoPino. Yes, understood that this would be on us to drive and document. In principle it seems like it should not be difficult.

Here's my thinking/understanding. I would expect that the new solr plugin would simply be feeding each object/entity's solr field data into an array, keyed by solr field name. Each solr field's data could be a number, string, or array. Then it would send that array to the twig template, where the user would be responsible for mapping it to json. For filtering of the source data, we would need to provide a place to enter a solr query string. It would also make sense for the user to be able to provide a field list parameter. Perhaps these would be in addition to what you were thinking would be needed for the UI for Islandora migration?

I'm not thinking that there needs to be a "D7 migrator", or D8. Documentation, definitely, which we will do. But unless I'm really misunderstanding something (very possible!), I don't see where bundling a special profile would make sense.
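A rough sketch of that flow in Python, just to pin down the idea (the function names and the example field names are illustrative, not anything the plugin defines). It only uses standard Solr select parameters: `q`, `fq` for filters, `fl` for the field list, and `start`/`rows` for offset and count. Each returned document is already an array keyed by Solr field name, whose values may be numbers, strings, or lists.

```python
from urllib.parse import urlencode


def build_select_url(core_url, query="*:*", filters=(), fields=(),
                     start=0, rows=100):
    """Build a Solr /select URL from a query string, filters and a field list."""
    params = [("q", query), ("wt", "json"), ("start", start), ("rows", rows)]
    # Each filter becomes its own fq parameter; Solr intersects them.
    params += [("fq", fq) for fq in filters]
    if fields:
        # fl limits which fields come back for every document.
        params.append(("fl", ",".join(fields)))
    return f"{core_url.rstrip('/')}/select?{urlencode(params)}"


def docs_from_response(response: dict) -> list:
    """Return one dict per Solr document, keyed by Solr field name."""
    return [dict(doc) for doc in response.get("response", {}).get("docs", [])]
```

The list returned by `docs_from_response` is what would then be handed, one entry at a time, to the twig template for mapping to json.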

Thanks for the recommendation to use a separate solr core. I had assumed that would make most sense, but without knowing why!

DiegoPino · 2024-08-15T18:11:11Z

Resolved

DiegoPino self-assigned this Apr 15, 2021

DiegoPino added the enhancement (New feature or request) and Ingest Setup (Knobs and Levers you move while thinking about feelings and metadata and CSV files) labels Apr 15, 2021

DiegoPino mentioned this issue May 17, 2021

ISSUE-21: Islandora 7 Importer/Update/Patch refactor and new Options #26

Merged

DiegoPino mentioned this issue Jun 23, 2021

ISSUE-21b(the return of the CSV) Islandora 7 migrations! ready to be used (almost?) #29

Merged

DiegoPino linked a pull request Jun 23, 2021 that will close this issue

ISSUE-21b(the return of the CSV) Islandora 7 migrations! ready to be used (almost?) #29

Merged

DiegoPino closed this as completed Aug 15, 2024
