Bring back hOCR term config item in IIIF Manifest Views style. #953

Closed
wants to merge 9 commits into from


@alxp alxp commented Jun 19, 2023

GitHub Issue: (link)

What does this Pull Request do?

Brings back the config item on IIIF Manifest Views style plugin so that a site builder can choose the term to retrieve hOCR text from on a sibling Media entity.

What's new?

We outsmarted ourselves trying to retrieve a related media item using Views relationships. It turns out that this does not work if a row's object has an image media but no hOCR media.*

  • This is with the current view that uses Media as its base table.

This rolls back the commit where I removed the options form to explicitly select the term, and adds more descriptive help text.
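As a rough illustration of the lookup this setting drives, the sketch below pairs an image media with a sibling media on the same node that carries the configured term, and returns nothing when no such sibling exists. It is plain Python rather than the plugin's PHP. The `Media` record, `find_hocr_sibling`, the sample data, and the Original File URI are hypothetical and only model the behaviour described above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Term URI taken from the testing steps in this PR.
HOCR_TERM_URI = "https://discoverygarden.ca/use#hocr"
# Assumed URI for the 'Original File' media use term; illustrative only.
ORIGINAL_FILE_URI = "http://pcdm.org/use#OriginalFile"


@dataclass(frozen=True)
class Media:
    """Hypothetical stand-in for a Media entity attached to a node."""
    mid: int
    node_id: int
    use_uris: frozenset


def find_hocr_sibling(image: Media, all_media: Iterable[Media],
                      hocr_term_uri: str) -> Optional[Media]:
    """Return the media on the same node tagged with hocr_term_uri, if any."""
    for candidate in all_media:
        if candidate.mid == image.mid:
            continue  # a media is not its own sibling
        if candidate.node_id == image.node_id and hocr_term_uri in candidate.use_uris:
            return candidate
    return None  # page has an image but no hOCR media


media = [
    Media(1, 10, frozenset({ORIGINAL_FILE_URI})),  # page 10: image
    Media(2, 10, frozenset({HOCR_TERM_URI})),      # page 10: hOCR
    Media(3, 11, frozenset({ORIGINAL_FILE_URI})),  # page 11: image only
]

page_with_hocr = find_hocr_sibling(media[0], media, HOCR_TERM_URI)
page_without_hocr = find_hocr_sibling(media[2], media, HOCR_TERM_URI)
```

The point of the sketch is that a page without hOCR still yields its image row; the hOCR lookup simply comes back empty instead of removing the row.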

  • Does this change add any new dependencies? No
  • Does this change require any other modifications to be made to the repository
    (i.e. Regeneration activity, etc.)? No
  • Could this change impact execution of existing code? No

How should this be tested?

From a new instance of the starter site, e.g. 'make starter_dev' in ISLE-DC:

  1. Back up your existing site config if you want to return to it with 'make config:export', then copy the files in codebase/config/sync to a temporary folder.
  2. Run composer require islandora/islandora_mirador "islandora/islandora:dev-946-hocr-media as 2.7.x-dev". If composer gives you trouble you may need to chmod -R u+w web/sites/default and/or generate a GitHub API token. Alternatively, you can just check out this issue branch manually.
  3. I've attached a zip file with configs for setting up what you need to test this change. Import the attached configs:
     config-import.zip
     3.1. Unzip the file into your codebase folder so it is accessible inside ISLE-DC, e.g., to codebase/config. The files will be in a folder called 'config-import'.
     3.2. Inside Docker's Drupal image, run the command: drush config:import --partial --source=/var/www/drupal/config/config-import
  4. Create a new term in the Media Use taxonomy with URL set to https://discoverygarden.ca/use#hocr (likely to be the future one we use).

We should now be able to test hOCR generation:

  1. Add a Repository Item with Model 'Paged Content'.
  2. Upload one or more TIFFs with text on them as children, using Model 'Page', media type 'File' and media use 'Original File'.
  3. You can monitor hOCR derivative generation with the logger: docker compose logs -f hypercube
  4. After derivatives are generated you should see 'hOCR Extracted Text' media as part of the items on the Media tab.

Next test the IIIF Manifest:

  1. Go to Admin > Structure > Views and edit the IIIF Manifest view.
  2. Click 'Settings' next to IIIF Manifest in the Format section on the left.
  3. You should see a new config form element, allowing you to choose the Media Use term for hOCR Extracted Text.
  4. Append '/manifest' to the URL of the Paged Content node you created earlier. This should print a manifest including "seeAlso" entries where the hOCR URLs are included.
  5. The node page itself should include a Mirador viewer with the Text Overlay plugin enabled. Text should be selectable. You can turn this off via Mirador's UI in the top-right. If the text selection buttons don't appear, try clearing Drupal's cache.
  6. Next go to one of the child objects, and click on the Media tab. Delete the hOCR Extracted Text media, then go back to the original book page. You should still see the image in Mirador, but it won't have a text overlay.
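
The manifest check in the steps above can also be scripted. The following is a minimal sketch that assumes the manifest follows the IIIF Presentation 2.x layout (sequences containing canvases, each with an optional seeAlso); the actual shape of the output produced by this plugin may differ. The sample manifest, its example.org URLs, and the `see_also_by_canvas` helper are hypothetical.

```python
import json

# Hypothetical manifest fragment; a real one would be fetched from <node URL>/manifest.
SAMPLE_MANIFEST = json.loads("""
{
  "@type": "sc:Manifest",
  "sequences": [
    {
      "canvases": [
        {
          "@id": "https://example.org/canvas/1",
          "seeAlso": {"@id": "https://example.org/hocr/1.hocr", "format": "text/vnd.hocr+html"}
        },
        {
          "@id": "https://example.org/canvas/2"
        }
      ]
    }
  ]
}
""")


def see_also_by_canvas(manifest):
    """Map each canvas @id to the list of seeAlso URLs found on it."""
    result = {}
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            see_also = canvas.get("seeAlso", [])
            # seeAlso may be a single URL, a single object, or a list of either.
            if isinstance(see_also, (dict, str)):
                see_also = [see_also]
            urls = [e if isinstance(e, str) else e.get("@id") for e in see_also]
            result[canvas.get("@id")] = urls
    return result


links = see_also_by_canvas(SAMPLE_MANIFEST)
for canvas_id, urls in links.items():
    print(canvas_id, urls if urls else "no hOCR link")
```

A page whose hOCR media has been deleted would show up here as a canvas with an empty list, matching the last step above where the image still renders but has no text overlay.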

The part of this PR that is new is the Views Plugin setting. The rest is already in Islandora.

Documentation Status

  • Does this change existing behaviour that's currently documented? No
  • Does this change require new pages or sections of documentation? No
  • Who does this need to be documented for?

Additional Notes:

Any additional information that you think would be helpful when reviewing this
PR.

Interested parties

Tag (@ mention) interested parties or, if unsure, @Islandora/committers
@rosiel

@seth-shaw-asu seth-shaw-asu self-requested a review June 21, 2023 17:28
@rosiel rosiel self-assigned this Jul 12, 2023

@adam-vessey adam-vessey left a comment

Presently in conflict with upstream.

@alxp alxp force-pushed the hocr_media_redux branch from 0d027c4 to d034d04 Compare July 19, 2023 03:21
alxp commented Jul 31, 2023

At IslandoraCon 2023 the following items were agreed on:

  1. Islandora IIIF should be moved to its own module
  2. It should be hosted on Drupal.org
  3. We should try to not have any explicit dependency on Islandora's internal functions or assumptions about object composition.
  4. Nodes should eventually be made to reference their own media rather than media referencing nodes.
With all of these changes, it's my conclusion that the functionality in this PR would better belong as a subclass of the current, more generic Views style plugin and could live directly in Islandora as an IIIF support module if it's needed at all.

Closing this PR since it represents work that we'd be undoing later (hopefully sooner).

@alxp alxp closed this Jul 31, 2023