0514 ml usage tags #516

glenrobson · 2024-07-12T13:59:02Z

For consideration & further discussion related to #514

Moved to the cookbook repo to build preview. Original Pull request #515

glenrobson · 2024-07-12T14:02:56Z

recipe/0514-ml-usage-tags/manifest.json

+    ]
+  },
+  "rights": "http://creativecommons.org/licenses/by-sa/3.0/",
+  "requiredStatement": [


I don't think requiredStatement can be an array; I think it needs to be a JSON object. If you remove the [] around requiredStatement, it should get past validation and start deploying the preview.

Ah got it, sorry for the extra [], removed

Sorry, I'm being dim. It's not valid JSON now without the [], and looking at the Presentation API:

https://iiif.io/api/presentation/3.0/#requiredstatement

I think it's only possible to have one required statement. For this example, do you want to remove the attribution one?

Ok, thank you, just sent through another updated version with the attribution statement as a second value, hope that’s alright.

This exemplifies the problem with the approach. Now you have two values with one label, but the label only applies to the first. This should be called out in the recipe description, and that this is a field intended for humans to read rather than machines to process. Even the best intentioned machine agent won't know what to do with this... at which point, just merge the two parts into one statement.
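
For reference, here is a minimal Python sketch of the shape under discussion, assuming the Presentation API 3.0 definition of requiredStatement as a single JSON object whose `label` and `value` are language maps. The helper name `check_required_statement` and the sample strings are illustrative assumptions; this is not the validator's code and it checks far less than the real validator does.

```python
import json


def check_required_statement(manifest: dict) -> list[str]:
    """Return a list of problems with manifest['requiredStatement'] (empty if none).

    Hypothetical helper: it only checks the basic object/language-map shape.
    """
    problems = []
    statement = manifest.get("requiredStatement")
    if statement is None:
        return problems  # the property is optional
    if not isinstance(statement, dict):
        return ["requiredStatement must be a single JSON object, not " + type(statement).__name__]
    for key in ("label", "value"):
        language_map = statement.get(key)
        if not isinstance(language_map, dict):
            problems.append(f"requiredStatement.{key} must be a language map")
            continue
        for language, strings in language_map.items():
            if not isinstance(strings, list) or not all(isinstance(s, str) for s in strings):
                problems.append(f"requiredStatement.{key}[{language!r}] must be a list of strings")
    return problems


# Shape from the first revision in this thread: an array wrapping the object.
as_array = json.loads(
    '{"requiredStatement": [{"label": {"en": ["Attribution"]}, "value": {"en": ["Example text"]}}]}'
)

# A single object, with both parts merged into one human-readable value.
as_object = json.loads(
    '{"requiredStatement": {"label": {"en": ["Attribution"]},'
    ' "value": {"en": ["Example attribution. Example usage statement."]}}}'
)

print(check_required_statement(as_array))
print(check_required_statement(as_object))
```

The second form follows the suggestion above to merge the two parts into one statement, since a single label cannot describe two separate values.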

Thanks for the feedback @azaroth42.

My reading of the specs is indeed that the requiredStatement property is really only for human-readable, displayable statements (the recipe states 'for humans', but could be more verbose about this in the general description text as well). I'm thinking of this initial recipe as complementary to, and a first step towards, the more (intentionally) machinable rights. I have a note about the WIP here:

cookbook-recipes/recipe/0514-ml-usage-tags/index.md

Line 61 in 507c140

* URIs to be pursued for machineable interactions, pending further discussions within the IIIF and wider repository communities during Summer 2024 and onwards.

(FWIW, also have a brief note about the actionability in general)

This is a bit out of scope for this particular recipe, but it might be worth having a larger discussion about requiredStatement only permitting a single object. I can see the use case for multiple objects/statements (regardless of these potential tags), such as repositories that often have both standardized and local rights statements, where it would be helpful to users to see the text for the standardized statement in full alongside the more local statements. Or where multiple language labels and paired value statements could be useful.

Remove errant brackets

Object structure change for validation

alliomeria · 2024-07-12T14:50:58Z

Hi @glenrobson, looking at the build validation errors right now. Will follow up and try to correct those in just a few moments.

glenrobson · 2024-07-12T14:55:49Z

recipe/0514-ml-usage-tags/index.md

+ - Mirador
+ - UV
+topic: 
+ - text


The topics can only be one of the following

basic

property

note

structure

annotation

image

AV

realWorldObject

geo-recipes

content-state

I think I would go for note at the moment.

Got it, just changed to note, thanks Glen.
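
As a quick illustration of the constraint above, a small Python sketch that checks a recipe's `topic` values against the list Glen gave. The allowed values are copied from that comment; the function name `invalid_topics` is a hypothetical stand-in, since how the cookbook build actually performs this check is not shown in this thread.

```python
# Allowed values, copied from the review comment above.
ALLOWED_TOPICS = {
    "basic", "property", "note", "structure", "annotation",
    "image", "AV", "realWorldObject", "geo-recipes", "content-state",
}


def invalid_topics(topics: list[str]) -> list[str]:
    """Return the topics that are not in the allowed list, in input order."""
    return [topic for topic in topics if topic not in ALLOWED_TOPICS]


print(invalid_topics(["text"]))  # the value first used in index.md
print(invalid_topics(["note"]))  # the value it was changed to
```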

Correct topic for validation; update line references as needed for manifest.json changes

DiegoPino · 2024-07-17T20:25:01Z

recipe/0514-ml-usage-tags/manifest.json

@@ -12,7 +12,7 @@
         "<p>Picture taken by the <a href=\"https://github.com/glenrobson\">IIIF Technical Coordinator</a></p>"
      ]
   },
-   "rights": "http://creativecommons.org/licenses/by-sa/3.0/",
+   "rights": "https://www.wikidata.org/wiki/Q127518037",


Context update (:

alliomeria · 2024-07-17T20:40:22Z

Ah, I see I'm not getting the Validation right for @context. Did I misinterpret the extension mechanism?

alliomeria · 2024-07-17T21:03:44Z

Follow up notes: thinking the Policy Extension Registry for listing that Wikidata Q ref is not actually actionable as written (does not set true context.json; this would not either).

Also, it seems the validator itself has set parameters for rights:
https://github.com/IIIF/presentation-validator/blob/c3283776ef60161e40677ac96632fa507602c0aa/schema/iiif_3_0.json#L248-L267

Can anyone point me to an example of valid usage of URIs in 'rights' that are not CC or RightsStatements.org? Any Local Contexts, Traditional Knowledge references, for example?

DiegoPino · 2024-07-17T21:12:00Z

@alliomeria I agree. The JSON schema is fixed on those 3 base URLs which differs from what the human readable specs say.
JSON Schema syntax allows conditional oneOf (or any such listing) based on, e.g., another value found somewhere else, which would make it possible, under certain conditions, to accept any external URL if, e.g., a different context was provided. But that leads to a question: the Presentation API (3.0) specs suggest the extension mechanism for other URLs outside of the RightsStatements.org/CC domains, but it is not clear how the extension mechanism would have any effect at all on the "value" of an existing base property, or how that could be resolved in validation at all. From my understanding, the extension mechanism would allow the use of other JSON keys/properties, as any additional @context would according to JSON-LD, but a mapping to another vocab/ontology does not necessarily define a "different" value for an existing key, especially for a key like "rights", which is already mapped to a very permissive (good) dcterms:rights in terms of what goes there.

Maybe there is space for interpretation in the specs for this?
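
To make the mismatch concrete, here is a rough Python sketch of the kind of check the schema applies to `rights`. The three prefixes below are an assumption based on the "3 base URLs" mentioned above; the authoritative patterns are in the linked `iiif_3_0.json` and may differ. The function name is hypothetical.

```python
import re

# ASSUMED patterns: Creative Commons licenses, the CC public domain mark,
# and RightsStatements.org. Check the linked schema for the real ones.
ASSUMED_RIGHTS_PATTERNS = [
    r"http://creativecommons\.org/licenses/.*",
    r"http://creativecommons\.org/publicdomain/mark/.*",
    r"http://rightsstatements\.org/vocab/.*",
]


def rights_passes_assumed_schema(rights: str) -> bool:
    """True if the rights URI starts with one of the assumed base URLs."""
    return any(re.match(pattern, rights) for pattern in ASSUMED_RIGHTS_PATTERNS)


# The two values from the manifest.json diff in this thread.
print(rights_passes_assumed_schema("http://creativecommons.org/licenses/by-sa/3.0/"))
print(rights_passes_assumed_schema("https://www.wikidata.org/wiki/Q127518037"))
```

Under these assumed patterns, an additional @context changes nothing: the check looks only at the string value of `rights`, which is the gap described above.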

azaroth42 · 2024-07-17T21:23:24Z

I think the specification is somewhere between misleading and flat out wrong here. We need a registry of additional rights URIs with an explanation of what a client should do when it encounters them.

Created IIIF/api#2309 to this end.

Thanks @alliomeria for pushing into this somewhat unknown territory! :)

DiegoPino · 2024-07-17T21:36:56Z

@alliomeria @azaroth42 to allow this recipe to validate against 3.0 while 4.0 figures this out, could an alternative, additional JSON/JSON-LD property coming from, e.g., schema.org (I'm thinking specifically of usageInfo) be used for the Wikidata ML tags? That way rights could still be CC-based while on 3.0, and [usageInfo](https://schema.org/usageInfo) could serve as a stub in the meantime. Might be a stretch (sorry, don't want to derail this great effort) and I don't know if schema.org is even valid in the IIIF specs as a registered extension.

Just an end-of-the-day idea, but it might cover the machinable part, since I am pretty sure some crawlers like Google do know how to map/parse/read that complete @context and thus its properties and values.

azaroth42 · 2024-07-18T12:17:17Z

You could create a JSON-LD context document that defines a new property for IIIF (either de novo or by mapping from an existing ontology like schema) for sure. It would then fall into the extensions part of the spec directly and you could put whatever values you wanted in it.
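
A minimal sketch of what that could look like, written as Python dicts that serialize to JSON. Everything under `example.org` is a hypothetical placeholder, the term name `usageInfo` simply reuses the schema.org property suggested above, and nothing here has been registered as an IIIF extension. The ordering of `@context` assumes the Presentation API 3.0 rule that the IIIF context comes last when several contexts are listed.

```python
import json

# Hypothetical location for the extension context; nothing is published here.
EXTENSION_CONTEXT_URL = "https://example.org/contexts/ml-usage/context.json"

# The context document that would be served at EXTENSION_CONTEXT_URL.
# It maps a new term onto schema.org's usageInfo and treats its values as IRIs.
extension_context = {
    "@context": {
        "usageInfo": {"@id": "https://schema.org/usageInfo", "@type": "@id"}
    }
}

# A trimmed manifest: rights stays CC-based, the ML usage URI moves to the new term.
manifest = {
    "@context": [
        EXTENSION_CONTEXT_URL,
        "http://iiif.io/api/presentation/3/context.json",
    ],
    "id": "https://example.org/iiif/manifest.json",
    "type": "Manifest",
    "label": {"en": ["Example manifest"]},
    "rights": "http://creativecommons.org/licenses/by-sa/3.0/",
    "usageInfo": "https://www.wikidata.org/wiki/Q127518037",
}

print(json.dumps(extension_context, indent=2))
print(json.dumps(manifest, indent=2))
```

Whether the validator accepts the extra property, and what a client should do on encountering it, is not settled in this thread.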

alliomeria · 2024-07-18T14:07:08Z

> I think the specification is somewhere between misleading and flat out wrong here. We need a registry of additional rights URIs with an explanation of what a client should do when it encounters them.
>
> Created IIIF/api#2309 to this end.
>
> Thanks @alliomeria for pushing into this somewhat unknown territory! :)

Thanks for pushing this into a discussion of potentially revising the specs for 4.0 (*1.4.0/1.3.0 = Archipelago versioning 🤓 ), @azaroth42. :) I was a bit puzzled trying to piece out how to work with the extensions mechanism as currently described.

For 3.0, I appreciate your suggestions @DiegoPino for a potential actionable way to address right now. Happy to give that mode an attempt...

Diego, Rob, Glen, or anyone watching this issue, what else might you suggest as ways to work within the current specs to provide valid, actionable rights statement (broader sense of this terminology) declarations?

Also, what exactly is the functional result of a client hitting the current version of rights URIs from CC/RightsStatements.org?

alliomeria · 2024-07-19T14:36:41Z

Looking around a bit more, came across this: https://github.com/IIIF/api/blob/main/source/registry/rights/index.md

not linked to from the main Registry Page, but published https://iiif.io/api/registry/rights/

Maybe timely for proposing filling out for CC/RightsStatements.org, and inclusion of alternate rights statements, including Local Contexts notices, Traditional Knowledge labels, and others (like these 🙃 )?

Or perhaps the Rights Registry is an unused/abandoned area?

glenrobson · 2024-07-19T15:18:17Z

Discussed in cookbook meeting. Suggested way forward:

Find a URI that expresses the desired ML usage
Propose an addition to https://iiif.io/api/registry/rights/ as part of the recipe

If this gets through TRC, look at adding other rights statements to the registry

@context

Return to normal @context

Update statements related to Wikidata, rights, registries

To sidestep validation error, per Glen

remove indentation

kirschbombe · 2024-07-19T15:57:22Z

Just FYI on the Registry pages: we've been working on adding Registry pages and moving things from the annex to the Registry as we make updates. I haven't worked on it in a bit, but probably have a branch with the in-progress changes. If the group comes together on agreed URIs to add, let me know and I will add them to my draft. I can also prioritize work to update the Rights registry page.

both lines (:

kirschbombe · 2024-07-19T16:03:20Z

Here's the draft PR for the Registry page with a preview link: IIIF/api#2248

Bump down JSON snippets for `rights`

- Adjust order or rights & requiredStatement - Add more detailed information about the URIs, potential current & potential future machinable, and notes detailed the Example shown

alliomeria · 2024-07-19T18:35:23Z

Based on the feedback received on this PR and during today’s cookbook call, I made a few additional updates to the recipe. Check it out here: https://preview.iiif.io/cookbook/0514-ml-usage-tags/recipe/0514-ml-usage-tags/

Thank you very much to everyone who shared helpful feedback and recommendations so far, I really appreciate your time and consideration. Looking forward to continuing to work with the community to discuss this and potentially move further along through the official pipelines.

alliomeria · 2024-07-25T12:40:33Z

Hello everyone watching this pull request 👋

During yesterday’s IIIF AI + ML Group Meeting, I had the opportunity to present again on this proposal for ML/AI Usage statements in IIIF Manifests. In the presentation follow up discussions, Ellen Van Keer shared an interesting question/comment about how these statements might work within the context of the recently enacted EU AI Act. Specifically, Ellen was concerned that EU organizations might not be able to apply these usage statements if they were not the primary copyright holder for a given object/resource.

Ellen, thank you so much for raising this important topic during the call. If possible, could you please provide references supporting the concern that institutions may not be able to apply any kind of 'opt-out' or usage statement unless they are the primary copyright holder for a work? I'm also curious how this might apply to other usage/rights statements already at play currently.

From what I am reading stateside, I am not seeing the text for opt-out mechanisms or usage statements described in that particular way. In the law itself (EN version here), I see this text: "Where the rights to opt out has been expressly reserved in an appropriate manner, providers of general-purpose AI models need to obtain an authorisation from rightsholders if they want to carry out text and data mining over such works."

In legal analyses and practical discussions, such as the ones noted here and a few more below this message (sorry for the many links, trying to go through due diligence), it seems like the EU AI Act and TDM Directive can be interpreted differently and are still not yet fully defined.

From this source:

"The EU AI Act contains a provision that equates AI/machine learning with “text-and-datamining” (TDM) under the EU Text and Data Mining Directive.[1] Consequently, “machine learning” is allowed, provided that:

- the person programming the machine-learning functionality has had lawful access to the content for the purpose of text and data extraction; and
- the owner of the copyright and related rights and/or the database owner have not expressly reserved the extraction of text and data (the so-called opt-out mechanism).

The EU AI Act is expected to enter into force in 2024 and will fully apply 24 months thereafter. However, the TDM exception under the EU Text and Data Mining Directive already exists. Therefore, the TDM exception for machine learning can already be enforced in anticipation of the EU AI Act’s interpretation.”

Would anyone be willing to share their perspective on the potential procedures or enforcement mechanisms at play for the EU AI Act and TDM Provision in terms of opt-out requests and the applicability of any kind of usage statements? Is this an area where variance between institutions and local policies is anticipated?

In any case, I think that these ML/AI Usage Statements could still be useful, and even provide an actionable mechanism for helping an institution comply with an "opt out" request that was "expressly reserved in an appropriate manner".

Thanks again for bringing up this important related potential factor, Ellen. And thanks to everyone who has been taking this proposal into consideration and sharing feedback. I really appreciate everyone's time and expertise.

Additional Links:

alliomeria · 2024-08-19T20:04:51Z

Hello everyone who may be watching this repo/issue, checking in to see if anyone might have time to add some follow-up comments to the issue discussion noted here. It would be great to have more perspective about the EU AI Act & TDM Provision considerations. Thanks for your time! (I also gave a IIIF Slack ping, so apologies for the redundancy in messages related to this.)

veesalu · 2024-08-20T11:53:02Z

I believe Ellen referred to the DSM (Digital Single Market Strategy) which also applies to digital repositories of cultural heritage institutions in the EU.

The copyrights and licenses question is rather firmly regulated and only the rights holder has the right to assign licenses or access/usage terms to the works in copyright. CHIs in the EU can't legally assign licenses for works for which we don't own the rights.

There are different ways of getting material into the collections, but for the National Library of Estonia that is based on the Legal Deposit Copy Act. A publication must be submitted to NLE and we have the obligation to preserve it long term and make it available in accordance with the Copyright Act. During the act of deposit and based on the Copyright Act the rights holder assigns licenses and/or terms (in our case either CC or RightsStatements) based on their wishes and intentions. We also don't have the right to change access and usage terms of orphan and out-of-commerce works based on our own judgement; we use EUIPO's portals in order to get the grounds to make them available.

Now, I guess it depends on how the ML usage tags are defined in the landscape of access and rights. The first idea that also came to my mind during the call was that we could use the tags for our own publications, and to others we could add "check with the rights holder". Like Ellen, I doubt we could apply the tags to other rights holders' works in a legally binding way if the ML usage tags are defined as licenses or usage conditions similar to CC or RightsStatements.

Regarding the TDM, data and text mining of works in copyright can only be done on the premises of NLE, and outside researchers can only leave the premises with cleaned and worked-on data. Also, the research must be done in a "motivated" amount, meaning they can't collect and use our entire collection. In order for a researcher or research institution to get access to the data, they need to file an application in which they present their research, justify the need for the data and explain what they do with it. So the TDM simply doesn't mean that anyone who says that they do research has automatic access to the data. There was a legal analysis carried out for our digital lab; there's a summary in English -> https://digilab.rara.ee/wp-content/uploads/2023/03/Virtual-LAB_eng_oigusanaluus.pdf

Another issue that I have been thinking about (and have yet to reach a conclusion on, also for myself) is that in Europe, we are in the make-it-accessible-and-reuse-freely stage. The European Commission is geared towards accessibility, popularisation and re-use of cultural heritage. It's quite impossible to get funding for infrastructure or software if you don't plan "smart solutions". At the NLE we are currently applying for funding for three (two EC-funded) AI/data science projects and we are building our own ML solution for automatic cataloguing. We have the European Collaborative Cloud for Cultural Heritage and the Common European Data Space for Cultural Heritage, which both aim to reduce duplication of data and improve collaboration. I'm not sure how easy a sell the ML usage restrictions could be in Brussels. But this already is a whole other discussion in itself.

alliomeria · 2024-08-21T16:47:09Z

Thank you for your thoughtful follow up veesalu, and for providing a link to the NLE's analysis that informs your institution's approach to TDM. It's really interesting to read through the perspective and the context you're all working with. Good luck with your pursuits of AI/data science projects and ML cataloging assistance tools. Looking forward to reading about your outcomes down the road. :)

Related to this topic, I wanted to note that CC is moving towards being receptive to the idea that perhaps creators should be able to have additional options within the CC licensing framework, loosely termed as "preference signals" in this blog piece where they discuss their early explorations on this concept: https://creativecommons.org/2024/07/24/preferencesignals/. (I will be reaching out to CC to ask about this, maybe there's some space for collaboration at a shared table.)

I understand that applying nuance within the frameworks of open sharing culture can be challenging. That said, I still think there are ways we can better attune our practices to the complexities around the considerations artists, authors, and other creators and content caretakers are facing in the modern AI/ML internet landscape.

veesalu · 2024-09-13T12:17:01Z

That was very interesting reading, thanks!

I lean towards agreeing that tags are of use; the issue is that in the EU only the rightsholders can opt in or opt out, and in most cases we are not the rightsholders. So implementing those is not just an in-house project of deciding that from now on we do it like that; it needs a bigger change in processes in general. I'm a little bit on the fence about this issue because, on the one hand, I feel that we should make as much of CH accessible as possible, but on the other hand, we need to consider reasonable infrastructure loads, etc.

Another issue in using or not using CH data in ML is that right now LLMs don't really speak small languages (like Estonian), and the text corpora held in our collections are valuable material for training the models.

I do agree that this is a discussion that we should continue.

alliomeria · 2024-09-16T14:29:00Z

This is definitely a topic and practice area with a lot of nuanced considerations at play, for cultural heritage and other fields that may end up using ML technology. I think the true usefulness of ML-assisted tools really remains to be seen in many ways, and I hope that there will be actual comparative studies conducted in CH/GLAM analyzing the effectiveness of ML tools compared with traditional practices and other technical approaches. I also hope that we can have an impact on how particular ML technical applications are developed for our field.

Thanks again for sharing your time and feedback related to this issue.

alliomeria added 3 commits July 12, 2024 08:34

Create index.md

aa5c1a0

Create manifest.json

647f7e1

Update recipe folder to include issue number

6241c45

glenrobson mentioned this pull request Jul 12, 2024

0514 ml usage tags #515

Closed

glenrobson linked an issue Jul 12, 2024 that may be closed by this pull request

ML/AI Usage Tags Recipe #514

Open

glenrobson commented Jul 12, 2024

View reviewed changes

alliomeria added 2 commits July 12, 2024 10:14

Update manifest.json

e0adaf2

Remove errant brackets

Update manifest.json

d166158

Object structure change for validation

glenrobson commented Jul 12, 2024

View reviewed changes

Update index.md

77074aa

Correct topic for validation; update line references as needed for manifest.json changes

github-actions bot deployed to staging July 12, 2024 14:58 View deployment

Update index.md

507c140

github-actions bot deployed to staging July 12, 2024 15:02 View deployment

Update index.md

b1fcc6a

github-actions bot deployed to staging July 17, 2024 20:21 View deployment

Update manifest.json

19ebd3a

DiegoPino reviewed Jul 17, 2024

View reviewed changes

Update manifest.json

5c93302

Context update (:

azaroth42 mentioned this pull request Jul 17, 2024

rights extensions are actually just registrations IIIF/api#2309

Open

Update manifest.json

6ddfc4e

Return to normal @context

alliomeria added 3 commits July 19, 2024 11:51

Update index.md

eff85e4

Update statements related to Wikidata, rights, registries

Update preview.yml

a5bb159

To sidestep validation error, per Glen

Update preview.yml

5b34696

remove indentation

Update preview.yml

191ec32

both lines (:

github-actions bot deployed to staging July 19, 2024 16:01 View deployment

Update index.md

4253edc

Bump down JSON snippets for `rights`

github-actions bot deployed to staging July 19, 2024 16:27 View deployment

Update index.md

ae28788

- Adjust order or rights & requiredStatement - Add more detailed information about the URIs, potential current & potential future machinable, and notes detailed the Example shown

github-actions bot deployed to staging July 19, 2024 18:29 View deployment
