Making the Metadata API public #40676

sorbaugh · 2023-09-28T09:03:51Z

Requirements

Use cases

Existing

Store width and height for pictures
Store GPS coordinates for pictures
Store a user-readable location for pictures

Planned

🖼️ Sorting pictures by taken date instead of modification date photos#87
🖼️ iOS live photos support #40242
🖼️ EXIF data display #39714
Generate a blurhash for pictures, for use in Files, Photos, and Social

Potential

Tags on files

Features

Already implemented

Store arbitrary metadata as a string and link it to a fileid and a data name.
Allow the metadata to be manually and granularly exposed to WebDAV requests.
Trigger providers to generate metadata during files:scan and on file updates (NodeWrittenEvent).
Delete metadata upon file deletion (NodeDeletedEvent). This still needs adjustments, see: [Bug]: oc_file_metadata preservation issues with trashbin #34424

Currently missing

Support dependency among providers to allow providers to depend on the work of other providers. Example: MapMediaToPlaceJob in the Photos app.
Execute the provider in a background process to not slow down the request. Example: MapMediaToPlaceJob in the Photos app.
Support listing all distinct metadata values for a given user and metadata name. Example: listing all places linked to pictures in the Photos app.
Support listing all files with a given metadata value and name for a given user. Example: listing all pictures linked to a place in the Photos app.

Needed for the new use cases

Ability to use a metadata entry in an orderby directive of a WebDAV SEARCH request (sorting pictures by taken date)
Automatically handle the exposure of metadata in WebDAV requests (showing EXIF data in the sidebar)
Allow clients to set the content of a metadata entry (live photos)

PRs

artonge · 2023-09-28T12:49:33Z

Technical requirements

Database

Indexing all the metadata is unnecessary, so we have one table to store all the metadata, and another table to store only the indexed metadata.
The second table contains a copy of some metadata contained in the first table.
The second table can be rebuilt from the first one. This opens the way to be smarter about which data we keep in the indexed table. In the future, we could drop some of its data, or populate it on demand.

The last_update column is used to know how old a given set of metadata is.
The unique column is used to avoid race conditions when updating metadata.

The indexed table contains two columns for the value: one is a varchar, the other a bigint. This allows queries to be optimized for both types of data.

oc_metadata

| Name | Type |
| --- | --- |
| fileid | `varchar` |
| metadata | `text` |
| unique | `varchar` |
| last_update | `datetime` |

oc_metadata_index

| Name | Type |
| --- | --- |
| fileid | `varchar` |
| key | `varchar` |
| value_string | `varchar` |
| value_int | `bigint` |
| last_update | `datetime` |
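As a rough illustration of the two-table layout described above, here is a minimal sketch using Python's built-in sqlite3 module. The column names come from the tables above; the SQLite column types, the primary key, and the two index names are assumptions made for the example and are not taken from the actual implementation.

```python
import sqlite3

# Column names follow the proposal; types are SQLite approximations.
# "unique" and "key" are quoted because they are SQL keywords.
SCHEMA = """
CREATE TABLE oc_metadata (
    fileid      TEXT PRIMARY KEY,   -- assumption: one row per file
    metadata    TEXT NOT NULL,      -- JSON document holding every entry
    "unique"    TEXT NOT NULL,      -- token compared before each write
    last_update TEXT NOT NULL
);
CREATE TABLE oc_metadata_index (
    fileid       TEXT NOT NULL,
    "key"        TEXT NOT NULL,
    value_string TEXT,              -- set for string values
    value_int    INTEGER,           -- set for integer values
    last_update  TEXT NOT NULL
);
-- Hypothetical index names: one per value column, so that string and
-- integer lookups can each use a suitable index.
CREATE INDEX metadata_index_string ON oc_metadata_index ("key", value_string);
CREATE INDEX metadata_index_int ON oc_metadata_index ("key", value_int);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create both tables and their indexes."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```

Because oc_metadata_index only holds copies, it could be dropped and repopulated from oc_metadata at any time, which is the property the proposal relies on.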

Data format in the database

Prefixing the name of a metadata entry prevents conflicts and lets us know which app created it. It is not decided yet whether we should enforce the prefixing.

A special key _indexed_values keeps track of which properties should be indexed.

{
	"files:exif": {
		"value": {
			"width": 0,
			"height": 0,
			"coordinate": {
				"latitude": 0,
				"longitude": 0
			},
			"taken_date": 123456789
		},
		"type": "array",
		"indexed": true
	},
	"files:blurhash": {
		"value": "azertyuiop",
		"type": "string"
	},
	"photos:place": {
		"value": "Paris",
		"type": "string",
		"indexed": true
	},
	"files:last_access": {
		"value": ["user1", "user2"],
		"type": "string[]"
	},
	"photos:taken_date": {
		"value": 123456789,
		"type": "int",
		"indexed": true
	},
	"files:live_photos": {
		"value": "1234",
		"type": "string"
	},
	"files:state": {
		"value": "editing",
		"type": "string"
	},
	"files:tags": {
		"value": ["tag1", "tag2"],
		"type": "string[]"
	}
}
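To make the link between this JSON document and the indexed table more concrete, the sketch below derives index rows from the entries flagged with "indexed": true. The function name index_rows is hypothetical, and so is the rule that only plain strings and integers are indexed; how structured values such as files:exif would be indexed is not specified in this thread, so they are simply skipped here.

```python
from typing import Any, Dict, List, Optional, Tuple

# (fileid, key, value_string, value_int)
IndexRow = Tuple[str, str, Optional[str], Optional[int]]


def index_rows(fileid: str, metadata: Dict[str, Dict[str, Any]]) -> List[IndexRow]:
    """Build oc_metadata_index rows for the indexed entries of one file."""
    rows: List[IndexRow] = []
    for key, entry in metadata.items():
        if not entry.get("indexed", False):
            continue
        value = entry["value"]
        if isinstance(value, bool):
            continue  # bool is a subclass of int in Python; not indexed here
        if isinstance(value, int):
            rows.append((fileid, key, None, value))
        elif isinstance(value, str):
            rows.append((fileid, key, value, None))
        # Assumption: arrays and nested objects are not indexed in this sketch.
    return rows


sample = {
    "files:exif": {"value": {"width": 0, "height": 0}, "type": "array", "indexed": True},
    "files:blurhash": {"value": "azertyuiop", "type": "string"},
    "photos:place": {"value": "Paris", "type": "string", "indexed": True},
    "photos:taken_date": {"value": 123456789, "type": "int", "indexed": True},
}
rows = index_rows("42", sample)
```

Each value lands in exactly one of the two value columns, which matches the varchar/bigint split of the indexed table.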

Populating metadata

On file creation or edit, an event is broadcast with the current metadata. Apps can listen to this event and change the metadata. Metadata are updated in the database at the end of the event.
Another event is dispatched in the same way, but inside a background job, allowing apps to do heavier work to generate the metadata.
Clients should be able to update the value of a metadata entry in PROPPATCH requests.
Metadata can be set as read (default) or indexed. All 'read' metadata related to a file are stored in oc_metadata as a single JSON document.
If set as indexed, the metadata will be stored in oc_metadata and an entry will also be generated in the oc_metadata_index table.
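The event flow described above could look roughly like the following sketch. The class MetadataEventDispatcher, its methods, and the add_blurhash listener are hypothetical names chosen for illustration; the real events and listeners are not defined in this thread. The same dispatcher could be instantiated twice, once for the live event and once inside a background job.

```python
import copy
from typing import Any, Callable, Dict, List

Metadata = Dict[str, Dict[str, Any]]
Listener = Callable[[str, Metadata], None]
Saver = Callable[[str, Metadata], None]


class MetadataEventDispatcher:
    """Hands the current metadata to every listener, then saves once."""

    def __init__(self, save: Saver) -> None:
        self._listeners: List[Listener] = []
        self._save = save

    def add_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def dispatch(self, fileid: str, current: Metadata) -> Metadata:
        working = copy.deepcopy(current)  # listeners never mutate the caller's copy
        for listener in self._listeners:
            listener(fileid, working)
        if working != current:  # write only when a listener changed something
            self._save(fileid, working)
        return working


saved: Dict[str, Metadata] = {}


def add_blurhash(fileid: str, metadata: Metadata) -> None:
    # Hypothetical listener: a real one would compute the hash from the file.
    metadata["files:blurhash"] = {"value": "azertyuiop", "type": "string"}


dispatcher = MetadataEventDispatcher(save=lambda fid, md: saved.update({fid: md}))
dispatcher.add_listener(add_blurhash)
result = dispatcher.dispatch("42", {})
```

Saving once at the end of the event, rather than in each listener, keeps the number of database writes per file change at one at most.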

WebDAV

Two options for requesting metadata:

<nc:metadata:files:exif></nc:metadata:files:exif>
<nc:metadata:files:blurhash></nc:metadata:files:blurhash>

<nc:metadata>
    <nc:metadata:files:exif />
    <nc:metadata:files:blurhash />
</nc:metadata>

artonge · 2023-09-28T12:49:41Z

artonge · 2023-09-28T12:49:57Z

@artonge
~~While I like the 2 separate events, we might miss the safety feature included in background jobs that limits the possibility of running the same job multiple times in parallel?~~
With 2 separate events, we store a unique key/timestamp and compare it with the value obtained at read time before writing, to avoid race conditions and data loss when updating the item in the database.
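The read-compare-write idea above is a form of optimistic concurrency control. Below is a minimal sketch of it with sqlite3, assuming the unique column holds an opaque token that is regenerated on every write; the helper names open_store, read_metadata and write_metadata, and the use of uuid for the token, are assumptions made for the example.

```python
import json
import sqlite3
import uuid
from typing import Any, Dict, Tuple


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with a simplified oc_metadata table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE oc_metadata ("
        'fileid TEXT PRIMARY KEY, metadata TEXT, "unique" TEXT, last_update TEXT)'
    )
    return conn


def read_metadata(conn: sqlite3.Connection, fileid: str) -> Tuple[Dict[str, Any], str]:
    """Return the metadata together with the token seen at read time."""
    row = conn.execute(
        'SELECT metadata, "unique" FROM oc_metadata WHERE fileid = ?', (fileid,)
    ).fetchone()
    return json.loads(row[0]), row[1]


def write_metadata(
    conn: sqlite3.Connection, fileid: str, metadata: Dict[str, Any], expected_token: str
) -> bool:
    """Write only if the row still carries the token seen at read time."""
    cursor = conn.execute(
        'UPDATE oc_metadata SET metadata = ?, "unique" = ?, '
        "last_update = datetime('now') "
        'WHERE fileid = ? AND "unique" = ?',
        (json.dumps(metadata), uuid.uuid4().hex, fileid, expected_token),
    )
    conn.commit()
    return cursor.rowcount == 1  # 0 rows updated means another writer won


conn = open_store()
conn.execute("INSERT INTO oc_metadata VALUES ('42', '{}', 'initial', datetime('now'))")

# Two writers read the same state, then both try to write.
_, token_a = read_metadata(conn, "42")
_, token_b = read_metadata(conn, "42")
first = write_metadata(conn, "42", {"photos:place": {"value": "Paris", "type": "string"}}, token_a)
second = write_metadata(conn, "42", {"files:state": {"value": "editing", "type": "string"}}, token_b)
```

In this sketch the second writer is rejected instead of silently overwriting the first update; it would have to re-read the metadata and retry.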

It might also be better if your getMetadata() returned an object, based on a model that contains an array (to store the metadata) and individual setters/getters for each type (bool, string, int, array, ...)

We load the object within the event and each app can read from it. If an app decides to update some data (setters will update an internal boolean 'updated'), we store the updated version of the JSON in the database.

The object is serialized to be stored in the database, and deserialized when needed.

Also, I would separate the creation of a new metadata entry from its configuration as indexable:

the metadata object contains a few methods: listIndexes(): array; addIndex(string); removeIndex(string);
when an entry is set as an index, we keep the key/value pair within the metadata, and store the key in an array in the JSON itself. Then we create, update, and maintain the related entry in the metadata_index table.
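A small sketch of the model object discussed above follows. The class name FilesMetadata and the snake_case method names are hypothetical Python renderings of the proposed getters, setters, listIndexes(), addIndex() and removeIndex(). Storing the list of indexed keys under _indexed_values reuses the special key mentioned earlier in the thread; whether the final format does the same is an assumption.

```python
import json
from typing import Any, Dict, List


class FilesMetadata:
    """Typed accessors over a metadata dictionary, with change tracking."""

    def __init__(self) -> None:
        self._entries: Dict[str, Dict[str, Any]] = {}
        self._indexes: List[str] = []
        self.updated = False  # set to True by any call that changes state

    def _set(self, key: str, value: Any, type_name: str) -> None:
        entry = {"value": value, "type": type_name}
        if self._entries.get(key) != entry:
            self._entries[key] = entry
            self.updated = True

    def _get(self, key: str, type_name: str) -> Any:
        entry = self._entries[key]
        if entry["type"] != type_name:
            raise TypeError(f"{key} is stored as {entry['type']}, not {type_name}")
        return entry["value"]

    def set_string(self, key: str, value: str) -> None:
        self._set(key, value, "string")

    def set_int(self, key: str, value: int) -> None:
        self._set(key, value, "int")

    def get_string(self, key: str) -> str:
        return self._get(key, "string")

    def get_int(self, key: str) -> int:
        return self._get(key, "int")

    def list_indexes(self) -> List[str]:
        return list(self._indexes)

    def add_index(self, key: str) -> None:
        if key not in self._indexes:
            self._indexes.append(key)
            self.updated = True

    def remove_index(self, key: str) -> None:
        if key in self._indexes:
            self._indexes.remove(key)
            self.updated = True

    def to_json(self) -> str:
        return json.dumps({"_indexed_values": self._indexes, **self._entries})

    @classmethod
    def from_json(cls, raw: str) -> "FilesMetadata":
        data = json.loads(raw)
        instance = cls()
        instance._indexes = data.pop("_indexed_values", [])
        instance._entries = data
        return instance  # updated stays False until a setter changes something


metadata = FilesMetadata()
metadata.set_string("photos:place", "Paris")
metadata.add_index("photos:place")
restored = FilesMetadata.from_json(metadata.to_json())
```

Keeping the index list separate from the values means an entry can be created first and marked as indexed later, which is the separation suggested above.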

getMetadataForFile(fileid: string): {...}
setMetadataForFile(fileid: string, key: string, value: mixed, indexed: bool): {...}

maybe getMetadata(): Metadata; and saveMetadata(Metadata);

Should we add a lazy-loading getMetadata(): Metadata; to Node? It might help a lot when handling a list of files that all need their metadata.

If we start adding methods to Node, we could go with setMetadata(Metadata), so that the object freshly created from the filecache is fed directly with the metadata left-joined in the same select statement.

getIndexedMetadataValueForUserAndKey(userId: string, key: string): <string|number>[]
getFilesForUserAndKeyAndIndexedValue(userId: string, key: string, value: string): string[]
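The two lookups above could be expressed against the indexed table roughly as follows. The index table has no user column, so this sketch joins a hypothetical file_owner(fileid, user_id) table; how files are actually resolved to users is not described in this thread. The Python function names mirror the two signatures above.

```python
import sqlite3
from typing import List, Union


def get_indexed_values_for_user_and_key(
    conn: sqlite3.Connection, user_id: str, key: str
) -> List[Union[str, int]]:
    """List every distinct indexed value of `key` among the user's files."""
    rows = conn.execute(
        "SELECT DISTINCT COALESCE(i.value_string, i.value_int) "
        "FROM oc_metadata_index i "
        "JOIN file_owner o ON o.fileid = i.fileid "
        'WHERE o.user_id = ? AND i."key" = ? ORDER BY 1',
        (user_id, key),
    )
    return [row[0] for row in rows]


def get_files_for_user_and_key_and_indexed_value(
    conn: sqlite3.Connection, user_id: str, key: str, value: str
) -> List[str]:
    """List the user's file ids whose indexed string value of `key` matches."""
    rows = conn.execute(
        "SELECT i.fileid FROM oc_metadata_index i "
        "JOIN file_owner o ON o.fileid = i.fileid "
        'WHERE o.user_id = ? AND i."key" = ? AND i.value_string = ? '
        "ORDER BY i.fileid",
        (user_id, key, value),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE oc_metadata_index (
        fileid TEXT, "key" TEXT, value_string TEXT, value_int INTEGER, last_update TEXT
    );
    -- Hypothetical ownership table, only here to make the example runnable.
    CREATE TABLE file_owner (fileid TEXT, user_id TEXT);
    INSERT INTO file_owner VALUES ('f1', 'alice'), ('f2', 'alice'), ('f3', 'alice'), ('f4', 'bob');
    INSERT INTO oc_metadata_index VALUES
        ('f1', 'photos:place', 'Paris', NULL, '2023-09-28'),
        ('f2', 'photos:place', 'Berlin', NULL, '2023-09-28'),
        ('f3', 'photos:place', 'Paris', NULL, '2023-09-28'),
        ('f4', 'photos:place', 'Rome', NULL, '2023-09-28');
    """
)
places = get_indexed_values_for_user_and_key(conn, "alice", "photos:place")
paris_files = get_files_for_user_and_key_and_indexed_value(conn, "alice", "photos:place", "Paris")
```

These correspond to the Photos examples from the requirements: listing all places linked to a user's pictures, and listing all pictures linked to one place.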

~~This would be used to search for files in the database ?~~
I am more in favor of providing a small tool (QueryHelper?) to help developers join the right table and apply the correct where conditions to an already existing query.

edit: both (queryhelper+prepared methods) will be available

PhilippSchlesinger · 2023-11-21T10:11:25Z

Is there an overview ticket that tracks client implementations using the metadata API?
As nextcloud/photos#87 is listed as planned in this ticket, you may also want to add (or track) the corresponding requests for Android nextcloud/android#10425 and Windows nextcloud/desktop#6052.

sorbaugh added the enhancement, 0. Needs triage, and ❄️ 2023-Winter labels Sep 28, 2023

sorbaugh added this to the Nextcloud 28 milestone Sep 28, 2023

sorbaugh assigned ArtificialOwl Sep 28, 2023

sorbaugh added this to 📁 Files team Sep 28, 2023

sorbaugh moved this to 📄 To do (~10 entries) in 📁 Files team Sep 28, 2023

artonge self-assigned this Sep 28, 2023

artonge moved this from 📄 To do (~10 entries) to 🏗️ In progress in 📁 Files team Sep 28, 2023

ArtificialOwl mentioned this issue Oct 12, 2023

IFilesMetadata #40761

Merged

21 tasks

This was referenced Oct 23, 2023

Support dynamic metadata request on PROPFIND requests #40964

Merged

Use new metadata API for providers nextcloud/photos#2104

Merged

SystemKeeper mentioned this issue Nov 7, 2023

Include image metadata nextcloud/spreed#10844

Closed

AndyScherzinger closed this as completed Nov 14, 2023

github-project-automation bot moved this from 🏗️ In progress to ☑️ Done in 📁 Files team Nov 14, 2023

This was referenced Nov 17, 2023

Sort Media view not by creation time but by EXIF date nextcloud/android#10425

Open

show date taken for images in explorer with virtual files nextcloud/desktop#6052

Open

This was referenced Nov 23, 2023

Show and edit EXIF Tags nextcloud/photos#226

Open

Implement BlurHash placeholders for image preview with VFS nextcloud/desktop#6249

Open

Implement BlurHash placeholders for image preview nextcloud/android#12214

Open

This was referenced Dec 8, 2023

Add Photos timeline nextcloud/android#7451

Closed

Details of an image shall show meta-data nextcloud/android#481

Closed

make file metadata (exif and others) available with VFS nextcloud/desktop#6293

Open
