
Don't cleanup when overwriting uniqueness/mistakenness/hardness runs #164

Merged: 1 commit into develop from run-cleanup-flag on Jan 14, 2024

Conversation

brimoor (Contributor) commented Dec 30, 2023

Requires voxel51/fiftyone#3978 to function.

Multiple users have wanted (and expected) to be able to run the code below to compute uniqueness on disjoint subset views of a dataset and store the uniqueness values under the same field.

Previously, this code would silently delete the existing uniqueness values because, technically, the user is overwriting a brain run with the same key (brain_key == uniqueness_field), which triggers a call to the existing run's cleanup() method, which deletes the entire uniqueness_field.

Now, cleanup() is not called when compute_uniqueness() is run multiple times with the same uniqueness_field, which allows the user to build up uniqueness values over multiple runs. The tradeoff, and the reason this was not previously allowed, is that methods like load_brain_view() will now only load the last view on which uniqueness was computed; the dataset is no longer "aware" that multiple brain runs were executed on it.
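
For illustration, a minimal sketch of this caveat, assuming the per-class loop from the example below has already been run (before its final cleanup step):

# The stored run only remembers the last view it was computed on
last_view = dataset.load_brain_view("uniqueness")

# This is just the view for the final class in the loop (e.g. "truck" for
# CIFAR-10), not the union of all the per-class views...
print(last_view.distinct("ground_truth.label"))

# ...even though every sample in the dataset has a uniqueness value
print(len(dataset.exists("uniqueness")))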

The way to have the best of both worlds would be to add a separate brain_key argument, independent of uniqueness_field, and also update the cleanup() method to only clear the uniqueness_field values when the input collection is a view rather than the full dataset (a sketch of this alternative appears after the example below). However, I think the less invasive change in this PR is sufficient for now.

import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("cifar10", split="test")

model = foz.load_zoo_model("mobilenet-v2-imagenet-torch")
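# Precompute embeddings once; each per-class uniqueness run below reuses them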
dataset.compute_embeddings(model, embeddings_field="embeddings")

#
# Run uniqueness on each class individually
#

classes = dataset.distinct("ground_truth.label")
for label in classes:
    view = dataset.match(F("ground_truth.label") == label)
    fob.compute_uniqueness(view, embeddings="embeddings")

assert len(dataset) == len(dataset.exists("uniqueness"))
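# Every sample now has a uniqueness value, even though the values were
# computed class-by-class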

#
# Cleanup
#

dataset.delete_brain_run("uniqueness")
assert dataset.has_sample_field("uniqueness") is False
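
As a rough sketch only (this is not what this PR implements, and brain_key is not currently an argument of compute_uniqueness()), the "best of both worlds" approach described above might look something like this:

#
# Hypothetical API: track each per-class run under its own brain key while
# still writing values into the shared "uniqueness" field
#

for label in classes:
    view = dataset.match(F("ground_truth.label") == label)
    fob.compute_uniqueness(
        view,
        uniqueness_field="uniqueness",
        embeddings="embeddings",
        brain_key=f"uniqueness_{label}",  # hypothetical argument
    )

# Each per-class run could then be loaded or deleted independently, and a
# view-aware cleanup() would only clear values for the samples in that view
# dataset.load_brain_view("uniqueness_cat")
# dataset.delete_brain_run("uniqueness_cat")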

brimoor added the enhancement label Dec 30, 2023
brimoor requested a review from a team Dec 30, 2023
brimoor self-assigned this Dec 30, 2023
brimoor merged commit fc0d7a7 into develop Jan 14, 2024
2 of 4 checks passed
brimoor deleted the run-cleanup-flag branch Jan 14, 2024
kaixi-wang (Contributor) commented:

Isn't the uniqueness value dependent on the input set of samples, since it's a relative measure of the distance between samples? So don't you have to delete previous values, because adding or removing samples may change the values for all of them?

brimoor (Contributor, Author) commented Jan 15, 2024

Yes, the values are dependent on the input samples, but users seem to prefer being able to run it on disjoint subsets (e.g. each class individually), knowing that the values cannot necessarily be compared between classes.
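
For example (a minimal sketch, assuming the per-class runs from the PR description have been performed and the field has not yet been cleaned up), the values are best compared within a single class view rather than across the whole dataset:

# Rank samples by uniqueness within one class only
cats = dataset.match(F("ground_truth.label") == "cat")
most_unique_cats = cats.sort_by("uniqueness", reverse=True)
print(most_unique_cats.first().uniqueness)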

brimoor mentioned this pull request Jan 16, 2024