Don't cleanup when overwriting uniqueness/mistakenness/hardness runs #164
Requires voxel51/fiftyone#3978 to function.
Multiple users have wanted/expected that they can run the code below to compute uniqueness on disjoint subset views of a dataset and store the uniqueness values under the same field.
Previously, this code would silently delete the existing uniqueness values because, technically, the user is overwriting a brain run with the same key (`brain_key == uniqueness_field`). That triggers a call to the existing run's `cleanup()` method, which deletes the entire `uniqueness_field`.

Now, `cleanup()` is not called when running `compute_uniqueness()` multiple times with the same `uniqueness_field`, which allows the user to build up uniqueness values over multiple runs. The reason this was not previously allowed is that, technically, methods like `load_brain_view()` will now only load the last view on which uniqueness was computed; the dataset is no longer "aware" that multiple brain runs were executed on it.

The way to have the best of both worlds would be to add a separate `brain_key` argument independent of `uniqueness_field`, and also update the `cleanup()` method to only clear the `uniqueness_field` values if the input collection is a view rather than a dataset. However, I think the less invasive change in this PR is sufficient for now.
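To make the behavior change concrete, here is a toy, pure-Python model of the mechanism described above. `ToyDataset`, `toy_compute_uniqueness()`, `load_toy_brain_view()` and the `cleanup_on_overwrite` flag are hypothetical names for illustration only. They are not FiftyOne APIs, and the scores are placeholder numbers rather than real uniqueness values. The sketch only assumes what this PR states: the brain key equals the uniqueness field, overwriting a run used to trigger a cleanup that deletes the field from every sample, and the stored run only remembers the last view.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical, simplified stand-in for a dataset: NOT FiftyOne's real class
@dataclass
class ToyDataset:
    # sample id -> {field name -> value}
    samples: Dict[str, Dict[str, float]]
    # brain key -> ids of the samples (the "view") the run was computed on
    runs: Dict[str, List[str]] = field(default_factory=dict)

    def delete_field(self, name: str) -> None:
        # Models "cleanup() deletes the entire uniqueness_field":
        # the field is removed from every sample, not just the view's samples
        for values in self.samples.values():
            values.pop(name, None)


def toy_compute_uniqueness(dataset, sample_ids, uniqueness_field, scores,
                           cleanup_on_overwrite):
    # Hypothetical helper: the run is keyed by the field name itself
    brain_key = uniqueness_field
    if brain_key in dataset.runs and cleanup_on_overwrite:
        # Old behavior: overwriting the run cleans up the previous one
        dataset.delete_field(uniqueness_field)
    for sid in sample_ids:
        dataset.samples[sid][uniqueness_field] = scores[sid]
    # Only the latest view is recorded for this key
    dataset.runs[brain_key] = list(sample_ids)


def load_toy_brain_view(dataset, brain_key):
    # Hypothetical analogue of loading a run's view: returns the last view only
    return dataset.runs[brain_key]


def run_on_disjoint_halves(cleanup_on_overwrite):
    ds = ToyDataset(samples={sid: {} for sid in ["a", "b", "c", "d"]})
    scores = {"a": 0.1, "b": 0.9, "c": 0.4, "d": 0.7}  # placeholder values
    toy_compute_uniqueness(ds, ["a", "b"], "uniqueness", scores, cleanup_on_overwrite)
    toy_compute_uniqueness(ds, ["c", "d"], "uniqueness", scores, cleanup_on_overwrite)
    return ds


old = run_on_disjoint_halves(cleanup_on_overwrite=True)
new = run_on_disjoint_halves(cleanup_on_overwrite=False)

print({sid: v.get("uniqueness") for sid, v in old.samples.items()})
# → {'a': None, 'b': None, 'c': 0.4, 'd': 0.7}
print({sid: v.get("uniqueness") for sid, v in new.samples.items()})
# → {'a': 0.1, 'b': 0.9, 'c': 0.4, 'd': 0.7}
print(load_toy_brain_view(new, "uniqueness"))
# → ['c', 'd']
```

With cleanup on overwrite, the second run on the disjoint view wipes the values written by the first run. Without it, the values accumulate across runs, but the recorded run still points only at the last view, which is the trade-off described above.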