
Optimize reload docs when performing bulk writes #3942

Merged
swheaton merged 1 commit into release/v0.23.2 from optimize-reload-docs on Dec 19, 2023


@brimoor (Contributor) commented Dec 19, 2023

When performing dataset._bulk_write(), only reload the docs that were involved in the mutations, if known.
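A minimal sketch of the idea, assuming a hypothetical registry of in-memory documents keyed by ID. The names `Doc`, `DocRegistry`, and `reload_docs` are illustrative only, not FiftyOne's internal API. When the caller knows which IDs a bulk write touched, only those in-memory docs are refreshed; when it passes `None`, every in-memory doc is reloaded, as before this change.

```python
import weakref


class Doc:
    """Hypothetical in-memory document that can be refreshed from storage."""

    def __init__(self, doc_id):
        self.id = doc_id
        self.reload_count = 0

    def reload(self):
        # Stand-in for re-reading this document from the database
        self.reload_count += 1


class DocRegistry:
    """Hypothetical registry of live in-memory docs, keyed by ID."""

    def __init__(self):
        # Weak references: a doc drops out once user code releases it
        self._docs = weakref.WeakValueDictionary()

    def register(self, doc):
        self._docs[doc.id] = doc

    def reload_docs(self, ids=None):
        """Reload in-memory docs and return how many were reloaded.

        ids=None: unknown which docs changed, so reload all of them.
        Otherwise: reload only the in-memory docs whose IDs are listed.
        """
        if ids is None:
            targets = list(self._docs.values())
        else:
            # .get() skips IDs that are not (or no longer) in memory
            targets = [d for d in (self._docs.get(i) for i in ids) if d is not None]
        for doc in targets:
            doc.reload()
        return len(targets)


registry = DocRegistry()
docs = [Doc(f"id{i}") for i in range(5)]
for doc in docs:
    registry.register(doc)

print(registry.reload_docs(ids=["id1", "id3"]))  # 2
print(registry.reload_docs())  # 5
```

Using weak references means the registry never keeps a sample alive on its own, so the cost of a full reload scales with how many docs user code is currently holding.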

@brimoor added the enhancement (Code enhancement) label on Dec 19, 2023
@brimoor requested review from swheaton and a team on December 19, 2023 at 16:18
@brimoor self-assigned this on Dec 19, 2023
@swheaton (Contributor) commented:

Test snippet

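```python
import time

import fiftyone.zoo as foz

ds = foz.load_zoo_dataset("coco-2017", split="validation")
ds.clear_sample_field("metadata")

# Keep references to every sample so their in-memory Sample singletons stay alive
_ = [s for s in ds.iter_samples()]

t1 = time.time()
ds.compute_metadata()
t2 = time.time()
print(t2 - t1)  # elapsed seconds
```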

Timings

  1. 1.5 s: with the singleton Sample instances cleared first
  2. 64 s: on release/v0.23.2
  3. 10 s: on this branch

The speedup becomes more pronounced as the number of samples grows, though, so this change is definitely worth it.
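One way to see why the gap widens with dataset size is a back-of-the-envelope count of reload operations. This toy model rests on an assumption not stated in the PR: that `compute_metadata()` writes its results in fixed-size batches, one bulk write per batch, while all N samples are held in memory as in the snippet above. The `batch_size` of 1,000 below is likewise hypothetical. Reloading every in-memory sample on each bulk write costs about N × (number of batches), which grows quadratically in N; reloading only the samples each batch touched costs N in total.

```python
import math


def reload_counts(num_samples, batch_size):
    """Count doc reloads for a batched bulk-write workload (toy model).

    Assumes all num_samples docs are live in memory and the writes are
    issued in batches of batch_size, one bulk write per batch.
    """
    num_batches = math.ceil(num_samples / batch_size)
    # Before: every bulk write reloads every in-memory doc
    reload_all = num_batches * num_samples
    # After: each bulk write reloads only the docs it touched
    reload_touched = num_samples
    return reload_all, reload_touched


for n in (5_000, 50_000, 500_000):
    before, after = reload_counts(n, batch_size=1_000)
    print(f"{n:>7} samples: {before:>12,} vs {after:>8,} reloads ({before // after}x)")
```

The ratio equals the number of batches, so it grows linearly with N under this model. Wall-clock time also includes the writes themselves, so measured timings need not match these counts.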

```diff
@@ -7586,7 +7588,7 @@ def _add_collection_with_new_ids(
 for old_id, new_id in zip(old_ids, new_ids)
 ]

-dataset._bulk_write(ops, frames=True)
+dataset._bulk_write(ops, ids=[], frames=True)
```

Nice; they're new samples so we don't need to reload any singleton samples
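Passing `ids=[]` only works if an empty list means "reload nothing" while `None` still means "unknown, reload everything". A minimal sketch of why that check has to be `ids is None` rather than a truthiness test; `docs_to_reload` is a hypothetical helper for illustration, not FiftyOne code.

```python
def docs_to_reload(in_memory_ids, ids=None):
    """Return the in-memory IDs to reload after a bulk write (hypothetical).

    ids=None -> unknown which docs changed, so reload all of them
    ids=[]   -> known that no existing in-memory doc changed (e.g. all new docs)
    """
    # `if not ids` would wrongly treat [] like None and reload everything
    if ids is None:
        return set(in_memory_ids)
    return set(in_memory_ids) & set(ids)


live = {"a", "b", "c"}
print(sorted(docs_to_reload(live)))                  # ['a', 'b', 'c']
print(sorted(docs_to_reload(live, ids=[])))          # []
print(sorted(docs_to_reload(live, ids=["b", "z"])))  # ['b']
```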


@swheaton left a comment


Rationale was discussed offline beforehand, so we're agreed on that.
Implementation LGTM; no changes needed, see prior comment.

@swheaton merged commit f0e2c73 into release/v0.23.2 on Dec 19, 2023
7 of 9 checks passed
@swheaton deleted the optimize-reload-docs branch on December 19, 2023 at 23:51
Labels
enhancement Code enhancement
2 participants