Make batch_size an argument in labeler apply function (#209)
Users should be able to control the batch size. This is especially helpful when debugging and prototyping on small datasets: you can see how long a labeler takes to run on, e.g., 10 samples at a time instead of waiting for a full batch of 10k samples to process.
scottfleming authored Apr 16, 2024
1 parent 92fa5f6 commit 6b2f778
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion src/femr/labelers/core.py
@@ -70,6 +70,7 @@ def apply(
  self,
  dataset: datasets.Dataset,
  num_proc: int = 1,
+ batch_size: int = 10_000,
  ) -> List[meds.Label]:
  """Apply the `label()` function one-by-one to each Patient in a sequence of Patients.
@@ -85,7 +86,7 @@ def apply(
  dataset,
  functools.partial(_label_map_func, labeler=self),
  _label_agg_func,
- batch_size=10_000,
+ batch_size=batch_size,
  num_proc=num_proc,
  )
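
The effect of the new argument can be illustrated with a minimal, self-contained sketch. It does not use femr or the `datasets` library: `iter_batches`, `apply_in_batches`, and the toy integer "patients" are hypothetical stand-ins, assumed only for illustration, that mimic passing a caller-chosen `batch_size` through instead of hard-coding 10_000.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def iter_batches(items: Sequence[int], batch_size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices holding at most `batch_size` items each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


def apply_in_batches(
    items: Sequence[int],
    label_fn: Callable[[int], bool],
    batch_size: int = 10_000,  # same default value as in the diff above
) -> Tuple[List[bool], int]:
    """Label every item one batch at a time; return the labels and the batch count."""
    labels: List[bool] = []
    num_batches = 0
    for batch in iter_batches(items, batch_size):
        labels.extend(label_fn(item) for item in batch)
        num_batches += 1
    return labels, num_batches


# Hypothetical toy data: 25 "patients" represented by integer ids.
patient_ids = list(range(25))
labels, num_batches = apply_in_batches(
    patient_ids, lambda pid: pid % 2 == 0, batch_size=10
)
print(len(labels), num_batches)  # 25 labels produced in 3 batches
```

With a small `batch_size`, each batch finishes quickly, so timing or progress feedback arrives after a handful of samples rather than only after a full 10k-sample batch.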
