Refactor SCDL Row Feature Index for Performance Improvement (Rebased) #466

Merged: 10 commits, Nov 25, 2024

@savitha-eng savitha-eng commented Nov 20, 2024

Summary

This PR speeds up feature lookup in SCDL's row feature index, which is done through the .lookup function.

Details

We modify the feature_arr attribute of SCDL's RowFeatureIndex to store dictionaries (structured like {"feat1_name": np.array(values1), "feat2_name": np.array(values2)}) instead of Pandas DataFrames. This significantly speeds up feature lookup when return_features=True and feature_vars is specified in the call to scdl.get_row().
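
To illustrate the data layout described above, here is a minimal sketch of a row feature index that stores one dictionary of NumPy arrays per block of rows. It is not the code from this PR: the class name DictRowFeatureIndex, the cumulative_rows attribute, and the append_features and select_features names are assumptions made for illustration. Only feature_arr and .lookup are taken from the description.

```python
import bisect

import numpy as np


class DictRowFeatureIndex:
    """Hypothetical, simplified stand-in for a row feature index (not SCDL's actual class)."""

    def __init__(self):
        self.cumulative_rows = []  # exclusive end row of each block of rows
        self.feature_arr = []      # one {feature_name: np.ndarray} dict per block

    def append_features(self, n_rows, features):
        """Register a block of n_rows rows that all share the same features."""
        start = self.cumulative_rows[-1] if self.cumulative_rows else 0
        self.cumulative_rows.append(start + n_rows)
        self.feature_arr.append(
            {name: np.asarray(values) for name, values in features.items()}
        )

    def lookup(self, row, select_features=None):
        """Return the feature arrays for the block that contains `row`."""
        if not self.cumulative_rows or not 0 <= row < self.cumulative_rows[-1]:
            raise IndexError(f"row {row} is out of range")
        # Find the first block whose exclusive end is greater than `row`.
        block = bisect.bisect_right(self.cumulative_rows, row)
        features = self.feature_arr[block]
        # Selecting features is a plain dict lookup per name; no DataFrame
        # column indexing happens on each call.
        names = list(features) if select_features is None else select_features
        return [features[name] for name in names]


index = DictRowFeatureIndex()
index.append_features(3, {"feature_name": ["g1", "g2"], "feature_id": ["id1", "id2"]})
index.append_features(2, {"feature_name": ["g3"]})

print(index.lookup(1, select_features=["feature_id"]))  # arrays from the first block
print(index.lookup(4))                                   # arrays from the second block
```

In this sketch each lookup costs one binary search plus one dictionary access per requested feature, which is the kind of per-row work that is cheap with plain dictionaries and comparatively expensive when every call selects columns from a DataFrame.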

Usage

No user-facing changes.

Testing

Added and modified unit tests for RowFeatureIndex and SingleCellMemmapDataset.

Tests for these changes can be run via:

pytest -v sub-packages/bionemo-scdl/tests/bionemo/scdl/index/test_row_feature_index.py
pytest -v sub-packages/bionemo-scdl/tests/bionemo/scdl/io/test_single_cell_memmap_dataset.py

@savitha-eng
Collaborator Author

/build-ci

@savitha-eng savitha-eng marked this pull request as ready for review November 21, 2024 01:06
@savitha-eng savitha-eng requested a review from edawson November 21, 2024 01:06
@jstjohn jstjohn left a comment

Looks good, thanks!

@polinabinder1 polinabinder1 merged commit 11a067a into main Nov 25, 2024
4 checks passed
@polinabinder1 polinabinder1 deleted the savitha/scdl-performance-improvements branch November 25, 2024 23:29