Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[c++/python] Add soma_type argument for SOMAObject::open #2350

Merged
merged 6 commits into from
Mar 29, 2024

Conversation

nguyenv
Copy link
Member

@nguyenv nguyenv commented Mar 29, 2024

Issue and/or context:

SOMAObject::open introduced a performance degradation due to its overzealous usage of tiledb::Object::object, which checks for the type by opening the purpoted TileDB object at the given URI. In some cases, the type of the TileDB object is already known beforehand which makes these operations unnecessary.

The example below takes ~15 seconds to run and yields three 404 errors.

SOMA_URI = "tiledb://TileDB-Inc/soma-exp-tabula-sapiens-immune"
var_columns = ["soma_joinid", "var_id", "ensemblid", "feature_type"]

with tiledbsoma.open(SOMA_URI, context=sctx) as exp:
    var = (
        exp.ms["RNA"]
        .var.read(column_names=var_columns)
        .concat()
        .to_pandas()
        .set_index("soma_joinid")
    )

With the changes in this PR, the example takes ~8 seconds and yields no 404 errors.

Changes:

Bypass the need to use tiledb::Object::object by explcitly passing in a soma_type argument of either SOMAArray or SOMAGroup.

* Bypass the need to use `tiledb::Object::object`, which checks for
  the type by attempting the open the object, by explcitly passing in a
  `soma_type` argument of either `SOMAArray` or `SOMAGroup`
Copy link

codecov bot commented Mar 29, 2024

Codecov Report

Merging #2350 (f8a3f47) into main (7eef2c6) will decrease coverage by 3.24%.
Report is 1 commits behind head on main.
The diff coverage is 61.36%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2350      +/-   ##
==========================================
- Coverage   82.17%   78.94%   -3.24%     
==========================================
  Files          89      140      +51     
  Lines        8226    10856    +2630     
  Branches        0      216     +216     
==========================================
+ Hits         6760     8570    +1810     
- Misses       1466     2186     +720     
- Partials        0      100     +100     
Flag Coverage Δ
libtiledbsoma 67.77% <28.57%> (?)
python 90.59% <91.30%> (+0.02%) ⬆️
r 75.11% <ø> (+0.42%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
python_api 90.59% <91.30%> (+0.02%) ⬆️
libtiledbsoma 49.21% <28.57%> (∅)

@aaronwolen
Copy link
Member

[sc-44157]

Copy link

This pull request has been linked to Shortcut Story #44157: Notebook cloud-query performance, story 1 of 2.

johnkerl
johnkerl previously approved these changes Mar 29, 2024
Copy link
Member

@johnkerl johnkerl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢

@johnkerl
Copy link
Member

@aaronwolen @nguyenv in order to take full advantage of this PR we should be using

with tiledbsoma.Experiment.open(SOMA_URI, context=sctx) as exp:

at the notebook level -- making tiledbsoma.open more performant will require core mods this PR cannot implement.

@johnkerl johnkerl self-requested a review March 29, 2024 14:38
@nguyenv nguyenv dismissed johnkerl’s stale review March 29, 2024 14:43

Found additional 404 error.

Copy link
Member

@johnkerl johnkerl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a missed callsite -- more details in Slack

The root cause is Experiment.open does not have its own impl and neither does CollectionBase.open and so therefore tiledbsoma.Experiment.open's impl is TileDBObject.open (along the inheritance hierarchy) and there is not a soma_type plumbed through to there

@nguyenv nguyenv requested a review from johnkerl March 29, 2024 19:36
Copy link
Member

@johnkerl johnkerl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢
Thanks @nguyenv !!!! :)

@nguyenv nguyenv merged commit 8b03a5f into main Mar 29, 2024
15 checks passed
@nguyenv nguyenv deleted the viviannguyen/pass-in-tiledb-type-to-open branch March 29, 2024 20:18
github-actions bot pushed a commit that referenced this pull request Mar 29, 2024
* Bypass the need to use `tiledb::Object::object`, which checks for
  the type by attempting the open the object, by explcitly passing in a
  `soma_type` argument of either `SOMAArray` or `SOMAGroup`

* Pass in `SOMAGroup` for all Collection types

* Pass in `SOMAArray` for all Array types

---------

Co-authored-by: John Kerl <kerl.john.r@gmail.com>
github-actions bot pushed a commit that referenced this pull request Mar 29, 2024
* Bypass the need to use `tiledb::Object::object`, which checks for
  the type by attempting the open the object, by explcitly passing in a
  `soma_type` argument of either `SOMAArray` or `SOMAGroup`

* Pass in `SOMAGroup` for all Collection types

* Pass in `SOMAArray` for all Array types

---------

Co-authored-by: John Kerl <kerl.john.r@gmail.com>
johnkerl added a commit that referenced this pull request Mar 29, 2024
…2353)

* Bypass the need to use `tiledb::Object::object`, which checks for
  the type by attempting the open the object, by explcitly passing in a
  `soma_type` argument of either `SOMAArray` or `SOMAGroup`

* Pass in `SOMAGroup` for all Collection types

* Pass in `SOMAArray` for all Array types

---------

Co-authored-by: nguyenv <vivian@tiledb.com>
Co-authored-by: John Kerl <kerl.john.r@gmail.com>
johnkerl added a commit that referenced this pull request Mar 29, 2024
…2354)

* Bypass the need to use `tiledb::Object::object`, which checks for
  the type by attempting the open the object, by explcitly passing in a
  `soma_type` argument of either `SOMAArray` or `SOMAGroup`

* Pass in `SOMAGroup` for all Collection types

* Pass in `SOMAArray` for all Array types

---------

Co-authored-by: nguyenv <vivian@tiledb.com>
Co-authored-by: John Kerl <kerl.john.r@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants