SOMA Profiler RFCs #162

beroy · 2023-05-09T17:25:12Z

No description provided.

bkmartinjr · 2023-05-09T18:32:16Z

@beroy - one thing that would help me review this is some additional context and background information on the requirements and definition of success. Questions top of mind:

1. Is the primary use case automating a smoke test to detect regressions as part of CI? Or is the goal to help diagnose/debug regressions once they are detected? Or both?
2. If your goal is a diagnostic tool: my experience is that regressions are often I/O or schema related, or are highly obscured by concurrency issues. I am wondering if a focus on CPU/memory profiling is going to be useful in practice (vs. for example, the built-in TileDB query statistics). Put another way, if your goal is a diagnostic tool, is this the biggest bang for the buck? (very curious to hear what @Shelnutt2 and @gspowley think as I'm sure they have internal tooling that has covered some of this ground).
3. If your goal is detection/smoke testing for regressions, and you think a historical log of past runs will be useful, do you want to capture core DB statistics?
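
To make the "historical log of past runs" idea concrete, here is a minimal sketch of a smoke-test check that compares a new run against the median of earlier runs. Everything in it is an assumption for illustration: the JSON-lines log layout, the `elapsed_s` field, the 25% tolerance, and the minimum of five prior runs are not taken from the RFC.

```python
import json
import statistics
from pathlib import Path
from typing import List


def append_run(path: Path, elapsed_s: float) -> None:
    """Append one run to the history log (one JSON record per line)."""
    with path.open("a") as f:
        f.write(json.dumps({"elapsed_s": elapsed_s}) + "\n")


def load_history(path: Path) -> List[float]:
    """Read the wall-clock times (seconds) of past runs; empty if no log yet."""
    if not path.exists():
        return []
    lines = path.read_text().splitlines()
    return [json.loads(line)["elapsed_s"] for line in lines if line.strip()]


def is_regression(
    history: List[float],
    current: float,
    tolerance: float = 0.25,  # hypothetical threshold: 25% slower than baseline
    min_runs: int = 5,        # hypothetical: need this many runs before judging
) -> bool:
    """Flag a run that exceeds the historical median by more than `tolerance`."""
    if len(history) < min_runs:
        return False  # not enough data to call anything a regression
    baseline = statistics.median(history)
    return current > baseline * (1.0 + tolerance)
```

A CI job could call `load_history`, time the workload, fail the build if `is_regression` returns true, and then call `append_run`. The same record could also carry the core DB statistics discussed in this thread.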

Lastly (minor point) - we should probably discuss if this belongs in the TileDB-SOMA repo, or as a peer repo (motivated by a goal of keeping each repo simpler/focused - i.e., the old monorepo debate :-)

beroy · 2023-05-10T01:26:42Z

@bkmartinjr, thanks a lot for the great comments. We had a design review last week and discussed some of the related issues, but did not cover all the questions here. I'll try to address them here:
1, 2) The main goals are a) finding the hot spots (perf/mem) in our code stack (either Python or R) and reducing the bottlenecks, and b) the ability to detect regressions as part of CI. The generic profiler only enables the detection in (b), and not necessarily diagnosis, as you mentioned. Diagnosis could be a lot more complicated than what software stack frame graphs can provide.
3) That's a great point. I had not thought about it, but thinking about it more, I think it could be very useful if I can do that as a custom profiler! That means we still have the general architecture, but we also have a TileDB profiler that gets attached to the app and collects the TileDB stats. I'm not sure how feasible it is to access the TileDB API from outside a process. In the worst case, we can add an access point or just a flag for doing so.
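
As a rough illustration of goal (a) on the Python side only, the sketch below runs a workload under the standard-library `cProfile` module and reports the functions with the highest cumulative time. It is not the profiler proposed in the RFC: `example_workload` is a hypothetical stand-in for a SOMA call, and `cProfile` only sees Python frames, so time spent inside the C++ layers is attributed to the extension function that was called rather than broken down further.

```python
import cProfile
import io
import pstats
from typing import Callable


def profile_hot_spots(workload: Callable[[], object], top_n: int = 10) -> str:
    """Run `workload` under cProfile; return the top_n entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        workload()
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top_n)
    return out.getvalue()


def example_workload() -> int:
    # Hypothetical stand-in for a real read/query call.
    return sum(i * i for i in range(200_000))


if __name__ == "__main__":
    print(profile_hot_spots(example_workload))
```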

Finally, this repo suggestion seems to come from an agreement between TileDB and CZI for sharing RFCs. @maniarathi can comment here.

bkmartinjr · 2023-05-10T16:49:10Z

Hi @beroy,

> design review last week

Did this review include the TileDB team? Most of the code is in their DB stack (not in SOMA or Census code), and that is also where most of the complexity and performance-sensitive code is (e.g., github.com/TileDB/TileDB and github.com/TileDB/TileDB-Py). The core TileDB team has extensive experience with performance work, and I presume they have a tool stack already in place. I just want to make sure we are filling a gap that exists, and that we benefit from the existing knowledge and tools to build upon.

> TileDB stats

AFAIK, you can't capture them from outside the process, but enabling in-process collection has very little overhead and they are designed to be captured in the normal course of using the DB. Given that you must run the system with your own driver process, I suggest just capturing them as part of that driver process. The total size of the stats summary is tiny and can be written to logs as part of running the core tests.

There are some near-term caveats to this due to our double-instantiation of the core DB. @gspowley and others can talk you through their plans to centralize everything (tactically, just capture stats from both instances, or perhaps even ignore the second if you are focused on read-only perf). The stats API in SOMA is already exposed via tiledbsoma.tiledbsoma_stats_*, and there are equivalent APIs in the tiledb package for the second DB instance.
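
The driver-side capture described above could look roughly like the sketch below. To keep it self-contained, the stats functions are passed in as callables rather than imported: `StatsHooks`, `run_with_stats`, and the `enable`/`reset`/`dump` split are hypothetical names, and the sketch assumes each instance offers some way to enable, reset, and obtain its stats as text. The thread only says the SOMA functions live under `tiledbsoma.tiledbsoma_stats_*` and that the tiledb package has equivalents, so the exact functions and whether a dump returns or prints its text should be checked against the installed versions.

```python
import json
import logging
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class StatsHooks:
    """Callables controlling stats for one embedded TileDB instance (assumed shape)."""
    enable: Callable[[], None]
    reset: Callable[[], None]
    dump: Callable[[], str]  # assumed to return the stats summary as text


def run_with_stats(
    workload: Callable[[], object],
    instances: Dict[str, StatsHooks],
    logger: logging.Logger,
) -> dict:
    """Run `workload` in-process and log timing plus stats from every instance."""
    # Enable and zero the counters on each instance before the workload starts.
    for hooks in instances.values():
        hooks.enable()
        hooks.reset()

    start = time.perf_counter()
    workload()
    elapsed_s = time.perf_counter() - start

    # Each instance keeps its own stats, so collect them separately by name.
    record = {
        "elapsed_s": elapsed_s,
        "stats": {name: hooks.dump() for name, hooks in instances.items()},
    }
    logger.info(json.dumps(record))
    return record
```

In a real driver, one `StatsHooks` entry would wrap the `tiledbsoma` stats functions and a second would wrap the `tiledb` package equivalents, which matches the "capture stats from both instances" suggestion.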

johnkerl · 2023-05-10T17:43:52Z

> @gspowley and others can talk you through their plans to centralize everything

@bkmartinjr the current POC is @nguyenv

beroy · 2023-05-10T18:46:20Z

@bkmartinjr, the TileDB people were not in the review, but I discussed their profiling tools with @gspowley. While their tool captures some of what we need, its main focus is on distributed nodes. Also, my goal is to capture the breakdown across Python/R and C++. Your suggestion about collecting DB stats makes total sense (log them directly from the driver). BTW, by "core DB" do you mean TileDB? Also, I'm not familiar with the double-instantiation plan. I will check with them.

bkmartinjr · 2023-05-10T19:05:25Z

@beroy - I'm using "core DB" and "embedded TileDB" as synonyms.

We end up with two separate instantiations, with their own private memory/context/etc due to the way we are bootstrapping the SOMA C++ layer. As noted by @johnkerl, @nguyenv is POC on this (apologies for earlier misdirect). Using Python as an example (similar issue in R):

- We started out using only TileDB-Py, which has its own linked TileDB core.
- We then introduced the SOMA-specific library (libtiledbsoma) for a narrow set of code paths, which created a second linked C++ shared object, TileDB context, etc. The initial code path was focused on read, and grew to include nnz, etc.
- Over time, we are migrating all functionality to the SOMA C++ layer, and at some future date we should be able to remove the TileDB-Py (and TileDB-R) dependency.

I only point this out as the separate core DB instances have their own stats data.

thetorpedodog · 2023-05-10T19:43:59Z

two non–content-related notes from me (these apply to both open RFCs right now so I am pasting this note in both places):

Rebase so that you get the pre-commit format verification thingy
(Optional) Consider formatting to one sentence per line. I did this in 3b5891c, but I didn’t formally write it down, so it’s not a hard Rule but it is a format I have found useful for editing.

atolopko-czi

A few requests for clarifications and decisions about scope of the RFC.

rfcs/profiler.md

rfcs/images/profilerarchitecture.png

johnkerl · 2023-06-06T18:11:13Z

rfcs/profiler.md

+
+### Benefits
+
+- A major benefit of this design is that the profilers (both generic and custom ones) are not necessarily targeted toward single cell applications and can be used for any services across CZI.


Suggested change

- A major benefit of this design is that the profilers (both generic and custom ones) are not necessarily targeted toward single cell applications and can be used for any services across CZI.

- A major benefit of this design is that the profilers (both generic and custom ones) are not necessarily targeted toward SOMA and can be used for various services.

@beroy please make the requested change

rfcs/profiler.md

Co-authored-by: John Kerl <kerl.john.r@gmail.com>

ebezzi

LGTM. As Andrew suggested, I recommend filing a more detailed implementation/deployment plan as a separate RFC or tech spec.

rfcs/profiler.md

atolopko-czi · 2023-08-02T14:59:09Z

@beroy @johnkerl As we now have a functioning profiler, can we merge the RFC and close this out?

beroy assigned atolopko-czi, bkmartinjr, maniarathi, ebezzi, gspowley and pablo-gar May 9, 2023

beroy force-pushed the profiler_rfc branch from ba26ab5 to a9d3378 Compare May 10, 2023 21:57

SOMA Profiler RFCs

c737e34

beroy force-pushed the profiler_rfc branch from a9d3378 to c737e34 Compare June 5, 2023 17:53

beroy mentioned this pull request Jun 5, 2023

Design doc for the profiler single-cell-data/TileDB-SOMA#1288

Closed

atolopko-czi requested changes Jun 6, 2023

ebezzi reviewed Jun 6, 2023

beroy requested review from gspowley and johnkerl June 6, 2023 16:49

johnkerl requested changes Jun 6, 2023

beroy and others added 2 commits June 6, 2023 12:00

Apply the review comments for profiler RFC

e86db4b

Apply suggestions from code review

ec980b8

Co-authored-by: John Kerl <kerl.john.r@gmail.com>

beroy force-pushed the profiler_rfc branch from df01d3f to ec980b8 Compare June 6, 2023 20:10

ebezzi approved these changes Jun 9, 2023

johnkerl requested review from johnkerl and atolopko-czi June 9, 2023 21:14

johnkerl reviewed Jun 9, 2023

atolopko-czi mentioned this pull request Aug 2, 2023

Update profiler docs for capturing tiledb stats single-cell-data/TileDB-SOMA#1569

Open

atolopko-czi closed this Aug 2, 2023

atolopko-czi reopened this Aug 2, 2023

johnkerl self-requested a review August 2, 2023 15:10

johnkerl approved these changes Aug 2, 2023
