Support workspaces distributed over multiple numa nodes #302

kbowers-jump · 2023-04-28T03:39:59Z

For when one numa-node worth of DRAM just isn't enough.

This also brought lower-level support into existence: general support for multi-numa-node shared memory regions and a cstr API for parsing compactly specified, taskset-cpu-like ulong sequences.

The representation is taskset-friendly and very similar to the one used to specify the tile-to-cpu mapping.
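To make the "taskset-cpu-like ulong sequence" idea concrete, here is a minimal sketch of such a parser. It is not the cstr API added in this PR: the function name `parse_ulong_seq`, its signature, its return convention and the exact syntax it accepts (comma-separated decimal values and inclusive `lo-hi` ranges, no whitespace) are all assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* parse_ulong_seq is a hypothetical helper (not the API from this PR).
   It expands a cstr such as "0-3,8,10-12" into the sequence
   0,1,2,3,8,10,11,12 and writes it to seq[0..seq_max).  Assumed
   syntax: comma-separated decimal values and inclusive lo-hi ranges
   with no whitespace.  Returns the number of values written (0 for
   an empty cstr) or -1 if the cstr is malformed, a range is
   descending, a value overflows or the sequence needs more than
   seq_max entries. */

static long
parse_ulong_seq( char const *    cstr,
                 unsigned long * seq,
                 unsigned long   seq_max ) {
  assert( cstr && ( seq || !seq_max ) );

  long         cnt = 0L;
  char const * p   = cstr;
  if( !*p ) return 0L; /* empty cstr is an empty sequence */

  for(;;) {
    char * end;

    /* Parse the start of the next item (reject signs / whitespace) */
    if( !isdigit( (unsigned char)*p ) ) return -1L;
    errno = 0;
    unsigned long lo = strtoul( p, &end, 10 );
    if( errno ) return -1L;
    unsigned long hi = lo;
    p = end;

    /* Parse the optional "-hi" part of an inclusive range */
    if( *p=='-' ) {
      p++;
      if( !isdigit( (unsigned char)*p ) ) return -1L;
      errno = 0;
      hi = strtoul( p, &end, 10 );
      if( errno || hi<lo ) return -1L;
      p = end;
    }

    /* Append lo..hi inclusive (loop form is safe for hi==ULONG_MAX) */
    for( unsigned long v=lo; ; v++ ) {
      if( (unsigned long)cnt>=seq_max ) return -1L;
      seq[ cnt++ ] = v;
      if( v==hi ) break;
    }

    if( !*p      ) return cnt; /* end of cstr */
    if( *p!=','  ) return -1L; /* unexpected character */
    p++;                       /* skip the comma, parse next item */
  }
}
```

Under these assumptions, `parse_ulong_seq( "0-3,8,10-12", seq, 16UL )` fills `seq` with eight values, which a caller could then treat as a list of numa node or cpu indices.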

kbowers-jump · 2023-05-02T03:11:46Z

First, thanks for your interest! Second, apologies I didn't notice the earlier comment (some people were blocked on this PR and I reflexively clicked merge when I saw it was approved to unblock them ... didn't notice you had commented in the meantime). Third, no objection to the suggestions. As hopefully should be evident from the code already, we rarely met a comment we didn't like (at least if it is accurate and professional).

Overall, we tend to focus most on commenting and documenting in headers during development phases, as that is usually the pointy end of the stick.

E.g. the typical dev pattern is "get code ... look for standalone documentation ... oh there is none ... or ... oh it is incomplete and inaccurate because the orig dev forgot to update the manually maintained off-to-the-side documentation when they added the pressing feature du jour ... look at headers ... okay ... still no documentation but have some idea of what to do ... oh ... there are a bunch of edge cases the orig dev didn't think through because they didn't document the header thoroughly or only tried to use their API in their very bespoke use case ... sigh ... look at code ... discover there are still no comments because the orig dev thought the code was self documenting ... a lot of cursing about the orig dev ..."

And usually the orig dev and the dev are the same person ... just separated in time by a few weeks.

That is, big believer in the notion, IIRC originally attributed to Knuth, 'I don't need to see your code, I just need to see your headers'. And a big believer that developers spend orders of magnitude more time trying to understand and debug already written code than on the physical act of writing (typing) the code. And a big believer that the act of writing detailed header documentation forces devs to think through subtle edge cases, saving a lot of time in future debugging. And a big believer that writing detailed unit tests forces devs to eat their own dog food and do what hardware engineers call "design-for-test". (If the dev can't stand using their own API and/or their API doesn't allow unit tests with good coverage of edge cases to be written, the dev will have to fix the API before inflicting it on others.)

So lots of documentation and commenting is actually a massive productivity boost long term.

To that end, a huge amount of effort goes into the headers, and having the documentation in the same place where the dev is working reduces friction to maintaining documentation. (Even then I still see drift but at a much lower rate, and it is much harder for devs to justify the drift when they were staring at the doc the entire time they were banging away at the code.)

Additionally, the header docs are, not coincidentally, pretty close to being usable by doxygen-like tools to automatically generate standalone docs (manually generated standalone documentation off to the side just doesn't get maintained in my experience ... YMMV).

Once we get outside of the headers, to minimize risk of conflicting information / drift / places to update in the code / etc, we try to avoid redundant comments between implementations, headers and tests.

So, with all that said:

If there are headers that do not have suitable descriptions like you described at the top, that's something we should fix and happy to have independent eyeballs looking at it and pens writing it.
Less concerned about it in other files but not opposed to it. Just want to minimize places where doc drift can occur.
Unit tests usually try to parallel the order things are presented in a header (they are usually written that way ... bring up the header in one window and write the unit test sequentially, ideally giving coverage of all the edge cases / branches / etc of every single thing in the header and clarifying documentation in the process). Unit tests that take longer usually have some logging to help indicate what they are testing, which naturally creates some of the sections you were asking about. No objection to additional one liners like you described in other cases though.
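As a rough illustration of the pattern described above (thoroughly documented header, then a unit test that walks the header top to bottom with a log line per section), here is a minimal sketch. Everything in it is hypothetical: `my_sat_add`, `my_sat_sub`, the `CHECK` macro and `test_my_sat` are made-up names for this example and do not exist in the repository.

```c
#include <limits.h>
#include <stdio.h>

/* "Header" part (what would live in a hypothetical my_sat.h) *********/

/* my_sat_add returns x+y, saturating at ULONG_MAX instead of wrapping.
   Edge cases: any x,y are valid inputs; if the exact sum exceeds
   ULONG_MAX, returns ULONG_MAX. */

static inline unsigned long
my_sat_add( unsigned long x,
            unsigned long y ) {
  unsigned long z = x + y;     /* wraps mod 2^N on overflow */
  return z<x ? ULONG_MAX : z;  /* wrapped iff z<x */
}

/* my_sat_sub returns x-y, saturating at 0 instead of wrapping.
   Edge cases: any x,y are valid inputs; if y>x, returns 0. */

static inline unsigned long
my_sat_sub( unsigned long x,
            unsigned long y ) {
  return x<y ? 0UL : x-y;
}

/* "Unit test" part (what would live in a hypothetical test_my_sat.c).
   Sections appear in the same order as the header and each one logs
   what it is testing. */

#define CHECK(c) do { if( !(c) ) { printf( "FAIL: %s\n", #c ); fail++; } } while(0)

/* test_my_sat returns the number of failed checks (0 on success). */

static int
test_my_sat( void ) {
  int fail = 0;

  puts( "Testing my_sat_add" );
  CHECK( my_sat_add( 0UL,       0UL       )==0UL       );
  CHECK( my_sat_add( 1UL,       2UL       )==3UL       );
  CHECK( my_sat_add( ULONG_MAX, 0UL       )==ULONG_MAX ); /* exactly at the limit */
  CHECK( my_sat_add( ULONG_MAX, 1UL       )==ULONG_MAX ); /* just past the limit */
  CHECK( my_sat_add( ULONG_MAX, ULONG_MAX )==ULONG_MAX ); /* far past the limit */

  puts( "Testing my_sat_sub" );
  CHECK( my_sat_sub( 3UL, 2UL       )==1UL );
  CHECK( my_sat_sub( 2UL, 2UL       )==0UL ); /* exactly at the limit */
  CHECK( my_sat_sub( 0UL, 1UL       )==0UL ); /* just past the limit */
  CHECK( my_sat_sub( 0UL, ULONG_MAX )==0UL ); /* far past the limit */

  return fail;
}
```

Writing the test next to the header tends to expose cases the comment forgot to mention (here, what happens exactly at the saturation boundary), which is the point at which the header comment gets clarified.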

TL;DR

Yep

Mabubbbbbbb334 · 2024-05-01T23:37:55Z

src/util/cstr/test_cstr.c

@@ -47,7 +47,7 @@ main( int     argc,
  fd_rng_t _rng[1]; fd_rng_t * rng = fd_rng_join( fd_rng_new( _rng, 0U, 0UL ) );

  int ctr = 0;
-  for( long iter=0; iter<10000000; iter++ ) {


Best regards

1000544753235

kbowers-jump added 3 commits April 27, 2023 21:47

API for parsing cstrs into ulong sequences

68a99b6

The representation is taskset-friendly and very similar to the one used to specify the tile-to-cpu mapping.

Added ability to specify shared memory regions that are distributed over multiple numa nodes

7aa862d

Plumbed through to support for multi-numa workspaces too

ca3674b

asiegel-jt approved these changes Apr 28, 2023

kbowers-jump added this pull request to the merge queue Apr 28, 2023

Merged via the queue into main with commit 6c531eb Apr 28, 2023

kbowers-jump deleted the kbowers-jump/multi-numa-shmem-regions branch April 28, 2023 16:32

ripatel-fd mentioned this pull request May 4, 2023

Add README.md to ballet/disco/funk/tango #320

Closed

Mabubbbbbbb334 reviewed May 1, 2024
