Fix #928 and #929 - Modularize software bus routing, add msg map hash #947

skliper · 2020-10-13T21:40:59Z

Describe the contribution
Fix #928 - refactor SB and move routing to a module
Fix #929 - adds message map hash implementation
Partial #948 - fixed double lock potential

Key concepts:

added SBR module
Message map and routing table are in SBR, access APIs on SB side
SB still owns destination logic
no more route index or pointers being passed around, all route id and message id based
hash message map "oversized" (4x) to minimize collisions (only ~10% single collisions on the real system with a full routing table)
- added debug event for collisions during add
dropped routing push/pop, dropped "key" in direct implementation
around a 10-20% performance hit to hash via rough comparison on a linux box, no memory impact from message id "space", just based on needed routes (with overhead)
hash designed for 32 bit, a change in CFE_SB_MsgId_Atom_t size may require implementation updates

Remaining work:

Unit test stubs (could split into new PR), 2 related TODOs in code

Also

Deletes unused code CFE_SB_FindGlobalMsgIdCnt
Note writes to file not protected by lock (before or after refactor)
CFE_SB_SendPrevSubs isn't really protected by the lock... might be worth removing the locks completely (they don't really solve anything, double lock potential eliminated
- could just use the private CFE_SBR_ForEachRouteId without a lock instead of messages and reduce all the extra logic
Whitespace/alignment is very difficult to work with
Excessive comments (in my opinion)
violates declare variables at the start of function coding standard, all observed violations fixed

Testing performed

All unit tests pass, rough performance testing, spot checked routing table size impacts on core with nominal (256), 64, and 32. 32 causes dropped subscriptions/full table and was useful for observing collisions (~10% one level seen) on a realistic set of message ids. Everything performed as expected.
Passed bundle CI at https://travis-ci.com/github/skliper/cFS/builds/191437587

Expected behavior changes

Individual events for deleting destinations when deleting a pipe removed to avoid a race condition
It's resource intensive to try to report message ID map in message ID order (either search or brute force) for any significantly sized map on an unsorted routing table, allowing out of order makes this reasonable again and gets rid of the need for ForEachMsgId API. Need to trade out of order vs implementing a search.

System(s) tested on

Hardware: cFS Dev Server
OS: Ubuntu 18.04
Versions: bundle + this commit

Additional context
None

Third party code
None

Contributor Info - All information REQUIRED for consideration of pull request
Jacob Hageman - NASA/GSFC

skliper · 2020-10-13T21:41:36Z

Ready for proof of concept discussion...

skliper · 2020-10-14T16:58:58Z

CCB 2020-10-14: Delayed discussion, will splinter or discuss next week.

skliper · 2020-10-14T17:02:03Z

Worth noting this is a step in the direction of implementing just the direct lookup (no longer has the extra abstraction layers for other implementations). The hash implementation will have the extra hash layer, but no need to maintain it in the direct lookup implementation. This results in simplification of the direct lookup implementation.

jphickey

Overall, looks pretty good ... nice cleanup/simplification to SB.

Most critical is really to get the API right, then the rest of the implementation can fall from that.

I'd also like to seriously consider changing the list structure to be circular in nature (where the end of the list points back to the beginning again, rather than NULL) as this makes the code much simpler.

fsw/cfe-core/src/sb/cfe_sbr.h

fsw/cfe-core/src/sb/cfe_sb_priv.c

skliper · 2020-10-15T16:07:32Z

Trade removing the individual unsubscribe events when deleting a pipe to avoid having to unlock which introduces a possible race (old implementation unlocked so could race). Would be messy to store all the messages up to send at the end after the unlock, other suggestions welcome.

skliper · 2020-10-16T17:30:23Z

Implementation/testing update - add a debug event for collisions (on subscribe), added hash implementation.

Enabled hash and there were no collisions with the nominal routing table size. Reduced table size to 32 and there were only 4 single level collisions so the factor of 4 on the hash table seems very reasonable for handing a full routing table (and the hash performed well). The messages past 32 (there's more like 40 or so in the framework) got rejected as expected.

Performing minor cleanup on the test cases (fails for size 32 routing table). Some event counts could theoretically change due to the new debug message but none seen in practice. Could switch subscribe test event count asserts to be >= to handle it, but as long as the hash function isn't changed and the routing table isn't incredibly small (16 or less) they will work fine.

skliper · 2020-10-20T13:56:12Z

Brute force ForEachMsgId is unreasonable for any significantly large MsgId space for the unsorted routing table implementation. Search would also be resource intensive for larger routing tables... only behavior difference from ForEachRouteId is order, propose removing ForEachMsgId (only used for message map reporting, could be sorted offline if needed). Note for sorted routing tables, the behavior of the two APIs is the same (in msgid order).

EDIT - specified this difference is only for unsorted implementation

excaliburtb · 2020-10-20T16:28:39Z

tagging self

- Removed all index/pointer accesses of message map and routing table - All references now via IDs and APIs - Note SB still owns destination logic (unchanged linear linked list) - Limited whitespace fixes for readability - Resolved observed instances of variables not declared at the start of functions - Cleaned comments - Resolved potential double locks in CFE_SB_SendPrevSubs - Route and message write to file no longer guaranteed in msgid order to maintain performance for large msgid space implementations - Removed unused CFE_SB_FindGlobalMsgIdCnt - Clarified CFE_PLATFORM_SB_MAX_MSG_IDS config param description - Eliminated potential race in CFE_SB_PipeDeleteFull - Individual destination removal debug events no longer reported during a CFE_SB_PipeDeleteFull

skliper · 2020-10-21T13:54:30Z

Refactored commits for review. Ready.

- Implementation for direct map and unsorted routing table - Includes full coverage tests - Removed msg key and route stack concepts from direct map

- Message map size based on used routes - Oversized (4x) to limit collisions while retaining resonable size related to routing table (still smaller) - ~10% single collisions seen for full routing table with realistic message ID use - Oversizing means map can never fill, simplifies logic - Observed approximately 10%-20% performance hit, trade against memory use (can now use full 32 bit MsgId space) - Hash intended for 32 bit, if CFE_SB_MsgId_Atom_t size changes may require modification to hash - Also added full coverage unit tests

- Linking SBR with SB unit test, not stubbed - Confirms matching functionality (with updates for intended changes)

skliper · 2020-10-21T14:03:48Z

Squashed cppcheck warning.

astrogeco · 2020-10-21T16:59:14Z

CCB 2020-10-21 APPROVED

Conversation about collisions. System can be sized appropriately to avoid collisions. Users can deterministically predict the bound by appropriately choosing the MsgIDs. There's a debug event upon subscription which alerts about collisions.

Open issue to add documentation about ideal usage and predetermining IDs to avoid collisions.

Because it's a module, users can replace the algorithm for efficient. QUESTION: How is this set up in the build system?

Make separate PR for stubs.

skliper · 2020-10-21T17:28:42Z

Issues submitted, see #961 and #962.

ghost · 2020-10-26T16:23:13Z

modules/sbr/src/cfe_sbr_map_hash.c

+ * Note: algorithm designed for a 32 bit int, changing the size of
+ * CFE_SB_MsgId_Atom_t may require an update to this impelementation
+ */
+CFE_SB_MsgId_Atom_t CFE_SBR_MsgIdHash(CFE_SB_MsgId_t MsgId)


Should an internal helper function be static?

Certainly could be. I'm not consistently strict about static since they are occasionally useful from a coverage test point of view, but it's not necessary in this case. Happy to change if you want to put in an issue.

ghost

Code looks good.
I'm a little less clear about the organization. We have the fsw/cfe-core directory with the cFE subsystems, then we have the modules directory that has the msg, sbr, cfe_assert, cfe_testcase, and cfe_testrunner.

In the mission_defaults.cmake we have cfe-core, osal, PSP, msg, and sbr. Seems like msg and sbr are peers to the cfe-core subsystems, (sb, es, tbl, eve, time) or maybe even subsystems of SB. I know mission_defaults.cmake does not define the architecture, but it leaves the impression to me that these modules are peers to osal, psp, etc.

The routing module has a common file and APIs, then it has optional (hash or direct) modules. Is the intention to allow the replacement of everything with a different implementation? Or would the common files (in unsorted.c) always be used? If these common functions in unsorted will always be used, I would argue for leaving them in the SB directory and having modules for hash and direct routing.

In the end, I think we need a big reorganization of the code to reflect the evolving nature of how things go together. But at a minimum I think it would be nice at least put the sbr and msg modules in the "fsw" directory next to "cfe-core":
Then we would have:
fsw/
fsw/cfe-core
fsw/modules
fsw/modules/sbr
fsw/modules/msg

or maybe even fsw/cfe-core/modules/...

skliper · 2020-10-27T18:50:13Z

The routing module has a common file and APIs, then it has optional (hash or direct) modules. Is the intention to allow the replacement of everything with a different implementation?

Yes. That's the real difference from the core, the module can be completely replaced with a custom implementation.

Or would the common files (in unsorted.c) always be used?

Not if someone implements a sorted routing table with a search instead of a message map.

In the end, I think we need a big reorganization of the code to reflect the evolving nature of how things go together.

Agreed! opened #972 to discuss. I like the msg/sbr modules and core being peers, and eventually all the cfe elements could become modules (someone could entirely replace table services or time services for example). It's been an as-needed process so far though.

skliper added the CCB:Ready Ready for discussion at the Configuration Control Board (CCB) label Oct 13, 2020

skliper force-pushed the fix928-route-direct-module branch from 48abc44 to 80bce1a Compare October 13, 2020 22:09

astrogeco added the CCB-20201014 label Oct 14, 2020

skliper requested review from a user and jwilmot October 14, 2020 17:02

jphickey reviewed Oct 14, 2020

View reviewed changes

fsw/cfe-core/src/sb/cfe_sbr.h Outdated Show resolved Hide resolved

fsw/cfe-core/src/sb/cfe_sbr.h Outdated Show resolved Hide resolved

fsw/cfe-core/src/sb/cfe_sb_priv.c Show resolved Hide resolved

skliper mentioned this pull request Oct 14, 2020

Eliminating recursive locks in SB could allow for using more efficient resource #948

Closed

astrogeco removed the CCB:Ready Ready for discussion at the Configuration Control Board (CCB) label Oct 15, 2020

skliper mentioned this pull request Oct 16, 2020

Convert software bus destination linked list to circular #953

Open

skliper force-pushed the fix928-route-direct-module branch from 8fa5fc1 to fa8a216 Compare October 19, 2020 13:56

skliper added the CCB:Ready Ready for discussion at the Configuration Control Board (CCB) label Oct 20, 2020

skliper force-pushed the fix928-route-direct-module branch from eed8c3a to 7bb7585 Compare October 21, 2020 13:52

skliper changed the title ~~(WIP) Fix #928 - Modularize software bus routing~~ Fix #928 and #929 - Modularize software bus routing, add msg map hash Oct 21, 2020

skliper marked this pull request as ready for review October 21, 2020 13:55

skliper added 3 commits October 21, 2020 10:01

Fix nasa#928, Added software bus routing module

0aac36a

- Implementation for direct map and unsorted routing table - Includes full coverage tests - Removed msg key and route stack concepts from direct map

Fix nasa#928, Update SB unit test for SBR module with direct link

9b56d52

- Linking SBR with SB unit test, not stubbed - Confirms matching functionality (with updates for intended changes)

skliper force-pushed the fix928-route-direct-module branch from 7bb7585 to 9b56d52 Compare October 21, 2020 14:03

astrogeco added the CCB-20201021 label Oct 21, 2020

skliper requested a review from a user October 21, 2020 17:16

This was referenced Oct 21, 2020

Update documentation for message map hash implementation #961

Closed

Add stubs for software bus routing module APIs and update SB unit tests #962

Closed

skliper mentioned this pull request Oct 21, 2020

Remove CFE_SB_SendPrevSubs in favor of CFE_SBR_ForEachRouteId API #963

Open

astrogeco removed the CCB:Ready Ready for discussion at the Configuration Control Board (CCB) label Oct 21, 2020

astrogeco changed the base branch from main to integration-candidate October 21, 2020 22:41

astrogeco added the IC-20201027 label Oct 21, 2020

astrogeco merged commit 12f2ff1 into nasa:integration-candidate Oct 21, 2020

astrogeco added a commit to nasa/cFS that referenced this pull request Oct 21, 2020

Add nasa/cFE#947

a13b509

ghost reviewed Oct 26, 2020

View reviewed changes

skliper mentioned this pull request Oct 27, 2020

Refactor directory structure so it makes sense again (fsw contains non-fsw) #972

Closed

astrogeco mentioned this pull request Oct 28, 2020

Integration Candidate 2020-10-27 #975

Merged

skliper deleted the fix928-route-direct-module branch February 1, 2021 22:08

skliper added this to the 7.0.0 milestone Sep 24, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix #928 and #929 - Modularize software bus routing, add msg map hash #947

Fix #928 and #929 - Modularize software bus routing, add msg map hash #947

skliper commented Oct 13, 2020 •

edited

Loading

skliper commented Oct 13, 2020

skliper commented Oct 14, 2020 •

edited by astrogeco

Loading

skliper commented Oct 14, 2020

jphickey left a comment

skliper commented Oct 15, 2020

skliper commented Oct 16, 2020

skliper commented Oct 20, 2020 •

edited

Loading

excaliburtb commented Oct 20, 2020

skliper commented Oct 21, 2020

skliper commented Oct 21, 2020

astrogeco commented Oct 21, 2020 •

edited

Loading

skliper commented Oct 21, 2020

ghost Oct 26, 2020

skliper Oct 28, 2020

ghost left a comment

skliper commented Oct 27, 2020

Fix #928 and #929 - Modularize software bus routing, add msg map hash #947

Fix #928 and #929 - Modularize software bus routing, add msg map hash #947

Conversation

skliper commented Oct 13, 2020 • edited Loading

skliper commented Oct 13, 2020

skliper commented Oct 14, 2020 • edited by astrogeco Loading

skliper commented Oct 14, 2020

jphickey left a comment

Choose a reason for hiding this comment

skliper commented Oct 15, 2020

skliper commented Oct 16, 2020

skliper commented Oct 20, 2020 • edited Loading

excaliburtb commented Oct 20, 2020

skliper commented Oct 21, 2020

skliper commented Oct 21, 2020

astrogeco commented Oct 21, 2020 • edited Loading

skliper commented Oct 21, 2020

ghost Oct 26, 2020

Choose a reason for hiding this comment

skliper Oct 28, 2020

Choose a reason for hiding this comment

ghost left a comment

Choose a reason for hiding this comment

skliper commented Oct 27, 2020

skliper commented Oct 13, 2020 •

edited

Loading

skliper commented Oct 14, 2020 •

edited by astrogeco

Loading

skliper commented Oct 20, 2020 •

edited

Loading

astrogeco commented Oct 21, 2020 •

edited

Loading