Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design Change of Tag #391

Closed
RalphSteinhagen opened this issue Aug 8, 2024 · 3 comments
Closed

Design Change of Tag #391

RalphSteinhagen opened this issue Aug 8, 2024 · 3 comments
Assignees

Comments

@RalphSteinhagen
Copy link
Member

Add frequently used Tag Infos as strongly-typed fields or as part of a bit-field mask

In GR4 we use Tag as a standardised form to annotate streams, indicate trigger, context changes, or settings changes that should be downstream propagated within a graph. The initial design assumption was that Tags are rare compared to the payload stream frequency they are attached to (e.g. one tag every ~1024 samples or more). Pushing the envelope a bit further and publishing, for example (worst case), a tag on every float sample, we noticed that the event handling rate goes down to ~300 kMsgs/s. Semën Lebedev wrote a new performance monitor block that can be inserted anywhere in the graph and track the throughput performance and memory consumption: #358

For the typical use case, the given Tag[Source, Monitor, Sink] graph topology (blocks are non-trivial) can be run up to about 500 MS/s (and Tags roughly every 1k samples) but drops down to about 30 kMsg/s when there is a tag attached to every sample. This is probably acceptable for most real-world use cases. Still, we felt that this could be further improved as the main bottleneck turned out to be the access of Tag meta information in the property_map (which internally is a std::map<std::string, pmt::pmtv>) and often just to check if there is an end-of-stream (EoS) tag in the stream. The frequent map accesses make the processing slow.

Thus, we tinkered and played with the idea of modifying the existing Tag definition from:

struct alignas(hardware_constructive_interference_size) Tag {
    using signed_index_type = std::make_signed_t<std::size_t>;
    signed_index_type index{0};
    property_map      map{};
// ...
};

to a slightly larger structure where certain common fields and key-value pairs are hard-coded, namely:

using property_map = std::map<std::string, pmt::pmtv>;

inline constexpr std::size_t kTagEndOfStream          = 1UZ;
inline constexpr std::size_t kTagTrigger              = 2UZ;
inline constexpr std::size_t kTagEmptyMap             = 4UZ;
inline constexpr std::size_t kTagContext              = 8UZ;
inline constexpr std::size_t kTagAutoForwardParameter = 16UZ;
// ...

struct alignas(std::hardware_constructive_interference_size) Tag { // 192 bytes
    using index_type = std::size_t; // N.B. upcoming change from signed->unsigned (to fix UB)
    index_type   _index{0};
    std::size_t   mask;           // new: bit-mask storing common states, definitions see above
    std::uint64_t trigger_time;   // new, was earlier part of map (parsing overhead)
    float         trigger_offset; // new
    pmt::pmtv     trigger_name;   // new alt: std::string
    pmt::pmtv     ctx;            // new, was earlier part of map (parsing overhead)
    property_map  map{}; // as before
};

The rationale behind the proposed changes is to minimise the access times to the using property_map = std::map<std::string, pmt::pmtv>; that are non-negligible in case there is a tag, for example, on every sample. See for instance, https://quick-bench.com/q/P7GaIXJLAJfnhCUT_29J6J7S-LY

The other update is how to generalise the tag forwarding using meta-programming. We can provide default implementations for the first ones:

enum class TagPropagationPolicy {
    TPP_DONT       = 0, /*!< Scheduler doesn't propagate tags from in- to output. The block itself is free to insert tags. */
    TPP_ALL_TO_ALL = 1, /*!< Propagate tags from all in- to all outputs. The scheduler takes care of that. */
    TPP_ONE_TO_ONE = 2, /*!< Propagate tags from n. input to n. output. Requires same number of in- and outputs */
    TPP_CUSTOM     = 3  /*!< Like TPP_DONT, but signals the block, it should implement application-specific forwarding behaviour. */
};

But we hit too many corner cases for the last one. Meta-programming read: C++ only, and/or it would be hard to wrap/write reasonably performing Python wrappers for it. We can post the exact signature later, but in a nutshell, it's similar to the for-each-port method, but it provides both input and output streaming ports.

I hope the design above is reasonable. There are the following main items to be discussed:

  1. While the bit-mask should be used sparingly (for future upgrade options), are there other high-frequency standard tag types that should be defined right away?
  2. We suggest to also explicitly add the trigger tuple information for type-safety, access speed and to ensure that always all three variants are declared (we discovered that it's easy to miss/mistype/forget one of them).
    • Sub-question: trigger_name is presently a std::string but we thought of also allowing a pmt::pmtv type for a more economic, for example, int-based comparisons.
  3. We suggest also explicitly hard-coding the context ctx as key, as this drives a lot of the multiplexing and settings updates. Similar open question to the trigger: std::string and/or pmt::pmtv.
@RalphSteinhagen
Copy link
Member Author

After a design iteration with @ivan-cukic, we have settled on the following revised Tag definition (see full compiler-explorer example here):

struct alignas(std::hardware_constructive_interference_size) Tag { 
    // size: 192 bytes (N.B. cache-aligned)
    using index_type = std::size_t;
    using property_map = std::map<std::string, pmt::pmtv>;  // 48 bytes

    index_type _index{0};  // 4 or 8 bytes

    bool isEndOfStream : 1 = false;
    bool isTrigger : 1 = false;
    bool hasContextDefined : 1 = false;
    std::size_t _unused : (sizeof(std::size_t) * 8) - 3 = 0; // 4 or 8 bytes with the above
    // add more bitfields as needed

    std::uint64_t trigger_time{0UZ}; // 8 bytes
    float trigger_offset{.0f}; // 4 bytes
    pmt::pmtv trigger_name;
    pmt::pmtv ctx;

    property_map map{};

    constexpr void clearFlags() noexcept {
        isEndOfStream = false;
        isTrigger = false;
        hasContextDefined = false;
        _unused = 0;
    }

    constexpr void clear() noexcept {
        clearFlags();
        map.clear();
        trigger_time = 0UZ;
        trigger_offset = .0f;
        trigger_name = std::monostate{};
        ctx = std::monostate{};
    }
};

This Tag struct is copyable, movable, destructible, and explicitly cache-aligned (192 bytes on most 64-bit machines). The pmt::pmtv type is used for trigger_name and ctx to facilitate faster comparisons, typically leveraging integer-based operations, rather than minimizing the size of the Tag struct.

@RalphSteinhagen
Copy link
Member Author

There is a test-branch implementing this feature.
Initial perf tests using the one-tag-per-sample didn't improve. The assumed tag-processing bottleneck may be somewhere else.

Will leave this open for some time for others to give it a try/diagnose the real bottlenecks and then close it otherwise.

@RalphSteinhagen
Copy link
Member Author

I will close this as it is idle and unlikely to be followed up (the corresponding branch is lagging too much behind).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: QA-Accepted/Merged (∞)
Development

No branches or pull requests

1 participant