[#IC-87] Dismantle durable functions #169
LGTM
For the [REDACTED] project we need to collect several metrics that can tell us how much time elapses between a message reaching the
Ideally: the time spent in each step of the message path. We need this because we will probably need some priority mechanism to deliver messages coming from "critical" services, but currently we don't know the average time needed to deliver a message (inbox + push; email does not matter at the moment). If the average time is under, let's say, 10 seconds even under peak load, we probably won't need a refactor at all. However, this refactor cuts out some previously tracked events that could have helped here. I've read the IO-RFC with the analysis of probable bottlenecks and, as far as I understood, the main ones are the storage queues (am I missing something, @BurnedMarshal?). Another option to get some insight could be to monitor the length of the queues currently in use. What do you think?
We can reintroduce such events with little effort. Please keep in mind that these events are (and were) sampled, so you can easily quantify the percentage of processed/failing/etc. messages, but you cannot trace a single message through the entire flow. What we might want to achieve is some form of transactional tracing, so that we can follow (every? some?) messages through each step from the sender to the citizen's inbox.
No, they weren't: a specific flag was introduced by @infantesimone to avoid sampling those events. Anyway, it's OK even if they're sampled, since what we want here is the average timing. The query on application insights would
@pasqualedevita currently we have two custom AI events:
do you think it's possible to get the average time difference (
AFAIK we had empirical evidence that sampling was applied anyway, but I may be wrong.
I think it's possible (though not simple), but using Application Insights may not be accurate because of sampling, and we don't know whether sampling can be disabled for custom events. As an alternative, I propose sending the relevant events to a Kafka queue (Event Hub). This way you can read the queue and build all the KPIs you need in PDND dashboards (filter by sender service, priority, average, percentile distribution, etc.). PS: I think we are under 10 s; the bottleneck is probably only fn-pushnotification now.
If you group by messageId, sampling shouldn't matter: when an event has been sampled out, one of the two values for that message would simply be undefined, so you just have to filter out the undefined entries.
Great!
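The grouping approach suggested in the thread (group the two custom events by messageId, drop messages where one side is missing, then average the differences) can be sketched as below. This is a minimal sketch over events already exported from Application Insights, not the project's actual query: the `TelemetryEvent` shape and the event names passed as `startEvent`/`endEvent` are hypothetical, since the real custom event names are not given here.

```typescript
// Hypothetical shape of an exported custom event; the real event names and
// fields used by the project are not stated in this thread.
interface TelemetryEvent {
  readonly name: string;      // custom event name (hypothetical)
  readonly messageId: string; // key used to pair the two events
  readonly timestamp: number; // epoch milliseconds
}

// Start/end timestamps collected for a single message.
type Span = { start?: number; end?: number };

// Average elapsed time (ms) between startEvent and endEvent, per message.
// Messages missing either event (e.g. because it was sampled out) are skipped,
// which is the "filter out undefined entries" step.
function averageElapsedMs(
  events: ReadonlyArray<TelemetryEvent>,
  startEvent: string,
  endEvent: string
): number | undefined {
  const byMessage = new Map<string, Span>();
  for (const e of events) {
    const span: Span = byMessage.get(e.messageId) ?? {};
    if (e.name === startEvent) {
      span.start = e.timestamp;
    } else if (e.name === endEvent) {
      span.end = e.timestamp;
    } else {
      continue; // unrelated event, ignore it
    }
    byMessage.set(e.messageId, span);
  }

  const deltas: number[] = [];
  byMessage.forEach(({ start, end }) => {
    if (start !== undefined && end !== undefined) {
      deltas.push(end - start);
    }
  });

  // No complete pairs: the average is undefined rather than 0.
  if (deltas.length === 0) {
    return undefined;
  }
  return deltas.reduce((sum, d) => sum + d, 0) / deltas.length;
}
```

Because only messages with both events contribute, a sampled-out event shrinks the sample size but does not skew the average, assuming sampling is independent of delivery time.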
Motivation and Context
Following #168, we remove unused code.
List of Changes
How Has This Been Tested?
Screenshots (if appropriate):
Types of changes
Checklist: