Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: update num-msgs archive metrics every minute and not only at the beginning #2287

Merged
merged 2 commits into from
Dec 14, 2023

Conversation

Ivansete-status
Copy link
Collaborator

@Ivansete-status Ivansete-status commented Dec 13, 2023

Description

This issue was reported/detected by @richard-ramos
There was a problem that the archive metrics (number of archived messages) were not properly updated regularly.

With this, we update the metrics every minute

@Ivansete-status Ivansete-status marked this pull request as ready for review December 13, 2023 16:00
Copy link

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:2287

Built from c2f34ca

self.retPolicyFut = self.loopApplyRetentionPolicy()
self.retMetricsRepFut = self.loopReportStoredMessagesMetric()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, how are these started? Don't they require the await keyword at some point?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question!
This can be considered a "normal" proc that starts an infinite loop inside. This proc is marked with the {.async.} pragma, which in turn, makes the proc return a Future[void], implicitly. And this returned Future[void] needs to be either attended or discarded.

afaik, there are two ways to attend a "future": keep a reference of it or await it immediately. And in this case, we keep a reference that is used to cancel the future returned. In this case, the returned future is directly linked with the internal await sleepAsync(.. call, which in turn conforms a cancellation point.

Therefore, when we are cancelling that future, we are forcing the code to break and goto to end that proc call, and there is no need to perform an explicit break of the infinite loop. Notice that this "cancellation point" feature is given by our nim-chronos library.

I think this should be the standard way to proceed in such infinite periodic tasks.

Nevertheless, @arnetheduck can shed more light on this and provide a more accurate reasoning.

note: in case I didn't suggested before, this is a great explanation of how async works in Nim: https://peterme.net/asynchronous-programming-in-nim.html (I revisit it now and then ;P)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if I understood correctly, this line self.retMetricsRepFut = self.loopReportStoredMessagesMetric() implicitly runs a waitFor?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if I understood correctly, this line self.retMetricsRepFut = self.loopReportStoredMessagesMetric() implicitly runs a waitFor?

self.retMetricsRepFut = self.loopReportStoredMessagesMetric() <- This line should be understood as a way of starting an async and endless task.

waitFor is a mechanism by which the thread gets blocked until a certain Future is completed. waitFor should only be used in the upper layers of every app to prevent deadlocks, and by "upper layer" I mean the procs that are at the same level as the "main" module.

Looking at the waitFor implementation it can be seen that it simply waits synchronously (doesn't have the {.async.} pragma) for the future to complete while calling poll(), which is responsible for the dispatcher to progress and attend all the timers, network events, and async tasks: https://github.com/status-im/Nim/blob/3c9b68dc157804885b14a1984efc25e8b7cc861d/lib/pure/asyncdispatch.nim#L1958-L1963

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, okay then. The piece I'm missing is:
Why, if just running self.loopReportStoredMessagesMetric() starts the async task, does the compiler tell you you need to (discard) await it?
Or does it only start the async task when the rvalue of that is assigned to a Future variable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or does it only start the async task when the rvalue of that is assigned to a Future variable?

Good question! I think the answer is "yes". I think it only starts when the future is consumed either with await or assigned. Is just an assumption, I don't know 100% sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would actually make sense with all other parts... I was making some assumptions from being used to Python's async 😅

Copy link
Contributor

@AlejandroCabeza AlejandroCabeza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can gather, LGTM
Added a couple comments for my own understanding of the big picture :)

Copy link
Contributor

@gabrielmer gabrielmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@Ivansete-status Ivansete-status merged commit 0fc617f into master Dec 14, 2023
11 of 12 checks passed
@Ivansete-status Ivansete-status deleted the fix-report-archive-metrics branch December 14, 2023 16:00
Copy link
Contributor

@NagyZoltanPeter NagyZoltanPeter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Nice one!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants