Output request queue time for Django #55
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master      #55      +/-   ##
==========================================
+ Coverage   54.63%   55.66%   +1.02%
==========================================
  Files          11       11
  Lines         410      424      +14
==========================================
+ Hits          224      236      +12
- Misses        186      188       +2
Continue to review full report at Codecov.
HireFire includes a strategy for scaling web dynos based on Heroku request queue times. This change adds a second Django middleware with support for queue time scaling per the official docs: https://help.hirefire.io/article/49-logplex-queue-time. I've verified that it works correctly end-to-end as a dependency in a WIP branch of a production application (closed source, so I can't link to it).
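For context, here's a minimal sketch of the general approach (not the exact code in this PR; the class name is a placeholder and the reporting step is elided). The Heroku router sets the X-Request-Start header to the request's arrival time in milliseconds since the epoch, and the middleware compares that against its own clock:

import time


class QueueTimeSketchMiddleware:
    # Hypothetical middleware, for illustration only.
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        start_header = request.META.get("HTTP_X_REQUEST_START")
        if start_header:
            now_ms = int(time.time() * 1000)
            try:
                request_start_ms = int(start_header)
            except ValueError:
                request_start_ms = None
            if request_start_ms is not None:
                # Clamp at zero so clock drift between the router and the dyno
                # cannot produce a negative queue time (see the discussion below).
                queue_time_ms = max(now_ms - request_start_ms, 0)
                # The value would then be buffered/reported for HireFire as
                # described in the linked docs; that step is omitted here.
        return self.get_response(request)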
This is neat. I don't use the HireFire service anymore for my projects (though I do still use this package to get metrics), but this seems like a cool way to scale the web dynos.
Are you up for maintaining the code in this PR for this package? I don't plan on using it, so I won't easily be able to tell if something's not working with it.
    now_timestamp_ms - request_start_timestamp_ms
    if now_timestamp_ms >= request_start_timestamp_ms
    else 0
)
Does HireFire not understand clock drift? It seems pretty apparent to me that this is throwing away data, and I hate to see that happen.
No, I believe that, unfortunately, the HireFire service will happily ingest a negative queue time value, which affects the aggregate. I think that's less of an issue when scaling on a percentile (e.g. we use the 95th percentile queue time in production), but with an average, it could drag the reported average queue time below zero. Since HireFire has a minimum value of 1ms for the up/down threshold settings, that seems a bit problematic.
Arguably, some smarts could be added here: track the smallest negative value encountered and add its absolute value as an offset to all future timestamps from the current dyno. I had hoped to keep things simple, though, and I don't think there's a perfect solution in any case, since the system clock on a dyno can always drift relative to the clock on the Heroku router. Anecdotally, however, I've seen the difference between the Heroku router and any given dyno stay generally within +/-10ms.
All that said, I'm open to suggestions.
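To make the trade-off concrete, here's a rough sketch of that offset idea (a hypothetical helper, not part of this PR), applied to the computed difference rather than the raw timestamps, which comes to the same thing. It keeps reported values non-negative, but every later measurement from that dyno gets inflated by the worst drift seen so far:

class DriftOffsetTracker:
    # Hypothetical helper, for illustration only.
    def __init__(self):
        self.offset_ms = 0

    def adjust(self, raw_queue_time_ms):
        # Grow the offset whenever a value more negative than anything seen
        # before shows up, so that value maps to zero instead of being dropped.
        if raw_queue_time_ms + self.offset_ms < 0:
            self.offset_ms = -raw_queue_time_ms
        # Every subsequent measurement is shifted by the same amount, which is
        # where the inflation comes from.
        return raw_queue_time_ms + self.offset_ms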
My suggestion is that, by default, the times be passed along exactly as calculated. That said, I'd be fine with an option to round negative values off to zero, as long as there's a note in the documentation explaining what it does and the trade-offs involved.
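Something along these lines, perhaps (the setting name and helper are made up here, purely to illustrate the shape of the option): pass the raw difference through by default, and only clamp when the option is explicitly enabled:

from django.conf import settings


def queue_time_ms(now_ms, request_start_ms):
    # Hypothetical helper; the setting name below is illustrative only.
    raw = now_ms - request_start_ms
    if getattr(settings, "HIREFIRE_CLAMP_NEGATIVE_QUEUE_TIMES", False):
        # Opt-in behaviour: treat clock-drift artifacts as zero queue time.
        return max(raw, 0)
    # Default: report exactly what was measured, even if it is negative.
    return raw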
Does that sound alright to you? I just hate to have a default of losing data, even if the data is less than ideal.
Sure, I can look after this code. I don't expect it will evolve much, if at all, but I will be using it in production, so I don't mind.