
Output request queue time for Django #55

Merged · 1 commit merged into ryanhiebert:master from request-queue-time on Aug 14, 2020
Conversation

OldSneerJaw (Contributor):

HireFire includes a strategy for scaling web dynos based on Heroku request queue times. This change adds a second Django middleware that supports queue time scaling, following the official docs: https://help.hirefire.io/article/49-logplex-queue-time. I've verified that it works correctly end-to-end as a dependency in a WIP branch of a production application (closed source, so I can't link to it).
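For readers unfamiliar with the approach, here is a minimal sketch (not the code from this PR) of how a Django middleware can derive Heroku request queue time. It assumes the Heroku router sets the X-Request-Start header to a millisecond epoch timestamp; the class name and the log line it emits are placeholders, since the exact output format HireFire expects is defined in the linked doc.

```python
import logging
import time

logger = logging.getLogger("hirefire")


class QueueTimeSketchMiddleware:
    """Illustrative only; the middleware added in this PR differs in detail."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Heroku's router is assumed to set X-Request-Start as epoch milliseconds.
        request_start = request.META.get("HTTP_X_REQUEST_START")
        if request_start:
            try:
                queue_time_ms = int(time.time() * 1000) - int(request_start)
            except ValueError:
                queue_time_ms = None
            if queue_time_ms is not None:
                # Placeholder log line; see the HireFire logplex queue time
                # doc linked above for the exact format it expects.
                logger.info("queue_time=%dms", queue_time_ms)
        return self.get_response(request)
```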

codecov bot commented Oct 4, 2019

Codecov Report

Merging #55 into master will increase coverage by 1.02%.
The diff coverage is 85.71%.


@@            Coverage Diff             @@
##           master      #55      +/-   ##
==========================================
+ Coverage   54.63%   55.66%   +1.02%     
==========================================
  Files          11       11              
  Lines         410      424      +14     
==========================================
+ Hits          224      236      +12     
- Misses        186      188       +2
Impacted Files | Coverage Δ
hirefire/contrib/django/middleware.py | 90.56% <85.71%> (-1.75%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@ryanhiebert ryanhiebert (Owner) left a comment:


This is neat. I don't use the HireFire service anymore for my projects (though I do still use this package to get metrics), but this seems like a cool way to scale the web dynos.

Are you up for maintaining the code in this PR for this package? I don't plan on using it, so I won't easily be able to tell if something's not working with it.

(Diff excerpt the following review thread refers to:)

    now_timestamp_ms - request_start_timestamp_ms
    if now_timestamp_ms >= request_start_timestamp_ms
    else 0
)
ryanhiebert (Owner):

Does HireFire not understand clock drift? It seems pretty apparent to me that this is throwing away data, and I hate to see that happen.

OldSneerJaw (Contributor, Author):

No, I believe that, unfortunately, the HireFire service will happily ingest a negative queue time value, which affects the aggregate. I think that's less of an issue when one is using a percentile for scaling (e.g. we use 95th percentile queue time in production), but if one uses an average queue time, it could potentially produce an average queue time that is a negative value. Since HireFire has a minimum value of 1ms for the up/down threshold settings, that seems a bit problematic.

Arguably some smarts could be implemented here to add an offset equal to the absolute value of the smallest negative value it has encountered to all future timestamps from the current dyno, but I had hoped to keep things simple. I don't think there's a perfect solution though, since there's always the possibility that the system clock on a dyno will be off compared to the clock on the Heroku router. Anecdotally, however, I've noticed the clock difference between the Heroku router and any given dyno to remain generally within +/-10ms.

All that said, I'm open to suggestions.
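To make the concern concrete, here is a small hypothetical example (the sample values are invented for illustration) of how a single negative reading can pull an average queue time below zero:

```python
# Hypothetical queue-time samples (ms) from a dyno whose clock runs slightly
# ahead of the Heroku router's clock. One negative reading drags the mean
# below zero, even though no request actually waited a negative amount of time.
samples_ms = [3, 2, -20]
print(sum(samples_ms) / len(samples_ms))  # -5.0

# A 95th-percentile aggregation, as mentioned above, is far less sensitive
# to a single outlier than the mean.
```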

ryanhiebert (Owner):

It's my suggestion that the times be passed along exactly as calculated by default. However, I'd be fine with an option to round off, as long as there's a documentation note about what it does and the trade-offs associated with it.
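A hypothetical sketch of that suggestion (the setting name HIREFIRE_QUEUE_TIME_CLAMP_NEGATIVE and the helper function are illustrative, not part of this PR): report the raw difference by default, and only clamp negative values to zero when explicitly opted in and documented.

```python
from django.conf import settings


def queue_time_ms(request_start_timestamp_ms, now_timestamp_ms):
    """Illustrative only: compute the value to report to HireFire.

    By default the raw difference is returned, even if clock drift makes it
    negative; clamping to zero is opt-in via a hypothetical setting.
    """
    delta = now_timestamp_ms - request_start_timestamp_ms
    if delta < 0 and getattr(settings, "HIREFIRE_QUEUE_TIME_CLAMP_NEGATIVE", False):
        return 0
    return delta
```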

ryanhiebert (Owner):

Does that sound alright to you? I just hate to have a default of losing data, even if the data is less than ideal.

@OldSneerJaw (Contributor, Author):

> This is neat. I don't use the HireFire service anymore for my projects (though I do still use this package to get metrics), but this seems like a cool way to scale the web dynos.
>
> Are you up for maintaining the code in this PR for this package? I don't plan on using it, so I won't easily be able to tell if something's not working with it.

Sure, I can look after this code. I don't expect it will evolve much, if at all, but I will be using it in production, so I don't mind.

@ryanhiebert ryanhiebert merged commit df7f650 into ryanhiebert:master Aug 14, 2020
ryanhiebert added a commit that referenced this pull request Aug 14, 2020
- Remove an annoying warning (#57)
- Output request queue time for Django (#55)
@OldSneerJaw OldSneerJaw deleted the request-queue-time branch October 30, 2020 23:04