Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added initial event loop copy #373

Merged
merged 5 commits into from
Sep 11, 2019

Conversation

TheHollidayInn
Copy link
Contributor

Here is the initial rough draft for #294

@goldbergyoni
Copy link
Owner

@TheHollidayInn I will review this in the next few days. Looks like a great start!

@goldbergyoni
Copy link
Owner

/remind me in 4 days

@TheHollidayInn
Copy link
Contributor Author

@i0natan and updates on this?

@goldbergyoni
Copy link
Owner

@TheHollidayInn Sorry for the delay, I'm just struggling to jungle so many balls :)

If works for you, I can prioritize this bullet and run now a simulntanous discussion to finalize it next week.

In overall, I believe we should separate this into 3 bullets, sorted by priority:

  1. Avoiding CPU heavy tasks - "Other CPU intensive tasks, such as DNS Lookup or Crypto will be offload to lubuv, take up other threads and starve new CPU intensive tasks"

  2. Monitor the event loop

  3. Avoid sync operations - "TLDR; Avoid using syncronous and other Event Loop blocking functions"

@TheHollidayInn
Copy link
Contributor Author

No problem! I'm happy to split/work on the three

@goldbergyoni
Copy link
Owner

goldbergyoni commented May 28, 2019

@TheHollidayInn Cool, so here's my proposed TOC for 1':

[TLDR Basics] The Node.js userland is single-threaded, keeping the CPU busy might put ~2000 concurrent users in idle.
[TLDR spice it up a little] Yet, it's important to understand that most reasonable CPU operations, including simple cryptography and basic loops, barely affects the end user latency. One has iterate over arrays of million items to generate a noticeable block. Mitigating this demands no more than a simple monitor and developer awareness
[Otherwise] If only one request is doing some crazy computations (e.g. 100 crypto methods per request), all the other 1000 will wait patiently for that one to complete

[One paragraph]

  • Two lines about the event loop model - our app code is executed using a single thread (as opposed to the IO calls that go outside), this is great for minimizing context switching but assume that all behave nicely as they share the same...

[Examples & code]

  • Diagram of the event loop that illustrates the block
  • 7 lines code example that shows loop of crypto and a setTimeout that is called-back after 5 seconds instead of 2 (settime is set to 2000 ms but it will never called before the crypto part ends)
  • Some nice quote
  • Benchmark! I have this ready, Express route with 1...1000 loop, if I approach it using 1000 concurrent users, the loop add 3ms latency to the end-user (negelectible)

What do you think?

@TheHollidayInn
Copy link
Contributor Author

Hmm. Rethinking this and I might be missing how to separate all three. The issue with using CPU intensive tasks is that they are sync and block the event loop. I think all bullet points might be the same.

Taking a look at your copy now. Also, have you had a chance to read through my copy in this PR?

@goldbergyoni
Copy link
Owner

@TheHollidayInn

Yes, I read your copy which was good but felt like we mix concerns, hence we need to separate,
I don't see how the 3 are the same.

  1. Avoid CPU intensive tasks - this issues is related to code that is CPU intensive like crypto, see the proposed TOC

  2. Is about monitoring it...

  3. sync operations - When you use fs.readfilesync then there's no CPU work here at all, it just blocks your loop until the IO work is over. That's a very specific way of blocking the loop and hurting performance. Very few people do that

Thoughts?

@TheHollidayInn
Copy link
Contributor Author

Ah okay, after reading everything, I am seeing what you mean.

I think where I am getting confused is that both IO and CPU operations cause the same problem which is blocking the loop. And both are run on worker threads: https://nodejs.org/en/docs/guides/dont-block-the-event-loop/#what-code-runs-on-the-worker-pool

So in general the idea is to watch for any loop blockers - dns, file io, crypto, encryption, etc.

But there are quite a few to watch for. I'd agree to separate them up for sure. I'll go ahead and write those up :D

@goldbergyoni
Copy link
Owner

@TheHollidayInn

Sounds good, I suggest starting with 1 bullet, see my proposed TOC for the 1st one. Does this TOC resonate with you?

Minor technical remark:

"And both are run on worker threads"

Most CPU intensive tasks won't run on worker thread, any user code except some Crypto/DNS libraries will get executed within the main thread. This article refers to the Node core code, not to the user-land code

@TheHollidayInn
Copy link
Contributor Author

TheHollidayInn commented Jun 6, 2019

@i0natan Yep! Works for me :D

I'm not sure I follow the difference between your distinction of node core and user-land. Crypto is considered CPI intensive tasks. DNS is IO. Both go to the worker pool. The section I linked refers to apis you can use in node modules.

However, your first TOC is "Avoid CPU intensive tasks - this issues is related to code that is CPU intensive like crypto, see the proposed TOC". Are you referring to the cypto module: https://nodejs.org/api/crypto.html - because the cpu intensive tasks from that module are run on the Worker Pool: crypto.pbkdf2(), crypto.scrypt(), crypto.randomBytes(), crypto.randomFill(), crypto.generateKeyPair()

[editing multiple times - sorry :P]

So I guess my new thought is there are operations to avoid that are cpu expensive - not on worker threads:

  • algorithms with large number of operations (like your while loop)
  • Large json parsing
  • Certain regex queries
  • long promises/awaits

Then there are synchronous apis (includes some cpu intensive, io intensive and worker pool)

Encryption:
        crypto.randomBytes (synchronous version)
        crypto.randomFillSync
        crypto.pbkdf2Sync
        You should also be careful about providing large input to the encryption and decryption routines.
    Compression:
        zlib.inflateSync
        zlib.deflateSync
    File system:
        Do not use the synchronous file system APIs. For example, if the file you access is in a distributed file system like NFS, access times can vary widely.
    Child process:
        child_process.spawnSync
        child_process.execSync
        child_process.execFileSync

@goldbergyoni
Copy link
Owner

So I guess my new thought is there are operations to avoid that are cpu expensive - not on worker threads:

  • algorithms with large number of operations (like your while loop)
  • Large json parsing
  • Certain regex queries
  • long promises/awaits

Yes! Those examples make this bullet shine. Crypto is just another example.

Now let's discuss Crypto which is just another bullet:

  • Crypto work (except some built-in crypto utils that are executed using the libuv threads pool)

Only few crypto libraries, a subset of those that are part of Node, are executed using the thread pool, this is why crypto is still a good example

Synchronous APIs are a different bullet, keep it focused on a single challenge - CPU. Because soon you'll have to remind solutions and mitigating CPU intensive tasks take a different form than mitigating sync-IO tasks (which are very eay to remedy - just avoid them :))

Makes sense?

@TheHollidayInn
Copy link
Contributor Author

Alright, got a new draft up. Can you let me know what you think?

Copy link
Owner

@goldbergyoni goldbergyoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start, very challenging topic, let's perform 2 iterations until we perfect it?

@BrunoScheufler Need your help here, seems like there are few formatting issues?

README.md Outdated

## ![✔] 7.1. Don't block the event loop

**TL;DR:** TLDR; Avoid CPU intensive tasks as they will block the Event Loop.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate TLDR?

Maybe add "block the mostly single-threaded event loop"?

@@ -0,0 +1,57 @@
Node handles the Event Loop on a single thread rotating through multiple queues. Operations with high complexity, large json parsing, unsafe regex querieies, and long awaited promises are some of the operations that can cause the event loop to stall. Preventing this is as easy as setting up monitoring for the the Event Loop and resolving the blocker.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you forget the page title? see other 'more info' files, they all contain a title

@@ -0,0 +1,57 @@
Node handles the Event Loop on a single thread rotating through multiple queues. Operations with high complexity, large json parsing, unsafe regex querieies, and long awaited promises are some of the operations that can cause the event loop to stall. Preventing this is as easy as setting up monitoring for the the Event Loop and resolving the blocker.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"handles the Event Loop on a single thread " -> "mostly on a single thread" (just to be precise)

"Operations with high CPU cost" (it's not about complexity, IO can be too complex)

"querieies" - typo?

"long awaited promises" - what do you mean here? not sure this is in-general true, usually promises are used to wait for IO

"Preventing this is as easy as setting up monitoring for the the Event Loop and resolving the blocker." - monitoring is detection, not prevention. Let's keep the monitoring to a dedicated bullet. My thoughts "Avoid this off-loading CPU intensive tasks to dedicated workload that doesn't serer users (e.g. job server), or by breaking long tasks into small steps that stand on the event queue lines, by using crypto libraries that utilize the event loop thread pool. To name a few techniques that can keep the event loop alive"

Copy link
Contributor Author

@TheHollidayInn TheHollidayInn Jul 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of my info comes from this article: https://nodejs.org/en/docs/guides/dont-block-the-event-loop/

From this section: https://nodejs.org/en/docs/guides/dont-block-the-event-loop/#don-t-block-the-event-loop

For promises: "You should make sure you never block the Event Loop. In other words, each of your JavaScript callbacks should complete quickly. This of course also applies to your await's, your Promise.then's, and so on."

For computational complexity
"A good way to ensure this is to reason about the "computational complexity" of your callbacks."

For non-sync IO, should I state that you should also watch for large IO tasks?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For promises: "You should make sure you never block the Event Loop. In other words, each of your JavaScript callbacks should complete quickly. This of course also applies to your await's, your Promise.then's, and so on."

Note there is a signficant difference between what you wrote and the article - you alluded to 'long awaited promises', they refer to the .then part. Practically, most promises wrapping some IO work, if you wait for long time then it's not related to CPU work. Their advice of not slowing down the .then handler makes sense, but it's too specific, we can generalize this advice to any other code...

"A good way to ensure this is to reason about the "computational complexity" of your callbacks."

Well, it's probably right technically, is it a simple and well understood term? I guess the answer is subjective. I suggest that our guide should curate and simplify those technical guides

For non-sync IO, should I state that you should also watch for large IO tasks?

I think it doesn't belong here

Note that I also included in my comment multiple mitigation techniques

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll definitely include many of your suggestions. I'm just trying to hone in on "high CPU usage" because it is super generic. I want to tell people what to look for.

I believe that these are three examples: Operations with high complexity, large json parsing, unsafe regex queries. So "high complexity" is just one example.

Also, promises are also used a lot for db access. If you have long queries that you are are awaiting you could block the loop like our example, I think.

Anyway, I'm just trying to give devs examples of "cpu work" to look for :D. I'll get your edits in this weekend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OKay, after reading this I'd retract my comment about awaiting. I was missing the idea that promises are added to the job queue: https://github.com/getify/You-Dont-Know-JS/blob/master/async%20%26%20performance/ch1.md. Soooo sorry about that :P

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if you await on IO operation (E.g. DB call) - this will not the event loop


server.listen(3000)

process.on('SIGINT', function () {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you shorten this example by 50%? the SIGINT for example is not relevant to this topic

```

## Image of the Event Loop
![1*aU5dr98pxTsZ4AMfnA6lNA.png](https://cdn-images-1.medium.com/max/1600/1*aU5dr98pxTsZ4AMfnA6lNA.png)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that this image is descriptive enough to explain why the loop is sensitive to CPU-bounded tasks. Can we find a better one?

@TheHollidayInn
Copy link
Contributor Author

@i0natan Can you review my updates :D

Copy link
Owner

@goldbergyoni goldbergyoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for improving, absolutely getting closer to 'production grade'. I estimate that we would need ~2 more iterations to make it 100% concise and fun to read. If preferable we can work on it simultaneously (e.g. git chat or hangouts).

README.md Outdated

## ![✔] 7.1. Don't block the event loop

**TL;DR:** Avoid CPU intensive tasks as they will block the mostly single-threaded Event Loop.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You stated what NOT to do, let's include your recommendation? for example "Avoid CPU intensive tasks as they will block the mostly single-threaded Event Loop and offload those to a dedicated thread, process or even a different technology based on the context"

README.md Outdated

**TL;DR:** Avoid CPU intensive tasks as they will block the mostly single-threaded Event Loop.

**Otherwise:** As the Event Loop is blocked, Node.js will be unable to handle other request thus causing delays for concurrent users.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider emphasizing the severity of this in a bolder language - "3000 users are waiting for a response, the content is ready to be served, but one single request blocks the server from dispatching the results back"


<br/><br/>

Node handles the Event Loop mostly on a single thread rotating through multiple queues. Operations with high complexity, large json parsing, unsafe regex querieies, and large IO operations are some of the operations that can cause the Event Loop to stall. Avoid this off-loading CPU intensive tasks to a dedicated service (e.g. job server), or breaking long tasks into small steps then using the Worker Pool are some examples of how to avoid blocking the Event Loop.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more example "large JSON parsing, applying logic over huge arrays, unsafe regex..."


<br/><br/>

Node handles the Event Loop mostly on a single thread rotating through multiple queues. Operations with high complexity, large json parsing, unsafe regex querieies, and large IO operations are some of the operations that can cause the Event Loop to stall. Avoid this off-loading CPU intensive tasks to a dedicated service (e.g. job server), or breaking long tasks into small steps then using the Worker Pool are some examples of how to avoid blocking the Event Loop.
Copy link
Owner

@goldbergyoni goldbergyoni Jul 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's now take it one step up and make our advice even more accurate, I'll be glad to explain why is this true:

"When handling CPU-intensive tasks, different techniques should be applied based on of two different scenarios: if these CPU-intensive tasks are needed rarely (e.g. triggered by a few requests only) - then they can be offloaded to a dedicate process, thread or get handled in chunks (See code example below). However, the more challenging scenario is where most or all requests need to trigger CPU-intensive code - in that case, this task must be offloaded to a different workload, one that doesn't dispatch content to the end-users, like a job-server, Serverless, and similar

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me and makes sense. Should I add this as a separate paragraph?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, or maybe to the main one?


Node handles the Event Loop mostly on a single thread rotating through multiple queues. Operations with high complexity, large json parsing, unsafe regex querieies, and large IO operations are some of the operations that can cause the Event Loop to stall. Avoid this off-loading CPU intensive tasks to a dedicated service (e.g. job server), or breaking long tasks into small steps then using the Worker Pool are some examples of how to avoid blocking the Event Loop.

## Example
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting issue - in all other pages, the title contains the main message, e.g. "Example: blocking the event loop"

## Example
Let's take a look at an example from [Node Clinic](https://clinicjs.org/documentation/doctor/05-fixing-event-loop-problem).
```
const restify = require('restify')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code examples should be really short and concise, the first two lines and the last one are not needed (they make it look long)

sections/performance/block-loop.md Show resolved Hide resolved
sections/performance/block-loop.md Show resolved Hide resolved
```

## Image of the Event Loop
┌───────────────────────────┐
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This image is better, it gives a sense of the queues that might stay idle. Thinking out loud whether it's the best we can get? does it illustrate how some request blocks others? I tried to sketch something, see below, what do you think?

image

@TheHollidayInn
Copy link
Contributor Author

TheHollidayInn commented Jul 25, 2019

I missed your comment but I'm happy to do git chat or hangouts :)

@js-kyle
Copy link
Contributor

js-kyle commented Jul 28, 2019

Awesome work going on here! Nice

@goldbergyoni
Copy link
Owner

@TheHollidayInn Let me know when do you want me to review the whole, I noted that you answered some of the comments and other don't

Copy link
Contributor

@BrunoScheufler BrunoScheufler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Content looks great to me, awesome work 🙌

@goldbergyoni
Copy link
Owner

@TheHollidayInn Any chance that you're available for a 1-hour finalization session afterwhich, we will publish this? when might be a good time next week?

Best if you can merge all my remarks into the PR by then so we just focus on final touches

@goldbergyoni
Copy link
Owner

@BrunoScheufler Can you review the lexical part + formatting?

@TheHollidayInn
Copy link
Contributor Author

Sweeeet!

@i0natan Do you have any time Tuesday or Thursday after 6pm UTC?

@goldbergyoni
Copy link
Owner

@TheHollidayInn What timezone are you in? my calendar seems busy at those timeslots, do you happen to be available on Friday 6-8am UTC or Sunday 12-16 UTC or Monday 12-16?

@goldbergyoni
Copy link
Owner

@TheHollidayInn Do you have some availability this week and can proposes 2-3 slots?

@TheHollidayInn
Copy link
Contributor Author

Sorry! I actually had to do a last minute travel for work.

I am back on Wednesday. I can do Wed-Fri after 6pm UTC. I'll need to check Sat, but I'm more flexible on Sun.

I'm in MST, btw. In Colorado. What about you?

@TheHollidayInn
Copy link
Contributor Author

@goldbergyoni Just following up. Are you free this week?

@goldbergyoni
Copy link
Owner

@TheHollidayInn Sorry for the delay, on a very tight trip now in US and struggle to find time.

My suggestion - let's merge this item as-is now, then there is a couple of enrichment I'd like us to perform as additional PRs. Works for you?

@BrunoScheufler @js-kyle Could you please help with reviewing the formatting/grammar/wording?

p.s. @TheHollidayInn Did you see the credit I gave you in my testing practices guide?

@TheHollidayInn
Copy link
Contributor Author

Works for me! I hope your trip is going well :D

I have your testing practices on my list to read. I just saw it in Node Weekly, congrats!!
I now see the link to me! Thank you :D :D

@goldbergyoni
Copy link
Owner

@BrunoScheufler Could you please help with format and language review?

@goldbergyoni goldbergyoni merged commit d237500 into goldbergyoni:master Sep 11, 2019
@goldbergyoni
Copy link
Owner

@TheHollidayInn I've merged this, big congrats to us and many thanks for the high effort here! 🎇 🚀 🙏

I believe we can and should take 1/2 level up, I'll pr with some suggestions

elite0226 pushed a commit to elite0226/nodebestpractices that referenced this pull request Oct 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants