Skip to content
This repository has been archived by the owner on Aug 7, 2023. It is now read-only.

Add Netflix survey #12

Merged
merged 1 commit into from
Jul 19, 2018
Merged

Add Netflix survey #12

merged 1 commit into from
Jul 19, 2018

Conversation

misterdjules
Copy link
Contributor

This PR adds:

  1. A section about users research. While I would think our efforts should not be informed only by surveying users, it can help us get a better understanding of what current usage patterns are. Hopefully this section is a good place to store all the work done to understand those usage patterns.

  2. The information about a recent survey I've done internally at Netflix. Please make sure to read the disclaimer(s) about the limitations of this survey.

This is not groundbreaking or very comprehensive, but I believe it'll definitely help me to move forward with some of the current discussions and in my future work. I hope it can do the same for others.

I also hope that this can help inform some of the questions we want to ask in nodejs/user-feedback#77.

Copy link
Member

@mhdawson mhdawson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM looks like some useful feedback and good to capture. Of course we'll want other perspective as well as broader input (such as a survey by the project/foundation) to compare the results with.

the process when an unhandled rejection occurs, but they seem to be fine with
being able to override the default behavior.

### Exiting on garbage collection doesn't seem to be considered as a desirable behavior
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just context:

Exiting on garbage collection is only done because removing the ability to attach catch handlers asynchronously isn't possible given our data. If we could theoretically get everyone to agree to not use code with async catch handlers then it would be great - but I'm afraid that ship has sailed and it would meet a lot of resistance in core anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the data on "exit on GC" from the survey? I didn't see any questions referring to it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the data on "exit on GC" from the survey? I didn't see any questions referring to it

In the survey's UI, if you look at the question entitled:

When a rejected promise doesn't have a catch handler, what would you like the behavior of your runtime to be?

You'll see on the right that there is a set of responses. The first responses down to "I don't know" were the responses that were pre-populated by the author of the survey (me). The rest are responses that were written by respondents when they chose the "Other" response.

From there you can see that the answer "waiting for that rejected promise to be garbage collected, and then exit" was available, but no respondent chose it.

Does that clarify things?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does - though that's not the behaviour anyone would want (me included). The behaviour I'd actually want is for unhandled rejections to terminate immediately - the problem is that we can't tell if a promise rejection will never be handled so it can't be implemented.

Copy link
Contributor Author

@misterdjules misterdjules Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benjamingr

Exiting on garbage collection is only done because removing the ability to attach catch handlers asynchronously isn't possible given our data. If we could theoretically get everyone to agree to not use code with async catch handlers then it would be great - but I'm afraid that ship has sailed and it would meet a lot of resistance in core anyway.

What this report is saying is that, regardless of whether promises users attach catch handlers asynchronously, exiting when a rejected promise is garbage collected doesn't seem to be a behavior that users would consider valid.

Moreover, interviews of a selected number of respondents actually seemed to show that they would consider that behavior detrimental to their ability to reason about the correctness of their programs.

In other words, even if exiting on garbage collection was the only possibility for the runtime to exit on an unhandled promise rejection event, this report suggested that:

  1. there is no evidence that it would be useful
  2. there is evidence that it would be harmful

Therefore it seems implementing it would be detrimental to the runtime regardless of any other consideration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benjamingr

Also:

given our data

What data are you referring to?

@benjamingr
Copy link
Member

Very interesting! From the google docs link (and we) should discuss this:

  • 90% using promises in Netflix, that's more than I expected!
  • Not using promises: often because they're mixing it with another abstraction (Netflix's own RxJS or callbacks). It would also be interesting to look only at Node.js users out of everyone.
  • Not using async-await: only one person mentioned errors, also very interesting.
  • Over 40% (50 including "don't know") don't differentiate errors like "undefined property access" from "connection timeout" errors. This is a strong motivation for writing the promises guide!
  • I'm not sure I understand "If you use both mechanisms above for different use cases" - since it's unclear which two it refers to.
  • "Do you always attach a "catch" handler on promises?" is a little problematic - I don't for one since I often add a catch handler at the end of the chain. That is - stuff like p.then(f).then(g).catch(e) has 3 promises that don't have a catch handler attached but no unhandled rejection potential.
  • The fact one person says they attach catch handlers asynchronously is worth investigating and asking that person (if possible) why they're doing that.
  • For "When a rejected promise doesn't have a catch handler, what is the behavior of your application today?" - I think it's surprising (in a positive way) that a significant amount of people choose a non-default behavior (crashing) over the default one (logging).
  • "When a rejected promise doesn't have a catch handler, what would you like the behavior of your runtime to be?" seems to favour crashing but by a small margin.
  • The fact 40% answer "Do you add custom event handlers to the Node.js "uncaughtException" event?" with 'yes' indicates postmortem debugging with core dumps doesn't have very high adoption.

@misterdjules
Copy link
Contributor Author

misterdjules commented Jul 18, 2018

90% using promises in Netflix, that's more than I expected!

90% of the people who responded to that survey use promises. There is likely selection bias: someone who doesn't use promises may be more inclined to not respond to the survey, and vice versa. The actual number for Netflix at large might differ a lot, it might not. The concept of "using promises" is also not well defined: does having code that uses promises deep in a dependency tree qualify as using promises, even if that's not what people writing application code use?

Not using promises: often because they're mixing it with another abstraction (Netflix's own RxJS or callbacks). It would also be interesting to look only at Node.js users out of everyone.

I can probably get those numbers if you're interested!

This is a strong motivation for writing the promises guide!

In general one of the things that doing that survey and the interviews brought to light is the need for more education about promises. This is also highlighted in the report.

I'm not sure I understand "If you use both mechanisms above for different use cases" - since it's unclear which two it refers to.

The question above ("What mechanism do you use to reject promises?") as seen by respondents had only three possible answers: "throwing errors", "calling reject" and "other". The term "both mechanisms" referred to "throwing errors" and "calling reject". I agree that question was poorly worded.

"Do you always attach a "catch" handler on promises?" is a little problematic - I don't for one since I often add a catch handler at the end of the chain. That is - stuff like p.then(f).then(g).catch(e) has 3 promises that don't have a catch handler attached but no unhandled rejection potential.

Very good point, thank you! I think your suggestion about being clear here about a promises chain would make that question much better.

The fact one person says they attach catch handlers asynchronously is worth investigating and asking that person (if possible) why they're doing that.

That's what I've done and what the report mentions.

The fact 40% answer "Do you add custom event handlers to the Node.js "uncaughtException" event?" with 'yes' indicates postmortem debugging with core dumps doesn't have very high adoption.

I think it's difficult to infer that from this question alone, especially when considering that adding an 'uncaughtException' event handler doesn't prevent a node process from aborting when --abort-on-uncaught-exception is used. Even if that was the case, would that mean that those 40% are adding that event handler because they purposely didn't want to use the post-mortem debugging methodology and tooling available today?

Regardless:

  1. I don't think the fact that post-mortem debugging doesn't have high adoption in general is controversial, and that's OK. It doesn't mean that it's not valuable, or that the minority of its users should not be supported.

  2. This survey is about error handling, not post-mortem debugging. It may help informing some decisions regarding post-mortem debugging, but it's not its main goal. If we wanted to know whether people are using post-mortem debugging we would probably want to ask the question directly.

@misterdjules
Copy link
Contributor Author

@mhdawson

Of course we'll want other perspective as well as broader input (such as a survey by the project/foundation) to compare the results with.

To be clear and to make sure there's no misunderstanding: the intention of this survey and its report was not to represent a broad perspective and it wasn't meant as a definitive way to make any decision on a broad number of topics. It was, as the report states, to:

"[G]et a better understanding on what promises error handling modes recently proposed for the Node.js
runtime
are relevant for users, and how they would impact their ability to run applications reliably at Netflix while providing a good developer and operators experience."

I was not suggesting that this is the end game of the users research we may want to do.

@misterdjules
Copy link
Contributor Author

@benjamingr I intend to merge this shortly, unless you have objections.

@mhdawson
Copy link
Member

@misterdjules agreed, was just clarifying as part of my support for landing.

@benjamingr
Copy link
Member

I'm a little concerned about the conclusions but I am very +1 on landing the data.

I don't think we can conclude anything about "GC on rejections" based on the survey since the idea is quite hard to explain to those who haven't heard about it before. I'm fine merging with the conclusion though since the survey and questions are included and people can get their own idea.

@misterdjules
Copy link
Contributor Author

@benjamingr

Thanks for the feedback!

I don't think we can conclude anything about "GC on rejections" based on the survey

It may be more effective to discuss this in person as discussing the finer details of a survey can be quite challenging, but I thought I'd still provide some additional context. The conclusion section of the report states:

We haven't managed to find evidence of exiting on GC being a valid use case. As a result we are still leaning towards recommending not to add this behavior to the Node.js runtime.

In other words, it concludes that this survey hasn't found evidence that GC on unhandled rejection is desirable for the survey respondents. On the other hand, it has found evidence (via interviews), that exiting on GC is perceived as concerning by all interviewees. Do you think there is evidence of the contrary in that survey?

since the idea is quite hard to explain to those who haven't heard about it before.

For what it's worth, in the (limited number of) interviews I've done with survey respondents, they seemed to understand the implications of exiting on GC very well, and they seemed to be quite concerned by those implications. As mentioned in the report though, those interviews were not recorded (and I'm sorry for that), so I understand that using those interviews as a data point to inform my conclusions is not very transparent, which is unfortunate.

Maybe one of the things that this survey could have done better was to have at least one more question specific to the behavior of exiting of GC, and whether this is something users would find desirable or not. We could also pair that with a question testing their understanding of the implications of such behavior, so that we can determine whether their response is based on a good or bad understanding of them

Is it something we should add to the foundation survey?

I would add that I'm not suggesting we should base our technical decisions on survey data. If survey respondents all answered that they'd prefer a solution that aborts the process at a random date when a promise rejection is unhandled, we would probably take that with a grain of salt and come to other conclusions than recommending that behavior.

@misterdjules misterdjules merged commit ec0e6f6 into master Jul 19, 2018
@misterdjules misterdjules deleted the netflix-survey branch July 19, 2018 16:14
@benjamingr
Copy link
Member

so I understand that using those interviews as a data point to inform my conclusions is not very transparent, which is unfortunate.

To be honest I don't think there is a big need to be pedant about this. If you interviewed developers and got that impression I trust your judgment and collection.

The question I'm asking is basically whether or not the respondents understood the tradeoffs involved in exiting on GC vs. the other alternatives. At the moment our alternatives as I understand them are "exit on GC" and "don't exit at all".

We might be able to bring some convincing evidence for exiting on the current unhandledRejection hook which is sooner but that has received a lot of resistance in the past.

Given a significant amount of participants voted to abort on errors and that GC is the first point we can prove a rejection is unhandled - what's the alternative?

@misterdjules
Copy link
Contributor Author

The question I'm asking is basically whether or not the respondents understood the tradeoffs involved in exiting on GC vs. the other alternatives.

As I mentioned previously, during those interviews it seemed that interviewees were well aware of those tradeoffs. But again, that assumes that I am well aware of those tradeoffs, which is not necessarily true (I like to think that I am, but we all have blind spots).

With those questions and interviews, I was not trying to get a definitive answer on whether exit on GC is the right technical decision, and whether respondents understood 100% of the implications. I was trying to find evidence that went against my initial bias towards not recommending that solution. Since I haven't found any evidence in that survey of respondents who consider exit on GC as not confusing, I'm thinking that maybe it should make us pause and dig deeper into that.

Given a significant amount of participants voted to abort on errors and that GC is the first point we can prove a rejection is unhandled - what's the alternative?

We don't necessarily need to consider exit on GC in comparison with other potential solutions. My current thinking is that exit on GC would be confusing for users, and I'd rather see the current behavior of the runtime not change than introduce the exit on GC behavior.

That being said, a significant number of respondents also seemed to express that exiting on an unhandled rejection, synchronously or asynchronously, is a behavior that they find useful, as long as they can override it.

So I'm leaning towards recommending that, which is what the report states.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants