Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Reload window" breaks EH debugging #55955

Closed
weinand opened this issue Aug 7, 2018 · 28 comments
Closed

"Reload window" breaks EH debugging #55955

weinand opened this issue Aug 7, 2018 · 28 comments
Assignees
Labels
bug Issue identified by VS Code Team member as probable bug candidate Issue identified as probable candidate for fixing in the next release debug Debug viewlet, configurations, breakpoints, adapter issues verified Verification succeeded
Milestone

Comments

@weinand
Copy link
Contributor

weinand commented Aug 7, 2018

VS Code Insider (1.26):

  • use the "Hello World" extension.
  • F5
  • in the new window do a "Reload Window"

Observe:

  • extension reloads
  • VS Code restarts debug session
  • after 10 seconds this message appears:
    2018-08-07_17-59-28

Analysis:
I tried to reproduce this when running out of source, but I couldn't.
This makes me think that the problem is a timing issue.

Looking at the dev tools console and our own debug console has some evidence that supports this hypothesis:

2018-08-07_18-03-19

Explanation:

EH Debugging assigns debug port 8790 for the debug session (and the idea is that the EH continues to use this port even after being restarted).
However the dev tools console shows that the restarted EH listens on port 8791 after the "reload window".

This means that it could not open port 8790 and picked the next available: 8791.
The reason why 8790 is not available is probably because 8790 is still in use because the EH is still running. I haven't seen this behaviour in EH debugging before, but it could be a new feature of the updated node.js version we are using.

The consequence of this is that VS Code tries to attach (again) to port 8790 instead of 8791 which results is the broken EH experience.

And this is probably the deeper issue behind #55916 and #55025.

I do not yet know what exactly has changed and I do not have a fix.

/cc @bpasero @roblourens any ideas?

@weinand weinand added bug Issue identified by VS Code Team member as probable bug debug Debug viewlet, configurations, breakpoints, adapter issues labels Aug 7, 2018
@weinand weinand self-assigned this Aug 7, 2018
@bpasero
Copy link
Member

bpasero commented Aug 7, 2018

I can reproduce but I do not have an explanation for this behaviour. All I know is that I had to remove our various usages of --debug with --inspect in scripts and code.

We should probably treat this with higher priority to see if we can fix it for our upcoming release. I am sure our devs often reload windows and would get frustrated quickly.

/cc @Tyriar

@roblourens
Copy link
Member

I can't repro, someone set "trace": true and send me logs?

@weinand
Copy link
Contributor Author

weinand commented Aug 7, 2018

@bpasero replacing --debug with --inspect is independent from EH debugging because EH debugging only works for --inspect for quite some time.

@alexandrudima your C++ code for fixing the exit/flushing problem of Electron node versions < 8.0 is still in, correct? But you are not aware of any problems of this code in node versions >= 8.0?

@Tyriar
Copy link
Member

Tyriar commented Aug 7, 2018

Should this have the candidate/milestone set to increase visibility? Would this block the release?

@weinand weinand added the candidate Issue identified as probable candidate for fixing in the next release label Aug 7, 2018
@weinand weinand added this to the July 2018 milestone Aug 7, 2018
@weinand
Copy link
Contributor Author

weinand commented Aug 7, 2018

Yes, I've added the label/milestone.

@roblourens
Copy link
Member

I still can't repro it. I don't think this is a candidate, if it is then surely #55916 should be.

@weinand
Copy link
Contributor Author

weinand commented Aug 7, 2018

I'm pretty sure that #55916 and #55955 are the same.

@roblourens here is the trace:
vscode-node-debug2.txt.zip

The problem with the trace is that it only contains info from the first EH.

@roblourens
Copy link
Member

I don't think that's from the first EH. It shows the same thing that the log in #55899 shows, the DA attached and sent "initialized", and vscode never sent configuration requests.

@roblourens
Copy link
Member

roblourens commented Aug 7, 2018

This seems the same as #55899, if Reload Window is basically the same case as openFolder. I can only repro this one on windows too.

@weinand
Copy link
Contributor Author

weinand commented Aug 7, 2018

Reload Window and openFolder both result in a restart of the EH.

@alexdima
Copy link
Member

alexdima commented Aug 7, 2018

@alexandrudima your C++ code for fixing the exit/flushing problem of Electron node versions < 8.0 is still in, correct? But you are not aware of any problems of this code in node versions >= 8.0?

Yes, that code is still in and was not touched.

@weinand
Copy link
Contributor Author

weinand commented Aug 7, 2018

@roblourens if you see the problem, could you please verify that the dev tools console in the EH windows shows the message pointed by the arrow:

2018-08-07_21-26-45

I think that explains what you are seeing in the trace:

VS code tries to attach to the provided port (7727) but that port is not free and VS Code uses a different port.

@roblourens
Copy link
Member

I can't repro this anymore. Is it possible that you were trying on yesterday's insiders? (before my terminate fix)

@roblourens
Copy link
Member

Actually I do repro it on yesterday's, but not today's. On yesterday's, I do see the same issue as you.

@weinand
Copy link
Contributor Author

weinand commented Aug 7, 2018

I'm always building VS Code from the 1.26 branch and I can reliably reproduce the problem.
But maybe my 2018 6-core MBP makes a (timing) difference...

When running out of source I cannot repro the problem.

And when I delay the creation of the new EH session a bit (so that the old session has more time to terminate properly and free up the debug port), the problem disappears.

@weinand
Copy link
Contributor Author

weinand commented Aug 7, 2018

@roblourens can you repro #55899 with today's build?

@roblourens
Copy link
Member

No. Are you sure it's downloading the new version of node2? Can you try with the latest insiders?

@weinand
Copy link
Contributor Author

weinand commented Aug 7, 2018

Both my Insiders and my own build use this version of node-debug2:

2018-08-07_23-01-36

weinand added a commit that referenced this issue Aug 7, 2018
@roblourens
Copy link
Member

Where do we check if the port is free? The debug adapter doesn't do that. Does vscode do it? It seems like it shouldn't because in this case we are reusing a port that was already used.

@roblourens
Copy link
Member

@ramya-rao-a still reproes #55899 with today's build, I still don't think it's the same

@weinand
Copy link
Contributor Author

weinand commented Aug 7, 2018

I've pushed this workaround/fix which solved the problem for me reliably.
Without the fix I was able to reproduce the problem in 10 out of 10 runs.

@roblourens extensionHost.ts:290

@weinand
Copy link
Contributor Author

weinand commented Aug 7, 2018

@ramya-rao-a I've pushed a fix and we should rebuild.

@roblourens
Copy link
Member

Do you know what changed to cause this? I don't understand why we are starting the new EH before the old EH has shut down.

@weinand
Copy link
Contributor Author

weinand commented Aug 7, 2018

I suspect that this was either something Isi changed while refactoring or that the node v8.x in Electron behaves slightly differently than node v7.x. We should investigate when Isi is back next week.

I suspect that the new EH is started before the old EH has shut down to speed up "Reload Window".
But I think this strategy collides with the approach to reuse the debug port across restarts.

Another fix/workaround would be to retry the same port in extensionHost.ts:290 until it becomes free instead of trying a different port that nobody expects.

@weinand weinand closed this as completed Aug 7, 2018
@joaomoreno joaomoreno reopened this Aug 8, 2018
@joaomoreno
Copy link
Member

Reopening since there is still a timing issue, as per standup discussion.

@weinand
Copy link
Contributor Author

weinand commented Aug 8, 2018

more investigation by @bpasero and myself:

  • the problem does not occur if we change the Electron 2.x back to Electron 1.7.x,
  • the problem is related to having two EH processes running in parallel: if the waiting period of the old EH is increased (in https://github.com/Microsoft/vscode/blob/a8033aaf216d684d775e058489d25186702d28e3/src/vs/workbench/node/extensionHostMain.ts#L183) the problem occurs reliably. But the debug port is not the issue: if the initial port has not been freed by the EH, VS Code determines another free port and that is passed back to the debugger properly and the debugger attaches to that port correctly.
  • the problem is more difficult to reproduce is there are breakpoints (which influences the timing before the "continue" request is send to waiting node.js).
  • yesterday's workaround to delay the new session by 50ms was too short. A delay by 1000ms seems to work on slower machines or machines under heavy load.

I'm still investigating why two parallel EH sessions are unreliable in VS Code based on Electron 2.x...

@weinand
Copy link
Contributor Author

weinand commented Aug 8, 2018

Found the real issue:

  • for the new EH-DA we register an event handler for the DAP "intialized" event
  • the old EH-DA terminates and unregisters the "intialized" event handler
    -> we no longer receive the "intialized" event and we never sent a "configurationDone" request which is needed to continue execution in the waiting EH.

In the old Electron either the EH-DA terminates earlier so that the "register an event handler" comes after the "unregister" (unlikely), or the EH-DA terminates way later so that the "intialized" event handler is unregistered after the one and only "intialized" event has already arrived and processed (more likely).

All listeners are managed in one pool and the disposal code removes listeners independent from whether they are used or not. This fix is to scope listeners based on the DA they are tracking and only dispose those when the associated DA terminates.

My delay workaround avoids the issue by delaying the creation of the new EH-DA so that the "register" comes after the "unregister".

I'm working on a real fix now.

@weinand
Copy link
Contributor Author

weinand commented Aug 8, 2018

A "good" fix would be larger (and higher risk) than what I want to release at this point in time.
For this reason I'll push the "delay" workaround (with a larger delay value).

Since we have analysed the problem in great detail, we now understand why the delay workaround actually works. At this point in time it is the best what we can (and should) do.

I'll keep my good fix for @isidorn to review and approve.

@weinand weinand closed this as completed Aug 8, 2018
@bpasero bpasero added the verified Verification succeeded label Aug 9, 2018
@isidorn isidorn mentioned this issue Aug 27, 2018
3 tasks
@vscodebot vscodebot bot locked and limited conversation to collaborators Sep 22, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Issue identified by VS Code Team member as probable bug candidate Issue identified as probable candidate for fixing in the next release debug Debug viewlet, configurations, breakpoints, adapter issues verified Verification succeeded
Projects
None yet
Development

No branches or pull requests

6 participants