Skip to content
This repository has been archived by the owner on Jul 15, 2023. It is now read-only.

Language Server stops working after a while #853

Closed
ramya-rao-a opened this issue Mar 10, 2017 · 31 comments
Closed

Language Server stops working after a while #853

ramya-rao-a opened this issue Mar 10, 2017 · 31 comments

Comments

@ramya-rao-a
Copy link
Contributor

ramya-rao-a commented Mar 10, 2017

See #702 (comment)

The features powered by the language server (go to def, find refs etc.) stop working after some period of time after setting go.useLanguageServer to true

@ramya-rao-a
Copy link
Contributor Author

@josh-tepper The next time this happens, can you check if there is a process called "go-langserver" running?

@mattetti @zmb3 @junkblocker I know you have tried the language server as well. Have you seen any issues like this one?

cc @sqs @keegancsmith

@keegancsmith
Copy link

We haven't seen any issues like this in production. However, the server can occasionally OOM in production (but that is usually when several large repos end up on the same machine).

@josh-tepper
Copy link

josh-tepper commented Mar 10, 2017

Ok -- this happened again. This time, it was accompanied by vscode becoming slow as well.
go-langserver was running, but it was spiking the cpu:

$ ps -O %mem,%cpu,rss 32815
  PID %MEM  %CPU    RSS   TT  STAT      TIME COMMAND
32815  5.3 189.0 892168   ??  R     82:25.01 /Users/joshtepper/work/clarifai/go/bin/go-langserver -mode=stdio

Had to restart vscode which became laggy even when typing

@josh-tepper
Copy link

josh-tepper commented Mar 10, 2017

Since the last comment, I restarted, and now seeing the same thing again -- even larger cpu number:

$ ps -O %cpu,%mem 71085
  PID  %CPU %MEM   TT  STAT      TIME COMMAND
71085 326.7 10.2   ??  R     28:57.74 /Users/joshtepper/work/clarifai/go/bin/go-langserver -mode=stdio

I have to switch configurations so that I can work. But let me know if you want me to switch back to test anything.

@mattetti
Copy link
Contributor

I am seeing issues too. VSCode gets really slow to recover type information. It looks like I have 2 processes running which might be due to the fact that I have 2 projects open.

activity_monitor__all_processes_

@ramya-rao-a
Copy link
Contributor Author

@mattetti Yes, as per current design, a new language server gets spawned for each VS Code window that is open

@keegancsmith
Copy link

I'm used to dealing with servers which I can poke and prod to debug this sort of thing, so hard to know what is going on here. Here is a dump of how to understand what is going on:

  • We support logging to a file, these logs are likely a good first step to understand what is taking a long time to run.
  • We support sending opentracing spans over the telemetry/event to vscode. Not sure if vscode does anything with this.
  • We also support running a pprof http endpoint so you can get some mem/cpu/goroutine/etc dumps. It also has prometheus stats which probably won't be that useful in this case.
  • I can add a signal handler to dump information to stderr.

@ramya-rao-a how can people run go-langserver with the -trace and possible the -logfile flag? Also if I make our log handler send window/logMessage would this make our life easier?

@ramya-rao-a
Copy link
Contributor Author

ramya-rao-a commented Mar 14, 2017

@keegancsmith Currently, users cannot run the the language server with the flags you mentioned, but I can add the provision for the same and release an update in a couple of days.

Since you have a repro for the issue now, you can use your local dev environment for the extension.
Let me know if you need any help there

I am not sure what happens to the telemetry/event. Looking into it.

@keegancsmith
Copy link

Thanks. We believe the root of this issue is our poor use of RAM for editor environments. We have this issue in the go-langserver repo to track sourcegraph/go-langserver#178

@josh-tepper
Copy link

@keegancsmith -- take a look at the numbers from my process table above. They are not consistent with the high memory reports in the issue you linked. On there was very high cpu usage. This was on a machine with 16GB of memory. I'll try to get some better statistics about page faults/memory usage/cpu/maybe an strace/etc the next time this happens.

@josh-tepper
Copy link

Any progress on this? I just wanted to check that this is not blocked on more reports from me. If so, I apologize for the confusion! I'm not currently running the langserver because it wasn't usable, but I can switch back if you need more statistics. Let me know!

@ramya-rao-a
Copy link
Contributor Author

Thanks for circling back @josh-tepper

@keegancsmith I'll be enabling the use of -trace and -logfile flags in the next update. Do you need any more info from the users?

@ramya-rao-a
Copy link
Contributor Author

In the latest update to the Go extension (0.6.56), there will be a new setting go.languageServerFlags that will be passed while running the Go language server.

  • Users can set this to ["trace"] to see the traces from the language server in the output pane under the channel "go-langserver"
  • Users can set this to ["trace", "logfile", "path to a text file to log the trace] to log the traces and errors from the language server to a file.

cc @keegancsmith

@mattetti
Copy link
Contributor

Thanks, I turned on trace and see a weird issue:

[Error - 8:27:47 PM] Request textDocument/hover failed.
  Message: invalid location: /Users/mattetti/Code/golang/src/<>foo.go:#1508
  Code: 0 
[Error - 8:27:51 PM] Request textDocument/hover failed.
  Message: invalid location: /Users/mattetti/Code/golang/src/<>foo.go:#1530
  Code: 0 
[Error - 8:27:52 PM] Request textDocument/hover failed.
  Message: invalid location: /Users/mattetti/Code/golang/src/<>foo.go:#1531
  Code: 0 
[Error - 8:27:53 PM] Request textDocument/definition failed.
  Message: invalid location: /Users/mattetti/Code/golang/src/<>foo.go:#1531
  Code: 0 

The actual path was edited for privacy reason, but the file it's trying to access only has 87 LOC. So something is really going wrong. Is there anything else I can do to help debugging?

@ramya-rao-a
Copy link
Contributor Author

ramya-rao-a commented Mar 30, 2017

That most probably is the offset not line number

@josh-tepper
Copy link

josh-tepper commented Mar 31, 2017

Unfortunately, while I saw the high cpu situation again, I'm not seeing any logs from go-langserver. I basically see a blank golangserver channel when I look at the output in vscode:

screen shot 2017-03-31 at 3 30 16 pm

And the output file that I specified is 0 bytes long. Looking at the go-langserver command from the process table, I can see that it's not being started with --trace or --logfile (though, note the very high CPU!):

$ ps -O %cpu,%mem 48762
  PID  %CPU %MEM   TT  STAT      TIME COMMAND
48762 335.3  7.1   ??  R     19:42.34 /Users/joshtepper/work/clarifai/go/bin/go-langserver -mode=stdio

Here are the go. settings that I've specified:

    "go.docsTool": "gogetdoc",
    "go.formatOnSave": false,
    // because gogetdoc is slow and godoc is broken, use the experimental language server
    // from sourcegraph instead; but this actually causes issues as well
    "go.useLanguageServer": true,
    "go.formatFlags": ["trace", "logfile", "/Users/joshtepper/project/golang_log.txt"],

I do specify go.docsTool, but the go.userLangaugeServer overrides it (it's just there because that's what I revert to; I've commented it out, and still no logs).

Note: I have version 0.6.57 of the go extension, on vscode 1.10.2

@ramya-rao-a
Copy link
Contributor Author

@josh-tepper You have set the trace and logfile flags for "go.formatFlags" instead of "go.languageServerFlags" :)

@josh-tepper
Copy link

I apologize for the delay -- I've been away and should have noted that here.

I've grabbed a few examples of the logs from the high cpu situation. Unfortunately, I didn't realize how much actually code and code structure is included in the logs, and the consensus here is that we're not comfortable sharing it. This is a closed source project

@keegancsmith -- is there a way that I can do analysis of the process? Is there another flag that I can pass that will dump less of this code/structure to the logs?

@keegancsmith
Copy link

@josh-tepper we are aware of a few issues when using the go-langserver on a normal dev machine vs production at sourcegraph.com. We will be prioritising time soon to address them. At this point I don't think you need to gather any data for us. Rather when we have addressed the issues I'd love for you to retry the langserver. Thanks for your patience thus far.

@cyc115
Copy link

cyc115 commented Apr 24, 2017

Thanks for the development of the go plugin!

I also have this problem and is getting very annoying (it would hog the cpu every two minutes or so if I use the "peek / goto definition" feature.
To solve this I used a script to check process name and cpu usage and kill the process with process name matching go-langserver every 2 second if cpu usage exceeds 95%.

@ramya-rao-a
Copy link
Contributor Author

@cyc115 I'd suggest to disable the language server and try again after @keegancsmith and his team have addressed the issues

@nicpottier
Copy link

Just adding a quick note here that I'm facing the some problems and had to disable the lang-server to make vs-code workable again on a go repo.

@josh-tepper
Copy link

Just wondering if there's any update on this. Is there build that we should test? @keegancsmith

@ramya-rao-a
Copy link
Contributor Author

@keegancsmith is no longer working on this as far as I know.
pinging @slimsag instead

@sqs
Copy link

sqs commented Jun 13, 2017

Indeed - @slimsag is working on this now (@keegancsmith is on parental leave for a few more weeks 🎊)

@emidoots
Copy link

emidoots commented Jun 14, 2017

I'll have an updated build for folks to try soon -- apologies for the delay. :)

In the meantime, if anyone would like to help out, please make sure your binary is updated (run go get -u github.com/sourcegraph/go-langserver) and capture a heap/CPU profile per https://github.com/sourcegraph/go-langserver#profiling whenever the issue occurs.

emidoots added a commit to sourcegraph/go-langserver that referenced this issue Jun 14, 2017
…rging

Typechecking used to cause linear memory growth, now it doesn't. In the docker repository, editing and hovering 5 times (to trigger typechecking) produces:

Without this change:

1. 719MB
2. 1.21GB
3. 1.89GB
4. 2.24GB
5. 2.51GB

With this change:

1. 655MB
2. 776MB
3. 787MB
4. 787MB
5. 787MB

i.e. we used to grow by about 500MB per typecheck operation, now we just idle at ~500MB.

I actually planned to remove the cache _entirely_, but with a change in direction about how I will be tackling the slowness of go to definition and hover, removing the cache entirely is not possible anymore. This is the next best thing, and is still a HUGE improvement.

Each `LangHandler` has it's own cache (see `(*LangHandler).resetCaches`), but they are backed by the same LRU cache (see `cache.go`). Every cache that wraps the LRU cache (see `newTypecheckCache`) uses a global cache ID to avoid conflicts:

```
	// cacheID is used to prevent key conflicts between different
	// LangHandlers in the same process.
	cacheID int64
```

That is, we use the same backing LRU cache but keep our data separate. The intent of this, I presume, is to keep memory usage lower (because 2 LRU caches would have 2x the number of objects in caches overall). At first glance, this seems like a good idea. But when it comes to purging the cache, issues crop up. For example:

```
func (c *boundedCache) Purge() {
	// c.c is a process level cache. Since c.id is part of the cache keys,
	// we can just change its values to make it seem like we have purged
	// the cache.
	c.mu.Lock()
	c.id = nextCacheID()
	c.mu.Unlock()
}
```

This code assumes that we can purge the cache by incrementing the cache ID. And that's true -- data from the old cache ID will never be returned. However, this doesn't actually purge the old cache contents from memory!

In the case of sourcegraph.com, not purging things from memory is OK (we expect our process to hold all of it's memory, basically). But for local editors using the language server, this isn't OK. Every time the user modifies a file, the cache is purged (because the typechecking data is invalid). Because the purging doesn't actually clear contents from memory, we store (by default) up to 500 typechecked copies of the user's program. For something like docker, these cache objects can be up to 500MB of memory each (and we'll store 500 of them).

Given this is the case, we should really purge the cache contents instead of just invalidating them via ID incrementing. This will actually help lower memory consumption on sourcegraph.com, too, so it's a win-win situation.

Helps microsoft/vscode-go#853
Helps #178
emidoots added a commit to sourcegraph/go-langserver that referenced this issue Jun 14, 2017
…rging

Typechecking used to cause linear memory growth, now it doesn't. In the docker repository, editing and hovering 5 times (to trigger typechecking) produces:

Without this change:

1. 719MB
2. 1.21GB
3. 1.89GB
4. 2.24GB
5. 2.51GB

With this change:

1. 655MB
2. 776MB
3. 787MB
4. 787MB
5. 787MB

i.e. we used to grow by about 500MB per typecheck operation, now we just idle at ~500MB.

I actually planned to remove the cache _entirely_, but with a change in direction about how I will be tackling the slowness of go to definition and hover, removing the cache entirely is not possible anymore. This is the next best thing, and is still a HUGE improvement.

Each `LangHandler` has it's own cache (see `(*LangHandler).resetCaches`), but they are backed by the same LRU cache (see `cache.go`). Every cache that wraps the LRU cache (see `newTypecheckCache`) uses a global cache ID to avoid conflicts:

```
	// cacheID is used to prevent key conflicts between different
	// LangHandlers in the same process.
	cacheID int64
```

That is, we use the same backing LRU cache but keep our data separate. The intent of this, I presume, is to keep memory usage lower (because 2 LRU caches would have 2x the number of objects in caches overall). At first glance, this seems like a good idea. But when it comes to purging the cache, issues crop up. For example:

```
func (c *boundedCache) Purge() {
	// c.c is a process level cache. Since c.id is part of the cache keys,
	// we can just change its values to make it seem like we have purged
	// the cache.
	c.mu.Lock()
	c.id = nextCacheID()
	c.mu.Unlock()
}
```

This code assumes that we can purge the cache by incrementing the cache ID. And that's true -- data from the old cache ID will never be returned. However, this doesn't actually purge the old cache contents from memory!

In the case of sourcegraph.com, not purging things from memory is OK (we expect our process to hold all of it's memory, basically). But for local editors using the language server, this isn't OK. Every time the user modifies a file, the cache is purged (because the typechecking data is invalid). Because the purging doesn't actually clear contents from memory, we store (by default) up to 500 typechecked copies of the user's program. For something like docker, these cache objects can be up to 500MB of memory each (and we'll store 500 of them).

Given this is the case, we should really purge the cache contents instead of just invalidating them via ID incrementing. This will actually help lower memory consumption on sourcegraph.com, too, so it's a win-win situation.

Helps microsoft/vscode-go#853
Helps #178
emidoots added a commit to sourcegraph/go-langserver that referenced this issue Jun 23, 2017
This change adds a godef-based hover backend, similar to the godef-based
`textDocument/definition` backend added prior. The motivation for this
change is to avoid typechecking the entire program just to serve a single
`textDocument/hover` request. After this change, hover and definition are
both very quick and use little resources. Larger requests like `workspace/symbol`,
or `textDocument/references`, will continue to use more resources as they must
perform typechecking.

The output style of this hover implementation does vary from our prior
typechecking implementation in slight ways, but overall the implementation
always produces results that are on par or slightly better than our typechecking
implementation.

As with the `textDocument/definition` implementation, we should attempt consolidation
of this hover implementation with our typechecking variant further in the future. At
this point, of course, they are too different to share much code or implementation.

Helps #178
Helps microsoft/vscode-go#853
@emidoots
Copy link

Hi everyone ( /cc @ramya-rao-a ), this issue should be fixed now. 😃 Very significant improvements have been made to both goto definition and hover requests, and they now use a tiny fraction of the memory / CPU usage seen before.

To update, please run:

go get -u github.com/sourcegraph/go-langserver

And once again set go.useLanguageServer to true inside your settings to enable the language server.

Please feel free to provide any feedback here, and if all looks good we can close this issue out!

@josh-tepper
Copy link

josh-tepper commented Jun 26, 2017

This appears to completely solve the problem for me.

$ ps -O %cpu,%mem,rss 27370
  PID  %CPU %MEM    RSS   TT  STAT      TIME COMMAND
27370   0.0  0.0   5836   ??  S      0:48.00 /Users/joshtepper/work/clarifai/go/bin/go-langserver -mode=stdio

@ramya-rao-a
Copy link
Contributor Author

Thanks @slimsag for the amazing work!

@josh-tepper Thanks for confirming
@nicpottier @cyc115 Can you get the latest for the language server go get -u github.com/sourcegraph/go-langserver, turn the setting back on and see if it works well for you too?

@emidoots
Copy link

@ramya-rao-a What do you think of closing this issue now (and folks can open new issues if they run into any problems)? 😄

@ramya-rao-a
Copy link
Contributor Author

Agreed :)

@vscodebot vscodebot bot locked and limited conversation to collaborators Jan 23, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants