Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very High CPU consumption #217

Closed
jamesskinner opened this issue Jul 8, 2020 · 23 comments · Fixed by #236
Closed

Very High CPU consumption #217

jamesskinner opened this issue Jul 8, 2020 · 23 comments · Fixed by #236
Labels
bug Something isn't working upstream/hcl

Comments

@jamesskinner
Copy link

Server Version

0.4.1

Terraform Version

0.12.24

Client Version

neovim (v0.4.3) coc plugin

Terraform Configuration Files

Log Output

https://gist.github.com/jamesskinner/b759199add68935349926b9e052d9bbd

Expected Behaviour

minimal cpu usage

Actual Behaviour

99.9% cpu usage

Steps to Reproduce

I don't see a specific scenario. I do work on a codebase containing multiple root terraform repositories in subfolders. I am on MacOS v10.15.4

@radeksimko radeksimko added the bug Something isn't working label Jul 8, 2020
@radeksimko
Copy link
Member

Thank you for sharing the log @jamesskinner and filing the issue.

I can see from the log that you have a folder with at least 16 root modules in it which the server is attempting to load schema for.

There may be some ways we can tackle this problem, but before jumping to workarounds and solutions I'd like to first understand the full scale of your setup. Do you mind sharing how many root modules in total would you typically open in your editor within the context of a single folder "hierarchy"? i.e. can you count the number of .terraform directories in the whole hierarchy?

@jamesskinner
Copy link
Author

@radeksimko yes; we use terraform cloud to provision an environment which is made up of (at the moment) 11 workspaces. Each one maps to a root terraform directory which we keep in a single git repository.

The root of the repo looks roughly like this:

.
├── account
│   ├── main.tf
│   └── variables.tf
├── components
│   ├── a
│   │   ├── main.tf
│   │   └── variables.tf
│   ├── b
│   │   ├── main.tf
│   │   └── variables.tf
│   ├── c
│   │   ├── main.tf
│   │   └── variables.tf
│   └── d,e,f...
└── tfe-workspaces
    ├── main.tf
    └── variables.tf

Then each of the subdirectories under components/, as well as the account and tfe-workspaces direcories are their own root terraform modules.

typically I would open my editor at the root of the repository and have buffers open from multiple "component" subdirectories. Some of the sub-directories, I will have run terraform init inside, some I will not have done

Hope that helps :-)

@ghost ghost removed waiting-response labels Jul 8, 2020
@radeksimko
Copy link
Member

radeksimko commented Jul 9, 2020

Seeing the hierarchy is useful, and 11 root modules (reflected as TFC's workspaces) doesn't sound too high. I can see that the schema was obtained for the first 5 of those within 1-2 seconds, which also suggests you don't have too complex provider setup (not too many providers per root module or no providers with large schemas).

The log spans about 12 seconds - does that roughly correspond with the CPU spike you experienced?

It would be useful to gather some CPU usage data, but we don't really have any tools in place to do that yet.
Just to gather some anecdotal evidence - what's the hardware (processor speed & memory capacity) you're running on? I'm guessing MacBook from the path patterns in the log. I will do some profiling myself, but knowing the variables might help.

The upcoming 1.4.2 release will make most of the loading logic asynchronous (#196 #219). I'm not sure what exact impact it will have on the CPU usage. I suspect it will retain any spike you may experience, but the spike may become shorter.

@jamesskinner
Copy link
Author

Sorry not to be clear on this - I do not see a spike in CPU, after the process has been running for a while it starts to continuously consume 99% CPU until I kill it, like its stuck in an infinite loop or similar.

This continues whether I am actively using the editor or not

Hardware: Macbook pro - 2.7 GHz, i7 and 16GB RAM

I will see if I can nail down exactly when this happens and get some sort of CPU profile for the process

@radeksimko
Copy link
Member

radeksimko commented Jul 9, 2020

I do not see a spike in CPU, after the process has been running for a while it starts to continuously consume 99% CPU until I kill it, like its stuck in an infinite loop or similar.

I see, that certainly should not be happening then, especially not on the hardware you have.

I will see if I can nail down exactly when this happens and get some sort of CPU profile for the process

Thanks, and I'm sorry we don't have any easy way of collecting the data yet.

FWIW the ~12 seconds of log you shared describes probably the most CPU-intensive operations that the server does, so it may be worth looking into the remaining part of it. If you don't feel comfortable sharing it in plain-text, you can PGP encrypt it with our key: https://www.hashicorp.com/security#secure-communications

Alternatively tracing the syscalls might provide us some hints as to what's going on, if logs really can't do that.
dtruss is pretty good on macOS. The only caveat is that LSP (protocol used between your editor and the server) communicates over stdin/stdout and dtruss by default prints output to stdout, so you'd need to launch the server as usual and then connect to it via PID to prevent your editor from being confused by the extra output.

If we can confirm this is unrelated to the initial schema loading, my next suspicion would be the file watcher or the file walker.

@jamesskinner
Copy link
Author

I removed the extra log output as it was just logging lsp request and responses and contains code (which does not belong to me). There is no continuous logging output going on while the editor is idle and the terraform-ls process is still consuming all the CPU.

If I get some time today I will try and do the profiling and encrypt some log output

Another note on my setup that might be relevant - I usually have multiple editor sessions open so there are often multiple terraform-ls processes running at the same time. Anecdotally I think only the process related to the more complicated codebase I described above causes issues, the others work as expected and consume expected amounts of CPU

@radeksimko
Copy link
Member

Another note on my setup that might be relevant - I usually have multiple editor sessions open so there are often multiple terraform-ls processes running at the same time.

That is a useful data point. Every client (editor) controls launching and managing server instances slightly differently from my experience. I'm not familiar with the NeoVim one myself though, so I can't comment on how/why there should/would be multiple processes. Sublime Text usually manages just a single instance of a server and just opens new session for each folder AFAIK.

It would be interesting to see if you can also reproduce this in a different editor - e.g. VSCode or Sublime Text.

However this shouldn't cause consistently high CPU usage unless the sessions fight over the same file descriptors (i.e. you manage to open the same files/dirs in two or more sessions at the same time).

@jamesskinner
Copy link
Author

jamesskinner commented Jul 9, 2020

Encrypted log output

dtruss output

I ran dtruss against the same process as the logs were for but around 30 mins after the last log line, at this stage the process was consuming all the CPU but not being requested by the editor. I then manually killed the process, which you can see in the logs.

When I say I run multiple "sessions" this means multiple isolated processes of neovim, which in turn spin up their own terraform-lsp terraform-ls process. They are not running in the same directories though and don't share resources (open buffers etc) as far as I know - so unlikely there is some fight over file descriptors. In this case too, I expect you would see both processes in the "fight" experiencing performance issues, not just one ¯_(ツ)_/¯

@jamesskinner
Copy link
Author

I should probably add that its not crazy to imagine people opening multiple editor processes in the same project if you think that is something that would potentially cause an issue regardless of this problem

@radeksimko
Copy link
Member

Thank you for providing both the log and dtruss output. Unfortunately I wasn't able to spot anything unusual there at this point.

I raised #223 to aid with debugging of this problem by collecting CPU/memory profiles. I'm hoping to get it shipped in the next release (either later today or early next week), but you would be more than welcomed to compile it yourself in the meantime.

I should probably add that its not crazy to imagine people opening multiple editor processes in the same project if you think that is something that would potentially cause an issue regardless of this problem

Yep, I wasn't suggesting it's unreasonable, ideally this shouldn't cause any problems, but if it does then I think it's not unreasonable to guess that high CPU may be one of the symptoms of such bug.

@radeksimko
Copy link
Member

which in turn spin up their own terraform-lsp process

Just to double check - am I right that this was merely a typo and you're not running the other server alongside this one?

@jamesskinner
Copy link
Author

I'm hoping to get it shipped in the next release (either later today or early next week), but you would be more than welcomed to compile it yourself in the meantime.

Wow, nice turnaround! I will probably just wait for the release if its that close

am I right that this was merely a typo

@radeksimko yep typo :-)

@radeksimko
Copy link
Member

I have just shipped 0.5.0 which is now also installable via Homebrew btw.

Do you mind updating and try to collect the CPU profile via the new flag as documented here?
https://github.com/hashicorp/terraform-ls/blob/master/docs/TROUBLESHOOTING.md#cpu-profiling

@jamesskinner
Copy link
Author

I have been running 0.5.0 all day today...have had no issues with CPU. Not been doing a lot of terraform development though so will see how it behaves this week - is it possible something fixed the issue in this release?

@ghost ghost removed waiting-response labels Jul 13, 2020
@radeksimko
Copy link
Member

FYI 0.5.0 had a bug which was subsequently fixed in 0.5.1, but I'm assuming you just upgraded to the latest version? i.e. you are in fact on 0.5.1?

0.5.0 doesn't load any schema at all as a result of the bug. #227 so if you are really running that version then this may also be a side effect of nothing being executed.

I am not sure what could've caused the high CPU consumption, but most operations on root modules are now done asynchronously and all the different stages of loading a module are much more clearly defined (compared to before), so it's possible there was a bug before which got fixed by this, but I haven't had the chance to experience it myself or get as far in debugging it that I could confirm correlation with the changed code.

@Liquidmantis
Copy link

Liquidmantis commented Jul 13, 2020

I've had the CPU bug with 0.4.1 on my very small projects. I've been using VSCode on Mac, and it seems like the intellisense would crash the language server, causing the 100% CPU (single-core) peg until killed terraform-ls. I just upgraded to 0.5.1 and haven't had an opportunity to do any TF work with it yet, but it was a frequent issue on 0.4.1.

@radeksimko
Copy link
Member

@m1keil and anyone else coming here to report high CPU related issues - please check that you are on the latest language server version - v0.5.1 (the VS Code extension updates it automatically when VS Code launches).

If the problem persists then I would greatly appreciate if you can modify the launch arguments in Settings to collect the CPU profile and attach that profile to your comment here for further inspection.

Thank you 🙏

@jamesskinner
Copy link
Author

jamesskinner commented Jul 14, 2020

@radeksimko. Yep I was running 5.0.0 :-) I updated to latest, the process was running for a day or so before I actually started to use it, at which point I saw the same issue.

Here are memory and cpu profile files:

terraform-ls-prof.zip
terraform-ls-87908-log.zip

I have not included the log file, but can if needed included the log file

@radeksimko
Copy link
Member

radeksimko commented Jul 14, 2020

@jamesskinner Thank you - this is very useful and I think we can start narrowing it down to certain parts of the codebase.

It appears that most CPU time is being spent on config parsing, which is triggered by a completion request. 🤔 Am I right that the CPU doesn't spike if you leave the IDE open after launch and avoid requesting completion? Keep in mind that some IDEs send completion requests automatically on-type (VS Code does that), I'm not sure if NeoVim does - I will check.

So in other words if you just open a file and avoid doing anything - does the CPU usage remain high?

The log would also be useful - ideally log from the same run that the profile is from, if you have it.

@jamesskinner
Copy link
Author

Added the log into my previous comment. If I just open a file, CPU does not spike, in fact what I did with this process is open neovim and open a buffer from all the terraform root subdirectories in my repository (10 or 11), then I just left it and for a say or so nothing happened. Then today I actually started to write some code and the CPU started spiking. Every time this happens it goes up to 100% and stays there no matter if I am using the editor or not

@radeksimko
Copy link
Member

radeksimko commented Jul 14, 2020

I managed to come up with a reliable repro case which leads to pretty much the same hot path highlighted in my collected CPU profile as @jamesskinner 's (and indeed 100% CPU usage)

provider "null" {
}

variable "name" {
  default = "
}

^ triggered by completion near the unclosed double quote (default = "<HERE>)

I'm going to try to isolate the code which is causing this.

@radeksimko
Copy link
Member

Thank you all (and @jamesskinner in particular) for your help in debugging. #236 with the patch was just merged and released as part of 0.5.2.

@ghost
Copy link

ghost commented Aug 15, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the context necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Aug 15, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working upstream/hcl
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants