kubernetes watch connection don't release when reload create runner failed #12096

Closed
DanielQujun opened this issue May 8, 2019 · 7 comments · Fixed by #16349
Labels: containers, Team:Integrations

Comments

@DanielQujun
Contributor

https://discuss.elastic.co/t/kubernetes-watch-connection-never-close-when-inputreload-create-runner-failed/180112
I run Filebeat in a Kubernetes cluster with input reload enabled. When the reload fails to start a runner, the Kubernetes processor has already started a watch. Ten seconds later the reload tries to start the runner again and fails again, so Filebeat ends up holding a large number of connections to kube-apiserver. In my case Filebeat created more than 10,000 connections, which led kube-apiserver to consume 10 GB of memory.

@DanielQujun
Contributor Author

DanielQujun commented May 8, 2019

https://github.com/elastic/beats/blob/master/libbeat/cfgfile/list.go#L90

for hash, config := range startList {
	// Pass a copy of the config to the factory, this way if the factory modifies it,
	// that doesn't affect the hash of the original one.
	c, _ := common.NewConfigFrom(config.Config)
	runner, err := r.factory.Create(r.pipeline, c, config.Meta)
	if err != nil {
		r.logger.Errorf("Error creating runner from config: %s", err)
		errs = append(errs, errors.Wrap(err, "Error creating runner from config"))
		continue
	}
	r.logger.Debugf("Starting runner: %s", runner)
	r.runners[hash] = runner
	runner.Start()
}

Creating the runner may fail, but by then the Kubernetes watch connection has already been started inside factory.Create.

@exekias
Contributor

exekias commented May 8, 2019

Could you please share the config you are using? Starting a runner should not create a new watcher.

@exekias added the Team:Integrations, [zube]: Investigate, containers, and discuss labels May 8, 2019
@zube (bot) unassigned exekias and odacremolbap May 8, 2019
@exekias removed the discuss label May 8, 2019
@DanielQujun
Contributor Author

FYI: config files
filebeat.yml

  prospectors:
    path: ${path.config}/prospectors.d/*.yml
    reload.enabled: false
  modules:
    path: ${path.config}/modules.d/*.yml
    reload.enabled: false

processors:
- add_cloud_metadata:

output.elasticsearch:
  hosts: ['elasticsearch:9200']
  username: elastic

inputs.d/emptydir.json

[{"type": "log", "max_procs": 2, "processors": [{"add_kubernetes_metadata": {"default_matchers.enabled": false, "matchers": [{"logs_path": {"logs_path": "/var/lib/kubelet/pods/", "resource_type": "pod"}}], "in_cluster": true, "indexers": [{"pod_uid": null}], "default_indexers.enabled": false, "include_pod_uid": true}}], "paths": [], "scan_frequency": "20s"}]

inputs.d/stdout.json

[{"containers.ids": ["37eed6ed8bc6e158a9d4b9ec7c8de702b2b19a48cb293b7010594dd83d405c6d", "7364b0262f545a61e186927f7b4e0b6f40bfeab2c57d357bd6eb9693ac4f1d5a", "43635673033c993c0964264d72331f74e8eff0d1c702383603aae061394c174e", "cc2c4c10aafd486f8576641fd9232b210ded7789505f875741bf5d549a4c9c59", "1f56e5238a66865e7258c832a1855e7bd13d565e6bed421a008ea39a1319dd55", "f4a9270e732aa01c000b724f5f10c269623179743429485237fe4929c3d44c3c", "da5f3ec35eb96a09d3898fa49f3bbf30f00568b46f99be04515a324bdae4ad4b", "c624db7638702d401af18d69c7f5ff4e87c84f457daee662e98963b978508a1d", "7480aa01cfe5f25f387250fdca51547fe54676c993934e2477510a8329b0b84c", "17ea053adbb5e457966a63ae4ff0f53052b55aa226700575775d19c78ba6025a", "ea40aadca84a7f5699fba2cfa5dc6a227341df836dd0d503cbe1d7758961c425", "bea979aeb3efc01ae4fea3aef0ccb847fed8c5d2629ac920e81b8e1718a4b638", "b298abe4aa51d4580f5e3d4374b1077ee33544934482af6accfa2fffdbf55f21", "3bd3f7b2afa0bf6e03e251e7ee1b140d07f9e9d6337e046b6748ad7cf71a5008", "78d54330220bcf19fcc3a3ebbfc49b3f21ea808d02ce7e69bd36fe26a98eee53", "017d84ba07b57557d73317f801b06b75e3bba5ecb7af5e04f798b985ad0b3f08", "38bc6162190fb85bd8cdd5095882607f1da765d65c3b666a311f0407e97e2f23", "4625267e29eb134ca5b93f42acf04b40a048c5620131f16ce3151de6757a0a6f", "612a7c1a20c13e2a7d327616605b861e7d3d01acb15c2f7dd0024587a5476a74", "8e2bdeabb4bd6b3223af9bb5de15e13de22af67b5eee6c628d1b3508f5f4d857", "0c017d8788d0833bf5d2317a7cde0c050e86e94ffde81c51135bbab16befaa15", "7efa4f4163576e6fd856709152a76944e66c923a2af9e36ecc092ad18cccab05", "1c38d276884f7a40f41bb97cac1fd766bf459b57a9e386569be40a1e60ca3a1c", "3702ca47a1a0e406510ecb82a0f18f67f5cfc291b135882c4cba2f2176fa2b3e", "7c3e6aa70220127af8bdf33f6cedba2b54168f9a1ad3875517a8e17f6fba073e", "c6939f990b135d216433f0a3028980094d2cd297b9267976f0a3d0790e17ef2f", "72cedd97d953b33be55ed4ca6d932558c1dbe21564f08636a2b659003c49521b", "d3d1cd9dab94b2223880e3a778ca4a010f0d947d39d2b640b20cb419eb892874", "d6f520ffe4d4e43f763f5c46dc5a0708f8c60d1a94ea4962c370f5cc2f3c9c0f", 
"6e2a331ce256d72918bf46810bc6fc293745e51732a9524ca904bbefa750e808", "4727a42dda3b7a289b30ed713eaf82390d67143d882c827efdd2278b508fb847", "f672777f6a401f7a36f266fa79cece65d3e9664dcb51c63493bc8190245b1e30", "b71de2c21bcb0475de606c37791fba7d075627018517cb69d39656e17c87e50f", "8811251a101a153f817d2d04b52ee661c8a674588b8f072396b1be8eb027e549", "8d048a18fb34ef4b8be1cdffb4ce395bb6615c46a2779d5f806a255f100a7d3d"], "processors": [{"add_kubernetes_metadata": {"in_cluster": true}}], "type": "docker"}]

@DanielQujun
Contributor Author

DanielQujun commented May 9, 2019

I followed the debug log and the source code. It seems that creating a new input starts its processors on the first line:
https://github.com/elastic/beats/blob/master/libbeat/cfgfile/list.go#L94
https://github.com/elastic/beats/blob/master/filebeat/input/log/input.go#L86
https://github.com/elastic/beats/blob/master/libbeat/processors/processor.go#L84

and add_kubernetes_metadata starts its watch here:
https://github.com/elastic/beats/blob/master/libbeat/processors/add_kubernetes_metadata/kubernetes.go#L110

So if any error happens later while the new input is being created, a useless watch is left behind.

I'm new to Filebeat; if I missed something, please tell me. Thanks.

@exekias
Contributor

exekias commented May 9, 2019

I wonder if #12106 would fix this

@DanielQujun
Contributor Author

Yes, "fix goroutine leak" #12125 adds a cleanup check for when input creation fails, but I wonder whether it can fix the Kubernetes watch leak, because that watch is created by the add_kubernetes_metadata processor.

@exekias
Contributor

exekias commented Jun 4, 2019

I think we don't currently have a way to free processors when an input stops. One way to avoid this issue is to configure the add_kubernetes_metadata processor in the global scope. That should be more efficient in general, as you will only configure one for all containers.
