-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[extension/observer/k8sobserver] Flaky test (DATA RACE) - TestExtensionObservePods #29448
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
It seems that there is concurrent access in
# Tests:
cmd = go test -v -count=1 -race -timeout 300s -parallel 4 --tags="" -cover ./... -covermode=atomic -args -test.gocoverdir=/path/to/otel/coverage/unit
cd opentelemetry-collector-contrib/extension/observer/k8sobserver
for i in {1..20}; do echo "Run $i"; ${cmd} &; done When I commented out the following code, the opentelemetry-collector-contrib/internal/common/k8sconfig/config.go Lines 107 to 114 in 593351e
|
…ACE) of TestExtensionObservePods (#29463) **Description:** Try to fix flaky test (DATA RACE) of `TestExtensionObservePods`. Since `config.ObservePods` is set true by default, for `TestExtensionObserveServices` and `TestExtensionObserveNodes` test case, we just focus on services observer and nodes observer, so we don't need to use `k8sObserver.podListerWatcher` which can avoid data race in the same process. **Link to tracking Issue:** #29448 **Testing:** go test for k8sobserver **Preparation:** ``` Shell DIR = opentelemetry-collector-contrib/extension/observer/k8sobserver DATA_DIR = /path/to/otel/coverage/unit CMD = go test -run TestExtensionObserve -v -count=1 -race -timeout 300s -parallel 4 --tags="" -cover ./... -covermode=atomic -args -test.gocoverdir=${DATA_DIR} MULTI_CMD = for i in {1..50}; do echo "Run $i"; ./{CMD} &; done ``` **Tests:** 1. I found `config.ObservePods` is set true by default. Then `TestExtensionObserveServices` and `TestExtensionObserveNodes` will enable `k8sObserver.podListerWatcher` and try to access the same http proxy since they are in the same process. https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/39641dfac9482c62ece9505fef89fcff81ee258b/extension/observer/k8sobserver/factory.go#L26-L33 2. I add log(as shown below) for `TestExtensionObserve*` functions and found test case **in the same process**. Refer to [stack overflow](https://stackoverflow.com/a/24376644), go test are executed as goroutines and executed concurrently. <img width="783" alt="image" src="https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/3955787/a9baba19-0911-4d49-9395-8fb4a58a5e9b"> > --------- TestExtensionObserveServices process ID: 38910 > --------- TestExtensionObservePods process ID: 38910 > --------- TestExtensionObserveNodes process ID: 38910 3. I add log (as shown below) for http proxy setting addr, and found use the same address: <img width="951" alt="image" src="https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/3955787/a5c8e4e7-b6bd-4d8b-a82f-c9142d31a249"> > -------------- use proxy connectMethodForRequest t.Proxy addr: 0xc0007142f0 ------------------- > -------------- transport t.Proxy (is) nil > -------------- send request: {Method:GET URL:https://mock:12345/api/v1/pods?limit=500&resourceVersion=0 Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[Accept:[application/json, */*] User-Agent:[k8sobserver.test/v0.0.0 (darwin/amd64) kubernetes/$Format]] Body:<nil> GetBody:<nil> ContentLength:0 TransferEncoding:[] Close:false Host:mock:12345 Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr: RequestURI: TLS:<nil> Cancel:<nil> Response:<nil> ctx:0xc000410120} 4. So we may find one goroutine is writing http proxy and the other goroutine read with http proxy(HTTP request for **Pods Lister**). As a result, this has led to data race. **Inference:** I found that `config.ObservePods` is set true by default. For `TestExtensionObserveServices` and `TestExtensionObserveNodes` test case, we just focus on services observer and nodes observer, so we don't need to use `k8sObserver.podListerWatcher` which can avoid data race in the same process. **Documentation:** --------- Signed-off-by: sakulali <sakulali@126.com>
Resolved by #29463 |
Hi @crobert-1, i found this failed test time seems before the commit time? |
Good catch @sakulali, I saw this on a currently open PR, didn't realize the failed test was that long ago 👍 |
Component(s)
extension/observer/k8sobserver
Describe the issue you're reporting
https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/6959238824/job/18936065429?pr=26644
The text was updated successfully, but these errors were encountered: