Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(instrumentation-http): close server span when response finishes #3407

Merged
merged 6 commits into from
Dec 20, 2022

Conversation

legendecas
Copy link
Member

@legendecas legendecas commented Nov 14, 2022

Which problem is this PR solving?

Close server span when the ServerResponse emits 'close' event.

Fixes #3104

Short description of the changes

ServerResponse.end() writes data to the socket asynchronously. The span should be closed when the response has been sent.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

  • Verifies that responseHook can be used to set additional attributes when ServerResponse.end is called.

Checklist:

  • Followed the style guidelines of this project
  • Unit tests have been added

@legendecas legendecas force-pushed the instrumentation/http-end branch from e2a7b61 to bca9b9a Compare November 14, 2022 07:41
@legendecas legendecas marked this pull request as ready for review November 14, 2022 07:41
@legendecas legendecas requested a review from a team November 14, 2022 07:41
@legendecas legendecas force-pushed the instrumentation/http-end branch from bca9b9a to 336e7bd Compare November 14, 2022 07:47
@codecov
Copy link

codecov bot commented Nov 14, 2022

Codecov Report

Merging #3407 (55e0510) into main (2fb80eb) will increase coverage by 0.72%.
The diff coverage is 72.72%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3407      +/-   ##
==========================================
+ Coverage   92.99%   93.72%   +0.72%     
==========================================
  Files         241      248       +7     
  Lines        7096     7567     +471     
  Branches     1494     1581      +87     
==========================================
+ Hits         6599     7092     +493     
+ Misses        497      475      -22     
Impacted Files Coverage Δ
...ges/opentelemetry-instrumentation-http/src/http.ts 93.89% <72.72%> (-0.97%) ⬇️
...s/opentelemetry-instrumentation-fetch/src/fetch.ts 97.00% <0.00%> (ø)
...mentation-xml-http-request/src/enums/EventNames.ts 100.00% <0.00%> (ø)
...emetry-instrumentation-xml-http-request/src/xhr.ts 97.58% <0.00%> (ø)
...es/opentelemetry-context-zone-peer-dep/src/util.ts 100.00% <0.00%> (ø)
...ry-context-zone-peer-dep/src/ZoneContextManager.ts 85.00% <0.00%> (ø)
...-instrumentation-fetch/src/enums/AttributeNames.ts 100.00% <0.00%> (ø)
...ation-xml-http-request/src/enums/AttributeNames.ts 100.00% <0.00%> (ø)
packages/opentelemetry-sdk-trace-web/src/utils.ts 95.03% <0.00%> (+29.19%) ⬆️

@legendecas legendecas requested a review from a team November 22, 2022 08:48
@dyladan
Copy link
Member

dyladan commented Dec 5, 2022

@open-telemetry/javascript-approvers we need more reviews here.

@Flarna
Copy link
Member

Flarna commented Dec 5, 2022

finish is not emitted in some cases (e.g. if socket disconnects early for some reason like client abort).

Therefore this results in a behavior change because the hook called now in end might be never called after this PR.
But not sure if this is problematic as the response is likely not sent at all in these cases.

Also the span duration changes. Consider someone passes a large amount of data to res.end(data) on a slow network connection.
Before the span was short, not it's long.
It depends on what one would like to monitor (with/without network time) if the one or the other is better.

@dyladan
Copy link
Member

dyladan commented Dec 5, 2022

finish is not emitted in some cases (e.g. if socket disconnects early for some reason like client abort).

Therefore this results in a behavior change because the hook called now in end might be never called after this PR. But not sure if this is problematic as the response is likely not sent at all in these cases.

Also the span duration changes. Consider someone passes a large amount of data to res.end(data) on a slow network connection. Before the span was short, not it's long. It depends on what one would like to monitor (with/without network time) if the one or the other is better.

Even finish doesn't guarantee the client has received the response yet, just that it has been handed off to the kernel. I think the network time is not fully captured by the finish event.

Previously we were ending the response when the response.end was called, which would I guess correspond with the finish of what you might call the "processing" stage of a request. The finish callback would include the time to hand data to the kernel which might be a little longer but I wouldn't expect to be a lot longer right?

@Flarna
Copy link
Member

Flarna commented Dec 6, 2022

I haven't tested but I guess the diff can be significant if you pass a large chunk to end or write very fast to a stream and the JS stream buffers a lot data (not the kernel).
With processing time I mean the time one needs in his user code to receive the request, process it and prepare the result. So that part which is in the direct hand of the application programmer.

@legendecas
Copy link
Member Author

I haven't tested but I guess the diff can be significant if you pass a large chunk to end or write very fast to a stream and the JS stream buffers a lot data (not the kernel).
With processing time I mean the time one needs in his user code to receive the request, process it and prepare the result. So that part which is in the direct hand of the application programmer.

To track user code time, people (usually web frameworks instrumentation) can create nested spans (like layers and middlewares) to track exact processing time.

As the 'finish' event is an indicator when the in-process buffer is flushed, this is helpful to investigate the duration of the response body to be written to the kernel. So I'd find longer HTTP span duration for large chunks is the expected behavior.

@legendecas
Copy link
Member Author

finish is not emitted in some cases (e.g. if socket disconnects early for some reason like client abort).
Therefore this results in a behavior change because the hook called now in end might be never called after this PR.
But not sure if this is problematic as the response is likely not sent at all in these cases.

IIUC, in the case of connection abort after response.end is called, one of the events 'error' or 'finish' should be emitted exactly once. Would you mind sharing the case where either 'error' or 'finish' is not emitted?

@Flarna
Copy link
Member

Flarna commented Dec 6, 2022

const http = require("http");

http.createServer((req, res) => {
    console.log("onRequest")
    req.on("error", (e) => console.log("REQ error:", e.message));
    req.on("aborted", () => console.log("REQ aborted"));
    req.on("close", () => console.log("REQ close"));
    req.on("end", () => console.log("REQ end"));
    
    res.on("error", (e) => console.log("RES error:", e.message));
    res.on("close", () => console.log("RES close"));
    res.on("finish", () => console.log("RES finish"));

    setTimeout(() => {
        console.log("calling end")
        res.end();
    }, 2000);
}).listen(8000);

const cltReq = http.get("http://localhost:8000");
cltReq.on("error", () => {});
setTimeout(() => cltReq.destroy(), 1000);

with node 18.12.1 on windows this results in

onRequest
REQ aborted
RES close
REQ error: aborted
REQ close
calling end

==> no finsish nor an error event on response but a close event
==> error event is on request

@legendecas
Copy link
Member Author

@Flarna thanks for sharing. Seems like we should distinguish early 'close' events without response.end() and 'finish'/'error' events (not sure if this fits in the spec statement https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/http.md#:~:text=network%20error%20receiving%20the%20response%20body). I'll update this PR to reflect this behavior.

# Conflicts:
#	experimental/CHANGELOG.md
#	experimental/packages/opentelemetry-instrumentation-http/src/http.ts
#	experimental/packages/opentelemetry-instrumentation-http/test/functionals/http-enable.test.ts
@legendecas
Copy link
Member Author

@Flarna updated with test coverage on premature closes. PTAL again, thank you!

@legendecas legendecas merged commit faa0a33 into open-telemetry:main Dec 20, 2022
@legendecas legendecas deleted the instrumentation/http-end branch December 20, 2022 16:47
fuaiyi pushed a commit to fuaiyi/opentelemetry-js that referenced this pull request Dec 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add responseBodyHook for http request and response
4 participants