Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gracefully shutdown reporter & add agent-test-tool based e2e case #9

Merged
merged 10 commits into from
Jan 14, 2022

Conversation

tisonkun
Copy link
Member

Signed-off-by: tison wander4096@gmail.com

cc @wu-sheng @Shikugawa

Signed-off-by: tison <wander4096@gmail.com>
@tisonkun tisonkun force-pushed the trace-report-example branch from e8dd804 to ee8b736 Compare January 10, 2022 03:06
Signed-off-by: tison <wander4096@gmail.com>
@codecov-commenter
Copy link

codecov-commenter commented Jan 10, 2022

Codecov Report

Merging #9 (b21e2f4) into master (1465d9f) will decrease coverage by 3.44%.
The diff coverage is 7.69%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master       #9      +/-   ##
==========================================
- Coverage   88.60%   85.15%   -3.45%     
==========================================
  Files           9        9              
  Lines         272      283      +11     
==========================================
  Hits          241      241              
- Misses         31       42      +11     
Impacted Files Coverage Δ
src/context/trace_context.rs 69.81% <0.00%> (-6.48%) ⬇️
src/reporter/grpc.rs 0.00% <0.00%> (ø)
src/context/propagation/encoder.rs 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1465d9f...b21e2f4. Read the comment docs.

@tisonkun
Copy link
Member Author

I noticed that we may not use mpsc::channel at all.

Currently, the reporter flushes per context on the streaming rpc. May you provide other languages' reporter implementation as references and I can work for a proper API.

@wu-sheng
Copy link
Member

Hi, if you want self-contained, this is another example. As a client, the official way to test is set up e2e, which this lib is only working as a client.
Self-contained is tricky because without SkyWalking OAP running, this client could send tracing w/ grpc means nothing.

@wu-sheng
Copy link
Member

This is how the test thing works, https://github.com/SkyAPM/cpp2sky/blob/main/.github/workflows/main.yml

@wu-sheng
Copy link
Member

I would like to recommend you to add an e2e test case for this repo, which has more value from our experiences.

@Shikugawa
Copy link
Member

Shikugawa commented Jan 10, 2022

@tisonkun Thank you for awesome contribution! If you are interested in e2e framework, you should see here (This is my experimental work of e2e, but it is not ready for production). It may help you.

Copy link
Member

@Shikugawa Shikugawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm not sure how situation we should use self-contained reporters. Could you show me the usecase for it?

}
}
}
rx.close();
while let Some(message) = rx.recv().await {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may not be required anymore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. If a select chooses shutdown first, there can be outstanding messages to be processed. We close the rx so that there is no more inflight message, but the outstanding ones should be processed.

See also https://docs.rs/tokio/1.15.0/tokio/sync/mpsc/struct.Receiver.html#method.close for example.

examples/simple_trace_report.rs Show resolved Hide resolved
@tisonkun
Copy link
Member Author

I would like to recommend you to add an e2e test case for this repo, which has more value from our experiences.

That makes sense. I'm confused about how to run the example properly and failed to find a resource to set up the environment.

Thanks for your information. Let me dig it out.

@wu-sheng
Copy link
Member

That makes sense. I'm confused about how to run the example properly and failed to find a resource to set up the environment.

Because you need a backend and UI, https://skywalking.apache.org/docs/main/v8.9.1/readme/.
About the self-hosted, it is not 100% meaningless, but mostly, it just could print the segment values.

@Shikugawa
Copy link
Member

I would like to recommend you to add an e2e test case for this repo, which has more value from our experiences.

That makes sense. I'm confused about how to run the example properly and failed to find a resource to set up the environment.

Thanks for your information. Let me dig it out.

We preapred simple docker-compose envs here.
Also, your change will break e2e code here.

@tisonkun
Copy link
Member Author

Also, your change will break e2e code here.

All checks have passed in CI. How do I verify this sentence?

@tisonkun
Copy link
Member Author

tisonkun commented Jan 10, 2022

@wu-sheng It seems the example can leave as is (without starting a server) and I can verify the graceful shutdown logic by fixing current e2e test.

@wu-sheng
Copy link
Member

@tisonkun E2E has a very specific meaning in the SkyWalking context.

There are 2 kinds of E2E tests.

  1. E2E with SkyWalking backend. (I think this isn't @Shikugawa 's suggestion, just mention it here to make the context clear)
    You could learn this tool, which is an established sub-project for a docker-compose/KinD(k8s) env testing to verify how components work with each other because the SkyWalking ecosystem is super large and complex.

  2. Plugin tests with agent-test-tool, which is a mock OAP server to collect and print data to a YAML for you. Here is how it is used in cpp2sky.


You should follow (2) more for the rust plugin. This test is working as rust-a -> rust-b, and both of them are sending trace data to mock-collector. Then verifying the data from mock-collector with the expected data file you provided.

@tisonkun tisonkun changed the title self-contained example & gracefully shutdown reporter gracefully shutdown reporter Jan 10, 2022
@tisonkun
Copy link
Member Author

@wu-sheng @Shikugawa I can try to fix and verify e2e failure locally, but it seems our CI doesn't check the e2e test. Shall we add it also?

@wu-sheng
Copy link
Member

Yes, e2e is a new case.

Signed-off-by: tison <wander4096@gmail.com>
Signed-off-by: tison <wander4096@gmail.com>
@tisonkun tisonkun force-pushed the trace-report-example branch from bc855b3 to b8fc551 Compare January 10, 2022 15:34
@tisonkun tisonkun marked this pull request as draft January 11, 2022 03:09
@tisonkun tisonkun force-pushed the trace-report-example branch 4 times, most recently from 11bc092 to 7f46110 Compare January 13, 2022 06:46
@tisonkun tisonkun force-pushed the trace-report-example branch 2 times, most recently from 515a14a to 716bdb4 Compare January 13, 2022 13:52
@tisonkun tisonkun force-pushed the trace-report-example branch from 1240d59 to 54cfc17 Compare January 13, 2022 14:27
Signed-off-by: tison <wander4096@gmail.com>
@tisonkun tisonkun force-pushed the trace-report-example branch from 54cfc17 to 9b20bb7 Compare January 13, 2022 14:49
@tisonkun
Copy link
Member Author

It seems that depends_on doesn't take effect and consumer/producer runs before collector and panic...I don't know how to process then. If I manually start service parts one by one, the test can pass.

@kezhenxu94
Copy link
Member

kezhenxu94 commented Jan 13, 2022

It seems that depends_on doesn't take effect and consumer/producer runs before collector and panic...I don't know how to process then. If I manually start service parts one by one, the test can pass.

depends_on should take effects, what is out of expectation is that consumer/provider tries to do something before OAP is ready. Please use

depends_on:
  oap:
    condition: service_healthy

And add a health check for OAP.

NOTE: version 3.7 of docker compose file might not accept this syntax but I know 3.8 and 2.1 can.

@wu-sheng
Copy link
Member

I have seen you added a health check. This is important.

Signed-off-by: tison <wander4096@gmail.com>
@tisonkun tisonkun force-pushed the trace-report-example branch from d32013f to 86241d4 Compare January 13, 2022 15:14
Signed-off-by: tison <wander4096@gmail.com>
Signed-off-by: tison <wander4096@gmail.com>
Adding healthCheck causes hang without explicit reason.

Given that the consumer started quickly, I tend to revert the healthCheck for now.

Signed-off-by: tison <wander4096@gmail.com>
@tisonkun tisonkun marked this pull request as ready for review January 13, 2022 16:04
@tisonkun
Copy link
Member Author

tisonkun commented Jan 13, 2022

Finally I hack through it.

I think this is ready for review, but I still confuse why bbda7cf causes hanging forever :( EDIT: I use a slim container which doesn't have a curl installed >_<

Signed-off-by: tison <wander4096@gmail.com>
@tisonkun tisonkun requested a review from Shikugawa January 13, 2022 16:36
pip3 install --upgrade pip
pip3 install setuptools
pip3 install -r requirements.txt
python3 tests/e2e/run_e2e.py --expected_file=tests/e2e/data/expected_context.yaml --max_retry_times=3 --target_path=/ping
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find the tests/e2e/data/expected_context.yaml. How does this test pass? Could you check the logs?
Is this a run_e2e.py bug or something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already in the codebase here: https://github.com/apache/skywalking-rust/blob/1465d9fc6811dcb009e75f5c1dd2d5b1ef7a52a9/tests/e2e/data/expected_context.yaml

I don't add a new test case but according to your suggestion #9 (comment), enable the existing but not in regression e2e test - it's a plugin test based on agent-test-tool.

@wu-sheng wu-sheng added this to the 0.1.0 milestone Jan 14, 2022
@wu-sheng
Copy link
Member

@Shikugawa Please check the codes.

@wu-sheng
Copy link
Member

@tisonkun Please rename the title to make the changes clear.

@tisonkun tisonkun changed the title gracefully shutdown reporter gracefully shutdown reporter & add agent-test-tool based e2e case Jan 14, 2022
@tisonkun
Copy link
Member Author

@wu-sheng updated. Notify me if it's better to split into two PRs.

@wu-sheng
Copy link
Member

I am fine that these two are in one PR, as this repo is a very early stage.

But I hope you could recheck the test coverage

src/context/trace_context.rs | 69.81% <0.00%> (-6.48%)

I am not sure whether e2e testing could be counted in Rust, but it can be in the Java.

@tisonkun
Copy link
Member Author

tisonkun commented Jan 14, 2022

@wu-sheng I checked the report and don't think it's a real regression:

  1. "Regressions" on impl fmt:Debug. It's meaningless to test a standard lib implementation.
  2. "Regressions" on shutdown/sender. The e2e test should cover it.

Copy link
Member

@Shikugawa Shikugawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Thank you for changing!

@wu-sheng wu-sheng merged commit f4743a9 into apache:master Jan 14, 2022
@tisonkun tisonkun deleted the trace-report-example branch January 14, 2022 14:59
@wu-sheng
Copy link
Member

I got a notification that e2e of merged commit has failed in 9s after e2e began.
It makes me feel the health check seems not working well.
I rerun the test, and will see whether it could pass.

@tisonkun
Copy link
Member Author

@wu-sheng @Shikugawa according to the failure report, it seems the received data is unstable (not stably matches the expected data).

https://github.com/apache/skywalking-rust/runs/4818192129?check_suite_focus=true

diff list: 
- meterItems: []
  segmentItems:
  - segmentSize: 1
    segments:
-   - segmentId: '184682932856352416001309273098852514966'
+   - segmentId: not null
      spans:
      - componentId: 11000
-       endTime: 1642171583
+       endTime: gt 0
        isError: false
+       operationId: 0
        operationName: /pong
        parentSpanId: 1
        peer: consumer:8082
        skipAnalysis: false
        spanId: 2
        spanLayer: Http
        spanType: Exit
-       startTime: 1642171583
+       startTime: gt 0
      - componentId: 11000
-       endTime: 1642171583
+       endTime: gt 0
        isError: false
+       operationId: 0
        operationName: /ping
        parentSpanId: 0
        peer: ''
        skipAnalysis: false
        spanId: 1
        spanLayer: Http
        spanType: Entry
-       startTime: 1642171583
+       startTime: gt 0
    serviceName: producer
  - segmentSize: 1
    segments:
-   - segmentId: '251652345953849008728622634953354684403'
+   - segmentId: not null
      spans:
      - componentId: 11000
-       endTime: 1642171583
+       endTime: gt 0
        isError: false
+       operationId: 0
        operationName: /pong
        parentSpanId: 0
        peer: ''
        refs:
        - networkAddress: consumer:8082
          parentEndpoint: /pong
          parentService: producer
          parentServiceInstance: node_0
          parentSpanId: 2
-         parentTraceSegmentId: '184682932856352416001309273098852514966'
+         parentTraceSegmentId: not null
          refType: CrossProcess
-         traceId: '280108941388649570187429178111613482643'
+         traceId: not null
        skipAnalysis: false
        spanId: 1
        spanLayer: Http
        spanType: Entry
-       startTime: 1642171583
+       startTime: gt 0
    serviceName: consumer
Traceback (most recent call last):
  File "tests/e2e/run_e2e.py", line 60, in <module>
    assert res.status_code == 200
AssertionError

@wu-sheng
Copy link
Member

I think the key point is AssertionError, which seems the observed services are not available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants