Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] About support multi threads and auto finalize for skywalking-rust. #9297

Closed
3 tasks done
jmjoy opened this issue Jul 1, 2022 · 25 comments
Closed
3 tasks done
Assignees
Labels
feature New feature rust Rust SDK
Milestone

Comments

@jmjoy
Copy link
Member

jmjoy commented Jul 1, 2022

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Description

Now the skywalking-rust only support one thread tracing, but usually there are some scenes that require multi threads/coroutines tracing.

Use case

Rust application requires multi threads and coroutines tracing.

Related issues

apache/skywalking-rust#20 (comment)
apache/skywalking-rust#26
apache/skywalking-rust#28
apache/skywalking-rust#29

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@jmjoy jmjoy added the feature New feature label Jul 1, 2022
@jmjoy
Copy link
Member Author

jmjoy commented Jul 1, 2022

I think the new api design is (fake code):

use skywalking_rust::context::trace_context::TracingContext;
use skywalking_rust::reporter::grpc::Reporter;
use tokio;

static TRACING_MANAGER: Lazy<TracingManager> = Lazy::new(|| TracingManager::default("svc", "ins"));

async fn handle_request(reporter: ContextReporter) {
    let mut ctx = TRACING_MANAGER.create_context();

    {
        // Generate an Entry Span when a request
        // is received. An Entry Span is generated only once per context.
        let span = ctx.create_entry_span("operation1").unwrap();

        // Something...

        {
            // Generates an Exit Span when executing an RPC.
            let span2 = ctx.create_exit_span("operation2").unwrap();
            
            // Something...

            // Drop span2 and finalize to ctx.
        }

        // Drop span and finalize to ctx.
    }

    // Create coroutine to handle async task.
    spawn(async move {
        let mut ctx2 = TRACING_MANAGER.create_context();

        {
            let span3 = ctx2.create_entry_span("operation3").unwrap();

            // Something...

            // Drop span2 and finalize to ctx2.
        }

        // Drop ctx2 and send to TRACING_MANAGER.
    });

    // Drop ctx and send to TRACING_MANAGER.
}

#[tokio::main]
async fn main() {
    let tx = Reporter::start("http://0.0.0.0:11800").await;

    // Start server...

    // Block to report.
    loop {
        let context = TRACING_MANAGER.receive_context();
        reporter.send(context).await;
    }
}

Add the TracingManager which should be singleton in the application, to manage the contexts, and use notify mechanism for reporting.

@wu-sheng wu-sheng added the rust Rust SDK label Jul 1, 2022
@wu-sheng wu-sheng added this to the Rust - 0.2.0 milestone Jul 1, 2022
@wu-sheng
Copy link
Member

wu-sheng commented Jul 1, 2022

Do you want to implement ctx#capture and ctx2#continous in these APIs, which could link ctx and ctx in a whole trace although they are created in two threads originally?

@jmjoy
Copy link
Member Author

jmjoy commented Jul 1, 2022

Do you want to implement ctx#capture and ctx2#continous in these APIs, which could link ctx and ctx in a whole trace although they are created in two threads originally?

OK, let me think about the design of API.

@jmjoy
Copy link
Member Author

jmjoy commented Jul 1, 2022

What is the purpose of ctx#capture and ctx2#continous? Is there other language can be referenced? I just think it can add a parameter like TRACING_MANAGER.create_context(parent_context).

@lujiajing1126
Copy link
Contributor

lujiajing1126 commented Jul 1, 2022

What is the purpose of ctx#capture and ctx2#continous? Is there other language can be referenced? I just think it can add a parameter like TRACING_MANAGER.create_context(parent_context).

Every Segment in SkyWalking exists in a single thread while a transaction/trace may consist of multiple operations across threads. To rebuild the whole trace in the backend (i.e. OAP), we have to link segments together with the help of contextual information, e.g. traceID, segmentID, spanID and user-defined correlation context.

Context.capture and Context. continous are key methods to capture the "context" in a thread and restore it in another thread.

You may refer to java impl, https://github.com/apache/skywalking-java/blob/main/apm-sniffer/apm-agent-core/src/main/java/org/apache/skywalking/apm/agent/core/context/ContextManager.java#L157

@wu-sheng
Copy link
Member

wu-sheng commented Jul 1, 2022

What is the purpose of ctx#capture and ctx2#continous? Is there other language can be referenced? I just think it can add a parameter like TRACING_MANAGER.create_context(parent_context).

The parent segment is not a very accurate concept, as there are many spans in current and parent context, you can't know which span of ctx is referring another span in ctx2.

This doc may help you understanding this concept, https://skywalking.apache.org/docs/skywalking-java/latest/en/setup/service-agent/java-agent/java-plugin-development-guide/#contextsnapshot.

In the protocol, the span's reference doesn't only represent the peer of an RPC/MQ, it also could represent a cross-thread reference.

https://github.com/apache/skywalking-data-collect-protocol/blob/906a834dc0c3fb66719b8313f8f9298be3a27349/language-agent/Tracing.proto#L125-L128

https://github.com/apache/skywalking-data-collect-protocol/blob/906a834dc0c3fb66719b8313f8f9298be3a27349/language-agent/Tracing.proto#L79-L82

@jmjoy
Copy link
Member Author

jmjoy commented Jul 3, 2022

I'm studying capture and continous through Java Kafka. I've opened foo and bar consumers, and produced messages through the request of /mq2.
image

I've determined through breakpoint debugging that the application has called to generate LocalSpan of Kafka/Producer/Callback.
image

But I'm confused that LocalSpan of Kafka/Producer/Callback is not displayed on the interface.

image

@wu-sheng
Copy link
Member

wu-sheng commented Jul 3, 2022

Maybe some visualization bug? I am not sure about this.
Because this span has two references, one is RPC peer, the other is from the local span's snapshot.

If you purely want to check capture and continuous, all async HTTP call or callback should include this case.

@jmjoy
Copy link
Member Author

jmjoy commented Jul 3, 2022

Maybe some visualization bug? I am not sure about this. Because this span has two references, one is RPC peer, the other is from the local span's snapshot.

If you purely want to check capture and continuous, all async HTTP call or callback should include this case.

Yes, I switch to using the old 8.x UI, and the Kafka/Producer/Callback is shown.

@wu-sheng
Copy link
Member

wu-sheng commented Jul 3, 2022

FYI @Fine0830 We may miss mutiple references in the span page in 9.x

@lujiajing1126
Copy link
Contributor

Maybe some visualization bug? I am not sure about this. Because this span has two references, one is RPC peer, the other is from the local span's snapshot.

If you purely want to check capture and continuous, all async HTTP call or callback should include this case.

I've confirmed the issue with both 9.0.0 and 9.1.0.

The issue is caused by EndpointDepFromCrossThreadAnalysisListener introduced in v9, NPE is thrown during the analysis , then the segment will not be saved to the storage.

org.apache.skywalking.oap.server.receiver.trace.provider.handler.v8.grpc.TraceSegmentReportServiceHandler - 75 [grpcServerPool-1-thread-14] ERROR [] - null
java.lang.NullPointerException: null
	at org.apache.skywalking.oap.server.core.source.EndpointRelation.prepare(EndpointRelation.java:101) ~[server-core-9.1.0.jar:9.1.0]
	at org.apache.skywalking.oap.server.core.analysis.DispatcherManager.forward(DispatcherManager.java:59) ~[server-core-9.1.0.jar:9.1.0]
	at org.apache.skywalking.oap.server.core.source.SourceReceiverImpl.receive(SourceReceiverImpl.java:36) ~[server-core-9.1.0.jar:9.1.0]
	at org.apache.skywalking.oap.server.analyzer.provider.trace.parser.listener.EndpointDepFromCrossThreadAnalysisListener.lambda$build$0(EndpointDepFromCrossThreadAnalysisListener.java:130) ~[agent-analyzer-9.1.0.jar:9.1.0]
	at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_332]
	at org.apache.skywalking.oap.server.analyzer.provider.trace.parser.listener.EndpointDepFromCrossThreadAnalysisListener.build(EndpointDepFromCrossThreadAnalysisListener.java:124) ~[agent-analyzer-9.1.0.jar:9.1.0]
	at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_332]
	at org.apache.skywalking.oap.server.analyzer.provider.trace.parser.TraceAnalyzer.notifyListenerToBuild(TraceAnalyzer.java:75) ~[agent-analyzer-9.1.0.jar:9.1.0]
	at org.apache.skywalking.oap.server.analyzer.provider.trace.parser.TraceAnalyzer.doAnalysis(TraceAnalyzer.java:71) ~[agent-analyzer-9.1.0.jar:9.1.0]
	at org.apache.skywalking.oap.server.analyzer.provider.trace.parser.SegmentParserServiceImpl.send(SegmentParserServiceImpl.java:40) ~[agent-analyzer-9.1.0.jar:9.1.0]
	at org.apache.skywalking.oap.server.receiver.trace.provider.handler.v8.grpc.TraceSegmentReportServiceHandler$1.onNext(TraceSegmentReportServiceHandler.java:72) [skywalking-trace-receiver-plugin-9.1.0.jar:9.1.0]
	at org.apache.skywalking.oap.server.receiver.trace.provider.handler.v8.grpc.TraceSegmentReportServiceHandler$1.onNext(TraceSegmentReportServiceHandler.java:63) [skywalking-trace-receiver-plugin-9.1.0.jar:9.1.0]
	at io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:262) [grpc-stub-1.46.0.jar:1.46.0]
	at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:318) [grpc-core-1.46.0.jar:1.46.0]
	at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:301) [grpc-core-1.46.0.jar:1.46.0]
	at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:834) [grpc-core-1.46.0.jar:1.46.0]
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) [grpc-core-1.46.0.jar:1.46.0]
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) [grpc-core-1.46.0.jar:1.46.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_332]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_332]
	at java.lang.Thread.run(Thread.java:750) [?:1.8.0_332]

@wu-sheng
Copy link
Member

wu-sheng commented Jul 3, 2022

@lujiajing1126 Could you be more specific what causes the NPE?

@lujiajing1126
Copy link
Contributor

lujiajing1126 commented Jul 3, 2022

@lujiajing1126 Could you be more specific what causes the NPE?

image

Here sourceEndpointOwnerServiceName is not null ("kafka-scenario"), but sourceEndpointOwnerServiceLayer is null.

serviceLayer.isNormal() will report NPE,

image

I suppose, we miss a line here,

image

We have to add sourceBuilder.setSourceEndpointOwnerServiceLayer(Layer.GENERAL); after line 95 just as we have done in the RPCAnalysisListener.java

@wu-sheng
Copy link
Member

wu-sheng commented Jul 3, 2022

Could you send a pull request to fix this?

@lujiajing1126
Copy link
Contributor

Yes, I switch to using the old 8.x UI, and the Kafka/Producer/Callback is shown.

Hi @jmjoy, the bugfix has been merged to the master branch. You may have a try.

@wu-sheng wu-sheng modified the milestones: Rust - 0.2.0, Rust - 0.3.0 Jul 4, 2022
@jmjoy
Copy link
Member Author

jmjoy commented Jul 5, 2022

I just tested it and found it to be fixed.

Yes, I switch to using the old 8.x UI, and the Kafka/Producer/Callback is shown.

Hi @jmjoy, the bugfix has been merged to the master branch. You may have a try.

I just tested it and found it to be fixed.

@jmjoy
Copy link
Member Author

jmjoy commented Jul 10, 2022

Add Tracer: apache/skywalking-rust#26

@wu-sheng
Copy link
Member

 Auto finalize context and span when dropped. apache/skywalking-rust#28

@wu-sheng
Copy link
Member

We only have cross threads left for this issue, right?

@jmjoy
Copy link
Member Author

jmjoy commented Jul 16, 2022

We only have cross threads left for this issue, right?

Yes.

@wu-sheng
Copy link
Member

Add context capture and continued methods. through apache/skywalking-rust#29

@wu-sheng
Copy link
Member

@jmjoy Should we consider a 0.3.0 release? Any other things you may add?

In the future, I think log collecting and meter(Metrics) reports are good choices.

@jmjoy
Copy link
Member Author

jmjoy commented Jul 19, 2022

@jmjoy Should we consider a 0.3.0 release? Any other things you may add?

In the future, I think log collecting and meter(Metrics) reports are good choices.

No problem, future changes can be put into 0.4.0.

@wu-sheng
Copy link
Member

@jmjoy I could start the release, could you update the version declare with e2e dependency?

@jmjoy
Copy link
Member Author

jmjoy commented Jul 19, 2022

@jmjoy I could start the release, could you update the version declare with e2e dependency?

OK.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature rust Rust SDK
Projects
None yet
Development

No branches or pull requests

3 participants