Python celery plugin #125
Hi @tom-pytel, I missed this in this PR, but this makes `requests` a mandatory dependency of skywalking-python. Please also take a look at apache/skywalking#7282: `requests` depends on an LGPL-licensed dependency that we cannot ship with in an ASF project. As we already have this in `extras_require/http`, can we just remove it? When users want to use the `http` protocol, they can use something like `pip install skywalking-python[http]`. FYI @wu-sheng
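For illustration, a hedged sketch of the packaging shape being suggested here; the distribution name, dependency pins, and `setup()` layout are assumptions for the example, not the project's actual setup.py:

```python
# setup.py (illustrative excerpt only)
from setuptools import setup, find_packages

setup(
    name='apache-skywalking',          # assumed distribution name
    packages=find_packages(),
    install_requires=[
        'grpcio',                      # gRPC stays the default protocol
        'grpcio-tools',
    ],
    extras_require={
        # `requests` ships only when users opt in to the http protocol,
        # e.g. via the [http] extra mentioned above.
        'http': ['requests'],
    },
)
```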
Why does a plugin require an agent-level dependency?
We support the grpc and http protocols. For the http protocol we use `requests` to send HTTP requests; grpc is the default protocol and http is optional (it can be installed by `pip install skywalking-python[http]`). I think @tom-pytel missed that and wanted to test the http protocol, so he added the dependency here.
OK, got it. Good that we don't really depend on it.
Sure, if the license will cause problems then we can remove it from the required dependencies. I could also look into using a different communication method like `urllib.request` or `urllib3.request`?

As for the http protocol: we are doing stress testing here and finding that grpc is not entirely reliable, and the http protocol is actually a lot more stable. Not sure why this is happening; maybe grpc is not configured correctly or the timeouts are causing problems. But the main result is that you should consider the http protocol a little more than just optional at this point, since it is capable of working in scenarios for us where grpc breaks.
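To show what dropping `requests` could look like, here is a minimal standard-library sketch for POSTing a JSON payload with `urllib.request`; the collector URL and payload shape are placeholders, not the agent's actual wire format:

```python
import json
import urllib.request

def post_json(url: str, payload: dict, timeout: float = 5.0) -> int:
    """POST a JSON body using only the standard library (no `requests`)."""
    data = json.dumps(payload).encode('utf-8')
    req = urllib.request.Request(
        url,
        data=data,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status

# hypothetical usage -- endpoint is an assumption for illustration only
# status = post_json('http://127.0.0.1:12800/v3/segments', {'segments': []})
```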
@tom-pytel Could you share how you test the performance in a separate issue? From the last several weeks' perf tests, JSON really doesn't have good performance from the Java perspective, tested in the OAP backend.
Is it possible to make gRPC work across fork? Like I said in the DM, recreate a brand-new gRPC channel in the forked process?
I tried closing down the channel and recreating it in both parent and child after fork. It is possible I did not do it right since I am not a grpc expert, but I got one of two results:
I didn't mean to close the parent channel and create one in the child process by reusing the agent in the parent. What I meant is to start another independent agent in the child process and leave the parent one there, because there may be other things that need to be traced in the parent process. Can you take a look at `skywalking-python/skywalking/trace/ipc/process.py` (lines 30 to 34 in c733985) and see whether that helps? It is generally what I propose to do in forked processes.
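If I read that right, the idea is roughly the following sketch. This is illustrative only, not the contents of process.py, and it assumes `agent.start()` can be called again in a freshly forked child, which is exactly the open question in this thread:

```python
import os

from skywalking import agent  # public start() entry point of the agent

def run_child_with_new_agent(child_work):
    """Fork and start a brand-new, independent agent inside the child,
    leaving the parent's agent (threads, channel) untouched so the parent
    process keeps being traced."""
    pid = os.fork()
    if pid == 0:
        agent.start()      # fresh agent instance for this process only
        child_work()
        os._exit(0)
    return pid
```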
In your current implementation, when new processes are spawned, the agent in the parent process takes no effect then, right?
I tried several things like:

- Recreating `GrpcServiceManagementClient` and `GrpcTraceSegmentReportService` in the child.
- `close()`, use `unsubscribe()`.
- `unsubscribe()` then `close()`, before fork or after in the child.

I also forgot to mention there was a third result I was getting sometimes: a deadlock hang. It is possible I missed some permutations or a different function to call, but in general, researching Python grpc with multiprocessing on the net, I found basically the following answers: either 1. "don't do it", or 2. "grpc must be started only after forking everything, then maybe it will work". Here are some links:
googleapis/synthtool#902
https://stackoverflow.com/questions/62798507/why-multiprocess-python-grpc-server-do-not-work
So as I said, it may be possible but I have not hit on how to do it. If you want to give it a shot, I will add a simple test script to the end of this message. I also didn't test anything with Kafka and assume it will not work correctly when forking until someone validates that.
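For what it's worth, a minimal sketch of what answer 2 above describes (all gRPC objects created only in the child, after fork); the collector address is a placeholder:

```python
import os
import grpc  # grpcio

COLLECTOR = '127.0.0.1:11800'   # placeholder OAP gRPC address

def child_worker():
    # Create the channel *inside* the child, after fork, so no gRPC threads,
    # file descriptors, or internal locks are inherited from the parent.
    channel = grpc.insecure_channel(COLLECTOR)
    try:
        pass  # create stubs and report segments from here
    finally:
        channel.close()

pid = os.fork()
if pid == 0:
    child_worker()
    os._exit(0)
os.waitpid(pid, 0)
```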
As for the current higher-level flow, keep in mind it can be modified in the future according to what protocol is in use, but for now: nothing special is done before fork, or after fork in the parent. In those cases all threads, sockets, and locks continue operating as if nothing had happened. In the child, new report and heartbeat threads are started, since threads don't survive into children. And specifically for the http protocol, the duplicated sockets are closed and new ones are opened on the next heartbeat or report.
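Condensed into a sketch, using `os.register_at_fork` as the hook point; the helper names are illustrative placeholders, not the agent's real internals:

```python
import os
import threading

def _restart_report_and_heartbeat():
    # Placeholders for the agent's real report/heartbeat loops -- threads
    # never survive fork(), so they must be started again in the child.
    threading.Thread(target=lambda: None, name='report', daemon=True).start()
    threading.Thread(target=lambda: None, name='heartbeat', daemon=True).start()

def _drop_duplicated_http_sockets():
    # Placeholder: close sockets the child inherited from the parent; new
    # connections are opened lazily on the next heartbeat or report.
    pass

def _after_fork_in_child():
    _drop_duplicated_http_sockets()
    _restart_report_and_heartbeat()

# The parent needs nothing before or after fork; only the child re-initializes.
os.register_at_fork(after_in_child=_after_fork_in_child)
```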
There is a potential problem with the `__queue` object: a thread may have been holding an internal lock on it before fork, and since that thread is no longer present in the child, the queue will remain in a locked state. Not sure how to resolve this yet, but it should be a very rare event. Even rarer may be the same lock problem with the `__finished` event, but I wouldn't expect that to happen basically ever.

Right now I have other stuff on my plate, but if you have any suggestions on what to try I may revisit this at some point in the future. Or if you want to try it yourself, here is a test script:
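(The originally attached script is not preserved here; the following is only a stand-in for the same kind of test, and the `config.init` keyword and the `get_context` / `new_local_span` usage are assumptions about the agent's public API.)

```python
# Create spans before and after fork, in both parent and child, then check
# how many of them actually reach the collector.
import os
import time

from skywalking import agent, config
from skywalking.trace.context import get_context  # assumed import path

config.init(service='fork-test')   # keyword name is an assumption
agent.start()

with get_context().new_local_span(op='/parent/before-fork'):
    pass

pid = os.fork()
if pid == 0:                                   # child
    with get_context().new_local_span(op='/child/span-1'):
        pass
    with get_context().new_local_span(op='/child/span-2'):
        pass
    time.sleep(5)                              # let the report thread flush
    os._exit(0)

with get_context().new_local_span(op='/parent/after-fork'):
    pass
os.waitpid(pid, 0)
time.sleep(5)
```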
But also, the missing `failed to install plugin sw_celery` log line tells me you are not running this PR.
I was running on the master branch, not this PR.
Same result with or without a 2-second delay.
I have tried a few more times, with upstream/master, and still get bad results. I did get one run where I got all 4 spans, but the rest of the runs were 3 spans, with a couple of deadlocks. Apart from that, upstream/master cannot possibly run correctly in a multiprocessing scenario, because on fork() no other threads are duplicated in the child (like report or heartbeat); they need to be explicitly recreated in the fork child (which I do in this PR).

I don't have time allocated now to look into the grpc issue, but I do know that the http protocol in this PR works with fork() for sure. So how do you want to proceed? I could remove that warning message if you want, or change it to something a little less absolute like "fork() may not work correctly with the grpc protocol". But in general this PR does not change anything about how grpc worked before; it just fixes the http protocol and adds restart of the report and heartbeat threads in the fork() child. And also the celery plugin, of course.
BTW, this is not the end of the road though. Our internal stress tests show problems with spans mixing or disappearing, so I need to go back into the core functionality and fix all that. Maybe overhaul how span context is tracked, like in the Node agent (especially since async wasn't originally a design consideration in this agent). So treat this PR as a single step towards getting all that fixed.