Python celery plugin #125

tom-pytel · 2021-06-21T16:15:59Z

There are several things here all mixed together because they are required to make celery work. The main problem is that the default celery configuration is to run the backend server via multiprocessing fork() for true concurrency, which this agent was not set up to handle. I added this, but unfortunately I could not get grpc working with fork() (that does not mean it can't work, just that I couldn't do it in my restricted timeline), so I fixed a minor bug in the http protocol, which now works correctly with fork(). A few more tweaks are needed before this can be merged.

NOTE: The changes in context.py are not finished. Entry and exit spans can not indiscriminately reuse each other since they may not be related at all. An explicit inheritance mechanism is needed to indicate which plugins can inherit a span from which others. I will eventually implement this in the same way I did for the Node agent, but for now this works. I will probably eventually implement most of the Node features and cleanups here.

Add a test case for the new plugin
Add a component id in the main repo
Add a logo in the UI repo
Rebuild the requirements.txt by running tools/env/build_requirements_(linux|windows).sh

skywalking/config.py

tom-pytel · 2021-06-21T21:18:01Z

Just noticed Python 3.6 is missing the critical os.register_at_fork(), will probably just not have support for forking under that version then.
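To illustrate the version constraint, here is a minimal sketch of guarding a fork hook on the availability of os.register_at_fork(). It is not the code from this PR: the Reporter class and install_fork_hook() are hypothetical stand-ins for the agent's report and heartbeat threads.

```python
import os
import threading


class Reporter:
    """Hypothetical stand-in for the agent's background report/heartbeat threads."""

    def __init__(self):
        self.starts = 0
        self._stop = threading.Event()
        self._thread = None

    def start(self):
        # Fresh event and thread each time: threads do not survive into a forked child.
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._stop.wait, daemon=True)
        self._thread.start()
        self.starts += 1

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()


def install_fork_hook(reporter):
    """Restart the reporter in forked children; return False where hooks are unavailable."""
    if not hasattr(os, 'register_at_fork'):  # e.g. Python 3.6, or non-Unix platforms
        return False
    os.register_at_fork(after_in_child=reporter.start)
    return True
```

On interpreters without os.register_at_fork() the helper simply reports that fork support is unavailable, which matches the "no forking support under 3.6" fallback described above.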

tom-pytel · 2021-06-22T13:33:28Z

This should be final-ish, PR good to go. The only thing I didn't do is a test case because those take me way too long due to extremely slow iteration. Our own internal tests for this plugin are good so maybe leave the official SW test for this to an intern?

kezhenxu94 · 2021-06-22T14:42:09Z

skywalking/plugins/sw_celery.py

+            url = urlparse(broker_url)
+            peer = '{}:{}'.format(url.hostname, url.port)
+        else:
+            peer = '???'


Is this the final state?

Shorthand for unknown peer, hostname should never not be present so this is an extreme just-in-case. Want me to change text to something like "unknown host" or something?
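For reference, the peer derivation under discussion can be sketched as a small helper. This is only a sketch, not the plugin's code: the name broker_peer, the configurable fallback text, and the handling of a URL without a port are assumptions added for illustration.

```python
from urllib.parse import urlparse


def broker_peer(broker_url, unknown='???'):
    """Return 'host:port' for a broker URL, or a placeholder when the host is unknown."""
    if not broker_url:
        return unknown
    url = urlparse(broker_url)
    if not url.hostname:
        return unknown
    if url.port is None:
        # Assumption: report the bare host rather than 'host:None'.
        return url.hostname
    return '{}:{}'.format(url.hostname, url.port)
```

For example, broker_peer('amqp://guest:guest@localhost:5672//') yields 'localhost:5672', while an empty or missing URL yields the placeholder.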

kezhenxu94 · 2021-06-23T02:11:26Z

skywalking/agent/__init__.py

+    if config.protocol != 'http':
+        logger.warning('fork() not currently supported with %s protocol' % config.protocol)


Is it possible to make gRPC work in fork? Like I said in DM, recreate a brand new gRPC channel in forked process?

Tried closing down channel and recreating in both parent and child after fork. It is possible I did not do it right since I am not grpc expert but I got one of two results:

Worked in exactly one of the forks, parent or child, but not both.

Didn't work in either.

I didn't mean to close parent channel and create in child process by reusing the agent in parent. What I meant is to start another independent agent in child process and leave the parent one there because there may be other things that may need to be traced in parent process. Can you take a look at

skywalking-python/skywalking/trace/ipc/process.py

Lines 30 to 34 in c733985

    def run(self):
        if agent.started() is False:
            config.deserialize(self._sw_config)
            agent.start()
        super(SwProcess, self).run()

... and see whether that helps, it is generally what I propose to do in forked processes?

In your current implementation, when new processes are spawned, the agent in the parent process no longer takes effect, right?

I tried several things like:

Not doing anything before the fork then creating new GrpcServiceManagementClient and GrpcTraceSegmentReportService in child.

The above but closing channel in child before creating new.

Closing the channel before fork then recreating in both parent and child.

Instead of close(), use unsubscribe().

Both unsubscribe() then close() before fork or after in child.

I did also try waiting for empty queue before allowing fork to proceed but that was unnecessary as I wasn't even sending anything before the fork, just for form.

I also forgot to mention there was a third result I was getting sometimes: a deadlock hang. It is possible I missed some permutations or a different function to call, but in general, researching python grpc with multiprocessing on the net, I found basically the following answers: either 1. "don't do it", or 2. "grpc must be started only after forking everything, then maybe it will work". Here are some links:

googleapis/synthtool#902
https://stackoverflow.com/questions/62798507/why-multiprocess-python-grpc-server-do-not-work

So as I said, it may be possible but I have not hit on how to do it. If you want to give it a shot I will add a simple test script to the end of this message. I also didn't test anything with Kafka and assume it will not work correctly when forking until someone validates that.

As for current higher level flow, keep in mind it can be modified in the future according to what protocol is in use, but for now - Nothing special is done before fork or after in the parent. In those cases all threads and sockets and locks continue operating as if nothing had happened. In the child, new report and heartbeat threads are started since threads don't survive into children. And specifically in the http protocol, the duplicated sockets are closed and new ones are opened on next heartbeat or report.
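The "close the duplicated socket, reopen on next use" behaviour can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: it detects the fork by comparing process IDs instead of using a fork hook, and ForkAwareClient and its connect factory are hypothetical names.

```python
import os


class ForkAwareClient:
    """Lazily (re)creates a connection and drops one inherited across fork()."""

    def __init__(self, connect):
        self._connect = connect  # hypothetical factory returning an object with close()
        self._conn = None
        self._pid = None

    def connection(self):
        pid = os.getpid()
        if self._conn is not None and self._pid != pid:
            # The connection was created in another process (the parent):
            # close this process's copy and forget it.
            self._conn.close()
            self._conn = None
        if self._conn is None:
            # Opened on demand, e.g. on the next heartbeat or report.
            self._conn = self._connect()
            self._pid = pid
        return self._conn
```

The parent keeps using its own connection untouched, and the child only pays the reconnect cost the first time it actually needs to send something.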

There is a potential problem with the __queue object as a thread may have been holding an internal lock on it before fork and since that thread is no longer present the queue will remain in a locked state. Not sure how to resolve this yet, but it should be a very rare event. Even rarer may be the same lock problem with the __finished event, but I wouldn't expect that to happen basically ever.

Right now I have other stuff on my plate, but if you have any suggestions on what to try I may revisit this at some point in the future. Or if you want to try yourself, here is a test script:

    import multiprocessing as mp
    import time

    from skywalking.trace.context import get_context
    from skywalking import agent, config

    config.init(collector='127.0.0.1:11800', service='your awesome service')
    agent.start()


    def foo():
        with get_context().new_local_span('child before error'):
            pass

        time.sleep(2)  # this needed to flush send because python doesn't run atexit handlers on exit in forked children

        # import atexit
        # atexit._run_exitfuncs()


    if __name__ == '__main__':
        p = mp.Process(target=foo, args=())

        with get_context().new_local_span('parent before start'):
            pass

        p.start()
        time.sleep(1)

        with get_context().new_local_span('parent after start'):
            pass

        p.join()

        with get_context().new_local_span('parent after join'):
            pass

But also, the missing "failed to install plugin sw_celery" message tells me you are not running this PR.

I was running on master branch, not this PR

Same result with or without a 2 sec delay

I have tried a few more times, with upstream/master, and still bad results. I did get one run where I got all 4 spans but the rest of the runs were 3 spans with a couple of deadlocks. Apart from that, upstream/master can not possibly run correctly in a multiprocessing scenario because on fork() no other threads are duplicated in the child (like report or heartbeat), they need to be explicitly recreated in a fork child (which I do in this PR).

I don't have time allocated now to look into the grpc issue but I do know that http protocol in this PR works with fork() for sure. So how do you want to proceed? I could remove that warning message if you want, or change it to something a little less absolute like "fork() may not work correctly with grpc protocol"? But in general this PR does not change anything about how grpc worked before, just fixes the http protocol and adds restart of report and heartbeat threads in fork() child. And also the celery plugin of course.

BTW, this is not the end of the road though. Our internal stress tests show problems with spans mixing or disappearing, so I need to go back into core functionality and fix all that. Maybe overhaul how span context is tracked like in the Node agent (especially since async wasn't originally a design consideration in this agent). So treat this PR as a single step towards getting all that fixed.

tom-pytel · 2021-07-01T14:41:41Z

@kezhenxu94 I notice you did "Merge branch 'master' into master", does this mean I should squash and merge?

kezhenxu94 · 2021-07-01T14:55:53Z

Not exactly, I just updated your branch to make sure it is up to date and passes CI. I haven't checked the details of this PR because I've recently been busy with other urgent matters, but I should be able to look into it soon, sorry about that 🙇🏻

tom-pytel · 2021-07-05T14:28:45Z

Sanic 21.0.0+ no longer works with plugin hook method. This PR is starting to get messy...

skywalking/config.py

kezhenxu94

@tom-pytel sorry for the late response and thanks for working on this.

Only some nits. It's acceptable for me that some plugins only work under the http protocol, but let's be clear in the doc about which ones those are. Also, I'd rather keep grpc as the default, as there is only 1 plugin (for now) that is not compatible with grpc. WDYT?

you can merge it after the nits are addressed

skywalking/config.py

docs/Plugins.md

kezhenxu94 · 2021-07-11T13:44:27Z

setup.py

    include_package_data=True,
    install_requires=[
        "grpcio",
        "grpcio-tools",
        "packaging",
+        "requests",


Hi @tom-pytel, I missed this in this PR, but this makes requests a mandatory dependency of skywalking-python. Please also take a look at apache/skywalking#7282: requests depends on an LGPL-licensed dependency that we cannot ship in an ASF project. As we already have this in extras_require/http, can we just remove it here? When users want to use the http protocol, they can use something like pip install skywalking-python[http].

FYI @wu-sheng

Why a plugin requires an agent-level dependency?

We support the grpc and http protocols. For the http protocol, we use requests to send http requests. Since grpc is the default protocol and http is optional (it can be installed with pip install skywalking-python[http]), I think @tom-pytel missed that and, wanting to test the http protocol, added the dependency here.
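A minimal sketch of how an optional dependency like this can be loaded only when the http protocol is selected. This is not the agent's actual code: load_http_backend and the error wording are assumptions for illustration.

```python
import importlib


def load_http_backend(module_name='requests'):
    """Import the optional HTTP library, pointing users at the extra if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            'the http protocol needs the optional "{}" package; '
            'install it with: pip install skywalking-python[http]'.format(module_name)
        ) from exc
```

With this shape, users on the default grpc protocol never import requests, so it does not need to be listed in install_requires.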

OK, got it. Glad we don't really depend on it.

Sure, if the license will cause problems then we can remove it from the required dependencies. I could also look into using a different communication method like urllib.request or urllib3.request?
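As a rough idea of what the standard-library route could look like, here is a sketch that builds and sends a JSON POST with urllib.request. It is only an illustration: build_json_post, send, the timeout value, and any URL passed in are placeholders, not the agent's real reporter or endpoint.

```python
import json
import urllib.request


def build_json_post(url, payload):
    """Build a POST request carrying the payload as a JSON body."""
    body = json.dumps(payload).encode('utf-8')
    return urllib.request.Request(
        url,
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def send(request, timeout=3):
    """Send the request and return the HTTP status code (raises on network errors)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

This avoids the third-party dependency entirely, at the cost of losing the session and connection pooling that requests provides.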

As for http protocol, we are doing stress testing here and finding that grpc is not entirely reliable and the http protocol is actually a lot more stable. Not sure why this is happening, maybe grpc is not configured correctly or the timeouts are causing problems. But the main result is that you should consider the http protocol a little more than just optional at this point since it is capable of working in scenarios for us where grpc breaks.

@tom-pytel Could you share how you test the performance in another separate issue? From the last several weeks' perf tests, the JSON really doesn't have good performance from a Java perspective, tested in the OAP backend.

tom-pytel added enhancement New feature or request plugin Plugin core labels Jun 21, 2021

tom-pytel added this to the 0.7.0 milestone Jun 21, 2021

tom-pytel requested a review from kezhenxu94 June 21, 2021 16:15

sonatype-lift bot reviewed Jun 21, 2021

tom-pytel force-pushed the master branch from 790fffe to 1156be7 Compare June 21, 2021 17:49

WIP celery plugin and required core changes

7c564db

tom-pytel force-pushed the master branch from 1156be7 to 7c564db Compare June 21, 2021 21:23

tom-pytel changed the title ~~WIP celery plugin and required core changes~~ Python celery plugin Jun 22, 2021

kezhenxu94 reviewed Jun 23, 2021

tweaks and minor fixes

02058ab

tom-pytel force-pushed the master branch from f25844f to 02058ab Compare June 28, 2021 13:10

tom-pytel and others added 2 commits June 30, 2021 11:08

added requests package to setup.py

c2e6b76

Merge branch 'master' into master

c41df48

tom-pytel added 2 commits July 5, 2021 11:24

updated sw_sanic plugin rules

dbef873

Merge remote-tracking branch 'fork/master'

8371f01

sonatype-lift bot reviewed Jul 5, 2021

kezhenxu94 approved these changes Jul 7, 2021

doc update

b0ccc45

tom-pytel merged commit 25a5e7d into apache:master Jul 7, 2021

kezhenxu94 reviewed Jul 11, 2021

kezhenxu94 mentioned this pull request Jul 11, 2021

[Python] Migrate to the next version of Python requests when released apache/skywalking#7282

Closed

tom-pytel mentioned this pull request Jul 11, 2021

removed requests as required dependency #128

Merged
