You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Today simpleflow only logs to stdout and with little contextual information on the performed actions. It falls flat when trying to identify logs from a specific workflow or activity at a given point in time, given that machines often perform different tasks simultaneously.
I'd like to add a second handler, optional, to a remote syslog target (think rsyslog remote server, or a hosted service such as logmatic or papertrail).
It would log things in the following format (please criticize):
# for tasks performed "inside" a workflow
<workflow_id>: <activity|decision>#<started_event_id> <log level> pid=<pid> <message>
# for tasks performed "outside" a workflow (workers initialization, etc.)
<message>
# or (tell me?)
simpleflow: <message>
The result would look like this in papertrail or your favorite syslog server:
May 14 10:22:19 127.0.0.1 logger: INFO pid=3817 using task list basic-bcb8a2b0092072ef-ca7d812541f544fe964ded26ae5cc8db
May 14 10:22:19 127.0.0.1 logger: INFO pid=3823 starting <bound method ActivityPoller.start of ActivityPoller(domain=TestDomain, task_list=basic-bcb8a2b0092072ef-ca7d812541f544fe964ded26ae5cc8db)>
May 14 10:22:19 127.0.0.1 logger: INFO pid=3822 starting <bound method DeciderPoller.start of DeciderPoller(TestDomain, basic-bcb8a2b0092072ef-ca7d812541f544fe964ded26ae5cc8db, basic)>
May 14 10:22:19 127.0.0.1 logger: INFO pid=3824 starting ActivityPoller(task_list=basic-bcb8a2b0092072ef-ca7d812541f544fe964ded26ae5cc8db) on domain TestDomain
May 14 10:22:19 127.0.0.1 logger: INFO pid=3825 starting DeciderPoller(workflow=basic) on domain TestDomain
May 14 10:22:19 127.0.0.1 basic-bcb8a2b0092072ef: decision#3 INFO pid=3826 taking decision for workflow basic-bcb8a2b0092072ef (basic)
May 14 10:22:19 127.0.0.1 basic-bcb8a2b0092072ef: decision#3 INFO pid=3826 execution context from decider: {'workflow_id': u'basic-bcb8a2b0092072ef', 'version': u'example', 'name': u'basic', 'run_id': u'22ukk/DSnnnnnnnnnS5OR7mOxdUg1Y=', 'parent_workflow_id': None, 'tag_list': [u'a=1', u'b=foo'], 'parent_run_id': None, 'domain_name': u'TestDomain', 'continued_execution_run_id': None}
May 14 10:22:19 127.0.0.1 basic-bcb8a2b0092072ef: decision#3 INFO pid=3826 1 open activities (1 decisions)
May 14 10:22:19 127.0.0.1 basic-bcb8a2b0092072ef: decision#3 INFO pid=3826 completing decision for workflow basic-bcb8a2b0092072ef (basic)
May 14 10:22:19 127.0.0.1 basic-bcb8a2b0092072ef: activity#6 INFO pid=3827 activity context: {'activity_id': u'activity-examples.basic.increment-1', 'name': u'examples.basic.increment', 'run_id': u'22ukk/DSnnnnnnnnnS5OR7mOxdUg1Y=', 'domain_name': u'TestDomain', 'workflow_id': u'basic-bcb8a2b0092072ef', 'version': u'example', 'input': u'{"args":[1],"kwargs":{}}'}
May 14 10:22:19 127.0.0.1 basic-bcb8a2b0092072ef: decision#9 INFO pid=3828 taking decision for workflow basic-bcb8a2b0092072ef (basic)
May 14 10:22:19 127.0.0.1 basic-bcb8a2b0092072ef: decision#9 INFO pid=3828 execution context from decider: {'workflow_id': u'basic-bcb8a2b0092072ef', 'version': u'example', 'name': u'basic', 'run_id': u'22ukk/DSnnnnnnnnnS5OR7mOxdUg1Y=', 'parent_workflow_id': None, 'tag_list': [u'a=1', u'b=foo'], 'parent_run_id': None, 'domain_name': u'TestDomain', 'continued_execution_run_id': None}
May 14 10:22:20 127.0.0.1 basic-bcb8a2b0092072ef: decision#9 INFO pid=3828 2 open activities (2 decisions)
May 14 10:22:20 127.0.0.1 basic-bcb8a2b0092072ef: decision#9 INFO pid=3828 completing decision for workflow basic-bcb8a2b0092072ef (basic)
May 14 10:22:20 127.0.0.1 basic-bcb8a2b0092072ef: decision#16 INFO pid=3831 taking decision for workflow basic-bcb8a2b0092072ef (basic)
May 14 10:22:20 127.0.0.1 basic-bcb8a2b0092072ef: decision#16 INFO pid=3831 execution context from decider: {'workflow_id': u'basic-bcb8a2b0092072ef', 'version': u'example', 'name': u'basic', 'run_id': u'22ukk/DSnnnnnnnnnS5OR7mOxdUg1Y=', 'parent_workflow_id': None, 'tag_list': [u'a=1', u'b=foo'], 'parent_run_id': None, 'domain_name': u'TestDomain', 'continued_execution_run_id': None}
May 14 10:22:20 127.0.0.1 basic-bcb8a2b0092072ef: decision#16 INFO pid=3831 result of (1 + 1) * 2 = 4
May 14 10:22:20 127.0.0.1 basic-bcb8a2b0092072ef: decision#16 INFO pid=3831 1 open activities (0 decisions)
May 14 10:22:20 127.0.0.1 basic-bcb8a2b0092072ef: decision#16 INFO pid=3831 completing decision for workflow basic-bcb8a2b0092072ef (basic)
May 14 10:22:50 127.0.0.1 basic-bcb8a2b0092072ef: decision#21 INFO pid=3843 taking decision for workflow basic-bcb8a2b0092072ef (basic)
May 14 10:22:50 127.0.0.1 basic-bcb8a2b0092072ef: decision#21 INFO pid=3843 execution context from decider: {'workflow_id': u'basic-bcb8a2b0092072ef', 'version': u'example', 'name': u'basic', 'run_id': u'22u4Mb2YpWB4lVc4ZabGNkk/DSPnRMocS5OR7mOxdUg1Y=', 'parent_workflow_id': None, 'tag_list': [u'a=1', u'b=foo'], 'parent_run_id': None, 'domain_name': u'TestDomain', 'continued_execution_run_id': None}
May 14 10:22:50 127.0.0.1 basic-bcb8a2b0092072ef: decision#21 INFO pid=3843 result of (1 + 1) * 2 = 4
May 14 10:22:50 127.0.0.1 basic-bcb8a2b0092072ef: decision#21 INFO pid=3843 completing decision for workflow basic-bcb8a2b0092072ef (basic)
May 14 10:22:51 127.0.0.1 logger: INFO pid=3823 process: caught signal signal=SIGTERM pid=3823
May 14 10:22:51 127.0.0.1 logger: INFO pid=3823 process: will stop workers, this might take up several minutes. Please, be patient.
May 14 10:22:51 127.0.0.1 logger: INFO pid=3823 process: sending SIGTERM to pid=3824
May 14 10:22:51 127.0.0.1 logger: INFO pid=3823 process: waiting for proces=psutil.Process(pid=3824, name='Python') to finish.
May 14 10:22:51 127.0.0.1 logger: INFO pid=3824 process: caught signal signal=SIGTERM pid=3824
May 14 10:22:51 127.0.0.1 logger: INFO pid=3824 stopping ActivityPoller(task_list=basic-bcb8a2b0092072ef-ca7d812541f544fe964ded26ae5cc8db)
Remarks:
here logger as the default program name, and some logs for a basic-bcb8a2b0092072ef workflow
domain is supposed to be obvious ; in Botify case domain matches the "environment" and given a specific machine, we know the SWF domain it uses directly ; YMMV
we could log a handful of other things but logs become very noisy and hard to read by a human (task list, activity id, activity name, etc.) => better have it in a tool above that, and this can always be changed later by making the logging pattern more flexible
I didn't test how bare client logging.getLogger().info("foo") behave, especially in pypy forks when using simpleflow.execute
I tested it with logging.handlers.SysLogHandler which has the following limitations: silent errors when logging with TCP (or maybe papertrail's fault?), explicit errors on UDP when above the max packet size (~9k bytes on my machine, but depends on MTU and remote syslog server I guess) ; no TCP+TLS support
logging with syslog introduce another limitation, that "program name" is the "workflow id" in our case ; in syslog program name is limited to 64 chars, but workflow id in SWF is limited to 256 ; in practice we don't have much workflows over 80 chars at Botify and they're all unique when truncated to 52 chars, but YMMV
the implementation would rely on environment variables which are 1/ non-intrusive and 2/ easy to retrieve in subprocesses, optionally not from simpleflow: any "poll_for_(activity|decision)_task" call would set a few environment variables, that the logger or any other program could use
Thoughts?
The text was updated successfully, but these errors were encountered:
Today simpleflow only logs to stdout and with little contextual information on the performed actions. It falls flat when trying to identify logs from a specific workflow or activity at a given point in time, given that machines often perform different tasks simultaneously.
I'd like to add a second handler, optional, to a remote syslog target (think rsyslog remote server, or a hosted service such as logmatic or papertrail).
It would log things in the following format (please criticize):
The result would look like this in papertrail or your favorite syslog server:
Remarks:
logger
as the default program name, and some logs for abasic-bcb8a2b0092072ef
workflowlogging.getLogger().info("foo")
behave, especially in pypy forks when using simpleflow.executelogging.handlers.SysLogHandler
which has the following limitations: silent errors when logging with TCP (or maybe papertrail's fault?), explicit errors on UDP when above the max packet size (~9k bytes on my machine, but depends on MTU and remote syslog server I guess) ; no TCP+TLS supportThoughts?
The text was updated successfully, but these errors were encountered: