-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add pymongo support #65
Conversation
@haotianw465 Hope you could provide some suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which python version is this patcher tested under? It is OK to only test one version but it is good to point out in the doc which version this library patch supports.
aws_xray_sdk/ext/pymongo/patch.py
Outdated
host, port = event.connection_id | ||
subsegment = xray_recorder.begin_subsegment( | ||
f'MongoDB-{host}:{port}', 'remote') | ||
subsegment.put_annotation('command_name', event.command_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
X-Ray service aggregates annotations at trace level when indexing. If another backend DB has a subsegment which has the same key, one will be overridden which is not what we want. A prefix mongodb
is probably a better option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when a segment has multiple pymongo calls?
aws_xray_sdk/ext/pymongo/patch.py
Outdated
subsegment = xray_recorder.begin_subsegment( | ||
f'MongoDB-{host}:{port}', 'remote') | ||
subsegment.put_annotation('command_name', event.command_name) | ||
subsegment.put_annotation('database_name', event.database_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason you put database name apart from the subsegment name? The common practice for current database clients is to use database_name@database_url
as the subsegment name since it provides proper level of aggregation. You can see an example here: https://docs.aws.amazon.com/xray/latest/devguide/xray-api-segmentdocuments.html#api-segmentdocuments-sql. Please let me know if you have any particular reason to only name based on the host url.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A remote
subsegment will appear as a vertex in the X-Ray graph. I would imagine each vertex as one (logical) machine, or one service. If we use database_name@database_url
, multiple databases in one MongoDB service will present as multiple vertexes, which may cause confusion. A subsegment naming MongoDB-192.168.1.100:27017
will aggregate all requests to the same service, which is more convincing to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For SQL database capturing all of our SDKs have been using naming scheme name@url
and it works well so far. I'm not sure about the MondoDB use case but my concern is that for certain database that is slow, aggregating at host level would mask real issue or making debugging difficult as you need to drill down further to figure out if the issue is on the service level or database level.
Having said that, this is just something you might want to consider based on other db tracing we've been providing for a long time. Since this is the first PR support on MongoDB using host name might be a better practice for most MongoDB users. Please let me know your thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is fine to follow existing SQL DB conventions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to keep the port in the name, though, as in my case there are multiple MongoDB servers in the same host.
aws_xray_sdk/ext/pymongo/patch.py
Outdated
def failed(self, event): | ||
subsegment = xray_recorder.current_subsegment() | ||
subsegment.add_fault_flag() | ||
subsegment.put_metadata('failure', event.failure) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the failure event type be pickled all the time? The pymongodb
API reference doc doesn't have very clear information based on what I looked at here: http://api.mongodb.com/python/current/api/pymongo/monitoring.html#pymongo.monitoring.CommandFailedEvent. If not then the failure event serialization might fail randomly at the runtime. See more information here: https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled. You could consider manually serialize it before add it to the metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I digged into PyMongo's implementation. Yes it is unspecified in the documentation, but I suspect it may not change for no reason.
The failure
field is assigned when PyMongo receives a remote error, in format of BSON. PyMongo decodes it into a python dict
and wrap into an OperationError
.
Anyway, the failure
property contains a dict
-typed BSON Object, and is pickle-able.
I did an experiment:
try:
a = client.a.command('')
except pymongo.errors.OperationFailure as err:
d = err.details
print(d, type(d), d.keys())
Result:
{'ok': 0.0, 'errmsg': 'no such command: \'\', bad cmd: \'{ : 1, $readPreference: { mode: "secondaryPreferred" }, $db: "a" }\'', 'code': 59, 'codeName': 'CommandNotFound'} <class 'dict'> dict_keys(['ok', 'errmsg', 'code', 'codeName'])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this since official doc is not clear about the failure model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, in the documents there may be pymongo types bson.son.SON
and bson.objectid.ObjectId
. They can be pickled, but how will X-Ray backend handle them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approve. TravisCI is failing so I'm holding off the merge until it gets fixed, although it is not likely related to this PR.
It seems to be a timeout issue..? Would you like to rerun the CI? |
I will post a fix soon. It's related to a dep config issue in |
adds pymongo support.
Treats each mongo invocation as a remote subsegment.
name:
annotations: