Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add pymongo support #65

Merged
merged 6 commits into from
Aug 10, 2018
Merged

add pymongo support #65

merged 6 commits into from
Aug 10, 2018

Conversation

xplorld
Copy link
Contributor

@xplorld xplorld commented Jun 1, 2018

adds pymongo support.

Treats each mongo invocation as a remote subsegment.

name:

  • 'MongoDB-127.0.0.1:27017'
    annotations:
  • command_name:str
  • database_name: str
  • succeed: bool

@xplorld
Copy link
Contributor Author

xplorld commented Jun 4, 2018

@haotianw465 Hope you could provide some suggestions.

Copy link
Contributor

@haotianw465 haotianw465 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which python version is this patcher tested under? It is OK to only test one version but it is good to point out in the doc which version this library patch supports.

host, port = event.connection_id
subsegment = xray_recorder.begin_subsegment(
f'MongoDB-{host}:{port}', 'remote')
subsegment.put_annotation('command_name', event.command_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

X-Ray service aggregates annotations at trace level when indexing. If another backend DB has a subsegment which has the same key, one will be overridden which is not what we want. A prefix mongodb is probably a better option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when a segment has multiple pymongo calls?

subsegment = xray_recorder.begin_subsegment(
f'MongoDB-{host}:{port}', 'remote')
subsegment.put_annotation('command_name', event.command_name)
subsegment.put_annotation('database_name', event.database_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason you put database name apart from the subsegment name? The common practice for current database clients is to use database_name@database_url as the subsegment name since it provides proper level of aggregation. You can see an example here: https://docs.aws.amazon.com/xray/latest/devguide/xray-api-segmentdocuments.html#api-segmentdocuments-sql. Please let me know if you have any particular reason to only name based on the host url.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A remote subsegment will appear as a vertex in the X-Ray graph. I would imagine each vertex as one (logical) machine, or one service. If we use database_name@database_url, multiple databases in one MongoDB service will present as multiple vertexes, which may cause confusion. A subsegment naming MongoDB-192.168.1.100:27017 will aggregate all requests to the same service, which is more convincing to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For SQL database capturing all of our SDKs have been using naming scheme name@url and it works well so far. I'm not sure about the MondoDB use case but my concern is that for certain database that is slow, aggregating at host level would mask real issue or making debugging difficult as you need to drill down further to figure out if the issue is on the service level or database level.

Having said that, this is just something you might want to consider based on other db tracing we've been providing for a long time. Since this is the first PR support on MongoDB using host name might be a better practice for most MongoDB users. Please let me know your thoughts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is fine to follow existing SQL DB conventions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to keep the port in the name, though, as in my case there are multiple MongoDB servers in the same host.

def failed(self, event):
subsegment = xray_recorder.current_subsegment()
subsegment.add_fault_flag()
subsegment.put_metadata('failure', event.failure)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the failure event type be pickled all the time? The pymongodb API reference doc doesn't have very clear information based on what I looked at here: http://api.mongodb.com/python/current/api/pymongo/monitoring.html#pymongo.monitoring.CommandFailedEvent. If not then the failure event serialization might fail randomly at the runtime. See more information here: https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled. You could consider manually serialize it before add it to the metadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I digged into PyMongo's implementation. Yes it is unspecified in the documentation, but I suspect it may not change for no reason.
The failure field is assigned when PyMongo receives a remote error, in format of BSON. PyMongo decodes it into a python dict and wrap into an OperationError.

Anyway, the failure property contains a dict-typed BSON Object, and is pickle-able.

I did an experiment:

try:
    a = client.a.command('')
except pymongo.errors.OperationFailure as err:
    d = err.details
    print(d, type(d), d.keys())

Result:

{'ok': 0.0, 'errmsg': 'no such command: \'\', bad cmd: \'{ : 1, $readPreference: { mode: "secondaryPreferred" }, $db: "a" }\'', 'code': 59, 'codeName': 'CommandNotFound'} <class 'dict'> dict_keys(['ok', 'errmsg', 'code', 'codeName'])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this since official doc is not clear about the failure model.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, in the documents there may be pymongo types bson.son.SON and bson.objectid.ObjectId. They can be pickled, but how will X-Ray backend handle them?

Copy link
Contributor

@haotianw465 haotianw465 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve. TravisCI is failing so I'm holding off the merge until it gets fixed, although it is not likely related to this PR.

@xplorld
Copy link
Contributor Author

xplorld commented Jun 8, 2018

It seems to be a timeout issue..? Would you like to rerun the CI?

@haotianw465
Copy link
Contributor

I will post a fix soon. It's related to a dep config issue in tox.ini. Will merge this PR once it's fixed.

@haotianw465 haotianw465 merged commit 1010db3 into aws:master Aug 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants