-
Notifications
You must be signed in to change notification settings - Fork 427
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Patch requests #93
Patch requests #93
Conversation
I pulled your branch and played around with it a bit. Aside from a few minor errors in the code that I had to fix, it works. It circumvents the expensive recursive parsing that the normal JSON parser does. I also took a look at your thoughts on #72. Some comments below.
Agreed. The PR was merged before we had done significant load testing or considered alternative approaches such as this one. I also did not expect the connection to DynamoDB to be as unstable as it is.
It does prevent us from upgrading internally within the library, but should not affect the user of the library since we use a vendored version of requests from botocore. Not sure why this is a problem though, as with either approach you are bound to the version of requests that botocore uses to make requests under the hood.
I strongly dislike botocore's default retry policy. It's another reason I wanted to stop using botocore to make the request. The default retry policy is terrible for any system taking user traffic. It is only configurable through some poorly documented botocore overrides using dotfiles. The retry algorithm is For the reasons stated above, I'd like to stick to the current approach. We can add features to give it parity with the old botocore behavior in addition to making it more configurable, as well as configurable from the library using Willing to consider the other side of this and help implement it in either case. Thoughts? @dylanlingelbach @gtowne @joshowen @jlafon |
@danielhochman - comments below:
It's been a while so my memory is a bit fuzzy here. I think I meant one of two things - either this prevented us from upgrading our version of botocore (which had a newer version of requests vendored) or this prevented us from hacking around this in a different way. In either case, I don't think this concern is that important.
The botocore default retry policy is based on the AWS recommendation. I think your points are fair. Did you explore allowing replacing the botocore retry logic via a similar method to how I replaced the JSON parser? Or at the very least allowing a model to override the max retries (which would give you the current logic)? That feels like a good balance of preserving the default botocore behavior for those who want it (like us) while allowing the flexibility others need.
Wouldn't it be less effort to have botocore support the hooks needed to configure retries, connection pool size, etc? It is very hard for us to get the behavior we want (default botocore) with the current approach. To be fair, I haven't looked at this since December so my thinking could be out of date. Hope this helps explain my thinking |
Right, we have to remember the goals of the AWS recommendations and botocore. Botocore was built for compatibility, not performance, which is why its parser is so slow. The AWS recommendations are meant to apply to all AWS APIs, most of which deal with resource orchestration in EC2, not live user traffic. They make little sense for an online database.
I am going to look into that. It wouldn't give us the granularity we might want per-operation, but will give us per-table. What makes me uncomfortable though is that these overrides are so poorly documented. In the alternate approach we're in a position to replace botocore altogether with a request signing library. If we go with this PR's approach we are coupling tightly to botocore without using its primary offering which is an interface to the API.
I really don't think we're that far away from parity once we add retry. It's not that complex though it's obvious that it's poorly understood given how disruptive my original changes were. Still could go either way on this. Looking forward to thoughts from @jlafon on which direction he wants to go in. |
I didn't realize performance was valued over compatibility for this library. I had assumed this library was about providing a simpler, Pythonic, interface over DynamoDB - not that it was a high performance DynamoDB library. If performance is valued over compatibility, I don't think what I have here is the right direction. If this is a high performance library I assume you'll want to go the route of pulling out botocore entirely and writing directly against the API to optimize. Let me know what you think. |
@dylanlingelbach I don't think either approach meaningfully affects compatibility. Pynamo formulates the entire request body and implements the DynamoDB API either way. The tradeoff here is the amount of code that we have to maintain for retry in exchange for dramatically better performance. I don't think it's too much code to maintain. I would consider sticking with botocore and modifying its parsing/retry behavior if it was well documented and easy to configure local to our library. After some discussion with @jlafon, I have decided to move us away from botocore. #121 implements retry for RequestExceptions which should eliminate 99% of the pain that users of the library are having with the current approach. I am going to take a more measured approach to implementing retry for ProvisionedThroughputExceptions and the like rather than blanketing the same retry logic on top of them. It should come within the next month though. |
@jlafon - here is my initial cut at hooking into the
botocore
response parsing.Three things:
PATCH_METHOD
for the tests to be lower level, likely at the actual request level. Currently the tests don't actually exercise thebotocore
code so my change isn't really getting coveragebotocore
response parsing vs this yet, but I'm not sure how what we have here would be any faster than whatbotocore
has - both have to iterate through the response and convert to binary, timestamps, etcLet me know how this looks to you