@TRACER.capture_method causes botocore response body of objects retrieved from S3 to be pre-exhausted #238
Comments
Hey Paul - Thanks for raising this issue, and also great to hear you find this library helpful! If I understood this scenario correctly, you're downloading a file from S3 but not reading its chunks (.read()) since you might have another function doing that, e.g. pandas to turn a "file-like" object into a Data Frame. However, when you add the capture_method decorator it is reading the object - calling .read() on your obj return - and this does not happen without the Tracer. Did I understand this right? If so, could you also share a snippet on how you're calling the load_file_from_s3 fn, and your boto3 version? From the Powertools perspective, we primarily decorate and call your function when you consume it - we are unaware of what's inside. This could be something else, but I'm more than happy to try to reproduce it tomorrow with the same boto version and snippet you have. Thank you :) |
Hi Heitor, Yes it is exactly what I am doing and the behaviour is as you describe, if I have the decorator then the output outside of that method is The versions of boto3 below: boto3 >> version": "==1.16.30" Below are the functions involved, although worth reiterating that once the object returned from the boto call is outside of
Apologies if this does turn out to be something else, a configuration issue or something. It's just that the application was working absolutely fine until we started a ticket to add Powertools tracing and structured logging. |
Don't worry @paulalex, it could be something with us, as I can't see anything immediately odd in that code. If there is something on our side, or if it takes too long to figure out, I'll push a context manager for Tracer so you can still use Tracer capabilities within your code more easily. I'll take this for a spin tomorrow (5pm here now). Thanks for sharing all that info |
hey @paulalex - I've managed to replicate it using that snippet you sent last. There's something in the decorator logic and in X-Ray that I can't figure out yet - I'll keep digging. In the meantime, you can use a context manager, as it works as expected:

    def load_file_from_s3(bucket_name, key):
        try:
            with tracer.provider.in_subsegment("## load_file_from_s3") as subsegment:
                obj = s3_client.get_object(Bucket=bucket_name, Key=key)
                # you can add annotations and metadata using subsegment.put_annotation, subsegment.put_metadata
                # https://awslabs.github.io/aws-lambda-powertools-python/core/tracer/#escape-hatch-mechanism
        except botocore.exceptions.ClientError as exc:
            if exc.response["Error"]["Code"] != "404":
                raise exc
        # Getting the value of the body as csv is fine up to this point, once returned it is no longer available to be read
        return obj

It's really odd because the response from S3, Content-Length at least, is exactly the same as before. It's almost as if Things I've tried for the record:
I'll update here if we find the root cause and here's the source I used to reproduce now without all the boilerplate: https://github.com/heitorlessa/issue238-pt |
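One plausible reason the context-manager workaround behaves differently, consistent with the root cause identified later in this thread, is that a context manager only brackets a block of code and never sees the function's return value, so nothing has a chance to read it. The sketch below illustrates that idea with the standard library only. The in_subsegment function here is a hypothetical stand-in (it is not the X-Ray SDK), and io.BytesIO is assumed as a substitute for the S3 response body.

```python
import contextlib
import io

events = []  # hypothetical record of what the stand-in "tracer" observed


@contextlib.contextmanager
def in_subsegment(name):
    """Hypothetical stand-in for a tracing context manager (not the X-Ray SDK)."""
    events.append(f"begin {name}")
    try:
        yield
    finally:
        events.append(f"end {name}")


def load_file():
    with in_subsegment("## load_file"):
        # io.BytesIO is assumed here in place of the S3 response body
        obj = {"Body": io.BytesIO(b"id,name\n1,alice\n")}
    # The context manager has already exited; it never touched obj
    return obj


data = load_file()["Body"].read()
print(data)
print(events)
```

The stream is still unread when the caller receives it, because the block was timed but the return value was never inspected.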
@heitorlessa Great work you definitely put a lot of effort in! So it looks like the issue is not with powertools but with either the X-ray libraries or boto? Thanks for the context manager snippet, I will switch to using this in my code for now. Noticed the same thing, that the response length is the same but it appears as if something has already read the body. Thanks a lot for all of this! |
Funny enough if you pass the Content-Length for the file_obj["Body"].read(file_obj["Content-Length"]).decode('utf-8') So it's definitely not an issue with X-Ray per se but something to be investigated in Boto and the IO stream, because at first glance boto isn't actually giving a file-like stream but a modified version of it -- I need to dig in as I'm personally hooked into the problem now despite not being Powertools per se :D TL;DR, if you pass the content-length to
I'll do more digging next week when I free up more time, hope that helps ;) |
That's great, thanks again, and I hope this doesn't become your weekend! |
Like Heitor, I was also pretty hooked on figuring this one out! Long story short, it's due to the Ultimately it's expected behaviour - if you want the function response data in the trace, it has to be serialized. I'll add something to our docs to clarify before closing this issue, as I can certainly imagine this catching others out. Thanks for helping us figure this out with the detailed bug report @paulalex! |
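The point that the response has to be serialized to appear in the trace can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: record_response and naive_serialize are hypothetical names, not Powertools or X-Ray SDK code, and io.BytesIO stands in for the S3 response body. It only shows how a decorator that serializes a return value can exhaust a read-once stream as a side effect.

```python
import functools
import io

recorded_metadata = {}  # hypothetical stand-in for trace metadata storage


def naive_serialize(value):
    """Hypothetical serializer: reads file-like objects so their content can be stored."""
    if isinstance(value, dict):
        return {key: naive_serialize(item) for key, item in value.items()}
    if hasattr(value, "read"):
        return value.read()  # side effect: the stream is now exhausted
    return value


def record_response(func):
    """Hypothetical decorator that records the wrapped function's return value."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        response = func(*args, **kwargs)
        recorded_metadata[func.__name__] = naive_serialize(response)
        return response  # same object, but its stream has already been read
    return wrapper


@record_response
def load_file():
    # io.BytesIO is assumed here in place of the S3 response body
    return {"Body": io.BytesIO(b"id,name\n1,alice\n"), "ContentLength": 16}


obj = load_file()
remaining = obj["Body"].read()
print(recorded_metadata["load_file"]["Body"])
print(remaining)
```

In this sketch the recorded metadata holds the full content while the caller gets an empty read, which matches the symptom reported in the issue.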
In simple words, X-Ray SDK uses When running locally this is a No-Op operation meaning it simply ignores hence why it wasn't easy to reproduce but in Lambda - @cakepietoast brilliantly traced all calls to reproduce that ;) |
@heitorlessa a hidden benefit of the |
indeed @michaelbrewer though if I'm honest it took me by surprise the lack of Closing this as it's now available as part of a bugfix release 1.9.1 |
Hi,
I am really loving Powertools; it's been really useful for me recently. However, I have encountered an issue since starting to use it in a new serverless application. I am currently using it in a few others where it is working OK, and I had not noticed this issue there.
This may well be a 'User Error', and what I am experiencing may be desired or necessary behaviour. I can change the way my code functions so I can still decorate this method, but I think it is worth raising this.
I have provided as much information as I can below.
When a method is decorated with @TRACER.capture_method and that method retrieves a file from S3 using boto3 and returns the S3 object/dictionary, then the botocore.response.StreamingBody object is already read, meaning there is no data left to be read.
This affected me when I added tracing to a method that calls S3 to retrieve a CSV file using boto3: since adding Powertools tracing, the retrieved CSV files had no data when converted to a data frame, even though the response body was populated and the files were retrieved from S3.
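The symptom described above comes down to a read-once stream being consumed before the caller gets to it. As a minimal sketch, assuming io.BytesIO as a stand-in for botocore.response.StreamingBody so that it runs without AWS access:

```python
import io

# io.BytesIO is an assumption standing in for the S3 response body;
# like a streaming body, a second read() without rewinding returns nothing.
body = io.BytesIO(b"id,name\n1,alice\n")

first_read = body.read()   # consumes the whole stream
second_read = body.read()  # nothing left to read

print(first_read)
print(second_read)
```

Anything that calls read() on the body before the CSV parser does will leave the parser with an empty input, even though the response metadata still looks populated.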
Expected Behavior
Method is decorated and the object's stream data is not read.
Current Behavior
See the following code snippet which works if the decorator is removed:
Environment
Latest