-
Notifications
You must be signed in to change notification settings - Fork 651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
S3 upload manager retry broken with seekable body #2485
Comments
Hi @project0 , Thanks for reaching out. Without reproduction steps its very hard to understand what the problem is. I'm able to supply to uploader a body with bytes.NewReader, and I'm simulating an error that the SDK will treat as retryable. import (
"bytes"
"context"
"github.com/aws/aws-sdk-go-v2/aws"
"github.com/aws/aws-sdk-go-v2/config"
"github.com/aws/aws-sdk-go-v2/feature/s3/manager"
"github.com/aws/aws-sdk-go-v2/service/s3"
"io"
"log"
"net/http"
)
type CustomTransport struct {
http.RoundTripper
Counter int
}
func (c *CustomTransport) RoundTrip(req *http.Request) (*http.Response, error) {
c.Counter++
if c.Counter <= 2 {
// simulating a generic server error (500), which the SDK treats as a retryable error.
return &http.Response{
StatusCode: 500,
Body: io.NopCloser(bytes.NewBufferString("Simulated failure")),
Header: make(http.Header),
}, nil
}
return c.RoundTripper.RoundTrip(req)
}
func main() {
cfg, err := config.LoadDefaultConfig(
context.TODO(),
config.WithRegion("us-east-1"),
config.WithClientLogMode(aws.LogRetries|aws.LogResponseWithBody),
config.WithHTTPClient(&http.Client{
Transport: &CustomTransport{
RoundTripper: http.DefaultTransport,
},
}))
if err != nil {
log.Fatalf("unable to load SDK config, %v", err)
}
client := s3.NewFromConfig(cfg)
uploader := manager.NewUploader(client)
reader := bytes.NewReader([]byte("hello world"))
_, err = uploader.Upload(context.Background(), &s3.PutObjectInput{
Bucket: aws.String("testbucket"),
Key: aws.String("foo"),
Body: reader,
})
if err != nil {
panic(err)
}
} This will result in the following logs:
As evident in the logs, the SDK retries successfully. Now you mention rewinding the body, I have put a breakpoint in line 155 of retry/middleware.go and RewindStream() does not return this error indicating that the request has a seekable stream. Without better more isolated reproduction case it will be hard for us to assist you. Thanks, |
@RanVaknin I can see you're simulating a 500 error in your tests, which implies getting a complete HTTP response back. What about the case of a closed network connection? We regularly get this kind of errors from our GitHub Actions workers (which I suspect are hosted somewhere in the US), when writing to S3 buckets in
If attempting to retry the upload stream, Terraform produces the following error:
|
Still reproducible in terraform 1.7.3 when I run terraform across 10 modules with terragrunt, at least 1 of the 10 modules will fail with this issue |
The nature of this issue is inherently complex. I'll do my best to outline the factors at play here. I will assert though that the specifics of the error (including whether an HTTP round trip actually took place) don't have any bearing on the behavior in question:
What can go wrong here though is that unwittingly or otherwise, a calling agent (that could be the SDK itself or a user) could put middleware before the beginning of the retry loop that elides the "seekability" of the stream. In practice, the only inbuilt middleware that does this is the body payload calculation routine that sigv4 signing depends on. Immediately after the recent auth refactor, there was an issue with the middleware order such that payload calculation did come before the retry loop, which would have deterministically caused this issue to surface. That was fixed here: #2438. Therefore, the root cause of "request stream is not seekable" is exactly as @project0 has indicated. The crux of the back-and-forth here, though, is that we're not seeing that happening in the SDK's current release state. The code snippet provided by @RanVaknin simulates the exact scenario in which the issue would arise, but the reported behavior isn't observed. In reality then we can only assume one of the following things are occuring within the terraform project:
This issue tugs at the themes we observed in #2370, where the unfortunate combination of the following:
caused obnoxious cross-module incompatibility issues. I can leave this open for a while longer, but as of now it appears to me based on everything I've seen and done in-code that the most current set of modules is operating correctly. [EDIT]: "after the beginning of the retry loop" => "before the beginning of the retry loop" |
Describe the bug
I am not sure if this is a exclusive problem of the upload manager or PutObject in general is affected.
The retry mechnanism with putObject is not triggered because the "request" stream is not seekable. Not sure where it is messed up or how to fix it.
see also hashicorp/terraform#34528.
Relevant code:
https://github.com/hashicorp/terraform/blob/v1.7.3/internal/backend/remote-state/s3/client.go#L200-L205
bytes.NewReader
produces a seekable reader, so the problem must be somewhere in the aws sdk.Expected Behavior
Retry of upload content is working as expected. The upload manager should be able to produce a full working version in this case.
Current Behavior
It looks like
isStreamSeekable
is not correctly set on the request object.see also
https://github.com/aws/smithy-go/blob/daa9e2bc92cf10976d9cac2893b74834ded16150/transport/http/request.go#L95-L106
aws-sdk-go-v2/aws/retry/middleware.go
Lines 151 to 163 in 3a7909e
results into retry failing with:
Reproduction Steps
I am not sure how to reproduce or provoke the bug
Possible Solution
No response
Additional Information/Context
see also hashicorp/terraform#34528
similar issue was reported here #2038, but in this case we have actually provided a proper body implementing
ReadSeeker
.AWS Go SDK V2 Module Versions Used
Compiler and Version used
1.21.5
Operating System and version
n/a
The text was updated successfully, but these errors were encountered: