DynamoDB write EPROTO #862
@phsstory Is this only happening with DynamoDB, or with other services as well? Do you have any |
Error is intermittent. S3, DynamoDB and STS are the only services used; only calls to DynamoDB reported issues. It did not show in the logs on a 0.12.0 node server; however, the load on that server is significantly lower, so no calls may have happened during the error windows.

37 failures, 0 successes:
Mon Dec 28 2015 21:02:42 GMT+0000 (UTC)
Mon Dec 28 2015 21:03:13 GMT+0000 (UTC)
Mon Dec 28 2015 21:03:19 GMT+0000 (UTC)
Mon Dec 28 2015 21:03:29 GMT+0000 (UTC)
Tue Jan 05 2016 17:34:21 GMT+0000 (UTC)

Our client is running on Elastic Beanstalk, but there are reports of this error in other environments (see links to the Amazon forums).
This is close to the extent of information I am able to provide on a public location. |
@phsstory
```javascript
httpOptions: {
  agent: new https.Agent({
    rejectUnauthorized: true,
    keepAlive: true
  })
}
```
More agent options here: https://nodejs.org/api/http.html#http_new_agent_options |
I'm having this same problem. |
As an update, it has been 24 hours since we downgraded node to 0.12.0 with no indication of EPROTO errors, same code and SDK versions. I would guess it has to do with a combination of certain machines behind a load balancer (intermittent, with varying time periods among reported users) and some internal deprecated feature of node or one of its libraries. There is some speculation that it might be related to nodejs/node#2244. Have you had a chance to contact the DynamoDB infrastructure team to see what might have changed on Dec 28th, or whether there are any machines/load balancers in the cluster still serving up RC4-SHA? |
Also got the same error using node v4.1.2. Adding what @chrisradek suggested appears to work for me. |
From the sounds of it, this is a compatibility issue between the SDK/node and the DynamoDB service clusters; keepAlive only mitigates the chances of connecting to a problematic machine/load balancer. Unfortunately, the Amazon forums are the worst place to get information about Amazon changes/issues, since there is never a follow-up by the techs. @chrisradek can you run this through the inter-department channels to see if there are DynamoDB machines/load balancers that attempt to negotiate TLS with RC4-SHA? |
@phsstory |
@phsstory |
I have also noticed this issue. And after I set keepAlive to true, it was unable to serve any requests. My Beanstalk configuration:
I was just wondering if we could interfere in choosing the cipher (if that is indeed the case): https://nodejs.org/api/https.html#https_https_request_options_callback |
@chrisradek That is unfortunate, as it would have been a quick diagnosis of the issue. As a recap: |
It seems to be an issue in node: |
It's happening a lot. I am using only Dynamo with the latest node (5.4.0) and am having the problem. |
Continuing to observe this issue on Node.js v5.4.0, even when setting:

```javascript
new AWS.DynamoDB({
  httpOptions: {
    agent: new https.Agent({
      rejectUnauthorized: true,
      keepAlive: true
    })
  }
});
```

The logged error:

```json
{
  "target": {
    "module": "aws-sdk",
    "version": "2.2.28",
    "export": "DynamoDB",
    "method": "putItem",
    "args": [
      {
        "Item": "*REDACTED*"
      }
    ]
  },
  "type": "log",
  "level": "error",
  "message": "failed to put item in DynamoDB",
  "error": {
    "message": "write EPROTO",
    "code": "NetworkingError",
    "errno": "EPROTO",
    "syscall": "write",
    "region": "us-east-1",
    "hostname": "dynamodb.us-east-1.amazonaws.com",
    "retryable": true,
    "time": "2016-01-23T02:56:44.705Z"
  },
  "stack": "Error: write EPROTO\n at Object.exports._errnoException (util.js:856:11)\n at exports._exceptionWithHostPort (util.js:879:20)\n at WriteWrap.afterWrite (net.js:763:14)",
  "timestamp": "2016-01-23T02:56:44.706Z"
}
```
 |
@chrisradek I still believe this to be an issue with deprecated node functionality and a portion, likely small, of the DynamoDB stack using said encryption. Since we have no knowledge of the DynamoDB stack architecture, I can only make a wild guess: based on the time windows and the mitigating effect of keepAlive, we will find it in the mechanism that handles dynamic growth while maintaining network connectivity for clients connecting during those brief periods. In the interim, can you get a list of the available ciphers and options from DynamoDB so we can correlate them with the supported options in node and restrict the negotiation, as mentioned by @awerlang? |
I did look into ciphers by refreshing this over and over again (trying to catch different IPs) https://www.ssllabs.com/ssltest/analyze.html?d=dynamodb.us-east-1.amazonaws.com&latest From my most recent run (54.239.20.144, 54.239.16.203):
|
I forgot an additional detail that I haven't seen reported yet. I happen to be measuring the latency of the HTTPS calls to DynamoDB. (edit: there was no exception to 25 seconds) (edit 2: looks like the 25 seconds comes from DynamoDB) |
Thanks @tristanls. This is starting to have the signs of a load-balancer shuffle, with connections being sent to a machine not quite ready to handle load and dumping the connection prematurely, causing the TLS endpoint to bail on the client connection mid-buffer. The oddball is node 0.12 not having any issue. I wonder if 0.12 was less strict about this particular protocol error, or silently ignored it. |
You might want to see if the DynamoDB dev environment shows this same behavior with one of the reported node versions under heavy load from multiple simulated accounts. You might be able to capture the connection and rebuild it for protocol analysis. |
@phsstory Looking at where EPROTO can come out of, based on comments in nodejs/node#3692, the paths look (to my untrained eye) quite different. Latest: https://github.com/nodejs/node/blob/master/src/tls_wrap.cc#L593 |
@tristanls 0.12 doesn't throw EPROTO for this issue, it continues on without issue or degraded performance. We had to downgrade our production servers until this issue can be resolved. |
@phsstory understood. I am working on the assumption that this is what's causing the EPROTO issue in v5.5.0 and latest: https://github.com/nodejs/node/blob/master/src/tls_wrap.cc#L592-L593. This code path for throwing EPROTO doesn't seem to exist in 0.12, which could explain the difference. |
Well, I haven't reproduced the problem (because in the stack below, v0.12.7 also fails with an EPROTO error). But I'm not gonna be able to get back to this for a bit, so I wanted to post findings so far. I'm hoping to keep iterating on it. In the meantime, I dumped the existing setup to Docker Hub: https://hub.docker.com/r/tristanls/eproto-plus-patched-node/
|
This also started happening in our ELB environments after updating nodejs from 0.12.9 to 4.2.3:

```
{ [NetworkingError: write EPROTO] time: Mon Jan 25 2016 16:29:07 GMT+0000 (UTC) } Error
```
 |
I experience the same problem with DynamoDBLocal. No workarounds from above help. Thanks people. |
@koresar how do you connect to local? |
```javascript
const aws = require('aws-sdk');
const https = require('https');

const dynamo = new aws.DynamoDB({
  region: 'foo-west-1',
  apiVersion: '2012-08-10',
  accessKeyId: 'bar',
  secretAccessKey: 'baz',
  endpoint: new aws.Endpoint('localhost:8000'), // tried all kinds of uris
  httpOptions: {
    agent: new https.Agent({ // tried all combinations
      rejectUnauthorized: true,
      keepAlive: true,
      secureProtocol: 'TLSv1_method', // tried all the openssl supported methods
      ciphers: 'ALL'
    })
  }
});

dynamo.describeTable({TableName: 'my-table'}, callback); // get error after a timeout
```
Error:
However, there is a workaround - use non-encrypted connection. Problem solved (until you need TLS). |
@koresar DynamoDBLocal listens for HTTP on port 8000
|
Has anyone noticed that while the issue is "fixed" with Dynamo, it is still there for all the other Amazon services? Namely, if I want to invoke Lambdas from within Lambda...? |
@bradennapier The issue was specific to DynamoDB, since their servers were configured in such a way that they were affected by a bug in openssl. If you are seeing this issue with another service, please let us know. |
Yes, I am currently experiencing the problem on one function. I am only assuming it is due to Lambda because all my other functions appear to be fine. I am not sure if this is the exact same issue, but it always shows in the same way: functions start with a second or so left, and I get charged the entire timeout period for every call until I re-upload the function.
For example, the above is a function which has 15 second timeout - the first call I make is to log remaining time. |
That sounds like a timeout issue, which is not what we experience with this bug. Have you tried turning off keepAlive? |
@phsstory Are you saying to do that for Dynamo? My other functions seem fine, so I believe this one is due to Lambda, but I can't be sure; it's a simple function, so it's fairly annoying.
So I should turn that into
in my Lambda function? (FYI: yes, I tried without the promises, with the same result) |
@bradennapier Am I understanding correctly that your Lambda functions are failing to complete within a given amount of time? Are you getting an error? If you extend your Lambda function to allow it to run longer, do your functions complete? |
Hello. Just tried this code, with and without the httpOptions:

```javascript
var aws = require('aws-sdk');
var https = require('https');

exports.handler = function(event, context) {
  var dynamo = new aws.DynamoDB({
    region: 'eu-west-1',
    httpOptions: {
      agent: new https.Agent({
        rejectUnauthorized: true,
        keepAlive: false,
        ciphers: 'ALL',
        secureProtocol: 'TLSv1_method'
      })
    }
  });
  dynamo.listTables(function(err, data) {
    console.log('inside listTables');
    if (err)
      console.log(JSON.stringify(err, null, 2));
    else
      console.log(data.TableNames);
  });
};
```

And I get a timeout. No problem calling the AWS database locally. |
Without the |
@benoittgt you need to add
PS: make sure that you have a 10-second timeout for the Lambda, because it's 6 seconds by default.
@chrisradek could you guys put this stuff in the docs everywhere? People never get the real Dynamo error messages when the error is retryable, and the DynamoDB client has hardcoded retry logic that is not configurable like the other AWS-SDK service clients. Cheers and happy coding |
Thanks to both of you for the fast answers. With:

```javascript
var aws = require('aws-sdk');
var https = require('https');

exports.handler = function(event, context) {
  var dynamo = new aws.DynamoDB({
    region: 'eu-west-1',
    maxRetries: 8
  });
  dynamo.listTables(function(err, data) {
    console.log('inside listTables');
    if (err)
      console.log(JSON.stringify(err, null, 2));
    else
      console.log(data.TableNames);
  });
};
```

I get
|
@benoittgt |
I did the test with context, on both Node versions available, with the same errors as posted in #862 (comment).

```javascript
var aws = require('aws-sdk');

exports.handler = function(event, context) {
  var dynamo = new aws.DynamoDB({
    region: 'eu-west-1',
    maxRetries: 8
  });
  dynamo.listTables(function(err, data) {
    if (err) {
      context.fail(err.stack);
    } else {
      context.succeed('Function Finished! Data :' + data.TableNames);
    }
  });
};
```
 |
@benoittgt |
The NodeJS commit referenced in #862 (comment) has been present in all releases since 6.0.0, and DynamoDB has updated their servers to use TLS 1.2 everywhere, so I don't believe customers are continuing to see this issue. |
As of 3/22/2016, the eBay API has several servers that can only negotiate TLS v1.0 sessions, and several servers that can negotiate TLS v1.0, v1.1 and v1.2. Node/OpenSSL get confused by this, and occasionally attempt to parse a v1.2 response using TLS v1.0 and vice versa. The error you get back from the request looks something like this:

```
{ [Error: write EPROTO 140113357338496:error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number:../deps/openssl/openssl/ssl/s3_pkt.c:362: ] code: 'EPROTO', errno: 'EPROTO', syscall: 'write' }
```

As far as I can tell, this isn't patched yet, in Node or OpenSSL. But setting the following options forces all connections to be negotiated with TLS v1.0, effectively fixing the issue.

More reading:
aws/aws-sdk-js#862
nodejs/node#3692
https://www.ssllabs.com/ssltest/analyze.html?d=api.ebay.com

If you know anyone at eBay, please tell them it's a) unacceptable to have servers that can only negotiate TLS v1.0, and b) unacceptable to have an SSL certificate that was signed with SHA1, and they should upgrade both things.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread. |
Node: 4.2.1
AWS-SDK: 2.1.21
From the logs
Other server on node 0.12.0 does not have this issue.
If this is an issue with the current node and SDK, please follow up on the AWS forums:
https://forums.aws.amazon.com/thread.jspa?messageID=694520#694520
https://forums.aws.amazon.com/thread.jspa?messageID=693172#693172
Summary (as of 2016/05/13):
Edit: potential keepAlive errors noted.
Edit: Removed untested status, see @southpolesteve's comment
Server side mitigation (2016/06/29):