pubsub Error: The operation was aborted #2661
@murgatroid99 any ideas why we might be getting an aborted error here?
I'm not that familiar with the surface APIs of this library; what exactly is it that is running for a week before failing? Do you know what is supposed to be happening there, in terms of how the gRPC API is being called? Is it a sequence of requests, a single long-lived stream, or something else?
For this specific usage we open 5 bidi streams; when any of the streams closes we usually replace it with a new stream, but not on `ABORTED`.
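For illustration only (not the library's actual code): a minimal sketch of that replace-on-close pattern, where `openBidiStream()` and `STREAM_COUNT` are hypothetical names.

```js
// Hypothetical sketch of the behavior described above; openBidiStream()
// is an illustrative stand-in, not a real API of this library.
const STREAM_COUNT = 5;
const ABORTED = 10;

function startStream() {
  const stream = openBidiStream();
  stream.on('close', (status) => {
    // A closed stream is normally replaced with a fresh one,
    // except on ABORTED, which is surfaced to the caller instead.
    if (status.code !== ABORTED) {
      startStream();
    }
  });
  return stream;
}

for (let i = 0; i < STREAM_COUNT; i++) {
  startStream();
}
```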
OK. It looks like the `ABORTED` error is a server-side issue.
What do you mean by a server-side issue? As far as I can share, the application has only the GCF function, plus the wrapper function as main:

```js
// this was supposed to be deployed via GCF, but I designed the main wrapper to call it the same way
function loadData(event) {
  // parse the event; get the GCS bucket and filename from the message
  return bigquery.dataset('...').table('...')
    .import(gcs.bucket('...').file('...'))
    .then(([ job, _ ]) => job.promise());
}
exports.loadData = loadData;

async function main() {
  const topic = pubsub.topic('...');
  const [ sub, _ ] = await topic.createSubscription(...);
  // with ackDeadlineSeconds of 90, although the BigQuery import takes no more than 20s
  sub.on('message', async function (msg) {
    try {
      await loadData({ data: msg });
    } catch (err) { console.error(err); }
    msg.ack();
  });
}

if (require.main === module) {
  main();
}
```

Because GCF is currently still on Node v6, not the v8 I wanted, I kept the loadData function on the Promise API only, without async/await, so it can still be deployed to GCF at any time. In our architecture, a piece of code uploads one Avro file to GCS every minute, so at runtime on GCE this function is supposed to be called once every minute (on every Avro file upload). We monitor this node process and it looks like it runs normally; CPU and memory usage don't go up. Yet after running continuously for no more than 7 days, the node process goes down because of the `Error: The operation was aborted`.
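(Worth noting about the snippet above: the message is acked only after loadData settles, so the 90-second ackDeadlineSeconds has to comfortably exceed the ~20s import; otherwise PubSub would redeliver the message while it is still being processed.)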
I mean that this is an error that is originating from the PubSub server.
@jganetsk do we know why the server would return this kind of error after a week?
Then, in the case where the PubSub server aborts the connection, is there any workaround I can use in my main function to handle it and reconnect, so the node process doesn't go down?
I think you should be able to catch the error via something like this:

```js
var ABORTED = 10;
var subscription;

openSubscription();

function openSubscription() {
  subscription = pubsub.subscription('my-sub');
  subscription.on('error', handleError).on('message', onMessage);
}

function handleError(err) {
  if (err.code === ABORTED) {
    subscription.close(openSubscription);
  }
}

function onMessage(message) {
  message.ack();
}
```
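(Note that this handler reconnects only on `ABORTED`, gRPC code 10; any other status code treated as retryable would need the same handling. A sketch of such a predicate follows the retryable-codes discussion below.)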
@callmehiphop When subscribing, the client library should catch all retryable errors (including "The operation was aborted") and reconnect instead of propagating the error to the caller.
@jganetsk can we get a definitive list of which errors should be retryable in PubSub? Historically we've only been requested to retry on a small, specific set of errors, and in the PubSub client we also retry on a few codes beyond that set.
We did discuss this previously, and I was concerned that we were not covering enough codes for retry. Assuming that this is the list of codes: https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto — some of them are definitely not retryable because the user needs to take action, some would be unexpected and it's your call whether to retry them or not (I would lean towards retrying those), and the rest are retryable.
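For reference, here is a minimal sketch of such a retry predicate using the numeric status codes from code.proto. The split below is an assumed reading of the code descriptions ("user needs to take action" codes excluded from retry), not the exact lists from the comment above:

```js
// gRPC status codes, numeric values as defined in google/rpc/code.proto
const Code = {
  OK: 0, CANCELLED: 1, UNKNOWN: 2, INVALID_ARGUMENT: 3,
  DEADLINE_EXCEEDED: 4, NOT_FOUND: 5, ALREADY_EXISTS: 6,
  PERMISSION_DENIED: 7, RESOURCE_EXHAUSTED: 8, FAILED_PRECONDITION: 9,
  ABORTED: 10, OUT_OF_RANGE: 11, UNIMPLEMENTED: 12, INTERNAL: 13,
  UNAVAILABLE: 14, DATA_LOSS: 15, UNAUTHENTICATED: 16,
};

// Assumed split: codes that require user action are never retried;
// everything else is treated as retryable.
const NOT_RETRYABLE = new Set([
  Code.INVALID_ARGUMENT, Code.NOT_FOUND, Code.ALREADY_EXISTS,
  Code.PERMISSION_DENIED, Code.FAILED_PRECONDITION, Code.OUT_OF_RANGE,
  Code.UNIMPLEMENTED, Code.UNAUTHENTICATED,
]);

function isRetryable(code) {
  return code !== Code.OK && !NOT_RETRYABLE.has(code);
}
```

With this predicate, the earlier handleError could check `isRetryable(err.code)` instead of only `ABORTED`.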
Yeah, I would like to see that.
@c0b I'm going to open a PR right now; it'll be pretty small, so I imagine I'll have a release out by EOD.
@c0b I've published the fix.
Background: I have a simple one-function project that continuously loads data into a BigQuery table, triggered every time a Cloud Storage file upload finishes, via GCS pubsub-notifications: whenever a new file is uploaded to a particular bucket with a particular prefix, a pubsub notification triggers my function to run. It seemed a perfect use case for GCF, but it's blocked by https://issuetracker.google.com/issues/66695033 (see also https://stackoverflow.com/questions/45304673/random-apierror-invalid-credentials-calling-bigquery-from-google-cloud-functi): it fails with a random `ApiError: Invalid Credentials` once every couple of days. I believe that's a GCF operations issue; Google Cloud Functions isn't really production ready.

So I switched to a plain VM on GCE, using a little wrapper that manually subscribes to the same pubsub topic (from the GCS pubsub-notifications above) and, whenever a message comes in, calls the same function designed for GCF. The function is very simple; it just calls:

```js
bigquery.dataset('...').table('...').import(gcs.bucket('...').file('...')).then('...')
```

It was running fine, but it didn't last long: each run lasts no more than a week on GCE before aborting itself. My workaround is a shell wrapper in an endless loop:

```sh
while :; do node ... ; done
```

But why did pubsub abort the grpc connection in this case? The libraries in use are the latest:

- `@google-cloud/pubsub@0.14.4`
- `@google-cloud/bigquery@0.9.6`

with node-v8.6.0.