Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metadata key contains illegal characters #415

Closed
kirillgroshkov opened this issue May 24, 2019 · 46 comments · Fixed by googleapis/gax-nodejs#532
Closed

Metadata key contains illegal characters #415

kirillgroshkov opened this issue May 24, 2019 · 46 comments · Fixed by googleapis/gax-nodejs#532
Assignees
Labels
api: datastore Issues related to the googleapis/nodejs-datastore API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. 🚨 This issue needs some love. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@kirillgroshkov
Copy link

kirillgroshkov commented May 24, 2019

We've started noticing a new error in production (and in staging env also) that happens ~50 times a day for us (for many thousands or maybe even millions of calls per day, don't have exact number). It looks like this in a stacktrace:


unhandledRejection:  { 
Error: Metadata key "a" contains illegal characters     
at Http2CallStream.call.on (/ncbackend3/node_modules/@grpc/grpc-js/build/src/client.js:102:45)     
at Http2CallStream.emit (events.js:194:15)     
at Http2CallStream.EventEmitter.emit (domain.js:459:23)     
at Http2CallStream.endCall (/ncbackend3/node_modules/@grpc/grpc-js/build/src/call-stream.js:74:18)     
at ClientHttp2Stream.stream.on (/ncbackend3/node_modules/@grpc/grpc-js/build/src/call-stream.js:205:30)     
at ClientHttp2Stream.emit (events.js:189:13)     
at ClientHttp2Stream.EventEmitter.emit (domain.js:459:23)     at emit (internal/http2/core.js:236:8)     
at process._tickCallback (internal/process/next_tick.js:63:19)  
 code: 2,   
details:    'Metadata key "a\u0000\u0000\u0000" contains illegal characters',   
metadata: Metadata { internalRepr: Map {} },   
note:    'Exception occurred in retry method that was not classified as transient' }

I've even seen this error myself once while calling a certain endpoint that was doing 1 query and 2-3 getById calls to Datastore. So, no saves was made, only querying.

This error is not reproducible all the time, as I mentioned, rather "rare", but still noticeable (~50 hard errors every day, that end up in http 500 errors for our api users).

I don't know how to dig further and provide other helpful info. Important to note that it only happened after updating from ^3.0.0 (latest version in 3.x.x) to 4.0.0. I'm assuming the error could somewhere downstream (in grpc-js itself) that was updated as a sub-dependency. We're on the latest version of everything as of today (yarn upgrade was done and we have no deps pinned).

ps: forgot to add that the "key" in the metadata has different values in each other error (we see it in Sentry), examples (err.message here):

Metadata key "a" contains illegal characters
Metadata key " �û·" contains illegal characters
Metadata key " �`ñ" contains illegal characters
Metadata key " r�ð" contains illegal characters
Metadata key " òe×" contains illegal characters
Metadata key " �wâ" contains illegal characters
...

pps: we were using .runQuery() method with 2 simple filters by number fields and .get(key) (that's what I called getById). Same code in 3.x.x version worked just fine.

Noticed that a occured more often than others.

Environment details

  • OS: node:10-alpine official docker image
  • Node.js version: 10.15.3
  • npm version: not applicable
  • @google-cloud/datastore version: 4.0.0
@kmontag
Copy link

kmontag commented May 24, 2019

Just confirming I'm seeing this frequently as well, but also not reproducibly. Retrying the request in question almost always succeeds. It happens regardless of whether there are simultaneous parallel requests.

Our project contains get, runQuery, and upsert calls, and I've seen it happen with all of them.

@kirillgroshkov
Copy link
Author

I think we'll be forced to downgrade to 3.x due to this soon. Too many errors that affect users...

@yoshi-automation yoshi-automation added triage me I really want to be triaged. 🚨 This issue needs some love. labels Jun 4, 2019
@bcoe bcoe added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. and removed 🚨 This issue needs some love. labels Jun 5, 2019
@bcoe
Copy link
Contributor

bcoe commented Jun 5, 2019

@alexander-fenster I wonder if this could relate to our upgrade to grpc-js?

@kirillgroshkov
Copy link
Author

@alexander-fenster I wonder if this could relate to our upgrade to grpc-js?

It's probably not, cause we did this sequence of things:

  1. Downgraded to @google-cloud/datastore@^3.0.0. Error was gone.
  2. Upgraded to @grpc/grpc-js@0.4.1 (the one that was unpublished later) and it was working fine for 2 days (no "metadata errors" or new errors).

ps: After I've just written this, I realized I'm not 100% sure. Tried to trace changes via yarn.lock commit history, but the dependency tree between google libs is more complicated to grasp it in few seconds. If it's important - I can try to trace more carefully. But I see multiple of versions of grpc-js in my lock due to different google libs installed.

@yoshi-automation yoshi-automation removed the triage me I really want to be triaged. label Jun 5, 2019
@alexander-fenster
Copy link
Contributor

@kirillgroshkov @kmontag The main difference between v3.x and v4.x under the hood is the replacement of the original grpc (which uses C++ code compiled into Node.js binary) with the new @grpc/grpc-js. The error you folks see comes from grpc-js so it might be totally possible that the upgrade is causing this. Similar metadata check exists in grpc as well but it has completely different logic.

The good news is that you can keep using the C++ grpc for the time being by passing it to the client constructor:

npm install grpc
const grpc = require('grpc'); // not '@grpc/grpc-js'
const {Datastore} = require('@google-cloud/datastore');
const projectId = 'YOUR_PROJECT_ID';
const datastore = new Datastore({
  projectId,
  grpc   // <-- passing it down to the client
});

I would really appreciate it if you folks try that and update this issue with the results (e.g. if the errors disappear or not). This will help us report a bug to gRPC developers.

Thanks!

@alexander-fenster
Copy link
Contributor

@murgatroid99 @nicolasnoble This is FYI for now but will likely be upgraded to a real @grpc/grpc-js issue :)

@alexander-fenster
Copy link
Contributor

A quick clarification: please upgrade back to v4.0.0 before substituting grpc.

@murgatroid99
Copy link

murgatroid99 commented Jun 6, 2019

At the moment, the best possible thing to help figure this out would be a tcpdump of a stream that has this error, preferably not all encryped with TLS. If anyone could manage to get one of those and share it here, that would be very helpful.

@murgatroid99
Copy link

Also, it might be helpful to share a more complete list of the error messages with different metadata keys, possibly in a Gist.

@chrishiestand
Copy link

@murgatroid99 I'm seeing these fairly regularly. How would one get a tcpdump of datastore traffic without TLS? Just to the datastore emulator right?

I've started a gist and I'll update this as I go along today. https://gist.github.com/chrishiestand/4c8a7b36e93cbddbafd09a53be113d08

@murgatroid99
Copy link

If you can reproduce this with the datastore emulator, that might be even better. Then I could probably take the tcpdump myself.

@klon
Copy link

klon commented Jun 8, 2019

I am seeing this error using Firestore specifically firebase-admin 8.0.0 under NodeJS 10.

Metadata key "@���" contains illegal characters at Http2CallStream.call.on (/app/node_modules/@grpc/grpc-js/build/src/client.js:101:45) at Http2CallStream.emit (events.js:203:15) at Http2CallStream.endCall (/app/node_modules/@grpc/grpc-js/build/src/call-stream.js:63:18) at ClientHttp2Stream.stream.on (/app/node_modules/@grpc/grpc-js/build/src/call-stream.js:194:30) at ClientHttp2Stream.emit (events.js:198:13) at emit (internal/http2/core.js:236:8) at process._tickCallback (internal/process/next_tick.js:63:19)

@alexander-fenster
Copy link
Contributor

@klon Please follow my advice from the above comment #415 (comment) to use the C-core grpc as a workaround.

Alternatively, if you could help gRPC team (@murgatroid99) debug the issue by providing the non-encrypted dump of the data, it will be really appreciated. Thanks!

@klon
Copy link

klon commented Jun 10, 2019

@alexander-fenster Thanks for your reply.

Why would you introduce grpc-js to a production library when it clearly says "Note: This is an beta-level release" in the grpc-js readme?

@alexander-fenster
Copy link
Contributor

@klon This is a valid question.

To answer your question: this is not some external library we randomly started using for no reason, it's a library we develop and is supposed to be a replacement of the existing grpc library (that is based on C++ code compiled into Node.js binary, which makes it hard to use it everywhere).

It's indeed marked as beta, but it still makes sense to use it and not grpc for several reasons (module size, install time, load time, platform binary dependency in grpc). Before we switched to @grpc/grpc-js, we tested it across all 50+ client libraries we support for several months in a row, and made sure there are no differences in behavior. Also, we made sure the rollback to the original grpc is possible and easy (and it actually is a helpful workaround in this particular case). Finally, switching to @grpc/grpc-js was a semver major bump for all client libraries (including Datastore library) that communicates that this is somewhat big change.

It's sad to see that this particular edge case was not caught by our tests.

Bugs do happen, we apologize for this inconvenience this caused for you and other folks in this thread. This bug would've likely happened even if @grpc/grpc-js was called v1.0.0 and changed the release status to general availability.

I guess we could've done somewhat better job communicating this change (that happened within our latest semver major release) and explaining a way to rollback to grpc in the documentation. However, I must note that this is the only grpc-js-related issue we got so far, out of tens of client libraries switched, and gRPC folks are looking into this and I hope we'll have a fix for this issue.

Thanks!
-Alex

@yoshi-automation yoshi-automation added 🚨 This issue needs some love. and removed 🚨 This issue needs some love. labels Jun 15, 2019
@yoshi-automation yoshi-automation added 🚨 This issue needs some love. and removed 🚨 This issue needs some love. labels Jun 25, 2019
@wilhuff
Copy link

wilhuff commented Jun 27, 2019

This issue should have been fixed by googleapis/gax-nodejs#521. Can this issue be closed?

@alexander-fenster
Copy link
Contributor

Thanks @chrishiestand! I'll try that.

Can I ask you to make the following change to the grpc-js code right inside your node_modules folder:

--- node_modules/@grpc/grpc-js/build/src/call-stream.js.orig    2019-07-10 15:22:19.048842980 -0700
+++ node_modules/@grpc/grpc-js/build/src/call-stream.js 2019-07-10 15:22:41.096762637 -0700
@@ -200,6 +200,7 @@
                         metadata = metadata_1.Metadata.fromHttp2Headers(headers);
                     }
                     catch (error) {
+                        console.log('Bad headers:', headers);
                         this.endCall({
                             code: constants_1.Status.UNKNOWN,
                             details: error.message,

That way it will dump the headers that failed to parse, and at least we'd know if they make any sense or are just a garbage. If you happen to get those headers, please check if they contain any private data before pasting it here, and feel free to obfuscate / strip everything.

@csidell-earny
Copy link

@chrishiestand Sorry it wasn't a reference to the original problem but the one mentioned by @kirillgroshkov

@chrishiestand
Copy link

@alexander-fenster

 Bad headers: [Object: null prototype] {
   ':status': 200,
   'content-disposition': 'attachment',
   'content-type': 'application/grpc',
   'A\u0000\u0000\u0000': 'Wed, 10 Jul 2019 22:37:34 GMT',
   'alt-svc': 'quic=":443"; ma=2592000; v="46,43,39"'
 }
 
 "Metadata key \"a\u0000\u0000\u0000\" contains illegal characters

@alexander-fenster
Copy link
Contributor

Aha! Thank you so much!

@murgatroid99 Does it make any sense for you? Is it something in Node.js http2 stack, or something that gRPC server sends?

I suggest, while we're investigating, let's just start ignoring bad metadata keys at that place. Receiving an invalid metadata key from server is something we (in the gRPC client or in a client library) cannot do anything about, so throwing an exception might be an overreaction.

@alexander-fenster
Copy link
Contributor

@chrishiestand Can I ask you to dump some examples of good headers (with no failure) as well, just to compare the good ones vs the bad ones? (just put the same console.log line right after the fromHttp2Headers call) We are trying to figure out something here.

@chrishiestand
Copy link

@alexander-fenster here's the headers from a single function call

Good headers: [Object: null prototype] {
 ':status': 200,
 'content-disposition': 'attachment',
 'content-type': 'application/grpc',
 date: 'Wed, 10 Jul 2019 23:05:01 GMT',
 'alt-svc': 'quic=":443"; ma=2592000; v="46,43,39"'
}
Good headers: [Object: null prototype] {
 ':status': 200,
 'content-disposition': 'attachment',
 'content-type': 'application/grpc',
 date: 'Wed, 10 Jul 2019 23:05:01 GMT',
 'alt-svc': 'quic=":443"; ma=2592000; v="46,43,39"'
}
Good headers: [Object: null prototype] {
 ':status': 200,
 'content-disposition': 'attachment',
 'content-type': 'application/grpc',
 date: 'Wed, 10 Jul 2019 23:05:01 GMT',
 'alt-svc': 'quic=":443"; ma=2592000; v="46,43,39"'
}
Good headers: [Object: null prototype] {
 ':status': 200,
 'content-disposition': 'attachment',
 'content-type': 'application/grpc',
 date: 'Wed, 10 Jul 2019 23:05:01 GMT',
 'alt-svc': 'quic=":443"; ma=2592000; v="46,43,39"'
}
Good headers: [Object: null prototype] {
 ':status': 200,
 'content-disposition': 'attachment',
 'content-type': 'application/grpc',
 date: 'Wed, 10 Jul 2019 23:05:01 GMT',
 'alt-svc': 'quic=":443"; ma=2592000; v="46,43,39"'
}
Good headers: [Object: null prototype] {
 ':status': 200,
 'content-disposition': 'attachment',
 'content-type': 'application/grpc',
 date: 'Wed, 10 Jul 2019 23:05:01 GMT',
 'alt-svc': 'quic=":443"; ma=2592000; v="46,43,39"'
}
Good headers: [Object: null prototype] {
 ':status': 200,
 'content-disposition': 'attachment',
 'content-type': 'application/grpc',
 date: 'Wed, 10 Jul 2019 23:05:01 GMT',
 'alt-svc': 'quic=":443"; ma=2592000; v="46,43,39"'
}
Good headers: [Object: null prototype] {
 ':status': 200,
 'content-disposition': 'attachment',
 'content-type': 'application/grpc',
 date: 'Wed, 10 Jul 2019 23:05:02 GMT',
 'alt-svc': 'quic=":443"; ma=2592000; v="46,43,39"'
}

@murgatroid99
Copy link

If you ever see a headers block with one of the bad strings that doesn't have a lot of zeros (like " �wâ") it would be good to see what that headers block looks like too.

@chrishiestand
Copy link

chrishiestand commented Jul 10, 2019

I'll update this comment as they come in:

Bad headers: [Object: null prototype] {
 ':status': 200,
 'content-disposition': 'attachment',
 'content-type': 'application/grpc',
 ' pø0': 'Wed, 10 Jul 2019 23:39:46 GMT',
 'alt-svc': 'quic=":443"; ma=2592000; v="46,43,39"'
}

"Metadata key \" pø0\" contains illegal characters"

Bad headers: [Object: null prototype] {
 ':status': 200,
 'content-disposition': 'attachment',
 'content-type': 'application/grpc',
 ' ÐÕJ': 'Wed, 10 Jul 2019 23:58:43 GMT',
 'alt-svc': 'quic=":443"; ma=2592000; v="46,43,39"'
}

Metadata key \" ðõj\" contains illegal characters


Bad headers: [Object: null prototype] {
 ':status': 200,
 'content-disposition': 'attachment',
 'content-type': 'application/grpc',
 '  M�': 'Thu, 11 Jul 2019 17:49:30 GMT',
 'alt-svc': 'quic=":443"; ma=2592000; v="46,43,39"'
}

Metadata key \"  m�\" contains illegal characters

[Object: null prototype] {
 ':status': 200,
 'content-disposition': 'attachment',
 'content-type': 'application/grpc',
 ' \u0010Þà': 'Thu, 11 Jul 2019 20:03:25 GMT',
 'alt-svc': 'quic=":443"; ma=2592000; v="46,43,39"'
}

@murgatroid99
Copy link

I have published grpc-js@0.5.2 with Alex's suggestion: now these invalid keys will cause a warning to be emitted instead of an error, and the call will continue to be processed. I am very curious to see if anyone gets multiple warnings from a single process and, if so, whether they have anything we haven't seen so far.

@alexander-fenster
Copy link
Contributor

Since grpc-js is in 0.x range, v0.5.2 won't be pulled automatically since google-gax depends on ^0.4.0. I will make a patch release of gax that bumps this dependency and will update this thread.

alexander-fenster added a commit to googleapis/gax-nodejs that referenced this issue Jul 12, 2019
This fix includes grpc/grpc-node#962 which is a work around for a bad metadata keys problem that many customer see across several libraries (googleapis/nodejs-datastore#415, googleapis/nodejs-logging#527, googleapis/nodejs-pubsub#667 are just a few).
JustinBeckwith pushed a commit to googleapis/gax-nodejs that referenced this issue Jul 12, 2019
This fix includes grpc/grpc-node#962 which is a work around for a bad metadata keys problem that many customer see across several libraries (googleapis/nodejs-datastore#415, googleapis/nodejs-logging#527, googleapis/nodejs-pubsub#667 are just a few).
@alexander-fenster
Copy link
Contributor

alexander-fenster commented Jul 15, 2019

@kirillgroshkov @kmontag @klon @csidell-earny @chrishiestand

Hey folks,

We have released google-gax v1.1.5 that pulls @grpc/grpc-js v0.5.2 in which the invalid headers (that are coming from Node.js http2 module) are printed in a warning, but do not throw exception. Please upgrade your dependencies, it should fix the problem for you folks (but please keep posting those warnings here and/or to the Node.js issue grpc/grpc-node#962 since it might help understand what's going on in http2 module).

@chrishiestand Thank you so much for your help with debugging!

@kirillgroshkov
Copy link
Author

After doing yarn add @google-cloud/datastore and yarn upgrade @google-cloud/datastore and then yarn why @grpc/grpc-js I see this:

yarn why v1.16.0
[1/4] 🤔  Why do we have the module "@grpc/grpc-js"...?
[2/4] 🚚  Initialising dependency graph...
[3/4] 🔍  Finding dependency...
[4/4] 🚡  Calculating file sizes...
=> Found "@grpc/grpc-js@0.4.3"
info Reasons this module exists
   - "@google-cloud#datastore" depends on it
   - Hoisted from "@google-cloud#datastore#@grpc#grpc-js"
info Disk size without dependencies: "340KB"
info Disk size with unique dependencies: "428KB"
info Disk size with transitive dependencies: "428KB"
info Number of shared dependencies: 1
=> Found "google-gax#@grpc/grpc-js@0.5.2"
info This module exists because "@google-cloud#datastore#google-gax" depends on it.
info Disk size without dependencies: "384KB"
info Disk size with unique dependencies: "472KB"
info Disk size with transitive dependencies: "472KB"
info Number of shared dependencies: 1

It says I have 2 different versions of grpc-js in my node_modules and one of them is old (0.4.3), because @google-cloud/datastore depends on it.

Is this correct? I would expect that you bumped it there too

@alexander-fenster
Copy link
Contributor

@kirillgroshkov Technically you're right and we need to bump it here as well (@bcoe can you please do it?), but if you check how it is used, you'll see it's here just for TypeScript types and no actual code from @grpc/grpc-js is used directly in datastore library. It should be fine just with the google-gax update that would pull @grpc/grpc-js v0.5.2 for itself.

@kirillgroshkov
Copy link
Author

Deployed it yesterday. So far 0 errors, looks promising.

@alexander-fenster
Copy link
Contributor

@kirillgroshkov Are you able to see any warnings (console.warn) in your setup? If so, feel free to update the issue with the warnings printed. We still need to collect some data to have more information for the upstream issue.

@hermanbanken
Copy link
Contributor

This issue has been hitting us too for the past 20 days (around 2500 occurrences of that error so far that I can tell), so we'll try to update and see if it fixes the issue at our side too.

@kirillgroshkov
Copy link
Author

@kirillgroshkov Are you able to see any warnings (console.warn) in your setup? If so, feel free to update the issue with the warnings printed. We still need to collect some data to have more information for the upstream issue.

I'm unable to check console logs in our current system. But 8 days since my last post - it runs stable, original issue is not reproduced since.

@hermanbanken
Copy link
Contributor

It is fixed in out application too. We see no more of these:

Error: Metadata key "a???" contains illegal characters     
at Http2CallStream.call.on (/ncbackend3/node_modules/@grpc/grpc-js/build/src/client.js:102:45)

however, we apparently traded it for a different Error now:

Error
    at Http2CallStream.call.on (/app/heimdall/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/client.js:96:45)

which is happening since that update.

@shaunwarman
Copy link

shaunwarman commented Aug 12, 2019

Seeing the same. It happens very frequently now too so really unsure what to do. Testing with @google-cloud/datastore@3 again. All I get is the same Http2Callstream.. related stacktrace so I am flying a bit blind.

@hermanbanken any luck since your last comment?

@JustinBeckwith @schmidt-sebastian is this related to: firebase/firebase-functions#536 (comment)

@hermanbanken
Copy link
Contributor

The amount of occurrences those new errors is much lower, so it is kind of an improvement.

The console.warn in the original 'fix' is kinda annoying though, because it ends up in stderr as text, which also ends up in StackDriver. At least the function is no longer terminated by the throw...

@shaunwarman
Copy link

shaunwarman commented Aug 12, 2019

Your fix was to bump @grpc/grpc-js to v0.5.2?

$ npm ls @grpc/grpc-js
myapp
├─┬ @google-cloud/datastore@3.1.2
│ └─┬ google-gax@0.25.6
│   └── @grpc/grpc-js@0.3.6
└─┬ @google-cloud/logging-bunyan@0.10.1
  └─┬ @google-cloud/logging@4.5.2
    └─┬ google-gax@0.25.6
      └── @grpc/grpc-js@0.3.6

I dropped back down to 3 and am not seeing issues in the last hour or so

@schmidt-sebastian
Copy link

This sounds like the same issue that we fixed in googleapis/nodejs-firestore#697. The fix was to update google-gax to 1.1.2. I am however not certain that it will be a drop in replacement here since your dependencies still shows gax at 0.25.6.

@google-cloud-label-sync google-cloud-label-sync bot added the api: datastore Issues related to the googleapis/nodejs-datastore API. label Jan 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: datastore Issues related to the googleapis/nodejs-datastore API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. 🚨 This issue needs some love. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

Successfully merging a pull request may close this issue.