Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redact potentially sensitive information from error objects #76

Merged
merged 8 commits into from
Dec 8, 2023

Conversation

JoshMock
Copy link
Member

Exposes new Transport options to improve the JS client's security position on potentially sensitive data.

Right now, many of the Error classes defined in this library include a DiagnosticResult object full of request and connection metadata that is useful for debugging. This can include HTTP headers and URLs. Common sources of sensitive data are removed while serializing to JSON or logging to the console, so common use cases are already covered. However, since that data is still stored on these objects in memory, sensitive data may not be scrubbed when custom serialization methods are used.

To solve for these cases, Error objects now fully redact and/or remove potentially sensitive data at instantiation time rather than at serialization time, so they do not exist in memory at all and cannot be exposed by any means.

These redaction methods are all exposed as options on the Transport class:

  • redactConnection - Completely removes the Connection object from the DiagnostictResult (false by default)
  • redactDiagnostics - Recursively traverses the DiagnosticResult object, redacting any values whose keys match—case-insensitive—a pre-set list of sensitive key names: authorization, password, apiKey, and x-elastic-app-auth (true by default)
  • additionalRedactionKeys - A list of additional keys that can be redacted when redactDiagnostics is true; note that it is not possible to remove the pre-set keys listed above, only add new ones (defaults to [])

See elastic/elasticsearch-js#2026.

src/errors.ts Outdated Show resolved Hide resolved
src/security.ts Outdated
*/
export function redactObject (obj: Record<string, any>, additionalKeys: string[] = []): Record<string, any> {
const toRedact = [...secretKeys, ...additionalKeys].map(key => key.toLowerCase())
return traverse(obj)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

traverse loops recursively through every key/value in any deeply-nested object, and can easily account for circular references (see line 47). I wanted both of these things so that this function could be generic enough to be used elsewhere later.

It was proposed that we could use fast-redact to implement this redaction feature. However, it uses wildcards to match a key string at multiple levels. I benchmarked it against my solution and, when using it to match keys at 5+ levels of nesting—DiagnosticResult includes HTTP headers nested 5 levels deep—it was substantially slower than using traverse, and did not account for circular references.

This validation would have been ideal, but it would break this lib's
contract with elasticsearch-js. Since we want to ensure the
transport's major version will also work with the same major version
of Elasticsearch, bumping this lib to 9.0.0 would have created
confusion.
Copy link
Contributor

@ezimuel ezimuel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should provide some documentation on this feature and how a user can customize the list of keyword to be filtered. Moreover, do we provide a way to switch off this features? I'm assuming we'll remove the sensitive information by default but if someone wants to have it we should provide (e.g. for debugging).


import traverse from 'traverse'

const secretKeys = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a user customize this list?

Copy link
Member

@delvedor delvedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm always a bit on the fence about adding new options, users may overlook the documentation and adds more code to maintain in general. If the goal here is only to hide the sensitive options, how about using a getter?
For example:

class Error {
  constructor (opts) {
    this.foo = opts.foo
    Object.defineProperty(this, 'bar', {
      get: () => opts.bar
    })
  }
}

const error = new Error({ foo: 1, bar: 2 })
console.log(error)
console.log(error.bar)
throw error

This will log the following:

Error { foo: 1 }
2

/Users/delvedor/Development/elastic/sdh-stats/error.js:13
throw error
^
Error { foo: 1 }

Node.js v18.12.1

In the Transport's case we could do:

export class ConnectionError extends ElasticsearchClientError {
  meta?: DiagnosticResult
  constructor (message: string, meta?: DiagnosticResult) {
    super(message)
    Error.captureStackTrace(this, ConnectionError)
    this.name = 'ConnectionError'
    this.message = message ?? 'Connection Error'
    Object.defineProperty(this, 'meta', {
      get: () => meta
    })
  }
}

The same in other error classes. It may not be conventional, but it does the trick without introducing new options.

@JoshMock
Copy link
Member Author

JoshMock commented Dec 5, 2023

I'm always a bit on the fence about adding new options, users may overlook the documentation and adds more code to maintain in general.

I agree that adding new options increases maintenance cost and should be avoided when possible, but it didn't seem reasonable in this case.

The goal of the options is to allow multiple redaction "strengths," and to be able to disable it when the old behavior is needed for e.g. local debugging. When discussing with @rudolf, @pgayvallet and @ezimuel, adding these options—with secure but non-breaking-change defaults—seemed like the most flexible approach: current users get better protection by default, users who want more protection can add it, and it can be disabled when necessary.

@delvedor Did you have an idea of how to achieve the same goals with getters? It wasn't clear by your example.

@JoshMock
Copy link
Member Author

JoshMock commented Dec 5, 2023

I think we should provide some documentation on this feature and how a user can customize the list of keyword to be filtered. Moreover, do we provide a way to switch off this features? I'm assuming we'll remove the sensitive information by default but if someone wants to have it we should provide (e.g. for debugging).

@ezimuel The three new options in the PR description are how all of these things can be customized. The optional list of redaction keys (additionalRedactionKeys) will only append new keys, so the defaults (authorization etc.) cannot be accidentally disabled by adding extra keys. To disable the default keys one would need to set redactDiagnostics to false. I'm willing to revisit that approach if it seems too strict.

As for docs, when I upgrade elasticsearch-js to use this new transport version, I'll add documentation there, as well as some notes in the changelog.

@rudolf
Copy link

rudolf commented Dec 6, 2023

Common sources of sensitive data are removed while serializing to JSON or logging to the console, so common use cases are already covered. However, since that data is still stored on these objects in memory, sensitive data may not be scrubbed when custom serialization methods are used.

Just to clarify, these existing protections only work on connection not on other sensitive parts of ElasticsearchClientError contained under meta: DiagnosticResult. So while JSON.stringify(connection) is safe, JSON.stringify(Error) is not (same for console.log).

I think both layers are usefulh bb

@rudolf rudolf closed this Dec 6, 2023
@JoshMock
Copy link
Member Author

JoshMock commented Dec 6, 2023

@rudolf I assume you closed on accident 😆

@JoshMock JoshMock reopened this Dec 6, 2023
@JoshMock
Copy link
Member Author

JoshMock commented Dec 7, 2023

I chatted with @delvedor earlier today and we talked through his suggestion in detail. Rather than adding a redactDiagnostics option that has to be passed all the way down from the client, we can reduce the number of new options to 2 and just redact potentially sensitive data all the time.

Rather than replacing values with [redacted], the values are still there, just hidden from all serialization and iteration methods by wrapping it in a getter function set on the object using Object.defineProperty. By this method, the only way redacted values can be accessed is by directly hitting the property (e.g. console.log(myObject.authorization)). Someone who is doing that explicitly should be aware that this data is sensitive and proceed accordingly. All serialization methods run on myObject will hide the value:

  • util.inspect
  • JSON.stringify
  • accessing all values via Object.values, Object.keys and Object.entries
  • looping over all keys on an object with a for..in loop: for (const key in redactedObject) { ... }
  • throwing an object as an Error

See commit b486c79 for this change.

redactConnection and additionalRedactionKeys options are still exposed as described above.

Planning to merge this tomorrow, Dec 8.

@rudolf
Copy link

rudolf commented Dec 8, 2023

Rather than replacing values with [redacted], the values are still there, just hidden from all serialization and iteration methods by wrapping it in a getter function set on the object using Object.defineProperty. By this method, the only way redacted values can be accessed is by directly hitting the property (e.g. console.log(myObject.authorization)). Someone who is doing that explicitly should be aware that this data is sensitive and proceed accordingly. All serialization methods run on myObject will hide the value

I'm not sure how I feel about this...

I 100% agree that JSON.stringify(ErrorWithSecrets) should never serialise those secrets. And here it makes a lot of sense to me to define getters on the connection, and all the *.headers keys so that those headers never escape. Although it's a slightly different implementation we have this protection in place on connection objects already. If this is easier to adopt in a non-breaking way this could be great layer to have for other users.

BUT *rant incoming 😉 * I feel like Elasticsearch-js has no business throwing around secrets all over the place. "I asked you to call Elasticsearch for me and tell me what went wrong, the authorization headers you used to make the request are irrelevant!" If I e.g. compare mongodb's nodejs client errors just contain the code, message, labels and a connection id string https://github.com/mongodb/node-mongodb-native/blob/main/src/error.ts

I know there's some backwards compatibility challenges, but if I could be completely selfish I just want an error that contains the bare minimum for me to understand what went wrong. Anything more I would rather use the diagnostics event emitter to debug. Things to include would be (probably missing something):

{
  attempts: number;
  aborted: boolean;
  body: TResponse;
  statusCode: number;
  warnings?: string[]; // not sure what this is?
  type: string;
  reason: string;
}

But definitely I don't want the request/response headers or the connection they're not useful and they're dangerous so why add it to the error object in the first place. Removing these completely makes me feel much better than trying to use a scalpel to find just the things we know are secrets.

@JoshMock
Copy link
Member Author

JoshMock commented Dec 8, 2023

Thanks for the feedback, @rudolf. I agree full removal is ideal. A diagnostic looking more like your proposal is definitely a consideration we can take for the next major version. But since the DiagnosticResult has been attached to Error objects for at least a few minor releases now, it would be a breaking change.

I'm leaning toward reverting the last commit, and changing the redaction options to something like:

interface RedactionOptions {
	type: 'basic' | 'advanced' | 'off'
    additionalKeys?: string[]
}
  • off would leave everything alone, as is the case now
  • basic would be the default, and would do what redactDiagnostics: true does (replace values with [redacted]).
  • advanced would do what redactDiagnostics: true, redactConnection: true does, but also strips (almost) all optional values off of the diagnostic, to get as close as possible to what you proposed:
{
  body?: TResponse
  statusCode?: number
  warnings: string[] | null
  meta: {
    context: TContext
    name: string | symbol
    request: {
      params: {
        method: string
        path: string
      },
      options: {}
      id: any
    }
    connection: null
    attempts: number
    aborted: boolean
  }
}

By nesting the three options into their own interface, the top-level option would just be redaction: RedactionOptions, and effectively behave like one or two options instead of 3. Having type be a union of strings also allows us to add more choices later.

@JoshMock
Copy link
Member Author

JoshMock commented Dec 8, 2023

Another set of options for type could be:

  • off
  • hide: replace values with hidden getters
  • replace: replace values with [redacted]
  • remove: what advanced does above: strip out most/all optional values from the diagnostic

@JoshMock
Copy link
Member Author

JoshMock commented Dec 8, 2023

I settled for off, replace and remove, with replace as default. I decided to skip hide, which would use the Object.defineProperty method. I don't think it's necessary, but if we see a need for it later, we can add it because the structure of the redaction options are more future-proof now.

Thanks everyone for the feedback. This solution feels a lot more refined. 🙏

Otherwise it breaks compatibility with the client
@JoshMock
Copy link
Member Author

JoshMock commented Dec 8, 2023

I tested this branch against all minor release branches of elasticsearch-js back to 8.6. Glad I did. Quick fix in 5b1324d, and now all tests on those branches are passing.

Merging soon.

@JoshMock JoshMock merged commit 5ae8589 into main Dec 8, 2023
13 checks passed
@JoshMock JoshMock deleted the redaction branch December 8, 2023 19:39
@JoshMock
Copy link
Member Author

JoshMock commented Dec 8, 2023

Not going to publish to npm until Monday, just to be safe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants