Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.0.0-rc.1 #1

Open
wants to merge 41 commits into
base: master
Choose a base branch
from
Open

v1.0.0-rc.1 #1

wants to merge 41 commits into from

Conversation

ahmadnassri
Copy link

This is essentially a complete re-write and expansion on the original functionality of v0.1.0

  • modular schema support, allows for easy future expansions
  • support RFC 7239 & X-Forwarded-* as primary trusted schemas
  • allows for implementers to choose schemas to process by passing an options object
  • this is a breaking api change. where v0.1.0 returns an array of IPs, v1.0.0 returns an object representing parameters of RFC 7239
    • the version increment is required to allow dependants libraries/implementers to continue to operate safely with out version breakage (assuming version resolving occurs with npm)
    • ref: RFC 7239 Parameters
  • coupled with support for less common schemas, named in association with vendor usage:
    • cloudflare
    • fastly
    • microsoft
    • nginx
    • zscaler
TODO
  • test with all direct dependents & indirect dependents after updating to forwarded().addrs
    • hapi-forward has no tests
    • hapi-forwarded works
    • proxy-addr works (with minor fix to its own test)
    • express works
    • at this point, confident the remainder will also work.
  • test node 0.6
  • test with real http server / client
  • parse Forwarded header elements using RFC 7230 token syntax
  • start pull requests for dependents with appropriate changes. (post merge & release)

This is essentially a complete re-write and expansion on the original functionality of `v0.1.0`

- modular schema support, allows for easy future expansions
- support `RFC7239` & `X-Forwarded-*` as primary trusted schemas
- allows for implementers to choose schemas to process by passing an `options` object
- this *is* a breaking api change, where `v0.1.0` returns an array of IPs, `v1.0.0` returns an object representing parameters of `RFC 7239`
  - the version increment is required to allow dependants libraries/implementers to continue to operate safely with out version breakage *(assuming version resolving occurs with `npm`)
  - [`RFC 7239` Parameters](http://tools.ietf.org/html/rfc7239#section-5)
- coupled with support for less common schemas, named in association with vendor usage:
  - cloudflare
  - fastly
  - microsoft
  - nginx
  - zscaler
host: req.headers && req.headers.host ? req.headers.host : undefined,
port: req.connection.remotePort.toString(),
ports: [req.connection.remotePort.toString()],
proto: req.connection.encrypted ? 'https' : 'http'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this actually works in Node.js 0.6 and the others? I only ask because the test is just a mock, so we have to verify manually if we cannot add a real test.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was relying on the travis test to be honest, I managed to manually test all the way down to 0.8, then gave up on getting 0.6 installed on my machine. I'll get that working today and confirm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are only as honest as your tests are--since you are using mocks, then you are not actually testing anything (you have to test a real req for Travis CI to be of any help to you).

addrs: [req.connection.remoteAddress],
by: null,
host: req.headers && req.headers.host ? req.headers.host : undefined,
port: req.connection.remotePort.toString(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a disconnected socket, req.connection.remotePort is undefined and this will throw. This needs to be handled.

@dougwilson
Copy link
Contributor

proxy-addr works (with minor fix to its own test)

it's actually a bug in your PR here that it relieved. the issue is that you are using a simple mock for req, but it does not fully test it. req.connection.remotePort can legitimately be undefined.

Here is the follow-though using Node.js 0.10.24:

  1. Access to the remotePort getter at https://github.com/joyent/node/blob/v0.10.24/lib/net.js#L581
  2. Call this._getpeername() https://github.com/joyent/node/blob/v0.10.24/lib/net.js#L561
  3. Two places in the method return {} (https://github.com/joyent/node/blob/v0.10.24/lib/net.js#L563 and https://github.com/joyent/node/blob/v0.10.24/lib/net.js#L569).
  4. Return the port property from this._getpeername(); the two cases from #3 will thus return undefined for remotePort

@dougwilson
Copy link
Contributor

Another concern I have is why does this PR seem to be significantly slower? So this module does not have a benchmark suite, but proxy-addr does, so simply installing this PR into proxy-addr and changing the usage to forwarded(req).addrs results in the following difference:

#### forwarded 0.1.0

> node benchmark\matching.js

  1 test completed.
  2 tests completed.
  3 tests completed.
  4 tests completed.
  5 tests completed.
  6 tests completed.

  trust none     x 864,390 ops/sec ±1.62% (175 runs sampled)
  trust all      x 909,210 ops/sec ±1.55% (176 runs sampled)
  trust single   x  18,817 ops/sec ±1.37% (171 runs sampled)
  trust first    x 949,981 ops/sec ±1.30% (176 runs sampled)
  trust subnet   x  18,371 ops/sec ±1.55% (173 runs sampled)
  trust multiple x  18,834 ops/sec ±1.44% (174 runs sampled)
### forwarded PR

> node benchmark\matching.js

  1 test completed.
  2 tests completed.
  3 tests completed.
  4 tests completed.
  5 tests completed.
  6 tests completed.

  trust none     x 54,219 ops/sec ±1.54% (180 runs sampled)
  trust all      x 52,660 ops/sec ±1.75% (175 runs sampled)
  trust single   x 13,887 ops/sec ±1.55% (177 runs sampled)
  trust first    x 53,196 ops/sec ±1.36% (177 runs sampled)
  trust subnet   x 13,665 ops/sec ±1.39% (175 runs sampled)
  trust multiple x 13,615 ops/sec ±1.51% (180 runs sampled)

The matching benchmark is the closest one that measures just the forwarded module performance (mainly the "trust none", "trust all", and "trust first", which have almost no overhead other than simply calling forwarded).

As you can see above, based on those three benchmarks, performance dropped significantly. Since this module would theoretically be called on every request to a web server, it has a direct impact on the maximum req/sec a single Node.js instance can serve.

I understand, of course, that the current implement was very simply, but I did not expect to see it drop all the way down to 55k ops/sec. My main question is: can we do better? What are the bottlenecks in this code? Can v8 optimize functions, or does it keep bailing out for some reason?

This note on performance is not technically a blocker, though, but just something I noted and think one of us should at least glance into :)

function isSecure (req) {
try {
var cf = JSON.parse(req.headers['cf-visitor'])
return cf.scheme !== undefined
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is broken; according to https://support.cloudflare.com/hc/en-us/articles/200170536-How-do-I-redirect-HTTPS-traffic-with-Flexible-SSL-and-Apache- the value {"scheme":"http"} is valid and this routine would incorrectly mark the request as secure.

@ahmadnassri
Copy link
Author

Another concern I have is why does this PR seem to be significantly slower?

that is concerning, I did not expect much change in the way of performance, so it never occurred to me to apply a benchmark, I'll take a deeper dive into this and find the bottleneck.

thanks for all the great feedback, I'll make the appropriate changes.

@dougwilson
Copy link
Contributor

thanks for all the great feedback

Absolutely! I wanted to get your the feedback quickly :) I noticed the repo was last changed in 2014, so I just make a few mundane updates to master; it is up to you if you want to merge it in your branch, rebase your branch on top, or simply do nothing w.r.t those updates :)

Also, with the significant code, please feel free to also add your name for 2015 in the LICENSE file.

return forwarded
}

splitMap(header, ELEMENT_SEPARATOR, function parseElement (el) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this will fail to parse the header Forwarded: for="foo,bar" properly, opening a vector for security issues when relying on the header for access control.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from my reading of RFC 7239 the spec states that each for parameter represents a single address. unless I'm reading this wrong. can you link to or highlight the section that says otherwise?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is irrelevant; an incoming header can contain any values, valid or not. The current implementation will do the wrong thing and this is not good, as the only reason you are parsing these headers is for logging, auditing, or access controls.

Just think of a standard proxy that does not modify anything existing in the Forwarded header and simply adds it's own data to it; it will pass-through the bogus value that will create false parses.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, what I'm saying is that the current parse is actually not RFC 7239 compliant, as you are not correctly following the ABNF, which allows for , and ; to appear within a quoted-string and the defined ABNF for the header is as follows:

Forwarded   = 1#forwarded-element
forwarded-element = [ forwarded-pair ] *( ";" [ forwarded-pair ] )
forwarded-pair = token "=" value
value          = token / quoted-string
token          = 1*tchar
tchar          = "!" / "#" / "$" / "%" / "&" / "'" / "*"
               / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
               / DIGIT / ALPHA
quoted-string  = DQUOTE *( qdtext / quoted-pair ) DQUOTE
qdtext         = HTAB / SP /%x21 / %x23-5B / %x5D-7E / obs-text
obs-text       = %x80-FF

The given header, Forwarded: for="foo,bar" is syntactically valid, yet this module will definitely parse it incorrectly.

Because the RFC allows extensions, perhaps your proxy decides it's going to include a custom field of the authenticated user's name. Uh, oh!

# this is in bash, so the "" is an escaped double quote (i.e. it's only a single double quote in the program)
$ node -pe "require('forwarded')({connection:{remotePort:0},headers:{forwarded:'user=""bad,for=_whitelisted_ip"";for=_blacklisted_ip'}})"
{ addrs: [ 'whitelisted_ip', 'blacklisted_ip' ],
  by: null,
  host: undefined,
  port: '0',
  ports: [ '0' ],
  proto: 'http' }

Even though the header is 100% syntactically valid, with a single forwarded entry of {for: '_blacklisted_ip', user: 'bad,for=_whitelisted_ip'}, we end up thinking that the proxy got the request from _whitelisted_ip incorrectly due to the invalid parsing this module did :(

And I'm not even a security expert; I can tell you that if we don't fix the parsing, it will most likely end up as a critical security bug.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation will do the wrong thing

can you clarify what you mean by that?

the token syntax does not actually allow a comma to be present:

     token          = 1*tchar

     tchar          = "!" / "#" / "$" / "%" / "&" / "'" / "*"
                    / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
                    / DIGIT / ALPHA
                    ; any VCHAR, except delimiters
Delimiters are chosen from the set of US-ASCII visual characters not allowed in a token
(DQUOTE and "(),/:;<=>?@[\]{}").

so, if anything, we should discard for="foo,bar" since its not a valid token

verification vs. validation

further, imo, the purpose of this library is to parse the headers, _not to verify values_, the resulting values should be filtered through a secondary layer if the implementer chooses to verify values, such as network addresses, host names, etc ...

which leaves us with the question: should we add validation here?

specifically: to validate addresses, hostnames, ports.


as a side, RFC 7239 node identifier syntax could also include unknown and obfuscated identifiers:

   MUST have a leading underscore "_".  Furthermore, it MUST also
   consist of only "ALPHA", "DIGIT", and the characters ".", "_", and
   "-".

which we can easily validate as well.

@ahmadnassri
Copy link
Author

I've added some minor performance improvements and switched to using @lpinca forwarded-parse for better ANBF compatibility (though it seems to be very strict)

next steps:

  • fix travis issues for node 0.6, 0.8
  • add travis tests for recent node versions
  • increase code-coverage
  • add benchmark suite (in separate repo)
  • improve performance where possible
  • test alternative RFC 7239 parsing module and compare performance and spec compatibility

@ahmadnassri
Copy link
Author

@dougwilson any further direction from you to the todo list above?

@dougwilson
Copy link
Contributor

and switched to using @lpinca forwarded-parse for better ANBF compatibility

This part I don't understand. The entire purpose of this module is literally to do that operation, not depend on another module to do it. This module's name marched the name of the header for a reason...

@dougwilson
Copy link
Contributor

Basically I think there is a huge misunderstanding of the current separation of concerns between the modules here.

This module is meant only to be concerned with parsing/formatting of the HTTP "Forwarded" header (and I suppose the X headers that it unified)

The proxy-addr module is designed to determine the "proxied address" of a request, which could include configuring the source of proxy headers.

This PR seems to be heavily mixing those two together into this single library to me.

@dougwilson
Copy link
Contributor

A good litmus test in this organization is that if your code logically organizes into more than a single JavaScript file, that is multiple modules, not one. There are still some old, inherited modules that have a lib folder, but we are slowing working to get rid of that such that every module consists only of a single index.js file (for runtime code).

@dougwilson
Copy link
Contributor

Oh, I know I keep making comments but I keep thinking of new things to say :) I have not yet looked at the new changes, or even really looked though to remind myself of the existing changes, but I think all the features and stuff in here are good, but more though from us (and perhaps the technical committee that now oversees this organization) needs to be done around which changes go into which modules, what new modules should be created, etc.

@ahmadnassri
Copy link
Author

This part I don't understand. The entire purpose of this module is literally to do that operation, not depend on another module to do it. This module's name marched the name of the header for a reason...

I suppose I did not expand on my thoughts here... mea culpa, let me explain:

  • forwarded-parse seems to be doing a proper job of parsing the Forwarded header in accordance with the spec.
  • @lpinca has volunteered donating his work to this organization, which is great.
  • this original forwarded module focused on X-Forwarded-For header, and only concerned itself with the IP addresses, including the socket address for the request (though the latter part could be considered out of scope, if we're talking only about headers)
  • the RFC 7239 spec, introduces by, host, proto in addition to for as core parameters. (while also allowing for extensions)

so my thoughts up this point:

  • the parsing module if donated into the org can continue to live separately and focuses on the spec (while allowing extensions)
  • this module should focus on those core parameters, and any fallback that might be derived from defacto standards (such as X-Forwarded-* and X-Real-* headers) or vendor extensions used in popular cloud services.
  • the proxy-addr continues to focus on Proxy IP functionality, leveraging info from this module.

hence the separation and added dependency, mostly due to the extension parsing allowed by spec, as I'm a fan of separation of concerns.

@ahmadnassri
Copy link
Author

Oh, I know I keep making comments but I keep thinking of new things to say :)

all good, same here :)

more though from us (and perhaps the technical committee that now oversees this organization) needs to be done around which changes go into which modules, what new modules should be created, etc.

as I was reviewing the original work (almost a year old now!) I started thinking of better way to break this down too, I'm no longer a fan of having the vendor headers included in the core of this module.

I believe now, it should only concern it self with the RFC 7239 Spec and X-Forwarded-* only (meaning we can merge the work from forwarded-parse directly here.)

and the vendor specs can be additional, separate modules, that can be passed to this one for additional processing (e.g. building the stack of IPs in a desired order across multiple specs)

not sure what a good pattern for the latter though, or how it would propagate across higher dependencies for user-level control? (e.g. proxy-addr > connect > framework)

a simple pattern can perhaps just be an array of module names, or directly references functions (that in turn can be loaded from modules) that provide a common object { for, by, host, proto } that can then be used to build a stacked result.

@ahmadnassri
Copy link
Author

now that I'm back at a desktop keyboard, here's an example of how my previous thoughts might shape out:

minimal / default
const forwarded = require('forwarded') // included dependencies: RFC2739, X-Forwarded-*
let result = forwarded(request)
verbose
const forwarded = require('forwarded') 
const rfc2739 = require('forwarded-rfc2739')
const xforwarded = require('forwarded-x-forwarded')
const cloudflare = require('forwarded-cloudflare')
const fastly = require('forwarded-fastly')

// custom order
let options = {
  schemas: [cloudflare, rfc7239, fastly, xforwarded]
}

let result = forwarded(request, options)

@dougwilson
Copy link
Contributor

I think proxy-addr should be receiving those options, and this module is the one that does the parsing for one of those "schemas": Forwarded.

@ahmadnassri
Copy link
Author

would proxy-addr also deal with for, by, host, proto properties and data beyond just the "proxy address"?

@dougwilson
Copy link
Contributor

There is no reason why it cannot do that, since those are all basically part of the "proxied address". Ideally one should at least get enough information to construct the URL for the request, and just the hostname/IP is not enough information, because it could be on a non-default port, could be using HTTP or HTTPS, and more.

Typically if people just wanted to parse their headers, they would directly depend on the module that just does the header parsing they need. The proxy-addr module was created to glue the pure parsing modules together with trust chains and expose some kind of interface that modules like Express or koa could consume for what they need.

I think the long-term plans is that proxy-addr will be the module that will get all the address information: host, scheme/protocol, port, etc.

It used to also parse things like X-Forwarded-For, but that was doing too much, so it was split out into this module, with the desire to get parsing of the real Forwarded header instead, though that header doesn't seem to have taken off very much (yet), so there hasn't been any requests to Express/Koa to even get alternative headers as part of the core parsing, though it would be good to have.

Because we keep going back and forth on this discussion, I still have not even had time to lok at your new code or even refresh myself on the existing content of this pull request, sorry!

@ahmadnassri
Copy link
Author

Because we keep going back and forth on this discussion, I still have not even had time to lok at your new code or even refresh myself on the existing content of this pull request, sorry!

no worries, I think your point about proxy-addr is along the same lines of what I'm suggesting

I think we're debating semantics of where the logic sits.

The logic used here can be abstracted to proxy-addr and multiple modules / packages representing different "schems" can be created, then proxy-addr glues them together as you said.

TL;DR

I think we're saying the same thing. :)


I'll let you review the current code first, then we can discuss further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants