v1.0.0-rc.1 #1

ahmadnassri · 2015-05-30T07:35:52Z

This is essentially a complete re-write and expansion on the original functionality of v0.1.0

modular schema support, allows for easy future expansions
support RFC 7239 & X-Forwarded-* as primary trusted schemas
allows for implementers to choose schemas to process by passing an options object
this is a breaking api change. where v0.1.0 returns an array of IPs, v1.0.0 returns an object representing parameters of RFC 7239
- the version increment is required to allow dependants libraries/implementers to continue to operate safely with out version breakage (assuming version resolving occurs with npm)
- ref: RFC 7239 Parameters
coupled with support for less common schemas, named in association with vendor usage:
- cloudflare
- fastly
- microsoft
- nginx
- zscaler

TODO

test with all direct dependents & indirect dependents after updating to forwarded().addrs
- hapi-forward has no tests
- hapi-forwarded works
- proxy-addr works (with minor fix to its own test)
- express works
- at this point, confident the remainder will also work.
test node 0.6
test with real http server / client
parse Forwarded header elements using RFC 7230 token syntax
start pull requests for dependents with appropriate changes. (post merge & release)

This is essentially a complete re-write and expansion on the original functionality of `v0.1.0` - modular schema support, allows for easy future expansions - support `RFC7239` & `X-Forwarded-*` as primary trusted schemas - allows for implementers to choose schemas to process by passing an `options` object - this *is* a breaking api change, where `v0.1.0` returns an array of IPs, `v1.0.0` returns an object representing parameters of `RFC 7239` - the version increment is required to allow dependants libraries/implementers to continue to operate safely with out version breakage *(assuming version resolving occurs with `npm`) - [`RFC 7239` Parameters](http://tools.ietf.org/html/rfc7239#section-5) - coupled with support for less common schemas, named in association with vendor usage: - cloudflare - fastly - microsoft - nginx - zscaler

dougwilson · 2015-05-30T14:22:06Z

index.js

+    host: req.headers && req.headers.host ? req.headers.host : undefined,
+    port: req.connection.remotePort.toString(),
+    ports: [req.connection.remotePort.toString()],
+    proto: req.connection.encrypted ? 'https' : 'http'


Are you sure this actually works in Node.js 0.6 and the others? I only ask because the test is just a mock, so we have to verify manually if we cannot add a real test.

I was relying on the travis test to be honest, I managed to manually test all the way down to 0.8, then gave up on getting 0.6 installed on my machine. I'll get that working today and confirm.

They are only as honest as your tests are--since you are using mocks, then you are not actually testing anything (you have to test a real req for Travis CI to be of any help to you).

dougwilson · 2015-05-30T15:38:34Z

index.js

+    addrs: [req.connection.remoteAddress],
+    by: null,
+    host: req.headers && req.headers.host ? req.headers.host : undefined,
+    port: req.connection.remotePort.toString(),


On a disconnected socket, req.connection.remotePort is undefined and this will throw. This needs to be handled.

dougwilson · 2015-05-30T15:42:52Z

proxy-addr works (with minor fix to its own test)

it's actually a bug in your PR here that it relieved. the issue is that you are using a simple mock for req, but it does not fully test it. req.connection.remotePort can legitimately be undefined.

Here is the follow-though using Node.js 0.10.24:

Access to the remotePort getter at https://github.com/joyent/node/blob/v0.10.24/lib/net.js#L581
Call this._getpeername() https://github.com/joyent/node/blob/v0.10.24/lib/net.js#L561
Two places in the method return {} (https://github.com/joyent/node/blob/v0.10.24/lib/net.js#L563 and https://github.com/joyent/node/blob/v0.10.24/lib/net.js#L569).
Return the port property from this._getpeername(); the two cases from #3 will thus return undefined for remotePort

dougwilson · 2015-05-30T16:35:13Z

Another concern I have is why does this PR seem to be significantly slower? So this module does not have a benchmark suite, but proxy-addr does, so simply installing this PR into proxy-addr and changing the usage to forwarded(req).addrs results in the following difference:

#### forwarded 0.1.0

> node benchmark\matching.js

  1 test completed.
  2 tests completed.
  3 tests completed.
  4 tests completed.
  5 tests completed.
  6 tests completed.

  trust none     x 864,390 ops/sec ±1.62% (175 runs sampled)
  trust all      x 909,210 ops/sec ±1.55% (176 runs sampled)
  trust single   x  18,817 ops/sec ±1.37% (171 runs sampled)
  trust first    x 949,981 ops/sec ±1.30% (176 runs sampled)
  trust subnet   x  18,371 ops/sec ±1.55% (173 runs sampled)
  trust multiple x  18,834 ops/sec ±1.44% (174 runs sampled)

### forwarded PR

> node benchmark\matching.js

  1 test completed.
  2 tests completed.
  3 tests completed.
  4 tests completed.
  5 tests completed.
  6 tests completed.

  trust none     x 54,219 ops/sec ±1.54% (180 runs sampled)
  trust all      x 52,660 ops/sec ±1.75% (175 runs sampled)
  trust single   x 13,887 ops/sec ±1.55% (177 runs sampled)
  trust first    x 53,196 ops/sec ±1.36% (177 runs sampled)
  trust subnet   x 13,665 ops/sec ±1.39% (175 runs sampled)
  trust multiple x 13,615 ops/sec ±1.51% (180 runs sampled)

The matching benchmark is the closest one that measures just the forwarded module performance (mainly the "trust none", "trust all", and "trust first", which have almost no overhead other than simply calling forwarded).

As you can see above, based on those three benchmarks, performance dropped significantly. Since this module would theoretically be called on every request to a web server, it has a direct impact on the maximum req/sec a single Node.js instance can serve.

I understand, of course, that the current implement was very simply, but I did not expect to see it drop all the way down to 55k ops/sec. My main question is: can we do better? What are the bottlenecks in this code? Can v8 optimize functions, or does it keep bailing out for some reason?

This note on performance is not technically a blocker, though, but just something I noted and think one of us should at least glance into :)

dougwilson · 2015-05-30T16:39:07Z

lib/schemas/cloudflare.js

+function isSecure (req) {
+  try {
+    var cf = JSON.parse(req.headers['cf-visitor'])
+    return cf.scheme !== undefined


This is broken; according to https://support.cloudflare.com/hc/en-us/articles/200170536-How-do-I-redirect-HTTPS-traffic-with-Flexible-SSL-and-Apache- the value {"scheme":"http"} is valid and this routine would incorrectly mark the request as secure.

ahmadnassri · 2015-05-30T20:56:41Z

Another concern I have is why does this PR seem to be significantly slower?

that is concerning, I did not expect much change in the way of performance, so it never occurred to me to apply a benchmark, I'll take a deeper dive into this and find the bottleneck.

thanks for all the great feedback, I'll make the appropriate changes.

dougwilson · 2015-05-30T21:07:06Z

thanks for all the great feedback

Absolutely! I wanted to get your the feedback quickly :) I noticed the repo was last changed in 2014, so I just make a few mundane updates to master; it is up to you if you want to merge it in your branch, rebase your branch on top, or simply do nothing w.r.t those updates :)

Also, with the significant code, please feel free to also add your name for 2015 in the LICENSE file.

dougwilson · 2015-05-30T21:20:16Z

lib/schemas/rfc7239.js

+    return forwarded
+  }
+
+  splitMap(header, ELEMENT_SEPARATOR, function parseElement (el) {


It seems like this will fail to parse the header Forwarded: for="foo,bar" properly, opening a vector for security issues when relying on the header for access control.

from my reading of RFC 7239 the spec states that each for parameter represents a single address. unless I'm reading this wrong. can you link to or highlight the section that says otherwise?

That is irrelevant; an incoming header can contain any values, valid or not. The current implementation will do the wrong thing and this is not good, as the only reason you are parsing these headers is for logging, auditing, or access controls.

Just think of a standard proxy that does not modify anything existing in the Forwarded header and simply adds it's own data to it; it will pass-through the bogus value that will create false parses.

Basically, what I'm saying is that the current parse is actually not RFC 7239 compliant, as you are not correctly following the ABNF, which allows for , and ; to appear within a quoted-string and the defined ABNF for the header is as follows:

Forwarded = 1#forwarded-element forwarded-element = [ forwarded-pair ] *( ";" [ forwarded-pair ] ) forwarded-pair = token "=" value value = token / quoted-string token = 1*tchar tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA quoted-string = DQUOTE *( qdtext / quoted-pair ) DQUOTE qdtext = HTAB / SP /%x21 / %x23-5B / %x5D-7E / obs-text obs-text = %x80-FF

The given header, Forwarded: for="foo,bar" is syntactically valid, yet this module will definitely parse it incorrectly.

Because the RFC allows extensions, perhaps your proxy decides it's going to include a custom field of the authenticated user's name. Uh, oh!

# this is in bash, so the "" is an escaped double quote (i.e. it's only a single double quote in the program) $ node -pe "require('forwarded')({connection:{remotePort:0},headers:{forwarded:'user=""bad,for=_whitelisted_ip"";for=_blacklisted_ip'}})" { addrs: [ 'whitelisted_ip', 'blacklisted_ip' ], by: null, host: undefined, port: '0', ports: [ '0' ], proto: 'http' }

Even though the header is 100% syntactically valid, with a single forwarded entry of {for: '_blacklisted_ip', user: 'bad,for=_whitelisted_ip'}, we end up thinking that the proxy got the request from _whitelisted_ip incorrectly due to the invalid parsing this module did :(

And I'm not even a security expert; I can tell you that if we don't fix the parsing, it will most likely end up as a critical security bug.

The current implementation will do the wrong thing

can you clarify what you mean by that?

the token syntax does not actually allow a comma to be present:

token = 1*tchar tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA ; any VCHAR, except delimiters

Delimiters are chosen from the set of US-ASCII visual characters not allowed in a token (DQUOTE and "(),/:;<=>?@[\]{}").

so, if anything, we should discard for="foo,bar" since its not a valid token

verification vs. validation

further, imo, the purpose of this library is to parse the headers, _not to verify values_, the resulting values should be filtered through a secondary layer if the implementer chooses to verify values, such as network addresses, host names, etc ...

which leaves us with the question: should we add validation here?

specifically: to validate addresses, hostnames, ports.

as a side, RFC 7239 node identifier syntax could also include unknown and obfuscated identifiers:

MUST have a leading underscore "_". Furthermore, it MUST also consist of only "ALPHA", "DIGIT", and the characters ".", "_", and "-".

which we can easily validate as well.

…r' property is an array

ahmadnassri · 2016-03-26T23:35:48Z

I've added some minor performance improvements and switched to using @lpinca forwarded-parse for better ANBF compatibility (though it seems to be very strict)

next steps:

fix travis issues for node 0.6, 0.8
add travis tests for recent node versions
increase code-coverage
add benchmark suite (in separate repo)
improve performance where possible
test alternative RFC 7239 parsing module and compare performance and spec compatibility

ahmadnassri · 2016-03-26T23:44:16Z

@dougwilson any further direction from you to the todo list above?

dougwilson · 2016-03-26T23:52:45Z

and switched to using @lpinca forwarded-parse for better ANBF compatibility

This part I don't understand. The entire purpose of this module is literally to do that operation, not depend on another module to do it. This module's name marched the name of the header for a reason...

dougwilson · 2016-03-26T23:59:26Z

Basically I think there is a huge misunderstanding of the current separation of concerns between the modules here.

This module is meant only to be concerned with parsing/formatting of the HTTP "Forwarded" header (and I suppose the X headers that it unified)

The proxy-addr module is designed to determine the "proxied address" of a request, which could include configuring the source of proxy headers.

This PR seems to be heavily mixing those two together into this single library to me.

dougwilson · 2016-03-27T00:02:01Z

A good litmus test in this organization is that if your code logically organizes into more than a single JavaScript file, that is multiple modules, not one. There are still some old, inherited modules that have a lib folder, but we are slowing working to get rid of that such that every module consists only of a single index.js file (for runtime code).

dougwilson · 2016-03-27T00:07:44Z

Oh, I know I keep making comments but I keep thinking of new things to say :) I have not yet looked at the new changes, or even really looked though to remind myself of the existing changes, but I think all the features and stuff in here are good, but more though from us (and perhaps the technical committee that now oversees this organization) needs to be done around which changes go into which modules, what new modules should be created, etc.

ahmadnassri · 2016-03-27T05:26:58Z

This part I don't understand. The entire purpose of this module is literally to do that operation, not depend on another module to do it. This module's name marched the name of the header for a reason...

I suppose I did not expand on my thoughts here... mea culpa, let me explain:

forwarded-parse seems to be doing a proper job of parsing the Forwarded header in accordance with the spec.
@lpinca has volunteered donating his work to this organization, which is great.
this original forwarded module focused on X-Forwarded-For header, and only concerned itself with the IP addresses, including the socket address for the request (though the latter part could be considered out of scope, if we're talking only about headers)
the RFC 7239 spec, introduces by, host, proto in addition to for as core parameters. (while also allowing for extensions)

so my thoughts up this point:

the parsing module if donated into the org can continue to live separately and focuses on the spec (while allowing extensions)
this module should focus on those core parameters, and any fallback that might be derived from defacto standards (such as X-Forwarded-* and X-Real-* headers) or vendor extensions used in popular cloud services.
the proxy-addr continues to focus on Proxy IP functionality, leveraging info from this module.

hence the separation and added dependency, mostly due to the extension parsing allowed by spec, as I'm a fan of separation of concerns.

ahmadnassri · 2016-03-27T05:35:23Z

Oh, I know I keep making comments but I keep thinking of new things to say :)

all good, same here :)

more though from us (and perhaps the technical committee that now oversees this organization) needs to be done around which changes go into which modules, what new modules should be created, etc.

as I was reviewing the original work (almost a year old now!) I started thinking of better way to break this down too, I'm no longer a fan of having the vendor headers included in the core of this module.

I believe now, it should only concern it self with the RFC 7239 Spec and X-Forwarded-* only (meaning we can merge the work from forwarded-parse directly here.)

and the vendor specs can be additional, separate modules, that can be passed to this one for additional processing (e.g. building the stack of IPs in a desired order across multiple specs)

not sure what a good pattern for the latter though, or how it would propagate across higher dependencies for user-level control? (e.g. proxy-addr > connect > framework)

a simple pattern can perhaps just be an array of module names, or directly references functions (that in turn can be loaded from modules) that provide a common object { for, by, host, proto } that can then be used to build a stacked result.

ahmadnassri · 2016-03-27T16:16:33Z

now that I'm back at a desktop keyboard, here's an example of how my previous thoughts might shape out:

minimal / default

const forwarded = require('forwarded') // included dependencies: RFC2739, X-Forwarded-*
let result = forwarded(request)

verbose

const forwarded = require('forwarded') 
const rfc2739 = require('forwarded-rfc2739')
const xforwarded = require('forwarded-x-forwarded')
const cloudflare = require('forwarded-cloudflare')
const fastly = require('forwarded-fastly')

// custom order
let options = {
  schemas: [cloudflare, rfc7239, fastly, xforwarded]
}

let result = forwarded(request, options)

dougwilson · 2016-03-27T19:39:43Z

I think proxy-addr should be receiving those options, and this module is the one that does the parsing for one of those "schemas": Forwarded.

ahmadnassri · 2016-03-27T19:51:53Z

would proxy-addr also deal with for, by, host, proto properties and data beyond just the "proxy address"?

dougwilson · 2016-03-27T20:11:23Z

There is no reason why it cannot do that, since those are all basically part of the "proxied address". Ideally one should at least get enough information to construct the URL for the request, and just the hostname/IP is not enough information, because it could be on a non-default port, could be using HTTP or HTTPS, and more.

Typically if people just wanted to parse their headers, they would directly depend on the module that just does the header parsing they need. The proxy-addr module was created to glue the pure parsing modules together with trust chains and expose some kind of interface that modules like Express or koa could consume for what they need.

I think the long-term plans is that proxy-addr will be the module that will get all the address information: host, scheme/protocol, port, etc.

It used to also parse things like X-Forwarded-For, but that was doing too much, so it was split out into this module, with the desire to get parsing of the real Forwarded header instead, though that header doesn't seem to have taken off very much (yet), so there hasn't been any requests to Express/Koa to even get alternative headers as part of the core parsing, though it would be good to have.

Because we keep going back and forth on this discussion, I still have not even had time to lok at your new code or even refresh myself on the existing content of this pull request, sorry!

ahmadnassri · 2016-03-27T20:23:49Z

Because we keep going back and forth on this discussion, I still have not even had time to lok at your new code or even refresh myself on the existing content of this pull request, sorry!

no worries, I think your point about proxy-addr is along the same lines of what I'm suggesting

I think we're debating semantics of where the logic sits.

The logic used here can be abstracted to proxy-addr and multiple modules / packages representing different "schems" can be created, then proxy-addr glues them together as you said.

TL;DR

I think we're saying the same thing. :)

I'll let you review the current code first, then we can discuss further.

https://greenkeeper.io/

Update mocha to version 2.5.1 🚀

https://greenkeeper.io/

Update mocha to version 2.5.3 🚀

https://greenkeeper.io/

Update istanbul to version 0.4.3 🚀

https://greenkeeper.io/

Update mocha to version 3.0.1 🚀

BREAKING CHANGE: This module no longer supports Node.js 0.10

👻😱 Node.js 0.10 is unmaintained 😱👻

https://greenkeeper.io/

Update dependencies to enable Greenkeeper 🌴

https://greenkeeper.io/

Update mocha to the latest version 🚀

ahmadnassri mentioned this pull request May 30, 2015

better detection of forwarded / proxied HTTP protocol expressjs/express#2666

Closed

Ahmad Nassri and others added 5 commits May 30, 2015 03:44

resolving node <=0.8 issue ("debug" target not found)

7416d5a

add node 0.12 to travis matrix

0b8bbc3

use node 0.6 safe versions of mocha + istanbul

e9badf7

remove dependency on util._extend (node 0.6 support)

5f36bf9

fix gh markdown rendering bug

3e90938

dougwilson reviewed May 30, 2015
View reviewed changes

dougwilson self-assigned this May 30, 2015

dougwilson added enhancement pr labels May 30, 2015

dougwilson reviewed May 30, 2015
View reviewed changes

Ahmad Nassri added 6 commits May 30, 2015 18:17

restore copyright notice

0962a47

throw error on invalid schema name

560dfa6

lowecase schema names + test

34ad6cc

debug should be under dependencies, rather than devDependencies

4e7e8f4

On a disconnected socket, req.connection.remotePort is undefined

2d937df

fix: address ordering in rfc2739 processor, fix: ensure processed 'fo…

1520eae

…r' property is an array

Ahmad Nassri added 3 commits March 26, 2016 19:01

feat(defaults): default schemas should be rfc7239 and x-forwarded

9e54f7a

feat(schemas): add weblogic proxy support

7aa4ffe

docs(readme): update docs with details on each schema

8cf1117

greenkeeperio-bot and others added 14 commits May 23, 2016 02:19

chore(package): update mocha to version 2.5.1

0dc53a6

https://greenkeeper.io/

Merge pull request #3 from ahmadnassri/greenkeeper-mocha-2.5.1

482dcec

Update mocha to version 2.5.1 🚀

chore(package): update mocha to version 2.5.3

104f0a5

https://greenkeeper.io/

Merge pull request #5 from ahmadnassri/greenkeeper-mocha-2.5.3

18e0181

Update mocha to version 2.5.3 🚀

chore(package): update istanbul to version 0.4.3

9712f4d

https://greenkeeper.io/

Merge pull request #1 from ahmadnassri/greenkeeper-istanbul-0.4.3

462d7e5

Update istanbul to version 0.4.3 🚀

chore(package): update mocha to version 3.0.1

4b97e71

https://greenkeeper.io/

Merge pull request #8 from ahmadnassri/greenkeeper-mocha-3.0.1

cb066f8

Update mocha to version 3.0.1 🚀

chore: drop support for Node.js 0.10

8980fbf

BREAKING CHANGE: This module no longer supports Node.js 0.10

Merge pull request #15 from ahmadnassri/greenkeeper/remove-node-0.10

cad75c3

👻😱 Node.js 0.10 is unmaintained 😱👻

chore(package): update dependencies

bbe1a0d

https://greenkeeper.io/

Merge pull request #16 from ahmadnassri/greenkeeper/update-all

43b9fae

Update dependencies to enable Greenkeeper 🌴

chore(package): update mocha to version 3.2.0

7f03ddf

https://greenkeeper.io/

Merge pull request #17 from ahmadnassri/greenkeeper/mocha-3.2.0

f7e83c8

Update mocha to the latest version 🚀

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v1.0.0-rc.1 #1

v1.0.0-rc.1 #1

ahmadnassri commented May 30, 2015

dougwilson May 30, 2015

ahmadnassri May 30, 2015

dougwilson May 30, 2015

dougwilson May 30, 2015

dougwilson commented May 30, 2015

dougwilson commented May 30, 2015

dougwilson May 30, 2015

ahmadnassri commented May 30, 2015

dougwilson commented May 30, 2015

dougwilson May 30, 2015

ahmadnassri May 30, 2015

dougwilson May 30, 2015

dougwilson May 30, 2015

ahmadnassri May 31, 2015

ahmadnassri commented Mar 26, 2016

ahmadnassri commented Mar 26, 2016

dougwilson commented Mar 26, 2016

dougwilson commented Mar 26, 2016

dougwilson commented Mar 27, 2016

dougwilson commented Mar 27, 2016

ahmadnassri commented Mar 27, 2016

ahmadnassri commented Mar 27, 2016

ahmadnassri commented Mar 27, 2016

dougwilson commented Mar 27, 2016

ahmadnassri commented Mar 27, 2016

dougwilson commented Mar 27, 2016

ahmadnassri commented Mar 27, 2016

v1.0.0-rc.1 #1

Are you sure you want to change the base?

v1.0.0-rc.1 #1

Conversation

ahmadnassri commented May 30, 2015

TODO

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dougwilson commented May 30, 2015

dougwilson commented May 30, 2015

Choose a reason for hiding this comment

ahmadnassri commented May 30, 2015

dougwilson commented May 30, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

verification vs. validation

ahmadnassri commented Mar 26, 2016

ahmadnassri commented Mar 26, 2016

dougwilson commented Mar 26, 2016

dougwilson commented Mar 26, 2016

dougwilson commented Mar 27, 2016

dougwilson commented Mar 27, 2016

ahmadnassri commented Mar 27, 2016

ahmadnassri commented Mar 27, 2016

ahmadnassri commented Mar 27, 2016

minimal / default

verbose

dougwilson commented Mar 27, 2016

ahmadnassri commented Mar 27, 2016

dougwilson commented Mar 27, 2016

ahmadnassri commented Mar 27, 2016

TL;DR