Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support HTTP_PROXY and NO_PROXY env var #1650

Closed
shttrstckak opened this issue Sep 14, 2022 · 35 comments · Fixed by #2994
Closed

Support HTTP_PROXY and NO_PROXY env var #1650

shttrstckak opened this issue Sep 14, 2022 · 35 comments · Fixed by #2994
Labels
enhancement New feature or request Status: help-wanted This issue/pr is open for contributions

Comments

@shttrstckak
Copy link

shttrstckak commented Sep 14, 2022

A somewhat standard env var used in proxying applications is the HTTP_PROXY and NO_PROXY variables.

Undici should support automatically setting a global dispatcher if HTTP_PROXY env var is used. Currently Undici supports using setGlobalDispatcher() method to set a specific proxy endpoint.

However there is a sister environment variable to HTTP_PROXY that supports contextually non-proxying domains: NO_PROXY

NO_PROXY requires a list of domain names to be NOT proxied and should make a request normally.

If any of the above cannot be supported, then Undici (and ultimately Node's Fetch-Undici) should allow per-request proxy settings or function hook to allow anyone to develop the same effect as HTTP_PROXY and NO_PROXY env var in their application that uses Undici/Node Fetch.

Thoughts?

@shttrstckak shttrstckak added the enhancement New feature or request label Sep 14, 2022
@mcollina
Copy link
Member

I don't know if we should support HTTP_PROXY. How is widespread the use of HTTP_PROXY in the ecosystem.

@shttrstckak
Copy link
Author

shttrstckak commented Sep 14, 2022

The "ecosystem" as in the JS ecosystem or the larger software ecosystem?

For the latter, HTTP_PROXY is used everywhere.

For JS it's a quite heavily used I imagine for the same reasons. Popular frameworks tend to implement it.

Node.http module already supports it out of the box.

Got, Axios, Node-fetch (npm package) require a module global-agent to utilize. Global-agent does not appear to have any plans to support Undici.

Popular JS based testing framework Cypress supports it.

On the other side, browsers tend to support some sort of proxy configuration.

Overall I think if there is no interest in supporting HTTP_PROXY in Undici, then it would be better to support to being able to implement these env vars. HTTP_PROXY can already be supported easily however NO_PROXY doesn't seem currently possible.

@silverwind
Copy link
Contributor

silverwind commented Sep 15, 2022

Use of these environment variables is in very widespread, see number of upvotes on nodejs/node#8381. Many high-security environments restrict outbound internet access via a allowlist of domains configured on the proxy.

I do maintain a fetch wrapper specifically for the purpose of automatic proxy discovery from environment. The crucial part is that if such proxy discovery is performed, there must be an opt-out option because sometimes one wants to fetch without proxy, irregardless of environment variables.

Parsing of these env vars is tricky, I recommend https://github.com/Rob--W/proxy-from-env for it.

@mcollina
Copy link
Member

Then let's do it.

@ronag ronag added the Status: help-wanted This issue/pr is open for contributions label Sep 16, 2022
@shttrstckak
Copy link
Author

shttrstckak commented Oct 13, 2022

I offer myself as sacrifice and will probably get started this weekend.

Does anyone have any good starters on developing with this repository? For example, What would be a good place in code to start investigating for this feature?

@mcollina
Copy link
Member

mcollina commented Oct 14, 2022

The check for automatically supporting them should likely be in https://github.com/nodejs/undici/blob/main/lib/global.js.

@silverwind
Copy link
Contributor

silverwind commented Oct 14, 2022

We probably should cleanly re-implement https://github.com/Rob--W/proxy-from-env because that module includes support for npm_config_* variables which I think has no place in module that is supposed to only support the standard environment variables. See discussion in https://github.com/Rob--W/proxy-from-env/issues/13.

@silverwind
Copy link
Contributor

silverwind commented Oct 17, 2022

Also read https://about.gitlab.com/blog/2021/01/27/we-need-to-talk-no-proxy/. It gives an overview of the different implementations and the last two paragraphs give good advice on how to implement.

I plan to create a module that exactly implements such a parser.

@sla100
Copy link

sla100 commented Dec 7, 2022

There is my PoC:

import {URL} from 'node:url';
import {getGlobalDispatcher, setGlobalDispatcher, Dispatcher, ProxyAgent} from 'undici';

const proxyAgents = Object.fromEntries(
  ['http', 'https'].map( protocol => {
    const uri = process.env[ `${protocol}_proxy` ];
    if ( uri ){
      return [`${protocol}:`, new ProxyAgent(uri)];
    }
    return [];
  })
);

const noProxyRules = (process.env['no_proxy'] ?? '').split(',').map( rule => rule.trim() );

const defaultDispatcher = getGlobalDispatcher();

setGlobalDispatcher(new class extends Dispatcher {

  dispatch(options, handler) {
    if (options.origin ){
      const {host, protocol} = typeof options.origin === 'string' ? new URL(options.origin) : options.origin;
      if (! noProxyRules.some( rule => rule.startsWith('.') ? host.endsWith(rule) : host === rule )){
        const proxyAgent = proxyAgents[ protocol ];
        if ( proxyAgent ) {
          proxyAgent.dispatch( options, handler );
        }
      }
    }
    return defaultDispatcher.dispatch(options, handler);
  }

});

I think that this feature should be treated as plugin and loaded by node "preload" argument:

node -r undici/register-http-proxy ./script.js

@silverwind
Copy link
Contributor

silverwind commented Dec 7, 2022

Seems like a start, but it's definitely missing some features like wildcard (*) support in no_proxy.

I think that this feature should be treated as plugin and loaded by node "preload"

I disagree. Numerous environments support these variables by default, like Deno, Python, Golang and many more. It should of course be considered a breaking change, but I don't see why it should not be the default. Some kind of opt-out would be good to have, but I guess users could always overwrite the env with no_proxy=* I guess.

Maybe the implementation here could be "inspired" by Deno's 😉.

@jaecktec
Copy link

jaecktec commented Dec 12, 2022

@sla100 my 2 cents on this line:

 return [`${protocol}:`, new ProxyAgent(uri)];
 ...
 const proxyAgent = proxyAgents[ protocol ];

you can also do http over https, that's why most frameworks specify the https option as optional. Meaning that if you haven't specified https_proxy it would do HTTP over the http-proxy :)

that's why I propose smth like this:

const getProxyAgent = (proto: 'http' | 'https') => {
  let agent: ProxyAgent | undefined;
  if (proto === 'http') {
    agent = process.env['https_proxy'] ? new ProxyAgent(process.env['https_proxy']) : undefined;
  }
  if (!agent) {
    agent = process.env['http_proxy'] ? new ProxyAgent(process.env['http_proxy']) : undefined;
  }
  return agent;
};

const noProxyRules = (process.env['no_proxy'] ?? '').split(',').map(rule => rule.trim());

const defaultDispatcher = getGlobalDispatcher();

setGlobalDispatcher(new class extends Dispatcher {

  dispatch(options, handler) {
    if (options.origin) {
      const {
        host,
        protocol,
      } = typeof options.origin === 'string' ? new URL(options.origin) : options.origin;
      if (!noProxyRules.some(rule => rule.startsWith('.') ? host.endsWith(rule) : host === rule)) {
        const proxyAgent = getProxyAgent(protocol);
        if (proxyAgent) {
          proxyAgent.dispatch(options, handler);
        }
      }
    }
    return defaultDispatcher.dispatch(options, handler);
  }
});

@mcollina
Copy link
Member

I'd be ok to support:

import { setGlobalDispatcher, EnvHttpProxyAgent } from 'undici'

setGlobalDispatcher(new EnvHttpProxyAgent())

@mcollina
Copy link
Member

Here is an additional data point in favor of this: Deno supports it by default.

https://deno.land/manual@v1.29.4/getting_started/setup_your_environment#environment-variables

We should check if Python, Ruby and other runtimes supports this by default.

@silverwind
Copy link
Contributor

silverwind commented Jan 18, 2023

As per nodejs/node#8381, the following environments support these variables by default:

  • Python
  • Go
  • Ruby
  • R

curl and wget also use them by default and I'm sure there are many more examples of applications supporting them.

@mcollina
Copy link
Member

@silverwind would you like to send a PR to make this the default here? I would just recommend we create a new Agent and set it as default.

@silverwind
Copy link
Contributor

Possibly, but first I want to create the parser for the env vars. I have a WIP repo for that, will update here once it's in a usable state. I guess we could then incorporate this parser code here, e.g. without a dependency.

@silverwind
Copy link
Contributor

Just adding another data point, bun also now supports these variables out of the box.

@mcollina
Copy link
Member

I'll assign this to you, ok?

@silverwind
Copy link
Contributor

Okay

@silverwind
Copy link
Contributor

silverwind commented Jan 20, 2023

Is it okay to cache ProxyAgents for each environment variable in-memory? We shouldn't instantiate new agents every request, and I was thinking of doing a LRU cache that holds the ProxyAgent instances, keyed on the proxy server URL.

This above approach has the downside that the cache will become stale if the environment variables change during process runtime, so a completely robust solution could likely not cache the agents at all as long as the env vars are mutable.

In https://github.com/silverwind/fetch-enhanced, I had exposed a method to clear the cache, but that seems like a it too much API surface.

@silverwind
Copy link
Contributor

silverwind commented Jan 20, 2023

Actually, thinking again, if we have a agent cache like

{
  "https://proxy1:3128": ProxyAgent,
  "https://proxy2:3128": ProxyAgent,
}

It should not be neccessary to clear this cache ever because even with changing environment variables, the agents will still be valid. The only issue is that when env vars change, unused ProxyAgent instances will not be cleaned by the garbage collector as there are still references to it, but I take that as acceptable.

@mcollina
Copy link
Member

That's acceptable.

@silverwind
Copy link
Contributor

Yeah, in a general use case, there would only be two cache entries, one for HTTP and one for HTTPS. If a user wants more dynamic behaviour, they can just deal with ProxyAgent themselves.

JamieMagee added a commit to renovatebot/osv-offline that referenced this issue Mar 27, 2023
In v6.1.0 Octokit started using Node.js's built-in `fetch`[^1] (provided by [undici][1]. Unfortunately, it is not 100% compatible with `node-fetch`, and notably it doesn't support `HTTP_PROXY` environment variables[^2].

This change switches `osv-offline` to explicitly use `node-fetch`.

Closes #252

[1]: https://github.com/nodejs/undici

[^1]: octokit/request.js@d000a0a
[^2]: nodejs/undici#1650
JamieMagee added a commit to renovatebot/osv-offline that referenced this issue Mar 27, 2023
In v6.1.0 Octokit started using Node.js's built-in `fetch`[^1] (provided by [undici][1]. Unfortunately, it is not 100% compatible with `node-fetch`, and notably it doesn't support `HTTP_PROXY` environment variables[^2].

This change switches `osv-offline` to explicitly use `node-fetch`.

Closes #252

[1]: https://github.com/nodejs/undici

[^1]: octokit/request.js@d000a0a
[^2]: nodejs/undici#1650
@aarlaud
Copy link

aarlaud commented Sep 12, 2023

Hi there, what's the latest on this capability? Is there a branch somewhere with this work in progress?

thanks for all your work!

@ShenHongFei
Copy link
Contributor

ShenHongFei commented Sep 12, 2023

@aarlaud ProxyAgent can be configured for each request like this
The disadvantage is that you need to install and introduce the undici package. This package is large and slow to load, taking 100ms (first time).

const { ProxyAgent } = await import ('undici')

let proxy_agents: Record<string, ProxyAgent> = { }

let proxy = 'http://localhost:10080'

let options: RequestInit = {
    dispatcher: (() => {
        if (proxy)
            return proxy_agents[proxy] ??= new ProxyAgent({ uri: proxy })
    })(),

@aarlaud
Copy link

aarlaud commented Sep 12, 2023

thanks. In my current case, I'm more interested in a global, which I suppose is more this way, like @zicklag mention above roughly (https://undici.nodejs.org/#/docs/api/ProxyAgent?id=example-basic-proxy-request-with-global-agent-dispatcher)

I was hoping to avoid dealing with HTTP_PROXY vs HTTPS_PROXY, auth, and NO_PROXY, maybe helping to finish something around supporting these, but i'm not clear whether there is something in flight at this point?

@metcoder95
Copy link
Member

I think there's nothing in-progress already. But any contribution is welcome. You can open a fresh PR and kick-off the discussion from there 🙂

@zicklag
Copy link

zicklag commented Jan 13, 2024

Here's another hackish workaround that will do just enough honor to the no_proxy environment variable for my latest use-case: https://gist.github.com/zicklag/1bb50db6c5138de347c224fda14286da. It's not properly parsing the variable, but it does make sure that all requests except those that have an origin with one of the no_proxy entries as a substring, will go through the proxy globally.

mcollina pushed a commit that referenced this issue Apr 19, 2024
* feat: added EnvHttpProxyAgent

Closes #1650

* refactor(env-http-proxy-agent): parse NO_PROXY in constructor

* don't use EnvHttpProxyAgent by default

* refactor: use for loop when checking NO_PROXY entries

* feat(env-http-proxy-agent): added httpProxy, httpsProxy & noProxy options

* feat(env-http-proxy-agent): handle changes to NO_PROXY

* docs: added types for EnvHttpProxyAgent

* test: resolve windows issues, mark experimental, update doc

* docs: fix typo

* docs: fetch for EnvHttpProxyAgent
@decyjphr
Copy link

I solved this by using https://www.npmjs.com/package/proxy-from-env.

Using this, I can custom build a fetch for example:

import { ProxyAgent, fetch as undiciFetch } from "undici"
import { getProxyForUrl } from "proxy-from-env";

...

function getProxiedFetch(url: string) {
  const proxyUrl = getProxyForUrl(url);
  if (proxyUrl) {
    LOG.debug("Setting up proxy agent for ", url);
    return (url: any, options: any) => {
      return undiciFetch(url, {
        ...options,
        dispatcher: new ProxyAgent(proxyUrl)
      })
    }
  }
  return null; //Go with the default fetch

}

@zicklag
Copy link

zicklag commented Aug 31, 2024

Is this fixed in some version of undici? If so, what would I have to install to make use of it.

@metcoder95
Copy link
Member

Since https://github.com/nodejs/undici/releases/tag/v6.14.1 you can use EnvHttpProxyAgent.

It seems is missed from docs tho...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Status: help-wanted This issue/pr is open for contributions
Projects
None yet
Development

Successfully merging a pull request may close this issue.