wip: simplify chain #156

Merged
merged 6 commits into from
Sep 14, 2021

szmarczak
Contributor

@szmarczak szmarczak commented Sep 13, 2021

this PR

  • simplify chain
    • remove handler base

wip: simplify forward

  • remove babel
  • simplify forward handler
  • added https support
    • test
  • require node.js 14
  • skipped 2 tests (will re-enable once proxy-chain is fully rewritten)

wip: simplify custom response

  • simplify custom response
  • use shared eslint config

wip: simplify direct

  • simplify direct

TODO (those will be separate PRs)

  • simplify server
  • simplify tools
  • use typescript
  • turn the 5 parameters of direct and chain into an object
  • more descriptive errors
  • enable coverage

}

this.log(handlerOpts.id, 'Using HandlerTunnelDirect');
return direct(request, socket, head);
return direct(request, socket, head, handlerOpts, this);
Member

looks like there are just the 3 parameters on direct?

Contributor Author

Fixed.

Member

now it has 4 params, and you are feeding 5 (and server is 5th, not 4th), right?

Contributor Author

Ah, this one :) Now it's 5. handlerOpts won't be used, but I prefer to keep it so the signature looks the same as chain's.

@szmarczak
Contributor Author

szmarczak commented Sep 13, 2021

https://github.com/apify/proxy-chain/runs/3586133160

  1) tcp_tunnel.createTunnel
       creates tunnel that is able to transfer data:
     Uncaught Error: listen EADDRINUSE: address already in use :::38102
      at Server.setupListenHandle [as _listen2] (node:net:1319:16)
      at listenInCluster (node:net:1367:12)
      at Server.listen (node:net:1454:7)
      at /home/runner/work/proxy-chain/proxy-chain/test/tcp_tunnel.js:19:12
      at new Promise (<anonymous>)
      at serverListen (test/tcp_tunnel.js:18:40)
      at /home/runner/work/proxy-chain/proxy-chain/test/tcp_tunnel.js:154:24
      at tryCatcher (node_modules/bluebird/js/main/util.js:26:23)
      at Promise._settlePromiseFromHandler (node_modules/bluebird/js/main/promise.js:510:31)
      at Promise._settlePromiseAt (node_modules/bluebird/js/main/promise.js:584:18)
      at Async._drainQueue (node_modules/bluebird/js/main/async.js:128:12)
      at Async._drainQueues (node_modules/bluebird/js/main/async.js:133:10)
      at Immediate.Async.drainQueues (node_modules/bluebird/js/main/async.js:15:14)
      at processImmediate (node:internal/timers:464:21)

@mnmkng I just got the error @rubydev saw before, looks like Node.js selected a port that was already in use?

@szmarczak szmarczak requested a review from B4nan September 13, 2021 12:33
Member

@B4nan B4nan left a comment

idea for future PRs - maybe we should turn the 5 parameters into an object
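
A rough sketch of what that could look like. The property names follow the five arguments passed in the diff above (request, socket, head, handlerOpts, and the server as the fifth); the body and the return value are placeholders I made up for illustration, not the real tunnel logic.

```javascript
// Hypothetical signature: one destructured options object instead of five
// positional parameters. The body is a stub, not the real tunnel logic.
const direct = ({ request, socket, head, handlerOpts, server }) => {
    server.log(handlerOpts.id, `Using HandlerTunnelDirect for ${request.url}`);
    // A real implementation would connect to the target and pipe `socket` here.
    return { target: request.url, bufferedBytes: head.length };
};
```

With named properties the call site no longer depends on argument order, so the "is server 4th or 5th?" question goes away, and an unused property such as handlerOpts can simply be omitted by callers that do not need it.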

@mnmkng
Member

mnmkng commented Sep 13, 2021

@mnmkng I just got the error @rubydev saw before, looks like Node.js selected a port that was already in use?

@szmarczak That's weird. Could there be a race condition somewhere?

@jancurn
Member

jancurn commented Sep 13, 2021

That's weird. Could there be a race condition somewhere?

In the past we used to pick randomized ports, which was prone to a race condition. Now it should be done by the operating system, hopefully without races... so this is strange, unless this is an error from an older version?

@jancurn
Member

jancurn commented Sep 13, 2021

Looking at all these PRs, I'm increasingly worried that this big rewrite can potentially bring a lot of problems to this module that we rely on so heavily. There were a lot of iterations and a lot of real-world testing that went into this module to make sure it was stable and performing well (in particular the socket lifecycle), and now I feel we might be back to square one and might have to go through all the (hard-to-find) errors again, potentially with a big business impact (think of all the potential problems a buggy Apify Proxy can cause to our customers).

I'd better stop watching these PRs and I truly hope you guys know what you're doing. Please make sure to write really good comments for the functions, so that other people can understand what's going on.

};

if (proxy.username || proxy.password) {
const auth = `${proxy.username}:${proxy.password}`;
Member

What if either username or password is null? Then the auth string will contain "null".

Contributor Author

In WHATWG URL it's never null, it's an empty string.

Contributor Author

With TypeScript we won't have to worry "can this be null?" :)
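
To illustrate the WHATWG URL behavior mentioned in this thread: a URL without credentials reports username and password as empty strings, so the template literal can never produce "null". The helper name getBasicAuthorization and the example host are made up for this sketch; decoding the percent-encoded credentials before Base64-encoding them is my assumption, not necessarily what the PR does.

```javascript
// WHATWG URL exposes missing credentials as empty strings, never null.
const withoutAuth = new URL('http://proxy.example.com:8000');
const withAuth = new URL('http://user:p%40ss@proxy.example.com:8000');

// Hypothetical helper: builds a Basic authorization value, or returns
// undefined when the proxy URL carries no credentials at all.
function getBasicAuthorization(proxy) {
    if (!proxy.username && !proxy.password) return undefined;
    // URL keeps credentials percent-encoded, so decode them first (assumption).
    const auth = `${decodeURIComponent(proxy.username)}:${decodeURIComponent(proxy.password)}`;
    return `Basic ${Buffer.from(auth).toString('base64')}`;
}
```

Here withoutAuth.username and withoutAuth.password are both '', which is why the `proxy.username || proxy.password` check in the diff is enough to skip the header.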

@szmarczak
Contributor Author

szmarczak commented Sep 13, 2021

Could there be a race condition somewhere?

The tests still use

const findFreePort = () => {
    // Let 'min' be a random value in the first half of the PORT_FROM-PORT_TO range,
    // to reduce a chance of collision if other ProxyChain is started at the same time.
    const half = Math.floor((PORT_SELECTION_CONFIG.TO - PORT_SELECTION_CONFIG.FROM) / 2);
    const opts = {
        min: PORT_SELECTION_CONFIG.FROM + Math.floor(Math.random() * half),
        max: PORT_SELECTION_CONFIG.TO,
        retrieve: 1,
    };
    return portastic.find(opts)
        .then((ports) => {
            if (ports.length < 1) throw new Error(`There are no more free ports in range from ${PORT_SELECTION_CONFIG.FROM} to ${PORT_SELECTION_CONFIG.TO}`); // eslint-disable-line max-len
            return ports[0];
        });
};

so I guess it generated an in-use port. I'm starting to think @rubydev's error is unrelated to this one.
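
For comparison, a minimal sketch of letting the operating system choose the port, as @jancurn described. Listening on port 0 makes the OS assign a free port at bind time, so there is no gap between "find a free port" and "listen on it". The helper name listenOnFreePort is hypothetical and not part of the test suite.

```javascript
const net = require('net');

// Hypothetical helper: bind to port 0 so the OS assigns a free port, then
// resolve with the port that was actually assigned.
function listenOnFreePort(server) {
    return new Promise((resolve, reject) => {
        server.once('error', reject);
        server.listen(0, () => {
            server.off('error', reject);
            // The port stays reserved for as long as this server is listening.
            resolve(server.address().port);
        });
    });
}
```

The tests would then read the port from the already-listening server instead of picking one first and binding later.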

now I feel we might be back to square one and might have to go through all the (hard-to-find) errors again

Currently all the tests pass (except the two with stats, which will be fixed later). The point of the rewrite is that we don't mutate the socket unless we need to, and only operate on the objects we really have to. So if we receive a normal HTTP/1.1 request, let's operate only on the request and response objects, with no need to tinker with the socket. This way we'll prevent event emitter memory leaks and the code will be more readable.

There were a lot of iterations and a lot of real-world testing that went into this module to make sure it was stable and performing well

We can gradually roll out the rewrite. Initially e.g. 1% of all requests can go through the rewrite, the next day it can be 5%, the next day it can be 10% etc.
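
Roughly what I have in mind, as a sketch only. The function name pickHandler and the 'rewrite' / 'legacy' labels are placeholders; how the percentage is configured and where the switch lives would be up to the Platform team.

```javascript
// Hypothetical sketch: send roughly `rolloutPercent` % of requests through
// the rewritten code path and the rest through the existing one.
// `random` is injectable so the decision can be tested deterministically.
function pickHandler(rolloutPercent, random = Math.random) {
    return random() * 100 < rolloutPercent ? 'rewrite' : 'legacy';
}
```

Because Math.random() returns a value in [0, 1), a setting of 0 never selects the rewrite and a setting of 100 always does.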

in particular the socket lifecycle

What about it? As I see it, it remains the same. The socket gets closed only when a user does so or there's an error on any side. Am I missing something? If you have any questions, I'm happy to answer them.

I'd better stop watching these PRs and I truly hope you guys know what you're doing.

I'd prefer if you reviewed the code, you are the creator, but I'm not insisting. This can wait, but needs to be done at some point. I'm trying to keep the code as simple as possible.

@szmarczak
Contributor Author

I've added more descriptive errors to my todo list. So if an error gets thrown, we'll know all the details: upstream proxy, status code, request options etc.

@jancurn
Member

jancurn commented Sep 13, 2021

Thanks @szmarczak it was just a bit of an old man's afraidness of change that was speaking :) Of course, I'll have a look at the PRs, I'll do a deeper review of the big PR once it's ready. I'm confident your changes will make the module better and cleaner. Just pls comment the functions well, to make it easier to review.

Regarding the socket life-cycle, there were some rare strange races that "should never happen" but indeed happened, which led to code blocks like:

// See https://github.com/apify/proxy-chain/issues/63
if (this.isClosed) return;

These come from battle-testing, so it might happen they will reoccur. But let's see...

The gradual roll-out is a good idea, once this is ready pls chat with the Platform team how to do it.

@szmarczak
Contributor Author

it was just a bit of an old man's afraidness of change that was speaking :)

I understand the worry :)

I'll have a look at the PRs, I'll do a deeper review of the big PR once it's ready.

Thanks a lot ❤️ Two heads are always better than one :D

Just pls comment the functions well, to make it easier to review.

👍🏼

which led to code blocks like

This is also what I'm trying to prevent. HandlerBase created a response on top of a detached socket (from CONNECT method) from the native http module:

if (!this.srcResponse) {
    this.srcResponse = new http.ServerResponse(srcRequest);
    this.srcResponse.shouldKeepAlive = false;
    this.srcResponse.chunkedEncoding = false;
    this.srcResponse.useChunkedEncodingByDefault = false;
    this.srcResponse.assignSocket(this.srcSocket);
}

therefore there were more critical points that needed to be handled.

@szmarczak
Contributor Author

szmarczak commented Sep 13, 2021

// See https://github.com/apify/proxy-chain/issues/63
if (this.isClosed) return;

There was a race condition: before the target socket gets connected, the client's socket to the proxy may already have been destroyed (e.g. via disconnect). If I understand this right, the this.isClosed checks guard against this.
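
A stripped-down sketch of that guard, as I understand it. The class and method names (TunnelHandler, onTargetConnect) are invented for illustration and do not match the actual proxy-chain classes; only the isClosed flag comes from the snippet above.

```javascript
// Hypothetical handler showing the isClosed guard in isolation.
class TunnelHandler {
    constructor(sourceSocket) {
        this.isClosed = false;
        this.piped = false;
        this.sourceSocket = sourceSocket;
        // The client may disconnect while the target connection is still pending.
        sourceSocket.once('close', () => this.close());
    }

    close() {
        this.isClosed = true;
    }

    // Called once the connection to the target has been established.
    onTargetConnect(targetSocket) {
        // The client is already gone, so drop the fresh target connection.
        if (this.isClosed) {
            targetSocket.destroy();
            return;
        }
        this.piped = true; // a real handler would pipe the two sockets here
    }
}
```

Without the check, the connect callback would try to pipe data into a socket that has already been destroyed.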


I think this PR is good to go. Ready to merge when you are :)

@szmarczak szmarczak merged commit fb2c412 into next Sep 14, 2021
@szmarczak szmarczak deleted the simplify-chain branch September 14, 2021 12:18
@szmarczak szmarczak mentioned this pull request Sep 27, 2021
30 tasks
@szmarczak szmarczak mentioned this pull request Oct 6, 2021
29 tasks
4 participants