Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Replace client side HTTP requests using the hostname that is in the address bar #2004

Closed
wants to merge 1 commit into from

Conversation

l0rd
Copy link
Contributor

@l0rd l0rd commented Aug 1, 2016

What does this PR do?

Che used the wsagent internal IP address to send HTTP requests from the browser to the wsagent itself. This can be a problem when the browser cannot ping this IP address. In this PR we have fixed the HTTP requests to the wsagent replacing this internal IP address with the hostname from the browser address bar.

What issues does this PR fix or reference?

This PR fixes #1644 and #1925

Previous Behavior

HTTP requests from the browser to the wsagent used the wsagent internal IP address

New Behavior

HTTP requests from the browser to the wsagent use the hostname in the address bar

Tests written?

Yes

Docs requirements?

Users don't need to set some options anymore:

  • the env variable CHE_HOST_IP in the launcher is useless
  • the option --remote: in che.sh is useless

Signed-off-by: Mario Loriedo mloriedo@redhat.com

@codenvy-ci
Copy link

Can one of the admins verify this patch?

@garagatyi
Copy link

Hi Mario! I'm concerned on how will it work if I run che on server1.mycompany.com and run docker on server2.mycompany.com.
Can you elaborate?

@benoitf
Copy link
Contributor

benoitf commented Aug 1, 2016

the fixHostName shouldn't be duplicated 4 times in different .js files, you should introduce an angular service instance that should handle this at a single place

example of a service: https://github.com/eclipse/che/blob/master/dashboard/src/components/injector/che-ui-elements-injector.service.js

as there is REGEXP I would like to see a unit test in the javascript as well
example of unit test with different urls
https://github.com/eclipse/che/blob/master/dashboard/src/components/validator/git-url-validator.spec.js

also do we need a regexp to replace a location ?

@l0rd
Copy link
Contributor Author

l0rd commented Aug 1, 2016

@garagatyi To support this kind of scenario (che server and wsagent on different hosts) you could use a reverse proxy as traefik. This has a couple of benefits:

  • flexible multihost configuration: don't need to know the IP of the wsagent when starting the server, don't need to set the --net host option when running the server and it would be possible to have running wsagents on different hosts
  • works well in scenarios where the browser just can't ping the wsagent IP (Docker for Mac and Windows, running inside VMs, firewall etc..)

@benoitf thank you for these feedbacks. I'll do the modifications. I haven't found anything better than a regexp to replace the hostname of an URL. Do you have something in mind?

@benoitf
Copy link
Contributor

benoitf commented Aug 1, 2016

@l0rd https://jsfiddle.net/2tokq7m9/ (I didn't import angular in this fiddle but it's almost the same way)

@garagatyi
Copy link

@l0rd But if you merge this PR we will broke Che for those users who already use Che in a such way without proxy.
Do we want to break Che for some set of existing users to get better experience for some set of other users?
@TylerJewell Please comment on that.

@benoitf
Copy link
Contributor

benoitf commented Aug 1, 2016

I'm not sure it will work right now on codenvy setup (with different machines that provides docker runners (where agents are running))

@l0rd
Copy link
Contributor Author

l0rd commented Aug 1, 2016

Ok I wasn't aware of that scenario. How does it work? How can you share folders between servers and agents if they live in remote machines?

To support it we could introduce a new property (che.workspace.use.internal.IP?). If this property is set to true fixHostName returns the original IP, otherwise it does the replacement.

@garagatyi
Copy link

It is possible in several cases:

  1. Che where docker is remote to the API instance. FS of workspaces is placed where docker is placed, so there is no need share folders between servers.
  2. Che that uses network FS, e.g. GlusterFS
  3. Other implementations/extensions of Che that uses it's own code/infrastructure to share folders, e.g. Codenvy.

To support it we could introduce a new property...

Properties are placed on server-side, so implementation on client side won't help. If you want to implement something like that you should do that on server-side.

@benoitf
Copy link
Contributor

benoitf commented Aug 1, 2016

agents URLs are retrieved by the client by getting workspace runtime configuration so updating the value returned by the workspace master would work on all clients instead of patching each client

@TylerJewell
Copy link

@l0rd @benoitf @garagatyi - I unfortunately am not able to understand the implications well enough to be able to comment if the trade off is acceptable for end users.

Generally, yes, I agree that any sort of details that return hostnames or IP addresses for specific locations of a workspace should be done on the server-side. This is necessary so that we can support multi-IDE scenarios where different IDEs can interrogate either the workspace or the Che server to find out the necessary configuration.

@l0rd - can you help by providing a mini-spec in the comments here on the various configurations that a user might need to do with Che to support the use case you are solving and those stated by @garagatyi?

@codenvy-ci
Copy link

Can one of the admins verify this patch?

…ddress bar

Signed-off-by: Mario Loriedo <mloriedo@redhat.com>
@l0rd
Copy link
Contributor Author

l0rd commented Aug 12, 2016

@benoitf I've updated this PR with your feedbacks on the UD side. Please review.

@garagatyi I've tested the configuration with wsmaster and wsagent(s) on remote hosts using traefik. It worked but that wasn't as easy as I thought. Should work better once that traefik issue get addressed. I will share the config here and we will discuss how to proceed.

Meanwhile I'll rename this PR as [WIP]

@l0rd l0rd changed the title Replace client side HTTP requests using the hostname that is in the address bar [WIP] Replace client side HTTP requests using the hostname that is in the address bar Aug 12, 2016
@TylerJewell
Copy link

@l0rd - can you post the configuration that you are using for testing? Is the basic setup that che server and workspaces are on a remote VM, and the browser is on a different IP, and this configuration will now work without having to set CHE_HOST_IP or set --remote:ip?

If so - can you also post your traefik configuration? Will this traefik configuration allow for the possibility of routing all browser -> ws communications through a single port, with traefik redirecting traffic back to the right ephemeral port within the ws?

Thanks for working on this - very important - improvement.

@l0rd
Copy link
Contributor Author

l0rd commented Aug 13, 2016

@TylerJewell Here are the details of how I've used Traefik to support the scenario where the Che Server and the Docker host are in different remote servers.

The setup

I've built my branch (ws-url-fix) and I've run Che using the following command bin/che.sh run --remote:<ip> -s:uid.

On MacOS with Docker toolbox, running Che this way starts the server as java web app and the workspaces as containers in the guest OS (boot2docker).

Server and workspaces have different IPs and that's the scenario I needed to test.

image

Testing

  • Run Che server 'bin/run.sh --remote: -s:uid`
  • Open Che UD at http://localhost:8080
  • Create a new workspace and confirm that:
  • Start traefik using the following configuration file (./traefik -c traefik.toml ). Now websockets can access the workspace container and the IDE is fully functional.
# 192.168.99.100 is the VM IP 
# 32775 to 32781 are the ports exposed by the workspace

logLevel = "DEBUG"
defaultEntryPoints = ["http", "https"]

# Enable web console
[web]
address = ":8081"

# Entrypoints
[entryPoints]
   [entryPoints.wsagent1]
   address = ":32775"
   [entryPoints.wsagent2]
   address = ":32776"
   [entryPoints.wsagent3]
   address = ":32777"
   [entryPoints.wsagent4]
   address = ":32778"
   [entryPoints.wsagent5]
   address = ":32779"
   [entryPoints.wsagent6]
   address = ":32780"
   [entryPoints.wsagent7]
   address = ":32781"
   [entryPoints.wsagent8]
   address = ":32782"
   [entryPoints.wsagent9]
   address = ":32783"


[file]
# Frontends
[frontends]
    [frontends.wsagent1]
    backend = "backend1"
    passHostHeader = true
    entrypoints = ["wsagent1"]
        [frontends.wsagent1.routes.defaultroute]
        rule = "Host:localhost,{subdomain:[a-z]+}.localhost"

    [frontends.wsagent2]
    backend = "backend2"
    passHostHeader = true
    entrypoints = ["wsagent2"]
        [frontends.wsagent2.routes.defaultroute]
        rule = "Host:localhost,{subdomain:[a-z]+}.localhost"

    [frontends.wsagent3]
    backend = "backend3"
    passHostHeader = true
    entrypoints = ["wsagent3"]
        [frontends.wsagent3.routes.defaultroute]
        rule = "Host:localhost,{subdomain:[a-z]+}.localhost"

    [frontends.wsagent4]
    backend = "backend4"
    passHostHeader = true
    entrypoints = ["wsagent4"]
        [frontends.wsagent4.routes.defaultroute]
        rule = "Host:localhost,{subdomain:[a-z]+}.localhost"

    [frontends.wsagent5]
    backend = "backend5"
    passHostHeader = true
    entrypoints = ["wsagent5"]
        [frontends.wsagent5.routes.defaultroute]
        rule = "Host:localhost,{subdomain:[a-z]+}.localhost"

    [frontends.wsagent6]
    backend = "backend6"
    passHostHeader = true
    entrypoints = ["wsagent6"]
        [frontends.wsagent6.routes.defaultroute]
        rule = "Host:localhost,{subdomain:[a-z]+}.localhost"

    [frontends.wsagent7]
    backend = "backend7"
    passHostHeader = true
    entrypoints = ["wsagent7"]
        [frontends.wsagent7.routes.defaultroute]
        rule = "Host:localhost,{subdomain:[a-z]+}.localhost"

[backends]
  [backends.backend1]
    [backends.backend1.servers.server1]
    url = "http://192.168.99.100:32775"
  [backends.backend2]
    [backends.backend2.servers.server2]
    url = "http://192.168.99.100:32776"
  [backends.backend3]
    [backends.backend3.servers.server3]
    url = "http://192.168.99.100:32777"
  [backends.backend4]
    [backends.backend4.servers.server4]
    url = "http://192.168.99.100:32778"
  [backends.backend5]
    [backends.backend5.servers.server5]
    url = "http://192.168.99.100:32779"
  [backends.backend6]
    [backends.backend6.servers.server6]
    url = "http://192.168.99.100:32780"
  [backends.backend7]
    [backends.backend7.servers.server7]
    url = "http://192.168.99.100:32781"

Considerations

  • Next step could be to extend the che launcher with a new command that generates the configuration and start Traefik. But that's not a long term solution: Traefik is able to auto-configure itself listening to Docker events and should be used this way.
  • A long term solution would be to use labels when creating workspaces (docker run -l traefik...). Frontends and beckends sections would be set/updated as soon as the container is started. Unfortunately a current Traefik limitation doesn't allow to configure multiple routes for one container that has multiple ports. And Che workspaces have 6 exposed ports.
  • Another limitation is that Traefik entrypoints cannot be updated dynamically. As a consequence every time a new workspace is created, 6 extra entrypoints need to be added and traefik need to be restared. A solution could be to the same 6 entrypoints for all the Che workspaces. Using for example the workspaces internal ports as entrypoint addresses (8000, 8080, 9876, 22, 4401, 4403, 4411) and infer the route to the right backend based on HTTP Header.

@TylerJewell
Copy link

@l0rd - thanks for the contribution!

It's taken me some time to get around to reviewing this, but I have been through it now, and understand the approach and the analysis.

A few questions - these may be ignorant based upon my limited understand how multi-host or reverse proxies work. This area of technology is not my strong suit.

1: If the browser is sending all port communications to the same host that the Che server is running on, then what are the types of code changes that are being made? If the solution that you are proposing by adding in a reverse proxy communicating on the same host as the che-server, and that proxy is going to look for all communications over certain ports (to forward), shouldn't this solution work without any Che server code modifications?

2: Out of curiosity, what are the pros / cons of modifying the Che server to proxy these communications to the related workspace?

BTW - I now recognize the power of Traefik - it really allows for moment-in-time reverse proxy deployment. We could deploy 1..n reverse proxies on-demand alongside a Che server or other system to be used as special purpose configurations to solve certain challenges with networking in multi-host situations.

@l0rd
Copy link
Contributor Author

l0rd commented Aug 29, 2016

@TylerJewell

  1. Short term solution only need the changes already included in this PR (and these changes are needed). Long term solution should modify the che-server to add labels when running a new container.
  2. Adding reverse proxy capability within Che would be possible and users would not need to use an external tool but:
    • 99% of Che users won't need a reverse proxy
    • building a reverse proxy is hard (think about the network performances)

@TylerJewell
Copy link

@l0rd - I am unsure of the path forward needed to let us merge these changes. Is there perhaps an environment variable that we can set on the command line which would then trigger the che-launcher to launch a traefik reverse proxy?

If so, let's include those improvements into this PR and then launch the CLI, che-launcher and these improvements together. Can you suggest additional command-line configuration details that would be present that would trigger this configuration? Then we can make improvements together on the che-launcher and che cli. I like this approach.

We will also need to do some detailed docs together.

@garagatyi
Copy link

I think that this PR is not needed anymore. #2402 is next idea on how to workaround the problem.
@l0rd can we close this PR?

@l0rd
Copy link
Contributor Author

l0rd commented Sep 12, 2016

@garagatyi I would prefer to keep it open. I would like to continue the work to integrate Traefik in the che-launcher here. I'm waiting proposal traefik/traefik#444 is addressed before going any further.

@garagatyi
Copy link

Can you try idea from #2402 with IPTables? I think it would be even better than usage of traefik.

@l0rd
Copy link
Contributor Author

l0rd commented Sep 12, 2016

Yes I'm trying that idea right now :-). That should fix the Docker for Mac issue.

However Traefik would add the ability to start a reverse proxy automatically configured to route requests to Che and the containers. That could still be valuable. wdyt @garagatyi?

@garagatyi
Copy link

What use-cases are covered by this approach?

@l0rd
Copy link
Contributor Author

l0rd commented Sep 12, 2016

@garagatyi
Copy link

I understand what reverse proxy is. I just want to understand what profit will get Che if we use reverse proxy in it?

@l0rd
Copy link
Contributor Author

l0rd commented Sep 12, 2016

The "Uses of reverse proxy" section explains what are the benefits of a reverse proxy. In particular:

  • Reverse proxies can hide the existence and characteristics of an origin server or servers.
  • Reverse proxies can operate wherever multiple web-servers must be accessible via a single public IP address. The web servers listen on different ports in the same machine, with the same local IP address or, possibly, on different machines and different local IP addresses altogether. The reverse proxy analyzes each incoming request and delivers it to the right server within the local area network.

could make sense for Che.

@TylerJewell
Copy link

I think we want to continue exploring the embedded reverse proxy approach for the following use cases:

  1. Users want to eventually route all traffic through a single port for both ws master & agents. We'd need a central proxy to handle such traffic.
  2. Users want to distribute che server and docker daemon onto different IPs - we can use a reverse proxy to manage that, if the iptables approach is not sufficient.

@l0rd
Copy link
Contributor Author

l0rd commented Nov 9, 2016

A quick update on this PR. It was initially opened to fix issue #1644. This issue has now been fixed with #2402 but we would like to continue working on this PR for the use cases listed by @TylerJewell and to allow Che workspaces to run with OpenShift routes.

The goal would be to be able to contact a workspace service (e.g. wsagent) on port 80 or 443 using a specific hostname (e.g. servicename.workspacename.mychehost.com) and avoid exposing ephemeral TCP ports publicly.

Work on that PR should start as soon as Che will be able to start and stop workspaces as OpenShift pods (second bullet of #2847).

@TylerJewell
Copy link

This would be a substantial improvement - being able to hide the ephemeral ports - many people have asked for that.

@TylerJewell
Copy link

@l0rd - what is our plan with this PR given the release of 5.0?

@TylerJewell
Copy link

From a meeting with Red Hat. The current plan for this issue is that this will be replaced with a different implementation. The focus of the implementation will be to provide a new URL resolution strategy that allows browsers to channel traffic through a single Che port. Mario will open this new PR in the next couple weeks and then we will close this PR.

@l0rd
Copy link
Contributor Author

l0rd commented Jun 21, 2017

This PR has been replaced by #4351

@l0rd l0rd closed this Jun 21, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants