Streaming does not appear to work properly with ALB #2026

Closed
askmike1 opened this issue Jan 28, 2022 · 30 comments
Labels
bug, help wanted, streaming-logs, waiting-on-response

Comments

@askmike1

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Overview of the Issue

When performing a plan or apply, the link that is given to view the live stream doesn't show anything, just a blank black box.


(looks like similar issues were posted in the original PR: #1937)

Atlantis Version: 0.18.2

If the streaming window is already open, it prints out -----Starting New Process-----, but that is all.

Based on the logs, it is getting a broken pipe, possibly because an ALB is being used?

Reproduction Steps

1. Run an atlantis plan or apply
2. Go to the streaming URL

Logs

Environment details

Atlantis: 0.18.2

Additional Context

We are running Atlantis as a single Docker container on an AWS ECS cluster with an ALB in front of it

@askmike1 askmike1 added the bug label Jan 28, 2022
@s33dunda

s33dunda commented Jan 31, 2022

we're experiencing the same issue... atlantis does give us some logs:

{
  "level": "error",
  "ts": "2022-01-27T21:59:48.706Z",
  "caller": "logging/simple_logger.go:161",
  "msg": "writing to ws 2uinc/atlantis-test-repo/15/iam/default: upgrading websocket connection: websocket: the client is not using the websocket protocol: 'upgrade' token not found in 'Connection' header",
  "json": {},
  "stacktrace": "github.com/runatlantis/atlantis/server/logging.(*StructuredLogger).Log\n\tgit.luolix.top/runatlantis/atlantis/server/logging/simple_logger.go:161\ngit.luolix.top/runatlantis/atlantis/server/controllers.(*JobsController).respond\n\tgit.luolix.top/runatlantis/atlantis/server/controllers/jobs_controller.go:141\ngit.luolix.top/runatlantis/atlantis/server/controllers.(*JobsController).GetProjectJobsWS\n\tgit.luolix.top/runatlantis/atlantis/server/controllers/jobs_controller.go:134\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2047\ngit.luolix.top/gorilla/mux.(*Router).ServeHTTP\n\tgit.luolix.top/gorilla/mux@v1.8.0/mux.go:210\ngit.luolix.top/urfave/negroni.Wrap.func1\n\tgit.luolix.top/urfave/negroni@v1.0.0/negroni.go:46\ngit.luolix.top/urfave/negroni.HandlerFunc.ServeHTTP\n\tgit.luolix.top/urfave/negroni@v1.0.0/negroni.go:29\ngit.luolix.top/urfave/negroni.middleware.ServeHTTP\n\tgit.luolix.top/urfave/negroni@v1.0.0/negroni.go:38\ngit.luolix.top/runatlantis/atlantis/server.(*RequestLogger).ServeHTTP\n\tgit.luolix.top/runatlantis/atlantis/server/middleware.go:69\ngit.luolix.top/urfave/negroni.middleware.ServeHTTP\n\tgit.luolix.top/urfave/negroni@v1.0.0/negroni.go:38\ngit.luolix.top/urfave/negroni.(*Recovery).ServeHTTP\n\tgit.luolix.top/urfave/negroni@v1.0.0/recovery.go:193\ngit.luolix.top/urfave/negroni.middleware.ServeHTTP\n\tgit.luolix.top/urfave/negroni@v1.0.0/negroni.go:38\ngit.luolix.top/urfave/negroni.(*Negroni).ServeHTTP\n\tgit.luolix.top/urfave/negroni@v1.0.0/negroni.go:96\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2879\nnet/http.(*conn).serve\n\tnet/http/server.go:1930"
}

I'm not sure what to make of them though. Hope this helps, cause the devs will be stoked to have this feature working.

@MattMencel

I see the same issue in AKS with the AKS LB and an Azure App Gateway.

@david-heward-unmind
Contributor

See the same here. With ALB in front of EC2.

@gmontanola

Same here.

EKS + ELB + Traefik

@mcrivar

mcrivar commented Mar 7, 2022

got the same issue with ALB + ECS in AWS.

@pantelis-karamolegkos
Contributor

Same issue, but when accessing a cloud VM directly via https://11.22.33.44:9100/jobs/...

@mcrivar

mcrivar commented May 22, 2022

Any update on this?
I get a 200 response but the json is empty:
{"level":"debug","ts":"2022-05-22T17:14:45.011Z","caller":"server/middleware.go:44","msg":"GET /jobs/.../735/terraform/default/ws – from 172.30.10.164:38882","json":{}}

And after 60 seconds I get this (which matches the idle timeout on the LB):
{"level":"warn","ts":"2022-05-22T17:15:45.011Z","caller":"websocket/writer.go:62","msg":"Failed to read WS message: websocket: close 1006 (abnormal closure): unexpected EOF","json":{},"stacktrace":"github.com/runatlantis/atlantis/server/controllers/websocket.(*Writer).setReadHandler\n\tgithub.com/runatlantis/atlantis/server/controllers/websocket/writer.go:62"}

  • Increasing the timeout on the LB did not help
  • Running Atlantis without any wrappers, just regular terraform commands
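
For anyone else wondering what the 60-second cut-off is: that is the ALB idle timeout, which defaults to 60 seconds. A minimal Terraform sketch of raising it is below (resource names, the subnet variable, and the chosen value are illustrative); as noted above, raising it did not fix the streaming issue for me.

    resource "aws_lb" "atlantis" {
      name               = "atlantis"
      load_balancer_type = "application"
      subnets            = var.public_subnet_ids # illustrative

      # The default idle timeout is 60 seconds, which matches the cut-off in the log above.
      idle_timeout = 3600
    }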

@adutchak

adutchak commented May 24, 2022

Same here, we use custom workflows. The issue persists with and without proxies (i.e. it is the same with k8s port-forward).
It could still be related to custom workflows, since according to the documentation streaming is only supported for regular terraform commands: https://www.runatlantis.io/docs/streaming-logs.html#real-time-logs.
However, it would be really nice to have that working for the other cases too.

@jamengual
Contributor

is this still happening with v0.19.8?

@jamengual jamengual added the waiting-on-response label Aug 26, 2022
@spamoom

spamoom commented Aug 26, 2022

@jamengual not for us :(


@jreslock

@jamengual yes this is still an issue with v0.19.8 and does not appear to be related to ALB or proxy configuration. We are running with an ALB -> ECS Fargate and I was able to reproduce the issue with and without the ALB in play.

@jamengual jamengual added the help wanted label and removed the waiting-on-response label Aug 26, 2022
@jamengual
Contributor

ok, we will look into this.

@jreslock

In our case this was not the AWS ALB but instead it was the nginx sidecar container we have running alongside atlantis in ECS Fargate.

Adding the following settings in nginx.conf resolved the issue for us.

    location / {
      # redirect all traffic to the backend
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_set_header Host $http_host;
      proxy_pass http://${APP}:${APP_PORT};

      # WebSocket support
      proxy_http_version 1.1;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection "Upgrade";
    }
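
For context on why this works: these are the standard nginx directives for proxying WebSocket upgrades. Without proxy_http_version 1.1 and the Upgrade/Connection headers, nginx does not pass the upgrade handshake through, which matches the "'upgrade' token not found in 'Connection' header" error posted earlier in this thread.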

@evanstachowiak

@jamengual I have this working successfully with ALB & ECS and custom workflows, no additional config needed.

@jamengual
Contributor

jamengual commented Aug 30, 2022 via email

@mcrivar

mcrivar commented Oct 3, 2022

@evanstachowiak any specific configuration? We still haven't managed to get it working properly.

@evanstachowiak

I'm not sure what sort of config to include here.

I have an ALB that forwards 443 -> 4141 to the Atlantis target group.


I don't have nginx running or any other proxy in between.
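
For completeness, a minimal Terraform sketch of that kind of listener setup (resource names, the certificate variable, and the load balancer and target group references are illustrative, not my actual config):

    resource "aws_lb_listener" "atlantis_https" {
      # References assume an aws_lb and aws_lb_target_group defined elsewhere.
      load_balancer_arn = aws_lb.atlantis.arn
      port              = 443
      protocol          = "HTTPS"
      ssl_policy        = "ELBSecurityPolicy-2016-08"
      certificate_arn   = var.certificate_arn # illustrative

      # Forward everything to the Atlantis target group listening on 4141
      default_action {
        type             = "forward"
        target_group_arn = aws_lb_target_group.atlantis.arn
      }
    }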

@bschaatsbergen
Member

Same here, External HTTPS Load Balancer and VM on GCP.

@nitrocode
Member

I bet it's something to do with the configuration of the load balancer. I have it working on a load balancer in AWS.

Is stickiness enabled on the load balancer?

https://stackoverflow.com/a/40423241/2965993
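
In case it helps anyone checking this, here is a minimal Terraform sketch of what enabling target-group stickiness looks like (resource names, the port, and the vpc_id variable are illustrative, not taken from any setup in this thread):

    resource "aws_lb_target_group" "atlantis" {
      name     = "atlantis"
      port     = 4141
      protocol = "HTTP"
      vpc_id   = var.vpc_id # illustrative

      # Sticky sessions via the ALB-generated cookie
      stickiness {
        type            = "lb_cookie"
        cookie_duration = 86400
        enabled         = true
      }
    }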

Has anyone contacted aws or gcp support to figure out what the issue could be?

@jamengual
Contributor

I have deployed 10+ Atlantis servers in AWS using ALBs and never had a problem with the log streaming.

Some people have reported corporate firewalls denying connections, misconfigured LBs, or antivirus firewalls causing issues, but that is not on the Atlantis side.

@bschaatsbergen
Member

bschaatsbergen commented Jan 2, 2023

It probably isn't related to Atlantis, but I do think that if this many people run into this we need to figure out what the common pitfall is and document it properly in the Deployment section.

@mcrivar

mcrivar commented Jan 17, 2023

Any update on this? I get a 200 response but the json is empty: {"level":"debug","ts":"2022-05-22T17:14:45.011Z","caller":"server/middleware.go:44","msg":"GET /jobs/.../735/terraform/default/ws – from 172.30.10.164:38882","json":{}}

And after 60 seconds I get this (which matches the idle timeout on the LB): {"level":"warn","ts":"2022-05-22T17:15:45.011Z","caller":"websocket/writer.go:62","msg":"Failed to read WS message: websocket: close 1006 (abnormal closure): unexpected EOF","json":{},"stacktrace":"github.com/runatlantis/atlantis/server/controllers/websocket.(*Writer).setReadHandler\n\tgithub.com/runatlantis/atlantis/server/controllers/websocket/writer.go:62"}

  • Increasing the timeout on the LB did not help
  • Running Atlantis without any wrappers, just regular terraform commands

Just an update from my side: per the error above, it turns out I was running an old version of Atlantis (v0.18.2).
I updated to the newest available stable version and the issue is gone; I'm able to get the streamed logs.

@nitrocode
Member

That's great @mcrivar! Thank you for sharing.

For all others who are running into issues, could you folks use the latest version and confirm if the issue is still present?

cc @askmike1 @s33dunda @MattMencel @davidh-unmind @gmontanola @pantelis-karamolegkos @adutchak @spamoom

@MattMencel

This was 100% a websockets issue for me. Once I configured websocket support on my LB/Ingress it started working. See #2216

@DomFourn

Running version 0.22.3, this is still happening in GCP, running on a VM.
Sometimes we get some data, sometimes none.
There is no specific config for websockets in GCP LBs.

@nitrocode
Member

@bschaatsbergen do you see the GCP websocket issue using your GCE Atlantis module?

https://github.com/bschaatsbergen/terraform-gce-atlantis

@bschaatsbergen
Member

bschaatsbergen commented Feb 10, 2023

@nitrocode I found the issue for Google Cloud users: when using Identity-Aware Proxy (to protect the Atlantis UI), websockets are not supported; the bearer authorization header is stripped off.

@nitrocode
Member

Ah that's good to know. Thank you for closing the loop on that.

Is this relevant for that?

https://cloud.google.com/iap/docs/authentication-howto#authenticating_from_proxy-authorization_header

I wonder if other people in this thread are running into similar issues, where something that fronts Atlantis is manipulating the bearer authorization header, which leads to the websocket failure.

@github-actions

This issue is stale because it has been open for 1 month with no activity. Remove stale label or comment or this will be closed in 1 month.

@github-actions github-actions bot added the Stale label Mar 15, 2023
@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Apr 16, 2023
@scott-standard

@nitrocode I found the issue for Google Cloud users: when using Identity-Aware Proxy (to protect the Atlantis UI), websockets are not supported; the bearer authorization header is stripped off.

@bschaatsbergen I'm seeing something similar: the "live plan" view in the console UI sometimes doesn't display the full plan (especially for larger plans). I used your Terraform module for the GCE setup.

Any ideas for a workaround? Could I put nginx in front of the Atlantis docker container to deal with the authorization header issue?
