
Proxy timeout when auto-scaling machines #430

Open

andig opened this issue Feb 27, 2024 · 2 comments

Comments


andig commented Feb 27, 2024

Migrated from superfly/litefs-example#10

I'm new to LiteFS and am following the example tutorial. The demo application comes up on Fly but is unstable: it serves a few requests and then sporadically fails with

Proxy timeout

It seems as if this is caused by the application auto-scaling the second, cloned machine away:

2024-02-24T10:25:45Z runner[e784966ce65358] ams [info]Machine created and started in 17.176s
2024-02-24T10:25:45Z app[e784966ce65358] ams [info]config file read from /etc/litefs.yml
2024-02-24T10:25:45Z app[e784966ce65358] ams [info]LiteFS v0.5.11, commit=63eab529dc3353e8d159e097ffc4caa7badb8cb3
2024-02-24T10:25:45Z app[e784966ce65358] ams [info]level=INFO msg="host environment detected" type=fly.io
2024-02-24T10:25:45Z app[e784966ce65358] ams [info]level=INFO msg="no backup client configured, skipping"
2024-02-24T10:25:45Z app[e784966ce65358] ams [info]level=INFO msg="Using static primary: primary=true hostname= advertise-url=http://primary:20202"
2024-02-24T10:25:45Z app[e784966ce65358] ams [info]level=INFO msg="48F8DDD307D7E3CB: primary lease acquired, advertising as http://primary:20202"
2024-02-24T10:25:45Z app[e784966ce65358] ams [info]level=INFO msg="set cluster id on \"static\" lease \"LFSC85E68DD3249EC869\""
2024-02-24T10:25:45Z app[e784966ce65358] ams [info]level=INFO msg="LiteFS mounted to: /litefs"
2024-02-24T10:25:45Z app[e784966ce65358] ams [info]level=INFO msg="http server listening on: http://localhost:20202"
2024-02-24T10:25:45Z app[e784966ce65358] ams [info]level=INFO msg="waiting to connect to cluster"
2024-02-24T10:25:45Z app[e784966ce65358] ams [info]level=INFO msg="connected to cluster, ready"
2024-02-24T10:25:45Z app[e784966ce65358] ams [info]level=INFO msg="proxy server listening on: http://localhost:8080"
2024-02-24T10:25:45Z app[e784966ce65358] ams [info]level=INFO msg="starting background subprocess: litefs-example [-addr :8081 -dsn /litefs/db]"
2024-02-24T10:25:45Z app[e784966ce65358] ams [info]waiting for signal or subprocess to exit
2024-02-24T10:25:45Z app[e784966ce65358] ams [info]database opened at /litefs/db
2024-02-24T10:25:45Z app[e784966ce65358] ams [info]level=INFO msg="database file is zero length on initialization: /var/lib/litefs/dbs/db/database"
2024-02-24T10:25:45Z app[e784966ce65358] ams [info]http server listening on :8081
2024-02-24T10:29:39Z proxy[7842043c462768] ams [info]Downscaling app litefs-example-2 from 2 machines to 1 machines, stopping machine 7842043c462768 (region=ams, process group=app)
2024-02-24T10:29:39Z app[7842043c462768] ams [info] INFO Sending signal SIGINT to main child process w/ PID 313
2024-02-24T10:29:39Z app[7842043c462768] ams [info]sending signal to exec process
2024-02-24T10:29:39Z app[7842043c462768] ams [info]waiting for exec process to close
2024-02-24T10:29:39Z app[7842043c462768] ams [info]signal received, litefs shutting down
2024-02-24T10:29:39Z app[7842043c462768] ams [info]litefs shut down complete
2024-02-24T10:29:39Z app[7842043c462768] ams [info]level=INFO msg="FE9F1F5AB015D813: exiting primary, destroying lease"
2024-02-24T10:29:40Z app[7842043c462768] ams [info] INFO Main child exited normally with code: 0
2024-02-24T10:29:40Z app[7842043c462768] ams [info] INFO Starting clean up.
2024-02-24T10:29:40Z app[7842043c462768] ams [info] INFO Umounting /dev/vdb from /var/lib/litefs
2024-02-24T10:29:40Z app[7842043c462768] ams [info] WARN hallpass exited, pid: 314, status: signal: 15 (SIGTERM)
2024-02-24T10:29:40Z app[7842043c462768] ams [info]2024/02/24 10:29:40 listening on [fdaa:0:30b1:a7b:242:c65e:fbdc:2]:22 (DNS: [fdaa::3]:53)
2024-02-24T10:29:41Z app[7842043c462768] ams [info][  375.687714] reboot: Restarting system

I assume this is due to the proxy trying to connect to the machine that is being scaled down. If so, the second machine would actually decrease availability, at least when auto-scaling.

Is this the expected behaviour of the proxy?
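
For reference, the "proxy server listening on: http://localhost:8080" line above comes from LiteFS's built-in proxy, which sits in front of the application inside each machine and is configured in /etc/litefs.yml. A minimal sketch of that section, assuming the ports shown in the log output (the exact values in the example repository may differ):

proxy:
  # Address the LiteFS proxy listens on; Fly's edge forwards incoming requests here.
  addr: ":8080"
  # The application subprocess started by LiteFS (litefs-example -addr :8081 ...).
  target: "localhost:8081"
  # Database whose replication position is tracked for read-your-writes consistency.
  db: "db"

The "Proxy timeout" message itself appears to come from Fly's edge proxy (the proxy[...] component in the logs), not from the LiteFS proxy.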

@benbjohnson
Collaborator

We generally don't recommend autostopping machines with LiteFS. Is this the tutorial you were following?
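
Automatic stop/start of machines is controlled by the [http_service] section of fly.toml. As a rough sketch of how one might follow this recommendation (standard Fly.io options; the exact settings shipped with the example may differ):

[http_service]
  internal_port = 8080
  force_https = true
  # Do not stop idle machines; LiteFS nodes should stay up.
  auto_stop_machines = false
  auto_start_machines = true
  # Keep both machines running even when there is no traffic.
  min_machines_running = 2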


andig commented Mar 3, 2024

I was following this and then moved on to https://github.com/superfly/litefs-example/tree/main/fly-io-config. I noticed the problem when one of the small machines went away after being idle.

We generally don't recommend autostopping machines with LiteFS.

I was assuming that the LiteFS proxy (or the Fly load balancer?) would use the Consul information to direct requests to a running instance. Maybe that's just not how it works.
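
For context, the lease mechanism is selected in the lease section of litefs.yml, and the "Using static primary" line in the log output above indicates a static lease, in which case Consul is not involved at all. A minimal sketch of both variants, with field names taken from the LiteFS documentation and illustrative values:

lease:
  # Static lease, matching the "Using static primary" log line: the primary is fixed
  # by hostname and no Consul lookup takes place.
  type: "static"
  hostname: "primary"
  advertise-url: "http://primary:20202"
  candidate: true

# A Consul-based lease would instead look roughly like this:
# lease:
#   type: "consul"
#   candidate: true
#   advertise-url: "http://${HOSTNAME}.vm.${FLY_APP_NAME}.internal:20202"
#   consul:
#     url: "${FLY_CONSUL_URL}"
#     key: "litefs/${FLY_APP_NAME}"

Either way, the lease only determines which LiteFS node is the primary for writes; routing an incoming request to a particular machine is done by Fly's load balancer before LiteFS sees it.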
