🎉 restart core containers if they fail automatically #3423

Merged: 1 commit merged into master from jrhizor/restart-docker-on-failures on May 17, 2021


@jrhizor (Contributor) commented May 14, 2021

See https://docs.docker.com/compose/compose-file/compose-file-v3/#restart for the docs on `restart`.

This adds a restart policy to all long-running containers. I used unless-stopped instead of always so that we can still stop services manually.
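As a rough illustration of the kind of change described above, a Compose v3 service entry with this policy might look like the sketch below. The service keys and image names are placeholders chosen for illustration, not copied from this PR's diff; only the `restart: unless-stopped` setting and the airbyte-server and airbyte-temporal container names come from this thread.

```yaml
# Hypothetical excerpt of a docker-compose.yaml; image names are placeholders.
version: "3.7"
services:
  server:
    image: example/server:latest        # assumption: not the real image name
    container_name: airbyte-server
    # Restart the container whenever it exits, unless it was stopped explicitly
    # (for example with `docker stop` or `docker-compose stop`).
    restart: unless-stopped
  temporal:
    image: example/temporal:latest      # assumption: not the real image name
    container_name: airbyte-temporal
    restart: unless-stopped
```

With `always`, a container that was stopped manually would come back when the Docker daemon restarts; `unless-stopped` leaves it stopped, which is what keeps manual control possible.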

We've seen a couple of issues on Slack where users were confused because some container (like temporal or the server) died and never restarted. This should fix that problem.

Manual Testing
I tested this by running docker-compose up in one window, and in the other running this sequence of commands:

→ docker exec -it airbyte-server /bin/bash
root@9365322a100e:/app# apt-get install ps
...
root@9365322a100e:/app# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  7.2  8.1 3392784 250900 ?      Ssl  18:38   0:16 /usr/local/open
root        63  0.0  0.1   5624  3576 pts/0    Ss   18:41   0:00 /bin/bash
root       391  0.0  0.0   9396  2988 pts/0    R+   18:42   0:00 ps aux
root@9365322a100e:/app# kill 1
root@9365322a100e:/app# %

This killed the server. I could then see that it restarted successfully, both in the docker-compose logs and through UI interactions.

@michel-tricot (Contributor) left a comment

cool!

@marcosmarxm (Member) commented

@jrhizor just curious: this won't resolve the problem where the scheduler cannot connect to temporal because temporal starts after the scheduler. Can we use the option depends_on: airbyte-temporal to make the scheduler start only after airbyte-temporal is ready?

@marcosmarxm (Member) left a comment

awesome!

@jrhizor (Contributor, Author) commented May 17, 2021

@marcosmarxm

We had depends_on at some point but removed it because it was misleading.

According to https://docs.docker.com/compose/compose-file/compose-file-v3/#depends_on, depends_on doesn't actually wait for the dependency to become ready; it only waits until it starts, which isn't enough to be useful on first launch in 99% of cases.

Since each service has startup overhead that isn't related to the dependency, starting it right away instead of waiting gives us a slight head start on overall startup latency.
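For reference, the short form of depends_on being discussed would look roughly like the sketch below. The service keys and image names are hypothetical placeholders. In the v3 file format this only orders container start; it says nothing about whether the dependency is ready to accept connections.

```yaml
# Hypothetical excerpt; service keys and image names are placeholders.
version: "3.7"
services:
  temporal:
    image: example/temporal:latest      # assumption: not the real image name
    container_name: airbyte-temporal
  scheduler:
    image: example/scheduler:latest     # assumption: not the real image name
    # Start order only: the scheduler container is created after the temporal
    # container has been started, not after temporal is ready.
    depends_on:
      - temporal
    # If the scheduler exits because temporal is not reachable yet,
    # the restart policy brings it back so it can try again.
    restart: unless-stopped
```

Under these assumptions, a service that exits when its dependency is unreachable is retried by the restart policy, without relying on depends_on for readiness.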

@jrhizor jrhizor merged commit 42ecf1f into master May 17, 2021
@jrhizor jrhizor deleted the jrhizor/restart-docker-on-failures branch May 17, 2021 17:03
htrueman added a commit that referenced this pull request May 18, 2021
* Requirements updated to CDK.
airbyte-protocol and base-python requirements removed.

* Bugfix: BufferedStreamConsumer. (#3387)

* Format.

* Bump versions.

* main_dev.py renamed to main.py
README.md updated

* Source Stripe: Add Acceptance Tests to Stripe Connector (#3367)

* Add Acceptance Tests to Stripe Connector

* move configured_catalog.json to sample_files

* bump version

Co-authored-by: ykurochkin <y.kurochkin@zazmic.com>

* Legacy lib references removed

* FB Marketing source - lookback window logic not functioning correctly

* FB Marketing source #1390 - returning buffered record while incremental sync

* FB Marketing source #1390 - improving checking while syncing buffered record

* FB Marketing source #1390 - adding loop_back to IncrementalStreamAPI

* FB Marketing source #1390 - bump version

* FB Marketing source #1390 - add CHANGELOG.md

* Stop formatting python with spotless (#3388)

* add test that migration output schema same as source schema (#3356)

* Add updated architecture diagram to high level docs. (#3399)

* Add updated architecture doc to high level docs.

* Address review comments

Co-authored-by: Abhi Vaidyanatha <abhivaidyanatha@Abhis-MacBook-Pro.local>

* Correct GA readme error. (#3407)

* make shopify more resilient to timeouts (#3409)

* Update migration schema to include recent changes to the StandardSync object. (#3414)

* Update all of Pydantic to 1.6.2 per Dependabot. (#3408)

* Update all to 1.6.2.

* Publish new airbyte-cdk version.

* Use repr instead of str for exceptions.

* Use rc.

* Edit test.

* Bump for SAT.

* Format.

* Docker ignore update. Fix setup.py

* fixing ONLY problematic fields in freshdesk JSON schemas (#3376)

* bump airbyte-webapp version (#2266)

* add configuration for bumping webapp version

* set to current version

* Bump version: 0.16.0-alpha → 0.16.1-alpha

* Revert "Bump version: 0.16.0-alpha → 0.16.1-alpha"

This reverts commit fdbf6dc.

* also update package lock so we don't run into files changed errors

* use 0.19.0-alpha

* add npm webapp version

* Add a CDK speedrun tutorial doc (#3403)

* Add CDK Speedrun document.

* Finish speedrun doc.

* Address review comments

* Add to SUMMARY.md

Co-authored-by: Abhi Vaidyanatha <abhivaidyanatha@Abhis-MacBook-Pro.local>

* Add Rust as a connector specific dependency to source-file (#3426)

* Add Rust as a connector specific dependency to source-file

* Add more details about installation.

* Markdown lines are weird.

Co-authored-by: Abhi Vaidyanatha <abhivaidyanatha@Abhis-MacBook-Pro.local>

* API update to latest airbyte-cdk version

* Add section Deploy Local on Windows (#3425)

* add deploy on windows steps

* correct minor

* change suggestions by @avaidyanatha

* GitBook: [master] 161 pages and 75 assets modified

* Display icons (#3140)

* Display icons

* Improve icons views

* MS SQL Server Destination implementation

Fixes issue #613.

Normalization is not yet enabled.  This will have to be added at a later point.

* Workflow to handle operations (custom transformation) (#3379)

* Keep normalization backward compatible with old settings from destination

* Bumpversion normalization image

* add npm install before all npm run generates' (#3442)

* restart containers if they fail automatically (#3423)

* Update link for contribution scheduling (#3443)

* Address issue with icon in onboarding (#3437)

* rename toy connector tutorial to "Build a connector the hard way"  (#3421)

* Upload test reports (from integration test slash commands) as GitHub artifacts (#3416)

* Archive test reports in github workflow

* Archive Test reports only when failures

* Fixing SqlServerOperations.java (#3454)

Fixing some issues with `SqlServerOperations`, which was out of sync with recent changes to `SqlOperations`.

* Add redirect to cdk tutorial page (#3456)

* add redirect to cdk tutorial page

* change path to cdk README.md

Co-authored-by: Davin Chia <davinchia@gmail.com>
Co-authored-by: Yevhenii <34103125+yevhenii-ldv@users.noreply.github.com>
Co-authored-by: ykurochkin <y.kurochkin@zazmic.com>
Co-authored-by: vitaliizazmic <75620293+vitaliizazmic@users.noreply.github.com>
Co-authored-by: Charles <giardina.charles@gmail.com>
Co-authored-by: Abhi Vaidyanatha <abhi@airbyte.io>
Co-authored-by: Abhi Vaidyanatha <abhivaidyanatha@Abhis-MacBook-Pro.local>
Co-authored-by: Jared Rhizor <jared@dataline.io>
Co-authored-by: vovavovavovavova <39351371+vovavovavovavova@users.noreply.github.com>
Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com>
Co-authored-by: Marcos Marx <marcos@airbyte.io>
Co-authored-by: Artem Astapenko <3767150+Jamakase@users.noreply.github.com>
Co-authored-by: masonwheeler <masonwheeler@yahoo.com>
Co-authored-by: Christophe Duong <christophe.duong@gmail.com>
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
Co-authored-by: Michel Tricot <michel@dataline.io>