GH-524 staging environment #532

Merged
merged 6 commits into from
Jul 21, 2023

Collaborator

@alabdao alabdao commented Jul 20, 2023

Split out the Docker setup so it can potentially be used on other nodes.

Put custom vars into an extra-vars file to be referenced on the CLI
when executing the script.

Fixes #524

hevans66 and others added 4 commits July 18, 2023 15:22
Split out docker setup for potentially used on other nodes.

Putting custom vars into extra-vars file to be referenced on cli
while executing script.

vercel bot commented Jul 20, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jul 21, 2023 2:36pm

@alabdao alabdao temporarily deployed to ci July 20, 2023 21:26 — with GitHub Actions Inactive
@alabdao alabdao requested a review from hevans66 July 20, 2023 21:27

# slow_start = 60

# TODO: need to figure out healthcheck for IPFS

Yeah, this is a rabbit hole I started down at some point.

@hevans66 hevans66 left a comment


This is looking great; it's certainly a more complete setup than we had before.

A couple of notes:

  • This does not set up a receptor instance for staging. I think that's OK for now, but as soon as the receptor starts doing fancier things (like actually rejecting jobs based on criteria), we will want to add a receptor for testing purposes.
  • I'm a little worried about the requester instances having an auto scaling group, mostly because I don't really know what it means for two Bacalhau requester nodes (that are not peered together) to share the same set of compute nodes. What would happen if one requester node accepts a job and hands it off to a compute node, then the CLI later requests the job status and that request goes to a different requester node? The requester node is not doing any computation, so I don't anticipate ever really needing more than one. Unless the idea is to never have n > 1 for this ASG.
  • This still won't automatically run the Ansible provision scripts when an instance launches, right? For now we're still running ansible-playbook from the command line?

@@ -1,109 +1,67 @@
- name: Provision Bacalhau Compute Instance
remote_user: ubuntu
hosts: tag_Type_compute_only:&tag_Env_prod

Are you limiting to specific Envs when running ansible-playbook? Or is there some magic I am missing?

Collaborator Author


Yeah, I'm using --limit tag_Env_staging when executing the ansible-playbook command.
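
For readers following along, here is a rough sketch of how the extra-vars file from the PR description and the --limit flag could fit together. The file name, the variable name, and the inventory and playbook placeholders are hypothetical and are not taken from this PR; only the tag_Env_staging group name comes from the comment above.

```yaml
# staging-vars.yml (hypothetical file name; the variable below is illustrative,
# not the actual contents of the extra-vars file in this PR)
env_name: staging

# Assumed invocation, with <inventory> and <playbook> as placeholders:
#
#   ansible-playbook -i <inventory> <playbook>.yml \
#     --extra-vars "@staging-vars.yml" \
#     --limit tag_Env_staging
#
# --extra-vars "@file" loads variables from a YAML file, and --limit narrows
# the play's host pattern to the hosts in the tag_Env_staging group.
```

Keeping the environment selection on the command line means the same playbook can serve both prod and staging, at the cost of relying on the operator to pass the right flag.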

Collaborator Author

alabdao commented Jul 21, 2023

This is looking great; it's certainly a more complete setup than we had before.

A couple of notes:

* This does not set up a receptor instance for staging. I think that's OK for now, but as soon as the receptor starts doing fancier things (like actually rejecting jobs based on criteria), we will want to add a receptor for testing purposes.

Yes, the receptor will definitely come later.

* I'm a little worried about the requester instances having an auto scaling group, mostly because I don't really know what it means for two Bacalhau requester nodes (that are not peered together) to share the same set of compute nodes. What would happen if one requester node accepts a job and hands it off to a compute node, then the CLI later requests the job status and that request goes to a different requester node? The requester node is not doing any computation, so I don't anticipate ever really needing more than one. Unless the idea is to never have n > 1 for this ASG.
  • The ASG is needed for HA capability in case a node becomes unhealthy. Work still needs to be done to make Bacalhau and IPFS state preservable; EFS would probably do the job.
  • An LB target was needed to terminate public traffic. This is a step towards nodes NOT having public IP addresses and instead being accessed via bastion/VPN/EC2 Instance Connect Endpoint.
  • It gives an easier bootstrapping mechanism.
  • At some point we will probably have multiple requesters with stickiness enabled, so a client hits the same node.
* This still won't automatically run the Ansible provision scripts when an instance launches, right? For now we're still running ansible-playbook from the command line?
  • Yes, correct. This is a step towards that: having compute nodes be completely dynamic with a headless setup.

@alabdao alabdao temporarily deployed to ci July 21, 2023 14:34 — with GitHub Actions Inactive
@alabdao alabdao temporarily deployed to ci July 21, 2023 14:35 — with GitHub Actions Inactive
@alabdao alabdao merged commit 9538de2 into main Jul 21, 2023
@alabdao alabdao deleted the ops/524-staging-environment branch July 21, 2023 16:04

@thetechnocrat-dev thetechnocrat-dev left a comment


Looks good to me, but I'll defer to @hevans66 on the final approval. I do think the big upcoming strategic decision is how we organize the infrastructure code for the private versus public clusters.

@@ -4,6 +4,12 @@
vars:
ipfs_path: /opt/local/ipfs
tasks:
# Must provide limit flag to ensure running against current environment
- fail:

Cool, didn't know about this trick
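
The diff above is cut off at the fail task, so the following is only a sketch of how such a guard is commonly written, not the code merged in this PR. It assumes the check relies on Ansible's ansible_limit magic variable, which is only defined when --limit is passed. The hosts pattern, the message text, and the run_once setting are assumptions.

```yaml
- name: Provision Bacalhau Compute Instance
  remote_user: ubuntu
  # Assumed host pattern: the environment is chosen via --limit at run time.
  hosts: tag_Type_compute_only
  vars:
    ipfs_path: /opt/local/ipfs
  tasks:
    # Must provide limit flag to ensure running against current environment
    - fail:
        # Illustrative message; the wording in the PR is not shown in the diff.
        msg: "Pass --limit (for example --limit tag_Env_staging) to target a single environment."
      # ansible_limit is undefined unless --limit was given on the command line.
      when: ansible_limit is not defined
      # Report the failure once instead of once per host.
      run_once: true
```

With a guard like this as the first task, running the playbook without --limit stops before any provisioning task can touch hosts in the wrong environment.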

Successfully merging this pull request may close these issues.

Setup Plex Staging environment
3 participants