Obtain Weave peers from members of the same ECS cluster #73
This one's tricky. In fact, we initially tried to infer the network peers from the ECS cluster members instead of the autoscaling group, but there's a chicken-and-egg problem:
Unfortunately, (1) and (2) cause the cluster to be empty when weave is launched, making it impossible to satisfy (3). I looked into registering peers dynamically (i.e. after weave is launched), but at the time this was either not possible or it made it impossible to reach initial IPAM consensus (I don't really remember).

@awh @bboreham Any ideas? @awh Maybe your current IPAM pre-consensus work could help with this?

@bryanvaz As an alternative, would it be good enough to allow peer identification through tags? (see #1). You could tag your spot fleet with a specific tag. It would also give you finer control over your peers (you could join multiple autoscaling groups, or only certain instances).
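The tag-based peer identification suggested above could look roughly like the following Python sketch. It is only an illustration, not how the project implements it: the tag key `weave-peer-group` and the function name are hypothetical, and the input is assumed to be a dictionary shaped like the response of the EC2 DescribeInstances API (for example, as returned by boto3's `describe_instances`).

```python
from typing import Any, Dict, List, Optional

# Hypothetical tag key, chosen for illustration only.
PEER_TAG_KEY = "weave-peer-group"


def peers_from_tags(
    response: Dict[str, Any],
    tag_key: str,
    tag_value: str,
    own_ip: Optional[str] = None,
) -> List[str]:
    """Return private IPs of running instances tagged tag_key=tag_value.

    `response` is assumed to follow the EC2 DescribeInstances layout:
    Reservations -> Instances -> {State, Tags, PrivateIpAddress}.
    """
    peers = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            # Skip instances that are stopped, terminated, pending, etc.
            if instance.get("State", {}).get("Name") != "running":
                continue
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            if tags.get(tag_key) != tag_value:
                continue
            ip = instance.get("PrivateIpAddress")
            # Exclude the local host so it does not list itself as a peer.
            if ip and ip != own_ip:
                peers.append(ip)
    return sorted(peers)
```

Because selection depends only on the tag, instances from a spot fleet and from a reserved-instance autoscaling group would end up in the same peer list as long as both carry the same tag value.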
@2opremio Peer identification through tags would work. The main problem right now is actually ECS, since it has no way to bind the tasks of a service group to a particular set of instances (conditioned by type, ASG, LC, or spot/reserved). For example, we were experimenting with using a memcached cluster in ECS to back a group of web servers. However, in our hybrid spot/reserved model, some of the web servers may be placed in the spot fleet (to take advantage of spot savings). Unfortunately, spot servers can't use weave to talk to the memcached cluster sitting in the reserved fleet. The obvious solution would be to register the memcached cluster with an ELB and point the web servers to the ELB instead, but the use case for Weave is that it eliminates the need for each microservice to have an ELB, so that is probably not the best option. The 2 other options we were considering were:
What is the ordering? Can we get this: weave launch ... container attach to weave ... ecs-agent start ... weave discovers its peers?
@bryanvaz yes, that is indeed the case, and we have been discussing this with the folks at Amazon. From what I understand, it shouldn't be too hard to implement a custom scheduler to achieve something like this. The custom scheduler abstraction is a rather basic one: on the one hand it is easy to use, on the other it requires you to do all the work. In a nutshell, you simply need to make the decision and call
would be addressed by weaveworks/weave#1721, so long as you can
Yep, as @errordeveloper already mentioned, we are also missing some sort of scheduling rules/affinity. In particular, we could really use something like k8s' DaemonSet.
@bryanvaz Any other causes for this apart from the ASG-based peer detection? In other words, would a tag-based solution (or ideally an ECS-cluster-based solution, if we solve the chicken-and-egg problem) be enough to move you forward?
@bboreham That is the current ordering, except for weave discovers its peers, which we do before weave launch. I gave up on discovering the peers after the launch for the reasons above.
I don't think we can control that. What would happen if an IP is allocated, by, for instance, running a container before the
If you have a 1-node cluster then, effectively, the correct number of
But you might want a way to have IPAM defer any allocation until you know the number of peers. Do note I'm trying to address the OP's point in a possible future, not trying to discuss the current implementation.
I guess that's needed for running a container before the weave connects happen, right? Otherwise it can result in IPAM conflicts.
Yes. To give an example, suppose the order is (on host1):
then we want to be able to say at the end of that sequence " The current implementation will force consensus at line 2, resulting in a clique.
The new architecture we are designing uses ECS to hold a mix of EC2 instances (spot and reserved instances, to optimize costs), which requires two Auto Scaling Groups (ASGs) to work. However, because Weave peer discovery is based on the ASG, the Spot fleet does not join the Reserved fleet and the Weave network breaks down.
Is it possible to specify a parameter in a config file on boot to allow Weave to use the ECS Cluster instead of the ASG to group instances into a Weave network?
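As a rough illustration of what grouping by ECS cluster would involve, the Python sketch below extracts the EC2 instance IDs of a cluster's active members. It is an assumption-laden sketch rather than an existing feature: the function name is hypothetical, and the input is assumed to be a dictionary shaped like the response of the ECS DescribeContainerInstances API (for example, as returned by boto3's `describe_container_instances`).

```python
from typing import Any, Dict, List


def active_ec2_instance_ids(response: Dict[str, Any]) -> List[str]:
    """Return EC2 instance IDs of ACTIVE container instances.

    `response` is assumed to follow the ECS DescribeContainerInstances
    layout: containerInstances -> {ec2InstanceId, status}.
    """
    ids = []
    for container_instance in response.get("containerInstances", []):
        # Ignore members that are draining or otherwise not active.
        if container_instance.get("status") != "ACTIVE":
            continue
        instance_id = container_instance.get("ec2InstanceId")
        if instance_id:
            ids.append(instance_id)
    return ids
```

Note that an instance only appears in this list once its ECS agent has registered with the cluster, so a lookup performed before the agent starts would return an empty list on a fresh cluster.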