Autoscaling Support #345

Open
wants to merge 9 commits into
base: master


@betamatt
Contributor

betamatt commented Mar 4, 2014

Feedback requested/appreciated.

Adds support for EC2 autoscaling groups as a new cluster type. I followed the example set by the RDS provider. It doesn't exactly mesh with how Ironfan looks at the world but might be good enough to be usable.

Known issues:

  • Each AS group gets a pointless Chef node. I haven't been able to figure out a good way to indicate that nodes are irrelevant for a particular provider.
  • Cluster show doesn't give useful output for paused AS groups.
  • Cluster stop pauses the AS group without killing its instances. This breaks with convention but was determined to be more in line with what our operations folks really wanted. Could be made more conventional with the addition of new cluster pause/unpause actions that only apply to AS groups.
  • It is anticipated that the desired size will be controlled by operations and not in cluster definitions, beyond initial launch.
  • Scaling policy is not yet supported (also not fully implemented by fog)
  • ELB registration is not yet supported (should be straightforward)
  • Several autoscaling attributes have not yet been fully implemented. (VPC, Termination policy, etc)
  • Tags have not yet been applied.
  • Some options are unsupported by fog (spot pricing, especially)
  • cluster ssh will ignore autoscaled nodes
  • Tests :-\

Bugs:

  • regions other than us-east-1 don't work
  • bootstrapping fails instead of refusing
  • cluster ssh is entirely broken

There are likely more problems that haven't been fully exposed yet but this should be enough for a conversation about mergeability.

cc/@gerbercj

  * Adds basic support for EC2 autoscaling groups as a new cluster type.
  * Start/Stop pause scaling activities but leave instances running

  Hat tip to Chris Gerber (chris.gerber@tapjoy.com) for initial spike.
@betamatt
Contributor Author

betamatt commented Mar 4, 2014

cc/ @wjossey @Ofanite @marcuswalser

@meekmichael
Contributor

Hi, and a big Boston shout-out to you, this is something we've really wanted.

Some feedback:

  • cloud(:autoscale) might be better as cloud(:ec2_autoscale) so as not to conflict with any other IAAS's autoscaling support that Ironfan may someday support
  • The lack of cloudwatch alarms tied to the launch config is a hard pill to swallow. Would it be worthwhile to pull in the aws-sdk gem to handle this? I know it's ugly form to mix aws-sdk and fog, but fog just doesn't do it, even in version 1.20 if the docs are to be believed.
  • It tended to say "All computers launched correctly" even when things went terribly wrong.
  • It's not possible to launch an ASG anywhere other than us-east-1. I like us-west-2, and it finds my AMI ID, but then tries to launch the ASG in us-east with the us-west AMI ID.
  • Attempting "knife cluster ssh clustername" tells me "Exception: NoMethodError: undefined method `vpc_id' for #<Ironfan::Provider::Autoscale::Machine:0x007f8a57101b78>". I don't use VPCs.
  • Attempting to bootstrap my node gave me "ERROR: undefined method `dns_name' for #<Ironfan::Provider::Autoscale::Machine:0x007fac8e1d9100> (NoMethodError)"
  • No keypair is created or attached to the machines.
  • A sample cluster definition would be nice to have, and would help more people give this a try.

Here is my config for my contrived resque cluster:

https://gist.github.com/meekmichael/9414653

If there is any way I can help, let me know. I have a few hours a week to hack on this.

@betamatt
Contributor Author

betamatt commented Mar 7, 2014

@meekmichael thanks for taking a look. You've brought up some great points.

> cloud(:autoscale) might be better as cloud(:ec2_autoscale)

True. Don't want to spend the time right now but it should be fixed before merge.

> The lack of cloudwatch alarms tied to the launch config is a hard pill to swallow.

Agree. I'd suggest adding support to fog and using that to implement. I don't think I'll have time to get to this.

> It's not possible to launch an ASG anywhere other than us-east-1

No reason this shouldn't work. I'm probably forgetting to set the region somewhere.

> Attempting "knife cluster ssh clustername" tells me Exception [...] undefined method vpc_id

This is an ironfan issue. It's tightly-bound to the ec2 machine implementation. I think fixing this will involve some changes to ironfan. I haven't thought it through yet.

> No keypair is created or attached to the machines.

I think you can attach an existing keypair by specifying key_name. We don't manage keys this way so I haven't tested it.

> Attempting to bootstrap my node gave me "ERROR: undefined method `dns_name'

Yeah, bootstrapping doesn't make sense with autoscaling; it should refuse to attempt it. Instead of bootstrapping, the distro script that would be used to bootstrap gets loaded into the group definition as the user data. We use our own scripts so I haven't tested with the out-of-the-box ones yet. You can control which script is used with the bootstrap_distro option.
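
To make the "refuse instead of fail" idea concrete, here is a minimal sketch in plain Ruby. Every name in it (`Ec2Machine`, `AutoscaleMachine`, `bootstrappable?`, `bootstrap!`, `BootstrapNotSupported`) is hypothetical and is not taken from Ironfan's code; it only shows the shape of a guard that raises a clear error up front rather than hitting a NoMethodError partway through.

```ruby
# Hypothetical stand-ins for illustration; these are NOT Ironfan's real classes.
class BootstrapNotSupported < StandardError; end

class Ec2Machine
  # A directly managed instance can be bootstrapped over SSH.
  def bootstrappable?
    true
  end
end

class AutoscaleMachine
  # Instances in an autoscaling group get their setup script via user data
  # at launch, so an explicit bootstrap step does not apply to them.
  def bootstrappable?
    false
  end
end

# Refuse early with a descriptive error instead of failing later on a
# missing method such as dns_name.
def bootstrap!(machine)
  unless machine.bootstrappable?
    raise BootstrapNotSupported,
          "#{machine.class} is configured through user data; refusing to bootstrap"
  end
  :bootstrapped
end
```

With a guard like this, a caller could rescue `BootstrapNotSupported` and print a short explanation, which is probably friendlier than the current stack trace.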

@betamatt
Contributor Author

betamatt commented Mar 7, 2014

@meekmichael
I fixed ssh so that it doesn't raise an exception, but it also isn't going to do anything useful. Getting it to actually be able to contact the instances is a more substantial change to ironfan than I'm down for.

I looked at the region issue but couldn't immediately identify the problem. I have no experience with using ironfan in other regions and seem to be missing some bit of config to make it work at all. If you have a few hours, it would be great if you could look into this since you already have a known-good setup.

Michael Mittelstadt added 3 commits March 11, 2014 14:10
…og::Compute::AWS::Error: Duplicate => The permission \'1500008810-1-1-65535\' has already been authorized on the specified group"
@meekmichael
Contributor

@betamatt, I've made a pull request to your repo with a region fix. It was an issue with get_slice() being hardcoded to use the EC2 cloud type.
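
For anyone following along, this is the general class of bug, shown as a simplified sketch rather than the actual diff. The `Cloud` struct, the `Cluster` class and both method names are made up for illustration and do not mirror Ironfan's internals; the point is only that a lookup pinned to the `:ec2` cloud ignores the region configured on another cloud type.

```ruby
# Hypothetical model for illustration only; not Ironfan's real data structures.
Cloud = Struct.new(:provider, :region)

class Cluster
  def initialize(clouds)
    @clouds = clouds # Hash of provider name => Cloud
  end

  # Buggy shape: the provider is hard-coded, so the autoscale cloud's
  # region is never consulted.
  def region_hardcoded
    @clouds.fetch(:ec2).region
  end

  # Fixed shape: the caller says which cloud type it is working with.
  def region_for(provider)
    @clouds.fetch(provider).region
  end
end

clouds = {
  ec2:       Cloud.new(:ec2, 'us-east-1'),
  autoscale: Cloud.new(:autoscale, 'us-west-2')
}
cluster = Cluster.new(clouds)
```

Here `cluster.region_hardcoded` yields the `:ec2` cloud's region regardless of what is being launched, while `cluster.region_for(:autoscale)` picks up the region set on the autoscale cloud.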

@betamatt
Contributor Author

@meekmichael merged. Thanks for the assist!
