Skip to content
This repository has been archived by the owner on Jan 9, 2023. It is now read-only.

Adds management of ssh_known_hosts file proposal #643

Merged
merged 4 commits into from
Dec 20, 2018

Conversation

JoshVanL
Copy link
Contributor

What this PR does / why we need it:
As we are replacing the ssh command line tool for all programmatic use cases in favor of an in package solution, we have discovered some problems managing the ssh_known_hosts file correctly. This proposal aims to address these.

Management of SSH known_hosts file Proposal

Signed-off-by: JoshVanL <vleeuwenjoshua@gmail.com>
@jetstack-bot jetstack-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 30, 2018
| public_key | ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBPF+xkIGMUNVI0gElRaTLjfA4QMN/XGJhHswDyv59DNSOtG3KwZvDF3YkAb0PkTQAYo8N5fxoKqimGugOAaefPc= |
+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+

The population of this tag will occur during a Terraform apply in which the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't tarmak also use 'autoscaling groups' to autoprovision instances? How will these have the tag populated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you're quite right, this is a major problem...
Originally I thought of having Wing in charge of tagging it's own instance but that's not great keeping the instance with that power. It is possible to create a Cloud Watch Event that can call a lamda function that would revoke the privilege after it's first call. That is a lot of moving parts though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the tricky bit here. In theory the nodes could self update it's tags. But I think the AWS IAM Policies can't be configured that tight to allow only self update. This also has some issues, once the node is overtaken and other tags (e.g. tarmak_role) are no longer safe.

Another problem is that the amount of tags to be stored on an AWS instance cannot exceed 255 chars. Also we have more than just the ecdsa host key.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think we have to look further than AWS. Not every provider will have support for tags. I think we should come up with a universal solution for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I agree, IMO taging instances on AWS would be the best way to couple the public keys with the instance life cycle. If we keep the SSH config with,
HostKeyCallback: ssh.tarmak.Provider().HostKeyCallBack,
then each provider can have it's specific implementation.
This is assuming all targeted providers will have some kind of instance metasdata we can pull and push from

@JoshVanL
Copy link
Contributor Author

JoshVanL commented Dec 3, 2018

/assign
/cc @alljames @MattiasGees

Copy link
Contributor

@simonswine simonswine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the comments suggest, terraform is not the right place to update host keys per instance as of the behaviour of ASG groups.

Furthermore I don't think we should update the tags directly from the instance itself, as it might have other implications (e.g. retagging workers to masters)

I would suggest we investigate how we could do that setup using lambda and format the tags in a way that we break it up according to the aws limits

/assign @JoshVanL
/unassign

| public_key | ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBPF+xkIGMUNVI0gElRaTLjfA4QMN/XGJhHswDyv59DNSOtG3KwZvDF3YkAb0PkTQAYo8N5fxoKqimGugOAaefPc= |
+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+

The population of this tag will occur during a Terraform apply in which the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the tricky bit here. In theory the nodes could self update it's tags. But I think the AWS IAM Policies can't be configured that tight to allow only self update. This also has some issues, once the node is overtaken and other tags (e.g. tarmak_role) are no longer safe.

Another problem is that the amount of tags to be stored on an AWS instance cannot exceed 255 chars. Also we have more than just the ecdsa host key.

@simonswine simonswine added this to the release-0.6 milestone Dec 3, 2018
Signed-off-by: JoshVanL <vleeuwenjoshua@gmail.com>
@jetstack-bot jetstack-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 4, 2018
@JoshVanL
Copy link
Contributor Author

JoshVanL commented Dec 6, 2018

/unassign
/assign @simonswine

@jetstack-bot jetstack-bot assigned simonswine and unassigned JoshVanL Dec 6, 2018
@MattiasGees
Copy link
Member

MattiasGees commented Dec 6, 2018

TBH I don't like the solution with the lambda function. This is really AWS specific and I know we can always have a specific implementation. But IMHO this is reinventing the wheel.
Why don't we use Hashicorp Vault for generating a CA that can be used for the host verification key? Both the user and server will need to trust that CA and can use that CA to verify the connection.

With this we would end up with a solution that works everywhere.

Some more information: https://jameshfisher.com/2018/03/16/how-to-create-an-ssh-certificate-authority.html

And after talking quickly with @JoshVanL this will not works as well as we connect to Vault with SSH :(

Copy link
Contributor

@simonswine simonswine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good progress @JoshVanL. See a few more comments in general ok

/assign @JoshVanL
/unassign

following:

+-------------------------+---------------------------------------------------------------------------+
| PublicKey_ssh-ed25519_0 | AAAAC3NzaC1lZDI1NTE5AAAAIE90XYYm6GSDlNGejM+aY5dZEe5vK4XyU++89WdGJcDc==EOF |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't really like the format here: maybe we do something like that:
tarmak.io/ssh-host-ed255519-pub-0
tarmak.io/ssh-host-ecdsa-pub-0
tarmak.io/ssh-host-rsa-pub-0

I suggest also we don't worry about dsa

file management.

In order to create a source of truth for each host's public key, each instance
will have it's public key's attached as a tags, shortly after boot time like the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also some considerations regarding if we can make sure terraform is not removing those tags every time we run it.

This is esp important for aws_instance resources. Maybe https://www.terraform.io/docs/configuration/resources.html#ignore_changes is good enough

via an Amazon Auto Scaling Group. At execution time, Wing - present on every
instance - will invoke an Amazon Lambda function for Instance Tagging. Passed to
this function will be a collection of the instances public keys, it's Amazon
identity document and matching PKCS7 document.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the instance document provides some assurances that it is an actual instance from AWS we should not done the limitations of that in the proposal:

  • There is no cryptographic signature that the public keys come actually from the host that is sending the request
  • The lambda function has all the information acting like the instance to someone else with that document (e.g. logging into vault with the same method).

The instance document could be coming form another instance and

@jetstack-bot jetstack-bot assigned JoshVanL and unassigned simonswine Dec 11, 2018
Signed-off-by: JoshVanL <vleeuwenjoshua@gmail.com>
@jetstack-bot jetstack-bot added the kind/documentation Categorizes issue or PR as related to documentation. label Dec 14, 2018
Signed-off-by: JoshVanL <vleeuwenjoshua@gmail.com>
@JoshVanL
Copy link
Contributor Author

/unassign
/assign @simonswine

@jetstack-bot jetstack-bot assigned simonswine and unassigned JoshVanL Dec 14, 2018
@simonswine
Copy link
Contributor

Thank you josh. I am having some issues with it's portability as well (concerns Mattias mentioned), but I can't really see any short term work arounds. Esp at bootstrap it's really hard to do that correct

/approve
/lgtm

@simonswine
Copy link
Contributor

/approve
/lgtm

@jetstack-bot jetstack-bot added the lgtm Indicates that a PR is ready to be merged. label Dec 20, 2018
@jetstack-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: simonswine

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jetstack-bot jetstack-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 20, 2018
@jetstack-bot jetstack-bot merged commit fa1fb1a into jetstack:master Dec 20, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. kind/documentation Categorizes issue or PR as related to documentation. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants