Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly handle newlines in serial files #18405

Merged

Conversation

enj
Copy link
Contributor

@enj enj commented Feb 2, 2018

Correctly handle newlines in SerialFileGenerator

This change makes it so that SerialFileGenerator correctly reads and writes serial files that end in a newline. This allows it to interoperate with other tools that may interact with the serial file such as openssl. Tests were added to assert that the incrementing logic works and that the serial file always ends with a newline.

Bug 1512825

Signed-off-by: Monis Khan mkhan@redhat.com


/kind bug
/assign @simo5 @deads2k
@openshift/sig-security

@openshift-ci-robot openshift-ci-robot added kind/bug Categorizes issue or PR as related to a bug. sig/security size/L Denotes a PR that changes 100-499 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Feb 2, 2018
@enj enj changed the title Do not overwrite by default in create-{server|signer}-cert Correctly handle newlines in serial files Feb 2, 2018
@@ -76,7 +76,7 @@ func NewCommandCreateServerCert(commandName string, fullName string, out io.Writ
flags.StringVar(&options.KeyFile, "key", "", "The key file. Choose a name that indicates what the service is.")

flags.StringSliceVar(&options.Hostnames, "hostnames", options.Hostnames, "Every hostname or IP you want server certs to be valid for. Comma delimited list")
flags.BoolVar(&options.Overwrite, "overwrite", true, "Overwrite existing cert files if found. If false, any existing file will be left as-is.")
flags.BoolVar(&options.Overwrite, "overwrite", false, "Overwrite existing cert files if found. If false, any existing file will be left as-is.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably breaking. It means things that run this for cert renewal/rotation will break

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to understand the logic behind this.
Is this affecting all kind of certs equally ?
I would definitely be ok overwriting server certs, everything in the infra ideally needs to be able to work when a cert rotates.
However if this utility also deal woth the CA cert or other root-of-trust related certs, then a default overwrite is a sharp blade wth no handle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the help text from current master:

$ oc adm ca create-server-cert -h
Create a key and server certificate 

Create a key and server certificate valid for the specified hostnames, signed by the specified CA. These are useful for
securing infrastructure components such as the router, authentication server, etc. 

Example: Creating a secure router certificate. 

  CA=openshift.local.config/master
      oc adm ca create-server-cert --signer-cert=$CA/ca.crt \
            --signer-key=$CA/ca.key --signer-serial=$CA/ca.serial.txt \
            --hostnames='*.cloudapps.example.com' \
            --cert=cloudapps.crt --key=cloudapps.key
  cat cloudapps.crt cloudapps.key $CA/ca.crt > cloudapps.router.pem

Usage:
  oc adm ca create-server-cert [options]

Options:
      --cert='': The certificate file. Choose a name that indicates what the service is.
      --expire-days=730: Validity of the certificate in days (defaults to 2 years). WARNING: extending this above default value is highly discouraged.
      --hostnames=[]: Every hostname or IP you want server certs to be valid for. Comma delimited list
      --key='': The key file. Choose a name that indicates what the service is.
      --overwrite=true: Overwrite existing cert files if found.  If false, any existing file will be left as-is.
      --signer-cert='openshift.local.config/master/ca.crt': The certificate file.
      --signer-key='openshift.local.config/master/ca.key': The key file.
      --signer-serial='openshift.local.config/master/ca.serial.txt': The serial file that keeps track of how many certs have been signed.

Use "oc adm options" for a list of global command-line options (applies to all commands).
$ oc adm ca create-signer-cert -h 
Create a self-signed CA key/cert for signing certificates used by server components.

Usage:
  oc adm ca create-signer-cert [options]

Options:
      --cert='openshift.local.config/master/ca.crt': The certificate file.
      --expire-days=1825: Validity of the certificate in days (defaults to 5 years). WARNING: extending this above default value is highly discouraged.
      --key='openshift.local.config/master/ca.key': The key file.
      --name='openshift-signer@1517594420': The name of the signer.
      --overwrite=true: Overwrite existing cert files if found.  If false, any existing file will be left as-is.
      --serial='openshift.local.config/master/ca.serial.txt': The serial file that keeps track of how many certs have been signed.

Use "oc adm options" for a list of global command-line options (applies to all commands).

I guess it may be OK to overwrite since it is not the root CA bits?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sounds like overwrite = true is not the end of the world here.
So perhaps it is better to keep backwards compatibility

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with leaving overwrite defaulting to true. the rest looks OK to me

@@ -351,6 +337,16 @@ func NewSerialFileGenerator(serialFile string, createIfNeeded bool) (*SerialFile
func (s *SerialFileGenerator) Next(template *x509.Certificate) (int64, error) {
s.lock.Lock()
defer s.lock.Unlock()

// do a best effort check to make sure concurrent external writes are not occurring to the underlying serial file
serial, err := fileToSerial(s.SerialFile)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems futile… we know this utility isn't safe to run concurrently. Making the occurrences of races harder to notice doesn't really make it actually safe. Randomizing the serial would be better if we want to allow concurrent running

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to lock the serial while the utility runs ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making the occurrences of races harder to notice doesn't really make it actually safe.

How does this do that? All it does is blow up when it definitely detects that something is writing to the same file. Thus, instead of always silently ignoring and overwriting changes to the underlying file, it has some chance of noticing so the user is clearly aware that they are doing something wrong. If it fails to detect concurrent writes, it is still no worse than not checking at all.

Randomizing the serial would be better if we want to allow concurrent running

I do not intend on allowing concurrent running - I just want to fail hard if I notice concurrent running.

Is there any way to lock the serial while the utility runs ?

I am unaware of something like that. Would not that be operation system / file system specific?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on how locking is done. A traditional crude way of doing portable whole-file locking is to create a lock file. The problem with a lockfile is the need to manually clean it up if the tool that creates it dies before proper cleanup, so it is not a great tool.
Given here you are just detecting IMO your approach is fine as it is.

@@ -161,6 +197,9 @@ os::cmd::expect_success_and_text \
--expire-days=$((365*3))" \
'WARNING: .* is greater than 2 years'

os::cmd::expect_success_and_text "cat '${CERT_DIR}/ca.serial.txt'" '08'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd test only first 1-2 iterations. What is the value of testing the same for 8 times?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because different commands are interleaved throughout that are all incrementing this number.

This change makes it so that SerialFileGenerator correctly reads and
writes serial files that end in a newline.  This allows it to
interoperate with other tools that may interact with the serial file
such as openssl.  Tests were added to assert that the incrementing
logic works and that the serial file always ends with a newline.

Bug 1512825

Signed-off-by: Monis Khan <mkhan@redhat.com>
@enj enj force-pushed the enj/i/ca_serial_number/1512825 branch from 8d2e78b to 266aa46 Compare February 14, 2018 18:42
@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 14, 2018
@simo5
Copy link
Contributor

simo5 commented Feb 14, 2018

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 14, 2018
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enj, simo5

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@enj
Copy link
Contributor Author

enj commented Feb 14, 2018

/retest

@enj
Copy link
Contributor Author

enj commented Feb 14, 2018

/retest

2 similar comments
@enj
Copy link
Contributor Author

enj commented Feb 14, 2018

/retest

@enj
Copy link
Contributor Author

enj commented Feb 14, 2018

/retest

openshift-merge-robot added a commit to openshift/openshift-ansible that referenced this pull request Feb 14, 2018
Automatic merge from submit-queue.

Revert "Bug 1512825 - add mux pod failed for Serial number 02 has already been issued"

@sdodson @nhosoi This reverts #6798 as we are not changing the default in origin openshift/origin#18405.

This reverts commit ac23e6e.

That commit introduced a backwards incompatible change to how the commands run.  This undoes that.  The original change was not required to prevent overwriting of the serial file.

Bug 1512825
@enj
Copy link
Contributor Author

enj commented Feb 15, 2018

/retest

3 similar comments
@enj
Copy link
Contributor Author

enj commented Feb 15, 2018

/retest

@enj
Copy link
Contributor Author

enj commented Feb 15, 2018

/retest

@enj
Copy link
Contributor Author

enj commented Feb 15, 2018

/retest

@enj
Copy link
Contributor Author

enj commented Feb 16, 2018

@simo5 @mrogers950 @tiran @sdodson @liggitt @stevekuznetsov

Maybe one of you guys has an idea on this error which is happening consistently after the change from this PR in this task:

https://github.com/openshift/openshift-ansible/blob/a5289b2c460402ad3b2e49751dbd7e50d560b2dc/roles/openshift_service_catalog/tasks/start_api_server.yml#L13

curl: (35) You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.

The solution seems to be use a different version of curl or different backend or clear the cert cache? https://curl.haxx.se/mail/lib-2017-08/0089.html

Is it safe to reset the NSS db in the ansible task?

@tiran
Copy link
Contributor

tiran commented Feb 20, 2018

@enj you are correct. Fedora, CentOS and RHEL build curl with NSS as backend. NSS uses a database for certs instead of PEM files on the file system. NSS complains that you have multiple certs with the same issuer name and serial.

Is the NSS database a private database owned by the test or a system-wide database? In the first case you are safe to remove the DB and create a new one. In both cases you could remove the old cert first, but you need to know the nick name:

certutil -d /path/to/nssdb/directory -D -n "nick name"

In case it's a cert with private key, you can use -F instead of -D. It will remove the private key and cert together. For non-RSA certs, you also need to -k option for key type. For encrypted NSS DB, you have to store the password in a file and use -f /path/to/password_file.

enj added a commit to enj/openshift-ansible that referenced this pull request Feb 20, 2018
This should prevent issues with curl getting upset with certs that have the same serial number and name.

See openshift/origin#18405
openshift-merge-robot added a commit to openshift/openshift-ansible that referenced this pull request Feb 20, 2018
Automatic merge from submit-queue.

Update API healthz check to use uri module

This should prevent issues with curl getting upset with certs that have the same serial number and name.

See openshift/origin#18405

/assign @sdodson
@enj
Copy link
Contributor Author

enj commented Feb 20, 2018

/retest

@openshift-merge-robot
Copy link
Contributor

/test all [submit-queue is verifying that this PR is safe to merge]

@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 21, 2018

@enj: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/openshift-jenkins/cmd 266aa46 link /test cmd

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot
Copy link
Contributor

Automatic merge from submit-queue (batch tested with PRs 18404, 18405).

@openshift-merge-robot openshift-merge-robot merged commit 7642cc8 into openshift:master Feb 21, 2018
vrutkovs pushed a commit to vrutkovs/openshift-ansible that referenced this pull request Feb 26, 2018
This should prevent issues with curl getting upset with certs that have the same serial number and name.

See openshift/origin#18405
@simo5
Copy link
Contributor

simo5 commented Feb 26, 2018

@tiran not sure if this makes any difference but in Fedora Rawhide curl has been switched back top OpenSSL, so things may change in future ...

vrutkovs pushed a commit to vrutkovs/openshift-ansible that referenced this pull request Mar 8, 2018
This should prevent issues with curl getting upset with certs that have the same serial number and name.

See openshift/origin#18405
leseb pushed a commit to leseb/openshift-ansible that referenced this pull request Dec 14, 2018
This should prevent issues with curl getting upset with certs that have the same serial number and name.

See openshift/origin#18405
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. sig/security size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants