proposal: encoding/pem: add DecodeStrict #34069

tux21b · 2019-09-04T10:02:00Z

The "encoding/pem".Decode function in the standard library was designed to find and decode PEM blocks within arbitrary text (like emails according to a comment by rsc). Therefore, it tries to find BEGIN blocks anywhere and if parsing fails, it automatically backtracks and tries to decode the next block. No errors are reported.

Nowadays, this function is mainly used to decode TLS certificates, as also stated by the package comment. Within this context, it is really annoying to not see any errors at all. Lots of server software is written in Go, and invalid characters within those files (including whitespace or a Windows line ending at the wrong location) usually cause the software to silently ignore the certificate.
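
The silent skipping described here can be reproduced with a short program. This is only an illustrative sketch: `countBlocks` and the `TEST` block type are made up for the example, and the payload is just the base64 encoding of "hello" rather than a real certificate.

```go
package main

import (
	"encoding/pem"
	"fmt"
)

// countBlocks (a hypothetical helper) runs the usual pem.Decode loop and
// reports how many blocks were returned. Decode has no error result, so
// this count is all the caller learns about the input.
func countBlocks(data []byte) int {
	n := 0
	for {
		block, rest := pem.Decode(data)
		if block == nil {
			return n
		}
		n++
		data = rest
	}
}

const good = "-----BEGIN TEST-----\naGVsbG8=\n-----END TEST-----\n"

// Same payload, but the END label does not match the BEGIN label.
const bad = "-----BEGIN TEST-----\naGVsbG8=\n-----END TSET-----\n"

func main() {
	fmt.Println(countBlocks([]byte(good)))       // 1
	fmt.Println(countBlocks([]byte(bad + good))) // 1: the malformed block is skipped, no error
	fmt.Println(countBlocks([]byte(bad)))        // 0: indistinguishable from "no PEM here"
}
```

From the caller's side, a file with one broken certificate followed by one valid certificate looks exactly like a file with a single valid certificate.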

I currently work at a consulting company, and we regularly receive reports from clients that can be traced back to silently ignored errors when decoding PEM files. Until now, I didn't know about this rather unusual behavior in the standard library either.

Can we please add an "encoding/pem.DecodeStrict" function for the common case of decoding certificates? We should probably also add a warning to the Decode function, because I think it is often used wrongly.

What version of Go are you using (`go version`)?

$ go version
go version go1.13 linux/amd64


rsc · 2019-09-12T21:26:08Z

Right now we have:

func Decode(data []byte) (p *Block, rest []byte)

There's no way to say "this is a malformed block". This was definitely a mistake in retrospect.

The proposal is to add

func DecodeBlock(data []byte) (*Block, error)

where data is exactly a PEM-encoded block, with no non-blank text before or after. This isn't as useful for certificate chains (sequences of PEM-encoded things).

This doesn't seem quite right either. Maybe a callback interface that is passed text chunks and decoded blocks and decode errors?

/cc @bradfitz @FiloSottile @agl;
thoughts about what the best new PEM decoding API would be?
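
For readers wondering what the DecodeBlock idea above might look like, here is a rough approximation built on top of the existing `pem.Decode`. It is a sketch under assumptions, not the proposed implementation: `decodeBlock`, its error messages, and the "exactly one BEGIN marker" check are all invented for illustration.

```go
package main

import (
	"bytes"
	"encoding/pem"
	"errors"
	"fmt"
)

var beginMarker = []byte("-----BEGIN ")

// decodeBlock is a hypothetical helper, not part of encoding/pem. It
// accepts input that is exactly one PEM block, allowing only blank
// space before and after it.
func decodeBlock(data []byte) (*pem.Block, error) {
	trimmed := bytes.TrimSpace(data)
	if !bytes.HasPrefix(trimmed, beginMarker) {
		return nil, errors.New("pem: input does not start with a BEGIN line")
	}
	// More than one BEGIN marker means either several blocks or a
	// malformed block that pem.Decode would silently skip.
	if bytes.Count(trimmed, beginMarker) != 1 {
		return nil, errors.New("pem: more than one BEGIN line")
	}
	block, rest := pem.Decode(trimmed)
	if block == nil {
		return nil, errors.New("pem: malformed block")
	}
	if len(bytes.TrimSpace(rest)) != 0 {
		return nil, errors.New("pem: unexpected text after END line")
	}
	return block, nil
}

func main() {
	inputs := []string{
		"-----BEGIN TEST-----\naGVsbG8=\n-----END TEST-----\n",
		"-----BEGIN TEST-----\naGVsbG8=\n-----END TSET-----\n",
		"# comment\n-----BEGIN TEST-----\naGVsbG8=\n-----END TEST-----\n",
	}
	for _, in := range inputs {
		block, err := decodeBlock([]byte(in))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: %q\n", block.Type, block.Bytes)
	}
}
```

As noted above, this shape does not help with certificate chains, and it rejects the surrounding comments that PEM files commonly carry.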

agl · 2019-09-12T21:58:20Z

PEM is intended to ignore everything that's not a PEM block. This is commonly used so that people can put comments around the, otherwise completely opaque, blocks and so that tools can extract them from text files etc. See https://pki.goog/roots.pem for an example of this. Adding a strict function would break this and misunderstands the point of PEM. Maybe PEM should have defined a clear comment syntax so that anything else could be rejected, but I'm afraid it didn't.

If there's a single PEM block expected then not finding it is a clear indication of a problem. If there's an unknown number of PEM blocks expected, however, then the flexibility of PEM can be an issue. In that case, perhaps PEM isn't the right tool and you should just distribute binary DER files?

rsc · 2019-09-25T17:40:08Z

@agl, I agree with all that. But is it worth distinguishing "we found what looked like the beginning of a PEM block but not the end", or "the base64 data in the middle is not valid" or other kinds of problems? That is, should the API at least be able to distinguish "nothing is here" from "something starts here but is wrong"?

Maybe a Decode that returned all the blocks and an error would solve the error reporting problem and also relieve people of writing the loop?

agl · 2019-09-26T16:08:00Z

> That is, should the API at least be able to distinguish "nothing is here" from "something starts here but is wrong"?

That would be sub-setting the file format, but there might be utility in doing that. The cost is that reality is defined by the intersection of what's accepted by all the common tools: so if Go considers pattern x to be indicative of a problem, and something else considers pattern y to be so, then nobody can do either x or y in practice. Also, since these behaviours will be obscure, parties might only find out that they're hitting these issues late, when things are expensive to fix etc.

But, if broken PEM files are causing issues, that's a cost too which has to be balanced against.

So I still believe that a “strict” interface would be a mistake because that maximally favours the latter interest, but the current code follows normal PEM rules, which maximally favour the former.

Off the cuff, I could see that finding the substring “--- BEGIN ” in the input, but not part of a recognised PEM block, would be the sort of pattern that might strike a balance. Also making things up on the spot, since this is only an issue when an unknown number of PEM blocks are expected in an input:

// DecodeSeveral parses zero or more PEM blocks from input and returns them.
// It also detects patterns that indicate that a PEM block might have been
// intended, but which weren't part of a valid PEM block, and returns a sorted
// slice of the line numbers on which these concerns were found. If this slice is
// non-empty, one may wish to indicate to a human that the PEM input may
// be ill-formed and direct their attention to the indicated lines.
func DecodeSeveral(input []byte) (blocks []Block, contraindicationLines []int)
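
One way the DecodeSeveral idea above could be approximated with today's API is sketched below. Everything here is an assumption for illustration: `decodeSeveral` and `markerOffsets` are hypothetical names, the heuristic is simply "a `-----BEGIN ` marker that `pem.Decode` passed over", and it assumes no BEGIN marker appears inside a valid block's headers.

```go
package main

import (
	"bytes"
	"encoding/pem"
	"fmt"
)

var beginMarker = []byte("-----BEGIN ")

// markerOffsets returns the offset of every BEGIN marker in data.
func markerOffsets(data []byte) []int {
	var offsets []int
	for i := 0; ; {
		j := bytes.Index(data[i:], beginMarker)
		if j < 0 {
			return offsets
		}
		offsets = append(offsets, i+j)
		i += j + len(beginMarker)
	}
}

// decodeSeveral (hypothetical) returns every block pem.Decode finds plus
// the 1-based line numbers of BEGIN markers that did not yield a block.
func decodeSeveral(input []byte) (blocks []pem.Block, suspectLines []int) {
	rest := input
	for {
		block, next := pem.Decode(rest)
		// Bytes examined in this round: everything up to the end of the
		// returned block, or all remaining input if no block was found.
		consumed := rest[:len(rest)-len(next)]
		if block == nil {
			consumed = rest
		}
		offsets := markerOffsets(consumed)
		if block != nil {
			// Assumption: the last marker belongs to the block that decoded.
			offsets = offsets[:len(offsets)-1]
		}
		base := len(input) - len(rest)
		for _, off := range offsets {
			line := 1 + bytes.Count(input[:base+off], []byte("\n"))
			suspectLines = append(suspectLines, line)
		}
		if block == nil {
			return blocks, suspectLines
		}
		blocks = append(blocks, *block)
		rest = next
	}
}

func main() {
	input := []byte("# leading comment\n" +
		"-----BEGIN TEST-----\naGVsbG8=\n-----END TSET-----\n" +
		"-----BEGIN TEST-----\naGVsbG8=\n-----END TEST-----\n")
	blocks, suspect := decodeSeveral(input)
	fmt.Println(len(blocks), suspect) // 1 [2]
}
```

Ordinary comments around blocks are still accepted; only text that looks like the start of a block is reported, which is roughly the balance described above.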

rsc · 2019-10-02T17:44:14Z

Based on the conversation here it seems like there is not really a consensus that this is a problem that needs to be solved in the standard library, nor what the solution would be.

I would suggest to @tux21b to write and publish a helper package that does what you think would work best and see if others feel the same way and start using it. That would give us more signal for whether something in the core encoding/pem is worth adding.

For now, this seems like a likely decline.

Leaving open for a week for final comments.

rsc · 2019-10-09T17:05:04Z

No final comments. Declined.

anitgandhi · 2020-05-19T22:11:01Z

I came across this as I was in need of the functionality outlined in this discussion.

Use case

We provide a managed certificates product that supports bring-your-own-certificate, which is then attached to cloud resources, which may use OpenSSL under the hood, which has stricter PEM requirements.

(One can replicate the OpenSSL behavior with `openssl crl2pkcs7 -nocrl -out /dev/null -certfile chain.pem`.)

We saw a situation where a customer provided a certificate chain with multiple PEM blocks, one or more of which were invalid. Our existing Go-based validation did not catch any errors, while OpenSSL down the line did, i.e. something like this:

Note how the first certificate is otherwise valid, except the ending line, which makes it invalid; at which point, Decode just carries on - correctly skipping the terminal shell lines, and processes the last 2. As mentioned, OpenSSL is stricter, and errored out earlier.

Possible solution

I ended up creating a drop-in replacement that adds a new exported function DecodeStrict that returns an error on the first invalid PEM block found.

It moves all of the decode logic to an internal helper, decodeWithErrorHandler, otherwise unchanged. This helper takes in f, which is a callback for custom error handling. Decode simply calls decodeWithErrorHandler(data, decodeError), which maintains all existing behavior.

This wasn't exactly what was described above, nor is it necessarily something we want in the stdlib in its current form, but it was the smallest diff to get things working, and all existing tests pass as expected.

DecodeStrict as I implemented it is a bit of an all-or-nothing solution, so the API is still open to debate. But in the future if more granularity is needed, decodeWithErrorHandler could also be exported, and the callback could be extended in a way that allows for a more meaningful error to be bubbled up to the caller.

Proposal

Personally, I think this is worth solving in the stdlib rather than folks having to copy the implementation details of Decode and its helpers.

ianlancetaylor · 2020-05-19T23:40:24Z

@anitgandhi This issue is closed, for the reasons given above. As @rsc suggests above, the path forward is to create a helper package and see if people start using it. Thanks.

anitgandhi · 2020-05-19T23:42:28Z

Yeah I understand. I should have been clear that my intention was largely to provide a link and context to any future people that visit this issue; my bad on that.

Thanks!

ianlancetaylor · 2020-05-19T23:56:34Z

@anitgandhi Ah, OK, thanks.

ALTree changed the title ~~encoding/pem: add DecodeStrict~~ proposal: encoding/pem: add DecodeStrict Sep 4, 2019

gopherbot added this to the Proposal milestone Sep 4, 2019

gopherbot added the Proposal label Sep 4, 2019

andybons mentioned this issue Sep 12, 2019

proposal: review meeting minutes #33502

Open

rsc closed this as completed Oct 9, 2019

golang locked and limited conversation to collaborators May 19, 2021

gopherbot added the FrozenDueToAge label May 19, 2021
