-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: encoding/pem: add DecodeStrict #34069
Comments
Right now we have:
There's no way to say "this is a malformed block". This was definitely a mistake in retrospect. The proposal is to add
where data is exactly a PEM-encoded block, with no non-blank text before or after. This isn't as useful for certificate chains (sequences of PEM-encoded things). This doesn't seem quite right either. Maybe a callback interface that is passed text chunks and decoded blocks and decode errors? /cc @bradfitz @FiloSottile @agl; |
PEM is intended to ignore everything that's not a PEM block. This is commonly used so that people can put comments around the, otherwise completely opaque, blocks and so that tools can extract them from text files etc. See https://pki.goog/roots.pem for an example of this. Adding a strict function would break this and misunderstands the point of PEM. Maybe PEM should have defined a clear comment syntax so that anything else could be rejected, but I'm afraid it didn't. If there's single PEM block expected then not finding it is a clear indication of a problem. If there's an unknown number of PEM blocks expected, however, then the flexibility of PEM can be an issue. In that case, perhaps PEM isn't the right tool and you should just distribute binary DER files? |
@agl, I agree with all that. But is it worth distinguishing "we found what looked like the beginning of a PEM block but not the end", or "the base64 data in the middle is not valid" or other kinds of problems? That is, should the API at least be able to distinguish "nothing is here" from "something starts here but is wrong"? Maybe a Decode that returned all the blocks and an error would solve the error reporting problem and also relieve people of writing the loop? |
That would be sub-setting the file format, but there might be utility in doing that. The cost is that reality is defined by the intersection of what's accepted by all the common tools: so if Go considers pattern x to be indicative of a problem, and something else considers pattern y to be so, then nobody can do either x nor y in practice. Also, since these behaviours will be obscure, parties might only find out that they're hitting these issues late, when things are expensive to fix etc. But, if broken PEM files are causing issues, that's a cost too which has to be balanced against. So I still believe that a “strict” interface would be a mistake because that maximally favours the latter interest, but the current code follows normal PEM rules, which maximally favour the former. Off the cuff, I could see that finding the substring “--- BEGIN ” in the input, but not part of a recognised PEM block, would be the sort of pattern that might strike a balance. Also making things up on the spot, since this is only an issue when an unknown number of PEM blocks are expected in an input:
|
Based on the conversation here it seems like there is not really a consensus that this is a problem that needs to be solved in the standard library, nor what the solution would be. I would suggest to @tux21b to write and publish a helper package that does what you think would work best and see if others feel the same way and start using it. That would give us more signal for whether something in the core encoding/pem is worth adding. For now, this seems like a likely decline. Leaving open for a week for final comments. |
No final comments. Declined |
I came across this as I was in need of the functionality outlined in this discussion. UsecaseWe provide a managed certificates product that supports bring-your-own-certificate, which is then attached to cloud resources, which may use OpenSSL under the hood, which has stricter PEM requirements. (One can replication the OpenSSL behavior with We saw a situation where a customer provided a certificate chain with multiple PEM blocks, one or more of which were invalid. Our existing Go based validation did not catch any errors, while OpenSSL down the line did. i.e. something like this:
Note how the first certificate is otherwise valid, except the ending line, which makes it invalid; at which point, Possible solutionI ended up creating a drop-in replacement that adds a new exported function It moves all of the decode logic to an internal helper This wasn't exactly what was described above, nor is it necessarily something we want in the stdlib in its current form, but it was the smallest diff to get things working, and all existing tests pass as expected.
ProposalPersonally, I think this is worth solving in the stdlib rather than folks having to copy the implementation details of |
@anitgandhi This issue is closed, for the reasons given above. As @rsc suggests above, the path forward is to create a helper package and see if people start using it. Thanks. |
Yeah I understand. I should have been clear that my intention was largely to provide a link and context to any future people that visit this issue; my bad on that. Thanks! |
@anitgandhi Ah, OK, thanks. |
The
"encoding/pem".Decode
function in the standard library was designed to find and decode PEM blocks within arbitrary text (like emails according to a comment by rsc). Therefore, it tries to find BEGIN blocks anywhere and if parsing fails, it automatically backtracks and tries to decode the next block. No errors are reported.Nowadays, this function is mainly used to decode TLS certificates as also stated by the package comment. Within this context, it is really annoying to not see any errors at all. Lot's of server software is written in Go, and invalid characters within those files (including white spaces or a windows line-ending at a wrong location) usually causes the software to silently ignore the certificate.
I currently work at a consulting company and we regularly have reports from clients that can be traced back to silently ignored errors when decoding PEM files and until now, I didn't know about this rather unusually behavior in the standard library either.
Can we please add a
"encoding/pem.DecodeStrict
" function for the common case of decoding certificates? We should probably also add a warning to theDecode
function as well, because I think it is often used wrongly.What version of Go are you using (
go version
)?The text was updated successfully, but these errors were encountered: