Review CertificateParser to support new tika "x-x509-cert" contentType. #1978
patrickdalla added a commit that referenced this issue Nov 14, 2023
classify DER and PEM encoded certificates as this kind of mimetype. Other minor formatting changes were included.
Thanks @patrickdalla. I'll try to crawl certificate samples to test CertificateParser, so we can enable it by default if everything seems good.
patrickdalla added a commit that referenced this issue Nov 14, 2023
to be expanded and have its certificates extracted as subitems.
patrickdalla added a commit that referenced this issue Nov 14, 2023
as subitems if in format PKCS7 and to be used in conjunction with tika PKCS7Parser.
I have just reviewed the Pkcs7Parser code from Tika.
PKCS7 is a container spec that holds content and its signature info in the same file/stream. Tika's Pkcs7Parser only strips/ignores the signature and delegates the content parsing to the corresponding parser. Pkcs7Parser doesn't parse the signature or the respective certificate information.
PKCS7 is mostly used to store certificate revocation lists and certificate files themselves (when they include all the certificates of the certification path). CertificateParser uses java.security.cert.CertificateFactory, which can extract the certificates contained in these PKCS7-formatted files.
PKCS7 is not the format of the certificate used to sign the APK.
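
To illustrate the CertificateFactory point above, here is a minimal sketch of how certificates can be pulled out of a stream with the standard JDK API. It is not the CertificateParser code itself: the class name `CertificateExtractionSketch` and the method `extractCertificates` are hypothetical names chosen for this example. It relies on the documented behavior of `generateCertificates` for the "X.509" factory, which accepts DER or PEM encoded certificates as well as a PKCS#7 certificate chain.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.cert.Certificate;
import java.security.cert.CertificateException;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of IPED or Tika.
class CertificateExtractionSketch {

    // Reads every certificate the X.509 factory can find in the stream.
    // The input may be DER, PEM, or a PKCS#7 certificate chain.
    static List<X509Certificate> extractCertificates(InputStream in)
            throws CertificateException {
        CertificateFactory factory = CertificateFactory.getInstance("X.509");
        List<X509Certificate> result = new ArrayList<>();
        for (Certificate cert : factory.generateCertificates(in)) {
            if (cert instanceof X509Certificate) {
                result.add((X509Certificate) cert);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: CertificateExtractionSketch <file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            // Each extracted certificate could become a subitem of the container.
            for (X509Certificate cert : extractCertificates(in)) {
                System.out.println(cert.getSubjectX500Principal().getName());
            }
        }
    }
}
```

Run against a PKCS7 file that bundles a certification path, this should list one subject per certificate in the chain, which is the behavior a subitem-extracting parser would build on.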
It seems from https://issues.apache.org/jira/browse/TIKA-3205, which was implemented after CertificateParser was written, that Tika previously did not classify PEM and DER files as "x-x509-ca-cert". But now it does.
I had created the "application/x-pem-file" and "application/pkix-cert" mime types in CertificateParser to identify this kind of content, but it now seems it can use the new "application/x-x509-ca-cert" type identified by Tika.