Tool to extract rules from any CIS benchmark PDF, written in Go.
The tool had a 100% success rate on all tested PDFs as of May 2022 (measured by the number of rules found and successful content extraction). Some rules might still be missed, and if CIS makes structural changes to its PDFs, the tool will need to be updated.
Supports CSV and YAML output and preserves line breaks by default. I made this primarily to improve my regex skills.
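As a rough illustration of the regex-based approach, here is a minimal Go sketch that picks rule headings such as "1.1.1 Ensure ... (Automated)" out of plain text produced by pdftotext. The pattern, the RuleHeading type and findHeadings are hypothetical and far simpler than the tool's actual expressions. The sketch assumes that a rule title sits on a single line and ends with its assessment status in parentheses; real CIS PDFs often wrap long titles across lines, which this sketch does not handle.

```go
package main

import (
	"fmt"
	"regexp"
)

// RuleHeading holds the parts of a CIS rule title line.
// Hypothetical type for illustration; not taken from cisextractor.
type RuleHeading struct {
	ID         string // e.g. "1.1.1"
	Title      string // e.g. "Ensure mounting of cramfs filesystems is disabled"
	Assessment string // "Automated", "Manual", "Scored" or "Not Scored"
}

// headingRe matches lines like "1.1.1 Ensure ... (Automated)".
// (?m) makes ^ and $ match at line boundaries; [ \t] is used instead of \s
// so that a match never spans more than one line.
var headingRe = regexp.MustCompile(
	`(?m)^(\d+(?:\.\d+)+)[ \t]+(.+?)[ \t]+\((Automated|Manual|Scored|Not Scored)\)[ \t\r]*$`)

// findHeadings returns every rule heading found in pdftotext output.
func findHeadings(text string) []RuleHeading {
	var out []RuleHeading
	for _, m := range headingRe.FindAllStringSubmatch(text, -1) {
		out = append(out, RuleHeading{ID: m[1], Title: m[2], Assessment: m[3]})
	}
	return out
}

func main() {
	sample := "1.1 Filesystem Configuration\n" +
		"1.1.1 Ensure mounting of cramfs filesystems is disabled (Automated)\n" +
		"Profile Applicability:\n" +
		"1.1.2 Ensure /tmp is configured (Manual)\n"
	// Section headings without a status (like "1.1 Filesystem Configuration") are skipped.
	for _, h := range findHeadings(sample) {
		fmt.Printf("%s | %s | %s\n", h.ID, h.Title, h.Assessment)
	}
}
```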
Please report any issues and create a pull request if you have any improvements :)
⚠️ Note: Some antivirus software flags Go executables as trojans, especially on Windows (see the official statement from Go). This is a false positive. If you want to be extra sure, check the source code and build script yourself.
- Download the prebuilt executable for your OS (see Releases) or build from source yourself (see below)
Linux:
- Add execution permissions to the file (chmod +x <file>)
- Install poppler-utils: sudo apt install poppler-utils or sudo pacman -S poppler
- Run the tool in your favourite console
Windows:
- Unblock the file in PowerShell (Unblock-File <file>)
- Install poppler-utils:
  - Download and extract the latest package from here
  - Add the bin/ directory to your PATH environment variable
- Run the tool in your favourite console
- If you have antivirus software installed, the tool may take a while to start; whitelist it or confirm the notification
macOS:
- Add execution permissions to the file (chmod +x <file>)
- Install poppler (provides pdftotext): brew install poppler
- Run the tool in your favourite console
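Whichever OS you use, the tool depends on pdftotext being reachable on your PATH. A minimal Go sketch of such a check, using the standard library's exec.LookPath (which also tries the PATHEXT extensions such as .exe on Windows); findTool is a hypothetical helper, not part of cisextractor:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// findTool returns the path of an executable found on PATH,
// or an error explaining that it is missing. Hypothetical helper.
func findTool(name string) (string, error) {
	path, err := exec.LookPath(name)
	if err != nil {
		return "", fmt.Errorf("%s not found on PATH: %w", name, err)
	}
	return path, nil
}

func main() {
	path, err := findTool("pdftotext")
	if err != nil {
		// Typical causes: poppler not installed, or its bin/ directory not on PATH.
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("using", path)
}
```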
usage: cisextractor_windows_amd64.exe --in=IN [<flags>]
Flags:
--help Show context-sensitive help (also try --help-long and --help-man).
-i, --in=IN Filepath to parse (PDF)
-o, --out=OUT Optional: Output to filepath (default: ./<pdfname>.<csv|yaml>)
-t, --trimsections Optional: Remove all new line characters from section content (default=disabled)
-c, --csv Optional: Export in CSV format (default=YAML)
-d, --details Optional: Shows details for identification errors if needed (default=disabled)
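The defaults for --out and --trimsections listed above can be sketched as two small Go helpers. Both defaultOutPath and trimSection are hypothetical; in particular, replacing each line break with a single space is an assumption about what "remove all new line characters" means in practice, and the tool's exact behaviour may differ.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// defaultOutPath derives "./<pdfname>.<csv|yaml>" from the input PDF path,
// following the default described for --out. Hypothetical helper.
func defaultOutPath(in string, csv bool) string {
	name := strings.TrimSuffix(filepath.Base(in), filepath.Ext(in))
	ext := ".yaml" // YAML is the default format
	if csv {
		ext = ".csv"
	}
	return "./" + name + ext
}

// lineBreaks turns Windows (\r\n) and Unix (\n) line breaks into spaces;
// "\r\n" is listed first so it is matched before the lone "\n".
var lineBreaks = strings.NewReplacer("\r\n", " ", "\n", " ")

// trimSection approximates --trimsections by flattening a section to one line.
// Hypothetical helper; assumes each line break becomes one space.
func trimSection(s string) string {
	return lineBreaks.Replace(s)
}

func main() {
	fmt.Println(defaultOutPath("docs/CIS_Debian_Benchmark.pdf", false))
	fmt.Println(defaultOutPath("docs/CIS_Debian_Benchmark.pdf", true))
	fmt.Println(trimSection("Run:\n# modprobe -n -v cramfs\r\nDone"))
}
```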
Additional notes:
- Some rules may not be identified, or their contents may not be extracted correctly, especially if fundamental changes to the PDF files break the tool's (quite complex) regular expressions. No issues were found with the latest CIS PDFs as of June 2022.
- Section contents are not always nicely formatted, despite extensive post-processing
- Use the -t flag if you want to remove all line breaks (this can make some sections harder to read)
- The "CIS Controls" table is not well formatted yet; additional processing may be added in the future
- ONLY the pdftotext executable found in poppler-utils will work!
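To confirm that the pdftotext on your PATH is the poppler-utils build, you can inspect its version banner. A minimal Go sketch, assuming that the output of pdftotext -v from poppler mentions "Poppler" while the Xpdf build does not (true for the versions we are aware of, but treat it as an assumption); isPopplerVersion and pdftotextIsPoppler are hypothetical helpers:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// isPopplerVersion reports whether a "pdftotext -v" banner looks like poppler's.
// Assumption: poppler's banner credits "The Poppler Developers"; Xpdf's does not.
func isPopplerVersion(out string) bool {
	return strings.Contains(strings.ToLower(out), "poppler")
}

// pdftotextIsPoppler runs "pdftotext -v" and inspects the banner.
// The banner may be printed to stderr, so CombinedOutput captures both streams;
// a non-zero exit status is tolerated as long as something was printed.
func pdftotextIsPoppler() (bool, error) {
	out, err := exec.Command("pdftotext", "-v").CombinedOutput()
	if len(out) == 0 && err != nil {
		return false, err
	}
	return isPopplerVersion(string(out)), nil
}

func main() {
	ok, err := pdftotextIsPoppler()
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not run pdftotext:", err)
		os.Exit(1)
	}
	if !ok {
		fmt.Println("pdftotext found, but it does not look like the poppler-utils build")
		return
	}
	fmt.Println("pdftotext is provided by poppler-utils")
}
```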
Importing the CSV output into Excel:
- Add the -c flag when running the tool to use CSV mode
- In Excel, go to the Data tab, select From Text/CSV and choose the generated CSV file
- In the import dialogue, choose UTF-8 as the character set, make sure Comma is selected as the delimiter and click Transform Data
- In the Power Query editor, select the first column (ID), go to the Transform tab and set the Data Type to Text (otherwise IDs such as 1.1.1 or 1.10 may be converted to dates or numbers)
- Return to the Home tab and click Close & Load to load the data into the workbook
Building from source:
- This tool was developed with Go 1.17
- Clone this repository and run go mod tidy in the source folder, and you should be good to go
  - If you run into issues, delete go.mod and run go mod init <modulename> first
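- Use go run cisextractor.go to run the tool directly, or go build to build an executable for your OS
- To build for other operating systems and architectures, set the GOOS and GOARCH environment variables as usual (e.g. GOOS=windows GOARCH=amd64 go build)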
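Cross-compiling only changes GOOS and GOARCH, so a build script mainly needs to loop over target pairs and name the outputs. A minimal Go sketch of the naming part, assuming the cisextractor_<os>_<arch>[.exe] pattern seen in the executable name of the usage output; binaryName is hypothetical and the project's actual build script may differ:

```go
package main

import (
	"fmt"
	"runtime"
)

// binaryName builds a release file name such as "cisextractor_windows_amd64.exe".
// Hypothetical helper; the naming pattern is an assumption.
func binaryName(goos, goarch string) string {
	name := fmt.Sprintf("cisextractor_%s_%s", goos, goarch)
	if goos == "windows" {
		name += ".exe" // Windows executables need the .exe extension
	}
	return name
}

func main() {
	// Name for the platform this program itself was built for.
	fmt.Println(binaryName(runtime.GOOS, runtime.GOARCH))

	// Example targets for: GOOS=<os> GOARCH=<arch> go build -o <name>
	targets := [][2]string{{"linux", "amd64"}, {"windows", "amd64"}, {"darwin", "arm64"}}
	for _, t := range targets {
		fmt.Println(binaryName(t[0], t[1]))
	}
}
```

Each name can then be passed to go build -o, for example GOOS=darwin GOARCH=arm64 go build -o cisextractor_darwin_arm64.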