IOC Parser is a tool to extract indicators of compromise (IOCs) from security reports in PDF format. A good collection of APT-related reports with many IOCs can be found in APTNotes.
[-h] [-p INI] [-i FORMAT] [-o FORMAT] [-d] [-l LIB] FILE
- FILE File/directory path to report(s)
- -p INI Pattern file
- -i FORMAT Input format (pdf/txt/html)
- -o FORMAT Output format (csv/json/yara)
- -d Deduplicate matches
- -l LIB Parsing library
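To make the option list concrete, here is a minimal `argparse` sketch that mirrors the synopsis. It is not the tool's actual source: the program name `iocp`, the `dest` names, and the default values are assumptions made for illustration.

```python
import argparse


def build_arg_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    # The program name "iocp" is an assumption; the synopsis does not state it.
    parser = argparse.ArgumentParser(
        prog="iocp", description="Extract IOCs from security reports"
    )
    parser.add_argument("FILE", help="File/directory path to report(s)")
    parser.add_argument("-p", dest="INI", default=None, help="Pattern file")
    # Default formats below are assumptions, not documented behaviour.
    parser.add_argument("-i", dest="INPUT_FORMAT", metavar="FORMAT", default="pdf",
                        choices=["pdf", "txt", "html"], help="Input format")
    parser.add_argument("-o", dest="OUTPUT_FORMAT", metavar="FORMAT", default="csv",
                        choices=["csv", "json", "yara"], help="Output format")
    parser.add_argument("-d", dest="DEDUP", action="store_true",
                        help="Deduplicate matches")
    parser.add_argument("-l", dest="LIB", default=None, help="Parsing library")
    return parser


# Parse a sample command line: text input, JSON output, deduplication on.
args = build_arg_parser().parse_args(["-i", "txt", "-o", "json", "-d", "reports/"])
print(args.FILE, args.INPUT_FORMAT, args.OUTPUT_FORMAT, args.DEDUP)
```

Restricting `-i` and `-o` with `choices` rejects an unsupported format at parse time rather than deep inside the parser.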
Import IOC_Parser and create an iocp object with the 'data' output format, which returns the parsed IOCs as a dict.
from ioc_parser.iocp import IOC_Parser
iocp = IOC_Parser(output_format='data')
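To add a host to the whitelist after creating the iocp object, build a WhiteList from a dict. The IOC_Parser constructor already parses any whitelist_*.ini files found in the basedir; this approach lets you add whitelist entries inline. The snippet assumes the WhiteList class has already been imported.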
whitelist_host_str = "{}$".format("")
whitelist_dict = {"Host": whitelist_host_str}
wl = WhiteList(whitelist_dict=whitelist_dict)
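The `"{}$".format(...)` idiom above builds a regular expression anchored at the end of the host name. The sketch below shows how such a pattern behaves with Python's `re` module. It assumes the whitelist applies the pattern with a search-style match, which the text does not confirm. The helper `host_whitelist_pattern` and the host `example.com` are hypothetical placeholders.

```python
import re


def host_whitelist_pattern(host, escape=True):
    """Build an end-anchored pattern for a host, as in "{}$".format(host)."""
    # Escaping makes "." match a literal dot instead of any character.
    body = re.escape(host) if escape else host
    return "{}$".format(body)


# "example.com" is a placeholder, not a value from the original snippet.
loose = re.compile(host_whitelist_pattern("example.com", escape=False))
strict = re.compile(host_whitelist_pattern("example.com"))

print(bool(loose.search("www.example.com")))        # True
print(bool(loose.search("examplexcom")))            # True: "." matched "x"
print(bool(strict.search("examplexcom")))           # False
print(bool(strict.search("example.com.evil.net")))  # False: "$" anchors the end
```

Neither pattern is anchored at the start, so both also match any host that merely ends with the given string. Add a leading anchor or a dot boundary if that is too permissive.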
Open a file and pass the file object and its path to the parse_pdf_pdfminer method. Calling this method selects the pdfminer parser explicitly. Alternatively, specify the PDF parser in the IOC_Parser constructor and call parse_pdf here, or rely on the default PDF parser.
with open(pdf_path, "rb") as f:
    iocp.parse_pdf_pdfminer(f, pdf_path)
iocs = iocp.handler.get_iocs() # Returns a dictionary of any IOCs found
get_iocs() returns a dictionary in the following format:
"Email": {
"file": "report1.pdf",
"match": "",
"page": 4,
"path": "./downloaded_files/report1.pdf",
"type": "Email"
"IP": {
"file": "report1.pdf",
"match": "",
"page": 8,
"path": "./downloaded_files/report1.pdf",
"type": "IP"
One of the following PDF parsing libraries:
For HTML parsing support:
- BeautifulSoup - pip install beautifulsoup4
For HTTP(S) support:
- requests - pip install requests
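Because the HTML and HTTP(S) dependencies are optional, it can help to check which ones are importable before relying on those features. The sketch below uses only the standard library. The helper name `missing_optional_modules` is hypothetical and not part of IOC Parser.

```python
import importlib.util

# Import names for the optional packages listed above
# (the beautifulsoup4 distribution installs the "bs4" module).
OPTIONAL_MODULES = {"HTML parsing": "bs4", "HTTP(S) support": "requests"}


def missing_optional_modules(modules=OPTIONAL_MODULES):
    """Return {feature: module_name} for modules that cannot be found."""
    return {
        feature: name
        for feature, name in modules.items()
        if importlib.util.find_spec(name) is None
    }


for feature, name in missing_optional_modules().items():
    print("{} unavailable: module '{}' is not installed".format(feature, name))
```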