A Python 3 library for exploring EPSS scores with Polars

whitfieldsdad/epss

Exploit Prediction Scoring System (EPSS) tooling

This repository contains a fast Python 3 module and a set of Bash scripts that make it easy to work with the daily outputs of the Exploit Prediction Scoring System (EPSS).

⚠️ This project is under active development ⚠️

Features

  • Idempotently download daily sets of EPSS scores¹ in JSON, JSONL, CSV, or Apache Parquet² format
  • Explore EPSS scores using Polars, a fast dataframe library written in Rust
  • Optionally drop unchanged scores³
  • Optionally disable TLS certificate validation when downloading scores (e.g. to support environments where TLS interception is being performed)
  • Easily switch between different versions⁴ of the EPSS model

1. By default, EPSS scores will be downloaded from 2023-03-07 onward, as this is the date when the outputs of EPSS v3 (v2023.03.01) were first published.

2. Apache Parquet is the default file format.

3. The Cyentia Institute publishes sets of EPSS scores partitioned by date on a daily basis in GZIP compressed CSV format.

4. There have been three major versions of the EPSS model: EPSS v1, EPSS v2 (v2022.01.01), and EPSS v3 (v2023.03.01). Each new version introduced major improvements over its predecessor.
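
As a rough illustration of the upstream format described in note 3, the sketch below parses one day's GZIP compressed CSV using only the Python standard library. It is not this package's implementation. The leading `#model_version:...` comment line and the `cve,epss,percentile` header are assumptions about the published files, and `parse_epss_csv_gz` is a hypothetical helper name.

```python
import csv
import gzip


def parse_epss_csv_gz(data: bytes) -> list[dict]:
    """Parse one day's gzip-compressed EPSS CSV into a list of typed rows."""
    text = gzip.decompress(data).decode("utf-8")
    # Assumption: files may start with a comment line such as
    # "#model_version:...,score_date:..."; skip any line starting with "#".
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(lines)
    return [
        {
            "cve": row["cve"],
            "epss": float(row["epss"]),
            "percentile": float(row["percentile"]),
        }
        for row in reader
    ]


# A tiny in-memory sample so the example runs without network access.
sample = gzip.compress(
    b"#model_version:v2023.03.01,score_date:2024-01-01T00:00:00+0000\n"
    b"cve,epss,percentile\n"
    b"CVE-2019-2725,0.97572,1.0\n"
    b"CVE-2019-1653,0.97567,1.0\n"
)
rows = parse_epss_csv_gz(sample)
print(rows[0])
```

The upstream files carry no date column in this sketch; the `date` column shown in this project's output would have to come from the file's publication date.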

Background

The Exploit Prediction Scoring System (EPSS) is a probabilistic model that estimates the likelihood of a given computer security vulnerability being exploited in the wild within the next 30 days.

The first version of the EPSS model was released in 2021, and it has since undergone two major revisions.

The first version of the EPSS model used logistic regression, but subsequent models have used gradient-boosted decision trees (XGBoost) to make predictions.

For additional information on EPSS and its applications, please consult the following resources:

Additional resources:

Usage

Building

This package is not currently available on PyPI, but it can be added to your project in one of two ways:

  • Using Poetry¹:

poetry add git+https://github.com/whitfieldsdad/epss.git

By branch:

poetry add git+https://github.com/whitfieldsdad/epss.git#main

By tag:

poetry add git+https://github.com/whitfieldsdad/epss.git#v3.0.0

  • Using requirements.txt:

By tag:

git+https://github.com/whitfieldsdad/epss.git@v3.0.0

By branch:

git+https://github.com/whitfieldsdad/epss.git@main

1. The recommended approach is to use Poetry for dependency management and to add this project as a dependency without explicitly specifying a branch or tag.

Command line interface

Listing scores published between two dates

To list¹ all scores published since 2024-01-01 without dropping unchanged scores²:

poetry run epss scores -a 2024-01-01 --no-drop-unchanged | head
shape: (7_992_196, 4)
┌──────────────────┬─────────┬────────────┬────────────┐
│ cve              ┆ epss    ┆ percentile ┆ date       │
│ ---              ┆ ---     ┆ ---        ┆ ---        │
│ str              ┆ f64     ┆ f64        ┆ date       │
╞══════════════════╪═════════╪════════════╪════════════╡
│ CVE-2019-2725    ┆ 0.97572 ┆ 1.0        ┆ 2024-01-01 │
│ CVE-2019-1653    ┆ 0.97567 ┆ 1.0        ┆ 2024-01-01 │
│ CVE-2015-7297    ┆ 0.97564 ┆ 0.99999    ┆ 2024-01-01 │
│ CVE-2014-6271    ┆ 0.97564 ┆ 0.99999    ┆ 2024-01-01 │
...

To list the same scores with unchanged scores dropped:

poetry run epss scores -a 2024-01-01 --drop-unchanged | head
shape: (33_592, 4)
┌──────────────────┬─────────┬────────────┬────────────┐
│ cve              ┆ epss    ┆ percentile ┆ date       │
│ ---              ┆ ---     ┆ ---        ┆ ---        │
│ str              ┆ f64     ┆ f64        ┆ date       │
╞══════════════════╪═════════╪════════════╪════════════╡
│ CVE-2019-1653    ┆ 0.97555 ┆ 0.99998    ┆ 2024-01-03 │
│ CVE-2020-14750   ┆ 0.97544 ┆ 0.99995    ┆ 2024-01-03 │
│ CVE-2013-2423    ┆ 0.97512 ┆ 0.99983    ┆ 2024-01-03 │
│ CVE-2019-19781   ┆ 0.97485 ┆ 0.99967    ┆ 2024-01-03 │
...

The --output-format argument can be used to change the output format.

For example, to list scores in CSV format:

poetry run epss scores -a 2024-01-01 --drop-unchanged --output-format=csv | head
cve,epss,percentile,date
CVE-2019-1653,0.97555,0.99998,2024-01-03
CVE-2020-14750,0.97544,0.99995,2024-01-03
CVE-2013-2423,0.97512,0.99983,2024-01-03
CVE-2019-19781,0.97485,0.99967,2024-01-03
CVE-2019-1652,0.9747,0.99959,2024-01-03
CVE-2013-1559,0.9728,0.99833,2024-01-03
CVE-2019-3398,0.9722,0.99798,2024-01-03
CVE-2019-1458,0.97194,0.99782,2024-01-03
CVE-2020-7209,0.9719,0.99778,2024-01-03
...

To save the output to a CSV file, use either shell redirection or the --output-file flag:

poetry run epss scores -a 2024-01-01 --drop-unchanged --output-format=csv --output-file 2024-01-01.csv
du -sh 2024-01-01.csv
1.3M    2024-01-01.csv

Or, in JSONL format:

poetry run epss scores -a 2024-01-01 --drop-unchanged --output-format=jsonl | head | jq -c
{"cve":"CVE-2019-1653","epss":0.97555,"percentile":0.99998,"date":"2024-01-03"}
{"cve":"CVE-2020-14750","epss":0.97544,"percentile":0.99995,"date":"2024-01-03"}
{"cve":"CVE-2013-2423","epss":0.97512,"percentile":0.99983,"date":"2024-01-03"}
{"cve":"CVE-2019-19781","epss":0.97485,"percentile":0.99967,"date":"2024-01-03"}
{"cve":"CVE-2019-1652","epss":0.9747,"percentile":0.99959,"date":"2024-01-03"}
{"cve":"CVE-2013-1559","epss":0.9728,"percentile":0.99833,"date":"2024-01-03"}
{"cve":"CVE-2019-3398","epss":0.9722,"percentile":0.99798,"date":"2024-01-03"}
{"cve":"CVE-2019-1458","epss":0.97194,"percentile":0.99782,"date":"2024-01-03"}
{"cve":"CVE-2020-7209","epss":0.9719,"percentile":0.99778,"date":"2024-01-03"}
{"cve":"CVE-2021-43798","epss":0.97105,"percentile":0.99734,"date":"2024-01-03"}

From here, it's easy to see when specific vulnerabilities experienced an increase or decrease in their perceived exploitability:

poetry run epss scores --drop-unchanged --output-format=jsonl | grep "CVE-2016-0060" | jq -c
{"cve":"CVE-2016-0060","epss":0.07609,"percentile":0.931,"date":"2023-04-04"}
{"cve":"CVE-2016-0060","epss":0.12376,"percentile":0.94566,"date":"2023-05-13"}
{"cve":"CVE-2016-0060","epss":0.51531,"percentile":0.97065,"date":"2023-06-19"}
{"cve":"CVE-2016-0060","epss":0.66813,"percentile":0.9746,"date":"2023-07-23"}
{"cve":"CVE-2016-0060","epss":0.7155,"percentile":0.97673,"date":"2023-09-28"}
{"cve":"CVE-2016-0060","epss":0.71177,"percentile":0.97697,"date":"2023-10-31"}
{"cve":"CVE-2016-0060","epss":0.7436,"percentile":0.97832,"date":"2023-12-03"}
{"cve":"CVE-2016-0060","epss":0.76991,"percentile":0.97928,"date":"2024-01-04"}
{"cve":"CVE-2016-0060","epss":0.828,"percentile":0.98183,"date":"2024-02-05"}
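
The same comparison can be done programmatically. The following is a minimal sketch, assuming JSONL records shaped like the output above and a single CVE per input; `score_changes` is a hypothetical helper, not part of this package. It computes the change in EPSS score between consecutive observations and picks the largest jump.

```python
import json

# The first four observations of CVE-2016-0060 from the output above.
jsonl = """\
{"cve":"CVE-2016-0060","epss":0.07609,"percentile":0.931,"date":"2023-04-04"}
{"cve":"CVE-2016-0060","epss":0.12376,"percentile":0.94566,"date":"2023-05-13"}
{"cve":"CVE-2016-0060","epss":0.51531,"percentile":0.97065,"date":"2023-06-19"}
{"cve":"CVE-2016-0060","epss":0.66813,"percentile":0.9746,"date":"2023-07-23"}
"""


def score_changes(lines):
    """Return the score delta between each pair of consecutive observations."""
    # ISO 8601 dates sort correctly as plain strings.
    history = sorted(
        (json.loads(line) for line in lines if line.strip()),
        key=lambda record: record["date"],
    )
    return [
        {"date": cur["date"], "delta": round(cur["epss"] - prev["epss"], 5)}
        for prev, cur in zip(history, history[1:])
    ]


changes = score_changes(jsonl.splitlines())
biggest = max(changes, key=lambda change: change["delta"])
print(biggest["date"], biggest["delta"])
```

For this sample, the largest increase is the one recorded on 2023-06-19.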

1. When querying historical sets of EPSS scores, any scores that have not already been downloaded will be downloaded automatically to a configurable working directory³. You do not have to explicitly download EPSS scores before querying them.

2. Unchanged scores are dropped by default; this behaviour can be toggled using the --drop-unchanged/--no-drop-unchanged flags.

3. If a working directory is not explicitly provided, scores will be written to a folder named 476c9b0d-79c6-4b7e-a31a-e18cec3d6444/epss/scores-by-date within the system's temporary directory (e.g. /var/folders/ps/c0fn47n54sg08wck9_x9qncr0000gp/T/476c9b0d-79c6-4b7e-a31a-e18cec3d6444/epss/scores-by-date/).
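
To make the idea of dropping unchanged scores concrete, here is a minimal sketch in plain Python. It illustrates the concept only and is not this package's implementation: it assumes a score counts as "changed" when its `epss` value differs from the previous observation for the same CVE, and whether the first observation of each CVE is kept is exposed as a hypothetical `keep_first` parameter, since the exact rule used by the library is not documented here.

```python
def drop_unchanged(rows, keep_first=False):
    """Keep only rows whose EPSS score differs from the previous one for that CVE."""
    last_seen = {}  # cve -> most recently observed epss value
    changed = []
    # ISO 8601 dates sort correctly as plain strings.
    for row in sorted(rows, key=lambda r: (r["cve"], r["date"])):
        previous = last_seen.get(row["cve"])
        if previous is None:
            if keep_first:
                changed.append(row)
        elif row["epss"] != previous:
            changed.append(row)
        last_seen[row["cve"]] = row["epss"]
    return changed


# Illustrative rows: the 2024-01-02 value is made up for the example.
rows = [
    {"cve": "CVE-2019-1653", "epss": 0.97567, "date": "2024-01-01"},
    {"cve": "CVE-2019-1653", "epss": 0.97567, "date": "2024-01-02"},
    {"cve": "CVE-2019-1653", "epss": 0.97555, "date": "2024-01-03"},
]
only_changes = drop_unchanged(rows)
with_baseline = drop_unchanged(rows, keep_first=True)
print([r["date"] for r in only_changes])
print([r["date"] for r in with_baseline])
```

Because most scores do not move from one day to the next, a filter of this kind explains the large difference in row counts between the two listings above.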

Downloading scores published between two dates

To download scores published between two dates without writing to the console, add the --download flag¹:

poetry run epss scores -a 2024-01-01 --download

1. Unchanged scores will still be saved to disk regardless of the value of the --drop-unchanged/--no-drop-unchanged flags.
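
The idempotent download behaviour can be pictured with the sketch below, which skips any date whose file already exists in the working directory. It is an illustration under stated assumptions rather than this package's code: the URL pattern, the on-disk file naming, and the `download_missing` helper are all hypothetical, and the fetch function is injected so the example runs without network access.

```python
import os
import tempfile
from datetime import date, timedelta
from typing import Callable, Iterator

# Assumed URL pattern for the daily files; verify before relying on it.
URL_TEMPLATE = "https://epss.cyentia.com/epss_scores-{day}.csv.gz"


def iter_dates(start: date, end: date) -> Iterator[date]:
    """Yield every date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def download_missing(
    workdir: str, start: date, end: date, fetch: Callable[[str], bytes]
) -> list[str]:
    """Download only the days that are not already present in workdir."""
    os.makedirs(workdir, exist_ok=True)
    written = []
    for day in iter_dates(start, end):
        path = os.path.join(workdir, f"{day.isoformat()}.csv.gz")
        if os.path.exists(path):
            continue  # Already downloaded; running again is a no-op.
        data = fetch(URL_TEMPLATE.format(day=day.isoformat()))
        # Write to a temporary name first so a failed run never leaves
        # a partial file that would later be mistaken for a complete one.
        tmp_path = path + ".tmp"
        with open(tmp_path, "wb") as fh:
            fh.write(data)
        os.replace(tmp_path, path)
        written.append(path)
    return written


# Demonstrate with a stand-in fetcher instead of a real HTTP request.
workdir = tempfile.mkdtemp()
fake_fetch = lambda url: b"placeholder"
first = download_missing(workdir, date(2024, 1, 1), date(2024, 1, 3), fake_fetch)
second = download_missing(workdir, date(2024, 1, 1), date(2024, 1, 3), fake_fetch)
print(len(first), len(second))
```

In real use the stand-in could be replaced with something like `lambda url: urllib.request.urlopen(url).read()`.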

Python

Additional examples are available in the examples folder.

Loading unique EPSS scores into Polars

To load EPSS v3 scores into a Polars dataframe with unchanged scores dropped:

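import os
import tempfile

import polars as pl

from epss.client import PolarsClient

# Print every row instead of truncating long tables
cfg = pl.Config()
cfg.set_tbl_rows(-1)

# Cache downloaded scores in a folder under the system's temporary directory
WORKDIR = os.path.join(tempfile.gettempdir(), 'epss')

# Only include scores produced by EPSS v3
client = PolarsClient(
    include_v1_scores=False,
    include_v2_scores=False,
    include_v3_scores=True,
)

# Load all scores, dropping rows where the score did not change
df = client.get_scores(workdir=WORKDIR, drop_unchanged_scores=True)
print(df)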

Generating a spreadsheet of changed EPSS scores

To generate a spreadsheet containing the EPSS scores of all CVEs known to be exploitable using FireEye's leaked red team tools:

from xlsxwriter import Workbook
from epss.client import PolarsClient, Query

import tempfile
import os

WORKDIR = os.path.join(tempfile.gettempdir(), 'epss')

client = PolarsClient(
    include_v1_scores=False,
    include_v2_scores=False,
    include_v3_scores=True,
)
query = Query(
    cve_ids=[
        'CVE-2019-11510',
        'CVE-2020-1472',
        'CVE-2018-13379',
        'CVE-2018-15961',
        'CVE-2019-0604',
        'CVE-2019-0708',
        'CVE-2019-11580',
        'CVE-2019-19781',
        'CVE-2020-10189',
        'CVE-2014-1812',
        'CVE-2019-3398',
        'CVE-2020-0688',
        'CVE-2016-0167',
        'CVE-2017-11774',
        'CVE-2018-8581',
        'CVE-2019-8394',
    ]
)
df = client.get_scores(
    workdir=WORKDIR,
    query=query,
    drop_unchanged_scores=True
)

with Workbook('epss.xlsx') as wb:
    df.write_excel(
        workbook=wb,
        worksheet='FireEye red team tools'
    )
