Skip to content

Python library for generating a confusion matrix from actual and predicted label.

License

Notifications You must be signed in to change notification settings

waikato-datamining/simple-confusion-matrix

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

simple-confusion-matrix

Python library for generating a confusion matrix from actual and predicted label.

Functionality:

  • load actual, predicted and weight columns from a CSV

  • if no weight list (values: 0-1) is provided, a value of 1 is assumed; this value is used as weight for the counting then

  • can generate three types of matrices:

    • counts
    • percentages (all cells sum up to 1)
    • percentages per row (all cells of a row sum up to 1)
  • can generate plain text or CSV output

  • can write generated matrix back to a CSV file

Installation

  • create virtual environment

    python3 -m venv venv
    ./venv/bin/pip install simple_confusion_matrix
    

Command-line usage

You don't have to write code for using the library, you can just use the scm-generate command-line utility:

usage: scm-generate [-h] -i FILE [-d DELIMITER] [-q CHAR] [-H] [-o FILE]
                    [-O FORMAT] [-a COL] [-A PREFIX] [-p COL] [-P PREFIX]
                    [-w COL] [-l LABELS] [-C CORNER] [-D NUM] [-t TYPE]

Generates a confusion matrix from a CSV file with actual/predicted label
columns.

optional arguments:
  -h, --help            show this help message and exit
  -i FILE, --input FILE
                        the CSV file to load the actual/predicted labels from
                        (default: None)
  -d DELIMITER, --delimiter DELIMITER
                        the column delimited in the CSV file (default: ,)
  -q CHAR, --quotechar CHAR
                        the quote character to use in the CSV file (default:
                        ")
  -H, --no_header       whether the CSV file has no header row (default: True)
  -o FILE, --output FILE
                        the optional CSV file to write the generated matrix to
                        (default: None)
  -O FORMAT, --output_format FORMAT
                        the output format to use when writing to the output
                        file (default: csv)
  -a COL, --actual COL  the 1-based column index for the actual/ground truth
                        labels (default: 1)
  -A PREFIX, --actual_prefix PREFIX
                        the prefix to use for the labels depicted in the
                        'actual' labels column (default: a: )
  -p COL, --predicted COL
                        the 1-based column index for the predicted labels
                        (default: 2)
  -P PREFIX, --predicted_prefix PREFIX
                        the prefix to use for the labels depicted in the
                        'predicted' labels row (default: p: )
  -w COL, --weight COL  the 1-based column index for the weight (0-1) of the
                        predicted label (default: None)
  -l LABELS, --labels LABELS
                        comma-separated list of predefined labels to use (eg
                        if not all labels present in CSV file) (default: None)
  -C CORNER, --corner CORNER
                        the text to print in the top-left corner (default: x)
  -D NUM, --max_decimals NUM
                        the maximum number of decimals after the decimal point
                        to use in case of float values like percentages
                        (default: 3)
  -t TYPE, --matrix_type TYPE
                        the type of matrix to generate (default: counts)

Code examples

  • Generating counts:

    from scm.matrix import ConfusionMatrix, MatrixType
    
    actual = ["a", "a", "b", "b", "a", "b", "c", "c", "c", "c"]
    predicted = ["a", "c", "b", "a", "a", "b", "a", "b", "c", "c"]
    matrix = ConfusionMatrix(actual=actual, predicted=predicted)
    result = matrix.generate(matrix_type=MatrixType.COUNTS)
    print(result)

    Generates the following output:

    x    p: a p: b p: c
    a: a    2    0    1
    a: b    1    2    0
    a: c    1    1    2
    
  • Using predefined label order:

    from scm.matrix import ConfusionMatrix, MatrixType
    
    actual = ["a", "a", "b", "b", "a", "b", "c", "c", "c", "c"]
    predicted = ["a", "c", "b", "a", "a", "b", "a", "b", "c", "c"]
    labels = ["c", "b", "a"]
    matrix = ConfusionMatrix(actual=actual, predicted=predicted, labels=labels)
    result = matrix.generate(matrix_type=MatrixType.COUNTS)
    print(result)

    Generates the following output:

    x    p: c p: b p: a
    a: c    2    1    1
    a: b    0    2    1
    a: a    1    0    2
    
  • Using global percentages:

    from scm.matrix import ConfusionMatrix, MatrixType
    
    actual = ["a", "a", "b", "b", "a", "b", "c", "c", "c", "c"]
    predicted = ["a", "c", "b", "a", "a", "b", "a", "b", "c", "c"]
    matrix = ConfusionMatrix(actual=actual, predicted=predicted, labels=labels)
    result = matrix.generate(matrix_type=MatrixType.PERCENTAGES)
    print(result)

    Generates the following output:

    x    p: a  p: b  p: c 
    a: a 0.200 0.000 0.100
    a: b 0.100 0.200 0.000
    a: c 0.100 0.100 0.200
    
  • Using row percentages (and only two decimals after decimal point):

    from scm.matrix import ConfusionMatrix, MatrixType
    
    actual = ["a", "a", "b", "b", "a", "b", "c", "c", "c", "c"]
    predicted = ["a", "c", "b", "a", "a", "b", "a", "b", "c", "c"]
    matrix = ConfusionMatrix(actual=actual, predicted=predicted, labels=labels)
    result = matrix.generate(matrix_type=MatrixType.PERCENTAGES_PER_ROW, max_decimals=2)
    print(result)

    Generates the following output:

    x    p: a p: b p: c
    a: a 0.67 0.00 0.33
    a: b 0.33 0.67 0.00
    a: c 0.25 0.25 0.50
    
  • Saving the matrix as CSV file:

    from scm.matrix import ConfusionMatrix, MatrixType
    
    actual = ["a", "a", "b", "b", "a", "b", "c", "c", "c", "c"]
    predicted = ["a", "c", "b", "a", "a", "b", "a", "b", "c", "c"]
    matrix = ConfusionMatrix(actual=actual, predicted=predicted, labels=labels)
    result = matrix.generate(matrix_type=MatrixType.PERCENTAGES_PER_ROW, max_decimals=2)
    result.to_csv(output_file="/some/where/out.csv")
  • Loading the actual and predicted values from a CSV file and writing the result back to a CSV:

    from scm.matrix import ConfusionMatrix, MatrixType, load_csv
    
    actual, predicted, _ = load_csv("/some/where/in.csv", 1, 2)
    matrix = ConfusionMatrix(actual=actual, predicted=predicted)
    result = matrix.generate(matrix_type=MatrixType.PERCENTAGES_PER_ROW)
    result.to_csv(output_file="/some/where/out.csv")

About

Python library for generating a confusion matrix from actual and predicted label.

Resources

License

Stars

Watchers

Forks

Languages