This repository contains the R implementation of the Vendi Score (VS), a metric for evaluating diversity in machine learning and the natural sciences. The Vendi Scores are a family of diversity metrics that are flexible, interpretable, and unsupervised. They are defined as the exponential of the entropy of the eigenvalues of a similarity matrix.
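For reference, the definition can be written out as follows. This is a sketch based on the two papers cited at the end of this README; the symbols $K$, $n$, $\lambda_i$ and $q$ are notation chosen here. $K$ is assumed to be an $n \times n$ positive semidefinite similarity matrix with ones on the diagonal, $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $K/n$, and $0 \log 0$ is taken to be $0$.

```latex
\mathrm{VS}(K) = \exp\!\Big(-\sum_{i=1}^{n} \lambda_i \log \lambda_i\Big),
\qquad
\mathrm{VS}_q(K) = \exp\!\Big(\frac{1}{1-q}\,\log \sum_{i=1}^{n} \lambda_i^{\,q}\Big)
\quad (q \neq 1)
```

The order-1 score on the left is the limit of the order-$q$ score as $q \to 1$.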
A Python implementation of the Vendi Scores is also available.
The Vendi Scores in R require no additional dependencies and can be directly installed with devtools.
devtools::install_github("vertaix/Vendi-Score-R")
The Vendi Scores take three inputs: your data, a pairwise similarity metric, and the order q.
The data can be any data frame, matrix, vector, list or higher-dimensional array for which index
library(VendiScore)
library(datasets)
data(iris)
iris_mat <- data.matrix(iris)
iris_mat <- iris_mat[,colnames(iris_mat)!='Species']
rbf_kernel <- function(x, y, gamma = 0.1) exp(-gamma * sum((x - y)^2))
score(iris_mat, rbf_kernel, q=1.)
# 3.169735
A score of about
We can also use the cosine kernel trick to speed up Vendi Score computation on larger numerical datasets for which cosine similarity is a suitable kernel. In this case the samples must be normalized first, with each row scaled to unit L2 norm.
norm_samples <- t(apply(iris_mat, 1, function(row) row / sqrt(sum(row^2))))
VS <- score_cosine(samples=norm_samples, q=1)
# 1.20783
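The speed-up rests on a linear algebra fact: for a matrix X of n unit-norm samples in d dimensions, the n x n matrix X X^T / n and the d x d matrix X^T X / n share the same nonzero eigenvalues, so only the smaller one needs to be decomposed. The base-R sketch below illustrates this on iris. It is not the package's internal code, and `vs_from_eigenvalues` is a hypothetical helper introduced only for this illustration.

```r
library(datasets)

# Hypothetical helper (not part of the VendiScore package): order-1 Vendi
# Score from a vector of eigenvalues that sum to 1.
vs_from_eigenvalues <- function(lambda, tol = 1e-12) {
  lambda <- lambda[lambda > tol]   # drop numerical zeros: 0 * log(0) is taken as 0
  exp(-sum(lambda * log(lambda)))
}

X <- data.matrix(iris[, colnames(iris) != "Species"])
X <- X / sqrt(rowSums(X^2))        # scale each row to unit L2 norm
n <- nrow(X)

# Full n x n cosine similarity matrix (150 x 150 here), divided by n.
lambda_full <- eigen(tcrossprod(X) / n, symmetric = TRUE,
                     only.values = TRUE)$values

# d x d matrix (4 x 4 here) with the same nonzero eigenvalues.
lambda_small <- eigen(crossprod(X) / n, symmetric = TRUE,
                      only.values = TRUE)$values

vs_full  <- vs_from_eigenvalues(lambda_full)
vs_small <- vs_from_eigenvalues(lambda_small)
```

Under these assumptions `vs_full` and `vs_small` should agree up to floating-point error, while the second route decomposes a 4 x 4 matrix instead of a 150 x 150 one.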
If you have already pre-computed a similarity matrix:
K <- matrix(data=c(1,1,0,1,1,0,0,0,1), nrow=3, ncol=3)
VS <- score_K(K, q=1.)
# 1.88988
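To see where this number comes from, the score can be recomputed from the definition in base R. The sketch below assumes K is symmetric positive semidefinite with ones on the diagonal. `vendi_from_K` is a hypothetical name used for illustration and is not the package's `score_K`, and the handling of orders other than 1 follows the formula in the second paper cited below.

```r
# Hypothetical helper (not the package's score_K): Vendi Score of order q
# computed directly from a similarity matrix K.
vendi_from_K <- function(K, q = 1, tol = 1e-12) {
  n <- nrow(K)
  # Eigenvalues of K / n; they sum to 1 when diag(K) is all ones.
  lambda <- eigen(K / n, symmetric = TRUE, only.values = TRUE)$values
  lambda <- lambda[lambda > tol]   # drop numerical zeros
  if (q == 1) {
    exp(-sum(lambda * log(lambda)))   # exponential of the Shannon entropy
  } else if (is.infinite(q)) {
    1 / max(lambda)
  } else {
    sum(lambda^q)^(1 / (1 - q))
  }
}

K <- matrix(c(1, 1, 0, 1, 1, 0, 0, 0, 1), nrow = 3, ncol = 3)
vs1 <- vendi_from_K(K, q = 1)          # eigenvalues of K / 3 are 2/3, 1/3, 0
vs_identity <- vendi_from_K(diag(3))   # three fully dissimilar samples
```

Here `vs1` is about 1.88988, matching the output above, and the identity matrix gives a score of 3, the number of samples.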
We provide documentation for all functions in the package.
Check out our vignette for a demonstration of the advantages of the Vendi Score over metrics like average similarity.
@article{friedman2022vendi,
title={The Vendi Score: A Diversity Evaluation Metric for Machine Learning},
author={Friedman, Dan and Dieng, Adji Bousso},
journal={arXiv preprint arXiv:2210.02410},
year={2022}
}
@article{pasarkar2023cousins,
title={Cousins Of The Vendi Score: A Family Of Similarity-Based Diversity Metrics For Science And Machine Learning},
author={Pasarkar, Amey P and Dieng, Adji Bousso},
journal={arXiv preprint arXiv:2310.12952},
year={2023}
}