UnexPatterns is a an algorithm that mines unexpected patterns from large attributed graphs.
It relies on the combination of information-theoretic based measure of Unexpectedness [1] as an interestingness measure to select relevant patterns and pruning techniques to reduce the search space.
[1] Dessalles, J.L.: Algorithmic simplicity and relevance. Algorithmic Probability and Friends - LNAI 7070, pp. 119–130, Springer (2013).
Clone this project and install dependencies:
git clone
cd UnexPatterns
pip install -r requirements.txt
The five real-world datasets used in our work are provided either in data/
directory or accessible through Netset:
'wikivitals'
'wikivitals-fr'
'wikischools'
'sanFranciscoCrimes'
'ingredients'
Use parameters.txt
to specify the dataset(s) and parameter(s) value. As an example:
datasets: wikivitals, wikivitals-fr
s: 8
beta: 4
delta: 0
patterns_path: output
where:
datasets
: Dataset name (if several datasets, names are coma-separated)s
: Minimum number of nodes in patternbeta
: Minimum number of attributes in patterndelta
: Minimum amount of unexpectedness difference between two patternspatterns_path
: Path to output directory
To mine patterns according to parameters.txt
, use the following command:
python unexpatterns.py parameters.txt
Patterns are stored in binary format, within output/patterns/
directory. To extract them, you can use:
with open(f'output/patterns/<patterns>.bin', 'rb') as data:
patterns = pickle.load(data)
The naming convention of each file is the following: result_<dataset>_<beta>_<s>_<delta>.bin
.
The variable patterns
then contains a list of patterns, each in the form of a tuple (Xi,Qi) where Xi is the set of node indexes and Qi the set of attribute indexes of the ith pattern.
Code for experiments and comparisons with baseline and state-of-the-art aglorithms are provided in experiments/
directory.