





Introduction

PAttern MIning (PAMI) is a Python library containing several algorithms to discover user interest-based patterns in a wide spectrum of datasets across multiple computing platforms. Useful links for using this library are provided below:

  1. Youtube tutorial https://www.youtube.com/playlist?list=PLKP768gjVJmDer6MajaLbwtfC9ULVuaCZ

  2. Tutorials (Notebooks) https://github.com/UdayLab/PAMI/tree/main/notebooks

  3. User manual https://udaylab.github.io/PAMI/manuals/index.html

  4. Coders manual https://udaylab.github.io/PAMI/codersManual/index.html

  5. Code documentation https://pami-1.readthedocs.io

  6. Datasets https://u-aizu.ac.jp/~udayrage/datasets.html

  7. Discussions on PAMI usage https://github.com/UdayLab/PAMI/discussions

  8. Report issues https://github.com/UdayLab/PAMI/issues


Flow Chart of Developing Algorithms in PAMI

PAMI's production process


Inputs and Outputs of an Algorithm in PAMI

Inputs and Outputs


Recent Updates

  • Version 2024.05.01: This version includes the following updates:
    • Added two new algorithms, Gspan and TKG, for frequent subgraph mining.
    • Updated three synthetic data generators: transactional database, temporal database, and geo-referenced transactional database.
    • Optimized the following frequent pattern mining algorithms: Apriori, Aprioribitset, ECLAT, ECLATbitset, FPGrowth, and CHARM.
    • The startMine() function has been deprecated in favor of the mine() function.
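
Scripts that must run against both older and newer releases can bridge the startMine() to mine() rename with a small wrapper. The following is a minimal sketch, not part of PAMI: `run_mining`, `NewStyle`, and `OldStyle` are hypothetical names used only for illustration, and the stand-in classes merely mimic objects that expose one of the two methods.

```python
def run_mining(obj):
    """Run a mining object, preferring mine() over the deprecated startMine().

    Hypothetical helper for illustration; it only relies on the two method
    names mentioned in the release notes.
    """
    if hasattr(obj, "mine"):
        obj.mine()
    elif hasattr(obj, "startMine"):
        obj.startMine()
    else:
        raise AttributeError("object exposes neither mine() nor startMine()")
    return obj


# Stand-in classes (hypothetical, not PAMI classes) to exercise the helper.
class NewStyle:
    def __init__(self):
        self.called = None

    def mine(self):
        self.called = "mine"


class OldStyle:
    def __init__(self):
        self.called = None

    def startMine(self):
        self.called = "startMine"


new_obj = run_mining(NewStyle())
old_obj = run_mining(OldStyle())
print(new_obj.called, old_obj.called)
```

In new code, calling mine() directly is the simpler choice; the wrapper is only useful while both method names may be encountered.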

Total number of algorithms: 83


Features

  • ✅ Well-tested and production-ready
  • 🔋 Highly optimized to our best effort, light-weight, and energy-efficient
  • 👀 Proper code documentation
  • 🍼 Ample examples of using various algorithms in the ./notebooks folder
  • 🤖 Works with AI libraries such as TensorFlow, PyTorch, and sklearn.
  • ⚡️ Supports CUDA and PySpark
  • 🖥️ Operating System Independence
  • 🔬 Knowledge discovery in static data and streams
  • 🐎 Snappy
  • 🐻 Ease of use

Maintenance

Installation

  1. Installing the basic pami package (recommended)

    pip install pami
    
  2. Installing the pami package on a GPU machine that supports CUDA

    pip install 'pami[gpu]'
    
  3. Installing the pami package in a distributed network environment supporting Spark

    pip install 'pami[spark]'
    
  4. Installing the pami package for development purposes

    pip install 'pami[dev]'
    
  5. Installing the complete pami library

    pip install 'pami[all]'
    

Upgrade

    pip install --upgrade pami

Uninstallation

    pip uninstall pami 

Information

    pip show pami

Try your first PAMI program

    $ python

    # First, import the algorithm from PAMI
    from PAMI.frequentPattern.basic import FPGrowth as alg

    fileURL = "https://u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/Transactional_T10I4D100K.csv"
    minSup = 300

    obj = alg.FPGrowth(iFile=fileURL, minSup=minSup, sep='\t')
    # obj.startMine()  # deprecated; use mine() instead
    obj.mine()
    obj.save('frequentPatternsAtMinSupCount300.txt')
    frequentPatternsDF = obj.getPatternsAsDataFrame()
    print('Total No of patterns: ' + str(len(frequentPatternsDF)))  # total number of patterns
    print('Runtime: ' + str(obj.getRuntime()))  # runtime of the mining step
    print('Memory (RSS): ' + str(obj.getMemoryRSS()))
    print('Memory (USS): ' + str(obj.getMemoryUSS()))

Output:

    Frequent patterns were generated successfully using frequentPatternGrowth algorithm
    Total No of patterns: 4540
    Runtime: 8.749667644500732
    Memory (RSS): 522911744
    Memory (USS): 475353088
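
To make the role of minSup and the tab separator concrete, here is a minimal brute-force sketch on a toy database. It is not PAMI's FPGrowth implementation; `frequent_itemsets` is a hypothetical helper, and it assumes minSup is read as an absolute support count, as the output file name in the program above suggests.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_sup):
    """Return {itemset: support count} for itemsets with count >= min_sup.

    Brute-force level-wise search (hypothetical helper, toy data only).
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            # Support = number of transactions containing every item.
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count >= min_sup:
                result[candidate] = count
                found_any = True
        if not found_any:
            # No frequent itemset of this size, so none of a larger size.
            break
    return result


# One transaction per line; items separated by a tab, as with sep='\t'.
raw = "a\tb\tc\na\tb\na\tc\nb\tc\na\tb\tc"
db = [line.split("\t") for line in raw.splitlines()]
patterns = frequent_itemsets(db, min_sup=3)
for itemset, count in patterns.items():
    print(itemset, count)
```

On this toy input, every single item and every pair meets the threshold of 3, while the triple (a, b, c) appears in only two transactions and is dropped.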

Evaluation:

  1. We compared three Python libraries (PAMI, mlxtend, and efficient-apriori) on the Apriori algorithm.
  2. Transactional_T10I4D100K.csv, a transactional database downloaded from PAMI, was used as the input file for all libraries.
  3. The minimum support values and the separator were the same for all libraries.

The performance of the Apriori algorithm is shown in the graphical results below:

  1. Comparing the patterns generated by different Python libraries for the Apriori algorithm:

    [Figure: patterns generated by each library]
  2. Evaluating the runtime of the Apriori algorithm across different Python libraries:

    [Figure: runtime of each library]
  3. Comparing the memory consumption of the Apriori algorithm across different Python libraries:

    [Figure: memory consumption of each library]
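
A comparison of this kind needs every library to be timed and profiled the same way. The sketch below shows one way to do that with the standard library only. It is an assumption, not the harness used for the evaluation above: `measure` and `toy_workload` are hypothetical names, and `tracemalloc` reports peak Python-level allocations, which is not the same quantity as the RSS and USS figures PAMI prints.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func once and report its result, wall-clock time, and peak memory.

    Hypothetical helper: peak_bytes covers allocations traced by tracemalloc
    (Python objects), not the process-level RSS or USS.
    """
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args, **kwargs)
    runtime = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"result": result, "runtime_s": runtime, "peak_bytes": peak}


def toy_workload(n):
    """Stand-in for a mining call; builds a list so memory use is visible."""
    return len([i * i for i in range(n)])


report = measure(toy_workload, 100_000)
print(report)
```

Passing each library's mining call through the same wrapper, with the same input file, minimum support, and separator, keeps the numbers comparable.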

For more information, we have uploaded the evaluation file in two formats:


Reading Material

For more examples, refer to the YouTube tutorials.


License

GitHub license


Documentation

The official documentation is hosted on PAMI.


Background

The idea and motivation to develop PAMI came from the Kitsuregawa Lab at the University of Tokyo. Work on PAMI started at the University of Aizu in 2020 and has been under active development since then.


Getting Help

For any queries, the best place to go is GitHub Issues.


Discussion and Development

In our GitHub repository, the primary platform for discussing development-related matters is the university lab. We encourage our team members and contributors to utilize this platform for a wide range of discussions, including bug reports, feature requests, design decisions, and implementation details.


Contribution to PAMI

We invite and encourage all community members to contribute, report bugs, fix bugs, enhance documentation, propose improvements, and share their creative ideas.


Tutorials

0. Association Rule Mining

Basic
Confidence Open In Colab
Lift Open In Colab
Leverage Open In Colab
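
The three measures listed above can be computed from support counts alone. The sketch below uses the standard textbook definitions for a rule A -> B; the README does not state how the notebooks compute them, so treat `rule_measures` and its parameter names as hypothetical.

```python
def rule_measures(n_transactions, sup_a, sup_b, sup_ab):
    """Confidence, lift, and leverage of the rule A -> B from support counts.

    Textbook definitions (hypothetical helper, not PAMI's API):
      confidence = P(A and B) / P(A)
      lift       = confidence / P(B)
      leverage   = P(A and B) - P(A) * P(B)
    """
    p_a = sup_a / n_transactions
    p_b = sup_b / n_transactions
    p_ab = sup_ab / n_transactions
    confidence = p_ab / p_a
    lift = confidence / p_b
    leverage = p_ab - p_a * p_b
    return {"confidence": confidence, "lift": lift, "leverage": leverage}


# 10 transactions: A occurs in 5, B in 4, and both together in 3.
m = rule_measures(n_transactions=10, sup_a=5, sup_b=4, sup_ab=3)
print(m)  # confidence 0.6, lift 1.5, leverage 0.1 (up to float rounding)
```

A lift above 1 or a leverage above 0 indicates that A and B occur together more often than they would if they were independent.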

1. Pattern mining in binary transactional databases

1.1. Frequent pattern mining: Sample

| Basic | Closed | Maximal | Top-k | CUDA | pyspark |
|---|---|---|---|---|---|
| Apriori Open In Colab | CHARM Open In Colab | maxFP-growth Open In Colab | FAE Open In Colab | cudaAprioriGCT | parallelApriori Open In Colab |
| FP-growth Open In Colab | | | | cudaAprioriTID | parallelFPGrowth Open In Colab |
| ECLAT Open In Colab | | | | cudaEclatGCT | parallelECLAT Open In Colab |
| ECLAT-bitSet Open In Colab | | | | | |
| ECLAT-diffset Open In Colab | | | | | |

1.2. Relative frequent pattern mining: Sample

Basic
RSFP-growth Open In Colab

1.3. Frequent pattern with multiple minimum support: Sample

Basic
CFPGrowth Open In Colab
CFPGrowth++ Open In Colab

1.4. Correlated pattern mining: Sample

Basic
CoMine Open In Colab
CoMine++ Open In Colab

1.5. Fault-tolerant frequent pattern mining (under development)

Basic
FTApriori Open In Colab
FTFPGrowth (under development) Open In Colab

1.6. Coverage pattern mining (under development)

Basic
CMine Open In Colab
CMine++ Open In Colab

2. Pattern mining in binary temporal databases

2.1. Periodic-frequent pattern mining: Sample

| Basic | Closed | Maximal | Top-K |
|---|---|---|---|
| PFP-growth Open In Colab | CPFP Open In Colab | maxPF-growth Open In Colab | kPFPMiner Open In Colab |
| PFP-growth++ Open In Colab | | | Topk-PFP Open In Colab |
| PS-growth Open In Colab | | | |
| PFP-ECLAT Open In Colab | | | |
| PFPM-Compliments Open In Colab | | | |

2.2. Local periodic pattern mining: Sample

Basic
LPPGrowth (under development) Open In Colab
LPPMBreadth (under development) Open In Colab
LPPMDepth (under development) Open In Colab

2.3. Partial periodic-frequent pattern mining: Sample

Basic
GPF-growth Open In Colab
PPF-DFS Open In Colab
GPPF-DFS Open In Colab

2.4. Partial periodic pattern mining: Sample

| Basic | Closed | Maximal | topK | CUDA |
|---|---|---|---|---|
| 3P-growth Open In Colab | 3P-close Open In Colab | max3P-growth Open In Colab | topK-3P growth Open In Colab | cuGPPMiner (under development) Open In Colab |
| 3P-ECLAT Open In Colab | | | | gPPMiner (under development) Open In Colab |
| G3P-Growth Open In Colab | | | | |

2.5. Periodic correlated pattern mining: Sample

Basic
EPCP-growth Open In Colab

2.6. Stable periodic pattern mining: Sample

| Basic | TopK |
|---|---|
| SPP-growth Open In Colab | TSPIN Open In Colab |
| SPP-ECLAT Open In Colab | |

2.7. Recurring pattern mining: Sample

Basic
RPgrowth Open In Colab

3. Mining patterns from binary Geo-referenced (or spatiotemporal) databases

3.1. Geo-referenced frequent pattern mining: Sample

Basic
spatialECLAT Open In Colab
FSP-growth Open In Colab

3.2. Geo-referenced periodic frequent pattern mining: Sample

Basic
GPFPMiner Open In Colab
PFS-ECLAT Open In Colab
ST-ECLAT Open In Colab

3.3. Geo-referenced partial periodic pattern mining: Sample

Basic
STECLAT Open In Colab

4. Mining patterns from Utility (or non-binary) databases

4.1. High utility pattern mining: Sample

Basic
EFIM Open In Colab
HMiner Open In Colab
UPGrowth Open In Colab

4.2. High utility frequent pattern mining: Sample

Basic
HUFIM Open In Colab

4.3. High utility geo-referenced frequent pattern mining: Sample

Basic
SHUFIM Open In Colab

4.4. High utility spatial pattern mining: Sample

| Basic | topk |
|---|---|
| HDSHIM Open In Colab | TKSHUIM Open In Colab |
| SHUIM Open In Colab | |

4.5. Relative High utility pattern mining: Sample

Basic
RHUIM Open In Colab

4.6. Weighted frequent pattern mining: Sample

Basic
WFIM Open In Colab

4.7. Weighted frequent regular pattern mining: Sample

Basic
WFRIMiner Open In Colab

4.8. Weighted frequent neighbourhood pattern mining: Sample

Basic
SSWFPGrowth

5. Mining patterns from fuzzy transactional/temporal/geo-referenced databases

5.1. Fuzzy Frequent pattern mining: Sample

Basic
FFI-Miner Open In Colab

5.2. Fuzzy correlated pattern mining: Sample

Basic
FCP-growth Open In Colab

5.3. Fuzzy geo-referenced frequent pattern mining: Sample

Basic
FFSP-Miner Open In Colab

5.4. Fuzzy periodic frequent pattern mining: Sample

Basic
FPFP-Miner Open In Colab

5.5. Fuzzy geo-referenced periodic frequent pattern mining: Sample

Basic
FGPFP-Miner (under development) Open In Colab

6. Mining patterns from uncertain transactional/temporal/geo-referenced databases

6.1. Uncertain frequent pattern mining: Sample

| Basic | top-k |
|---|---|
| PUF Open In Colab | TUFP |
| TubeP Open In Colab | |
| TubeS Open In Colab | |
| UVEclat | |

6.2. Uncertain periodic frequent pattern mining: Sample

Basic
UPFP-growth Open In Colab
UPFP-growth++ Open In Colab

6.3. Uncertain Weighted frequent pattern mining: Sample

Basic
WUFIM Open In Colab

7. Mining patterns from sequence databases

7.1. Sequence frequent pattern mining: Sample

Basic
SPADE Open In Colab
PrefixSpan Open In Colab

7.2. Geo-referenced Frequent Sequence Pattern mining

Basic
GFSP-Miner (under development) Open In Colab

8. Mining patterns from multiple timeseries databases

8.1. Partial periodic pattern mining (under development)

Basic
PP-Growth (under development) Open In Colab

9. Mining interesting patterns from Streams

9.1. Frequent pattern mining

Basic
to be written

9.2. High utility pattern mining

Basic
HUPMS

10. Mining patterns from contiguous character sequences (e.g., DNA, genome, and game sequences)

10.1. Contiguous Frequent Patterns

Basic
PositionMining Open In Colab

11. Mining patterns from Graphs

11.1. Frequent sub-graph mining

| Basic | topk |
|---|---|
| Gspan Open In Colab | TKG Open In Colab |

12. Additional Features

12.1. Creation of synthetic databases

Database type
Transactional database Open In Colab
Temporal database Open In Colab
Utility database (coming soon)
spatio-transactional database (coming soon)
spatio-temporal database (coming soon)
fuzzy transactional database (coming soon)
fuzzy temporal database (coming soon)
Sequence database generator (coming soon)

12.2. Converting a dataframe into a specific database type

Approaches
Dense dataframe to databases Open In Colab
Sparse dataframe to databases (coming soon)

12.3. Gathering the statistical details of a database

Approaches
Transactional database Open In Colab
Temporal database Open In Colab
Utility database (coming soon)
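
As a rough idea of what such statistics look like for a transactional database, the sketch below computes a few common ones in plain Python. It is an illustration under assumptions, not PAMI's statistics module: `database_stats` and the returned field names are hypothetical, and density is taken here as the filled fraction of the transaction-by-item matrix.

```python
def database_stats(transactions):
    """Basic statistics of a transactional database given as lists of items.

    Hypothetical helper; duplicate items inside a transaction count once.
    """
    lengths = [len(set(t)) for t in transactions]
    items = {item for t in transactions for item in t}
    n = len(transactions)
    total = sum(lengths)
    return {
        "transactions": n,
        "distinct_items": len(items),
        "min_length": min(lengths),
        "avg_length": total / n,
        "max_length": max(lengths),
        # Share of (transaction, item) cells that are actually filled.
        "density": total / (n * len(items)),
    }


stats = database_stats([["a", "b", "c"], ["a", "b"], ["c"]])
print(stats)
```

Figures like these help when choosing a sensible minimum support for a dataset, since sparse databases with short transactions usually need a lower threshold to yield any patterns.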

12.4. Generating LaTeX code for the experimental results

Approaches
LaTeX code (coming soon)

Real World Case Studies

  1. Air pollution analytics Open In Colab
