GitHub - shubhamguptaiitd/frequent_pattern_mining: C++ implementation of frequent pattern mining algorithms (Apriori and FP Tree)

Files -
CSZ188551.sh
README.txt
a.out
apriori.cpp
class_def.h
compile.sh
existing_result.png
fptree.cpp
functions.h
install.sh
plot.py
retail.dat_plot.png
time_fp_support_1.000000
tt_ap
tt_fp


COL 761 - Frequent itemsets mining : assignment - 1

Team consists of -
Sahil Manchanda - 2018CSZ8551
Raj Kamal - 2018CSZ8013
Shubham Gupta - 2019CSZ8470

compile.sh - running it generates the executables that CSZ188551.sh uses to produce the frequent itemsets and the execution time plot of apriori vs fptree.

CSZ188551.sh -

./CSZ188551.sh input_file X -apriori <filename> runs the apriori algorithm on input_file with minimum support X and saves the frequent itemsets to filename.

./CSZ188551.sh input_file X -fptree <filename> runs the fptree algorithm on input_file with minimum support X and saves the frequent itemsets to filename.

./CSZ188551.sh input_file -plot runs apriori and fptree on the given dataset at support values [1,5,10,25,50,90] (in percent) and generates a plot of execution time vs support.
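
Assuming X is a percentage of the number of transactions (as the support values used by -plot suggest), a minimal sketch of turning it into an absolute transaction count could look like this. The function name min_support_count is hypothetical and is not taken from the repository sources.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not from the repository): converts a support given in
// percent into the smallest number of transactions that satisfies it.
std::size_t min_support_count(double support_percent, std::size_t num_transactions) {
    assert(support_percent >= 0.0 && support_percent <= 100.0);
    // Multiply before dividing so that round percentages stay exact in floating point.
    const double exact = support_percent * static_cast<double>(num_transactions) / 100.0;
    return static_cast<std::size_t>(std::ceil(exact));
}
```

Rounding up means an itemset must appear in at least X percent of the transactions; an implementation that rounds down would accept slightly more itemsets.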

fptree.cpp - implements the FP tree growth based pattern mining algorithm.
apriori.cpp - implements the apriori based pattern mining algorithm.
class_def.h and functions.h - contain the common functionality used by both fptree.cpp and apriori.cpp.

existing_result.png - plot of apriori vs fptree execution time, on webdocs.dat data.

Observation -
As the plot shows, both algorithms timed out at the very low supports of 1% and 5%. At supports greater than 5%, fptree is much faster than apriori.
One reason is that fptree avoids repeated passes over the transactions and builds a compressed tree representation that can be mined for frequent itemsets very quickly.
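
The compressed tree representation can be sketched as follows. This is an illustration under stated assumptions, not the code in fptree.cpp: the names FPNode, count_items, build_fp_tree and count_nodes are hypothetical, items are assumed to be integers, and each transaction is assumed to contain no duplicate items. The mining step and the header links of a full FP tree are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

using Transaction = std::vector<int>;

// One tree node: an item, how many transactions pass through it, and its children.
struct FPNode {
    int item;
    int count = 0;
    std::map<int, std::unique_ptr<FPNode>> children;
    explicit FPNode(int i) : item(i) {}
};

// Pass 1: count how many transactions contain each item.
std::map<int, int> count_items(const std::vector<Transaction>& db) {
    std::map<int, int> freq;
    for (const auto& t : db)
        for (int item : t) ++freq[item];
    return freq;
}

// Pass 2: insert the frequent items of every transaction, ordered by
// descending frequency, so that transactions with common items share a prefix path.
std::unique_ptr<FPNode> build_fp_tree(const std::vector<Transaction>& db, int min_count) {
    const std::map<int, int> freq = count_items(db);
    auto root = std::make_unique<FPNode>(-1);  // -1 marks the artificial root
    for (const auto& t : db) {
        Transaction kept;
        for (int item : t)
            if (freq.at(item) >= min_count) kept.push_back(item);
        std::sort(kept.begin(), kept.end(), [&freq](int a, int b) {
            const int fa = freq.at(a), fb = freq.at(b);
            return fa != fb ? fa > fb : a < b;  // ties broken by item id
        });
        FPNode* node = root.get();
        for (int item : kept) {
            auto& child = node->children[item];
            if (!child) child = std::make_unique<FPNode>(item);
            ++child->count;
            node = child.get();
        }
    }
    return root;
}

// Counts the nodes in a subtree, including the node passed in.
std::size_t count_nodes(const FPNode& node) {
    std::size_t total = 1;
    for (const auto& entry : node.children) total += count_nodes(*entry.second);
    return total;
}
```

For the four transactions {1,2,3}, {1,2}, {1,3}, {2,4} with a minimum count of 2, the eight frequent item occurrences are stored in five item nodes, because the transactions that start with item 1 share one path.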

In apriori, multiple passes over the dataset are required to eliminate infrequent itemsets at every level of pruning, while fptree makes only 2 passes to create a compressed representation from which frequent itemsets can be mined quickly. Apriori also has to generate candidate itemsets and then check which of them are frequent, whereas fptree requires no candidate generation. At very low support, the number of candidate itemsets in apriori grows exponentially compared to high support, so the running time of apriori also grows exponentially as the threshold decreases.
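
The level-wise candidate generation and pruning described above can be sketched as follows. This is a simplified illustration, not the code in apriori.cpp: the names Itemset, support, generate_candidates and apriori are hypothetical, and every transaction is assumed to be a sorted vector of distinct integer items. For brevity the sketch scans the dataset once per candidate; a practical implementation would count all candidates of a level in a single pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

using Itemset = std::vector<int>;  // always kept sorted in ascending order

// Counts the transactions that contain every item of the candidate.
// Each call scans the whole dataset.
int support(const Itemset& candidate, const std::vector<Itemset>& db) {
    int n = 0;
    for (const auto& t : db)
        if (std::includes(t.begin(), t.end(), candidate.begin(), candidate.end())) ++n;
    return n;
}

// Builds candidates of size k+1 from the frequent itemsets of size k.
std::vector<Itemset> generate_candidates(const std::set<Itemset>& frequent_k) {
    std::vector<Itemset> out;
    for (auto a = frequent_k.begin(); a != frequent_k.end(); ++a) {
        for (auto b = std::next(a); b != frequent_k.end(); ++b) {
            // Join step: combine two itemsets that agree on their first k-1 items.
            if (!std::equal(a->begin(), a->end() - 1, b->begin())) continue;
            Itemset c = *a;
            c.push_back(b->back());
            // Prune step: drop the candidate if any of its k-subsets is infrequent.
            bool all_frequent = true;
            for (std::size_t skip = 0; skip < c.size() && all_frequent; ++skip) {
                Itemset sub;
                for (std::size_t i = 0; i < c.size(); ++i)
                    if (i != skip) sub.push_back(c[i]);
                all_frequent = frequent_k.count(sub) > 0;
            }
            if (all_frequent) out.push_back(c);
        }
    }
    return out;
}

// Returns all itemsets contained in at least min_count transactions.
std::set<Itemset> apriori(const std::vector<Itemset>& db, int min_count) {
    std::set<int> items;
    for (const auto& t : db) items.insert(t.begin(), t.end());
    std::set<Itemset> level;
    for (int item : items)
        if (support(Itemset{item}, db) >= min_count) level.insert(Itemset{item});
    std::set<Itemset> result;
    while (!level.empty()) {
        result.insert(level.begin(), level.end());
        std::set<Itemset> next;
        for (const auto& c : generate_candidates(level))
            if (support(c, db) >= min_count) next.insert(c);
        level = std::move(next);
    }
    return result;
}
```

The loop in apriori runs once per itemset size, which is where the repeated passes over the dataset come from. Lowering min_count lets more itemsets survive each level, so the number of candidates joined at the next level grows quickly.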

Also, the memory requirement of fptree sometimes appears to be higher than that of apriori.