A repository for working with a StackOverflow dataset to study the similarities between different posts, created for CSC 791 Foundation of Software Science, taught by Dr. Menzies at NCSU.
This decade has seen significant advances in machine learning, but many of these methods demand enormous computational resources. In text mining, for example, deep learning can take days of CPU time to identify relationships between StackOverflow posts. A few studies have shown that a simpler approach is possible by combining an optimizer with a data miner. Our work explores this direction: we use an optimizer to tune the hyperparameters of data miners, which requires less CPU time without sacrificing results. We extended the work of Majumdar et al. and Fu et al. by implementing FLASH as a hyperparameter tuner for SVMs on both clustered and unclustered data. We also attempted to produce a new dataset for text mining. In this study, we found that FLASH is a simpler and cheaper optimizer than DE (differential evolution) and achieves results similar to those of complex machine learning algorithms in significantly less CPU time.
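To illustrate what "FLASH as a hyperparameter tuner" means, here is a minimal pure-Python sketch of a FLASH-style sequential model-based optimization loop. It is not this repository's implementation. It assumes FLASH's usual design: a CART regression-tree surrogate is fit to the configurations evaluated so far, and the next configuration evaluated is the unevaluated one with the highest predicted score. The search space `SPACE` (log10 of the SVM's `C` and `gamma`), `toy_objective` (a synthetic stand-in for "train an SVM on StackOverflow posts and return its score"), `fit_tree`, `predict` and `flash` are all hypothetical names chosen for this example.

```python
import random
from statistics import mean

# Hypothetical SVM search space: (log10 C, log10 gamma).
SPACE = [(c, g) for c in (-2, -1, 0, 1, 2, 3) for g in (-4, -3, -2, -1, 0)]


def toy_objective(cfg):
    """Synthetic stand-in for 'fit an SVM with this config and return F1'.
    It peaks at log10 C = 1, log10 gamma = -2 (an arbitrary choice)."""
    c, g = cfg
    return 1.0 - 0.05 * (c - 1) ** 2 - 0.08 * (g + 2) ** 2


def sse(ys):
    """Sum of squared errors around the mean (CART's regression split cost)."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def fit_tree(xs, ys, depth=0, max_depth=3, min_leaf=2):
    """Tiny CART regressor: a node is (feature, threshold, left, right),
    a leaf is the mean target value of its rows."""
    if depth >= max_depth or len(ys) < 2 * min_leaf:
        return mean(ys)
    best = None
    for f in range(len(xs[0])):
        for t in sorted({x[f] for x in xs}):
            left = [i for i, x in enumerate(xs) if x[f] <= t]
            right = [i for i, x in enumerate(xs) if x[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([ys[i] for i in left]) + sse([ys[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None:
        return mean(ys)
    _, f, t, left, right = best

    def grow(idx):
        return fit_tree([xs[i] for i in idx], [ys[i] for i in idx],
                        depth + 1, max_depth, min_leaf)

    return (f, t, grow(left), grow(right))


def predict(tree, x):
    """Walk the tree down to a leaf and return its value."""
    while isinstance(tree, tuple):
        f, t, lo, hi = tree
        tree = lo if x[f] <= t else hi
    return tree


def flash(space, objective, n_init=5, budget=10, seed=0):
    """FLASH-style loop: evaluate a few random configs, then repeatedly
    fit CART and evaluate the config with the highest predicted score."""
    rng = random.Random(seed)
    pool = list(space)
    rng.shuffle(pool)
    evaluated = {cfg: objective(cfg) for cfg in pool[:n_init]}
    unevaluated = pool[n_init:]
    for _ in range(budget):
        if not unevaluated:
            break
        xs = list(evaluated)
        tree = fit_tree(xs, [evaluated[x] for x in xs])
        nxt = max(unevaluated, key=lambda x: predict(tree, x))
        unevaluated.remove(nxt)
        evaluated[nxt] = objective(nxt)
    best = max(evaluated, key=evaluated.get)
    return best, evaluated[best], len(evaluated)


if __name__ == "__main__":
    best, score, n_evals = flash(SPACE, toy_objective)
    print("best (log10 C, log10 gamma):", best, "score:", round(score, 3),
          "evaluations:", n_evals)
```

In a real run, `toy_objective` would be replaced by a function that fits an SVM with `C = 10**c` and `gamma = 10**g` and returns, for example, its F1 on a validation split. Each such call is the expensive step, which is why a FLASH-style tuner evaluates only `n_init + budget` configurations (15 here) instead of the whole 30-point grid.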
Paper: https://www.overleaf.com/read/hrtrzmrrtybr or http://tiny.cc/fss18FlashSVMPaper
Presentation Slides: http://tiny.cc/fss18FlashSVMPresentation