This project primarily applies pandas and scikit-learn tools to JINAbase for two tasks:
- regression/imputation - predict missing microturbulence values
- classification - classify stars based on stellar types
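
The regression/imputation task can be sketched as follows. This is a minimal illustration on synthetic data, not the project's actual pipeline: the column names (`Teff`, `logg`, `Fe_H`, `Vmic`), the synthetic relation between them, and the choice of `RandomForestRegressor` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for a stellar parameter table (hypothetical column names).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Teff": rng.uniform(4000, 6500, n),   # effective temperature
    "logg": rng.uniform(0.5, 4.5, n),     # surface gravity
    "Fe_H": rng.uniform(-4.0, -1.0, n),   # metallicity
})
# Made-up relation, used only so the regressor has something to learn.
df["Vmic"] = 2.5 - 0.3 * df["logg"] + rng.normal(0, 0.1, n)

# Blank out 40 microturbulence values to imitate missing entries.
missing_idx = rng.choice(n, size=40, replace=False)
df.loc[missing_idx, "Vmic"] = np.nan

features = ["Teff", "logg", "Fe_H"]
known = df[df["Vmic"].notna()]
unknown = df[df["Vmic"].isna()]

# Fit on rows with a measured value, then predict the missing ones.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(known[features], known["Vmic"])

df_imputed = df.copy()
df_imputed.loc[unknown.index, "Vmic"] = model.predict(unknown[features])
```

Training only on rows with a measured value keeps the imputed entries from influencing the model that produced them.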
The project report can be found here. Requirements are listed in the YAML file here.
Classifiers used:
- Random Forest Classifier
- Support Vector Classifier
- KNN Classifier
- XGBoost Classifier
- MLP Classifier
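
A comparison of classifiers like those named above might look like the sketch below. It assumes synthetic data from `make_classification` in place of JINAbase, default-like hyperparameters, and a simple held-out accuracy score; none of these are taken from the project. XGBoost is left out so that the example depends on scikit-learn only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic three-class problem standing in for stellar-type labels.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale-sensitive models (SVC, KNN, MLP) are wrapped with a StandardScaler.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Support Vector": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "MLP": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)
    ),
}

# Fit each model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Putting the scaler inside the pipeline means it is fitted on the training split only, so no information from the test split leaks into preprocessing.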
An interactive 3D plot can be found here.
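
The coordinates for a 3D plot of this kind could be produced with a three-component t-SNE embedding, as in the sketch below. That the project's plot is a t-SNE embedding is an assumption (the reference list cites t-SNE, but the text does not say how the plot was made), and the data here is synthetic.

```python
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Synthetic 8-dimensional data with three groups (stand-in for abundance features).
X, labels = make_blobs(n_samples=150, n_features=8, centers=3, random_state=0)

# Standardize so that no single feature dominates the pairwise distances.
X_scaled = StandardScaler().fit_transform(X)

# Embed into three dimensions; perplexity must be smaller than the sample count.
tsne = TSNE(n_components=3, perplexity=30, random_state=0)
embedding = tsne.fit_transform(X_scaled)

print(embedding.shape)
```

The three columns of `embedding` can then be passed, together with `labels` for coloring, to any 3D scatter plotting tool.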
[1] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. Random Forests. In The Elements of Statistical Learning, pages 587–604. Springer New York, New York, NY, 2009.
[2] Support vector machines. scikit-learn 1.2.2 documentation. https://scikit-learn.org/stable/modules/svm.html, 2023. Accessed: 17-04-2023.
[3] Cosma Shalizi. CMU Statistics 36-462/662: Methods of Statistical Learning. Lecture Notes 11: k-Nearest Neighbors, 2022. URL: https://www.stat.cmu.edu/~cshalizi/dm/22/lectures/11/lecture-11.pdf.
[4] Introduction to boosted trees. XGBoost documentation. https://xgboost.readthedocs.io/en/latest/tutorials/model.html#introduction-to-boosted-trees, 2022. Accessed: 15-04-2023.
[5] Roger Grosse. University of Toronto CSC 411: Machine Learning and Data Mining. Lecture 5: Multilayer Perceptrons, 2019. URL: https://www.cs.toronto.edu/~mren/teach/csc411_19s/lec/lec10_notes1.pdf.
[6] Abdu Abohalima and Anna Frebel. JINAbase: a database for chemical abundances of metal-poor stars. The Astrophysical Journal Supplement Series, 238(2):36, October 2018.
[7] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(86):2579–2605, 2008.
[8] C. Chatfield. Problem Solving: A Statistician’s Guide, Second Edition. Chapman & Hall/CRC Texts in Statistical Science. Taylor & Francis, 1995.