This project uses Python to apply supervised learning to two real-world data domains: banking credit assessment and relational datasets (the Cora, CiteSeer, and PubMed citation networks). The objective is to work through the challenges each dataset poses and to compare a range of classification methods, both for predictive accuracy and for the insight they give into the data.
I used several classical and contemporary supervised learning algorithms, including Naive Bayes, K-Nearest Neighbors (KNN), Support Vector Machines (SVM), and Decision Trees. Particular attention went to preprocessing suited to the nature of each dataset, so that model training and evaluation are reliable.
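
As a rough illustration of dataset-specific preprocessing, the sketch below scales numeric columns and one-hot encodes a categorical column before fitting an SVM. It assumes scikit-learn and pandas, which this description does not name. The toy records and the column names (`income`, `loan_amount`, `employment`, `approved`) are hypothetical and not taken from the project's data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Hypothetical toy credit records; column names and values are illustrative only.
df = pd.DataFrame({
    "income": [32000, 54000, 27000, 81000, 45000, 39000],
    "loan_amount": [5000, 12000, 8000, 20000, 7000, 9000],
    "employment": ["salaried", "self-employed", "salaried",
                   "salaried", "unemployed", "self-employed"],
    "approved": [1, 1, 0, 1, 0, 0],
})
numeric_cols = ["income", "loan_amount"]
categorical_cols = ["employment"]

# Scale numeric features (distance- and margin-based models such as KNN and SVM
# are sensitive to feature scale) and one-hot encode the categorical feature.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Bundling preprocessing with the classifier keeps the scaler and encoder
# fitted on training data only whenever the pipeline is cross-validated.
model = Pipeline([("preprocess", preprocess), ("clf", SVC(kernel="rbf"))])

X = df[numeric_cols + categorical_cols]
y = df["approved"]
model.fit(X, y)
predictions = model.predict(X)
```

Naive Bayes and Decision Trees do not depend on feature scale in the same way, so the scaling step matters mainly for KNN and SVM.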
- Data Exploration: Initial analysis to understand the datasets' characteristics, distribution, and potential challenges.
- Feature Engineering: Crafting and selecting meaningful features to enhance model performance.
- Model Selection and Training: Comparative analysis of various algorithms to identify the most effective models for these specific datasets.
- Performance Evaluation: Using accuracy, precision, recall, and F1-score to assess each model.
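
The model selection and evaluation steps above can be sketched as a cross-validated comparison of the four classifiers. This is a minimal sketch assuming scikit-learn. It runs on synthetic data from `make_classification` rather than the project's datasets, and the hyperparameters, fold count, and macro averaging are illustrative choices, not the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; not the banking or citation datasets).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# KNN and SVM are wrapped with a scaler because both are sensitive to feature scale.
models = {
    "Naive Bayes": GaussianNB(),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

scoring = ["accuracy", "precision_macro", "recall_macro", "f1_macro"]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Mean score per metric across the folds, for each model.
results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    results[name] = {metric: scores[f"test_{metric}"].mean() for metric in scoring}

for name, metrics in results.items():
    print(name, {metric: round(value, 3) for metric, value in metrics.items()})
```

Reporting precision, recall, and F1 alongside accuracy matters most when classes are imbalanced, which is common in credit data, because accuracy alone can look high for a model that rarely predicts the minority class.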
The analysis shows both the potential of supervised learning on real-world data and the difficulties that these datasets pose for it.