donorchoose-projects-screening

The goal of this project is to develop a predictive model that can determine the likelihood of approval for project proposals submitted by teachers on DonorsChoose.org. To achieve this, the project involved several key steps.

Data Preprocessing: The project began with the collection of project descriptions, teacher information, and school metadata from DonorsChoose.org. This raw data was then preprocessed to clean and organize it for analysis.
Text Vectorization: The text content of project descriptions was transformed into numerical vectors using techniques such as TF-IDF (Term Frequency-Inverse Document Frequency) or word embeddings like Word2Vec or GloVe. This allowed the model to work with the textual information effectively.
Handling Categorical Values: The categorical variables, such as teacher and school information, were encoded into numerical values using techniques like one-hot encoding or label encoding. This enabled the inclusion of these important features in the machine learning model.
Feature Engineering: Additional features were derived from the available metadata, such as teacher experience, school location, and project category. These features provided more context and information for the model to make predictions.
Model Selection: Different machine learning algorithms were explored, including decision trees, random forests, support vector machines, and neural networks. Each algorithm's strengths and weaknesses were considered, and the best-suited models were chosen for experimentation.
Model Training and Evaluation: The dataset was split into training and test sets. The chosen models were trained on the training set, and their performance on the test set was assessed with accuracy, precision, recall, and F1-score.
Hyperparameter Tuning: Hyperparameters of the selected models were fine-tuned using techniques like grid search or random search to optimize their performance.
Ensemble Methods: To further improve predictive accuracy, ensemble methods like stacking or boosting were explored. These techniques combine the predictions of multiple models to create a more robust final prediction.
Cross-Validation: To validate the models more rigorously, k-fold cross-validation was applied, ensuring that the models' performance wasn't influenced by a specific train-test split.
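
The evaluation and cross-validation steps above can be sketched in plain Python. This is a minimal illustration, not the code from the notebooks: the function names `classification_metrics` and `k_fold_indices`, the label convention (1 = approved, 0 = not approved), and the toy labels are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels.

    Assumes a non-empty input and the (hypothetical) convention 1 = approved.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds.

    Returns one (train_indices, test_indices) pair per fold. Fold sizes differ
    by at most one, and every sample is used for testing exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train_idx, test_idx))
        start += size
    return folds


if __name__ == "__main__":
    # Toy labels, invented purely for illustration.
    y_true = [1, 1, 1, 0, 0]
    y_pred = [1, 1, 0, 1, 0]
    print(classification_metrics(y_true, y_pred))

    for train_idx, test_idx in k_fold_indices(n_samples=10, k=3):
        print(len(train_idx), test_idx)
```

In a real run, each fold's training indices would be used to fit a model and the test indices to score it, with the per-fold metrics averaged at the end. The sketch does not shuffle the data before splitting; scikit-learn's `KFold` and `precision_recall_fscore_support` offer equivalent, more complete functionality.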

In summary, this project aimed to predict the approval or rejection of DonorsChoose.org project proposals using a combination of natural language processing techniques, feature engineering, and several machine learning algorithms. By drawing on project descriptions, teacher information, and school metadata, the model is intended to give both teachers and the platform insight into what makes a proposal likely to be approved.

Files:
1_EDA.ipynb
2_Preprocessing.ipynb
3_Vectorization.ipynb
4_Vectorization.ipynb
5_KNN.ipynb
6_Naive_Bayes.ipynb
7_Decision_tree.ipynb
8_GBDT.ipynb
LICENSE
README.md

pratikroy311/donorchoose-projects-screening
