Welcome to our Predictive Text Analysis project! This repository contains code for predicting answers to multiple-choice science exam questions using natural language processing: TF-IDF text features fed to a Random Forest Classifier.
We used a dataset of science exam questions. Each entry contains a question (prompt) and five answer choices (A, B, C, D, E). The dataset was curated to cover diverse and meaningful questions for analysis.
- Prompt Analysis: We performed in-depth analysis on question prompts, exploring word frequencies, lengths, and semantic patterns.
- Text Vectorization: We used TF-IDF vectorization to convert text into numerical features for machine learning model training.
- Machine Learning Model: We implemented a Random Forest Classifier for answer prediction, which reached over 90% accuracy on the test set.
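
As a rough illustration of the prompt analysis described above, here is a minimal sketch that counts word frequencies and measures prompt lengths. The sample prompts and the `tokenize` helper are hypothetical and only stand in for the real dataset and notebooks.

```python
import re
from collections import Counter

# Hypothetical sample prompts; the real dataset is not reproduced here.
prompts = [
    "What is the primary source of energy for the Earth?",
    "Which particle carries a negative electric charge?",
    "What is the chemical symbol for water?",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Word frequencies across all prompts.
word_counts = Counter(token for p in prompts for token in tokenize(p))

# Prompt lengths measured in words.
prompt_lengths = [len(tokenize(p)) for p in prompts]

print(word_counts.most_common(5))
print(prompt_lengths)
```

In practice you would likely load the prompts with Pandas and filter out stop words before counting, since words like "the" dominate raw frequencies.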
Our machine learning model is a Random Forest Classifier, an ensemble of decision trees suited to multi-class classification (here, the five answer choices A to E). It takes TF-IDF vectorized features as input, which lets the model learn patterns in the textual data.
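
The setup described above can be sketched with a Scikit-Learn pipeline. This is a minimal example under stated assumptions, not the repository's actual training script: the toy prompts, labels, and hyperparameters (`n_estimators=100`, `random_state=0`) are hypothetical, and the real code may build its features differently (for example, by also including the answer choices).

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: each prompt is paired with its correct answer letter.
train_prompts = [
    "Which planet is closest to the Sun?",
    "What gas do plants absorb during photosynthesis?",
    "Which organ pumps blood through the human body?",
    "What is the boiling point of water at sea level?",
    "Which force keeps planets in orbit around the Sun?",
    "What is the smallest unit of a chemical element?",
]
train_labels = ["A", "B", "C", "D", "E", "A"]

# TF-IDF turns each prompt into a sparse numeric vector;
# the random forest then learns to map those vectors to answer letters.
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(train_prompts, train_labels)

# Predict the answer letter for an unseen (hypothetical) prompt.
predictions = model.predict(["Which planet is farthest from the Sun?"])
print(predictions)
```

Wrapping both steps in one pipeline keeps the vectorizer's vocabulary fitted on the training data only, which avoids leaking test-set information into the features.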
- Interactive Visualizations: Explore interactive charts and visualizations, including bar charts representing class distributions and dynamic word clouds showcasing frequently occurring words in questions.
- 3D Scatter Plots: Dive into 3D scatter plots to uncover correlations between question difficulty, length, and correct answer frequencies.
- Confusion Matrix: Visualize the model's performance through an intuitive confusion matrix, providing insights into prediction accuracy.
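
To show what the confusion matrix captures, here is a small sketch using Scikit-Learn's `confusion_matrix`. The true and predicted labels below are made up for illustration and are not results from this project.

```python
from sklearn.metrics import confusion_matrix

# The five answer choices, in the order used for rows and columns.
labels = ["A", "B", "C", "D", "E"]

# Hypothetical true and predicted answers for eight questions.
y_true = ["A", "B", "C", "D", "E", "A", "B", "C"]
y_pred = ["A", "B", "C", "D", "A", "A", "C", "C"]

# Rows are true answers, columns are predicted answers.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

# Correct predictions sit on the diagonal, so accuracy is trace / total.
accuracy = cm.trace() / cm.sum()
print(accuracy)
```

To render the matrix as a chart, one option is to pass `cm` to `seaborn.heatmap` with `annot=True`.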
- Data Preprocessing: Explore Jupyter Notebooks for in-depth data preprocessing and exploratory data analysis.
- Model Training: Use the provided Python scripts to train the Random Forest Classifier and obtain predictions.
- Interactive Visualizations: Run interactive Python scripts for dynamic visualizations of the dataset and model performance.
- Python 3.7+
- Pandas
- NumPy
- Scikit-Learn
- Matplotlib
- Seaborn
- Plotly
- WordCloud
Our trained model achieved an accuracy of over 90% on the test dataset, demonstrating its effectiveness in predicting correct answers to science exam questions.
Let's connect and collaborate! Feel free to reach out to me on:
- LinkedIn: Vidhi Waghela
- Kaggle: Vidhi Kishor Waghela
- GitHub: Vidhi1290
I'm always open to discussions, collaborations, and learning new things together. Don't hesitate to drop me a message or explore my other projects on GitHub. Happy coding! 🚀
Feel free to dive into the code, experiment with the features, and explore the nuances of predicting answers to science exam questions! 🕵️‍♂️💬