This project focuses on developing an intelligent query autocompletion system using natural language processing. The system combines a fine-tuned GPT-2 model with a Masked Language Model (MLM) to generate accurate, contextually relevant query suggestions.
The project aims to enhance the user experience in search interfaces by providing smart query autocompletion. It utilizes two main models:
- Fine-tuned GPT-2 model
- Masked Language Model (DistilRoBERTa-base)
These models are combined to overcome individual limitations and produce more accurate and complete query suggestions.
Data preparation:
- Used a query well-formedness dataset containing sentences and their well-formedness ratings
- Performed data cleaning, exploration, and preprocessing
- Applied filtering and class-balancing techniques
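The filtering and class-balancing steps can be sketched as below. This is a minimal illustration, not the project's actual code: the 0.8 well-formedness threshold, the toy rows, and downsampling-to-the-minority-class are all assumptions.

```python
import random

# Toy rows mimicking the well-formedness dataset: (query, rating in [0, 1]).
rows = [
    ("what is the capital of france", 0.9),
    ("capital france what", 0.2),
    ("how tall is mount everest", 1.0),
    ("everest tall how", 0.1),
    ("who wrote pride and prejudice", 0.8),
]

# Filtering: label each query well-formed (1) or not (0) using an
# assumed rating threshold of 0.8.
labeled = [(q, 1 if r >= 0.8 else 0) for q, r in rows]

# Class balancing: downsample the majority class to the minority class size.
pos = [x for x in labeled if x[1] == 1]
neg = [x for x in labeled if x[1] == 0]
k = min(len(pos), len(neg))
random.seed(0)
balanced = random.sample(pos, k) + random.sample(neg, k)
print(len(balanced))  # equal numbers of both classes
```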
GPT-2 fine-tuning:
- Utilized Hugging Face's Transformers library
- Implemented tokenization and fine-tuning
- Optimized training hyperparameters
Masked Language Model:
- Used the DistilRoBERTa-base architecture
- Implemented tokenization and token masking
- Trained the model using TensorFlow
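The masking step can be illustrated with a plain-Python sketch. In the project this operates on subword token IDs produced by the DistilRoBERTa tokenizer; the 15% masking rate here is the standard MLM default, assumed rather than taken from the project.

```python
import random

MASK = "<mask>"  # DistilRoBERTa's mask token

def mask_tokens(tokens, mask_prob=0.15, seed=1):
    """Replace roughly mask_prob of the tokens with the mask token;
    return the masked sequence plus the labels the MLM must recover."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)   # model is trained to predict this token
        else:
            masked.append(tok)
            labels.append(None)  # position not scored in the loss
    return masked, labels

tokens = "how do i reset my router password".split()
masked, labels = mask_tokens(tokens)
print(masked)  # with this seed, only the first token gets masked
```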
To address the limitations of the individual models, the system combines them as follows:
- Generate initial output using GPT-2
- Append a masked token to the GPT-2 output
- Use MLM to predict the masked token and complete the query
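The three steps above can be wired together as in this sketch. The two model calls are stubbed out here; the real system invokes the fine-tuned GPT-2 checkpoint and DistilRoBERTa-base, and the stubbed outputs are illustrative only.

```python
MASK = "<mask>"  # DistilRoBERTa's mask token

def gpt2_generate(prefix):
    """Stub for the fine-tuned GPT-2 model: extends the user's prefix.
    The real system generates a continuation from the fine-tuned checkpoint."""
    return prefix + " to reset"

def mlm_fill(text_with_mask):
    """Stub for the MLM: fills in the masked token. The real system runs a
    fill-mask prediction with DistilRoBERTa-base."""
    return text_with_mask.replace(MASK, "router")

def autocomplete(prefix):
    draft = gpt2_generate(prefix)   # 1. generate initial output with GPT-2
    masked = draft + " " + MASK     # 2. append a masked token
    return mlm_fill(masked)         # 3. let the MLM complete the query

print(autocomplete("how"))  # -> "how to reset router"
```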
Web application:
- Developed a Streamlit web application
- Implemented tabs for different functionalities:
  - Masked Language Model predictions
  - GPT-2 text generation
  - Combined model predictions
The project successfully demonstrates:
- Effective query autocompletion using combined NLP models
- A user-friendly interface for interacting with the system
- Potential for further improvements and personalization
Future work:
- Implement user authentication
- Develop personalized query suggestions based on user history
- Introduce real-time suggestions as users type
- Implement query ranking based on popularity
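Popularity-based ranking could be as simple as sorting candidate completions by historical query counts, as in this hypothetical sketch (the counts and queries are made up):

```python
# Hypothetical historical counts of how often each completed query was issued.
popularity = {
    "how to reset router": 120,
    "how to reset password": 340,
    "how to reset iphone": 95,
}

def rank(candidates):
    """Order candidate completions by descending popularity; completions
    never seen before rank last."""
    return sorted(candidates, key=lambda q: popularity.get(q, 0), reverse=True)

print(rank(["how to reset iphone", "how to reset password", "how to reset router"]))
```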
Our Team:
- Harsita Keerthikanth
- Alena Chao
- Sofia Nguyen
- Vanessa Huynh
- Erica Xue
Special thanks to challenge advisors Kanay Gupta and Smrithika Appaiah, and course support Mako Ohara.