This project focuses on developing an intelligent query autocompletion system using natural language processing. The system combines a fine-tuned GPT-2 model with a Masked Language Model (MLM) to generate accurate, contextually relevant query suggestions.
The project aims to enhance the user experience in search interfaces by providing smart query autocompletion. It utilizes two main models:
- Fine-tuned GPT-2 model
- Masked Language Model (DistilRoBERTa-base)
These models are combined to overcome individual limitations and produce more accurate and complete query suggestions.
Data preparation:
- Used a query well-formedness dataset containing sentences and their well-formedness ratings
- Performed data cleaning, exploration, and preprocessing
- Applied filtering and class-balancing techniques
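The filtering and class-balancing steps can be sketched as below. This is a minimal illustration, not the project's actual code: the 0.8 well-formedness threshold, the toy rows, and downsampling-to-the-minority-class are all assumptions.

```python
import random

# Toy rows mimicking the well-formedness dataset: (query, rating in [0, 1]).
rows = [
    ("what is the capital of france", 0.9),
    ("capital france what", 0.2),
    ("how tall is mount everest", 1.0),
    ("everest tall how", 0.1),
    ("who wrote pride and prejudice", 0.8),
]

# Filtering: label each query well-formed (1) or not (0) using an
# assumed rating threshold of 0.8.
labeled = [(q, 1 if r >= 0.8 else 0) for q, r in rows]

# Class balancing: downsample the majority class to the minority class size.
pos = [x for x in labeled if x[1] == 1]
neg = [x for x in labeled if x[1] == 0]
k = min(len(pos), len(neg))
random.seed(0)
balanced = random.sample(pos, k) + random.sample(neg, k)
print(len(balanced))  # equal numbers of both classes
```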
GPT-2 fine-tuning:
- Utilized Hugging Face's Transformers library
- Implemented tokenization and fine-tuning
- Optimized training hyperparameters
Masked Language Model:
- Used the DistilRoBERTa-base architecture
- Implemented tokenization and token masking
- Trained the model using TensorFlow
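The masking step can be illustrated with a plain-Python sketch. In the project this operates on subword token IDs produced by the DistilRoBERTa tokenizer; the 15% masking rate here is the standard MLM default, assumed rather than taken from the project.

```python
import random

MASK = "<mask>"  # DistilRoBERTa's mask token

def mask_tokens(tokens, mask_prob=0.15, seed=1):
    """Replace roughly mask_prob of the tokens with the mask token;
    return the masked sequence plus the labels the MLM must recover."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)   # model is trained to predict this token
        else:
            masked.append(tok)
            labels.append(None)  # position not scored in the loss
    return masked, labels

tokens = "how do i reset my router password".split()
masked, labels = mask_tokens(tokens)
print(masked)  # with this seed, only the first token gets masked
```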
To address the limitations of the individual models, the system combines them as follows:
- Generate initial output using GPT-2
- Append a masked token to the GPT-2 output
- Use MLM to predict the masked token and complete the query
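The three steps above can be wired together as in this sketch. The two model calls are stubbed out here; the real system invokes the fine-tuned GPT-2 checkpoint and DistilRoBERTa-base, and the stubbed outputs are illustrative only.

```python
MASK = "<mask>"  # DistilRoBERTa's mask token

def gpt2_generate(prefix):
    """Stub for the fine-tuned GPT-2 model: extends the user's prefix.
    The real system generates a continuation from the fine-tuned checkpoint."""
    return prefix + " to reset"

def mlm_fill(text_with_mask):
    """Stub for the MLM: fills in the masked token. The real system runs a
    fill-mask prediction with DistilRoBERTa-base."""
    return text_with_mask.replace(MASK, "router")

def autocomplete(prefix):
    draft = gpt2_generate(prefix)   # 1. generate initial output with GPT-2
    masked = draft + " " + MASK     # 2. append a masked token
    return mlm_fill(masked)         # 3. let the MLM complete the query

print(autocomplete("how"))  # -> "how to reset router"
```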
Web application:
- Developed a Streamlit web application
- Implemented tabs for different functionalities:
  - Masked Language Model predictions
  - GPT-2 text generation
  - Combined model predictions
The project successfully demonstrates:
- Effective query autocompletion using combined NLP models
- A user-friendly interface for interacting with the system
- Potential for further improvements and personalization
Future work:
- Implement user authentication
- Develop personalized query suggestions based on user history
- Introduce real-time suggestions as users type
- Implement query ranking based on popularity
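Popularity-based ranking could be as simple as sorting candidate completions by historical query counts, as in this hypothetical sketch (the counts and queries are made up):

```python
# Hypothetical historical counts of how often each completed query was issued.
popularity = {
    "how to reset router": 120,
    "how to reset password": 340,
    "how to reset iphone": 95,
}

def rank(candidates):
    """Order candidate completions by descending popularity; completions
    never seen before rank last."""
    return sorted(candidates, key=lambda q: popularity.get(q, 0), reverse=True)

print(rank(["how to reset iphone", "how to reset password", "how to reset router"]))
```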
Our Team:
- Harsita Keerthikanth
- Alena Chao
- Sofia Nguyen
- Vanessa Huynh
- Erica Xue
Special thanks to challenge advisors Kanay Gupta and Smrithika Appaiah, and course support Mako Ohara.