- Designed a platform that recommends data jobs for a given resume using Airflow, MongoDB, and Spark ML
- Experimented with different Sentence Transformers embedding models to improve the system's accuracy
- Built a pipeline that automatically collects job data and stores over 3,500 job descriptions in a NoSQL database
- Reduced browsing time by ~40% by conducting multivariate experiments to find optimal design factors
- Ran partial F-tests to determine interaction effects between design factors and fitted a response surface model to further locate the optimum
- Conducted a regression analysis in ArcGIS of internet adoption rates against socioeconomic factors to explore the digital divide across NYC neighborhoods
- Improved the R² score by ~30% by implementing a geographically weighted regression (GWR) model
- Visualized clusters and outliers of internet adoption using a spatial weights matrix and Local Moran’s I in Python