HarrisonJYU/ds_projects


Selected Data Science Projects for Harrison Yu


Keywords: Big Data, Pipeline, Spark, NoSQL, Airflow, Cloud Computing (GCP), Sentence Embedding

  • Designed a platform that recommends data jobs for a given resume using Airflow, MongoDB, and Spark ML
  • Experimented with different embedding models from Sentence Transformers to optimize the accuracy of the recommendations
  • Built a pipeline that collects job data automatically and stores over 3,500 job descriptions in a NoSQL database
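
As a rough illustration of the matching step, the sketch below ranks jobs by cosine similarity between a resume embedding and job-description embeddings. It is a minimal sketch, not the project's code: the three-dimensional vectors, the job names, and the `rank_jobs` helper are all hypothetical, and in practice the vectors would come from a sentence embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rank_jobs(resume_vec, job_vecs, top_k=3):
    """Return the top_k (job_id, score) pairs, most similar first.

    Hypothetical helper: job_vecs maps a job id to its embedding.
    """
    scored = [
        (job_id, cosine_similarity(resume_vec, vec))
        for job_id, vec in job_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumed values, for illustration only).
resume = [0.9, 0.1, 0.0]
jobs = {
    "data_engineer": [0.8, 0.2, 0.1],
    "ml_researcher": [0.1, 0.9, 0.2],
    "analyst": [0.5, 0.5, 0.0],
}
top_jobs = rank_jobs(resume, jobs, top_k=2)
```

Cosine similarity is a common choice here because it compares the direction of two embeddings and ignores their length, so a long job description does not score higher simply for being long.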

Keywords: A/B Testing, Experiments, Hypothesis Testing, Response Surface Methodology

  • Reduced browsing time by ~40% by conducting multivariate experiments to find the optimal design factors
  • Ran partial F-tests to test for interaction effects between design factors, then fitted a response surface model to locate the optimum more precisely
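
The two steps above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the project's analysis: the function names, the residual sums of squares, and the coefficients are hypothetical. The partial F statistic compares a reduced model with a full model that adds `num_extra_terms` terms (for example, interaction terms), and the stationary point is where the gradient of a fitted two-factor second-order surface is zero.

```python
def partial_f_statistic(rss_reduced, rss_full, num_extra_terms, df_resid_full):
    """Partial F statistic for nested linear models.

    F = ((RSS_reduced - RSS_full) / q) / (RSS_full / df_resid_full),
    where q is the number of terms added by the full model.
    """
    if num_extra_terms <= 0 or df_resid_full <= 0:
        raise ValueError("degrees of freedom must be positive")
    numerator = (rss_reduced - rss_full) / num_extra_terms
    denominator = rss_full / df_resid_full
    return numerator / denominator


def stationary_point(b1, b2, b11, b22, b12):
    """Stationary point of y = b0 + b1*x1 + b2*x2 + b11*x1^2 + b22*x2^2 + b12*x1*x2.

    Solves the 2x2 linear system obtained by setting both partial
    derivatives to zero (b0 does not affect the location).
    """
    det = 4.0 * b11 * b22 - b12 * b12
    if det == 0:
        raise ValueError("surface has no unique stationary point")
    x1 = (b12 * b2 - 2.0 * b22 * b1) / det
    x2 = (b12 * b1 - 2.0 * b11 * b2) / det
    return x1, x2


# Assumed numbers, for illustration only.
f_stat = partial_f_statistic(
    rss_reduced=120.0, rss_full=80.0, num_extra_terms=2, df_resid_full=20
)

# Coefficients of y = (x1 - 1)^2 + (x2 - 2)^2, expanded.
optimum = stationary_point(b1=-2.0, b2=-4.0, b11=1.0, b22=1.0, b12=0.0)
```

The statistic is compared against an F distribution with `num_extra_terms` and `df_resid_full` degrees of freedom. A stationary point is a minimum only when the quadratic part of the surface is positive definite, so that should be checked before treating it as the optimum.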

NYC Household Broadband Adoption Rate Analysis

Keywords: Geospatial Analytics, Regression Analysis, Correlation Analysis, GIS, Open Data

  • Conducted a regression analysis in ArcGIS to explore the digital divide across NYC neighborhoods, relating internet adoption rates to socioeconomic factors
  • Improved the R² score by ~30% by implementing a geographically weighted regression (GWR) model
  • Visualized clusters and outliers of internet adoption using a spatial weights matrix and Local Moran’s I in Python
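
For readers unfamiliar with Local Moran’s I, the sketch below computes it from scratch on a toy example. It is a minimal sketch assuming row-standardized weights and the form I_i = (z_i / m2) * sum_j w_ij * z_j, with z the deviations from the mean and m2 their mean square. The `local_morans_i` helper, the four values, and the neighbor lists are hypothetical, and spatial libraries may scale the statistic slightly differently.

```python
def local_morans_i(values, neighbors):
    """Local Moran's I for each observation.

    values:    list of numeric observations, one per area.
    neighbors: dict mapping an index to the list of its neighbor indices
               (the spatial weights, row-standardized inside this function).
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]       # deviations from the mean
    m2 = sum(d * d for d in z) / n       # mean squared deviation
    if m2 == 0:
        raise ValueError("all values are identical; statistic is undefined")

    result = []
    for i in range(n):
        nbrs = neighbors.get(i, [])
        # Row-standardized spatial lag: average deviation of the neighbors.
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        result.append(z[i] * lag / m2)
    return result


# Four areas along a line: low, low, high, high (assumed values).
adoption = [1.0, 1.0, 5.0, 5.0]
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
local_i = local_morans_i(adoption, adjacency)
```

A positive value marks an area that resembles its neighbors (part of a high-high or low-low cluster), while a negative value marks a spatial outlier. Significance is usually assessed with permutation tests, which this sketch omits.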
