Write a Big Data Script that uses the Pandas API for Spark or Dask

pr-124/Data_eng_project1-Pragya

 
 

Data_eng_project1_Pragya

Hello, this is the repo for Project 1 of IDS 706: Data Engineering. It uses a Kaggle dataset of books scraped via the Goodreads API (https://www.kaggle.com/datasets/jealousleopard/goodreadsbooks).

Kaggle Dataset

Goals of the project:

  1. Build a repo on GitHub
  2. Configure the “scaffold”: Makefile, requirements file, app file (example: Streamlit, CLI, FastAPI), test file
  3. Test with GitHub Actions
  4. Build a very simple microservice system that talks to a Big Data script using Dask
  5. Create a web service using FastAPI
  6. Test in a demo (link)

Languages

  • Python 95.5%
  • Makefile 4.5%