Book Depository

This repository contains a challenge for a Junior Data Engineer position. The data in the Book Depository dataset is joined, merged and transformed in order to answer the questions below and extract some insights.

Data source: https://www.kaggle.com/sp1thas/book-depository-dataset

Questions to be answered

  • What is the total number of books in the dataset?
  • How many books have only 1 author?
  • Who are the 5 authors with the most books?
  • How many books are there per category?
  • What are the 5 categories with the most books?
  • Which format has the most books?
  • Considering the bestsellers-rank column, what are the 10 best ranked books?
  • Considering the rating-avg column, what are the 10 highest-rated books?
  • How many books have rating-avg greater than 3.5?
  • How many books have a publication date (publication-date) later than 01-01-2020?
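
The questions above map fairly directly onto pandas operations. The block below is a minimal sketch, not the repository's actual solution. It runs on a handful of invented rows rather than the Kaggle files, and the `title`, `authors`, `categories` and `format` column names are assumptions; only `bestsellers-rank`, `rating-avg` and `publication-date` are named in the questions. In the real dataset the author and category columns may hold IDs that need parsing and a join against lookup tables first.

```python
import pandas as pd

# Toy rows invented purely for illustration; they are NOT from the real dataset.
# Column names other than bestsellers-rank, rating-avg and publication-date
# are assumptions.
books = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "authors": [[1], [1, 2], [3], [1]],          # list of author IDs per book
    "categories": [[10, 11], [10], [12], [10]],  # list of category IDs per book
    "format": ["Paperback", "Hardback", "Paperback", "Paperback"],
    "bestsellers-rank": [5, 1, 300, 42],
    "rating-avg": [4.2, 3.1, 3.9, 3.5],
    "publication-date": pd.to_datetime(
        ["2019-05-01", "2020-03-15", "2021-01-01", "2020-01-01"]
    ),
})

# Total number of books.
total_books = len(books)

# Books with exactly one author.
single_author = int((books["authors"].apply(len) == 1).sum())

# Top 5 authors: one row per (book, author) pair, then count per author.
top_authors = books.explode("authors")["authors"].value_counts().head(5)

# Books per category, and the 5 largest categories.
books_per_category = books.explode("categories")["categories"].value_counts()
top_categories = books_per_category.head(5)

# Format with the most books.
top_format = books["format"].value_counts().idxmax()

# 10 best ranked books: a lower bestsellers-rank is assumed to be better.
best_ranked = books.nsmallest(10, "bestsellers-rank")

# 10 highest-rated books.
best_rated = books.nlargest(10, "rating-avg")

# Books with rating-avg strictly greater than 3.5.
rated_above = int((books["rating-avg"] > 3.5).sum())

# Books published strictly after 2020-01-01.
published_after = int(
    (books["publication-date"] > pd.Timestamp("2020-01-01")).sum()
)
```

`explode` is used so that a book with several authors or categories counts once for each of them; whether that matches the intended reading of the questions is a judgment call.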