📚 Performing data crossing, merging and transformation, in order to answer some questions about the book depository dataset

gprzy/book-depository


Book Depository

This repository contains a challenge for a Junior Data Engineer position. The data is joined, merged and transformed in order to answer the questions below and extract insights from the Book Depository dataset.

Data source: https://www.kaggle.com/sp1thas/book-depository-dataset

Questions to be answered

  • What is the total number of books in the dataset?
  • How many books have only 1 author?
  • Which are the 5 authors with the most books?
  • How many books per category?
  • What are the 5 categories with the most books?
  • Which format has the most books?
  • Considering the bestsellers-rank column, what are the 10 best-ranked books?
  • Considering the rating-avg column, what are the 10 highest-rated books?
  • How many books have rating-avg greater than 3.5?
  • How many books have a publication date (publication-date) later than 01-01-2020?
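Several of the questions above can be sketched with pandas. This is a minimal illustration, not the repository's own solution: it runs on a few made-up rows instead of the Kaggle files, and only the column names `bestsellers-rank`, `rating-avg` and `publication-date` come from the list above. The `title`, `authors` and `format` columns, the author names, and the idea that a lower `bestsellers-rank` value means a better rank are all assumptions.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real dataset.
# Assumed columns: title, authors (already parsed into lists), format.
books = pd.DataFrame(
    {
        "title": ["A", "B", "C", "D"],
        "authors": [["ann"], ["ann", "bob"], ["cy"], ["bob"]],
        "format": ["paperback", "paperback", "hardback", "paperback"],
        "bestsellers-rank": [50, 10, 300, 20],
        "rating-avg": [4.2, 3.1, 3.9, 3.5],
        "publication-date": ["2019-05-01", "2020-06-15", "2021-01-10", "2020-01-01"],
    }
)
books["publication-date"] = pd.to_datetime(books["publication-date"])

# Total number of books.
total_books = len(books)

# Books with exactly one author (length of each author list).
single_author = int((books["authors"].str.len() == 1).sum())

# Books per author: one row per (book, author) pair, then count.
books_per_author = books.explode("authors")["authors"].value_counts()
top_authors = books_per_author.head(5)

# Format with the most books.
top_format = books["format"].value_counts().idxmax()

# 10 best-ranked books, assuming a lower rank value is better.
best_ranked = books.nsmallest(10, "bestsellers-rank")["title"].tolist()

# 10 highest-rated books.
top_rated = books.nlargest(10, "rating-avg")["title"].tolist()

# Books rated strictly above 3.5.
well_rated = int((books["rating-avg"] > 3.5).sum())

# Books published strictly after 1 January 2020.
recent = int((books["publication-date"] > pd.Timestamp("2020-01-01")).sum())
```

The per-category questions would follow the same `explode` and `value_counts` pattern as the author count, provided the category column is also parsed into lists first.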
