G8_BookBank

Main libraries used:

requests
numpy
pandas
mysqlConnector
SQLAlchemy
streamlit

The main website used is IranKetab.

What is the Goal?

Building a polished dashboard and running some null hypothesis tests 😄

The sole purpose is to demonstrate our analytics skills and knowledge. We have no ill intent toward the "IranKetab" website.

Step 1 (Scraping)

In two separate files named scrap_link and crawl_page, we extracted both the essential and the subsidiary information from the target website.

`scrap_link` File

The scrap_link file extracts only the links to the books, across all categories (tags), and saves them in a file named links.csv. Extracting all the links takes approximately 60 minutes.
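The link collection step could look roughly like the sketch below. It is not the code in scrap_link: it parses inline HTML strings with the standard library so that it runs offline, whereas the project downloads the pages first. The `example.com` URLs, the `/book/` path marker, and the names `BookLinkParser`, `collect_links`, and `write_links_csv` are all assumptions made for illustration.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin


class BookLinkParser(HTMLParser):
    """Collects href values of <a> tags whose href contains a marker."""

    def __init__(self, base_url, marker):
        super().__init__()
        self.base_url = base_url
        self.marker = marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.marker in href:
            # Resolve relative links against the URL of the listing page.
            self.links.append(urljoin(self.base_url, href))


def collect_links(pages, marker="/book/"):
    """pages maps a category URL to its HTML; returns unique links in order."""
    seen = {}
    for url, html in pages.items():
        parser = BookLinkParser(url, marker)
        parser.feed(html)
        for link in parser.links:
            seen.setdefault(link, None)  # a dict preserves insertion order
    return list(seen)


def write_links_csv(links, handle):
    """Write one link per row under a 'link' header."""
    writer = csv.writer(handle)
    writer.writerow(["link"])
    writer.writerows([link] for link in links)


# Hypothetical pages standing in for downloaded category listings.
pages = {
    "https://example.com/tag/novel":
        '<a href="/book/1-a">A</a><a href="/about">About</a>',
    "https://example.com/tag/history":
        '<a href="/book/1-a">A</a><a href="/book/2-b">B</a>',
}
links = collect_links(pages)
buffer = io.StringIO()  # the project writes to links.csv instead
write_links_csv(links, buffer)
```

A book that appears under several tags is kept only once, which matters when the same title is listed in many categories.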

`crawl_page` File

In the crawl_page file, all the essential details such as the book's ISBN, discount amount, print series, tags, etc., have been extracted. Subsidiary information includes footnotes or golden quotes from the book. The final file has an approximate size of 100 megabytes (assuming all links are present) and is saved in CSV format.

Instead of the classical parsing method, you can use the pandas `read_html` method, but by default it returns only the cell text, so the links of translators and publishers are not captured.
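To illustrate why a classical parser helps here, the sketch below keeps both the visible text and the `href` of each anchor. The HTML fragment, the names in it, and the `AnchorParser` class are hypothetical and do not reflect the actual markup of the target website. Newer pandas releases (1.5 and later) also accept an `extract_links` argument to `read_html`, which may ease this limitation for table cells.

```python
from html.parser import HTMLParser


class AnchorParser(HTMLParser):
    """Records (text, href) for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only collect text while inside an anchor with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append(("".join(self._text).strip(), self._href))
            self._href = None


# Hypothetical fragment of a book detail page.
fragment = (
    '<tr><td>Translator</td>'
    '<td><a href="/profile/10-jane-doe">Jane Doe</a></td></tr>'
    '<tr><td>Publisher</td>'
    '<td><a href="/publisher/3-sample-press">Sample Press</a></td></tr>'
)
parser = AnchorParser()
parser.feed(fragment)
anchors = parser.anchors
```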

Step 2 (Database)

In the cleaner_book file, all the data is cleaned and then saved in a new CSV file. Following that, in the ExportDB file, we segregated the data according to the diagram below, storing it locally. Subsequently, we obtained CSV outputs from the same data.

File Structure

cleaner_book: Cleans and saves data in a new CSV file.
ExportDB: Segregates data according to the diagram and saves locally.
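As a rough illustration of what segregating the cleaned data into related tables can mean, the sketch below splits publishers out of a flat list of books. The schema is hypothetical and is not the project's diagram, the rows are made up, and an in-memory SQLite database stands in for the local MySQL database so that the example runs without a server.

```python
import sqlite3

# Hypothetical cleaned rows; the real columns come from the cleaned CSV.
rows = [
    {"isbn": "111", "title": "Book A", "publisher": "Sample Press", "price": 120000},
    {"isbn": "222", "title": "Book B", "publisher": "Sample Press", "price": 95000},
    {"isbn": "333", "title": "Book C", "publisher": "Other House", "price": 150000},
]

conn = sqlite3.connect(":memory:")  # stand-in for the local MySQL database
conn.executescript(
    """
    CREATE TABLE publisher (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE book (
        isbn TEXT PRIMARY KEY,
        title TEXT NOT NULL,
        price INTEGER,
        publisher_id INTEGER REFERENCES publisher(id)
    );
    """
)

for row in rows:
    # Insert each publisher once, then reference it by id from the book row.
    conn.execute(
        "INSERT OR IGNORE INTO publisher (name) VALUES (?)", (row["publisher"],)
    )
    (publisher_id,) = conn.execute(
        "SELECT id FROM publisher WHERE name = ?", (row["publisher"],)
    ).fetchone()
    conn.execute(
        "INSERT INTO book (isbn, title, price, publisher_id) VALUES (?, ?, ?, ?)",
        (row["isbn"], row["title"], row["price"], publisher_id),
    )
conn.commit()

publisher_count = conn.execute("SELECT COUNT(*) FROM publisher").fetchone()[0]
books_per_publisher = conn.execute(
    """
    SELECT p.name, COUNT(*) FROM book b
    JOIN publisher p ON p.id = b.publisher_id
    GROUP BY p.name ORDER BY p.name
    """
).fetchall()
```

Storing each publisher once and referencing it by id avoids repeating the same name on every book row. The same idea would apply to authors, translators, and tags.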

Diagram:

Step 3 (Dashboarding)

In the st_git file, we create a dynamic dashboard with the Streamlit library.

The dashboard displays graphs of the number of books in each genre, the number of books by the top authors, translators, and publishers, the number of books published in each year, the number of books by cover type, and the distribution of book prices by year of publication and by book rating.

There is also a search section where the user can search for the desired book by specifying the maximum price, minimum score, author, publisher and book genre.
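The filtering logic behind such a search section might be sketched as below. This is plain Python rather than the Streamlit code in st_git. The `Book` fields, the `search_books` function, and the sample catalog are assumptions for illustration. In the dashboard, the filter values would come from input widgets and the data from the database.

```python
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    author: str
    publisher: str
    genre: str
    price: int
    score: float


def search_books(books, max_price=None, min_score=None,
                 author=None, publisher=None, genre=None):
    """Return the books that satisfy every filter that is not None."""
    results = []
    for book in books:
        if max_price is not None and book.price > max_price:
            continue
        if min_score is not None and book.score < min_score:
            continue
        if author is not None and book.author != author:
            continue
        if publisher is not None and book.publisher != publisher:
            continue
        if genre is not None and book.genre != genre:
            continue
        results.append(book)
    return results


# Made-up catalog for illustration only.
catalog = [
    Book("Book A", "Author X", "Sample Press", "Novel", 120000, 4.5),
    Book("Book B", "Author Y", "Sample Press", "History", 95000, 3.8),
    Book("Book C", "Author X", "Other House", "Novel", 80000, 4.2),
]
cheap_and_good = search_books(catalog, max_price=100000, min_score=4.0)
by_author = search_books(catalog, author="Author X", genre="Novel")
```

Leaving a filter as `None` means "no restriction", which mirrors a search form where the user fills in only some of the fields.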

In the figure below, a view of the dashboard is displayed:

You can check the final result here: Streamlit_Dashboard 🕶️

Step 4 (Null Hypothesis)

This step includes some statistical tests for finding relationships between features.

First Hypothesis

Translation has a significant impact on the price of the book

Second Hypothesis

There is a significant difference in prices between hardcover and paperback versions
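One way to test a hypothesis of this kind is a two-sided permutation test on the difference in mean price, sketched below. This is not necessarily the test used in first_hyp or second_hyp. The `permutation_test` function is hypothetical, and the prices are made up for illustration rather than taken from the scraped data.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_resamples=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the group labels are exchangeable,
        # so shuffle the pooled prices and split them again.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The add-one correction keeps the estimated p-value above zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Made-up prices for illustration only.
hardcover = [250, 300, 280, 320, 310, 290]
paperback = [150, 180, 170, 160, 200, 190]
observed, p_value = permutation_test(hardcover, paperback)
reject_null = p_value < 0.05  # 0.05 is an assumed significance level
```

The same function could be applied to the first hypothesis by passing the prices of translated and non-translated books as the two groups.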

For more details, see first_hyp and second_hyp.

