Providing Insights, understanding and processing Big Data using SQL and PySpark

KizMan-23/queries


Queries is a repository of SQL and PySpark projects that query and process big data under different conditions. SQL is a widely used domain-specific language for processing data stored in relational databases. PySpark is the Python API for Apache Spark, a distributed computing framework designed for large-scale data processing. PySpark is widely used for data distributed across multiple storage systems and is known for its integration with SQL, machine learning, and pandas through Spark SQL, MLlib, Structured Streaming, and the Pandas API on Spark.

Employees SQL is a project that uses SQL on a company-employee dataset to answer the questions that matter for understanding the scope of the data. SQL is used here mainly to answer questions that arise in business settings.

employ sql 1
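As a rough illustration of the kind of business question the Employees SQL project answers, the sketch below computes headcount and average salary per department. The `employees` table, its columns, and the rows are hypothetical, invented for this example, and do not come from the project's data. Python's built-in `sqlite3` module is used only so the query can run without a database server.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department TEXT NOT NULL,
        salary REAL NOT NULL
    );
    INSERT INTO employees (name, department, salary) VALUES
        ('Ada', 'Engineering', 120000),
        ('Ben', 'Engineering', 100000),
        ('Cleo', 'Sales', 70000),
        ('Dan', 'Sales', 90000),
        ('Eve', 'Finance', 85000);
""")

# Business question: which departments pay the most on average?
query = """
    SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
    ORDER BY avg_salary DESC;
"""
rows = conn.execute(query).fetchall()
conn.close()

for department, headcount, avg_salary in rows:
    print(f"{department}: {headcount} employees, average salary {avg_salary:.2f}")
```

The same `GROUP BY` pattern extends to other questions of this kind, such as headcount per manager or total payroll per location, given the matching columns.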

Music Store Analysis: modeled on music platforms such as Spotify, this project showcases the use of SQL to answer questions about artists, albums, tracks, and related topics. The project follows a question-and-answer format and gives insight into writing complex SQL queries for business solutions.

music store sql
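A question such as "how many albums and tracks does each artist have?" usually needs joins across several tables. The sketch below shows one way to write it. The `artist`, `album`, and `track` tables and their contents are assumptions made for this example and may not match the project's actual schema.

```python
import sqlite3

# Hypothetical three-table music schema, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (
        artist_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE album (
        album_id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        artist_id INTEGER NOT NULL REFERENCES artist(artist_id)
    );
    CREATE TABLE track (
        track_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        album_id INTEGER NOT NULL REFERENCES album(album_id)
    );
    INSERT INTO artist VALUES (1, 'Artist A'), (2, 'Artist B');
    INSERT INTO album VALUES (10, 'First', 1), (11, 'Second', 1), (12, 'Debut', 2);
    INSERT INTO track VALUES
        (100, 'Track 1', 10), (101, 'Track 2', 10),
        (102, 'Track 3', 11), (103, 'Track 4', 12);
""")

# Join artist -> album -> track, then count albums and tracks per artist.
track_counts = conn.execute("""
    SELECT ar.name,
           COUNT(DISTINCT al.album_id) AS albums,
           COUNT(t.track_id) AS tracks
    FROM artist AS ar
    JOIN album AS al ON al.artist_id = ar.artist_id
    JOIN track AS t ON t.album_id = al.album_id
    GROUP BY ar.artist_id, ar.name
    ORDER BY tracks DESC, ar.name;
""").fetchall()
conn.close()

for name, albums, tracks in track_counts:
    print(f"{name}: {albums} albums, {tracks} tracks")
```

`COUNT(DISTINCT al.album_id)` is needed because the join produces one row per track, so a plain `COUNT` on the album column would count each album once per track.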

sql-practice 1, 2, and 3 are JSON files of SQL solutions I wrote for problems from the sql_practice website. The website offers business-related problems and expects an SQL solution for each one, so working through them helps build an understanding of the business and can surface insights for its growth.

PySpark, the Python API for Apache Spark, is accessible through the Databricks data analytics platform. All PySpark projects were carried out in Databricks workspace notebooks.

Employee on PySpark replicates and repurposes the Employees SQL project: the same company-employee problems are solved with PySpark. The project shows the similarities between SQL and PySpark, and the difficulties that arise, when providing business solutions.

emp on spark

Spotify Streams on PySpark is an analysis of track, artist, and album data across streaming platforms such as Spotify, YouTube, and TikTok. The project showcases PySpark as an analytical tool for understanding and measuring the stream counts and performance of tracks and artists.

basic ml spar

Basic ML on PySpark continues the exploration of PySpark's capabilities. Using Spark's MLlib functions, classical regression and classification models can be trained on Resilient Distributed Datasets (RDDs), a core data structure of the Spark framework.

basic ml spark 2
