awesomecosmos/CS673-Scalable-Databases

CS673-Scalable-Databases

Repository for storing code for my MS in Data Science course CS673 Scalable Databases at Pace University.

Course description: After reviewing relational databases and SQL, students will learn the fundamentals of alternative data storage schemas to deal with large amounts of data (structured and unstructured). The course covers big data and the development of the Hadoop file system, the MapReduce programming paradigm, and database management systems such as Cassandra, HBase, and Neo4j. Students will learn about NoSQL, distributed databases, and graph databases. The course emphasizes the differences between traditional database management systems and alternatives with respect to accessibility, cost, transaction speed, and structure. Part of the course is dedicated to accessing, handling, and processing data from different sources and of different types using Python. The course provides hands-on practice.

Project 1

In this project, I used SQL to analyze a dataset of my choosing, the Data Scientist Salaries 2023 dataset. I created a local database, defined tables, and wrote queries to explore the data.
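The workflow described above (create a local database, create tables, query them) can be sketched roughly as follows. This is only an illustration, not the code from the project: it assumes SQLite through Python's built-in `sqlite3` module, and the table name `salaries`, its columns, and the sample rows are all hypothetical stand-ins, since this README does not show the dataset's schema or the database engine that was actually used.

```python
import sqlite3

# Hypothetical sample rows; the real dataset's columns and values
# are not shown in this README.
SAMPLE_ROWS = [
    ("Data Scientist", "SE", 150000),
    ("Data Scientist", "MI", 110000),
    ("Data Engineer", "SE", 140000),
    ("Data Analyst", "EN", 60000),
]


def build_database(path=":memory:"):
    """Create a local SQLite database with one (assumed) salaries table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS salaries (
            job_title        TEXT    NOT NULL,
            experience_level TEXT    NOT NULL,
            salary_usd       INTEGER NOT NULL
        )
        """
    )
    # Parameterized inserts keep values separate from the SQL text.
    conn.executemany("INSERT INTO salaries VALUES (?, ?, ?)", SAMPLE_ROWS)
    conn.commit()
    return conn


def average_salary_by_title(conn):
    """Return (job_title, average salary) pairs, highest average first."""
    cur = conn.execute(
        """
        SELECT job_title, AVG(salary_usd) AS avg_salary
        FROM salaries
        GROUP BY job_title
        ORDER BY avg_salary DESC
        """
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_database()
    for title, avg in average_salary_by_title(conn):
        print(f"{title}: {avg:.0f}")
    conn.close()
```

Passing a file path instead of `":memory:"` to `build_database` would persist the database on disk, which is closer to the "local database" mentioned above.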

Project 2

In this project, I completed a set of basic Python exercises.

Project 3

In this project, I completed a set of tasks using SparkSQL on Spark. My results are also available here on Databricks Community.

Project 4

In this project, I completed some tasks and wrote queries using HBase/Cassandra.

Midterm Project

In this project, my partner and I analyzed the Data Scientist Salaries 2023 dataset. We performed exploratory data analysis (EDA), data cleaning, wrangling, and manipulation in order to answer targeted queries and extract insights from the dataset. We recorded our presentation on this project here: https://youtu.be/z1-39Pkm-2E

Final Project

In this project, I wrote Neo4j queries to create a graph network and analyze its connections. I recorded a tutorial here: https://youtu.be/_NO4wwGkpRo. This project also has its own GitHub repo here.
