This project focuses on data analysis using Apache Hadoop and Apache Spark.
The goal is to gain familiarity with distributed systems and modern data science techniques.
The project utilizes large datasets related to crime data in Los Angeles.
- Apache Hadoop 3.3.6: The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. It enables the distributed processing of large datasets across clusters of computers using simple programming models.
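The "simple programming model" Hadoop is built around is MapReduce. A minimal sketch in plain Python of the map and reduce phases, using a word-count over made-up crime-description strings (on the cluster, the same logic would run distributed over data stored in HDFS):

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit a (word, 1) pair for every word in every record.
    for line in records:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle + reduce: group the pairs by key and sum the counts.
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

# Toy input; the real job would read records from HDFS.
lines = ["robbery downtown", "burglary downtown", "robbery hollywood"]
result = reduce_phase(map_phase(lines))
# result maps each word to its total count, e.g. "robbery" -> 2
```

In Hadoop the map tasks run in parallel across the cluster, the framework shuffles intermediate pairs by key, and the reduce tasks aggregate each key's values; this sketch collapses all of that into two in-process functions.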
- Apache Spark: Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters. It supports:
  - Batch/streaming data
  - SQL analytics
  - Data science at scale
  - Machine learning
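The project's SQL-style analytics boil down to group-by/count aggregations over the crime records. A sketch of that query shape in plain Python (on the cluster it would be a Spark DataFrame `groupBy(...).count()` or an equivalent Spark SQL query; the `area` and `crime` field names below are hypothetical stand-ins, not the dataset's actual schema):

```python
from collections import Counter

# Hypothetical rows standing in for the LA crime dataset.
crimes = [
    {"area": "Central", "crime": "ROBBERY"},
    {"area": "Central", "crime": "BURGLARY"},
    {"area": "Hollywood", "crime": "ROBBERY"},
]

# Group by area and count records per group.
per_area = Counter(row["area"] for row in crimes)
# per_area["Central"] -> 2, per_area["Hollywood"] -> 1
```

Spark distributes exactly this kind of aggregation: each partition computes partial counts, and the results are combined across executors.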
The project uses virtual machines from the ~Okeanos-knossos public cloud.
A detailed setup guide for the installation of the tools used is available in the files/documents folder.
A detailed report with the execution and interpretation of the queries is also available in the files/documents folder.