A workshop focused on building a big data processing pipeline.

iandow/bb-mapr-demo


Big Data Basics

by BigBoards

In this 2-day workshop you will learn how to build a complete data processing pipeline. The workshop is hands-on and runs on a BigBoards Hex. You will work with several common Big Data technologies.

This repository contains all the technologies, resources and solutions you need to complete the workshop.

During this workshop, you will

  • ingest data from a rather large relational database that contains weather and sales data;
  • store the raw data on a distributed file system as your primary data;
  • restructure the data for easier analysis;
  • and finally apply machine learning to build a recommendation engine.
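The four steps above can be sketched on a single machine to show how data flows from one stage to the next. This is a minimal illustration under stated assumptions, not the workshop's actual solution: an in-memory SQLite database stands in for the relational source, a local directory stands in for the distributed file system, and the `sales` table with its `customer_id`, `product_id` and `quantity` columns is hypothetical (the weather data is left out). The final step swaps in simple item co-occurrence counting in place of a trained machine learning model.

```python
import csv
import sqlite3
import tempfile
from collections import Counter, defaultdict
from itertools import combinations
from pathlib import Path


def ingest(conn: sqlite3.Connection) -> list[tuple]:
    """Step 1: pull raw rows out of the relational source (hypothetical `sales` table)."""
    return conn.execute(
        "SELECT customer_id, product_id, quantity FROM sales ORDER BY rowid"
    ).fetchall()


def store_raw(rows: list[tuple], raw_dir: Path) -> Path:
    """Step 2: persist the untouched rows as primary data (a local dir stands in for a DFS)."""
    raw_dir.mkdir(parents=True, exist_ok=True)
    path = raw_dir / "sales.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["customer_id", "product_id", "quantity"])
        writer.writerows(rows)
    return path


def restructure(raw_path: Path) -> dict[str, set[str]]:
    """Step 3: reshape the raw rows into one basket of products per customer."""
    baskets: dict[str, set[str]] = defaultdict(set)
    with raw_path.open(newline="") as f:
        for row in csv.DictReader(f):
            baskets[row["customer_id"]].add(row["product_id"])
    return dict(baskets)


def recommend(baskets: dict[str, set[str]], customer: str, k: int = 2) -> list[str]:
    """Step 4: naive item co-occurrence recommender (a stand-in, not a trained ML model)."""
    co_counts: dict[str, Counter] = defaultdict(Counter)
    for items in baskets.values():
        for a, b in combinations(sorted(items), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    owned = baskets.get(customer, set())
    scores: Counter = Counter()
    for item in owned:
        for other, n in co_counts[item].items():
            if other not in owned:
                scores[other] += n
    # Highest score first; ties broken by product id so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer_id TEXT, product_id TEXT, quantity INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("c1", "umbrella", 1), ("c1", "raincoat", 1),
         ("c2", "umbrella", 2), ("c2", "raincoat", 1), ("c2", "boots", 1),
         ("c3", "umbrella", 1)],
    )
    with tempfile.TemporaryDirectory() as tmp:
        raw = store_raw(ingest(conn), Path(tmp) / "raw")
        baskets = restructure(raw)
        print(recommend(baskets, "c3"))  # prints ['raincoat', 'boots']
```

Keeping the raw CSV untouched and deriving the per-customer baskets from it mirrors the idea of treating the raw copy as primary data: the restructured view can always be rebuilt from it. On a cluster, each function would be replaced by the corresponding distributed tool.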

Big Data technologies

We have packaged all the required technologies for this workshop as a BigBoards Tint. With the click of a button you can install everything on a Hex, in the cloud or on your own servers. Just head over to the BigBoards Hive.

The technologies you will use for your end-to-end data pipeline are:

For now, the data is still hosted outside the big data clusters.

Presentations

You will learn the basics of Big Data and cluster processing through two presentations:

  1. Big Data Basics - Common explains Big Data and its use cases.
  2. Big Data Basics - Building a Data Pipeline guides you through the practical exercises. This presentation covers all the technologies and resources for this project.

The BigBoards Hex gives you everything you need to get your hands dirty.

Practical

You can log in to JupyterHub with the default BigBoards username (bb) and password (Swh^bdl).

License

Made with ♡ for data!

You are free to use the content, presentations and resources from the workshop. Do keep in mind that we have put an awful lot of work into creating these artefacts: please mention us to spread the karma!

Creative Commons License

Big Data Basics by BigBoards CVBA is licensed under a Creative Commons Attribution 4.0 International Licence.

Based on work from https://github.com/bigboards/bb-stack-training
