b1ackout/trg_challenge

TRG Evaluation

Challenge 1

I wanted something easily deployable, so after some research I cloned this repo and made the necessary changes. A big thanks to omahoco.

This project contains the following containers:

  • A Spark cluster with two worker nodes.
  • A Jupyter notebook server for exploration.
  • A shared filesystem for blob storage.

Prerequisites

This runs only on Unix systems; please do not run it on Windows. You just need to have Docker installed.

Building the environment

# Clone the repo
git clone https://github.com/b1ackout/trg_challenge

# Create the environment
cd trg_challenge
make all

# Unpack the data:
make unpack_data

To submit the Spark job that generates the Parquet files, run:

make spark-submit

To submit the Spark job that generates the JSON KPIs, run:

make spark-kpis

Checking the data via Jupyter and steps followed

Navigate to http://localhost:9999. Jupyter will ask you for a token for authentication.

To get the authentication token from the Jupyter server, run:

make jupyter_token

In Jupyter, I did some data exploration.

I concluded that a left join of the street dataset with the outcome dataset is the way to go, because outcome records that do not match any street record must belong to crimes from a time period older than our scope.
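The effect of the left join can be sketched in plain Python. This is not the Spark job itself: the field names `crime_id`, `month`, and `outcome`, the sample rows, and the helper `left_join` are all assumptions made for illustration. The point is only that every street record is kept, while an outcome record with no matching street record is dropped.

```python
# Hypothetical sample rows; the real datasets have more columns.
street = [
    {"crime_id": "a1", "month": "2020-01"},
    {"crime_id": "b2", "month": "2020-02"},
]
outcomes = [
    {"crime_id": "a1", "outcome": "Under investigation"},
    {"crime_id": "z9", "outcome": "Offender fined"},  # no street record in scope
]


def left_join(left, right, key):
    """Keep every left row; attach matching right rows, or None fields if none match."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    right_fields = sorted({f for row in right for f in row if f != key})

    joined = []
    for row in left:
        matches = index.get(row[key])
        if matches:
            for match in matches:
                joined.append({**row, **match})
        else:
            # No outcome for this crime: keep the street row with empty outcome fields.
            joined.append({**row, **{f: None for f in right_fields}})
    return joined


result = left_join(street, outcomes, "crime_id")
```

In this sketch, crime `b2` survives with an empty outcome, and the unmatched outcome `z9` does not appear in the result.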

A window function was also used to ensure that the outcome kept for each crime is actually the latest one.
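The "latest outcome per crime" step can be sketched in plain Python as well. The result is what you would get from ranking outcomes within each crime by date, newest first, and keeping only the top row. The exact window function used in the Spark job is not shown here, and the field names `crime_id`, `month`, and `outcome`, the sample rows, and the helper `latest_per_key` are assumptions for this sketch.

```python
# Hypothetical outcome rows; crime "a1" has two outcomes over time.
outcomes = [
    {"crime_id": "a1", "month": "2020-01", "outcome": "Under investigation"},
    {"crime_id": "a1", "month": "2020-03", "outcome": "Offender fined"},
    {"crime_id": "b2", "month": "2020-02", "outcome": "No further action"},
]


def latest_per_key(rows, key, order_by):
    """Return one row per key: the row with the greatest order_by value."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        # "YYYY-MM" strings sort chronologically, so a plain comparison works here.
        if current is None or row[order_by] > current[order_by]:
            latest[row[key]] = row
    return list(latest.values())


latest = latest_per_key(outcomes, "crime_id", "month")
```

Here crime `a1` keeps only its 2020-03 outcome, so each crime ends up with exactly one outcome row.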

I also wanted to deploy an Elasticsearch node and a Kibana node, but I could not get Elasticsearch to ingest the Parquet data; I kept running into errors. I used this for reference.

So my other thought was to use Databricks for visualization. Since Databricks uses its own data sources, let's assume that the Spark containers' shared storage (the /data directory) is blob storage like S3 and that Databricks can access it.

An HTML export of a Databricks notebook containing the KPI visualizations is included.
