Merge branch 'main' into refactoring/#301-Refactor_bucket_handling
ckunki committed Jul 23, 2024
2 parents 69da750 + c1245f0 commit 0a4ffd9
Showing 6 changed files with 1,834 additions and 3 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/hyperlinks.yaml
@@ -28,4 +28,4 @@ jobs:

- name: Check Hyperlinks in Jupyter Notebooks
run: >
- poetry run pytest --check-links exasol/ds/sandbox/runtime/ansible/roles/jupyter/files/notebook/
+ poetry run pytest --check-links --check-links-ignore "https://www.transtats.bts.gov/.*" exasol/ds/sandbox/runtime/ansible/roles/jupyter/files/notebook/
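
The `--check-links-ignore` option added in this hunk takes a regular expression for URLs that the link check should skip. A minimal sketch of which of the notebook's links the pattern covers, assuming the pattern is matched from the start of the URL (the helper `is_ignored` is hypothetical and not part of pytest-check-links):

```python
import re

# Pattern copied from the workflow change above.
IGNORE_PATTERN = "https://www.transtats.bts.gov/.*"


def is_ignored(url: str) -> bool:
    # Assumption: the pattern is anchored at the start of the URL (re.match semantics).
    return re.match(IGNORE_PATTERN, url) is not None


# Both URLs appear in the US Flights notebook added by this commit.
urls = [
    "https://www.transtats.bts.gov/Homepage.asp",
    "https://jupysql.ploomber.io/en/latest/quick-start.html",
]

for url in urls:
    print(url, "-> skipped" if is_ignored(url) else "-> checked")
```

The dots in the pattern are unescaped, so each one matches any character. Escaping them would make the pattern stricter, although for this host name the difference is unlikely to matter.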
@@ -0,0 +1,192 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "3e21b55a-32e5-47bf-a226-1a56a72e4699",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"# US Flights\n",
"\n",
"In this notebook we will load a dataset with information about US Flights. The data is publicly accessible at the [Bureau of Transportation Statistics](https://www.transtats.bts.gov/Homepage.asp) of the US Department of Transportation. We will load a selection of this data stored in a AWS cloudfront.\n",
"\n",
"We will be running SQL queries using <a href=\"https://jupysql.ploomber.io/en/latest/quick-start.html\" target=\"_blank\" rel=\"noopener\"> JupySQL</a> SQL Magic.\n",
"\n",
"## Prerequisites\n",
"\n",
"Prior to using this notebook the following steps need to be completed:\n",
"1. [Configure the AI-Lab](../main_config.ipynb).\n",
"\n",
"## Setup\n",
"\n",
"### Open Secure Configuration Storage"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f5fa71bb-193e-438b-b126-cdd558d44e48",
"metadata": {},
"outputs": [],
"source": [
"%run ../utils/access_store_ui.ipynb\n",
"display(get_access_store_ui('../'))"
]
},
{
"cell_type": "markdown",
"id": "1ce12284-f647-4435-a8a4-48aeb83d4c14",
"metadata": {},
"source": [
"Let's bring up JupySQL and connect to the database via SQLAlchemy. Please refer to the documentation of <a href=\"https://github.com/exasol/sqlalchemy-exasol\" target=\"_blank\" rel=\"noopener\">sqlalchemy-exasol</a> for details on how to connect to the database using the Exasol SQLAlchemy driver."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fad4e5e2-5674-470f-ac6c-d07885ad2b33",
"metadata": {},
"outputs": [],
"source": [
"%run ../utils/jupysql_init.ipynb"
]
},
{
"cell_type": "markdown",
"id": "2d088386-edc1-4ddf-a89d-9b329cf54488",
"metadata": {},
"source": [
"## Create tables"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d50e1a26-3fc0-47a1-a56a-9ded22e144ae",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"CREATE OR REPLACE TABLE US_FLIGHTS (\n",
" FL_DATE DATE,\n",
" OP_CARRIER_AIRLINE_ID DECIMAL(10, 0),\n",
" ORIGIN_AIRPORT_SEQ_ID DECIMAL(10, 0),\n",
" ORIGIN_STATE_ABR CHAR(2),\n",
" DEST_AIRPORT_SEQ_ID DECIMAL(10, 0),\n",
" DEST_STATE_ABR CHAR(2),\n",
" CRS_DEP_TIME CHAR(4),\n",
" DEP_DELAY DECIMAL(6, 2),\n",
" CRS_ARR_TIME CHAR(4),\n",
" ARR_DELAY DECIMAL(6, 2),\n",
" CANCELLED BOOLEAN,\n",
" CANCELLATION_CODE CHAR(1),\n",
" DIVERTED BOOLEAN,\n",
" CRS_ELAPSED_TIME DECIMAL(6, 2),\n",
" ACTUAL_ELAPSED_TIME DECIMAL(6, 2),\n",
" DISTANCE DECIMAL(6, 2),\n",
" CARRIER_DELAY DECIMAL(6, 2),\n",
" WEATHER_DELAY DECIMAL(6, 2),\n",
" NAS_DELAY DECIMAL(6, 2),\n",
" SECURITY_DELAY DECIMAL(6, 2),\n",
" LATE_AIRCRAFT_DELAY DECIMAL(6, 2)\n",
");"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3018f314-3046-4460-94cc-3985e6a7500a",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"CREATE OR REPLACE TABLE US_AIRLINES (\n",
" OP_CARRIER_AIRLINE_ID DECIMAL(10, 0) IDENTITY PRIMARY KEY,\n",
" CARRIER_NAME VARCHAR(1000)\n",
");"
]
},
{
"cell_type": "markdown",
"id": "5daf1eeb-7835-4c49-8abf-2b21e381ba9b",
"metadata": {},
"source": [
"## Bring in the UI functions\n",
"\n",
"We will need some UI functions that will handle loading the data."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "196760d5-4ab7-4386-a7c1-a3adae70f4ac",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"%run utils/flight_utils.ipynb"
]
},
{
"cell_type": "markdown",
"id": "be9d3cfc-df08-4095-804d-fe7153684a24",
"metadata": {},
"source": [
"## Load the data\n",
"\n",
"Please select one or more data periods for the flights in the table below. Once the data for the selected periods is loaded the entries will be removed from the table. Please do not load data for the same period more than once.\n",
"\n",
"Load the airlines' data (their codes and names). A repeated attempt to load the airlines' data will result in the primary key violation error."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "35ce66da-4c13-4af4-8bcf-6cdcfc377b36",
"metadata": {
"tags": [
"data_selection"
]
},
"outputs": [],
"source": [
"display(get_data_selection_ui(ai_lab_config))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "16b44e84-d811-4b10-a9a2-172ff6629e93",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
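
The data periods offered by the selection UI, and the CSV file requested for each of them, follow from the helper notebook `utils/flight_utils.ipynb` that this notebook runs. A minimal sketch of that mapping using only the standard library; the function names `data_periods` and `flights_file_name` are hypothetical, and the helper notebook itself uses `dateutil.relativedelta` for the month arithmetic:

```python
from datetime import date


def data_periods(start_year: int = 2023, start_month: int = 4, count: int = 12) -> list[str]:
    """Return labels such as 'Apr 2023' for `count` consecutive months."""
    labels = []
    for i in range(count):
        # Month arithmetic without dateutil: roll over into the next year as needed.
        extra_years, month_index = divmod(start_month - 1 + i, 12)
        labels.append(date(start_year + extra_years, month_index + 1, 1).strftime('%b %Y'))
    return labels


def flights_file_name(period: str) -> str:
    """Build the CSV file name the same way load_flights_data does."""
    return f'US_FLIGHTS_{period.replace(" ", "_").upper()}.csv'


periods = data_periods()
print(periods[0], '...', periods[-1])
print(flights_file_name(periods[0]))
```

The month abbreviations produced by `%b` depend on the process locale; the sketch assumes an English or C locale, as the file names in the helper notebook do.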
@@ -0,0 +1,157 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "60e2a631-bbe1-4d46-b42d-a3ae6d63bf2a",
"metadata": {},
"outputs": [],
"source": [
"from exasol.nb_connector.utils import upward_file_search\n",
"\n",
"# This NB may be running from various locations in the NB hierarchy.\n",
"# Need to search for other supporting NBs from the current directory upwards.\n",
"\n",
"%run {upward_file_search('utils/ui_styles.ipynb')}\n",
"%run {upward_file_search('utils/popup_message_ui.ipynb')}"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "438f34f0-60e4-499d-b5e0-f5cdeea3fa49",
"metadata": {},
"outputs": [],
"source": [
"from typing import List\n",
"from datetime import datetime\n",
"from dateutil.relativedelta import relativedelta\n",
"import io\n",
"from itertools import islice\n",
"\n",
"import ipywidgets as widgets\n",
"import requests\n",
"from exasol.nb_connector.secret_store import Secrets\n",
"from exasol.nb_connector.connections import open_pyexasol_connection\n",
"\n",
"\n",
"def _transform_flights(pipe, src) -> None:\n",
"\n",
" for row in islice(src, 1, None):\n",
" fields = row.split(',')\n",
" fields[0] = datetime.strptime(fields[0], '%m/%d/%Y %I:%M:%S %p').strftime('%Y-%m-%d')\n",
" for i in [10, 12]:\n",
" fields[i] = 'False' if float(fields[i]) == 0. else 'True'\n",
" pipe.write(bytes(','.join(fields), encoding='utf-8'))\n",
"\n",
"\n",
"def _transform_airlines(pipe, src) -> None:\n",
"\n",
" for row in islice(src, 1, None):\n",
" fields = row.split(',', maxsplit=1)\n",
" al_name = fields[1].strip('\"\\n').split(':')[0]\n",
" fields[1] = f'\"{al_name}\"'\n",
" pipe.write(bytes(','.join(fields) + '\\n', encoding='utf-8'))\n",
"\n",
"\n",
"def _import_from_cloudfront(conf: Secrets, file_name: str, transform, table_name: str) -> None:\n",
"\n",
" # Read requested csv file from the cloudfront\n",
" response = requests.get(f'https://d1je7p5oh8pade.cloudfront.net/{file_name}')\n",
" response.raise_for_status()\n",
" content_stream = io.BytesIO(response.content)\n",
" f_src = io.TextIOWrapper(content_stream, encoding='utf-8')\n",
"\n",
" # Pre-process and import the data\n",
" with open_pyexasol_connection(conf, schema=conf.db_schema, compression=True) as pyexasol_conn:\n",
" pyexasol_conn.import_from_callback(transform, f_src, table=table_name)\n",
"\n",
"\n",
"def load_flights_data(conf: Secrets, months: List[str]) -> None:\n",
"\n",
" for mon in months:\n",
" file_name = f'US_FLIGHTS_{mon.replace(\" \", \"_\").upper()}.csv'\n",
" _import_from_cloudfront(conf, file_name, _transform_flights, 'US_FLIGHTS')\n",
"\n",
"\n",
"def load_airlines_data(conf: Secrets) -> None:\n",
"\n",
" _import_from_cloudfront(conf, 'US_AIRLINES.csv', _transform_airlines, 'US_AIRLINES')\n",
"\n",
"\n",
"def get_data_selection_ui(conf: Secrets) -> widgets.Widget:\n",
" \"\"\"\n",
" Builds a UI with a multi-select list of data periods and buttons for\n",
" loading the selected data.\n",
" \"\"\"\n",
"\n",
" ui_look = get_config_styles()\n",
"\n",
" start_date = datetime(year=2023, month=4, day=1)\n",
" months = [(start_date + relativedelta(months=i)).strftime('%b %Y')\n",
" for i in range(12)]\n",
"\n",
" data_selector = widgets.SelectMultiple(options=months, layout=ui_look.input_layout, style=ui_look.input_style)\n",
" flights_btn = widgets.Button(description='Load Flights', style=ui_look.button_style, layout=ui_look.button_layout)\n",
" airlines_btn = widgets.Button(description='Load Airlines', style=ui_look.button_style, layout=ui_look.button_layout)\n",
" header_lbl = widgets.Label(value='Data Periods', style=ui_look.header_style, layout=ui_look.header_layout)\n",
"\n",
" def load_flights(btn):\n",
" if data_selector.value:\n",
" try:\n",
" load_flights_data(conf, data_selector.value)\n",
" popup_message('Flights data has been loaded successfully')\n",
" data_selector.options = [opt for opt in data_selector.options \n",
" if opt not in data_selector.value]\n",
" btn.icon = 'check'\n",
" btn.disabled = not data_selector.options\n",
" except Exception as ex:\n",
" popup_message(f'Failed to load the flights data: {ex}')\n",
" else:\n",
" btn.icon = 'check'\n",
"\n",
" def load_airlines(btn):\n",
" try:\n",
" load_airlines_data(conf)\n",
" popup_message('Airlines data has been loaded successfully')\n",
" btn.disabled = True\n",
" except Exception as ex:\n",
" popup_message(f'Failed to load the airlines data: {ex}')\n",
"\n",
" def on_value_change(change):\n",
" flights_btn.icon = 'pen'\n",
"\n",
" flights_btn.on_click(load_flights)\n",
" airlines_btn.on_click(load_airlines)\n",
" data_selector.observe(on_value_change, names=['value'])\n",
"\n",
" group_items = [header_lbl, widgets.Box([data_selector], layout=ui_look.row_layout)]\n",
" items = [widgets.Box(group_items, layout=ui_look.group_layout), \n",
" widgets.Box([flights_btn, airlines_btn], layout=ui_look.row_layout)]\n",
" ui = widgets.Box(items, layout=ui_look.outer_layout)\n",
" return ui"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
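
The row transformation in `_transform_flights` can be tried without a database by passing in-memory streams in place of the source file and the import pipe. A minimal sketch under that assumption; the sample header and row are made up for illustration, and in the notebook the callback is driven by pyexasol's `import_from_callback` instead:

```python
import io
from datetime import datetime
from itertools import islice


def transform_flights(pipe, src) -> None:
    """Same row logic as _transform_flights in the helper notebook."""
    for row in islice(src, 1, None):  # skip the header row
        fields = row.split(',')
        # Reduce the timestamp to an ISO date.
        fields[0] = datetime.strptime(fields[0], '%m/%d/%Y %I:%M:%S %p').strftime('%Y-%m-%d')
        for i in (10, 12):  # CANCELLED and DIVERTED flags
            fields[i] = 'False' if float(fields[i]) == 0. else 'True'
        pipe.write(','.join(fields).encode('utf-8'))


# Hypothetical sample data: a 21-column header and one made-up row.
header = ','.join(f'COL{i}' for i in range(21)) + '\n'
row = ('4/1/2023 12:00:00 AM,19393,1039707,GA,1129806,TX,0600,-3.00,0715,5.00,'
       '0.00,,1.00' + ',0.00' * 8 + '\n')

src = io.StringIO(header + row)
pipe = io.BytesIO()  # stands in for the pipe handed over by the importer
transform_flights(pipe, src)

result = pipe.getvalue().decode('utf-8')
print(result)
```

Because each row keeps its own line ending in the last field, the transformed output contains exactly one line per input data row.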