ci: add GitHub action that automatically creates the derivative databases #81

Merged
merged 11 commits into from
Mar 3, 2023
27 changes: 27 additions & 0 deletions .github/workflows/create_db_derivatives.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
name: Create and commit category, images and extended database files

on:
  push:
    branches: ["main"]
    paths:
      - "plane-alert-db.csv"
      - "plane_images.txt"
      - "blacklist.txt"

jobs:
  createDerivativeDatabases:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - run: pip install -r ./scripts/requirements.txt

      - name: Create category, images and extended database CSV files
        run: python ./scripts/create_db_derivatives.py

      - name: Commit category, images and extended database CSV files
        uses: stefanzweifel/git-auto-commit-action@v4
        with:
          commit_message: "refactor: update derivative databases."
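The trigger in the workflow above can be approximated in a few lines of Python. This is only a sketch of the filter logic, not how GitHub evaluates it: the function name `would_trigger` and the sample file lists are made up for illustration, and plain set membership stands in for GitHub's path matching, which is sufficient here only because the workflow lists literal file names.

```python
# Paths listed under `on.push.paths` in the workflow above.
WATCHED_PATHS = {"plane-alert-db.csv", "plane_images.txt", "blacklist.txt"}


def would_trigger(branch, changed_files):
    """Approximate the trigger: a push to main that touches a watched file."""
    return branch == "main" and any(f in WATCHED_PATHS for f in changed_files)


print(would_trigger("main", ["plane-alert-db.csv", "README.md"]))  # True
print(would_trigger("main", ["scripts/README.md"]))                # False
print(would_trigger("dev", ["blacklist.txt"]))                     # False
```

The generated CSV files are not in the watched list, so the commit made by the last step does not match the path filter.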
18 changes: 18 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,18 @@
{
  "workbench.colorCustomizations": {
    "activityBar.activeBackground": "#78a5a5",
    "activityBar.background": "#78a5a5",
    "activityBar.foreground": "#15202b",
    "activityBar.inactiveForeground": "#15202b99",
    "activityBarBadge.background": "#865986",
    "activityBarBadge.foreground": "#e7e7e7",
    "commandCenter.border": "#e7e7e799",
    "sash.hoverBorder": "#78a5a5",
    "tab.activeBorder": "#78a5a5",
    "titleBar.activeBackground": "#5e8c8c",
    "titleBar.activeForeground": "#e7e7e7",
    "titleBar.inactiveBackground": "#5e8c8c99",
    "titleBar.inactiveForeground": "#e7e7e799"
  },
  "peacock.color": "#5e8c8c"
}
9 changes: 9 additions & 0 deletions READ_BEFORE_MAKING_CHANGES.md
@@ -0,0 +1,9 @@
# Read before making changes

Please only suggest or make changes to the following files:

- [plane-alert-db.csv](plane-alert-db.csv): This is the main database file. All non-image changes should be done here.
- [blacklist.txt](blacklist.txt): This source file contains planes that will cause your Twitter account to be banned. Please use it with care.
- [plane_images.txt](plane_images.txt): You can add plane images in this source file.

All other files (except PIA) are generated from these files by the [.github/workflows/create_db_derivatives.yaml](.github/workflows/create_db_derivatives.yaml) GitHub action. Changes made directly to the generated files will be overwritten and lost.
3 changes: 3 additions & 0 deletions _bin/README.md
@@ -0,0 +1,3 @@
# Readme

This folder contains the old database files that were used before the derivative database creation GitHub action was added to the repository. It will be removed once we have confirmed that the GitHub action works correctly.
5,401 changes: 5,388 additions & 13 deletions plane-alert-mil-images.csv → _bin/badgers-best-images.csv
100755 → 100644

Large diffs are not rendered by default.

5,401 changes: 5,388 additions & 13 deletions plane-alert-mil.csv → _bin/badgers-best.csv
100755 → 100644

Large diffs are not rendered by default.

File renamed without changes.
12,851 changes: 12,851 additions & 0 deletions _bin/plane-alert-db.csv

Large diffs are not rendered by default.

49 changes: 49 additions & 0 deletions _bin/plane-alert-pia.csv
@@ -0,0 +1,49 @@
$ICAO,$Registration,$Operator,$Type,$ICAO Type,#CMPG,$Tag 1,$#Tag 2,$#Tag 3,Category,$#Link,#ImageLink,#ImageLink2,#ImageLink3
A0FDDB,N50KL
A0FDF9,N9XJ
A0FDFB,N650HC
A0FE01,N658HC
A0FE02,N113CS
A0FE30,N279PH
A0FE68,N935EF
A0FE84,N952DA
A0FE85,N83EP
A0FE2E,N439PW
A0FE3B,N16DJ
A0FE3F,N725DT
A0FE6E,N437JD
A0FE9F,N17513
A0FEAB,N302AK
A0FEAF,N68KP
A0FEB4,N676JM
A0FEBA,N977V
A0FECB,N628TS
A0FED0,N68885
A0FED7,N650JR
A0FEE1,N6JR
A0FEEA,N8800E
A0FEED,N681HS
A0FEEE,N51TE
A0FEEF,N682HS
A0FEE7,N98AC
A0FEF0,N8100E
A0FEF1,N8000E
A0FEF2,N8200E
A0FEF4,N998PB
A0FEF6,N928SZ
A0FEF7,N634BE
A0FEFB,N32MJ
A0FEFC,N1980K
A0FF05,N542TP
A0FF09,N414KU
A0FF0B,N271DV
A0FF17,N311JX
A0FF20,N898TS
A0FF21,N68KJ
A0FF22,N621MM
A0FF24,N14KL
A0FF25,N758PB
A0FF27,N2E
A102F8,N586GA
A1DC4A,N711PV
A4E954,N88BK
12 changes: 12 additions & 0 deletions _bin/plane-alert-search-terms-to-do.csv
@@ -0,0 +1,12 @@
Ministry,216 added
National Guard,6 added
Force,4670 added (not added United States Air Force and Royal Air Force)
Navy,1460 added
Coast Guard,398 added
Army,2071 added
Marine Corps,185 added
Administration,64 added
Guardia,41 added
Policja,
Aero Flite,
Coulson,
7,202 changes: 7,202 additions & 0 deletions _bin/plane-alert-wip.csv

Large diffs are not rendered by default.

File renamed without changes.
3,338 changes: 0 additions & 3,338 deletions plane-alert-civ-images.csv

This file was deleted.

3,338 changes: 0 additions & 3,338 deletions plane-alert-civ.csv

This file was deleted.

1,593 changes: 0 additions & 1,593 deletions plane-alert-gov-images.csv

This file was deleted.

1,593 changes: 0 additions & 1,593 deletions plane-alert-gov.csv

This file was deleted.

862 changes: 0 additions & 862 deletions plane-alert-pol-images.csv

This file was deleted.

862 changes: 0 additions & 862 deletions plane-alert-pol.csv

This file was deleted.

12,860 changes: 12,860 additions & 0 deletions plane_images.txt

Large diffs are not rendered by default.

6 changes: 6 additions & 0 deletions scripts/README.md
@@ -0,0 +1,6 @@
# Readme

This folder contains several scripts used in the GitHub actions:

- `create_db_derivatives`: A script that can be used to create the derivative databases based on the `plane-alert-db.csv`, `plane_images.txt` and `blacklist.txt` files.
- `create_images_reference`: A small script that I used to create the new `plane_images.txt` file. This script will be removed once we are sure that the new images file is correct.
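
As a rough illustration of what `create_db_derivatives` does, the sketch below groups rows by their `#CMPG` value and left-joins image links on `$ICAO`. It uses only the standard library rather than pandas, and the sample data, the helper name `split_by_category` and the in-memory result are assumptions made for this example; the actual script reads the source files from disk and writes one CSV file per category.

```python
import csv
import io
from collections import defaultdict

# Made-up sample data standing in for plane-alert-db.csv and plane_images.txt.
MAIN_CSV = """$ICAO,$Registration,#CMPG
AAA001,N1AA,Mil
AAA002,N2BB,Civ
AAA003,N3CC,
"""
IMAGES_CSV = """$ICAO,#ImageLink
AAA001,https://example.com/a.jpg
"""


def split_by_category(main_rows, image_rows):
    """Group rows by '#CMPG' and left-join image links on '$ICAO'."""
    images = {row["$ICAO"]: row["#ImageLink"] for row in image_rows}
    categories = defaultdict(list)
    for row in main_rows:
        category = row["#CMPG"]
        if not category:  # Skip rows without a category.
            continue
        # Left join: keep the plane even when no image is known.
        joined = {**row, "#ImageLink": images.get(row["$ICAO"], "")}
        categories[category.lower()].append(joined)
    return dict(categories)


main_rows = list(csv.DictReader(io.StringIO(MAIN_CSV)))
image_rows = list(csv.DictReader(io.StringIO(IMAGES_CSV)))
derivatives = split_by_category(main_rows, image_rows)

for name, rows in derivatives.items():
    print(f"plane-alert-{name}-images.csv would get {len(rows)} row(s)")
```

The left join is the important design choice: a plane without an entry in the images file still appears in the images CSV, just with an empty image column.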
58 changes: 58 additions & 0 deletions scripts/create_db_derivatives.py
@@ -0,0 +1,58 @@
"""This script creates (derivative) category and images CSV database files from the main
'plane-alert-db.csv' database file. The categories are created based on the '#CMPG'
column, while images are added using the 'plane_images.txt' reference file. It also
creates an extended database file using the 'blacklist.txt' file.
"""

import logging

import pandas as pd

logging.basicConfig(
    format="%(asctime)s %(levelname)-8s [%(name)s] %(message)s", level=logging.INFO
)


if __name__ == "__main__":
    logging.info("Reading the main csv file...")
    df = pd.read_csv("plane-alert-db.csv")
    logging.info("Main csv file read successfully.")

    logging.info("Reading the images reference file...")
    images_df = pd.read_csv("plane_images.txt")
    logging.info("Images reference file read successfully.")

    logging.info("Creating the category and category images csv files...")
    for category in df["#CMPG"].unique():
        if pd.isna(category):  # Skip N/A values.
            continue

        # Create category csv files.
        logging.info(f"Creating the '{category}' category csv file...")
        category_df = df[df["#CMPG"] == category]
        category_df.to_csv(f"plane-alert-{category.lower()}.csv", index=False)

        # Create images csv files.
        logging.info(f"Creating the '{category}' category images csv file...")
        category_images_df = pd.merge(category_df, images_df, how="left", on="$ICAO")
        category_images_df.to_csv(
            f"plane-alert-{category.lower()}-images.csv", index=False
        )
    logging.info("Category and category images csv files created successfully.")

    logging.info("Creating the extended database csv file...")
    blacklist_df = pd.read_csv("blacklist.txt")
    # Without `on`, the outer merge joins on all columns the two frames share.
    extended_df = pd.merge(df, blacklist_df, how="outer")
    extended_df.to_csv("plane-alert-db-extended.csv", index=False)
    logging.info("Extended database csv file created successfully.")

    logging.info("Creating the extended database images csv file...")
    extended_images_df = pd.merge(extended_df, images_df, how="left", on="$ICAO")
    extended_images_df.to_csv("plane-alert-db-extended-images.csv", index=False)
    logging.info("Extended database images csv file created successfully.")

    logging.info("Creating the main database images csv file...")
    main_images_df = pd.merge(df, images_df, how="left", on="$ICAO")
    main_images_df["#CMPG"] = main_images_df["#CMPG"].fillna("#N/A")
    main_images_df.to_csv("plane-alert-db-images.csv", index=False)
    logging.info("Main database images csv file created successfully.")
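
The category loop in the script above skips missing `#CMPG` values, which pandas reads from empty CSV cells as NaN. NaN is the only float that is not equal to itself, so a check such as `value != value` or `pd.isna(value)` detects it. The stand-alone sketch below shows that behaviour with the standard library only; the helper `is_missing` and the sample list are made up for illustration and are not part of the script.

```python
import math

nan = float("nan")

# NaN is the only float that is not equal to itself.
print(nan != nan)       # True
print(math.isnan(nan))  # True


def is_missing(value):
    """Return True for NaN floats, False for anything else (hypothetical helper)."""
    return isinstance(value, float) and math.isnan(value)


# Made-up category values, as they might come out of a column with empty cells.
categories = ["Mil", nan, "Civ", "Gov"]
kept = [c for c in categories if not is_missing(c)]
print(kept)  # ['Mil', 'Civ', 'Gov']
```

The `isinstance` guard matters because `math.isnan` raises a `TypeError` when given a string such as a category name.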
96 changes: 96 additions & 0 deletions scripts/create_images_reference.py
@@ -0,0 +1,96 @@
"""This script retrieves the plane images in the 'plane-alert-db-images.csv' database
and 'planepix.txt' file. It stores these images in a new 'plane_images.txt' reference
file to use later to create the 'images' CSV database files.

This script can be removed if we know that the new GitHub action results are correct.
"""
import logging

import numpy as np
import pandas as pd

logging.basicConfig(
    format="%(asctime)s %(levelname)-8s [%(name)s] %(message)s", level=logging.INFO
)

if __name__ == "__main__":
    logging.info("Retrieve images from the 'plane-alert-db-images.csv' file...")
    plane_alert_db_images = pd.read_csv("plane-alert-db-images.csv")
    plane_alert_db_images = plane_alert_db_images[
        ["$ICAO", "#ImageLink", "#ImageLink2", "#ImageLink3"]
    ]
    logging.info(f"Images retrieved ({plane_alert_db_images.shape[0]}).")

    logging.info("Retrieve images from the 'planepix.txt' file...")
    planepix_df = pd.read_csv("planepix.txt")
    planepix_df.columns = ["$ICAO", "#ImageLink4"]
    logging.info(f"Images retrieved ({planepix_df.shape[0]}).")

    logging.info(
        "Merge images from the 'plane-alert-db-images.csv' and 'planepix.txt' files..."
    )
    plane_alert_db_images = pd.merge(
        plane_alert_db_images, planepix_df, how="outer", on="$ICAO"
    )
    plane_alert_db_images = plane_alert_db_images.replace(
        "", np.nan
    )  # Replace empty strings with NaN.
    logging.info(f"Images merged ({plane_alert_db_images.shape[0]}).")

    logging.info("Remove duplicates from the merged images...")
    plane_alert_db_images["#ImageLink4"] = plane_alert_db_images.apply(
        lambda row: row["#ImageLink4"]
        if row["#ImageLink4"]
        not in [row["#ImageLink"], row["#ImageLink2"], row["#ImageLink3"]]
        else np.nan,
        axis=1,
    )
    logging.info(f"Images without duplicates ({plane_alert_db_images.shape[0]}).")

    logging.info("Make sure that the image urls have the correct format...")
    plane_alert_db_images[
        ["#ImageLink", "#ImageLink2", "#ImageLink3", "#ImageLink4"]
    ] = plane_alert_db_images[
        ["#ImageLink", "#ImageLink2", "#ImageLink3", "#ImageLink4"]
    ].apply(
        lambda row: row.apply(
            lambda x: x
            if (isinstance(x, float) and np.isnan(x)) or x.startswith("https://")
            else (
                x.replace("http://", "https://")
                if x.startswith("http://")
                else "https://" + x
            )
        ),
        axis=1,
    )
    logging.info(f"Images with correct format ({plane_alert_db_images.shape[0]}).")

    # Print new images.
    logging.info("Check if there were new images in the 'planepix.txt' file...")
    new_image_links = plane_alert_db_images[
        ~plane_alert_db_images["#ImageLink4"].isnull()
    ][["$ICAO", "#ImageLink4"]]
    logging.info(
        "New images found ({}):\n {}".format(
            new_image_links.shape[0], new_image_links.head().to_string(index=False)
        )
    )

    logging.info("Removing empty left image columns...")
    columns = plane_alert_db_images.columns
    plane_alert_db_images = plane_alert_db_images.apply(
        lambda x: pd.Series(x.dropna().values), axis=1
    )
    logging.info("Empty left image columns removed.")

    logging.info("Adding extra 'ImageLink' column if needed...")
    if columns.shape[0] > plane_alert_db_images.columns.shape[0]:
        logging.info("No extra 'ImageLink' column needed to be added.")
    else:
        logging.info("Extra '#ImageLink4' column added.")
    plane_alert_db_images.columns = columns[: plane_alert_db_images.columns.shape[0]]

    logging.info("Saving found images in 'plane_images.txt' file...")
    plane_alert_db_images.to_csv("plane_images.txt", index=False)
    logging.info("Images successfully saved in 'plane_images.txt' file.")
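
Two of the steps in the script above, forcing an `https://` prefix and shifting image links left so that empty columns end up on the right, are easier to follow on a single row. The sketch below reproduces that logic in plain Python under stated assumptions: `normalize_url` and `shift_left` are hypothetical helper names, `None` stands in for NaN, and the sample row is made up. It is not the pandas code the script runs.

```python
def normalize_url(url):
    """Force an https:// prefix, covering the same three cases as the script."""
    if url.startswith("https://"):
        return url
    if url.startswith("http://"):
        # Only the leading scheme is rewritten here (count=1).
        return url.replace("http://", "https://", 1)
    return "https://" + url


def shift_left(links):
    """Move non-empty links to the front and pad the end with None."""
    present = [link for link in links if link is not None]
    return present + [None] * (len(links) - len(present))


# Made-up row: (#ImageLink, #ImageLink2, #ImageLink3, #ImageLink4).
row = [None, "http://example.com/a.jpg", None, "example.com/b.jpg"]
cleaned = shift_left([normalize_url(link) if link else None for link in row])
print(cleaned)
# ['https://example.com/a.jpg', 'https://example.com/b.jpg', None, None]
```

Shifting left keeps the first image columns populated, so a consumer that only reads `#ImageLink` still gets an image whenever any link exists for that plane.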