I want to understand the occupancy patterns of the library seats at my university.
Since I need something that runs code all day long for up to a year, I wanted a remote solution. I set up an EC2 Linux instance on AWS. I actually wanted to avoid Amazon because I personally dislike the company, but for some reason I could not use the free tiers of Oracle, Microsoft, Heroku and IBM.
The system requirements for this job are about as low as it gets, and there was only one option included in the free tier anyway: t2.micro.
I wrote a simple Python script "scrape.py" that is executed every 15 minutes on the Linux instance. It downloads the HTML of the university website, extracts the relevant information and appends it to a CSV file "bib_seats.csv". To run this script every 15 minutes, I created a file "scrape.cron" with the content "*/15 * * * * python3 scrape.py". The cron job(s) in this file are activated with "crontab scrape.cron".
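For illustration, here is a minimal sketch of what "scrape.py" could look like. The URL, the regex and the CSV layout are placeholders of my own; the real extraction depends entirely on how the university page is structured:

```python
# scrape.py -- minimal sketch; URL and HTML pattern are placeholders,
# not the real university page.
import csv
import re
import urllib.request
from datetime import datetime

URL = "https://example-university.edu/library/seats"  # placeholder

def main():
    html = urllib.request.urlopen(URL, timeout=30).read().decode("utf-8")
    # Hypothetical pattern: assumes the page contains e.g. "Free seats: 123"
    match = re.search(r"Free seats:\s*(\d+)", html)
    free_seats = int(match.group(1)) if match else None
    # Append one row per run; cron takes care of the every-15-minutes part
    with open("bib_seats.csv", "a", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([datetime.now().isoformat(timespec="minutes"), free_seats])

if __name__ == "__main__":
    main()
```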
So in the end we have three files on the Linux instance:
- scrape.py
- scrape.cron
- bib_seats.csv
This is nice and all, but we also need to download "bib_seats.csv" somehow.
I scheduled a job on my local machine that downloads the CSV file from the Linux instance using SCP. For this I wrote the file download.bat, which is executed by the Windows Task Scheduler ("Aufgabenplanung" in German). This runs once per day for safety, since something can always go wrong (e.g. I could forget that the free tier runs out).
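A minimal sketch of what "download.bat" could contain; the key path, user and hostname are placeholders for your own instance details:

```bat
@echo off
REM download.bat -- sketch; key path, user and host are placeholders
scp -i C:\keys\aws_key.pem ec2-user@ec2-XX-XX-XX-XX.compute.amazonaws.com:~/bib_seats.csv C:\data\bib_seats.csv
```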
I regularly checked the blog of the university to learn about relevant events that might impact the data. One example is a water damage that forced the main part of one library to close. This information is collected in domain_knowledge.txt.
First up, I plotted a simple graph of the time series.
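A minimal sketch of such a plot, assuming the two-column CSV layout from the scrape.py sketch above (pandas and matplotlib):

```python
# plot_seats.py -- sketch, assuming bib_seats.csv has no header row
# and the columns (timestamp, free_seats) from the scrape.py sketch
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("bib_seats.csv",
                 names=["timestamp", "free_seats"],
                 parse_dates=["timestamp"])
df.plot(x="timestamp", y="free_seats", figsize=(12, 4), legend=False)
plt.ylabel("free seats")
plt.tight_layout()
plt.show()
```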