Create a simple web scraper to automatically obtain links to all ISEL PDF timetables from the programme page (it could also capture the official programme name and degree level - licenciatura / mestrado).
To consider:
The current solution uses a ".properties" file per programme containing a couple of key-value pairs: the PDF URL and an alert recipient email address. These files currently have to be updated manually.
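For reference, such a file might look like the following sketch (the key names and values are illustrative assumptions, not copied from the repository):

```properties
# programmes/leic.properties (hypothetical example)
pdf.url=https://www.isel.pt/media/uploads/LEIC_horarios.pdf
alert.email=timetable-alerts@example.com
```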
Should the files used to store programme metadata and PDF URL be updated automatically as soon as a new URL is found?
Should we keep a file "history" of some sort?
If yes, should we fall back to the previous working URL when a new one fails? (Fails at the download step, the parsing step, or both?)
Should the source file format be changed to store this data in a more structured format like YAML, JSON or CSV?
Is there any other source of information that could be scraped that would prove useful for other components?
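Several of the points above could be addressed together: a single structured file per programme could hold the metadata, the current PDF URL, and a short URL history to fall back on. A hypothetical JSON shape (all field names are assumptions, not an agreed schema):

```json
{
  "programme": "Engenharia Informática e de Computadores",
  "degree": "licenciatura",
  "alertEmail": "timetable-alerts@example.com",
  "pdfUrl": "https://www.isel.pt/media/uploads/LEIC_horarios.pdf",
  "urlHistory": [
    {
      "url": "https://www.isel.pt/media/uploads/LEIC_horarios_old.pdf",
      "lastWorking": "2019-09-01"
    }
  ]
}
```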
Created a basic throwaway scraper in Node.js as a practical proof of concept.
ISEL's current website doesn't include id attributes on most elements, so DOM queries have to rely on element type + class combinations, plus some filtering on href attributes.
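A minimal sketch of that href-filtering approach, runnable with plain Node.js. The HTML snippet and the `course-link` class name are illustrative assumptions, not the actual markup of isel.pt, and a real scraper would use a proper HTML parser (e.g. cheerio) rather than a regex:

```javascript
// Illustrative markup only; the real site's structure and class names differ.
const sampleHtml = `
  <div class="views-row">
    <a class="course-link" href="/media/uploads/LEIC_horarios.pdf">LEIC timetable</a>
    <a class="course-link" href="/cursos/leic">Programme page</a>
  </div>`;

// Since most elements lack id attributes, match anchors by tag + class,
// then keep only those whose href points at a PDF.
function extractPdfLinks(html, baseUrl) {
  const anchorRe = /<a\s+[^>]*class="course-link"[^>]*href="([^"]+)"[^>]*>/g;
  const links = [];
  let match;
  while ((match = anchorRe.exec(html)) !== null) {
    if (match[1].toLowerCase().endsWith('.pdf')) {
      // Resolve relative hrefs against the site root.
      links.push(new URL(match[1], baseUrl).href);
    }
  }
  return links;
}

console.log(extractPdfLinks(sampleHtml, 'https://www.isel.pt'));
```

The same filtering idea carries over directly to a DOM-based query (select by element type and class, then test the `href` attribute).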