Gedling Borough Council #448

Closed
3 tasks done
roberthunt opened this issue Nov 28, 2023 · 27 comments · Fixed by #506
Labels
council request A new council request

Comments

@roberthunt

roberthunt commented Nov 28, 2023

Name of Council

Gedling Borough Council

Example Address/Postcode

Valeside Gardens

Additional Information

This one may be quite challenging, some facts:

  1. Only data available is static calendar PDFs covering a 12-month period.
  2. Scraping text from the PDFs is not helpful as it relies on cell background colours to identify days.
  3. Lookup only accepts a street name or partial street name (no postcodes).
  4. There is a reminder service but this is by email only and around 12 hours before collection.
  5. There seems to be a fixed number of calendars: four per weekday for normal collections (20 in total) and two per weekday for garden waste (10 in total).
  6. When a collection falls on a bank holiday it seems to shift to the previous Saturday.

Ideas

  1. We can resolve a street name to a specific calendar using the search.
  2. We should be able to predict the days based on the calendar.
  3. We need to know UK bank holidays to figure out shifted collections.
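Idea 3 could be sketched roughly as follows. This only illustrates the "shift to the previous Saturday" observation from fact 6; the function name and the hard-coded holiday set are assumptions, and a real implementation would need a complete bank holiday list for the calendar year.

```python
from datetime import date, timedelta

# Assumption: a hand-maintained set of bank holidays (only two shown here).
BANK_HOLIDAYS = {date(2023, 12, 25), date(2024, 1, 1)}


def shift_for_bank_holiday(collection: date, holidays=BANK_HOLIDAYS) -> date:
    """Return the previous Saturday if the collection lands on a bank holiday."""
    if collection not in holidays:
        return collection
    # date.weekday(): Monday is 0 and Saturday is 5.
    days_back = (collection.weekday() - 5) % 7 or 7
    return collection - timedelta(days=days_back)
```

Whether the council always applies this rule is unconfirmed, so any predicted date should still be checked against the published PDF.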

2022 / 2023

https://apps.gedling.gov.uk/refuse/search.aspx

Household (Black Bin) / Glass (Green Box) / Recycling (Green Bin)

At a glance, the G number seems to correspond to the week in which the glass collection (glass + recycling) first occurs in the calendar's first month (December). So WednesdayG3 would have its glass collection in the 3rd week of December 2022.

Collection pattern is [Household -> Recycling -> Household -> Recycling/Glass]
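The four-week pattern above can be sketched as a simple generator. The anchor date (a week known to be a glass week) and the function name are assumptions for illustration, and bank holiday shifts are not handled here.

```python
from datetime import date, timedelta
from itertools import islice

# The repeating four-week cycle described above.
PATTERN = ["Household", "Recycling", "Household", "Recycling/Glass"]


def weekly_collections(glass_anchor: date):
    """Yield (date, label) pairs week by week, starting from a known glass week."""
    week = 0
    while True:
        # The anchor sits at the last slot of the cycle (index 3).
        yield glass_anchor + timedelta(weeks=week), PATTERN[(3 + week) % 4]
        week += 1


# Hypothetical anchor: Wednesday 21 December 2022.
first_five = list(islice(weekly_collections(date(2022, 12, 21)), 5))
```

With that anchor, `first_five` starts with a glass week and reaches the next glass week four weeks later.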

MondayG1.pdf
MondayG2.pdf
MondayG3.pdf
MondayG4.pdf
TuesdayG1.pdf
TuesdayG2.pdf
TuesdayG3.pdf
TuesdayG4.pdf
WednesdayG1.pdf
WednesdayG2.pdf
WednesdayG3.pdf
WednesdayG4.pdf
ThursdayG1.pdf
ThursdayG2.pdf
ThursdayG3.pdf
ThursdayG4.pdf
FridayG1.pdf
FridayG2.pdf
FridayG3.pdf
FridayG4.pdf

Garden Waste (Brown Bin)

Garden Waste A.pdf
Garden Waste B.pdf
Garden Waste C.pdf
Garden Waste D.pdf
Garden Waste E.pdf
Garden Waste F.pdf
Garden Waste G.pdf
Garden Waste H.pdf
Garden Waste I.pdf
Garden Waste J.pdf

Verification

  • I searched for similar issues at https://github.com/robbrad/UKBinCollectionData/issues?q=is:issue and found no duplicates
  • I have provided a tested working address/postcode/UPRN with bin collections available
  • I understand that this project is run by volunteer contributors and completion depends on numerous factors - even with a request, we cannot guarantee if/when your council will get a script
@roberthunt roberthunt added the council request A new council request label Nov 28, 2023
@sym0nd0

sym0nd0 commented Dec 4, 2023

If this one can be scraped and included in this project, I'll be over the moon.

I've been battling with the nonsensical way this data is provided and have asked Gedling on multiple occasions to provide either a simple list of dates or, better yet, an API for it, but to no avail.

@dp247
Collaborator

dp247 commented Dec 4, 2023

No promises but... have you got the URLs for those calendars 😉

@roberthunt
Author

Yes. Keep in mind they seem to reuse the URLs from year to year, so the links have recently swapped over to serving the 2023/2024 calendars. Last year's files are listed above as a reference for how they may change.

2023/2024

Household (Black Bin) / Glass (Green Box) / Recycling (Green Bin)

https://apps.gedling.gov.uk/refuse/data/MondayG1.pdf
https://apps.gedling.gov.uk/refuse/data/MondayG2.pdf
https://apps.gedling.gov.uk/refuse/data/MondayG3.pdf
https://apps.gedling.gov.uk/refuse/data/MondayG4.pdf
https://apps.gedling.gov.uk/refuse/data/TuesdayG1.pdf
https://apps.gedling.gov.uk/refuse/data/TuesdayG2.pdf
https://apps.gedling.gov.uk/refuse/data/TuesdayG3.pdf
https://apps.gedling.gov.uk/refuse/data/TuesdayG4.pdf
https://apps.gedling.gov.uk/refuse/data/WednesdayG1.pdf
https://apps.gedling.gov.uk/refuse/data/WednesdayG2.pdf
https://apps.gedling.gov.uk/refuse/data/WednesdayG3.pdf
https://apps.gedling.gov.uk/refuse/data/WednesdayG4.pdf
https://apps.gedling.gov.uk/refuse/data/ThursdayG1.pdf
https://apps.gedling.gov.uk/refuse/data/ThursdayG2.pdf
https://apps.gedling.gov.uk/refuse/data/ThursdayG3.pdf
https://apps.gedling.gov.uk/refuse/data/ThursdayG4.pdf
https://apps.gedling.gov.uk/refuse/data/FridayG1.pdf
https://apps.gedling.gov.uk/refuse/data/FridayG2.pdf
https://apps.gedling.gov.uk/refuse/data/FridayG3.pdf
https://apps.gedling.gov.uk/refuse/data/FridayG4.pdf

Garden Waste (Brown Bin)

https://apps.gedling.gov.uk/GDW/Rounds/data/Garden%20Waste%20A.pdf
https://apps.gedling.gov.uk/GDW/Rounds/data/Garden%20Waste%20B.pdf
https://apps.gedling.gov.uk/GDW/Rounds/data/Garden%20Waste%20C.pdf
https://apps.gedling.gov.uk/GDW/Rounds/data/Garden%20Waste%20D.pdf
https://apps.gedling.gov.uk/GDW/Rounds/data/Garden%20Waste%20E.pdf
https://apps.gedling.gov.uk/GDW/Rounds/data/Garden%20Waste%20F.pdf
https://apps.gedling.gov.uk/GDW/Rounds/data/Garden%20Waste%20G.pdf
https://apps.gedling.gov.uk/GDW/Rounds/data/Garden%20Waste%20H.pdf
https://apps.gedling.gov.uk/GDW/Rounds/data/Garden%20Waste%20I.pdf
https://apps.gedling.gov.uk/GDW/Rounds/data/Garden%20Waste%20J.pdf
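Since the file names follow a regular scheme, the URLs above could be generated rather than listed by hand. This is a minimal sketch; the function names are made up for illustration, and because the URLs are reused from year to year there is no guarantee the scheme stays the same.

```python
from urllib.parse import quote

REFUSE_BASE = "https://apps.gedling.gov.uk/refuse/data/"
GARDEN_BASE = "https://apps.gedling.gov.uk/GDW/Rounds/data/"
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]


def refuse_calendar_url(weekday: str, g_number: int) -> str:
    """Build the refuse calendar URL, e.g. WednesdayG2.pdf."""
    return f"{REFUSE_BASE}{weekday}G{g_number}.pdf"


def garden_calendar_url(letter: str) -> str:
    """Build the garden waste calendar URL; quote() encodes spaces as %20."""
    return GARDEN_BASE + quote(f"Garden Waste {letter}.pdf")


ALL_REFUSE = [refuse_calendar_url(d, g) for d in WEEKDAYS for g in range(1, 5)]
ALL_GARDEN = [garden_calendar_url(c) for c in "ABCDEFGHIJ"]
```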

@dp247
Collaborator

dp247 commented Dec 4, 2023

Cheers. I've also sent an FOI request to the council for their data, so we may have two ways to go about it.

@sym0nd0

sym0nd0 commented Dec 4, 2023

Cheers. I've also sent an FOI request to the council for their data, so we may have two ways to go about it.

That is genius! 😂

@dp247
Collaborator

dp247 commented Dec 17, 2023

I got a response... they sent me PDF files

@sym0nd0

sym0nd0 commented Dec 17, 2023

The joy of dealing with Gedling. 😂

When asked about API access, after their email alerts recently fell over and either sent people notifications for the wrong bin (even different bins to different individuals subscribed from the same house 🤦🏼‍♂️) or sent no email at all, they said:

we are looking at options including an easier interface but, for now, we will continue with the email alerts, we're just having a few issues since we moved to a new system.

@robbrad
Owner

robbrad commented Dec 17, 2023

The joy of dealing with Gedling. 😂

When asked about API access, after their email alerts recently fell over and either sent people notifications for the wrong bin (even different bins to different individuals subscribed from the same house 🤦🏼‍♂️) or sent no email at all, they said:

we are looking at options including an easier interface but, for now, we will continue with the email alerts, we're just having a few issues since we moved to a new system.

Tell them they are welcome to open a pull request on this GitHub repository as an option.

@robbrad
Owner

robbrad commented Dec 17, 2023

If we do decide to do something funky with the PDFs - please keep in mind

#493 (comment)

@skelt0
Contributor

skelt0 commented Dec 20, 2023

I know it's not very 'smart', but would it be a halfway house if we just hard-coded the data? You could still use the address lookup to select the right lookup data. It would mean that once a year someone would have to grab the data and put it in a sensible format so the changes can be submitted. I'm happy to write the first version up - @roberthunt would you be happy checking in on this yearly to update the data or create a request for someone else to do it?

I know it's not ideal, but anything else seems like it'll either take much longer or not happen at all. And at least it gives the poor folk of Gedling HA integration?

@robbrad ?

@robbrad
Owner

robbrad commented Dec 20, 2023

I'm okay with that. I know it's less than ideal, but can it be a JSON dictionary in the Python council file? The only reason I say this is if we start having extra files in the repository, it dilutes the structure we currently have. What do you think, @skelt0 ?
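As a rough idea of what a JSON dictionary inside the council file might look like, here is a sketch. The keys mirror the calendar file names from this thread, but the structure, the `upcoming` helper and every date below are placeholders made up for illustration, not real calendar data.

```python
import json
from datetime import date, datetime

# Placeholder data only: these dates and bin types are NOT from the real PDFs.
CALENDAR_JSON = """
{
  "WednesdayG2": [
    {"date": "2024-01-03", "bin": "Household"},
    {"date": "2024-01-10", "bin": "Recycling"}
  ]
}
"""
CALENDARS = json.loads(CALENDAR_JSON)


def upcoming(calendar_name: str, today: date) -> list:
    """Return the entries for a calendar that fall on or after `today`."""
    entries = CALENDARS.get(calendar_name, [])
    return [
        e for e in entries
        if datetime.strptime(e["date"], "%Y-%m-%d").date() >= today
    ]
```

Keeping the data as one JSON string in the module avoids extra files in the repository, at the cost of a yearly manual update.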

@skelt0
Contributor

skelt0 commented Dec 20, 2023

Yeah ok! I'll see what I can pull together!

@robbrad
Owner

robbrad commented Dec 20, 2023

This may or may not help you get the data out https://github.com/pymupdf/PyMuPDF

The other option, if there is some way to go from PDF to HTML, is to extract the data that way rather than hand-typing it.
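A possible starting point with PyMuPDF, as a sketch only: it assumes the calendar cells are drawn as filled vector rectangles (not verified against the Gedling PDFs) and pairs each word with the fill colour of the first rectangle containing it. Both function names are made up for illustration.

```python
def colour_for_word(word_box, fills):
    """Return the fill colour of the first rectangle containing the word's centre."""
    x0, y0, x1, y1 = word_box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    for (rx0, ry0, rx1, ry1), colour in fills:
        if rx0 <= cx <= rx1 and ry0 <= cy <= ry1:
            return colour
    return None


def coloured_words(pdf_path):
    """Pair each word in the PDF with the background fill behind it, if any."""
    import fitz  # PyMuPDF; imported lazily so colour_for_word works without it

    results = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            # get_drawings() lists vector paths; keep only those with a fill colour.
            fills = [
                (tuple(d["rect"]), d["fill"])
                for d in page.get_drawings()
                if d.get("fill")
            ]
            # get_text("words") yields (x0, y0, x1, y1, word, block, line, word_no).
            for x0, y0, x1, y1, text, *_ in page.get_text("words"):
                results.append((text, colour_for_word((x0, y0, x1, y1), fills)))
    return results
```

A page-sized background rectangle would match every word first, so the fills may need filtering by size before this is useful.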

@skelt0
Contributor

skelt0 commented Dec 20, 2023

@robbrad - Check out an initial stab at this: https://github.com/skelt0/UKBinCollectionData/blob/feat-gedling-borough-council/uk_bin_collection/uk_bin_collection/councils/GedlingBoroughCouncil.py

The calendar data is pretty predictable, as mentioned somewhere above, so I've made a helper script to generate the dates based on three values. It makes it a million times quicker. I wouldn't like to predict that this predictive modelling will work in future years though (even/odd weeks, and 1 in 4 for glass), so it's 50/50 on whether I save the helper script somewhere.

Let me know what you think and I can continue putting the data in. Currently the link above works for the supplied street's refuse data (Black, Blue and Glass bins).

Note: This isn't tidied up yet and the address is currently hardcoded.

@robbrad
Owner

robbrad commented Dec 21, 2023

Looking good!

@skelt0
Contributor

skelt0 commented Dec 28, 2023

@roberthunt - can you let me know how you get on with this? It's hand-entered - I've tried to match all the changes due to bank holidays.

Also - I was wondering if the FOI process could be repeated, but asking for an accessible (screen-reader-friendly) version of the data? Surely they must need to supply this data in an accessible format when requested?

Anyway - hope this works out!

@jamesmacwhite
Contributor

jamesmacwhite commented May 29, 2024

If it helps, I've converted the horrible PDFs into the iCal format and hosted them for use, as I had already done this for my own schedule, Wednesday G2. The schedules generally follow a consistent pattern, with the exception of bank holidays, which are identified as changed collection days.

https://github.com/jamesmacwhite/gedling-borough-council-bin-calendars

If you want to argue the case on legal grounds, all councils fall under the Public Sector Bodies (Websites and Mobile Applications) (No. 2) Accessibility Regulations 2018, so they are legally required to make content accessible. The fact that the calendars provided were created after 2018 would mean they are required to provide an alternative format. If you want to push the issue, they are technically not meeting accessibility regulations with the formats provided.

@sym0nd0

sym0nd0 commented May 30, 2024

James, you're a superstar! Thanks for doing that and for sharing.

@jamesmacwhite
Contributor

jamesmacwhite commented May 30, 2024

No worries! It was great to come across this project and to see that it exists to create an API layer where there is none. Unfortunately for Gedling Borough Council, the PDFs are the only data source outside of the email reminder service. While the email service is better accessibility-wise given it's HTML, it does not provide full schedule data, so it's either PDF or nothing, which is horrible and borderline in terms of their accessibility statement, as referenced.

You could in theory trigger an automation when the email reminder is received and parse the data out of that. Consistent properties such as the sender (From) and subject are available.

From: GBC Bin Reminder Alert <news@comms.gedling.gov.uk>
Subject: We're collecting your bin tomorrow, please it out by 6am

The heading that contains the bin type is in an <h2> element but does not have a specific ID. There are also two <h2> elements, so you'd have to take the first occurrence and then parse out the all-caps text, as that's what they use for the bin type.
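For anyone who does want to try the email route, the first-<h2>, all-caps approach could look something like this with only the standard library. The class and function names are made up, and the sample HTML in the usage line is invented, since the real email markup isn't reproduced here.

```python
import re
from html.parser import HTMLParser


class FirstH2(HTMLParser):
    """Collect the text of the first <h2> element only."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.done = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and not self.done:
            self.in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2" and self.in_h2:
            self.in_h2 = False
            self.done = True  # ignore any later <h2> elements

    def handle_data(self, data):
        if self.in_h2:
            self.text += data


def bin_type_from_email(html: str):
    """Return the all-caps words from the first <h2>, or None if there are none."""
    parser = FirstH2()
    parser.feed(html)
    caps = re.findall(r"\b[A-Z]{2,}\b", parser.text)
    return " ".join(caps) or None


# Invented sample markup, for illustration only.
sample = "<h2>Your BLACK BIN is due</h2><h2>Other news</h2>"
bin_type = bin_type_from_email(sample)
```

This is as fragile as any HTML scraping: a template change on the council's side would silently break it.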


I looked at this originally, but the email automation/HTML scraping route is fragile by nature (DOM/HTML parsing), and the Garden Waste Collection service is completely outside of it. Converting everything to iCal seemed easier and at least reliable, provided the recurrence scheduling aligns with the original PDFs, so that's what I ended up doing after seeing a few others around home automation having the same issues with Gedling. Who knew Gedling has 20 different bin collections!

We should still push Gedling Borough Council to look at this long term, though. The PDFs themselves have always been, and always will be, print documents, which Gedling won't actually print anymore anyway due to cost/sustainability, so the format is outdated in my view. Clearly, if the email reminder service exists, they have some form of scheduling system behind the scenes, so publishing official iCal calendars doesn't seem too far a step.

@jamesmacwhite
Contributor

I've also mocked up a web page with all the iCal links for easy reference: https://jamesmacwhite.github.io/gedling-borough-council-bin-calendars/. I'm not going to go as far as buying a domain name for the site, but a static Jekyll site should make things easier than messing around with the Raw button on GitHub.

@sym0nd0

sym0nd0 commented Jun 2, 2024

Love that! Thanks again for your work on this, made my life a lot easier.

@jamesmacwhite
Contributor

You're welcome. HTML and JSON formats are also provided, making the data more accessible and open!

@jamesmacwhite
Contributor

Since #763 was merged, this project now leverages API data from gbcbincalendars.co.uk, removing the static data issue. There is still a requirement to create iCal data for each calendar each year, but this should carry a lower maintenance burden, since using calendar recurrences avoids listing every single date manually. The JSON data is expanded to provide the collection dates in full, generated from the iCal RRULE data.
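To give a feel for the RRULE-to-dates expansion, here is a deliberately tiny, stdlib-only sketch. It is not how the site generates its JSON; it only handles `FREQ=WEEKLY` with `INTERVAL` and `COUNT`, and the start date and rule in the usage lines are made-up examples.

```python
import json
from datetime import date, timedelta


def expand_weekly(start: date, rrule: str) -> list:
    """Expand a simple weekly RRULE string into a list of dates."""
    parts = dict(p.split("=", 1) for p in rrule.split(";") if p)
    if parts.get("FREQ") != "WEEKLY":
        raise ValueError("only FREQ=WEEKLY is handled in this sketch")
    interval = int(parts.get("INTERVAL", "1"))
    count = int(parts["COUNT"])
    return [start + timedelta(weeks=interval * i) for i in range(count)]


# Made-up example: fortnightly from 3 January 2024, three occurrences.
dates = expand_weekly(date(2024, 1, 3), "FREQ=WEEKLY;INTERVAL=2;COUNT=3")
as_json = json.dumps([d.isoformat() for d in dates])
```

A full implementation would also need `UNTIL`, `EXDATE` and the bank holiday exceptions, which is where a proper iCal library earns its keep.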

@robbrad
Owner

robbrad commented Jun 29, 2024

Do we need to capture this process in the wiki at all?

And may I say, fabulous work @jamesmacwhite

@jamesmacwhite
Contributor

Thanks. Glad it can be of use to other projects!

@jamesmacwhite
Contributor

jamesmacwhite commented Jun 29, 2024

One thing you might want to highlight in your wiki: there's at least one case where a valid street name only returns data for one type of collection and not both. Odd, right? Not sure how that's valid, to be honest. I double-checked this at the source and confirmed it's an oddity with Gedling's data.

Using Beswick Close as the example.

No refuse data is returned, yet it does have garden collection data.

I've confirmed Beswick Close is within the Gedling boundary, but that's not really a surprise when clearly you can have a garden collection calendar!

I happened to come across this because there's some Google Analytics tracking on searches, and I cross-check some searches locally to ensure they are returning data correctly. This is the one that revealed this kind of scenario is possible. More Gedling fun. I updated my own search tool to handle the scenario. An API response with an empty array for a collection type with no data is valid, but I guess I never expected this to occur for just one type.


My suspicion is that, because it's a relatively newly built area (within the past two years), it could be a data lag.
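On the consuming side, handling this case could be as simple as separating the collection types that came back with dates from those that came back empty. The response shape and key names below are assumptions for illustration, not the actual API schema.

```python
def split_collections(response: dict):
    """Separate collection types that have dates from those that came back empty."""
    available = {kind: dates for kind, dates in response.items() if dates}
    missing = [kind for kind, dates in response.items() if not dates]
    return available, missing


# Hypothetical response for a street with garden data but no refuse data.
available, missing = split_collections(
    {"refuse": [], "garden": ["2024-07-01"]}
)
```

That way a street like Beswick Close still returns whatever data exists, and the missing type can be reported rather than treated as an error.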
