Kerala Private Bus Schedule

Overview

This repository hosts a dataset of private bus schedules in Kerala. The schedules were originally published as PDFs and have been converted to JSON. For each private bus, the dataset records the vehicle number, the major bus stops on its route, and the timings at each stop.

If you find any parsing errors, please open a pull request (PR) or raise an issue.

Structure

    {
      "Vehicle Number": "KL 05 AQ 4567",
      "route": [
        "ALUVA BANK JUNCTION",
        "PULINCHODE SIGNAL JUNCTION",
        "COMPANYPADI",
        "MUTTOM"
      ],
      "schedule": [
        {
          "trip": 1,
          "stations": [
            {
              "station": "ALUVA BANK JUNCTION",
              "arrivalTime": "05:00 am",
              "departureTime": "05:00 am"
            },
            {
              "station": "PULINCHODE SIGNAL JUNCTION",
              "arrivalTime": "05:07 am",
              "departureTime": "05:07 am"
            },
            {
              "station": "COMPANYPADI",
              "arrivalTime": "05:15 am",
              "departureTime": "05:15 am"
            },
            {
              "station": "MUTTOM",
              "arrivalTime": "05:18 am",
              "departureTime": "05:18 am"
            }
          ]
        },
        {
          "trip": 2,
          "stations": [
            {
              "station": "MUTTOM",
              "arrivalTime": "05:19 am",
              "departureTime": "05:19 am"
            },
            {
              "station": "COMPANYPADI",
              "arrivalTime": "05:20 am",
              "departureTime": "05:20 am"
            },
            {
              "station": "PULINCHODE SIGNAL JUNCTION",
              "arrivalTime": "05:30 am",
              "departureTime": "05:30 am"
            },
            {
              "station": "ALUVA BANK JUNCTION",
              "arrivalTime": "05:55 am",
              "departureTime": "05:55 am"
            }
          ]
        }
      ]
    }
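
As an illustration of how this structure can be consumed, the following minimal sketch lists every departure from a given stop. It assumes that each file wraps its entries in a top-level "busSchedules" list, which is how the conversion script in this README writes its output. The function name find_departures and the inline sample are hypothetical.

```python
from datetime import datetime

# Inline sample in the structure shown above, trimmed to two stops per trip.
sample = {
    "busSchedules": [
        {
            "Vehicle Number": "KL 05 AQ 4567",
            "route": ["ALUVA BANK JUNCTION", "MUTTOM"],
            "schedule": [
                {"trip": 1, "stations": [
                    {"station": "ALUVA BANK JUNCTION",
                     "arrivalTime": "05:00 am", "departureTime": "05:00 am"},
                    {"station": "MUTTOM",
                     "arrivalTime": "05:18 am", "departureTime": "05:18 am"},
                ]},
                {"trip": 2, "stations": [
                    {"station": "MUTTOM",
                     "arrivalTime": "05:19 am", "departureTime": "05:19 am"},
                    {"station": "ALUVA BANK JUNCTION",
                     "arrivalTime": "05:55 am", "departureTime": "05:55 am"},
                ]},
            ],
        }
    ]
}


def find_departures(data, station_name):
    """Return (vehicle number, trip, departure time) for every stop at station_name."""
    results = []
    for bus in data["busSchedules"]:
        for trip in bus["schedule"]:
            for stop in trip["stations"]:
                if stop["station"] == station_name:
                    results.append(
                        (bus["Vehicle Number"], trip["trip"], stop["departureTime"])
                    )
    # Sort chronologically; times use the 12-hour "hh:mm am/pm" format.
    results.sort(key=lambda r: datetime.strptime(r[2], "%I:%M %p"))
    return results


departures = find_departures(sample, "MUTTOM")
```

To run this against a real file, load it with json.load and pass the result in place of sample, after checking that the file uses the same top-level key.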

Disclaimer

This dataset is compiled from publicly available information. It may not cover the complete Kerala private bus schedule, and it may contain inaccuracies caused by parsing errors. Users are advised to verify information against official sources or contact the relevant authorities for the most accurate and up-to-date schedule details.

Conversion Process

Preprocessing:
- Used Adobe's PDF-to-Word online tool to convert the PDF to DOCX format and then back to PDF, because the original PDF caused parsing errors.
Table Extraction:
- Used the PDF Table Extractor tool to extract tabular data from the PDF files.
PDF Content to XML:
- Used the pdftohtml command-line tool with the options -c -i -hidden -xml to convert the PDF content to XML format.
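
The pdftohtml step can also be scripted. The sketch below only builds and runs the command with the options listed above. The file names input.pdf and input and the helper name build_pdftohtml_command are assumptions, and the sketch presumes that pdftohtml is installed and on the PATH.

```python
import os
import shutil
import subprocess


def build_pdftohtml_command(pdf_path, output_base):
    """Build the pdftohtml argument list with the options used in this README."""
    return ["pdftohtml", "-c", "-i", "-hidden", "-xml", pdf_path, output_base]


# Hypothetical file names: input.pdf is read, and the XML is expected
# to be written next to it as input.xml.
command = build_pdftohtml_command("input.pdf", "input")

# Only run the tool when it is installed and the input file exists.
if shutil.which("pdftohtml") and os.path.exists("input.pdf"):
    subprocess.run(command, check=True)
```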

Preprocessing Script:

This script removes unnecessary fields and data inconsistencies from the extracted table data.

 import json

 file_path = 'input.json'
 with open(file_path, 'r') as file:
     json_data = json.load(file)

 # Keep only the entries that contain table data, and drop the layout
 # fields that the later steps do not use.
 filtered_tables = []
 for entry in json_data['pageTables']:
     if 'tables' in entry:
         entry.pop('merges', None)
         entry.pop('merge_alias', None)
         entry.pop('width', None)
         filtered_tables.append(entry)

 json_data['pageTables'] = filtered_tables

 # Overwrite the input file with the filtered data.
 with open(file_path, 'w') as file:
     json.dump(json_data, file, indent=2)
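
To make the effect of this step concrete, here is the same filtering applied to a small in-memory sample instead of a file. The sample entries are invented for illustration; only the field names (pageTables, page, tables, merges, merge_alias, width, height) come from the scripts in this README. The function name strip_layout_fields is hypothetical, and unlike the script it returns a new structure rather than modifying the input.

```python
def strip_layout_fields(json_data):
    """Keep entries that have a 'tables' key and drop the unused layout fields."""
    dropped = ("merges", "merge_alias", "width")
    filtered = []
    for entry in json_data["pageTables"]:
        if "tables" in entry:
            # Copy the entry without the dropped keys; the input is left untouched.
            filtered.append({k: v for k, v in entry.items() if k not in dropped})
    return {**json_data, "pageTables": filtered}


# Invented sample: one page with a table and one page without.
sample = {
    "pageTables": [
        {"page": 1, "tables": [["a", "b"]], "merges": {},
         "merge_alias": {}, "width": 2, "height": 1},
        {"page": 2},  # no table on this page, so the entry is dropped
    ]
}

result = strip_layout_fields(sample)
```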

Vehicle Information Processing:

This script reads the vehicle number for each page from the XML and adds it to the matching table entry in the JSON.

 import json
 import xml.etree.ElementTree as ET

 file_path_json = 'input.json'
 with open(file_path_json, 'r') as file:
     json_data = json.load(file)

 file_path_xml = 'input.xml'
 tree = ET.parse(file_path_xml)
 root = tree.getroot()

 def extract_details(page_number):
     """Return the vehicle number printed in bold on the given page, or ""."""
     vehicle_number = ""

     for page in root.findall(".//page[@number='{}']".format(page_number)):
         for text in page.findall("./text/b"):
             text_content = text.text.strip() if text.text is not None else ""
             if text_content.startswith("Vehicle Number"):
                 vehicle_number = text_content.split(":")[1].strip() if ":" in text_content else ""

     return vehicle_number


 # In case the vehicle number and its table are on different pages,
 # fall back to the vehicle number found on the previous page.
 for entry in json_data['pageTables']:
     page_num = entry['page']
     vehicle_num = extract_details(page_num)
     if vehicle_num == "":
         vehicle_num = extract_details(page_num - 1)

     entry['vehicle_number'] = vehicle_num

 with open('input.json', 'w') as file:
     json.dump(json_data, file, indent=2)
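
The lookup relies on the vehicle number appearing in a bold text element on the page. The sketch below runs the same search on a small hand-written XML string. The sample is an assumption about the general shape of the XML (page elements with a number attribute that contain text elements), inferred from the XPath expressions in the script rather than copied from a real file. The name extract_vehicle_number is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; attribute values are placeholders.
SAMPLE_XML = """
<pdf2xml>
  <page number="1">
    <text top="10" left="10" width="200" height="12" font="0"><b>Vehicle Number : KL 05 AQ 4567</b></text>
    <text top="30" left="10" width="200" height="12" font="1">ALUVA BANK JUNCTION</text>
  </page>
  <page number="2">
    <text top="10" left="10" width="200" height="12" font="1">MUTTOM</text>
  </page>
</pdf2xml>
"""

root = ET.fromstring(SAMPLE_XML)


def extract_vehicle_number(root, page_number):
    """Return the last 'Vehicle Number : ...' value in bold text on the page, or ""."""
    vehicle_number = ""
    for page in root.findall(".//page[@number='{}']".format(page_number)):
        for bold in page.findall("./text/b"):
            content = (bold.text or "").strip()
            if content.startswith("Vehicle Number") and ":" in content:
                vehicle_number = content.split(":", 1)[1].strip()
    return vehicle_number


page_1 = extract_vehicle_number(root, 1)
# Page 2 has no vehicle number of its own, so fall back to the previous page.
page_2 = extract_vehicle_number(root, 2) or extract_vehicle_number(root, 1)
```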

Conversion to Formatted JSON:

 import json
 from datetime import datetime

 with open('input.json', 'r') as file:
     data = json.load(file)
 bus_schedule_data = []

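 for page_table in data['pageTables']:
     header_row = page_table['tables'][0]
     print(page_table['page'])  # progress indicator

     # A table with a single row has no trip rows. Stop here, before
     # output.json is written.
     if page_table['height'] == 1:
         raise SystemExit("No trip rows on page {}".format(page_table['page']))

     # Station names sit in every third header column, starting at index 2.
     stations = [header_row[i] for i in range(len(header_row)) if i % 3 == 2]

     # The first two rows are headers; every remaining row is one trip.
     num_trips = page_table['height'] - 2
     bus_schedules = []

     route_name = stations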

     for trip_num in range(1, num_trips + 1):
         trip_info = {"trip": trip_num, "stations": []}

         for i, station in enumerate(stations):
             departure_index = 3 + i * 3
             arrival_index = 2 + i * 3

             departure_time = page_table['tables'][trip_num + 1][departure_index].strip()
             arrival_time = page_table['tables'][trip_num + 1][arrival_index].strip()

             if departure_time or arrival_time:
                 if not departure_time:
                     departure_time = arrival_time
                 elif not arrival_time:
                     arrival_time = departure_time
                 station_info = {
                     "station": station,
                     "arrivalTime": arrival_time,
                     "departureTime": departure_time
                 }
                 trip_info["stations"].append(station_info)

         if trip_info["stations"]:
             bus_schedules.append(trip_info)

     # Order the stops of each trip chronologically by departure time.
     for trip_info in bus_schedules:
         trip_info["stations"] = sorted(
             trip_info["stations"],
             key=lambda x: datetime.strptime(x["departureTime"], '%I:%M %p'),
         )

     bus_schedule_data.append({
         "Vehicle Number": page_table['vehicle_number'],
         "route": route_name,
         "schedule": bus_schedules
     })

 output_data = {"busSchedules": bus_schedule_data}

 with open('output.json', 'w') as output_file:
     json.dump(output_data, output_file, indent=2)
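
Because parsing errors are possible, a quick sanity check on the generated data can help locate them. The following sketch flags entries with a missing vehicle number and times that do not parse in the 12-hour format used above. The function name find_problems and the sample data are hypothetical, and these two checks are only examples of what one might look for.

```python
from datetime import datetime


def find_problems(output_data):
    """Return human-readable descriptions of suspicious entries."""
    problems = []
    for index, bus in enumerate(output_data["busSchedules"]):
        if not bus["Vehicle Number"]:
            problems.append("entry {}: missing vehicle number".format(index))
        for trip in bus["schedule"]:
            for stop in trip["stations"]:
                for field in ("arrivalTime", "departureTime"):
                    try:
                        datetime.strptime(stop[field], "%I:%M %p")
                    except ValueError:
                        problems.append("entry {}, trip {}, {}: bad {} {!r}".format(
                            index, trip["trip"], stop["station"], field, stop[field]))
    return problems


# Invented sample with an empty vehicle number and one unparseable time.
sample_output = {
    "busSchedules": [
        {
            "Vehicle Number": "",
            "route": ["MUTTOM"],
            "schedule": [
                {"trip": 1, "stations": [
                    {"station": "MUTTOM",
                     "arrivalTime": "25:61 am", "departureTime": "05:18 am"},
                ]},
            ],
        }
    ]
}

problems = find_problems(sample_output)
```

In practice the same function could be called on the dictionary loaded from output.json.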

