Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add/Remove Services and Routes in groups #112

Merged
merged 9 commits into from
Apr 6, 2022
Merged

Conversation

KasiaKoz
Copy link
Collaborator

@KasiaKoz KasiaKoz commented Apr 1, 2022

Before, each add/remove of a Service or Route object to/from Schedule was processed individually. When modifying PT Schedules on a larger scale this takes a long time to process. This change is not some huge smart re-write of existing methods, rather grabbing some low hanging fruit and wrapping it in a few loops that led to some major improvements. The biggest slow-down was (I think) due to updating the change log individually---these changes now take advantage of the big batch logging for changelog.

A test on a small test network (adding 10 services each with 3 routes inside)
Before:
Screenshot 2022-04-01 at 17 00 49
After:
Screenshot 2022-04-01 at 17 00 59
Removing services, Before:
Screenshot 2022-04-01 at 17 01 06
After:
Screenshot 2022-04-01 at 17 01 13

I attempted this to speed things up in NZ. Adding ~300 services used to take 40mins, now takes ~2mins 💪

@KasiaKoz KasiaKoz requested review from mfitz, ana-kop and Arupwkeu April 1, 2022 16:12
Copy link
Contributor

@mfitz mfitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this performance win!

There are a couple of tests that need some minor tweaking (they currently cannot fail no matter what), so I'm not approving, but they will be quick fixes 👍

self._graph.graph['change_log'].remove(object_type='route', object_id=route_id, object_attributes=route_data)
if isinstance(route_ids, set):
route_ids = list(route_ids)
missing_ids = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code looks to be duplicated - if so, it could be refactored into a helper method


def test_adding_multiple_services_updates_changelog(schedule, services_to_add):
schedule.add_services(services_to_add)
list(schedule.change_log().iloc[-2:][['change_event', 'new_id']].to_records()) == [(0, 'add', 'new_service_0'), (1, 'add', 'new_service_1')]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you forget to make this an assertion? If I change == to !=, I am pretty sure this test will still pass 😲

Also, it would be good to create the expectation directly from services_to_add if possible, rather than hardcoding it like this using implicit knowledge of that fixture.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦‍♀️

@@ -851,6 +881,22 @@ def test_removing_service_updates_vehicles(schedule):
assert schedule.vehicles == {}


def test_multiple_services_are_no_longer_present_in_schedule_after_removing(schedule_graph):
s = Schedule(_graph=schedule_graph)
assert set(s.service_ids()) == {'service1', 'service2'}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are relying on implicit knowledge of the content of the fixture, and it makes the test a little bit less clear than it could be. It feels nicer to me to query the fixture here for the service names we're going to pass to remove_services (and maybe assert that we don't have an empty list), rather than asserting that these magic values are in the schedule we built from the fixture.

Also, I don't think we really care what the route IDs are, so we don't need to pre-assert on those, and in doing so we have added another reason that this test could fail that isn't really the thing we want to test (it could fail if our assumption about the route IDs in the fixture was wrong, even if the code under test is correct).

def test_removing_multiple_services_updates_changelog(schedule_graph):
s = Schedule(_graph=schedule_graph)
s.remove_services(['service1', 'service2'])
list(s.change_log().iloc[-2:][['change_event', 'old_id']].to_records()) == [(0, 'remove', 'service1'), (1, 'remove', 'service2')]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Send out the assertion search squad! 😄

I had a mess about with this test, or at least the other one like it, and when I turned this statement into an assertion it failed. I think it was to do with the types of the objects in the two lists we're comparing; one was holding tuples and the other was holding numpy record objects, so they looked the same, but failed equality comparison. I'm guessing there is a quick and easy "pandonic" way to make sure our expectation list has records, not tuples (or the actual list has tuples rather than records)?

assert set(s.route_ids()) == set()


def test_removing_multiple_services_updates_changelog(schedule_graph):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like these test names 👍


def test_adding_multiple_routes_updates_changelog(schedule, services_to_add):
schedule.add_services(services_to_add)
list(schedule.change_log().iloc[-2:][['change_event', 'new_id']].to_records()) == [(0, 'add', 'route_to_add1'), (1, 'add', 'route_to_add2')]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another missing assert, I think?

@KasiaKoz KasiaKoz requested a review from mfitz April 5, 2022 10:06
Copy link
Contributor

@mfitz mfitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Smashing!

@@ -11,6 +12,24 @@ def ensure_dir(direc):
logging.warning(e)


def setify(value: Union[str, list, set]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These don't seem like persistence-related helper functions - maybe we should move them later on.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I have a really hard time naming things 🥲 what name would you give them?

@@ -783,14 +783,21 @@ def services_to_add():


def test_multiple_services_are_present_in_schedule_after_adding(schedule, services_to_add):
existing_routes = set(schedule.route_ids())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is much nicer now 😄



def test_adding_multiple_services_updates_changelog(schedule, services_to_add):
schedule.add_services(services_to_add)
list(schedule.change_log().iloc[-2:][['change_event', 'new_id']].to_records()) == [(0, 'add', 'new_service_0'), (1, 'add', 'new_service_1')]
assert list(schedule.change_log().iloc[-len(services_to_add):][['change_event', 'new_id']].itertuples(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now with added asserts!


services_to_remove = list(s.service_ids())
s.remove_services(services_to_remove)
assert list(s.change_log().iloc[-len(services_to_remove):][['change_event', 'old_id']].itertuples(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

list(s.change_log().iloc[-2:][['change_event', 'old_id']].to_records()) == [(0, 'remove', '1'), (1, 'remove', '2')]
route_to_remove=['1', '2']
s.remove_routes(route_to_remove)
assert list(s.change_log().iloc[-len(route_to_remove):][['change_event', 'old_id']].itertuples(index=False,name=None)) == [('remove', r) for r in route_to_remove]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

assert persistence.setify('ABC') == {'ABC'}

def test_setifying_set_makes_no_difference():
assert persistence.setify({'ABC', 2}) == {'ABC', 2}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect the original set, or a semantically equivalent copy of that set?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah it won't be the original set, but I only care about the contents 😄

@KasiaKoz KasiaKoz merged commit 3570bd8 into master Apr 6, 2022
@KasiaKoz KasiaKoz deleted the batch-route-service-mods branch April 6, 2022 12:45
@ana-kop
Copy link
Contributor

ana-kop commented Apr 6, 2022

40 minutes to 2 minutes is massive, well done @KasiaKoz !!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants