Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nav 47 refactor raptor algorithm #68

Merged
merged 41 commits into from
Jun 24, 2024
Merged

Conversation

munterfi
Copy link
Member

Hi @clukas1 and @Brunner246,

A PR for the refactoring of the Raptor algorithm we started together.

Summary:

  • Rename Raptor components to be more consistent.
  • Factor out classes according to SRP:
    • Route scanner
    • Footpath relaxer
    • Label postprocessor
    • Objective class to hold the progress of the Raptor
  • Extract interfaces for the Raptor algorithm
  • Remove redundant Same Stop Transfer Generator
  • Bugfix, where departure time was doubled when starting with location

This PR goes across the project, so avoid starting new branches at the moment. Let's finish the refactoring first. I think it would make sense to discuss the changes in a short meeting.

Brunner246 and others added 22 commits June 21, 2024 12:05
- Legs --> Labels
- Departure / Arrival --> Source / Target
…on is only favored because same stop transfer time is subtracted.

Additionally add a method that reduces travel time be departing later (DEPARTURE) or arriving earlier (ARRIVAL) to reduce travel time by combining the last labels (when possible)
@munterfi munterfi requested review from clukas1 and Brunner246 June 22, 2024 14:25
@munterfi munterfi self-assigned this Jun 22, 2024
@munterfi
Copy link
Member Author

Eventually we could move the marked stops also into the Objective class?

Copy link
Member

@clukas1 clukas1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work, now I can follow my own code better :)

Although, I don't see any deal breaking issues with this PR, which prevent approval. Some comments to discuss.

  • Connection - Comparable. I would not implement comparable for connections, I believe comparing them by arrival time does not make any sense, or at least comparisons by departure time, travel time or number of transfers are just as likely.
  • Raptor / Objective: I like having this objective container for handling query specific attributes. And I agree that the markedStops should be managed by the spawnFromStop method. However, maybe we should rename the Raptor class to Router or RaptorRouter (which implements the RaptorAlgorithm) and the Objective to Request (because in my opinion it's not only a objective container). The idea behind is, that we can accept requests via the interface through the raptor algorithm and keep all constant raptor relevant data in this class, however move all of the request processing to the request class instance (which will only keep data relevant to the request), this could reduce the argument passing around even more. As a result, spawnFromStops would also be moved to Request.
  • Impl: I now know why I like having interface names starting with i (or some other convention). Can we rename the raptor Interfaces to RaptorLeg / RaptorConnection and the implementations to Leg/Connection?


public interface RaptorAlgorithm {

static RaptorBuilder builder(int sameStopTransferTime) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't really work looking forward. I think the raptor builder is too much implementation specific that we can keep it in the interface. Anyway at the moment the Raptor Builder is inside the Impl package.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, I think we will have to implement a factory for the Raptor.

@munterfi
Copy link
Member Author

munterfi commented Jun 24, 2024

Very nice work, now I can follow my own code better :)

@clukas1, thanks for the review and feedback!

Although, I don't see any deal breaking issues with this PR, which prevent approval. Some comments to discuss.

  • Connection - Comparable. I would not implement comparable for connections, I believe comparing them by arrival time does not make any sense, or at least comparisons by departure time, travel time or number of transfers are just as likely.

My rationale behind using Comparable is to enforce a consistent order, rather than relying on potentially non-deterministic or algorithm-specific internals. However, I agree that comparing by arrival time might not be the most logical choice. We could consider alternatives like comparing by departure time, travel time, or number of transfers. But I think enforcing comparable in the interface is good, to ensure those toughts are made.

  • Raptor / Objective: I like having this objective container for handling query specific attributes. And I agree that the markedStops should be managed by the spawnFromStop method. However, maybe we should rename the Raptor class to Router or RaptorRouter (which implements the RaptorAlgorithm) and the Objective to Request (because in my opinion it's not only a objective container). The idea behind is, that we can accept requests via the interface through the raptor algorithm and keep all constant raptor relevant data in this class, however move all of the request processing to the request class instance (which will only keep data relevant to the request), this could reduce the argument passing around even more. As a result, spawnFromStops would also be moved to Request.

Agree to renaming Raptor to RaptorRouter, see 91b0772.

However I vote against renaming Objective to Request since the Objective is something internal, which is not even used outside of the spawnFromStops method. In naming it, I have borrowed this term from the field of algorithmics. Because the objective (function) should be minimized. In our case, the arrival time and the number of transfers. I think this fits the terminology quite well, but perhaps there is a better term? From my point of view, a Request is something that is sent to the RaptorRouter from outside and answered with a Response object.

I may not fully understand the suggestion to move the spawnFromStops method. But I like the separation of the main loop in the method spawnFromStops and the objective which is optimized. So the main logic of the process is still in the RaptorRaptor router. This allows us to separate the control flow cleanly from the complex details, which are then handled by the RouteScanner, FootpathRelaxer and the Objective. To implement the range query, we can just reuse the same Objective and spawn from the stops again at the next later departures at the given stops.

  • Impl: I now know why I like having interface names starting with i (or some other convention). Can we rename the raptor Interfaces to RaptorLeg / RaptorConnection and the implementations to Leg/Connection?

Can you further elaborate on this? LegImpl should never be visible outside of the Raptor package. So only ch.naviqore.raptor.Leg should be visible from the outside, which is quite clear that it belongs to the Raptor. The user does not need to know if it is an interface or a concrete class. If I use a class with Impl postfix outside the package, then it triggers a weird feeling. That's why I like it somehow 😄

But you are also right about the Impl postfix (removed in 5dccb64), see this artice: https://www.baeldung.com/java-interface-naming-conventions

Another widespread pattern in enterprise Java applications comes from the Hungarian Notation. In this section, we’ll briefly talk about the preceding I and the Impl suffix patterns and why we should avoid them.

The preceding I pattern suggests that any interface name starts with the capital letter I, an abbreviation for Interface. This is more common in C# code, where we don’t have a keyword to differentiate interface implementation from inheritance. However, this wouldn’t be necessary in Java since we can differentiate implementation from inheritance by looking at the keywords implements and extends.

The Impl suffix pattern suggests naming the interface’s implementations instead of the interface itself. Hence, all implementation names end with Impl. This usually appears when we create an interface with a single implementation and we can’t find a name that truly represents its specialization. However, the word Impl doesn’t add anything since the class signature shows it’s implementing something.

Therefore, we must avoid naming interfaces and classes using the I or Impl patterns, for example, UserImpl, IUser, IIdentifiable, and IdentifiableImpl.

We should discuss these points in more detail at our meeting on Wednesday.

@clukas1
Copy link
Member

clukas1 commented Jun 24, 2024

My rationale behind using Comparable is to enforce a consistent order, rather than relying on potentially non-deterministic or algorithm-specific internals. However, I agree that comparing by arrival time might not be the most logical choice. We could consider alternatives like comparing by departure time, travel time, or number of transfers. But I think enforcing comparable in the interface is good, to ensure those toughts are made.

I would still prefer not to implement it, the problem is that by implementing it, in my opinion it has to be 100% clear why and how it is intended or at least easy to explain. In this case here, I think there are many logical ways to implement it and none over weighs the other, hence I think it's cleaner not to implement it to make it explicit that one has to define how one wants to sort / compare the connections.

Agree to renaming Raptor to RaptorRouter, see 91b0772.

Thanks

However I vote against renaming Objective to Request since the Objective is something internal, which is not even used outside of the spawnFromStops method. In naming it, I have borrowed this term from the field of algorithmics. Because the objective (function) should be minimized. In our case, the arrival time and the number of transfers. I think this fits the terminology quite well, but perhaps there is a better term? From my point of view, a Request is something that is sent to the RaptorRouter from outside and answered with a Response object.

I may not fully understand the suggestion to move the spawnFromStops method. But I like the separation of the main loop in the method spawnFromStops and the objective which is optimized. So the main logic of the process is still in the RaptorRaptor router. This allows us to separate the control flow cleanly from the complex details, which are then handled by the RouteScanner, FootpathRelaxer and the Objective. To implement the range query, we can just reuse the same Objective and spawn from the stops again at the next later departures at the given stops.

I get why you want to name it Objective, however I disagree since the objective is implemented in the algorithm (minimize arrivalTime/maximize departureTime) and can not really be changed by replacing the Objective, except for the TimeType (which by this logic would have to carry the name objective). Therefore, I would name the object containing all the route request specific details something like Request or Query since it will hold all details relevant to the specific request and this is also the reason why I would move spawnFromStops to this object. I regard the RaptorRouter something like a service, that holds all schedule relevant data to query and accept requests. The processing of the request can then be done by a temporary processing instance (Query, Request, or at the moment Objective).

Can you further elaborate on this? LegImpl should never be visible outside of the Raptor package. So only ch.naviqore.raptor.Leg should be visible from the outside, which is quite clear that it belongs to the Raptor. The user does not need to know if it is an interface or a concrete class. If I use a class with Impl postfix outside the package, then it triggers a weird feeling. That's why I like it somehow 😄

I agree, that I wouldn't want to see a Impl postfix outside of the package. But I also get a cold shiver down my back if I see Impl inside the package, so thanks for renaming. The reason I would also like to name the Interfaces RaptorLeg/RaptorConnection instead of Leg/Connection is that we would prevent working with two implementations of Leg/Connection on the service level, resulting in Type Annotations specifying the full definition (e.g. ch.naviqore.raptor.Leg instead of simply RaptorLeg. Java should introduce alias imports, that would also resolve this issue...

@@ -29,7 +29,7 @@ class Query {
private final TimeType timeType;

private final int[] targetStops;
private final int cutOffTime;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, to be picky. but cut off is not one word 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry again, my bad. Turns out American English uses cutoff, just feels wrong (and I prefer British English).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should define what dialect to use and then apply this consistently throughout our codebase.

Based on a short research (https://stackoverflow.com/questions/157807/gb-english-or-us-english), American English seems to be the preferred choice in programming.

There are also some British English that are kind of unusual for me:

American English British English
Color Colour
Initialize Initialise
Optimize Optimise
Center Centre
License Licence
Traveler Traveller
Program Programme

@@ -29,7 +29,7 @@ class Query {
private final TimeType timeType;

private final int[] targetStops;
private final int cutOffTime;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry again, my bad. Turns out American English uses cutoff, just feels wrong (and I prefer British English).

@@ -131,7 +131,7 @@ private List<Query.Label[]> spawnFromStops(int[] sourceStopIndices, int[] target
// initially relax all source stops and add the newly improved stops by relaxation to the marked stops
Set<Integer> markedStops = query.initialize();
markedStops.addAll(footpathRelaxer.relaxInitial(sourceStopIndices));
markedStops = query.removeSubOptimalLabelsForRound(0, markedStops);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although not orthographically correct, camel casing in our company treats prefixes (sub, super, pre, post, etc.) as independent words. I generally prefer this because it improves readability but know that it's strictly speaking incorrect.

@@ -31,18 +31,18 @@ class FootpathRelaxer {

/**
* @param raptorRouter the current raptor instance for access to the data structures.
* @param objective the best time per stop and label per stop and round.
* @param query the best time per stop and label per stop and round.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The param description for query should be ~ object containing query configuration and intermediate working variables or something of this sort.

@@ -9,10 +9,10 @@
import java.util.*;

/**
* The objective stores the progress of the raptor algorithm. Each request needs a new objective instance.
* The query stores the progress of the raptor algorithm. Each request needs a new query instance.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The query stores the configuration of the raptor routing request and holds working variables to store the progress of the raptor algorithm.

@@ -31,20 +31,20 @@ class RouteScanner {

/**
* @param raptorRouter the current raptor instance for access to the data structures.
* @param objective the best time per stop and label per stop and round.
* @param query the best time per stop and label per stop and round.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@@ -15,7 +15,7 @@
@NoArgsConstructor(access = AccessLevel.PACKAGE)
@Getter
@ToString
class RaptorConnection implements Connection {
class RaptorConnection implements Connection, Comparable<Connection> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why implement the Comparable? I still don't see a reason to keep this.

Copy link
Member Author

@munterfi munterfi Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly the problem that I try to explain, we don't use it, but we should.

We should sort all our returned connections in RaptorAlgorithm / Router, by calling sort before return in the LabelProcessor:

return connections.stream().sorted().toList();

Which will then implicitly call the compareTo required by the Comparable Interface.

If we do this now, most of our tests fail. This is not because the Raptor is not working, but simply due to sorting issues. The problem is that the way we implement the Raptor currently influences the order of the results, which is not very stable. Additionally, if we implement further versions (native or mcRaptor), this instability will cause further issues.

In my opinion, the necessity to sort connections is an essential property. Otherwise, the order is not transparent for the user.

- Query holds now the complete routing logic.
- Objective stores the labels and best times and serves as single point for their modification.
- Introduce raptor data interface.
@munterfi
Copy link
Member Author

munterfi commented Jun 24, 2024

@clukas1, thanks for the feedback! I tried to address most of it, feel free to continue. I will not work on the refactoring until our meeting on Wednesday to avoid conflicts.

I get why you want to name it Objective, however I disagree since the objective is implemented in the algorithm (minimize arrivalTime/maximize departureTime) and can not really be changed by replacing the Objective, except for the TimeType (which by this logic would have to carry the name objective). Therefore, I would name the object containing all the route request specific details something like Request or Query since it will hold all details relevant to the specific request and this is also the reason why I would move spawnFromStops to this object. I regard the RaptorRouter something like a service, that holds all schedule relevant data to query and accept requests. The processing of the request can then be done by a temporary processing instance (Query, Request, or at the moment Objective).

I did my best to move the main routing logic from the RaptorRouter to the Query object and encapsulate access to the Labels in the Objective object (you can rename this if you don't like the name, important for me is that the access to the labels and best times is encapsulated, which makes it a lot easier to track the routing algo in the debugger). Also introduced an internal RaptorData interface for access to the raptor data structures (interface segregation).

Additionally, I have discovered a strange issue in the getBestTime method:

// TODO: Strangely if this method is changed with the method above in 'getBestTimeForAllTargetStops' in the Query
, where I suspect that we are not updating the global best time when relaxing footpaths. To reproduce the issue, remove the _REMOVE prefix here:
int bestTimeForStop = objective.getBestTime_REMOVE(targetStopIdx);
and run all tests.

My interpretation is that we can remove this method and directly use getBestTime if we track the global best times correctly?

@clukas1
Copy link
Member

clukas1 commented Jun 24, 2024

@munterfi some final touches from my side. Good to merge in my opinion.

Summary of changes:

My interpretation is that we can remove this method and directly use getBestTime if we track the global best times correctly?

Actually both methods are needed (one of the reasons I introduced the transfer only test). The problem if you use the comparableBestTime as cut-off value all transfer legs will be removed at the end of the round. While writing this now, I just realized that the best way would have been to make add/subtract the same transfer time to route arrivals/departures instead of subtracting/adding them to transfers. Which would have solved this (improvement for a later issue)... However, I've renamed the methods and added a docstring to clarify when which one should be used in e2ce8f4.

This is exactly the problem that I try to explain, we don't use it, but we should.
We should sort all our returned connections in RaptorAlgorithm / Router, by calling sort before return in the LabelProcessor:
return connections.stream().sorted().toList();
Which will then implicitly call the compareTo required by the Comparable Interface.
If we do this now, most of our tests fail. This is not because the Raptor is not working, but simply due to sorting issues. The problem is that the way we implement the Raptor currently influences the order of the results, which is not very stable. Additionally, if we implement further versions (native or mcRaptor), this instability will cause further issues.
In my opinion, the necessity to sort connections is an essential property. Otherwise, the order is not transparent for the user.

Actually, they are sorted by default by the number of rounds (route trip legs) and arrival/departure time in descending/ascending order. I've reworded the docstring return value in the RaptorAlgorithm interface to be more explicit about this in c17df10

I did my best to move the main routing logic from the RaptorRouter to the Query object and encapsulate access to the Labels in the Objective object (you can rename this if you don't like the name, important for me is that the access to the labels and best times is encapsulated, which makes it a lot easier to track the routing algo in the debugger). Also introduced an internal RaptorData interface for access to the raptor data structures (interface segregation).

As you might have guessed, I disagreed with the Objective name. I've renamed it to StopLabelsAndTimes to be explicit about what it contains (44c6a2f). Hope you can also live with that name. Like the RaptorData to end with a positive comment.

@clukas1 clukas1 self-requested a review June 24, 2024 21:57
Copy link
Member

@clukas1 clukas1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@munterfi cannot add you as reviewer since it's your PR. In my opinion this PR is good to merge. Please review my final touches and approve / merge or re-open the discussion.

@munterfi munterfi merged commit 5e4c58b into main Jun 24, 2024
2 checks passed
@munterfi munterfi deleted the NAV-47-Refactor-Raptor-Algorithm branch June 24, 2024 22:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants