Handle ref duplicates #152

Open
nlehuby opened this issue Dec 10, 2019 · 5 comments

Comments

@nlehuby
Collaborator

nlehuby commented Dec 10, 2019

Follow-up conversation from this PR.

For now, if we have multiple OSM route_masters with the same ref tag, we only keep the first one and discard the others.
But multiple lines with the same line number can be a thing. See examples in the PR in France and Ivory Coast.

OSM ref tag is used to set these values in routes.txt files:

  • route_short_name
  • route_id (in the default creator and in some other creators)

source: https://github.com/grote/osm2gtfs/wiki/Source-for-GTFS-values

The GTFS spec does not forbid duplicate route_short_name values, although the GTFS Best Practices discourage them.
The GTFS spec does, however, forbid duplicate route_id values.

How should we handle that?

For now, we discard the duplicates and only keep the first one. The result in case of ref duplicates is an incomplete GTFS.

We could also group all OSM lines with ref duplicates under the same GTFS route.
This can be a strong option, unless:

  • the lines have different route_long_name values and actually represent different public transport lines
  • the lines have different agencies (e.g. the bus lines Stigo 18 / Seine-et-Marne Express 18, which run in the same area)
  • the lines have different public transport modes (Paris has a Tram 2 and a Metro 2 operating in almost the same area)
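To make the "unless" cases above concrete, here is a minimal sketch of a check that decides whether lines sharing a ref could be grouped under one GTFS route. The dictionaries, the field names (`name`, `operator`, `mode`) and the helper names are assumptions made for illustration; they are not osm2gtfs data structures, and the operator values for the Paris example are placeholders.

```python
from collections import defaultdict

# Hypothetical, simplified view of OSM route_master relations.
# Field names and values are illustrative assumptions only.
lines = [
    {"osm_id": 1, "ref": "18", "name": "Stigo 18",
     "operator": "Stigo", "mode": "bus"},
    {"osm_id": 2, "ref": "18", "name": "Seine-et-Marne Express 18",
     "operator": "Seine-et-Marne Express", "mode": "bus"},
    {"osm_id": 3, "ref": "2", "name": "Tram 2",
     "operator": "Operator A", "mode": "tram"},
    {"osm_id": 4, "ref": "2", "name": "Metro 2",
     "operator": "Operator A", "mode": "subway"},
    {"osm_id": 5, "ref": "7", "name": "Line 7",
     "operator": "Operator B", "mode": "bus"},
    {"osm_id": 6, "ref": "7", "name": "Line 7",
     "operator": "Operator B", "mode": "bus"},
]


def can_group(a, b):
    """Two lines may be grouped only if name, agency and mode all match."""
    return all(a[key] == b[key] for key in ("name", "operator", "mode"))


def groupable_refs(lines):
    """Map each duplicated ref to True if all its lines could be grouped."""
    by_ref = defaultdict(list)
    for line in lines:
        by_ref[line["ref"]].append(line)

    result = {}
    for ref, group in by_ref.items():
        if len(group) > 1:
            result[ref] = all(can_group(group[0], other) for other in group[1:])
    return result


print(groupable_refs(lines))
```

With this sample data, only ref "7" comes out as safe to group; "18" differs by agency and "2" by mode.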

We could keep them distinct and create a GTFS route for each one of them, but we then need to change the route_id construction process in the default creator. relation/{osm_id} or osm_id is a good option already used in some connectors.
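As a rough sketch of that last idea, a route_id built from the relation id could look like the following. The function name and the `prefixed` parameter are hypothetical and are not taken from the osm2gtfs code base.

```python
def route_id_from_relation(osm_id, prefixed=True):
    """Build a route_id from an OSM relation id (hypothetical helper).

    prefixed=True  -> "relation/{osm_id}"
    prefixed=False -> the bare osm_id as a string
    """
    if prefixed:
        return "relation/{}".format(osm_id)
    return str(osm_id)


# Two lines sharing ref "18" still get distinct route_ids,
# because OSM relation ids are unique.
print(route_id_from_relation(1001))
print(route_id_from_relation(1002, prefixed=False))
```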

What do you think?

@prhod
Collaborator

prhod commented Dec 10, 2019

IMHO, using ref as the route_id makes the GTFS easier to manipulate later. The problem arises when several identical refs exist, as @nlehuby said.
The different options I could come up with are:

  • suffixing the ref when several such lines exist: consistency of route_id cannot be guaranteed, or only at the cost of a tricky sort
  • using the OSM ID only for conflicting routes: same consistency problem when one of the routes disappears
  • having a parameter in the constructor to specify the use of ref or relation_id according to local constraints
  • leaving each constructor to deal with this problem as it sees fit

The 3rd option has my preference, and the 4th can be a fallback if necessary.
Hope this helps a little!

@pantierra
Contributor

I think we would do well to separate the conversation into:

  • Standard behaviour - what osm2gtfs does when it finds a duplicate while generating the GTFS route_id
  • Fallback solution(s) - how local particularities can be accommodated.

As the standard behaviour, I like what you point out, @nlehuby: group on the fly if the lines seem to be the same. This would be a much better behaviour. But there are still a lot of "unless" cases. How do we address them?

a. We automatically append the field that caused the difference (the agency, the type of transport, etc.) - this might be dangerous, and we don't always have a suitable value
b. We drop the second occurrence - as is currently done, which we didn't like.

In any case, if there is any duplicate we probably want to keep the warning in place, so that people are informed that they should use one of the fallback solutions.

I'm still not happy with either a. or b. Do you have any better ideas?


For fallbacks, which need manual interaction with either the code or the configuration file, we have two options (3 and 4 from @prhod) and we could implement both:

  1. Allow a user to specify in a config file whether the ref or the relation_id is used for the route_id.
  2. Let the custom creators deal with any other particularity.

Personally, I think we are OK with giving this flexibility only to the creators (fallback option 4). This is because I'd rather not encourage anybody to use the OSM relation_id for the GTFS route_id unless they really know what they are doing - and then they can make their own creator 😄
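For reference, fallback option 1 could be sketched roughly as below. The config key `route_id_source` and its two values are invented for this illustration; no such option exists in osm2gtfs at the time of this discussion.

```python
def make_route_id(line, config):
    """Pick the route_id source from a (hypothetical) config option.

    config["route_id_source"] is an assumed key:
      "ref"         -> use the OSM ref tag (default)
      "relation_id" -> use the OSM relation id
    """
    source = config.get("route_id_source", "ref")
    if source == "ref":
        return line["ref"]
    if source == "relation_id":
        return str(line["osm_id"])
    raise ValueError("Unknown route_id_source: {}".format(source))


line = {"osm_id": 1001, "ref": "18"}  # illustrative data only
print(make_route_id(line, {}))
print(make_route_id(line, {"route_id_source": "relation_id"}))
```

Defaulting to ref keeps today's behaviour unless someone explicitly opts in.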

@nlehuby
Collaborator Author

nlehuby commented Dec 10, 2019

It is not the option that seems to be winning the vote, but I prefer my last suggestion, to use the relation id as route_id:

  • we already use the osm id for the stops, so it improves consistency
  • it is already unique, so we don't have to worry about it
  • I find it helpful for debugging
  • it's an elegant way to keep the relationship with OSM in the GTFS

So if I had to choose among @prhod's options, I would obviously choose the 3rd one.

For the standard behaviour, I think we need to add the OSM line in any case, because grouping is uncertain and discarding is too violent.
The first option from @prhod could actually do the trick: if you don't override this with your own creator or with this hypothetical config option saying that you will use osm_id as route_id, the first one will be "18", the next one could be "18_1", the next one could be "18_1_1".
That's not pretty, but it's simple enough and would work.
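A minimal sketch of this "keep appending" approach, assuming the route_ids already handed out are tracked in a set (the function name is made up for the example):

```python
def append_until_unique(ref, used_ids):
    """Append "_1" to the ref until the resulting route_id is unused."""
    route_id = ref
    while route_id in used_ids:
        route_id += "_1"
    used_ids.add(route_id)
    return route_id


used = set()
# Three lines that all carry ref "18"
print([append_until_unique("18", used) for _ in range(3)])
# ['18', '18_1', '18_1_1']
```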

@pantierra
Contributor

So, for the standard behaviour we can choose between the uncertain, the violent or the ugly? 😄 I follow you, @nlehuby, in giving preference to the ugly. Just wondering whether we want it to be a bit nicer:

  • Append an increasing number: x_n. For example, 18_2, 18_3.
  • Give a warning to the user
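The two points above could be sketched like this. It is only an illustration: the function name, the logger and the warning text are assumptions, not existing osm2gtfs code.

```python
import logging

logger = logging.getLogger(__name__)


def assign_route_ids(refs):
    """Return one unique route_id per ref, in input order.

    The first occurrence keeps the plain ref; later ones get "_2", "_3", ...
    A warning is logged for every duplicate that had to be renamed.
    """
    used = set()
    route_ids = []
    for ref in refs:
        route_id = ref
        n = 1
        while route_id in used:
            n += 1
            route_id = "{}_{}".format(ref, n)
        if route_id != ref:
            logger.warning(
                "Duplicate ref %r: using route_id %r instead", ref, route_id
            )
        used.add(route_id)
        route_ids.append(route_id)
    return route_ids


print(assign_route_ids(["18", "2", "18", "18"]))
# ['18', '2', '18_2', '18_3']
```

Note that which line keeps the plain "18" depends on the order in which the lines are processed, so the result is only stable between runs if that order is.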

For the fallback, the handling in creators (option 4) is there already by nature. The question is whether we want to add a configuration option to swap the ref for the relation's osm_id. I'm not so keen on making this a standard feature, but if you two think this is a good idea, then let's do it.

@nlehuby
Collaborator Author

nlehuby commented Dec 12, 2019

To sum up, we have an agreement on the following:

  • in osm_connector, handle route_id duplicates by warning the user and appending something at the end
  • option: make it nicer by incrementing a number in the suffix instead of blindly appending
  • option: add a standard way to use osm id instead of ref to create route_id

3 participants