Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce ISO 3166-2 country codes for creator directories #83

Merged
merged 1 commit into from
Mar 1, 2018
Merged

Introduce ISO 3166-2 country codes for creator directories #83

merged 1 commit into from
Mar 1, 2018

Conversation

pantierra
Copy link
Contributor

The more creator implementations we are getting, the more it will be needed to have a bit more consistent naming of those. Therefore I would like to introduce a ISO 3166-2 country code for each creator and focus - if possible - on the city name rather than the network or transport agency.

What do you think?

@AltNico
Copy link
Collaborator

AltNico commented Oct 25, 2017 via email

@pantierra
Copy link
Contributor Author

Yes, I also had in mind your thought, @AltNico. We could do the following:

Currently:

Everybody names the directory of the custom city implementation as they want. Some use the city name, like accra, some the network, like incofer, and some the operator, like fenix.

This Pull Request

Introduces a more standardized way: COUNTRYCODE-CITYNAME

However this is not enough, as @AltNico was describing, and also we have the first edge case here: cr-incofer, while "incofer" is the network, on purpose I didn't rename it to "San Jose" or anything, so to be not mixed up with bus networks.

Idea

I'd be happy to extend this Pull Request to handle multiple hyphens. In this case we can encourage to use: COUNTRYCODE-CITYNAME-NETWORK

This would bring us to:

  • br-florianopolis-sim (class name: BrFlorianopolisSim)
  • cr-valle-central-incofer (class name: CrValleCentralIncofer)
  • ni-managua-irtramma (class name: NiManaguaIrtramma)

The name will become longer but the naming more consistent and it perfectly handles the edge cases. What do you think about this one?

@Skippern
Copy link

Skippern commented Oct 25, 2017 via email

@grote
Copy link
Owner

grote commented Nov 5, 2017

I am not actually sure we need special conventions for creator naming at this point.

@pantierra
Copy link
Contributor Author

A separation by country code would be useful. If not this can quickly become long list of confusing names. It's not a big thing, but a nice improvement. I don't understand why this shouldn't go in...?

@grote
Copy link
Owner

grote commented Nov 29, 2017

I don't say it can't go in. I just think that it is still not a long list of confusing names. Can't we cross that bridge when we get there?

@pantierra
Copy link
Contributor Author

Sure, this is not urgent now. I'm happy to keep it in the pipeline until there are more creators.

As a constraint: it will be more difficult to change the names of the existing directories, the more creators and maintainers are involved. A good and solid practice can save us time later on.

@grote
Copy link
Owner

grote commented Nov 29, 2017

Indeed, but do we have a consensus on how to standardize the creator names?

Once we have one, I would like to see a test that checks that all existing creators adhere to that standard, so it gets automatically checked when new creators are added.

Also, we have now creator specific tests in master. These names should be standardized as well then.

@pantierra
Copy link
Contributor Author

pantierra commented Nov 29, 2017

Yes, better to come to an agreement before:

  • I would suggest: <ISO 3166-2 two letter country code>-<city/state/region>-<network/operator name> (requiring all three parts), like ni-managua-irtramma
  • All with small letters; only ASCII letters and numbers

@grote
Copy link
Owner

grote commented Nov 29, 2017

@nlehuby @jamescr what do you think about the name standardization @xamanu is proposing above?

I had some reservations about adding the <network/operator name>, because I am planning to merge GTFSs for different operators eventually, but this is probably a post-processing to be done outside the scope of osm2gtfs.

@ialokim
Copy link
Contributor

ialokim commented Nov 29, 2017

I agree that it is necessary to find a naming convention. Also, I think it's an important possibility to add the <network/operator name> to distinguish between different operators, but what about leaving this part optional, because in some cases there's no official network or one simple operator name.

In Estelí, the urban buses are served by different operators, but are coordinated by the local government. What would you propose as network in this case? ni-esteli-alcaldia? 🤔

I'm also not sure about the <city/state/region> part, because we are thinking of adding a GTFS configuration file for the national bus services.

@pantierra
Copy link
Contributor Author

  • Yes, I agree, maybe it would be good to leave the network/operator optional: <ISO 3166-2 two letter country code>-<city/state/region>[-<network/operator name>]
  • If you wanted to unite all of Nicargua into one I would name it ni-national-mit (mit = Ministerio de Transporte)

@AltNico
Copy link
Collaborator

AltNico commented Nov 30, 2017

Yes, I agree, maybe it would be good to leave the network/operator optional: <ISO 3166-2 two letter country code>-<city/state/region>[-<network/operator name>]

👍 for this because I think it is very important to at least the nation code part of the creators name. We should decide this yet, with only few creators, because later it will be harder to do.

@nlehuby
Copy link
Collaborator

nlehuby commented Nov 30, 2017

Hi,

<ISO 3166-2 two letter country code>-<city/state/region>[-<network/operator name>] seems good, but may not cover all cases.
For instance, I cannot choose a country code for these cases :

  • international coaches (if I want to create a Flixbus GTFS from OSM)
  • cross border lines (for instance in Euskadi)

but I guess eu-Europe-Flixbus and eu-Euskadi-Pesa would be ok.

We should not overthink this as we don't know yet how OSM2GTFS will be used...so I'm ok with this convention.
For Accra, it would be gh-accra, because there is no branded network, and many differents operators.

@pantierra
Copy link
Contributor Author

Seems like we have a consensus on the creator name standardization.

@pantierra
Copy link
Contributor Author

pantierra commented Dec 6, 2017

I'm happy to write the requested extension to this PR and the tests, but in order to not go too much in parallel things that are becoming a beast to rebase on each other, I'd prefer to see first those PRs getting in:

Thanks.

@jamescr
Copy link
Collaborator

jamescr commented Dec 20, 2017

As @grote I'm planning to incorporate bus routes to the generated GTFS. In this case the name for the creator will be cr-gam, GAM stands for Large Metropolitan Area (Gran Area Metropolitana).

@pantierra
Copy link
Contributor Author

Rerolled, now based on PR #99. This now includes also proper tests for naming of files and directories for additional creators.

@pantierra
Copy link
Contributor Author

Rebased on latest upstream master branch (which included #99 and other nice stuff)

@pantierra
Copy link
Contributor Author

pantierra commented Feb 2, 2018

This PR is months old and has been rebased several times. I included all remarks (on getting a consensus and including tests for this).

@grote, please let me know when you are considering to merge this. Then I will rebase again. Keeping it up to date without a perspective is a Sisyphus job, especially because this PR consists in some renaming.

@grote
Copy link
Owner

grote commented Feb 26, 2018

Sorry @xamanu, this branch has conflicts again :/

@pantierra
Copy link
Contributor Author

Yes, it has. As said in the last comment I will only prepare it again, when it has a clear perspective of being merged. Please confirm, and I'm happy to rebase it to the lastest master.

@grote
Copy link
Owner

grote commented Feb 26, 2018

Maybe then it is better to get some reviews from the others first since it touches all creators?

@pantierra
Copy link
Contributor Author

@nlehuby, @jamescr and @ialokim may I ask you for a review from your side, please. Thanks!

class_name_selector += part.capitalize()
else:
class_name_selector = selector.capitalize()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we avoid to duplicate the class_name_selector creation code in each function ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 That would be much nicer indeed!

#### Description

There are 3 unittests for the Accra GTFS generation
There are 3 unit tests for the Accra GTFS generation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, this is not an unit test, but an acceptance one.

The generated GTFS is checked against a reference GTFS file (accra_tests.zip.ref)
in the `tests/fixtures/accra/` folder. For the moment, only the size of each GTFS file is compared to the reference.
The generated GTFS is checked against a reference GTFS file (gh_accra_tests.zip.ref)
in the `tests/creators/fixtures_gh_accra/` folder. For the moment, only the size of each GTFS file is compared to the reference.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be tests/creators/fixtures/gh_accra/ and not tests/creators/fixtures_gh_accra/

Copy link
Contributor

@ialokim ialokim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @xamanu, good work!

Please see my inline comments!

class_name_selector += part.capitalize()
else:
class_name_selector = selector.capitalize()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 That would be much nicer indeed!

"output_file": "data/ni-esteli.zip",
"selector": "esteli"
"output_file": "data/ni-esteli-gtfs.zip",
"selector": "ni_esteli"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you go for _ in the selector's case? I think it would be easier and more consistent to use - here (as in the output file, for example) too.

Copy link
Contributor Author

@pantierra pantierra Feb 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The selector is meant to correspond to the file and directory name of the creator. In Python module names underscores are allowed, according to PEP8, but hyphens aren't. So, to me it seems much more consistent to use the underscore, according to the creators files and directory. Anyway the output file can be specified as wished, there is no rule here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I understand, thanks for your explanation!

' " doesn\'t fit the naming convention.')

# Check for existing configuration file
config_file_path = path + selector + "/" + selector + ".json"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why has it to be ni_esteli/ni_esteli.json for the config file? I think it would be a lot more simple to change this naming convention to only config.json in all the creators subfolders, so you don't have to type the (potentially long) selector name twice while running osm2gtfs.

I see this would be another change added to this PR, but as you're already renaming the json files, I think it would be the right place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree with you. It would be nicer, but I wanted to keep this PR smallest possible. However as you are also suggesting it, I will apply it to the next rebase here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot!

path + selector):

for filename in inner_files:
# Check the pyhton files (and ignore the others)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*python

"This file is not well named: " + filename)

# Check for valid creators
split_filename = filename.split("_")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work for feed_info (in creators list above).

Copy link
Contributor Author

@pantierra pantierra Feb 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for noticing! I fixed it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks a bit ugly, but I'm okay with it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative is to rename feed_info (as in transitfeed) to feedinfo, to avoid the different meaning of the underscore. However I suggest to discuss and apply this in a separate PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's okay for me! 👍

check = False
else:
return False
return check
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, +1 for adding this test!

Correct me if I'm wrong but this only tests the filenames and not the class names as required by the CreatorsFactory. Not sure how difficult it would be to implement this tests too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the desired test.

@pantierra
Copy link
Contributor Author

Thank you very much @nlehuby and @ialokim for your reviews. I will incorporate your suggestions soon.

@pantierra
Copy link
Contributor Author

pantierra commented Feb 28, 2018

Here is a new version of this PR for your consideration. It includes the improvements based on the feedback by the reviews of @nlehuby and @ialokim, it is rebased on the latest upstream master and test checks passed successfully.

By the way, this PR is introducing the first tests for parts of osm2gtfs' core \o/

Copy link
Contributor

@ialokim ialokim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. I've just one question.


# Snake to camel case to test for correct class names
classname = ''.join(x.capitalize() or
'_' for x in filename[:-3].split('_'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is because of my lack of understanding, but why exactly did you use this or '_' part?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took it from this source, but it works also with out the or '_' part. So, I took it out.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't copy code whose copyright isn't yours into the project.

Copy link
Contributor Author

@pantierra pantierra Feb 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reminder! You are very right!! I was wrongly assuming a public domain permission, which is not the case (only Creative Commons licensed).

Anyway, with the latest changes it is not coming from any particular source, but rather a common knowledge based on a web research and my own knowledge, the used code passage as it is - adapted to our code - can be found in variations in numerous public places and doesn't belong to one creator, but explicitly here myself.

@pantierra
Copy link
Contributor Author

pantierra commented Feb 28, 2018

Thank you for your review @grote and @ialokim. This PR seems to be quite ready to be merged in.

@jamescr: This PR renames the creator incofer to cr_gam, as you mentioned here earlier. Can you please confirm, that you are fine with this renaming?
@nlehuby: Can you have a look again and confirm the PR after applying the suggestions?

Thanks to everybody! 😄

Copy link
Contributor

@ialokim ialokim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, I'm absolutely fine with this now!

@pantierra pantierra mentioned this pull request Feb 28, 2018
Copy link
Collaborator

@nlehuby nlehuby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@grote grote merged commit b4b8446 into grote:master Mar 1, 2018
@pantierra pantierra deleted the creator-hyphens branch April 3, 2018 23:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants