Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] Add logging of timing for total time, each main section and for each tile generation step #225

Merged
merged 17 commits into from
Nov 9, 2023

Conversation

alfh
Copy link
Contributor

@alfh alfh commented Nov 4, 2023

This PR…

Add logging of duration of various steps of the process, to be able to see where the focus for improvements should be, and to be able to detect any changes in performance from version to version, or from machine to machine, or from country to country.

Considerations and implementations

There are a couple of commits at the start of the history which are really general small fixups, I could have created a separate PR for them.

Ended up adding a Timings class for handling the actual duration and generating the text for logging, or doing the actual logging, based on PR comments from @treee111 .

I have just added simple logging, and have not looked for any general "metrics" python library.

Only wall time is logged.
Originally I also included CPU time in most places, but I think the CPU time it less interesting, since it is mainly other processes that are being spawned.
I have not added an CLI option to choose whether you want the logging or not, I assume it is quite interesting for most people.

I have added debug level logging for each tile, to get the duration taken for each tile.

Originally logged the time with 5 decimals, but reduced it to 3, even though that means some steps are shown as taking 0.000 seconds.

I have made one final restructuring of the commits, have kept the main parts of the commit from @treee111 for the PR review, a couple of minor suggestions there I splitted out, and combined with my commits, to have a cleaner commit history, with fewer "correction" commits.

How to test

  1. Run the CLI to generate a tile or a country, and observe the logging of duration. e.g. python -m wahoomc cli -co malta
  2. Run the CLI to generate a tile or a country, and observe the logging of duration. e.g. python -m wahoomc cli -co malta -v
  3. Run the unit tests with "python -m unittest -v" to see the logging.

Pull Request Checklist

@alfh alfh force-pushed the add-timins-logging branch from 10e12fc to 0562cc8 Compare November 4, 2023 18:07
@alfh
Copy link
Contributor Author

alfh commented Nov 4, 2023

I see that pylint now complains about too many local variables in merge_splitted_tiles_with_land_and_sea.

Unsure if I should just remove the logging there for now, or if the method should be splitted somehow, but that would be a general refactoring, and not as part of the adding of timings, I think.

Copy link
Owner

@treee111 treee111 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for this big PR!

I added some inline comments, in general I have these comments:

  1. A lot of copy and paste code - we should think of how to avoid that for changes concerning timing later on which should be reflected at all places. In the in-line comment I talked about a class to handle that.
  2. I tried to have a consistent logging for the tiles by outputting the log_tile before doing something to know where it crashed but only one line per "tile" or "tile/country" combination. For easier reporting and understanding logs. The green marked logs are OK for me, there you only added the timing. Lets think of how we can change the second line so that it is better readable and we only have one logging line per "tile" or "tile/country" combination. For all timing-logs maybe use a central logging in the class - see 1. ?
grafik

instead of having:

INFO:# Creating .map files for tiles
INFO:+ (tile 1 of 2) Coordinates: 137,100
INFO:+ (tile 1 of 2) Coordinates: 137,100 / took 1.80000
INFO:+ (tile 2 of 2) Coordinates: 138,100
INFO:+ (tile 2 of 2) Coordinates: 138,100 / took 3.23181
INFO:+ Creating .map files for tiles: OK, took 0.04103, 5.03245

that might suit me better:

INFO:# Creating .map files for tiles
INFO:+ (tile 1 of 2) Coordinates: 137,100
INFO:+  took 1.80000 sec.
INFO:+ (tile 2 of 2) Coordinates: 138,100
INFO:+  took 3.23181 sec.
INFO:+ Creating .map files for tiles: OK, took 0.04103, 5.03245
  1. Sometime you output one, sometimes two values. Could we add before/after the value what it represents, i.e. processing secs and CPU secs or other abbreviations more suitable to you?

  2. rebasing onto develop should make all unittests run green.

as said already, thanks for the effort 👍 - I am looking forward to get this further :-)
If you need some ideas concerning my comments, I could bring up my idea also "in code"!

wahoomc/downloader.py Outdated Show resolved Hide resolved
wahoomc/downloader.py Outdated Show resolved Hide resolved
wahoomc/main.py Show resolved Hide resolved
wahoomc/osm_maps_functions.py Outdated Show resolved Hide resolved
wahoomc/main.py Outdated Show resolved Hide resolved
wahoomc/main.py Outdated Show resolved Hide resolved
@alfh
Copy link
Contributor Author

alfh commented Nov 5, 2023

My considerations on the logging style

INFO:+ (tile 1 of 2) Coordinates: 137,100
INFO:+  took 1.80000 sec.

vs

INFO:+ (tile 1 of 2) Coordinates: 137,100
INFO:+ (tile 1 of 2) Coordinates: 137,100 / took 1.80000

is that I wanted to be able to do a "grep" on the log output, and quickly find all tiles that were slow / fast / something else.

But it is also important to log the tile when starting to process it, in case there is some deadlock or something.

I was now considering a "log_tile_start" method, which does not log with a newline, and then a "log_tile_end" which then logs the "took .." section.
But it seems Python has no nice way of achieving that with the log framework.

I also saw that there are decorator pattern, which could possible be used for both the logging of the main tile info, and the duration at the end.

@alfh alfh force-pushed the add-timins-logging branch 2 times, most recently from 0562cc8 to 4b94c2b Compare November 5, 2023 09:42
@alfh
Copy link
Contributor Author

alfh commented Nov 5, 2023

I force pushed some updated, before I noticed that you had commented the PR, sorry about that.

Thanks for the feedback, I'll be looking into these wise comments now.

@alfh
Copy link
Contributor Author

alfh commented Nov 5, 2023

Maybe the tile duration could be logged as debug instead of info log level?
So only people really into the details would see it.

Not quite sure who such differences are handled in Python.

treee111 added a commit to alfh/wahooMapsCreator that referenced this pull request Nov 5, 2023
- like you wrote by yourself :-) treee111#225 (comment)
- plus removing all the calls of start_wall_time throughout the files
- log_duration is only called once so cutting down the method makes it clearer and identical to stop_and_return in my opinion
- plus painting a "line" before the logging to be cosistent to the other parts of the log
treee111 added a commit to alfh/wahooMapsCreator that referenced this pull request Nov 5, 2023
- like you wrote by yourself :-) treee111#225 (comment)
- plus removing all the calls of start_wall_time throughout the files
- log_duration is only called once so cutting down the method makes it clearer and identical to stop_and_return in my opinion
- plus painting a "line" before the logging to be cosistent to the other parts of the log
@treee111
Copy link
Owner

treee111 commented Nov 5, 2023

thanks for thinking about outputting the per tile details and giving your thought behind the logging :-)

My considerations on the logging style

INFO:+ (tile 1 of 2) Coordinates: 137,100
INFO:+  took 1.80000 sec.

vs

INFO:+ (tile 1 of 2) Coordinates: 137,100
INFO:+ (tile 1 of 2) Coordinates: 137,100 / took 1.80000

is that I wanted to be able to do a "grep" on the log output, and quickly find all tiles that were slow / fast / something else.

But it is also important to log the tile when starting to process it, in case there is some deadlock or something.

I was now considering a "log_tile_start" method, which does not log with a newline, and then a "log_tile_end" which then logs the "took .." section. But it seems Python has no nice way of achieving that with the log framework.

Yesterday that was also my first thought. Adding "took .." or somehow replacing the whole line with the addition "took .." but that is not possible without hacks as you found out as well.

I also saw that there are decorator pattern, which could possible be used for both the logging of the main tile info, and the duration at the end.

I saw decorators also in another project and thought that they could be used for the timing stuff. Because I have never used them I'm not really sure how that would go and look...

Maybe the tile duration could be logged as debug instead of info log level? So only people really into the details would see it.

Not quite sure who such differences are handled in Python.

Very nice idea - i like! 🥇 That would mean, that all other log.debug messages would be outputted as well, but I think for your grep that would not matter.
Outputting as debug is easy and done in the repo as well. output it with log.debug and call wahooMapsCreator with -v.

Copy link
Owner

@treee111 treee111 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice collaboration by the way :-)
looking forward to hear from you 👍

I've done three things:

  1. added inline comments
  2. pushed a commit to your branch (clearing up some inline comments already)
  3. addes a comment to the PR

You still might rebase your branch onto latest develop (pull develop first) to integrate our unittest PR 04f09ac. Don't worry about me, I'm confident that my Git will not get confused ;-)

wahoomc/timings.py Outdated Show resolved Hide resolved
wahoomc/timings.py Outdated Show resolved Hide resolved
wahoomc/osm_maps_functions.py Outdated Show resolved Hide resolved
wahoomc/main.py Outdated Show resolved Hide resolved
wahoomc/downloader.py Outdated Show resolved Hide resolved
alfh pushed a commit to alfh/wahooMapsCreator that referenced this pull request Nov 6, 2023
- like you wrote by yourself :-) treee111#225 (comment)
- plus removing all the calls of start_wall_time throughout the files
- log_duration is only called once so cutting down the method makes it clearer and identical to stop_and_return in my opinion
- plus painting a "line" before the logging to be cosistent to the other parts of the log
@alfh alfh force-pushed the add-timins-logging branch from f965cc2 to 6bbb5d8 Compare November 6, 2023 19:52
@alfh
Copy link
Contributor Author

alfh commented Nov 6, 2023

@treee111 Rebased onto develop, and adjusted based on your comments.

I want to clean up the commit history somewhat, but ran out of time today, but still wanted to possibly get some comments to the way I added the debug logging for the tile timings.

I ended up not using the debug vs info logging functionality in Timings class, so I might as well remove that.

Unsure how you feel about the log_tile_info and log_tile_debug methods in osm_map_functions.

Getting the debug logging to display using the "cli" works with "-v", but when running the unit tests, that does not currently work is seem like.

@treee111
Copy link
Owner

treee111 commented Nov 7, 2023

@treee111 Rebased onto develop, and adjusted based on your comments.

👍 looks very promising

I want to clean up the commit history somewhat, but ran out of time today, but still wanted to possibly get some comments to the way I added the debug logging for the tile timings.

Due to that we only squash to develop, all commits will get into one anyway so I would be OK as-is now. Having a clean commit history also in PR's is something I also like, so feel free to clean it up if you have some spare time.

I ended up not using the debug vs info logging functionality in Timings class, so I might as well remove that.

If you don't use it remote it, so you can also remove the value from the constructor where you hand it over but do not use it acutally.
If one would use the debug vs. info functionality of Timings, I thought of this in the timings class
grafik
then giving the two return values to the log_tile function
grafik
but that does not work as we have a tuple now and not two separate parameters.
grafik

Could also be achieved but wouldn't work for situations where we do not need the True or False because INFO logging is fixed (could also be achieved but would make it even harder to maintain). Furthermore that would not be easy to read and maintain in my opinion.

Unsure how you feel about the log_tile_info and log_tile_debug methods in osm_map_functions.

I'd rather remove them as written and already committed here: 4468c7e. If you don't agree feel free to remove that commit and we talk about that one more time.
What I really like is your refactoring of log_tile method - nice!

Getting the debug logging to display using the "cli" works with "-v", but when running the unit tests, that does not currently work is seem like.

If I run unittests I normally have no output and do not need any. Do you want to check and compare the timing using unittests?
For which unittest files do you want it to work? For test_generated_files.py you could easy add it to the cli call in my opinion (haven't tried yet):

result = os.system(
f'python -m wahoomc cli -co {country} -tag tag-wahoo.xml -fp -c -md 9999 -nbc')

alfh pushed a commit to alfh/wahooMapsCreator that referenced this pull request Nov 8, 2023
- like you wrote by yourself :-) treee111#225 (comment)
- log_duration is only called once so cutting down the method makes it clearer and identical to stop_and_return in my opinion
- plus painting a "line" before the logging to be cosistent to the other parts of the log
@alfh alfh force-pushed the add-timins-logging branch from 4468c7e to a852665 Compare November 8, 2023 17:17
@alfh
Copy link
Contributor Author

alfh commented Nov 8, 2023

This is how the logging looks like now

INFO:--------------------------------------------------------------------------------
INFO:--------------------------------------------------------------------------------
INFO:# check geofabrik.json file
INFO:# Input country: malta.
INFO:--------------------------------------------------------------------------------
INFO:+ Involved country: malta
INFO:--------------------------------------------------------------------------------
INFO:# check land_polygons.shp file
INFO:--------------------------------------------------------------------------------
INFO:# check countries .osm.pbf files
INFO:+ Checking for old maps and remove them
INFO:+ mapfile for malta: up-to-date.
INFO:--------------------------------------------------------------------------------
INFO:# Filter tags from country osm.pbf files
INFO:+ Filtering unwanted map objects out of map of malta
INFO:+ Filter tags from country osm.pbf files: OK, took 1.072 s
INFO:--------------------------------------------------------------------------------
INFO:# Generate land for each coordinate
INFO:+ (tile 1 of 2) Coordinates: 137,100
INFO:+ (tile 2 of 2) Coordinates: 138,100
INFO:+ Generate land for each coordinate: OK, took 0.625 s
INFO:--------------------------------------------------------------------------------
INFO:# Generate sea for each coordinate
INFO:+ (tile 1 of 2) Coordinates: 137,100
INFO:+ (tile 2 of 2) Coordinates: 138,100
INFO:+ Generate sea for each coordinate: OK, took 0.000 s
INFO:--------------------------------------------------------------------------------
INFO:# Split filtered country files to tiles
INFO:+ (tile 1 of 2) Coordinates: 137,100 / malta
INFO:+ (tile 2 of 2) Coordinates: 138,100 / malta
INFO:+ Split filtered country files to tiles: OK, took 2.043 s
INFO:--------------------------------------------------------------------------------
INFO:# Merge splitted tiles with land, elevation, and sea
INFO:+ (tile 1 of 2) Coordinates: 137,100
INFO:+ (tile 2 of 2) Coordinates: 138,100
INFO:+ Merge splitted tiles with land, elevation, and sea: OK, took 1.872 s
INFO:--------------------------------------------------------------------------------
INFO:# Creating .map files for tiles
INFO:+ (tile 1 of 2) Coordinates: 137,100
INFO:+ (tile 2 of 2) Coordinates: 138,100
INFO:+ Creating .map files for tiles: OK, took 2.846 s
INFO:--------------------------------------------------------------------------------
INFO:# Create: .map.lzma files
INFO:+ Country: malta
INFO:+ Copying .map.lzma tiles to output folders
INFO:+ Create .map.lzma files: OK, took 0.001 s
INFO:--------------------------------------------------------------------------------
INFO:# Create: .map files
INFO:+ Country: malta
INFO:+ Copying .map tiles to output folders
INFO:+ Create .map files: OK, took 0.001 s
INFO:--------------------------------------------------------------------------------
INFO:# Total time 8.476 s

When running with verbose, it looks like this (just an excerpt of the log)

INFO:# Generate sea for each coordinate
DEBUG:+ (tile 1 of 8) Coordinates: 122,69 / took 0.000 s
DEBUG:+ (tile 2 of 8) Coordinates: 122,70 / took 0.000 s
DEBUG:+ (tile 3 of 8) Coordinates: 122,71 / took 0.000 s
DEBUG:+ (tile 4 of 8) Coordinates: 122,72 / took 0.000 s
DEBUG:+ (tile 5 of 8) Coordinates: 123,69 / took 0.000 s
DEBUG:+ (tile 6 of 8) Coordinates: 123,70 / took 0.000 s
DEBUG:+ (tile 7 of 8) Coordinates: 123,71 / took 0.000 s
DEBUG:+ (tile 8 of 8) Coordinates: 123,72 / took 0.000 s
INFO:+ Generate sea for each coordinate: OK, took 0.000 s
INFO:--------------------------------------------------------------------------------
INFO:# Split filtered country files to tiles
INFO:+ (tile 1 of 8) Coordinates: 122,69 / faroe-islands
DEBUG:+ (tile 1 of 8) Coordinates: 122,69 / faroe-islands took 0.082 s
INFO:+ (tile 2 of 8) Coordinates: 122,70 / faroe-islands
DEBUG:+ (tile 2 of 8) Coordinates: 122,70 / faroe-islands took 0.555 s
INFO:+ (tile 3 of 8) Coordinates: 122,71 / faroe-islands
DEBUG:+ (tile 3 of 8) Coordinates: 122,71 / faroe-islands took 0.952 s
INFO:+ (tile 4 of 8) Coordinates: 122,72 / faroe-islands
DEBUG:+ (tile 4 of 8) Coordinates: 122,72 / faroe-islands took 0.097 s
INFO:+ (tile 5 of 8) Coordinates: 123,69 / faroe-islands
DEBUG:+ (tile 5 of 8) Coordinates: 123,69 / faroe-islands took 0.088 s
INFO:+ (tile 6 of 8) Coordinates: 123,70 / faroe-islands
DEBUG:+ (tile 6 of 8) Coordinates: 123,70 / faroe-islands took 0.796 s
INFO:+ (tile 7 of 8) Coordinates: 123,71 / faroe-islands
DEBUG:+ (tile 7 of 8) Coordinates: 123,71 / faroe-islands took 1.480 s
INFO:+ (tile 8 of 8) Coordinates: 123,72 / faroe-islands
DEBUG:+ (tile 8 of 8) Coordinates: 123,72 / faroe-islands took 0.508 s
INFO:+ Split filtered country files to tiles: OK, took 4.559 s
INFO:--------------------------------------------------------------------------------
INFO:# Merge splitted tiles with land, elevation, and sea
INFO:+ (tile 1 of 8) Coordinates: 122,69
DEBUG:--------------------------------------------------------------------------------
DEBUG:# Sorting land* osm files
DEBUG:+ Sorting land* osm files: OK
DEBUG:+ (tile 1 of 8) Coordinates: 122,69 / took 0.375 s
INFO:+ (tile 2 of 8) Coordinates: 122,70
DEBUG:--------------------------------------------------------------------------------
DEBUG:# Sorting land* osm files
DEBUG:+ Sorting land* osm files: OK
DEBUG:+ (tile 2 of 8) Coordinates: 122,70 / took 0.815 s
INFO:+ (tile 3 of 8) Coordinates: 122,71

Note that there is a bit excessive logging about the sorting, where three lines appear for each tile about sorting. That is "as-is", I will probably raise a PR to get rid of that, and only keep one line per tile about sorting

@alfh
Copy link
Contributor Author

alfh commented Nov 8, 2023

@treee111 I consider the PR ready for final review now, I currently do not know of anything outstanding.

alfh and others added 9 commits November 8, 2023 18:39
Use the introduced method for a consistent loggging
of which tile is being processed, also for the
elevation generation, seems like it was forgotten
to update that when the new method was introduced.
Include only wall time, since the CPU usage
of the Python code is neglible, since it mainly
spawns other processes.
- like you wrote by yourself :-) treee111#225 (comment)
- log_duration is only called once so cutting down the method makes it clearer and identical to stop_and_return in my opinion
- plus painting a "line" before the logging to be cosistent to the other parts of the log
@alfh alfh force-pushed the add-timins-logging branch from 555c731 to 6c4b260 Compare November 8, 2023 17:39
@alfh alfh requested a review from treee111 November 8, 2023 17:40
@treee111
Copy link
Owner

treee111 commented Nov 9, 2023

Unsure how you feel about the log_tile_info and log_tile_debug methods in osm_map_functions.

I'd rather remove them as written and already committed here: 4468c7e. If you don't agree feel free to remove that commit and we talk about that one more time. What I really like is your refactoring of log_tile method - nice!

@alfh what do you think of my described change? You lost my commit in your current branch, restored it from my local repo here:
4468c7e

other than that I am with you and will approve this PR

@alfh
Copy link
Contributor Author

alfh commented Nov 9, 2023 via email

Copy link
Owner

@treee111 treee111 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry I missed that commit, noticed it afterwards. I want to have it quite visible when reading code if logging goes to info or debug. So I prefer log_tile_info and log_tile_debug, and do not mind the "overhead" of two extra method. It also means pylint will not complain about too many parameters. But it is probably mainly a code style taste. I am OK with both approaches.

alright, so I am also OK with having it more visible what the debug level is.
I olny waht to make sure the comment doesn't get lost.

Let's get this merged! Thanks a lot 👍

@treee111 treee111 changed the title Add logging of timing for total time, for each main section, and for each tile generation step [FEAT] Add logging of timing for total time, each main section and for each tile generation step Nov 9, 2023
@treee111 treee111 added the enhancement New feature or request label Nov 9, 2023
@treee111 treee111 merged commit e5baae0 into treee111:develop Nov 9, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants