-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEAT] Add logging of timing for total time, each main section and for each tile generation step #225
Conversation
10e12fc
to
0562cc8
Compare
I see that pylint now complains about too many local variables in merge_splitted_tiles_with_land_and_sea. Unsure if I should just remove the logging there for now, or if the method should be splitted somehow, but that would be a general refactoring, and not as part of the adding of timings, I think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for this big PR!
I added some inline comments, in general I have these comments:
- A lot of copy and paste code - we should think of how to avoid that for changes concerning timing later on which should be reflected at all places. In the in-line comment I talked about a class to handle that.
- I tried to have a consistent logging for the tiles by outputting the log_tile before doing something to know where it crashed but only one line per "tile" or "tile/country" combination. For easier reporting and understanding logs. The green marked logs are OK for me, there you only added the timing. Lets think of how we can change the second line so that it is better readable and we only have one logging line per "tile" or "tile/country" combination. For all timing-logs maybe use a central logging in the class - see 1. ?
instead of having:
INFO:# Creating .map files for tiles
INFO:+ (tile 1 of 2) Coordinates: 137,100
INFO:+ (tile 1 of 2) Coordinates: 137,100 / took 1.80000
INFO:+ (tile 2 of 2) Coordinates: 138,100
INFO:+ (tile 2 of 2) Coordinates: 138,100 / took 3.23181
INFO:+ Creating .map files for tiles: OK, took 0.04103, 5.03245
that might suit me better:
INFO:# Creating .map files for tiles
INFO:+ (tile 1 of 2) Coordinates: 137,100
INFO:+ took 1.80000 sec.
INFO:+ (tile 2 of 2) Coordinates: 138,100
INFO:+ took 3.23181 sec.
INFO:+ Creating .map files for tiles: OK, took 0.04103, 5.03245
-
Sometime you output one, sometimes two values. Could we add before/after the value what it represents, i.e.
processing secs
andCPU secs
or other abbreviations more suitable to you? -
rebasing onto develop should make all unittests run green.
as said already, thanks for the effort 👍 - I am looking forward to get this further :-)
If you need some ideas concerning my comments, I could bring up my idea also "in code"!
My considerations on the logging style
vs
is that I wanted to be able to do a "grep" on the log output, and quickly find all tiles that were slow / fast / something else. But it is also important to log the tile when starting to process it, in case there is some deadlock or something. I was now considering a "log_tile_start" method, which does not log with a newline, and then a "log_tile_end" which then logs the "took .." section. I also saw that there are decorator pattern, which could possible be used for both the logging of the main tile info, and the duration at the end. |
0562cc8
to
4b94c2b
Compare
I force pushed some updated, before I noticed that you had commented the PR, sorry about that. Thanks for the feedback, I'll be looking into these wise comments now. |
Maybe the tile duration could be logged as debug instead of info log level? Not quite sure who such differences are handled in Python. |
- like you wrote by yourself :-) treee111#225 (comment) - plus removing all the calls of start_wall_time throughout the files - log_duration is only called once so cutting down the method makes it clearer and identical to stop_and_return in my opinion - plus painting a "line" before the logging to be cosistent to the other parts of the log
- like you wrote by yourself :-) treee111#225 (comment) - plus removing all the calls of start_wall_time throughout the files - log_duration is only called once so cutting down the method makes it clearer and identical to stop_and_return in my opinion - plus painting a "line" before the logging to be cosistent to the other parts of the log
ff54dac
to
f965cc2
Compare
thanks for thinking about outputting the per tile details and giving your thought behind the logging :-)
Yesterday that was also my first thought. Adding "took .." or somehow replacing the whole line with the addition "took .." but that is not possible without hacks as you found out as well.
I saw decorators also in another project and thought that they could be used for the timing stuff. Because I have never used them I'm not really sure how that would go and look...
Very nice idea - i like! 🥇 That would mean, that all other |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice collaboration by the way :-)
looking forward to hear from you 👍
I've done three things:
- added inline comments
- pushed a commit to your branch (clearing up some inline comments already)
- addes a comment to the PR
You still might rebase your branch onto latest develop (pull develop first) to integrate our unittest PR 04f09ac. Don't worry about me, I'm confident that my Git will not get confused ;-)
- like you wrote by yourself :-) treee111#225 (comment) - plus removing all the calls of start_wall_time throughout the files - log_duration is only called once so cutting down the method makes it clearer and identical to stop_and_return in my opinion - plus painting a "line" before the logging to be cosistent to the other parts of the log
f965cc2
to
6bbb5d8
Compare
@treee111 Rebased onto develop, and adjusted based on your comments. I want to clean up the commit history somewhat, but ran out of time today, but still wanted to possibly get some comments to the way I added the debug logging for the tile timings. I ended up not using the debug vs info logging functionality in Timings class, so I might as well remove that. Unsure how you feel about the log_tile_info and log_tile_debug methods in osm_map_functions. Getting the debug logging to display using the "cli" works with "-v", but when running the unit tests, that does not currently work is seem like. |
👍 looks very promising
Due to that we only squash to develop, all commits will get into one anyway so I would be OK as-is now. Having a clean commit history also in PR's is something I also like, so feel free to clean it up if you have some spare time.
If you don't use it remote it, so you can also remove the value from the constructor where you hand it over but do not use it acutally. Could also be achieved but wouldn't work for situations where we do not need the True or False because INFO logging is fixed (could also be achieved but would make it even harder to maintain). Furthermore that would not be easy to read and maintain in my opinion.
I'd rather remove them as written and already committed here: 4468c7e. If you don't agree feel free to remove that commit and we talk about that one more time.
If I run unittests I normally have no output and do not need any. Do you want to check and compare the timing using unittests? wahooMapsCreator/tests/test_generated_files.py Lines 192 to 193 in 199a13e
|
- like you wrote by yourself :-) treee111#225 (comment) - log_duration is only called once so cutting down the method makes it clearer and identical to stop_and_return in my opinion - plus painting a "line" before the logging to be cosistent to the other parts of the log
4468c7e
to
a852665
Compare
This is how the logging looks like now
When running with verbose, it looks like this (just an excerpt of the log)
Note that there is a bit excessive logging about the sorting, where three lines appear for each tile about sorting. That is "as-is", I will probably raise a PR to get rid of that, and only keep one line per tile about sorting |
@treee111 I consider the PR ready for final review now, I currently do not know of anything outstanding. |
Use the introduced method for a consistent loggging of which tile is being processed, also for the elevation generation, seems like it was forgotten to update that when the new method was introduced.
Include only wall time, since the CPU usage of the Python code is neglible, since it mainly spawns other processes.
Include only wall time.
Only include the wall time used.
…_tiles_with_land_and_sea method
- like you wrote by yourself :-) treee111#225 (comment) - log_duration is only called once so cutting down the method makes it clearer and identical to stop_and_return in my opinion - plus painting a "line" before the logging to be cosistent to the other parts of the log
555c731
to
6c4b260
Compare
@alfh what do you think of my described change? You lost my commit in your current branch, restored it from my local repo here: other than that I am with you and will approve this PR |
Yes, sorry I missed that commit, noticed it afterwards.
I want to have it quite visible when reading code if logging goes to info or debug.
So I prefer log_tile_info and log_tile_debug, and do not mind the "overhead" of two extra method. It also means pylint will not complain about too many parameters.
But it is probably mainly a code style taste. I am OK with both approaches.
Alf
Få BlueMail for Android
9. nov. 2023, 17:21 kl. 17:21 skrev "Benjamin K." ***@***.***>:
…> > Unsure how you feel about the log_tile_info and log_tile_debug
methods in osm_map_functions.
>
> I'd rather remove them as written and already committed here:
[4468c7e](4468c7e).
If you don't agree feel free to remove that commit and we talk about
that one more time. What I really like is your refactoring of
`log_tile` method - nice!
>
@alfh what do you think of my described change? You lost my commit in
your current branch, restored it from my local repo here:
4468c7e
other than that I am with you and will approve this PR
--
Reply to this email directly or view it on GitHub:
#225 (comment)
You are receiving this because you were mentioned.
Message ID: ***@***.***>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sorry I missed that commit, noticed it afterwards. I want to have it quite visible when reading code if logging goes to info or debug. So I prefer log_tile_info and log_tile_debug, and do not mind the "overhead" of two extra method. It also means pylint will not complain about too many parameters. But it is probably mainly a code style taste. I am OK with both approaches.
alright, so I am also OK with having it more visible what the debug level is.
I olny waht to make sure the comment doesn't get lost.
Let's get this merged! Thanks a lot 👍
This PR…
Add logging of duration of various steps of the process, to be able to see where the focus for improvements should be, and to be able to detect any changes in performance from version to version, or from machine to machine, or from country to country.
Considerations and implementations
There are a couple of commits at the start of the history which are really general small fixups, I could have created a separate PR for them.
Ended up adding a Timings class for handling the actual duration and generating the text for logging, or doing the actual logging, based on PR comments from @treee111 .
I have just added simple logging, and have not looked for any general "metrics" python library.
Only wall time is logged.
Originally I also included CPU time in most places, but I think the CPU time it less interesting, since it is mainly other processes that are being spawned.
I have not added an CLI option to choose whether you want the logging or not, I assume it is quite interesting for most people.
I have added debug level logging for each tile, to get the duration taken for each tile.
Originally logged the time with 5 decimals, but reduced it to 3, even though that means some steps are shown as taking 0.000 seconds.
I have made one final restructuring of the commits, have kept the main parts of the commit from @treee111 for the PR review, a couple of minor suggestions there I splitted out, and combined with my commits, to have a cleaner commit history, with fewer "correction" commits.
How to test
python -m wahoomc cli -co malta
python -m wahoomc cli -co malta -v
Pull Request Checklist