Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ground_segmentation): add time_keeper #8585

Conversation

a-maumau
Copy link
Contributor

@a-maumau a-maumau commented Aug 22, 2024

Description

This PR will add time keeper feature for ground_segmentation.

The time tracking is set only for relatively long processing time functions/methods.

You need to modified the parameter file ground_segmentation.param.yaml in autoware_launch to run in autoware.


Sample output (updated from original comment)

# total time

- scan_ground_filter
100.00% faster_filter: total 3706.36 [ms], avg. 18.53 [ms], run count: 200
    ├── 67.09% convertPointcloudGridScan: total 2486.72 [ms], avg. 12.43 [ms], run count: 200
    │   ├── 50.49% scan: total 1871.41 [ms], avg. 9.36 [ms], run count: 200
    │   ├── 16.53% sort: total 612.49 [ms], avg. 3.06 [ms], run count: 200
    │   └── 0.08% rest: 2.82 [ms]
    ├── 28.57% classifyPointCloudGridScan: total 1059.02 [ms], avg. 5.30 [ms], run count: 200
    │   ├── 28.56% classify_loop: total 1058.42 [ms], avg. 5.29 [ms], run count: 200
    │   │   ├── 26.91% radial_loop: total 997.52 [ms], avg. 0.01 [ms], run count: 71999
    │   │   └── 1.64% rest: 60.90 [ms]
    │   └── 0.02% rest: 0.60 [ms]
    ├── 3.50% extractObjectPoints: total 129.54 [ms], avg. 0.65 [ms], run count: 200
    └── 0.84% rest: 31.07 [ms]

- ray_ground_filter
  100.00% filter: total 10171.14 [ms], avg. 36.46 [ms], run count: 279
    ├── 70.96% ConvertXYZIToRTZColor: total 7217.59 [ms], avg. 25.87 [ms], run count: 279
    ├── 13.44% ClassifyPointCloud: total 1366.64 [ms], avg. 4.90 [ms], run count: 279
    ├── 6.86% ExtractPointsIndices: total 698.06 [ms], avg. 2.50 [ms], run count: 279
    │   ├── 4.16% initializePointCloud2: total 423.15 [ms], avg. 0.76 [ms], run count: 558
    │   └── 2.70% rest: 274.91 [ms]
    └── 8.74% rest: 888.85 [ms]

- ransac_ground_filter
  100.00% filter: total 130212.22 [ms], avg. 1943.47 [ms], run count: 67
    ├── 0.07% transformPointCloud: total 89.05 [ms], avg. 0.66 [ms], run count: 134
    ├── 99.45% applyRANSAC: total 129490.73 [ms], avg. 1932.70 [ms], run count: 67
    ├── 0.07% extractPointsIndices: total 93.24 [ms], avg. 1.39 [ms], run count: 67
    ├── 0.00% getPlaneAffine: total 0.39 [ms], avg. 0.01 [ms], run count: 67
    └── 0.41% rest: 538.80 [ms]

Related links

Parent Issue:

  • Link

How was this PR tested?

Notes for reviewers

None.

Interface changes

None.

Effects on system behavior

None.

@github-actions github-actions bot added component:perception Advanced sensor data processing and environment understanding. (auto-assigned) tag:require-cuda-build-and-test labels Aug 22, 2024
Copy link

github-actions bot commented Aug 22, 2024

Thank you for contributing to the Autoware project!

🚧 If your pull request is in progress, switch it to draft mode.

Please ensure:

@a-maumau a-maumau marked this pull request as draft August 22, 2024 09:38
@a-maumau a-maumau marked this pull request as ready for review August 23, 2024 01:57
@badai-nguyen badai-nguyen added the run:build-and-test-differential Mark to enable build-and-test-differential workflow. (used-by-ci) label Aug 27, 2024
@badai-nguyen
Copy link
Contributor

badai-nguyen commented Aug 27, 2024

@a-maumau Thank you for your PR,
I think there are no time keeper for below functions

    │   ├── 0.02% checkBreakGndGrid: total 0.14 [ms], avg. 0.00 [ms], run count: 427548
    │   ├── 0.01% recheckGroundCluster: total 0.09 [ms], avg. 0.00 [ms], run count: 164117
    │   ├── 0.01% checkContinuousGndGrid: total 0.04 [ms], avg. 0.00 [ms], run count: 269943
    │   ├── 0.02% checkDiscontinuousGndGrid: total 0.12 [ms], avg. 0.00 [ms], run count: 354528
    │   └── 81.57% rest: 512.40 [ms]

Could you tell me how did you extract these result?

@technolojin
Copy link
Contributor

similar to #8597 (comment)

@a-maumau
Copy link
Contributor Author

a-maumau commented Aug 29, 2024

@badai-nguyen Sorry for my poor explanations and making some confusions.

The time tracking is set only for relatively long processing time functions/methods.

This meant "I remove some timekeeper from the code".

The reason I removed the timekeeper from some methods is that its processing time is relatively small compared to the other parts.
It was not intended in this PR, but I found that adding the timekeeper for some method make the entire process slow later.

For example, following two images are output from debug tools.
too_much_calling_of_timekeeper_comp_nomal

too_much_calling_of_timekeeper
The first image shows the visualization of processing_time_ms of the current code (no timekeeper). Second image is showing the case when the timekeeper added to the all methods.

In the second image, the processing time of scan_ground_filter (brown plot) ranged in like 160~240 ms. I think this is caused by too many call of some methods with timekeeper, I guess.

Checking the autoware_debug_tools's processing_time_visualizer,

100.00% faster_filter: total 238.61 [ms], avg. 238.61 [ms], run count: 1
    ├── 35.13% convertPointcloudGridScan: total 83.82 [ms], avg. 83.82 [ms], run count: 1
    │   ├── 0.52% get_point_from_global_offset: total 1.23 [ms], avg. 0.00 [ms], run count: 138102
    │   └── 34.61% rest: 82.59 [ms]
    ├── 64.21% classifyPointCloudGridScan: total 153.22 [ms], avg. 153.22 [ms], run count: 1
    │   ├── 0.03% get_point_from_global_offset: total 0.07 [ms], avg. 0.00 [ms], run count: 138102
    │   ├── 0.00% initializeFirstGndGrids: total 0.00 [ms], avg. 0.00 [ms], run count: 358
    │   ├── 0.03% calcGridSize: total 0.06 [ms], avg. 0.00 [ms], run count: 88036
    │   ├── 0.03% checkBreakGndGrid: total 0.07 [ms], avg. 0.00 [ms], run count: 35212
    │   ├── 0.04% recheckGroundCluster: total 0.08 [ms], avg. 0.00 [ms], run count: 14935
    │   ├── 0.02% checkContinuousGndGrid: total 0.04 [ms], avg. 0.00 [ms], run count: 24707
    │   ├── 0.01% checkDiscontinuousGndGrid: total 0.02 [ms], avg. 0.00 [ms], run count: 28117
    │   └── 64.07% rest: 152.88 [ms]
    ├── 0.44% extractObjectPoints: total 1.06 [ms], avg. 1.06 [ms], run count: 1
    └── 0.22% rest: 0.51 [ms]
100.00% faster_filter: total 24.13 [ms], avg. 24.13 [ms], run count: 1
    ├── 72.51% convertPointcloudGridScan: total 17.50 [ms], avg. 17.50 [ms], run count: 1
    ├── 24.45% classifyPointCloudGridScan: total 5.90 [ms], avg. 5.90 [ms], run count: 1
    ├── 2.50% extractObjectPoints: total 0.60 [ms], avg. 0.60 [ms], run count: 1
    └── 0.53% rest: 0.13 [ms]

This result is not from the same point in the rosbag file, so it is hard compare the result, but the ratio of processing times of convertPointcloudGridScan and classifyPointCloudGridScan got largely changed.
At least, we can say the overall processing time has reduced if we remove the timekeeper from some methods.

@a-maumau
Copy link
Contributor Author

@technolojin It seems it is taking time for the for-loop which depends on the input point cloud number, and this might indicate the sample result of rest in the top PR comment comes from too many timekeeper.

For example in the case of adding timekeeper only for few methods,

100.00% faster_filter: total 24.29 [ms], avg. 24.29 [ms], run count: 1
    ├── 64.59% convertPointcloudGridScan: total 15.69 [ms], avg. 15.69 [ms], run count: 1
    ├── 31.13% classifyPointCloudGridScan: total 7.56 [ms], avg. 7.56 [ms], run count: 1
    ├── 3.76% extractObjectPoints: total 0.91 [ms], avg. 0.91 [ms], run count: 1
    └── 0.52% rest: 0.13 [ms]
  • convertPointcloudGridScan

    • for (size_t global_offset = 0; global_offset + in_cloud_point_step <= in_cloud_data_size;
      global_offset += in_cloud_point_step) {
      get_point_from_global_offset(in_cloud, input_point, global_offset);
      auto x{input_point.x - x_shift}; // base on front wheel center
      auto radius{static_cast<float>(std::hypot(x, input_point.y))};
      auto theta{normalizeRadian(std::atan2(x, input_point.y), 0.0)};
      // divide by vertical angle
      auto radial_div{static_cast<size_t>(std::floor(theta * inv_radial_divider_angle_rad))};
      uint16_t grid_id = 0;
      if (radius <= grid_mode_switch_radius_) {
      grid_id = static_cast<uint16_t>(radius * inv_grid_size_m);
      } else {
      auto gamma{normalizeRadian(std::atan2(radius, virtual_lidar_z_), 0.0f)};
      grid_id = grid_id_offset + gamma * inv_grid_size_rad;
      }
      current_point.grid_id = grid_id;
      current_point.radius = radius;
      current_point.point_state = PointLabel::INIT;
      current_point.orig_index = point_index;
      // radial divisions
      out_radial_ordered_points[radial_div].emplace_back(current_point);
      ++point_index;
      }
      : 8.962 ms
    • for (size_t i = 0; i < radial_dividers_num_; ++i) {
      std::sort(
      out_radial_ordered_points[i].begin(), out_radial_ordered_points[i].end(),
      [](const PointData & a, const PointData & b) { return a.radius < b.radius; });
      }
      : 4.043 ms
    • 8.962 + 4.043 ~ 13 ms (total is 15.69ms)
  • classifyPointCloudGridScan

    • the top loop take: 7.541 ms (total is 7.56 ms)

@badai-nguyen
Copy link
Contributor

@badai-nguyen Sorry for my poor explanations and making some confusions.

The time tracking is set only for relatively long processing time functions/methods.

This meant "I remove some timekeeper from the code".

The reason I removed the timekeeper from some methods is that its processing time is relatively small compared to the other parts. It was not intended in this PR, but I found that adding the timekeeper for some method make the entire process slow later.

For example, following two images are output from debug tools. too_much_calling_of_timekeeper_comp_nomal

too_much_calling_of_timekeeper The first image shows the visualization of processing_time_ms of the current code (no timekeeper). Second image is showing the case when the timekeeper added to the all methods.

In the second image, the processing time of scan_ground_filter ranged in like 160~240 ms. I think this is caused by too many call of some methods with timekeeper, I guess.

Checking the autoware_debug_tools's processing_time_visualizer,

100.00% faster_filter: total 238.61 [ms], avg. 238.61 [ms], run count: 1
    ├── 35.13% convertPointcloudGridScan: total 83.82 [ms], avg. 83.82 [ms], run count: 1
    │   ├── 0.52% get_point_from_global_offset: total 1.23 [ms], avg. 0.00 [ms], run count: 138102
    │   └── 34.61% rest: 82.59 [ms]
    ├── 64.21% classifyPointCloudGridScan: total 153.22 [ms], avg. 153.22 [ms], run count: 1
    │   ├── 0.03% get_point_from_global_offset: total 0.07 [ms], avg. 0.00 [ms], run count: 138102
    │   ├── 0.00% initializeFirstGndGrids: total 0.00 [ms], avg. 0.00 [ms], run count: 358
    │   ├── 0.03% calcGridSize: total 0.06 [ms], avg. 0.00 [ms], run count: 88036
    │   ├── 0.03% checkBreakGndGrid: total 0.07 [ms], avg. 0.00 [ms], run count: 35212
    │   ├── 0.04% recheckGroundCluster: total 0.08 [ms], avg. 0.00 [ms], run count: 14935
    │   ├── 0.02% checkContinuousGndGrid: total 0.04 [ms], avg. 0.00 [ms], run count: 24707
    │   ├── 0.01% checkDiscontinuousGndGrid: total 0.02 [ms], avg. 0.00 [ms], run count: 28117
    │   └── 64.07% rest: 152.88 [ms]
    ├── 0.44% extractObjectPoints: total 1.06 [ms], avg. 1.06 [ms], run count: 1
    └── 0.22% rest: 0.51 [ms]
100.00% faster_filter: total 24.13 [ms], avg. 24.13 [ms], run count: 1
    ├── 72.51% convertPointcloudGridScan: total 17.50 [ms], avg. 17.50 [ms], run count: 1
    ├── 24.45% classifyPointCloudGridScan: total 5.90 [ms], avg. 5.90 [ms], run count: 1
    ├── 2.50% extractObjectPoints: total 0.60 [ms], avg. 0.60 [ms], run count: 1
    └── 0.53% rest: 0.13 [ms]

This result is not from the same point in the rosbag file, so it is hard compare the result, but the ratio of processing times of convertPointcloudGridScan and classifyPointCloudGridScan got largely changed. At least, we can say the overall processing has reduced if we remove the timekeeper from some methods.

@a-maumau thank you for your clarification.

  • It makes sense to me know since I also got quite the same result at my side
    Screenshot from 2024-08-27 11-27-55

  • It's also a good point to known that timekeeper quite takes time to process. ❤️

@a-maumau
Copy link
Contributor Author

I added more timekeepers to make the processing time clearer for scan_groud_filter in f70830b .

Sample output:

100.00% faster_filter: total 3706.36 [ms], avg. 18.53 [ms], run count: 200
    ├── 67.09% convertPointcloudGridScan: total 2486.72 [ms], avg. 12.43 [ms], run count: 200
    │   ├── 50.49% scan: total 1871.41 [ms], avg. 9.36 [ms], run count: 200
    │   ├── 16.53% sort: total 612.49 [ms], avg. 3.06 [ms], run count: 200
    │   └── 0.08% rest: 2.82 [ms]
    ├── 28.57% classifyPointCloudGridScan: total 1059.02 [ms], avg. 5.30 [ms], run count: 200
    │   ├── 28.56% classify_loop: total 1058.42 [ms], avg. 5.29 [ms], run count: 200
    │   │   ├── 26.91% radial_loop: total 997.52 [ms], avg. 0.01 [ms], run count: 71999
    │   │   └── 1.64% rest: 60.90 [ms]
    │   └── 0.02% rest: 0.60 [ms]
    ├── 3.50% extractObjectPoints: total 129.54 [ms], avg. 0.65 [ms], run count: 200
    └── 0.84% rest: 31.07 [ms]

Copy link
Contributor

@technolojin technolojin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed focusing on the timekeeper implementation.

@a-maumau
Copy link
Contributor Author

Output change at 45de8e7 :

100.00% faster_filter: total 2810.32 [ms], avg. 18.25 [ms], run count: 154
    ├── 69.23% convertPointcloudGridScan: total 1945.50 [ms], avg. 12.63 [ms], run count: 154
    │   ├── 51.90% horizontal_angle_grouping: total 1458.67 [ms], avg. 9.47 [ms], run count: 154
    │   ├── 17.24% sort: total 484.54 [ms], avg. 3.15 [ms], run count: 154
    │   └── 0.08% rest: 2.29 [ms]
    ├── 26.85% classifyPointCloudGridScan: total 754.67 [ms], avg. 4.90 [ms], run count: 154
    ├── 3.12% extractObjectPoints: total 87.62 [ms], avg. 0.57 [ms], run count: 154
    └── 0.80% rest: 22.52 [ms]

Copy link
Contributor

@technolojin technolojin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments are more on details.

Other then those, LGTM.

Copy link

codecov bot commented Aug 30, 2024

Codecov Report

Attention: Patch coverage is 41.02564% with 46 lines in your changes missing coverage. Please review.

Project coverage is 25.23%. Comparing base (67265bb) to head (a19b5e5).
Report is 18 commits behind head on main.

Files with missing lines Patch % Lines
...round_segmentation/src/scan_ground_filter/node.cpp 42.85% 24 Missing and 8 partials ⚠️
...ground_segmentation/src/ray_ground_filter/node.cpp 38.46% 3 Missing and 5 partials ⚠️
...und_segmentation/src/ransac_ground_filter/node.cpp 33.33% 3 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8585      +/-   ##
==========================================
- Coverage   25.23%   25.23%   -0.01%     
==========================================
  Files        1325     1328       +3     
  Lines       98572    98729     +157     
  Branches    38088    38163      +75     
==========================================
+ Hits        24874    24910      +36     
- Misses      71156    71227      +71     
- Partials     2542     2592      +50     
Flag Coverage Δ *Carryforward flag
differential 11.14% <41.02%> (?)
total 25.22% <ø> (-0.01%) ⬇️ Carriedforward from 67265bb

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@a-maumau
Copy link
Contributor Author

a-maumau commented Sep 2, 2024

@technolojin The reason for adding the new block (scope) in those lines might not be that clear. Would it be alright to remove that explanation?

@technolojin
Copy link
Contributor

@technolojin The reason for adding the new block (scope) in those lines might not be that clear. Would it be alright to remove that explanation?

I do not think the new blocks are not clear.
It is possible to separate those by methods, name it by its functionality(, and measure the time).

This process (separate a long code to blocks) is meaningful and useful to maintain a complex code readable and manageable.

@a-maumau a-maumau force-pushed the mau/feat/perception/ground_segmentation branch from 18e4ab2 to 2c68ffb Compare September 2, 2024 03:08
Copy link
Contributor

@YoshiRi YoshiRi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@technolojin technolojin enabled auto-merge (squash) September 9, 2024 04:34
@technolojin technolojin disabled auto-merge September 9, 2024 04:34
Signed-off-by: a-maumau <maumaumaumaumaumaumaumaumaumau@gmail.com>
Signed-off-by: a-maumau <maumaumaumaumaumaumaumaumaumau@gmail.com>
Signed-off-by: a-maumau <maumaumaumaumaumaumaumaumaumau@gmail.com>
Signed-off-by: a-maumau <maumaumaumaumaumaumaumaumaumau@gmail.com>
Signed-off-by: a-maumau <maumaumaumaumaumaumaumaumaumau@gmail.com>
Signed-off-by: a-maumau <maumaumaumaumaumaumaumaumaumau@gmail.com>
Signed-off-by: a-maumau <maumaumaumaumaumaumaumaumaumau@gmail.com>
Copy link
Contributor

@Shin-kyoto Shin-kyoto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@a-maumau

Can you resolve conflicts?

@a-maumau
Copy link
Contributor Author

a-maumau commented Sep 9, 2024

@Shin-kyoto I resolved the conflict.

@technolojin technolojin force-pushed the mau/feat/perception/ground_segmentation branch from d8fd91a to a19b5e5 Compare September 9, 2024 05:41
@technolojin technolojin enabled auto-merge (squash) September 9, 2024 05:43
@technolojin
Copy link
Contributor

@Shin-kyoto can you review again please?

@technolojin technolojin merged commit dabeaa6 into autowarefoundation:main Sep 10, 2024
30 of 32 checks passed
emuemuJP pushed a commit to arayabrain/autoware.universe.origin that referenced this pull request Sep 10, 2024
* add time_keeper

Signed-off-by: a-maumau <maumaumaumaumaumaumaumaumaumau@gmail.com>

* add timekeeper option

Signed-off-by: a-maumau <maumaumaumaumaumaumaumaumaumau@gmail.com>

* add autoware_universe_utils

Signed-off-by: a-maumau <maumaumaumaumaumaumaumaumaumau@gmail.com>

* fix topic name

Signed-off-by: a-maumau <maumaumaumaumaumaumaumaumaumau@gmail.com>

* add scope and timekeeper

Signed-off-by: a-maumau <maumaumaumaumaumaumaumaumaumau@gmail.com>

* remove debug code

Signed-off-by: a-maumau <maumaumaumaumaumaumaumaumaumau@gmail.com>

* remove some timekeeper and mod block comment

Signed-off-by: a-maumau <maumaumaumaumaumaumaumaumaumau@gmail.com>

---------

Signed-off-by: a-maumau <maumaumaumaumaumaumaumaumaumau@gmail.com>
Signed-off-by: emuemuJP <k.matsumoto.0807@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component:perception Advanced sensor data processing and environment understanding. (auto-assigned) run:build-and-test-differential Mark to enable build-and-test-differential workflow. (used-by-ci) tag:require-cuda-build-and-test
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

5 participants