Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SDK] Fix forceflush may wait for ever #2584

Merged

Conversation

owent
Copy link
Member

@owent owent commented Mar 10, 2024

Fixes #2574

May also fixes #2583

Changes

  • Using a sequence to check if a complete notification includes the datas when call ForceFlush.

For significant contributions please make sure you have completed the following items:

  • CHANGELOG.md updated for non-trivial changes
  • Unit tests have been added
  • Changes in public API reviewed

@owent owent requested a review from a team March 10, 2024 09:52
@owent
Copy link
Member Author

owent commented Apr 30, 2024

@ThomsonTan Is there any problem in this PR? Maybe we can close #2553 if this PR is merged. It fixes the same problems as #2553 do and also fix the simular problems in trace and logs.

@@ -79,21 +79,25 @@ bool BatchLogRecordProcessor::ForceFlush(std::chrono::microseconds timeout) noex
// Now wait for the worker thread to signal back from the Export method
std::unique_lock<std::mutex> lk_cv(synchronization_data_->force_flush_cv_m);

synchronization_data_->is_force_flush_pending.store(true, std::memory_order_release);
std::uint64_t current_sequence =
synchronization_data_->force_flush_pending_sequence.fetch_add(1, std::memory_order_release) +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the fetch_add here just the same as synchronization_data_->force_flush_pending_sequence++?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my understanding, synchronization_data_->force_flush_pending_sequence++ use std::memory_order_seq_cst , we can use std::memory_order_release here.

@ThomsonTan
Copy link
Contributor

BTW, can you explain in brief that why the sequence notification can break the potential forever wait? Thanks.

@owent
Copy link
Member Author

owent commented May 8, 2024

BTW, can you explain in brief that why the sequence notification can break the potential forever wait? Thanks.

When more than one threads call ForceFlush concurrency, is_force_flush_notified may be set true only once in background thread and one of the threads call ForceFlush will wait is_force_flush_notified for ever.

Copy link

codecov bot commented May 8, 2024

Codecov Report

Attention: Patch coverage is 95.94595% with 3 lines in your changes are missing coverage. Please review.

Project coverage is 87.70%. Comparing base (497eaf4) to head (d3df9a4).
Report is 65 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2584      +/-   ##
==========================================
+ Coverage   87.12%   87.70%   +0.58%     
==========================================
  Files         200      190      -10     
  Lines        6109     5852     -257     
==========================================
- Hits         5322     5132     -190     
+ Misses        787      720      -67     
Files Coverage Δ
...pentelemetry/sdk/logs/batch_log_record_processor.h 100.00% <100.00%> (ø)
...ude/opentelemetry/sdk/trace/batch_span_processor.h 100.00% <100.00%> (ø)
sdk/src/logs/batch_log_record_processor.cc 89.21% <96.16%> (+1.71%) ⬆️
...metrics/export/periodic_exporting_metric_reader.cc 81.32% <95.00%> (+1.98%) ⬆️
sdk/src/trace/batch_span_processor.cc 94.08% <96.16%> (+1.65%) ⬆️

... and 57 files with indirect coverage changes

@marcalff marcalff changed the title Fix forceflush may wait for ever [SDK] Fix forceflush may wait for ever May 23, 2024
Copy link
Member

@marcalff marcalff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix.

This area also changed recently when is_shutdown was fixed to be an atomic instead of a plain boolean.

I was concerned with potential overlap and collisions, but it turns out the two fixes are really independent, which is better.

LGTM.

@marcalff
Copy link
Member

Adding ok-to-merge.

Waiting to give @lalitb and @esigo a chance to comment, and planning to merge early next week.

@marcalff marcalff added the ok-to-merge The PR is ok to merge (has two approves or raised by a maintainer/approver and has one approve) label May 23, 2024
@marcalff marcalff merged commit 605c3e8 into open-telemetry:main May 27, 2024
49 checks passed
malkia added a commit to malkia/opentelemetry-cpp that referenced this pull request May 27, 2024
[SDK] Fix forceflush may wait for ever (open-telemetry#2584)
@owent owent deleted the fix_hangs_too_long_when_calling_forceflush branch May 28, 2024 04:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ok-to-merge The PR is ok to merge (has two approves or raised by a maintainer/approver and has one approve)
Projects
None yet
4 participants