Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: print exceptions when they happen, and summarize failed collections at the end #1053

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

wdconinc
Copy link
Contributor

@wdconinc wdconinc commented Oct 5, 2023

Briefly, what does this PR introduce?

Instead of silently eating the exception, this modifies the JEventProcessorPODIO to print the failed collection and its exception when it first happens, and then prints the failed collections again at the end, all using the error log stream. That results in log streams such as:

[acts_init] [info] DD4Hep geometry converted!
[acts_init] [info] Checking surfaces...
[acts_init] [info] Loading magnetic field...
[acts_init] [info] ActsGeometryProvider initialization complete
[JEventProcessorPODIO] [error] Failed to get CentralTrackSeedingResults due to exception: Object already in a collection. Cannot add it to a second collection
[INFO] Status: 7 events processed  7.0 Hz (0.1 Hz avg)
[INFO] Status: 21 events processed  14.0 Hz (0.4 Hz avg)
[INFO] Status: 33 events processed  12.0 Hz (0.7 Hz avg)
[INFO] Status: 45 events processed  12.0 Hz (0.9 Hz avg)
[INFO] Status: 61 events processed  16.0 Hz (1.2 Hz avg)
[INFO] Status: 70 events processed  9.0 Hz (1.3 Hz avg)
[INFO] Status: 80 events processed  10.0 Hz (1.5 Hz avg)
[INFO] Status: 89 events processed  9.0 Hz (1.6 Hz avg)
[INFO] Status: 98 events processed  9.0 Hz (1.7 Hz avg)
[INFO] Status: 100 events processed  8.8 Hz (1.8 Hz avg)
[INFO] All workers have stopped.
[INFO] Merging threads ...
[JEventProcessorPODIO] [error] Failed to get the following collections at least once:
[JEventProcessorPODIO] [error] Failed collection 'CentralTrackSeedingResults'
[INFO] JArrow: Finalized JEventProcessor 'JEventProcessorPODIO'

(obtained with $PWD/bin/eicrecon -Ppodio:output_include_collections=CentralTrackSeedingResults -Ppodio:output_file=rec_e_1GeV_20GeV_craterlake.edm4eic.root sim_e_1GeV_20GeV_craterlake.edm4hep.root with reverted #1050)

We may be able to go further and not allow any exceptions at all now. I don't think we need that functionality anymore since we now expect algorithms to catch their exceptions and we don't have any unused factories.

What kind of change does this PR introduce?

Please check if this PR fulfills the following:

  • Tests for the changes have been added
  • Documentation has been added / updated
  • Changes have been communicated to collaborators

Does this PR introduce breaking changes? What changes might users need to make to their code?

No.

Does this PR change default behavior?

No.

@nathanwbrei
Copy link
Contributor

Some thoughts:

  1. Thanks for moving the log messages to the spdlog. I'm tempted to add the option to redirect all JANA-internal logging to spdlog. (This might be a good task for our new hire, Ayan...) It would be nice to have as many messages as possible formatted the same for the sake of searching and filtering them.

  2. I personally wouldn't mind going even further and terminating execution upon encountering an exception. (I honestly thought you or Dmitry had already done this!) It's not all that possible to catch and recover from an exception when stuff is happening asynchronously and on different threads. Maybe we could add an option to terminate on exception simply for the CI.

@wdconinc
Copy link
Contributor Author

wdconinc commented Oct 5, 2023

Maybe we could add an option to terminate on exception simply for the CI.

I was thinking in this direction as well. Essentially I was going to suggest a boolean parameter allow_failing_collections inside JEventProcessorPODIO that configures whether to fail on exceptions (well, rethrows the exception, and JANA2 should then fail).

@wdconinc wdconinc temporarily deployed to github-pages October 5, 2023 20:03 — with GitHub Actions Inactive
@wdconinc
Copy link
Contributor Author

wdconinc commented Oct 5, 2023

I'm tempted to add the option to redirect all JANA-internal logging to spdlog.

I think it would be interesting to develop an interface where someone can provide a JANA2 compatible logger that is then used for all JANA2 logging (and if not provided it falls back to the current default). We already do something like that with Acts::Logger. For JANA2 my question would be when we pass this to JANA2: we can't exactly wait until we have a Log_service.

@wdconinc
Copy link
Contributor Author

wdconinc commented Oct 6, 2023

Moved Nathan's comment 2 to separate issue, #1056, to be implemented in different PR.

@@ -281,41 +281,47 @@ void JEventProcessorPODIO::Process(const std::shared_ptr<const JEvent> &event) {
// that are determined by hash, we have to ensure they are reproducible
// even if the collections are filled in unpredictable order (or not at
// all). See also below, at "TODO: NWB:".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From CI, it doesn't look like any exceptions are triggered by missing collection id's. So we can delete this, perhaps?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am certainly open to making it a requirement that factories have to catch exceptions due to missing collections, instead of catching this in the event processor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant something like #1069

@@ -32,4 +32,6 @@ class JEventProcessorPODIO : public JEventProcessor {
std::vector<std::string> m_collections_to_write; // derived from above config. parameters
std::vector<std::string> m_collections_to_print;

private:
std::set<std::string> m_failed_collections;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've submitted a few PRs for HcalEndcapP. With those merged, we could fail immediately instead of just making warnings.

github-merge-queue bot pushed a commit that referenced this pull request Oct 9, 2023
That detector was removed from the geometry configurations long time
ago.

cc #1053
@veprbl veprbl temporarily deployed to github-pages October 11, 2023 05:54 — with GitHub Actions Inactive
@wdconinc wdconinc force-pushed the 1052-exception-emitted-in-a-jchainfactorytprocess-gets-silently-absorbed branch from b1f956b to f4f5741 Compare November 25, 2023 16:20
auto-merge was automatically disabled April 20, 2024 00:14

Merge queue setting changed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Exception emitted in a JChainFactoryT::Process() gets silently absorbed
3 participants