Skip NextBus update if integration is still loading #123564

drozycki · 2024-08-11T04:19:38Z

Fixes a race between the loading thread and
update thread leading to an unrecoverable error

Proposed change

The NextBus integration has a race condition at HA instance start between the thread which sequentially loads configured entities and the update thread which iterates through the set of loaded entities and makes NextBus API calls. The integration never recovers from this error, and will never update again until the HA instance is restarted. The likelihood of encountering this race condition increases as you increase the number of entities.

My fix checks the hass object state and returns an empty dict if it is not yet running. I have confirmed that this fixes the issue and that all automated tests pass.

Type of change

Dependency upgrade
Bugfix (non-breaking change which fixes an issue)
New integration (thank you!)
New feature (which adds functionality to an existing integration)
Deprecation (breaking change to happen in the future)
Breaking change (fix/feature causing existing functionality to break)
Code quality improvements to existing code or addition of tests

Additional information

This PR fixes or closes issue: fixes [NextBus] Integration crashes after startup when several entities configured #123562
This PR is related to issue:
Link to documentation pull request:

Checklist

The code change is tested and works locally.
Local tests pass. Your PR cannot be merged unless tests pass
There is no commented out code in this PR.
I have followed the development checklist
I have followed the perfect PR recommendations
The code has been formatted using Ruff (ruff format homeassistant tests)
Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

Documentation added/updated for www.home-assistant.io

If the code communicates with devices, web services, or third-party tools:

The manifest file has all fields filled out correctly.
Updated and included derived files by running: python3 -m script.hassfest.
New or updated dependencies have been added to requirements_all.txt.
Updated by running python3 -m script.gen_requirements_all.
For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.

To help with the load of incoming pull requests:

I have reviewed two other open pull requests in this repository.

home-assistant · 2024-08-11T04:19:44Z

Hey there @ViViDboarder, mind taking a look at this pull request as it has been labeled with an integration (nextbus) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of nextbus can trigger bot actions by commenting:

@home-assistant close: Closes the pull request.
@home-assistant rename Awesome new title: Renames the pull request.
@home-assistant reopen: Reopen the pull request.
@home-assistant unassign nextbus: Removes the current integration label and assignees on the pull request, add the integration domain after the command.
@home-assistant add-label needs-more-information: Add a label (needs-more-information, problem in dependency, problem in custom component) to the pull request.
@home-assistant remove-label needs-more-information: Remove a label (needs-more-information, problem in dependency, problem in custom component) on the pull request.

gjohansson-ST · 2024-08-11T12:57:13Z

homeassistant/components/nextbus/coordinator.py

@@ -52,6 +52,10 @@ async def _async_update_data(self) -> dict[str, Any]:
        """Fetch data from NextBus."""
        self.logger.debug("Updating data from API. Routes: %s", str(self._route_stops))

+        if self.hass.state is not CoreState.running:


It seems quite weird that this integration would be dependent on HA startup (are we fixing a symptom?).

However it would be better to use from homeassistant.helpers.start import async_at_started in __init__.py to delay the startup of this integration rather than returning some empty dict here. See example in speedtestdotnet.

drozycki · 2024-08-14

I updated the PR to use this pattern instead. I verified that this resolves the issue on my production HA instance.
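
The ordering this suggestion aims for can be modelled without Home Assistant. The sketch below does not call `async_at_started`; it uses a plain `asyncio.Event` as a hypothetical stand-in for the "HA has started" signal, and the entity names are made up. It only illustrates that the first refresh runs after every entity has been loaded.

```python
import asyncio


async def main() -> list[str]:
    events: list[str] = []
    started = asyncio.Event()  # hypothetical stand-in for "HA has started"

    async def first_refresh() -> None:
        # Wait for the started signal before touching the loaded entities.
        await started.wait()
        events.append("first refresh")

    refresh_task = asyncio.create_task(first_refresh())

    # Stand-in for the loader adding entities one at a time.
    for name in ("entity-1", "entity-2"):
        events.append(f"loaded {name}")
        await asyncio.sleep(0)  # yield so the refresh task gets a chance to run

    started.set()
    await refresh_task
    return events


order = asyncio.run(main())
print(order)
```

As the later comments note, delaying the first refresh narrows the window but does not remove the underlying shared-set access, so a reload could still hit the same error.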

home-assistant · 2024-08-11T12:57:19Z

Please take a look at the requested changes, and use the Ready for review button when you are done, thanks 👍

Learn more about our pull request process.

joostlek · 2024-08-11T13:07:33Z

I'd be interested to see the logs of this happening to fully understand why this is happening

ViViDboarder · 2024-08-12T21:31:20Z

Same. As far as I can tell, this module isn't doing anything unusual with the way the coordinator works. Unless maybe there is a limit to the number of HTTP requests executed sequentially in an executor job, but it doesn't seem like waiting on startup would do anything to resolve that.

drozycki · 2024-08-14T08:14:54Z

The log is in the linked issue #123562. The update thread iterates over the RouteStop set and throws when it detects that another thread has mutated the set during iteration. I don't understand the codebase well enough to be certain that set population happens during startup, but I didn't see a good way to wait until the component config has finished loading. Maybe there still is a race condition, but the added delay on the update thread is sufficient to avoid it on my system at my scale (10x RouteStop).
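
The failure mode described here can be reproduced without Home Assistant. The following is a minimal sketch, not the integration's code: `route_stops` and its string values are hypothetical stand-ins for the coordinator's set, and the mutation happens in the same thread to make the error deterministic. In the reported race the mutation comes from another thread, but the set iterator performs the same size check either way.

```python
def iterate_while_mutating(route_stops: set[str]) -> str:
    """Iterate over a set while adding to it and return the resulting error text."""
    try:
        for stop in route_stops:
            # Stand-in for the loader registering another entity mid-update.
            route_stops.add(stop + "-new")
    except RuntimeError as err:
        return str(err)
    return "no error"


message = iterate_while_mutating({"route-1:stop-a", "route-2:stop-b"})
print(message)
```

Because the exception escapes the loop, the update as a whole fails rather than skipping one entry.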

joostlek · 2024-08-14T11:11:51Z

I think we should re-evaluate the logic within nextbus instead. The issue you are experiencing could also happen the moment someone reloads the integration. So I'd rather opt for a more robust solution that would remove this issue completely.

@ViViDboarder Why does every agency have its own coordinator as opposed to per entry?

ViViDboarder · 2024-08-15T21:04:17Z

Ah. I see. So they share a coordinator because of the way the previous API was structured where you could fetch multiple predictions in a single call if they shared an agency.

That’s not relevant now.

However, I was recently informed that we can fetch predictions for multiple routes if they share a stop (found via a bug report: when we fetched for one route, it returned predictions for all routes at the stop).

Refactoring to share a coordinator per Agency-StopID would be the best approach, but will require another change to the client library.

All that said, each coordinator should only run once, so there should be no parallel mutations of the set of stops, unless a refresh is being triggered after each stop is added rather than after the entity initializations complete.

Should figure out if that is indeed the cause because a refactor to use stop based coordinators could still expose the same issue.

ViViDboarder · 2024-08-15T21:08:56Z

Another possible patch would be to have the inline function that runs in the executor and iterates over the stops receive them as a parameter, as a shallow copy, rather than reading them from the coordinator directly. That way, if the set is mutated (a new route is added) while an update is happening, the update will not run into any issue.
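
A rough sketch of that idea, under stated assumptions: the `Coordinator` class, `add_route_stop`, and the placeholder prediction values below are hypothetical and do not come from the NextBus integration. The worker function receives a shallow copy of the set, and the shared set is deliberately mutated during the loop to show that the iteration is unaffected.

```python
import asyncio


class Coordinator:
    """Hypothetical stand-in for the NextBus coordinator."""

    def __init__(self) -> None:
        self._route_stops: set[tuple[str, str]] = set()

    def add_route_stop(self, route: str, stop: str) -> None:
        self._route_stops.add((route, stop))

    async def async_update(self) -> dict[tuple[str, str], str]:
        # Shallow copy taken on the event loop before handing off to the worker.
        route_stops = set(self._route_stops)

        def _update(stops: set[tuple[str, str]]) -> dict[tuple[str, str], str]:
            results = {}
            for route_stop in stops:
                # Simulate a route being added while the update is in progress.
                self.add_route_stop("late-route", "late-stop")
                results[route_stop] = "prediction placeholder"
            return results

        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, _update, route_stops)


async def main() -> dict[tuple[str, str], str]:
    coordinator = Coordinator()
    coordinator.add_route_stop("route-1", "stop-a")
    coordinator.add_route_stop("route-2", "stop-b")
    return await coordinator.async_update()


data = asyncio.run(main())
print(sorted(data))
```

The late addition is not part of this update's results; it would be picked up by the next refresh, which iterates over a fresh copy.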

joostlek · 2024-08-15T21:15:35Z

Yes you are refreshing after every integration addition

joostlek · 2024-08-15T21:18:22Z

But if there isn't any limit, let's just not do smart things

drozycki · 2024-08-16T04:12:18Z

I updated this PR to use a local copy of _route_stops and avoid the race condition that way. Manual testing and the pytest suite all pass.

gjohansson-ST · 2024-08-16T11:38:24Z

> I updated this PR to use a local copy of _route_stops and avoid the race condition that way. Manual testing and the pytest suite all pass.

Much better to solve the issue. Some questions still outstanding but that can be handled in follow-up so if you're fine with this PR don't forget to click "Ready for review" 👍

IamTheFij · 2024-08-16T18:16:45Z

Were you able to verify this solves the problem? I'm pretty sure it should, but once you've confirmed click "Ready for Review" as @gjohansson-ST said.

homeassistant/components/nextbus/coordinator.py

Changes addressed

home-assistant bot added the bugfix, cla-signed, integration: nextbus, small-pr, and Quality Scale: No score labels Aug 11, 2024

gjohansson-ST added this to the 2024.8.2 milestone Aug 11, 2024

gjohansson-ST previously requested changes Aug 11, 2024

home-assistant bot marked this pull request as draft August 11, 2024 12:57

drozycki force-pushed the race branch from 07772ba to e793a20 Compare August 14, 2024 09:00

drozycki marked this pull request as ready for review August 14, 2024 09:04

home-assistant bot requested a review from gjohansson-ST August 14, 2024 09:04

gjohansson-ST marked this pull request as draft August 14, 2024 18:57

drozycki added 3 commits August 15, 2024 20:16

Skip NextBus update if integration is still loading

b4bbef7

Fixes a race between the loading thread and update thread leading to an unrecoverable error

Use async_at_started

8c6796e

Use local copy of _route_stops to avoid NextBus race condition

a854587

drozycki force-pushed the race branch from e793a20 to a854587 Compare August 16, 2024 03:50

frenck removed this from the 2024.8.2 milestone Aug 16, 2024

drozycki marked this pull request as ready for review August 18, 2024 11:06

joostlek reviewed Aug 18, 2024

homeassistant/components/nextbus/coordinator.py (outdated, resolved)

Update homeassistant/components/nextbus/coordinator.py

acf3921

joostlek approved these changes Aug 18, 2024

joostlek added this to the 2024.8.3 milestone Aug 18, 2024

joostlek merged commit 04b0760 into home-assistant:dev Aug 18, 2024
26 checks passed

drozycki deleted the race branch August 19, 2024 02:13

github-actions bot locked and limited conversation to collaborators Aug 20, 2024

balloob added the cherry-picked label Aug 25, 2024
