
Appropriate retry mechanism when Zaptec cloud is not working #90

Closed · Tracked by #130 · Fixed by #111
sveinse opened this issue Jan 14, 2024 · 17 comments

Labels: enhancement (New feature or request)

Comments

@sveinse (Collaborator) commented Jan 14, 2024

Discussion: What is the appropriate action when the Zaptec cloud is not responding or returns an error?

Currently the Zaptec integration logs these failures to the HA system log, which can easily be interpreted as the Zaptec integration itself being at fault. Typical errors observed are timeouts and HTTP error code 500 (internal server error). When this happens, a large number of errors and warnings are logged.

The Home Assistant integration Quality Scale states that an integration should (for Silver):

  • Handles internet unavailable. Log a warning once when unavailable, log once when reconnected.
  • Handles device/service unavailable. Log a warning once when unavailable, log once when reconnected.

So what should we do? Do we need some way to determine the health of the Zaptec cloud so that our integration can reduce the error logging?
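
For illustration, the log-once pattern could be as simple as the sketch below (the class and method names are invented for this example, not the integration's actual code):

```python
import logging

_LOGGER = logging.getLogger(__name__)


class CloudHealthTracker:
    """Tracks Zaptec cloud availability and logs state changes only once."""

    def __init__(self) -> None:
        self._available = True

    def report_success(self) -> None:
        """Call after a successful API request."""
        if not self._available:
            _LOGGER.info("Connection to Zaptec cloud re-established")
            self._available = True

    def report_failure(self, err: Exception) -> None:
        """Call after a failed API request; logs only on the first failure."""
        if self._available:
            _LOGGER.warning("Zaptec cloud is unavailable: %s", err)
            self._available = False
```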

@svenakela (Contributor) commented:

This could be solved by setting an attribute on the binary_sensor when an API call fails, and then logging the problem. When the API is working again, the timestamp can be cleared and a reconnect is logged. That would also make it easy to visualize when the API is down.

I've had some issues the last two nights with charging stopping several times, and I suspect the API. An error timestamp would have made that easy to detect.
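
A rough sketch of that idea on a Home Assistant binary sensor (the entity, attribute, and method names are invented for illustration; they are not the integration's actual entities):

```python
from datetime import datetime, timezone

from homeassistant.components.binary_sensor import (
    BinarySensorDeviceClass,
    BinarySensorEntity,
)


class ZaptecCloudSensor(BinarySensorEntity):
    """Hypothetical sensor reflecting whether the Zaptec API is reachable."""

    _attr_name = "Zaptec cloud connection"
    _attr_device_class = BinarySensorDeviceClass.CONNECTIVITY

    def __init__(self) -> None:
        self._attr_is_on = True          # True = API reachable
        self._last_api_error: datetime | None = None

    def record_api_failure(self) -> None:
        """Called when an API call fails: store the timestamp, mark as down."""
        self._last_api_error = datetime.now(timezone.utc)
        self._attr_is_on = False

    def record_api_success(self) -> None:
        """Called when the API works again: clear the timestamp."""
        self._last_api_error = None
        self._attr_is_on = True

    @property
    def extra_state_attributes(self) -> dict:
        return {"last_api_error": self._last_api_error}
```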

@Bluhme1 commented Jan 31, 2024

Agreed. The Zaptec API has been unstable during the last few days (error 500), which has produced a lot of errors in the log.

@sveinse (Collaborator, Author) commented Jan 31, 2024

Interestingly, entities for this already exist: the "* Charger", "* Circuit" and "* Installation" entities. They were added to provide a place to store the legacy attributes and currently only report "Connected". It would be a very useful extension for them to also show the status of the API connection. In hindsight it seems obvious to have this feature.

Definitely something to put on the list! Thanks.

sveinse added the "enhancement" label on Jan 31, 2024
@Bluhme1 commented Feb 23, 2024

Still getting timeouts like:

This error originates from a custom integration.

Logger: custom_components.zaptec.api
Source: custom_components/zaptec/api.py:855
Integration: Zaptec EV charger (documentation)
First occurred: 16.52.35 (1 occurrence)
Last logged: 16.52.35

Request to https://api.zaptec.com/api/installation/bf68a41d-d2f9-44f5-bfc6-20bf42e645a8 failed (attempt 1): TimeoutError:

Perhaps the Zaptec integration should be more patient before reporting an error. After a short while the connection is re-established.

@Bluhme1 commented May 5, 2024

Hi @sveinse

Did you get any further in your considerations on how to solve this issue? It's a pity that such a useful integration produces so many errors.

@JTT22 commented Jul 29, 2024

How do we get someone working on this? My logs are unusable...

@Hellowlol (Collaborator) commented:

Do you have API debug on? Anyway, I think the proper way here is to retry using a library.
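
For example, with the tenacity library that could look roughly like this (only an illustrative sketch; `_request` is a placeholder for the actual API call, and the integration may well choose a different library):

```python
import asyncio

import aiohttp
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential


@retry(
    retry=retry_if_exception_type((aiohttp.ClientError, asyncio.TimeoutError)),
    wait=wait_exponential(multiplier=1, max=30),  # 1 s, 2 s, 4 s, ... capped at 30 s
    stop=stop_after_attempt(5),
    reraise=True,
)
async def _request(session: aiohttp.ClientSession, url: str) -> dict:
    """Placeholder for the actual API call; retried on timeouts and HTTP errors."""
    async with session.get(url) as resp:
        resp.raise_for_status()  # raises ClientResponseError on e.g. 500
        return await resp.json()
```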

@JTT22 commented Jul 30, 2024

I mostly get timeout errors, or the error below, which is flooding the log:

custom_components.zaptec.api.RequestError: GET request to https://api.zaptec.com/api/circuits/KEYREMOVED failed with status 500: <ClientResponse(https://api.zaptec.com/api/circuits/KEYREMOVED) [500 Internal Server Error]>
<CIMultiDictProxy('Date': 'Tue, 30 Jul 2024 08:32:55 GMT', 'Content-Length': '0', 'Connection': 'keep-alive', 'Request-Context': 'appId=cid-KEYREMOVED', 'Strict-Transport-Security': 'max-age=15724800; includeSubDomains')>

@sveinse (Collaborator, Author) commented Jul 30, 2024

I'm sorry, I haven't had time to look at this.

Currently the design logs every time an API request to the Zaptec cloud fails, so when the connection is unstable this results in many log entries. I agree that this should change. The HA quality scale requires for Silver that the integration "Handles device/service unavailable. Log a warning once when unavailable, log once when reconnected."

@sveinse (Collaborator, Author) commented Aug 7, 2024

@Bluhme1, @JTT22 Thank you for your patience. I've merged a fix into master that should reduce the logging to a minimum. Can you please test the "master" version and see if it fixes the problem for you?

@Hellowlol (Collaborator) commented:

The best way to deal with this is to add a retry on 500 errors, so you suppress the exception and just retry later. Can someone post the entire blueprint?

sveinse added a commit to sveinse/zaptec that referenced this issue Aug 8, 2024
sveinse added a commit to sveinse/zaptec that referenced this issue Aug 8, 2024
@sveinse (Collaborator, Author) commented Aug 8, 2024

@Hellowlol In my experience, when the Zaptec cloud starts returning 500, it tends to do so for quite a while. Nevertheless, I created #111, which will retry up to 5 times before giving up. What do you think? If this makes sense I can merge it to master and have it tested by the users who are most prone to this problem.

The main objective of this issue was to control the extensive logging the API was producing when the Zaptec cloud fails; we can't really do anything about the availability of the Zaptec cloud itself. With the fix in #110, the API will no longer do its own logging, but instead leave it to the HA UpdateCoordinator to handle available/not-available messages.
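
For reference, the division of responsibility being described is roughly the sketch below: the API layer raises, and the DataUpdateCoordinator takes care of the once-only unavailable/recovered logging. All names other than HA's own classes are hypothetical here, not the integration's actual code:

```python
from homeassistant.helpers.update_coordinator import DataUpdateCoordinator, UpdateFailed


class ZaptecApiError(Exception):
    """Placeholder for whatever exception type the API layer raises."""


class ZaptecCoordinator(DataUpdateCoordinator):
    """Hypothetical coordinator: the API layer raises, HA handles the logging."""

    async def _async_update_data(self):
        try:
            # self.api.poll_all() is a placeholder for the full Zaptec poll.
            return await self.api.poll_all()
        except ZaptecApiError as err:
            # Raising UpdateFailed lets the coordinator mark entities unavailable,
            # log a warning once, and log once more when an update succeeds again.
            raise UpdateFailed(f"Zaptec cloud request failed: {err}") from err
```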

@svenakela (Contributor) commented:

A nice thing to do on retries is an exponential delay until the next retry. Otherwise all clients will hammer the API exactly when it can least handle a flood of requests. It can be done with the backoff decorator, for example, or in plain code.
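
With the backoff package, that could look roughly like this (`_do_request` is a placeholder name and the thresholds are only examples, not a recommendation for this integration):

```python
import aiohttp
import backoff


@backoff.on_exception(
    backoff.expo,  # 1 s, 2 s, 4 s, 8 s, ... between attempts
    (aiohttp.ClientResponseError, aiohttp.ClientConnectionError),
    max_tries=5,
    max_time=60,   # never keep retrying past one update interval
)
async def _do_request(session: aiohttp.ClientSession, url: str) -> dict:
    """Placeholder request helper; retried with exponentially growing delays."""
    async with session.get(url) as resp:
        resp.raise_for_status()
        return await resp.json()
```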

@sveinse (Collaborator, Author) commented Aug 8, 2024

Yes, that is a good idea. However, there are limits to how long these backoffs can be. A full poll from the HA Zaptec integration to the Zaptec cloud requires quite a few requests, and by default this happens every 60 seconds. Any backoff that introduces delays of several minutes will probably interfere with HA's update interval, and I'm not sure how the HA UpdateCoordinator responds to that.
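
A hand-rolled variant that keeps the worst-case delay inside a 60-second poll cycle could look like this sketch (all names and numbers are illustrative assumptions, not the integration's actual values):

```python
import asyncio

import aiohttp

MAX_ATTEMPTS = 5
BASE_DELAY = 1.0   # seconds
MAX_DELAY = 10.0   # cap each delay so the retries fit inside a 60 s poll cycle


async def request_with_backoff(session: aiohttp.ClientSession, url: str) -> dict:
    """Hypothetical helper: capped exponential backoff for a single API request."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            async with session.get(url) as resp:
                resp.raise_for_status()
                return await resp.json()
        except (aiohttp.ClientError, asyncio.TimeoutError):
            if attempt == MAX_ATTEMPTS - 1:
                raise
            # Delays of 1, 2, 4 and 8 s between the five attempts:
            # at most ~15 s of waiting, well inside the 60 s update interval.
            await asyncio.sleep(min(BASE_DELAY * 2**attempt, MAX_DELAY))
```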

sveinse added a commit to sveinse/zaptec that referenced this issue Oct 18, 2024
@sveinse (Collaborator, Author) commented Oct 18, 2024

The PR has now been updated with an exponential backoff mechanism.

sveinse changed the title from "Appropriate action when Zaptec cloud is not working" to "Appropriate retry mechanism when Zaptec cloud is not working" on Oct 20, 2024
sveinse mentioned this issue on Oct 20, 2024
sveinse added a commit that referenced this issue Oct 20, 2024
* Retry on API error code 500 (#90)
* Implement exponential backoff on requests
* Fix typing bug in set_authentication_required()
* Clean up code
@sveinse (Collaborator, Author) commented Oct 22, 2024

A fix for the issue has been pushed to master. Please install the "master" version and test whether it fixes the observed problems. @svenakela @Bluhme1 @JTT22

@sveinse (Collaborator, Author) commented Oct 22, 2024

It seems HACS has removed the option of downloading "master", so I've created a beta pre-release, "0.7.2b1", that is available for download and testing.
