Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The sensor stops updating after some time and requires a restart of the server #12

Closed
b-per opened this issue Dec 5, 2019 · 35 comments
Closed

Comments

@b-per
Copy link

b-per commented Dec 5, 2019

Hi,

I have been using this custom component for the past week as the official one stopped reporting data after a couple of hours after each restart.

The custom component works really well for me but usually, after running HA for 2-3 days the sensor stops updating as well. I am running HA in a docker container on my Raspberry Pi 3.

If I connect to my docker container and run hcitool lescan --duplicates and a hcidump --raw I can still see some data in my terminal.

For now, my only temporary solution is to restart my docker container to get the sensor to update itself again. Is there anything else I could do, or any command I could run to make the sensor work again without restarting?

Ben

@b-per b-per changed the title The sensor stops updating after some time and require a restart of the server The sensor stops updating after some time and requires a restart of the server Dec 5, 2019
@Magalex2x14
Copy link
Collaborator

Hello! And the log is nothing suspicious, right?

  1. OK. Firstly let's try to install latest beta - you can do this by enable "Show beta" option on the custom component page in HACS (there are quite a few changes that will be in the release soon).
    69920862-fcaf7680-149d-11ea-9f93-82576528ff45
    69920869-0507b180-149e-11ea-8e1b-b3b042fabc3b
    Let's see if that helps. Among other changes, the RSSI display for each sensor (attribute) was added to the beta, which helps to determine the presence of reception problems. According to my statistics, if RSSI is less than -95dBm, then you should expect problems with updating the sensor readings. But this, it seems to me, is not your case (since restarting helps).

  2. It is worth checking whether another component using Bluetooth is turned on (for example, BT presence tracker). I do not exclude the possibility of conflict (there were messages hinting at this in the forum). It is worth trying to disable such a component, and check whether this solves the problem.

I will think further, and write if additional ideas appear...

@Magalex2x14
Copy link
Collaborator

Magalex2x14 commented Dec 6, 2019

Personally, I sometimes have a problem with the BT-interface, not related to the operation of the component and HA (very rarely. Moreover, it usually occurs when nobody is in the house 🤷‍♂️, syslog then begins to fill up with BT-errors). My workaround is described here. I think it's worth checking if juggling the BT interface helps, rather than restarting the container. But I can’t say exactly how to do it correctly in the case of the container, I have no experience with Docker. Perhaps we faced a problem in this area...

@b-per
Copy link
Author

b-per commented Dec 8, 2019

Thanks for the answer.

I have installed the Beta and the latest reading gives me a rssi of -74.

To answer your first question, yes, I am using the bluetooth also for other components, more specifically for miflora (that uses the same code base as the official mitemp_bt component) and Switchbot. But what is interesting is that when my mitemp_bt sensor based on the custom component starts not updating the other bluetooth components continue to work.

I looked at your post. Docker on RPI is slightly different and I can restart the bluetooth through the command sudo systemctl restart bluetooth. But even running this command doesn't make the component work again once it starts failing.

I also followed this comment when I was trying to fix the official component and made my bluetooth auto restart every couple of hours. I might try to deactivate this behaviour to see if it fixes the problem.

Would activating debug mode for the custom component provide me more info on why it is failing after some time?

@Magalex2x14
Copy link
Collaborator

Would activating debug mode for the custom component provide me more info on why it is failing after some time?

I don’t think so... Most likely, we will see the normal operation of the component. But you can try. I think the problem is that hcidump for some reason stops receiving BT LE advertisements, and starting hcitool after the problem occurs leads to nothing. That is, the component receives an "empty" dump, where there is nothing to process... At this moment, the component does not receive any information about the result of the execution of hcitool and hcidump it calls (except for the dump itself). I’ll think about how to do it right.

@Magalex2x14
Copy link
Collaborator

@camfrout please try latest 0.4.0-beta.2 (just released) - there are some changes that may affect. And turn on debugging for the component - let's see at what stage the data are missing (does the component really get nothing, or are entities not updated for some reason).

@b-per
Copy link
Author

b-per commented Dec 8, 2019

Sure. I just installed 0.4.0-beta.2 and kept the debug logs activated.
I'll wait for the sensor to stop updating and will let you know.

@b-per
Copy link
Author

b-per commented Dec 10, 2019

Hi.
I added mitemp_bt: debug to my config file but didn't see any debug log related to mitemp_bt in Home Assistant. The component failed after just a few hours last time and the error that I saw in the logs was the following:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/config/custom_components/mitemp_bt/sensor.py", line 573, in update_ble
    discover_ble_devices(config)
  File "/config/custom_components/mitemp_bt/sensor.py", line 302, in discover_ble_devices
    data = parse_raw_message(msg)
  File "/config/custom_components/mitemp_bt/sensor.py", line 185, in parse_raw_message
    xvalue_length = int(data[xdata_point+4:xdata_point+6], 16)
ValueError: invalid literal for int() with base 16: '

@Magalex2x14
Copy link
Collaborator

Hmm... Very interesting ...
The first oddity is that before the line on which the error occurred, there should have been a lot of debug-messages (a lot! after a few hours).
The second oddity is that there cannot be an apostrophe in the dump (especially since each message from the dump is checked for validity for several reasons before that)...
Need to think. Thanks!

P.S. please verify if you enabled debugging like this, for example:

logger:
  default: warn
  logs:
    custom_components.mitemp_bt: debug

@b-per
Copy link
Author

b-per commented Dec 10, 2019

Thanks!
I didn't know that it was custom_components.mitemp_bt: debug that needed to be added instead ofmitemp_bt: debug. My logs are now flooded with debug data :-) I will let you know if I see some useful information there. On a side note, it seems that the component also manages to read the data from my multiple mifora plant sensors and correctly ignore those from being processed (but it means that your component could be spined off to also read and store the data from the plant sensors).

@b-per
Copy link
Author

b-per commented Dec 10, 2019

It seems to fail more than before now. Just a few minutes after restarting HA it has stopped updating. Here is what I have in the log:

2019-12-10 21:19:15 DEBUG (SyncWorker_16) [custom_components.mitemp_bt.sensor] NEW DATA: {'rssi': -79, 'mac': '4C65A8D8793E', 'type': 'LYWSDCGQ', 'packet': 246, 'temperature': 21.5, 'humidity': 59.4}
2019-12-10 21:19:15 DEBUG (SyncWorker_16) [custom_components.mitemp_bt.sensor] NEW DATA: {'rssi': -89, 'mac': 'C47C8D62E9B2', 'type': 'HHCCJCY01', 'packet': 96, 'temperature': 19.7}
2019-12-10 21:19:15 DEBUG (SyncWorker_16) [custom_components.mitemp_bt.sensor] DUPLICATE: {'rssi': -75, 'mac': 'C47C8D626CED', 'type': 'HHCCJCY01', 'packet': 179, 'illuminance': 9}, IGNORING!
2019-12-10 21:19:15 DEBUG (SyncWorker_16) [custom_components.mitemp_bt.sensor] DUPLICATE: {'rssi': -73, 'mac': 'C47C8D626CED', 'type': 'HHCCJCY01', 'packet': 179, 'illuminance': 9}, IGNORING!
2019-12-10 21:19:15 DEBUG (SyncWorker_16) [custom_components.mitemp_bt.sensor] NEW DATA: {'rssi': -80, 'mac': '4C65A8D8793E', 'type': 'LYWSDCGQ', 'packet': 247, 'temperature': 21.5, 'humidity': 59.3}
2019-12-10 21:19:15 DEBUG (SyncWorker_16) [custom_components.mitemp_bt.sensor] DUPLICATE: {'rssi': -59, 'mac': 'C47C8D6280CA', 'type': 'HHCCJCY01', 'packet': 148, 'illuminance': 172}, IGNORING!
2019-12-10 21:19:15 DEBUG (SyncWorker_16) [custom_components.mitemp_bt.sensor] DUPLICATE: {'rssi': -77, 'mac': 'C47C8D62E9B2', 'type': 'HHCCJCY01', 'packet': 96, 'temperature': 19.7}, IGNORING!
2019-12-10 21:19:15 DEBUG (SyncWorker_16) [custom_components.mitemp_bt.sensor] DUPLICATE: {'rssi': -73, 'mac': 'C47C8D626CED', 'type': 'HHCCJCY01', 'packet': 179, 'illuminance': 9}, IGNORING!
2019-12-10 21:19:15 DEBUG (SyncWorker_16) [custom_components.mitemp_bt.sensor] DUPLICATE: {'rssi': -59, 'mac': 'C47C8D6280CA', 'type': 'HHCCJCY01', 'packet': 148, 'illuminance': 172}, IGNORING!
2019-12-10 21:19:15 DEBUG (SyncWorker_16) [custom_components.mitemp_bt.sensor] DUPLICATE: {'rssi': -62, 'mac': 'C47C8D6280CA', 'type': 'HHCCJCY01', 'packet': 148, 'illuminance': 172}, IGNORING!
2019-12-10 21:19:15 DEBUG (SyncWorker_16) [custom_components.mitemp_bt.sensor] NEW DATA: {'rssi': -72, 'mac': 'C47C8D626CED', 'type': 'HHCCJCY01', 'packet': 180, 'moisture': 9}
2019-12-10 21:19:15 DEBUG (SyncWorker_16) [custom_components.mitemp_bt.sensor] DUPLICATE: {'rssi': -73, 'mac': 'C47C8D62E9B2', 'type': 'HHCCJCY01', 'packet': 96, 'temperature': 19.7}, IGNORING!
2019-12-10 21:19:15 DEBUG (SyncWorker_16) [custom_components.mitemp_bt.sensor] DUPLICATE: {'rssi': -73, 'mac': 'C47C8D626CED', 'type': 'HHCCJCY01', 'packet': 180, 'moisture': 9}, IGNORING!
2019-12-10 21:19:15 ERROR (MainThread) [homeassistant.core] Error doing job: Future exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/config/custom_components/mitemp_bt/sensor.py", line 573, in update_ble
    discover_ble_devices(config)
  File "/config/custom_components/mitemp_bt/sensor.py", line 302, in discover_ble_devices
    data = parse_raw_message(msg)
  File "/config/custom_components/mitemp_bt/sensor.py", line 185, in parse_raw_message
    xvalue_length = int(data[xdata_point+4:xdata_point+6], 16)
ValueError: invalid literal for int() with base 16: ''

Could it be that the code tries to parse the answer from another sensor rather than the temperature one and then gets an error because the data format is different?

@Magalex2x14
Copy link
Collaborator

Yes, 0.4.0 beta supports all (known) BLE broadcasting Xiaomi sensors.
Duplicate data from MiFlora is ignored (for some reason it constantly spams with identical packets).
Let's do this: start a dump for a few minutes, as I described in #7 , and send it to me - I'll take a look at it.

@b-per
Copy link
Author

b-per commented Dec 10, 2019

Here are the logs after running for about 1 min: https://www.dropbox.com/s/6ub9ueabv1vs9d5/dump.txt?dl=0

@Magalex2x14
Copy link
Collaborator

Received, thank you. Today later I’ll take a look and will let you know.

@Magalex2x14
Copy link
Collaborator

@camfrout @Ernst79
OK, guys... Perhaps I found the reason for our troubles... Used by component SpooledTemporaryFile contains a bug, because of which there is a chance of damage the dump... By the solution I see a rollback to a simple TemporaryFile, but this can increase the number of file system write operations (which may be important for devices like raspberryPI with SD cards. This can be bypassed by mounting /tmp as tmpfs, i.e. /tmp will be located in memory on OS level, which is useful in any case, as it seems to me).
Unfortunately, I have not yet found another good way to redirect dump output, since the PIPE buffer is limited in size, and even my minute dump doesn’t fit there - that is, you need to receive the dump in batches, and this is multi-threading and the probability of blocking code execution, and my recent experiments with multithreading were unsuccessful (my knowledge is still scarce).
Summarizing:
I will soon push a new beta with a rollback to the TemporaryFile, and we will see if this helps.

Magalex2x14 added a commit that referenced this issue Dec 10, 2019
instead of bugged(#12) SpooledTemporaryFile
@Magalex2x14
Copy link
Collaborator

Done. Please try v0.4.0-beta.3

@Ernst79
Copy link
Collaborator

Ernst79 commented Dec 10, 2019

Thanks, updated. Will let you know if it keeps working.

@Ernst79
Copy link
Collaborator

Ernst79 commented Dec 10, 2019

@Magalex2x14 Unfortunately, same error again with newest beta 3.

Error doing job: Future exception was never retrieved
Traceback (most recent call last):
  File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/homeassistant/.homeassistant/custom_components/mitemp_bt/sensor.py", line 575, in update_ble
    discover_ble_devices(config)
  File "/home/homeassistant/.homeassistant/custom_components/mitemp_bt/sensor.py", line 304, in discover_ble_devices
    data = parse_raw_message(msg)
  File "/home/homeassistant/.homeassistant/custom_components/mitemp_bt/sensor.py", line 185, in parse_raw_message
    xvalue_length = int(data[xdata_point+4:xdata_point+6], 16)
ValueError: invalid literal for int() with base 16: ''

Seems that it is having an empty value, where it expects an integer.

@Magalex2x14
Copy link
Collaborator

It expects hexadecimal number (base 16). I will dig further.

@Magalex2x14
Copy link
Collaborator

I will prepare a code that will catch this exception and save the contents of the dump to a file so you can send it to me ...

@Magalex2x14
Copy link
Collaborator

OK. Can you, please, replace
line 185 xvalue_length = int(data[xdata_point+4:xdata_point+6], 16)
with

        try:
            xvalue_length = int(data[xdata_point+4:xdata_point+6], 16)
        except ValueError as error:
            _LOGGER.critical("ValueError exception: %s", error)
            _LOGGER.critical("DATA: '%s'", data)
            _LOGGER.critical("HEX? xvalue_length: '%s'", data[xdata_point+4:xdata_point+6])
            _LOGGER.critical("xvalue_typecode: '%s'", xvalue_typecode)
            _LOGGER.critical("xdata_length: '%s'", xdata_length)
            _LOGGER.critical("xdata_point: '%s'", xdata_point)
            _LOGGER.critical("result: '%s'", result)
            break

Wait for the error occurs. Then you can reinstall component using HACS to revert changes.

@Ernst79
Copy link
Collaborator

Ernst79 commented Dec 10, 2019

Running it now, will keep you posted. Are you reverting the last change, or do you keep the TemporaryFile ?

@Magalex2x14
Copy link
Collaborator

After reading, I’m not sure now whether to return) The last posts in the topic about a bug for about a week... In addition, I’m not sure now that SpooledTemporaryFile really works with memory in our case:

This function operates exactly as TemporaryFile() does, except that data is spooled in memory until the file size exceeds max_size, or until the file’s fileno() method is called, at which point the contents are written to disk and operation proceeds as with TemporaryFile().

The fileno() method seems to be called in subprocess.popen (which runs our hcidump).

@doul59

This comment has been minimized.

@Ernst79
Copy link
Collaborator

Ernst79 commented Dec 10, 2019

What does your esphome config has to do with this component? Which component is not working? Esphome or mitemp_bt?

@b-per
Copy link
Author

b-per commented Dec 11, 2019

I have also installed the latest beta and changed the code as you asked. It has now been running for a few hours and is still getting updated without any problem but I could find this in the log anyway:

2019-12-11 15:20:16 CRITICAL (SyncWorker_0) [custom_components.mitemp_bt.sensor] ValueError exception: invalid literal for int() with base 16: ''
2019-12-11 15:20:16 CRITICAL (SyncWorker_0) [custom_components.mitemp_bt.sensor] DATA: '043E2802010000CA80628D7CC41C020106030295FE141695FE71209800FECA80628D7CC40D09100100C604'
2019-12-11 15:20:16 CRITICAL (SyncWorker_0) [custom_components.mitemp_bt.sensor] HEX? xvalue_length: ''
2019-12-11 15:20:16 CRITICAL (SyncWorker_0) [custom_components.mitemp_bt.sensor] xvalue_typecode: 'C6'
2019-12-11 15:20:16 CRITICAL (SyncWorker_0) [custom_components.mitemp_bt.sensor] xdata_length: '8.0'
2019-12-11 15:20:16 CRITICAL (SyncWorker_0) [custom_components.mitemp_bt.sensor] xdata_point: '82'
2019-12-11 15:20:16 CRITICAL (SyncWorker_0) [custom_components.mitemp_bt.sensor] result: '{'rssi': 4, 'mac': 'C47C8D6280CA', 'type': 'HHCCJCY01', 'packet': 254}'

@Ernst79
Copy link
Collaborator

Ernst79 commented Dec 11, 2019

Mine has also given the error twice last night, but continues to work.

2019-12-11 04:14:19 CRITICAL (SyncWorker_15) [custom_components.mitemp_bt.sensor] ValueError exception: invalid literal for int() with base 16: ''
2019-12-11 04:14:19 CRITICAL (SyncWorker_15) [custom_components.mitemp_bt.sensor] DATA: '043E22020100009BB8DDA8654C16020106121695FE5020AA01759BB8DDA8654C0A04003E24'
2019-12-11 04:14:19 CRITICAL (SyncWorker_15) [custom_components.mitemp_bt.sensor] HEX? xvalue_length: ''
2019-12-11 04:14:19 CRITICAL (SyncWorker_15) [custom_components.mitemp_bt.sensor] xvalue_typecode: '3E'
2019-12-11 04:14:19 CRITICAL (SyncWorker_15) [custom_components.mitemp_bt.sensor] xdata_length: '7.0'
2019-12-11 04:14:19 CRITICAL (SyncWorker_15) [custom_components.mitemp_bt.sensor] xdata_point: '70'
2019-12-11 04:14:19 CRITICAL (SyncWorker_15) [custom_components.mitemp_bt.sensor] result: '{'rssi': 36, 'mac': '4C65A8DDB89B', 'type': 'LYWSDCGQ', 'packet': 117}'

@Magalex2x14
Copy link
Collaborator

Thanks, guys. This is just what I need.

@Ernst79
Copy link
Collaborator

Ernst79 commented Dec 11, 2019

@Magalex2x14 One thing I notice is that with the last change (try..., except...), the code at least continues to operate, due to the break. I reverted back to the beta3, and now it fails and stops right after the error occurs. So, even if you can't figure out what happens, we can always include some error handling and continue I guess.

@Magalex2x14
Copy link
Collaborator

Yes I understand. Also considered this option. But I decided to first look at the "broken" packets. And they are strange, they do not fit in a well-known format. Today I’ll decide what to do with it.

@Magalex2x14
Copy link
Collaborator

0.4.0-beta.4 is published (improved malformed or unknown messages detection). I decided to leave TemporaryFile (not spooled), as it is looks more reliable, and it will become unnecessary when we solve #9 and #14.

@Ernst79
Copy link
Collaborator

Ernst79 commented Dec 12, 2019

It's running for 24 hours now, without any errors or stopping sensor. Seems to be solved for me.

@b-per
Copy link
Author

b-per commented Dec 15, 2019

Hi!

I was running on the beta 3 with the change of code recommended here: #12 (comment) and it worked for several days without any issue.

I now moved to the beta 4 and got the following error after about 2 hours. The component then stops updating and I need to restart HA.

2019-12-15 14:31:06 ERROR (MainThread) [homeassistant.core] Error doing job: Future exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/config/custom_components/mitemp_bt/sensor.py", line 590, in update_ble
    discover_ble_devices(config)
  File "/config/custom_components/mitemp_bt/sensor.py", line 534, in discover_ble_devices
    setattr(sensors[2], "_state", condstate_mean)
IndexError: list index out of range

@Magalex2x14
Copy link
Collaborator

@camfrout here is beta.5... Please, try it, and let me know if there are messages from custom_component.mitemp_bt in the log. I added the processing of your exception with the corresponding message - the component will not die.
Thank you for feedback!

@Ernst79
Copy link
Collaborator

Ernst79 commented Dec 22, 2019

@camfrout version 0.4.0 (final) has been released. Can you confirm it is solved on your end and close the issue?

@b-per
Copy link
Author

b-per commented Dec 23, 2019

Hey. I've been running version 0.4.0 for more than 12 hours without any issue so far so I will close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants