
Simple and quick duplicate check #908

Open · wants to merge 4 commits into master
Conversation

merbanan
Owner

Rudimentary tested with:

rtl_433 -q -Y 1 -r rtl_433_tests/tests/honeywell_activlink/01/secret-press_868.428M_250k.cu8

It happily eats the duplicates, and I am quite sure this will filter out 99% of other signal duplicates. If you have several sensors of the same protocol that cycle and produce duplicates, then eventually some signals will get filtered that shouldn't be.

There is also the option of adding selective filtering per protocol. That way every protocol decoder should be able to output one signal per sample. Doing it globally might filter some protocols that output different messages one after the other.

@merbanan
Owner Author

@zuckschwerdt the code seems to work, but it has not been tested in the field. Do you know of any protocols where this would break functionality?

@zuckschwerdt
Collaborator

This is a much-requested feature and the code looks great. I can't think of a case where this might break a decoder. I'll run this for a day here.

@zuckschwerdt
Collaborator

@merbanan it looks like you accidentally put an rtl_433_tests commit on top of this?

@zuckschwerdt
Collaborator

The registered flex decoders have protocol_num==0; would it perhaps be better to use the r_device pointer as the hash key? The hash function will continue to work, and 101 is chosen well, e.g. (uintptr_t)&cfg.devices[i] % 101 shows a uniform distribution.

@merbanan
Owner Author

Gotta add that to .gitignore.

@zuckschwerdt
Collaborator

I have also checked out rtl_433_tests below rtl_433. Git seems to ignore directories that are a repo of their own.

@merbanan
Owner Author

It did not work for my git version, and I updated the code according to your suggestions.

@zuckschwerdt
Collaborator

Perhaps the (int)(uintptr_t) cast should be commented with something like, "intentionally drop the high order bits on LP64".

Playing with this I noticed that this silences all messages from one decoder for a time. We should perhaps also hash some data portion, like "id" -- if we can get at that? Maybe add a helper in data.c to iterate and hash well-known field values?

@merbanan
Owner Author

It silences it for the argument value in seconds. We could add a much more advanced dupcheck, but that would require much more parsing, which equals more CPU time. I have been pondering adding a dupcheck flag to specific decoders whose transmitting counterparts emit long signals. Then we could enable this filtering by default for those signals, and use -Y 0 to disable the duplicate filter or -Y >0 to globally filter duplicates. It is about 10 more lines of code, and the out-of-the-box behavior would be one signal gives one message, with overrides when needed.

So what do you think is best? Do we want that flag to be available for decoders? It might be misused. But I think the user experience will be better if we implement it.

The special cast is there to silence a warning; the reduction of bits is a side effect, but I'll add a comment.

@zuckschwerdt
Collaborator

People might have multiple sensors that only differ in the ID, and there are sensors that send multiple packets of different types (wind, rain, ...) that don't even differ in the id but in other payload.

We could implement a data_hash function to traverse the data, pick up all values, and hash that.
It could be built similarly to a data output.

Or we could retain the last sent message per decoder and do a tree compare.

Another idea (like your flag, I guess): have the decoders which are known for long repeats add a "hash" themselves and store that. It will likely just be the whole 24 bits of the message :) With no hash, always pass through; otherwise, if the hash matches, throttle.
That would require adding this to all decoders, but it also puts the "hash" where it is most efficient and perhaps even allows that value to be passed downstream for collation there.

@merbanan
Owner Author

The thing is, I don't want to do a data_make() to create a structure that we later need to parse just to be able to hash it properly. We need to add the thing that is to be hashed in the proper place so that it is fast to get at later. So no tree compare, just 32 bits with as high entropy as possible.

If we add a member to data_t we can populate it after data_make() in the relevant decoders. Then later it is just a dereference and possibly a hash to know if it is a duplicate. I think that addresses all your points. If so, I will implement that.

@gabrielklein

Sorry to ask, what is the status of this pull request?

@zuckschwerdt
Collaborator

It's still recommended to do duplicate (and plausibility) filtering in your consumer. The idea here needs more thought and possibly a change to all existing decoders.

rtl_433's task is to decode every possible radio packet and give it to you. Coalescing and aggregation is an interesting topic but likely dependent on the specific use case.

If you have some ideas and don't want to touch the C code, perhaps prototype a Python filter script (examples) for discussion.

@gabrielklein

I understand your point and agree that "multi-firing" is often part of the protocol.

I saw a few different cases in my items.

  • Multi-firing is used as a way to add redundancy. As an example, I have a temperature module that sends a temperature 5-6 times every minute. => We should send it only once.
  • Multi-firing is used as a way to have a "still pressing" approach. => We should send the "start" and the "end" in some cases, just the start in others.

It's often just a repetition of the same message.

@zuckschwerdt
Collaborator

If the data packet is repeated in the same transmission we already only output a single event. When there is too much time between repeats we can't catch that in a single decoder call, though. We'd need to suppress subsequent events. But that raises the question of which decoders, and what message contents, we consider a "repeat". That's not easy to capture and decide.

There are ideas like maybe some opt-in from decoders: a decoder could add a "hash" key if dups are expected and the hash should then associate those. But that needs some prototyping and testing.

@gabrielklein

I totally agree - I think we cannot avoid some manual configuration at some point :( This project is trying to build a "Tower of Babel" (and is quite successful) - but I already see that some of my sensors are not well decoded and I have to blacklist some decoders :(

When I see the signal of some of my devices, it's just an id basically. Everything could be decoded as an id.

@zuckschwerdt
Collaborator

> some of my sensors not well decoded and I have to blacklist some decoders :(

We already filter out the worst offenders (prone to false positives). That's why we really don't recommend using -G. If you see systematic false positives, open an issue with samples.

> When I see the signal of some of my devices, it's just an id basically. Everything could be decoded as an id.

I don't follow. Do you mean beacons, motion, doorbell-style senders?

@gabrielklein

Yes, I have a doorbell and a few motion sensors :)

  1. Lots of problems with [86] Wireless Smoke and Heat Detector GS 558; some of these just have an id.
  2. I can get an id with urh (that is similar to the id detected by rtl_433 in a -vvvv mode).

But I think the discussion is out of the scope of this ticket now. Sorry for that!

@merbanan
Owner Author

The idea is to add dupcheck for specific decoders that have known repeats longer than the signal length, and then maybe also globally for all signals. One idea I had is that the dupcheck might only add a duplicate tag. Then it is non-destructive, and we help the consumer in its filtering process.

Anyway, dupcheck would be a nice option for people who would like to use it.

@unnilennium

There is already an option to convert values to SI; this is helpful if you have some sensors that output in °F and some in °C.

I would be more than happy to have this filter added as an option that you can enable if you want, the same way you can for converting output to SI.

I have a few sensors with this issue:

acurite_txr_decode: row 0 bits 57, bytes 8 
acurite_txr_decode: Parity: 1000000
{"time" : "2019-09-03 22:29:20", "model" : "Acurite tower sensor", "id" : 11303, "sensor_id" : 11303, "channel" : "C", "temperature_C" : 17.000, "humidity" : 55, "battery_low" : 0}
acurite_txr_decode: row 1 bits 57, bytes 8 
acurite_txr_decode: Parity: 1000000
{"time" : "2019-09-03 22:29:20", "model" : "Acurite tower sensor", "id" : 11303, "sensor_id" : 11303, "channel" : "C", "temperature_C" : 17.000, "humidity" : 55, "battery_low" : 0}
acurite_txr_decode: row 2 bits 57, bytes 8 
acurite_txr_decode: Parity: 1000000
{"time" : "2019-09-03 22:29:20", "model" : "Acurite tower sensor", "id" : 11303, "sensor_id" : 11303, "channel" : "C", "temperature_C" : 17.000, "humidity" : 55, "battery_low" : 0}
acurite_986_decode: row 0 bits 2, bytes 5 
acurite_986_decode: row 1 bits 0, bytes 5 
acurite_986_decode: row 2 bits 0, bytes 5 
acurite_986_decode: row 3 bits 0, bytes 5 
acurite_986_decode: row 4 bits 42, bytes 5 
acurite_986_decode: reversed: {40} 06 60 f7 00 15 : 00000110 01100000 11110111 00000000 00010101 
acurite_986_decode: sensor 0x60f7 - 1R: 6 F
{"time" : "2019-09-03 22:30:38", "model" : "Acurite 986 Sensor", "id" : 24823, "channel" : "1R", "temperature_C" : -14.444, "battery" : "OK", "status" : 0}
acurite_986_decode: row 5 bits 0, bytes 5 
acurite_986_decode: row 6 bits 0, bytes 5 
acurite_986_decode: row 7 bits 0, bytes 5 
acurite_986_decode: row 8 bits 39, bytes 5 
acurite_986_decode: reversed: {40} 06 60 f7 00 15 : 00000110 01100000 11110111 00000000 00010101 
acurite_986_decode: sensor 0x60f7 - 1R: 6 F
{"time" : "2019-09-03 22:30:38", "model" : "Acurite 986 Sensor", "id" : 24823, "channel" : "1R", "temperature_C" : -14.444, "battery" : "OK", "status" : 0}

@bendur

bendur commented Nov 30, 2023

Just chiming in that I'd love to have a flag that filters out duplicates. I guess I could do it on the client side but I'd rather not have to do that.

@Mart124

Mart124 commented Jan 15, 2024

> Just chiming in that I'd love to have a flag that filters out duplicates. I guess I could do it on the client side but I'd rather not have to do that.

+1 😉 👍

Labels: wip (Work In Progress, for PRs only)
6 participants