Sound Markers #13

ankushdineshrana · 2021-01-20T12:40:14Z

Hi Georgi,

First of all, kudos to you for such amazing work. I have tried various publicly available open-source Data-over-Sound offerings myself, but only ggwave works seamlessly.

Now comes the point of discussion:

Georgi, I read that you use sound markers to record and analyze the sound captured between them. Could you shed some more light on how this actually works in your code, and on whether we can customize it to our needs?

Thanks a lot.

Regards,
Ankush

ggerganov (Maintainer) · 2021-01-20T20:43:16Z

Hi Ankush,

First - thank you very much for the feedback! This kind of information is very useful to me, since I can only test ggwave on a limited number of devices and environments. Knowing whether it works for other people helps to improve the robustness of the protocol.

The sound markers currently used in ggwave are the main thing I would like to potentially change in the future. The main reason is that I am not happy with how they currently sound for audible protocols. Still, I think they are quite robust and for now they seem to work pretty OK.

The main purpose of the sound markers is to provide a cheap (in terms of CPU) way to determine if there is an incoming audio message. As audio samples are being continuously captured, ggwave looks for begin and end markers and records everything in-between so it can be analyzed after the recording is completed. The begin and end sound markers are slightly different.
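
The capture loop described above can be pictured as a small state machine. This is only a minimal sketch and not the actual ggwave code: the `Marker` enum, the `FrameRecorder` class, and the idea that a separate detector labels each frame are all assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical label produced by a marker detector for one captured frame.
enum class Marker { None, Begin, End };

// Hypothetical helper: idles until a begin marker is seen, then stores
// frames until an end marker arrives.
class FrameRecorder {
public:
    // Feed one frame together with its marker label.
    // Returns true once a complete recording is ready for analysis.
    bool feed(const std::vector<float>& frame, Marker marker) {
        assert(!frame.empty());
        if (!receiving_) {
            if (marker == Marker::Begin) {
                receiving_ = true;   // start a new recording
                recorded_.clear();
            }
            return false;
        }
        if (marker == Marker::End) {
            receiving_ = false;      // recording is complete
            return true;
        }
        if (marker == Marker::Begin) {
            return false;            // begin marker still playing, skip it
        }
        recorded_.insert(recorded_.end(), frame.begin(), frame.end());
        return false;
    }

    const std::vector<float>& recorded() const { return recorded_; }

private:
    bool receiving_ = false;
    std::vector<float> recorded_;
};
```

The only per-frame work while idle is the marker check, which is where the low CPU cost comes from; the expensive analysis runs once `feed` returns true.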

The begin marker uses the first 32 frequency bins: F0 + 00*df to F0 + 31*df. We produce a waveform which consists of 16 frequencies:

F0 + 00*df
F0 + 03*df
F0 + 04*df
F0 + 07*df
F0 + 08*df
F0 + 11*df
F0 + 12*df
F0 + 15*df
F0 + 16*df
F0 + 19*df
F0 + 20*df
F0 + 23*df
F0 + 24*df
F0 + 27*df
F0 + 28*df
F0 + 31*df

ggwave/src/ggwave.cpp, lines 435 to 444 in 2811934:

    if (frameId < m_nMarkerFrames) {
        nFreq = m_nBitsInMarker;
        for (int i = 0; i < m_nBitsInMarker; ++i) {
            if (i%2 == 0) {
                ::addAmplitudeSmooth(bit1Amplitude[i], m_outputBlock, m_sendVolume, 0, samplesPerFrameOut, frameId, m_nMarkerFrames);
            } else {
                ::addAmplitudeSmooth(bit0Amplitude[i], m_outputBlock, m_sendVolume, 0, samplesPerFrameOut, frameId, m_nMarkerFrames);
            }
        }
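
The list of 16 begin-marker frequencies follows a simple pattern that can be reproduced in a few lines. This is a sketch under an assumption that is inferred from the list rather than stated in the quoted code: marker bit `i` owns the bin pair `(2*i, 2*i + 1)`, with even bits using the lower bin and odd bits the upper one. The function name `beginMarkerOffsets` is made up for this example.

```cpp
#include <cassert>
#include <vector>

// Offsets k such that the begin marker contains the frequency F0 + k*df.
// Assumption: bit i owns bins 2*i and 2*i + 1; even i uses the lower bin,
// odd i uses the upper bin.
std::vector<int> beginMarkerOffsets(int nBitsInMarker) {
    assert(nBitsInMarker > 0);
    std::vector<int> offsets;
    for (int i = 0; i < nBitsInMarker; ++i) {
        offsets.push_back(2 * i + (i % 2));
    }
    return offsets;
}
```

For 16 marker bits this yields 0, 3, 4, 7, 8, 11, ..., 28, 31, which is the list given above.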

The end marker consists again of 16 frequencies - the ones that are missing in the begin marker:

ggwave/src/ggwave.cpp, lines 492 to 499 in 2811934:

    int fId = frameId - ((m_nMarkerFrames + m_nPostMarkerFrames) + ((m_sendDataLength + m_nECCBytesPerTx)/m_txProtocol.bytesPerTx + 2)*m_txProtocol.framesPerTx);
    for (int i = 0; i < m_nBitsInMarker; ++i) {
        if (i%2 == 0) {
            addAmplitudeSmooth(bit0Amplitude[i], m_outputBlock, m_sendVolume, 0, samplesPerFrameOut, fId, m_nMarkerFrames);
        } else {
            addAmplitudeSmooth(bit1Amplitude[i], m_outputBlock, m_sendVolume, 0, samplesPerFrameOut, fId, m_nMarkerFrames);
        }
    }
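
Since the end marker uses exactly the bins that the begin marker leaves out, its offsets can be derived as a set complement over the 32-bin range. The helper below is a hypothetical illustration (the name `complementOffsets` is not from ggwave) and expects the begin-marker offsets as a sorted list.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <numeric>
#include <vector>

// Returns every offset in [0, nBins) that does not appear in `used`.
// `used` must be sorted in ascending order.
std::vector<int> complementOffsets(const std::vector<int>& used, int nBins) {
    assert(nBins > 0);
    assert(std::is_sorted(used.begin(), used.end()));
    std::vector<int> all(nBins);
    std::iota(all.begin(), all.end(), 0);  // 0, 1, ..., nBins - 1
    std::vector<int> missing;
    std::set_difference(all.begin(), all.end(),
                        used.begin(), used.end(),
                        std::back_inserter(missing));
    return missing;
}
```

Applied to the begin-marker offsets 0, 3, 4, 7, ..., 28, 31 with `nBins = 32`, this gives 1, 2, 5, 6, ..., 29, 30, which corresponds to swapping `bit0Amplitude` and `bit1Amplitude` in the loop quoted above.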

The purpose of encoding the markers in this way is that one can easily detect the markers. For each frame of input samples, compute the FFT and analyze the 32 bins. To detect a begin marker:

ggwave/src/ggwave.cpp, lines 723 to 738 in 2811934:

    for (int i = 0; i < m_nBitsInMarker; ++i) {
        double freq = bitFreq(rxProtocol.second, i);
        int bin = std::round(freq*m_ihzPerSample);
        if (i%2 == 0) {
            if (m_sampleSpectrum[bin] <= 3.0f*m_sampleSpectrum[bin + m_freqDelta_bin]) --nDetectedMarkerBits;
        } else {
            if (m_sampleSpectrum[bin] >= 3.0f*m_sampleSpectrum[bin + m_freqDelta_bin]) --nDetectedMarkerBits;
        }
    }
    if (nDetectedMarkerBits == m_nBitsInMarker) {
        m_markerFreqStart = rxProtocol.second.freqStart;
        isReceiving = true;
        break;
    }

And to detect an end marker:

ggwave/src/ggwave.cpp, lines 759 to 777 in 2811934:

    for (const auto & rxProtocol : getTxProtocols()) {
        int nDetectedMarkerBits = m_nBitsInMarker;
        for (int i = 0; i < m_nBitsInMarker; ++i) {
            double freq = bitFreq(rxProtocol.second, i);
            int bin = std::round(freq*m_ihzPerSample);
            if (i%2 == 0) {
                if (m_sampleSpectrum[bin] >= 3.0f*m_sampleSpectrum[bin + m_freqDelta_bin]) nDetectedMarkerBits--;
            } else {
                if (m_sampleSpectrum[bin] <= 3.0f*m_sampleSpectrum[bin + m_freqDelta_bin]) nDetectedMarkerBits--;
            }
        }
        if (nDetectedMarkerBits == m_nBitsInMarker) {
            isEnded = true;
            break;
        }
    }

This detection is robust because we are comparing the relative strength of two neighboring frequencies to determine if we have a begin or end marker. So no background noise estimation is necessary.
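
To make the relative comparison concrete, here is a self-contained sketch that classifies one frame from its magnitude spectrum using the same factor-of-3 tests as the quoted snippets. It is a simplified stand-in, not the ggwave implementation: `MarkerKind`, `classifyFrame`, and `baseBin` are hypothetical names, and it assumes that marker bit `i` sits at bin `baseBin + 2*i` with its partner one bin above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class MarkerKind { None, Begin, End };

// Classifies one frame by comparing each marker bin with its upper neighbor.
// Assumptions: bit i is at bin baseBin + 2*i, the neighbor is the next bin,
// and the ratio threshold is 3 as in the quoted comparisons.
MarkerKind classifyFrame(const std::vector<float>& spectrum, int baseBin, int nBitsInMarker) {
    assert(baseBin >= 0 && nBitsInMarker > 0);
    assert(spectrum.size() >= static_cast<std::size_t>(baseBin + 2 * nBitsInMarker));
    const float ratio = 3.0f;
    int beginBits = nBitsInMarker;
    int endBits = nBitsInMarker;
    for (int i = 0; i < nBitsInMarker; ++i) {
        const float lower = spectrum[baseBin + 2 * i];
        const float upper = spectrum[baseBin + 2 * i + 1];
        if (i % 2 == 0) {
            if (lower <= ratio * upper) --beginBits;  // begin needs lower > 3 * upper
            if (lower >= ratio * upper) --endBits;    // end needs lower < 3 * upper
        } else {
            if (lower >= ratio * upper) --beginBits;  // begin needs lower < 3 * upper
            if (lower <= ratio * upper) --endBits;    // end needs lower > 3 * upper
        }
    }
    if (beginBits == nBitsInMarker) return MarkerKind::Begin;
    if (endBits == nBitsInMarker) return MarkerKind::End;
    return MarkerKind::None;
}
```

Because only ratios of neighboring bins are tested, scaling the whole spectrum up or down (a louder or quieter room) does not change the result, and a flat noise spectrum fails both tests.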

We emit each marker for a fixed amount of time, currently 16 frames. The shorter the marker, the lower the probability of detecting it, so it is a trade-off between robustness and speed.
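
To get a feel for what 16 frames costs in time, the duration can be computed from the frame size and the sample rate. The function below is only an illustration; neither the frame size nor the sample rate is given in this answer, so any values passed in are assumptions.

```cpp
#include <cassert>

// Duration in seconds of a marker that spans nFrames frames.
double markerDurationSeconds(int nFrames, int samplesPerFrame, double sampleRateHz) {
    assert(nFrames >= 0 && samplesPerFrame > 0 && sampleRateHz > 0.0);
    return static_cast<double>(nFrames) * samplesPerFrame / sampleRateHz;
}
```

For example, assuming 1024 samples per frame at 48000 Hz, `markerDurationSeconds(16, 1024, 48000.0)` is about 0.34 seconds per marker, so begin and end markers together would add roughly two thirds of a second to every message.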

Using fewer than 16 frequencies also reduces robustness, because a random noise fluctuation can then more easily produce a spurious begin marker. I found that 16 frequencies work well enough to keep the number of false positives small.
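
A rough way to see why more frequencies help: if each of the pairwise comparisons passed by accident with some probability, and the comparisons were independent, the chance of all of them passing in the same frame would shrink exponentially with their number. Both the independence and the per-comparison probability are assumptions of this toy model, not measurements from ggwave.

```cpp
#include <cassert>
#include <cmath>

// Chance that all nBits comparisons pass by accident in a single frame,
// assuming each passes independently with probability pPerBit.
double falseMarkerProbability(double pPerBit, int nBits) {
    assert(pPerBit >= 0.0 && pPerBit <= 1.0 && nBits >= 0);
    return std::pow(pPerBit, nBits);
}
```

With a purely illustrative `pPerBit` of 0.5, going from 8 to 16 comparisons lowers the per-frame chance from about 1 in 256 to about 1 in 65,536.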

It is possible to make the marker parameters part of the protocol parameters, so that the user can configure the markers to some extent. But as I mentioned, I want to find some other way to encode the markers so that the audio is less annoying, which is why I am still hesitating to do that.

ankushdineshrana (Author) · Jan 25, 2021

True that, the sound markers performed robustly during my trials as well. Having configurable markers would be a really nice feature. Thanks for your detailed answer, @ggerganov. This will definitely help me develop a better understanding.
