Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make SAPI5 & MSSP voices use WavePlayer (WASAPI) #17592

Merged
merged 11 commits into from
Jan 10, 2025

Conversation

gexgd0419
Copy link
Contributor

@gexgd0419 gexgd0419 commented Jan 7, 2025

Link to issue number:

Closes #13284

Summary of the issue:

Currently, SAPI5 and MSSP voices use their own audio output mechanisms, instead of using the WavePlayer (WASAPI) inside NVDA.

This may make them less responsive compared to eSpeak and OneCore voices, which are using the WavePlayer, or compared to other screen readers using SAPI5 voices, according to my test result.

This also gives NVDA less control of audio output. For example, audio ducking logic inside WavePlayer cannot be applied to SAPI5 voices, so additional code is required to compensate for this.

Description of user facing changes

SAPI5 and MSSP voices will be changed to use the WavePlayer, which may make them more responsive (have less delay).

According to my test result, this can reduce the delay by at least 50ms.

This haven't trimmed the leading silence yet. If we do that also, we can expect the delay to be even less.

Description of development approach

Instead of setting self.tts.audioOutput to a real output device, do the following:

  • create an implementation class SynthDriverAudioStream to implement COM interface IStream, which can be used to stream in audio data from the voices.
  • Use an SpCustomStream object to wrap SynthDriverAudioStream and provide the wave format.
  • Assign the SpCustomStream object to self.tts.AudioOutputStream, so SAPI will output audio to this stream instead.

Each time an audio chunk needs to be streamed in, ISequentialStream_RemoteWrite will be called, and we just feed the audio to the player. IStream_RemoteSeek can also be called when SAPI wants to know the current byte position of the stream (dlibMove should be zero and dwOrigin should be STREAM_SEEK_CUR in this case), but it is not used to actually "seek" to a new position. IStream_Commit can be called by MSSP voices to "flush" the audio data, where we do nothing. Other methods are left unimplemented, as they are not used when acting as an audio output stream.

Previously, comtypes.client.GetEvents was used to get the event notifications. But those notifications will be routed to the main thread via the main message loop. According to the documentation of ISpNotifySource:

Note that both variations of callbacks as well as the window message notification require a window message pump to run on the thread that initialized the notification source. Callback will only be called as the result of window message processing, and will always be called on the same thread that initialized the notify source. However, using Win32 events for SAPI event notification does not require a window message pump.

Because the audio data is generated and sent via IStream on a dedicated thread, receiving events on the main thread can make synchronizing events and audio difficult.

So here SapiSink is changed to become an implementation of ISpNotifySink. Notifications received via ISpNotifySink are "free-threaded", sent on the original thread instead of being routed to the main thread.

  • To connect the sink, use ISpNotifySource::SetNotifySink.
  • To get the actual event that triggers the notification, use ISpEventSource::GetEvents. Events can contain pointers to objects or memory, so they need to be freed manually.

Finally, all audio ducking related code are removed. Now WavePlayer should be able to handle audio ducking when using SAPI5 and MSSP voices.

There should be no change to public APIs.

Testing strategy:

Tested the delay of some built-in SAPI5 voices.

Audio ducking seemed to be working.

Stability of this is not proven yet, which needs further tests.

Known issues with pull request:

None yet

Code Review Checklist:

  • Documentation:
    • Change log entry
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • API is compatible with existing add-ons.
  • Security precautions taken.

@coderabbitai summary

Copy link
Collaborator

@LeonarddeR LeonarddeR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is marvelous work! Great job

source/synthDrivers/sapi5.py Show resolved Hide resolved
source/synthDrivers/sapi5.py Outdated Show resolved Hide resolved
source/synthDrivers/sapi5.py Outdated Show resolved Hide resolved
source/synthDrivers/sapi5.py Outdated Show resolved Hide resolved
source/synthDrivers/sapi5.py Outdated Show resolved Hide resolved
source/synthDrivers/sapi5.py Show resolved Hide resolved
@gexgd0419 gexgd0419 marked this pull request as ready for review January 8, 2025 10:43
@gexgd0419 gexgd0419 requested a review from a team as a code owner January 8, 2025 10:43
@gexgd0419 gexgd0419 requested a review from seanbudd January 8, 2025 10:43
@cary-rowen cary-rowen mentioned this pull request Jan 9, 2025
5 tasks
Copy link
Member

@seanbudd seanbudd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gexgd0419 ! Mostly minor review items

source/synthDrivers/sapi5.py Show resolved Hide resolved
source/synthDrivers/sapi5.py Outdated Show resolved Hide resolved
source/synthDrivers/sapi5.py Outdated Show resolved Hide resolved
source/synthDrivers/sapi5.py Outdated Show resolved Hide resolved
source/synthDrivers/sapi5.py Outdated Show resolved Hide resolved
source/synthDrivers/sapi5.py Outdated Show resolved Hide resolved
source/synthDrivers/sapi5.py Outdated Show resolved Hide resolved
source/synthDrivers/sapi5.py Outdated Show resolved Hide resolved
source/synthDrivers/sapi5.py Show resolved Hide resolved
source/synthDrivers/sapi5.py Show resolved Hide resolved
@seanbudd seanbudd marked this pull request as draft January 9, 2025 03:04
@gexgd0419 gexgd0419 marked this pull request as ready for review January 9, 2025 13:38
Copy link
Member

@seanbudd seanbudd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gexgd0419 !

source/synthDrivers/sapi5.py Outdated Show resolved Hide resolved
source/synthDrivers/sapi5.py Outdated Show resolved Hide resolved
Co-authored-by: Sean Budd <seanbudd123@gmail.com>
@seanbudd seanbudd merged commit 631156c into nvaccess:master Jan 10, 2025
3 of 4 checks passed
@github-actions github-actions bot added this to the 2025.1 milestone Jan 10, 2025
@burmancomp
Copy link
Contributor

When using windows 10 22h2, and finnish mikropuhe 5.1 I get following error:

IO - speech.speech.speak (06:49:48.828) - MainThread (11288):
Speaking ['Taskbar', CancellableSpeech (still valid)]
ERROR - comtypes._comobject.call_without_this (06:49:48.837) - Dummy-3 (4908):
Exception in ISequentialStream.RemoteWrite implementation:
Traceback (most recent call last):
File "comtypes_comobject.pyc", line 178, in call_without_this
ValueError: NULL pointer access

Unfortunately this makes tts voice very stuttering.

@seanbudd
Copy link
Member

@burmancomp can you please open an issue with a full debug log?

@gexgd0419
Copy link
Contributor Author

# Method implementations could check for and return E_POINTER
# themselves.  Or an error will be raised when
# 'outargs[i][0] = value' is executed.
# for a in outargs:
#     if not a:
#         return E_POINTER

I guess a null pointer was passed in as the pcbWritten parameter, which is allowed.

But how should I check the out parameter when I'm using the high level implementation?

@gexgd0419
Copy link
Contributor Author

I think that using the low level implementation is more appropriate here.

Passing NULL to the out parameter is explicitly allowed, and it shouldn't return an error in this case.

@LeonarddeR
Copy link
Collaborator

That sounds pretty reasonable to me. I'm pretty sure however that the error we're getting now comes from an exception handler in comtypes, I"m not sure how exception handling works in the low level implementation. We must avoid harsh crashes of NVDA.

@gexgd0419
Copy link
Contributor Author

Seems that both call_without_this and call_with_this have some kind of try...except structure to handle exceptions.

def call_with_this(*args, **kw):
    try:
        result = mth(*args, **kw)
    except ReturnHRESULT as err:
        (hresult, text) = err.args
        return ReportError(text, iid=interface._iid_, clsid=clsid, hresult=hresult)
    except (COMError, WindowsError) as details:
        _error(
            "Exception in %s.%s implementation:",
            interface.__name__,
            mthname,
            exc_info=True,
        )
        return HRESULT_FROM_WIN32(winerror(details))
    except E_NotImplemented:
        _warning("Unimplemented method %s.%s called", interface.__name__, mthname)
        return E_NOTIMPL
    except:
        _error(
            "Exception in %s.%s implementation:",
            interface.__name__,
            mthname,
            exc_info=True,
        )
        return ReportException(E_FAIL, interface._iid_, clsid=clsid)
    if result is None:
        return S_OK
    return result

It will log the error when an exception happens.

cary-rowen pushed a commit to cary-rowen/nvda that referenced this pull request Jan 11, 2025
Closes nvaccess#13284

Summary of the issue:
Currently, SAPI5 and MSSP voices use their own audio output mechanisms, instead of using the WavePlayer (WASAPI) inside NVDA.

This may make them less responsive compared to eSpeak and OneCore voices, which are using the WavePlayer, or compared to other screen readers using SAPI5 voices, according to my test result.

This also gives NVDA less control of audio output. For example, audio ducking logic inside WavePlayer cannot be applied to SAPI5 voices, so additional code is required to compensate for this.

Description of user facing changes
SAPI5 and MSSP voices will be changed to use the WavePlayer, which may make them more responsive (have less delay).

According to my test result, this can reduce the delay by at least 50ms.

This haven't trimmed the leading silence yet. If we do that also, we can expect the delay to be even less.

Description of development approach
Instead of setting self.tts.audioOutput to a real output device, do the following:

create an implementation class SynthDriverAudioStream to implement COM interface IStream, which can be used to stream in audio data from the voices.
Use an SpCustomStream object to wrap SynthDriverAudioStream and provide the wave format.
Assign the SpCustomStream object to self.tts.AudioOutputStream, so SAPI will output audio to this stream instead.
Each time an audio chunk needs to be streamed in, ISequentialStream_RemoteWrite will be called, and we just feed the audio to the player. IStream_RemoteSeek can also be called when SAPI wants to know the current byte position of the stream (dlibMove should be zero and dwOrigin should be STREAM_SEEK_CUR in this case), but it is not used to actually "seek" to a new position. IStream_Commit can be called by MSSP voices to "flush" the audio data, where we do nothing. Other methods are left unimplemented, as they are not used when acting as an audio output stream.

Previously, comtypes.client.GetEvents was used to get the event notifications. But those notifications will be routed to the main thread via the main message loop. According to the documentation of ISpNotifySource:

Note that both variations of callbacks as well as the window message notification require a window message pump to run on the thread that initialized the notification source. Callback will only be called as the result of window message processing, and will always be called on the same thread that initialized the notify source. However, using Win32 events for SAPI event notification does not require a window message pump.

Because the audio data is generated and sent via IStream on a dedicated thread, receiving events on the main thread can make synchronizing events and audio difficult.

So here SapiSink is changed to become an implementation of ISpNotifySink. Notifications received via ISpNotifySink are "free-threaded", sent on the original thread instead of being routed to the main thread.

To connect the sink, use ISpNotifySource::SetNotifySink.
To get the actual event that triggers the notification, use ISpEventSource::GetEvents. Events can contain pointers to objects or memory, so they need to be freed manually.
Finally, all audio ducking related code are removed. Now WavePlayer should be able to handle audio ducking when using SAPI5 and MSSP voices.
seanbudd pushed a commit that referenced this pull request Jan 12, 2025
This is a fix for the NULL pointer access error introduced by #17592 and reported in this comment.

According to Microsoft's documentation, the pcbWritten parameter in ISequentialStream::Write and the plibNewPosition parameter in IStream::Seek can be NULL, in which case the function should ignore the output parameter and succeed.

Description of user facing changes
None

Description of development approach
ISequentialStream_RemoteWrite and IStream_RemoteSeek are changed to use the low level implementation. This makes checking the output parameter easier. Then, check if the output pointer is NULL before assigning the output value.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

improve the responsiveness of onecore voices and sapi voices
4 participants