New speech framework including callbacks, beeps, sounds, profile switches and prioritized queuing #7599
Conversation
1. Add an `onDone` argument to `WavePlayer.feed` which accepts a function to be called when the provided chunk of audio has finished playing. Speech synths can simply feed audio up to an index and use the `onDone` callback to be accurately notified when the index is reached.
2. Add a `buffered` argument to the `WavePlayer` constructor. If True, small chunks of audio will be buffered to prevent audio glitches. This avoids the need for tricky buffering across calls in the synth driver if the synth provides fixed size chunks and an index lands near the end of a previous chunk. It is also useful for synths which always provide very small chunks.
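The intended usage can be sketched as follows. This is a toy stand-in for `nvwave.WavePlayer`, not NVDA's actual class (the assumption is only that `feed` accepts an `onDone` callable, as described above); the index notification is simulated with a plain list.

```python
# Hedged sketch: how a synth driver might use WavePlayer.feed's onDone
# callback to report index reached. WavePlayer here is a minimal stand-in
# for nvwave.WavePlayer; only the onDone behaviour described above is
# modelled.

class WavePlayer:
    """Toy stand-in: 'plays' chunks synchronously, then fires onDone."""
    def __init__(self, samplesPerSec=22050, buffered=False):
        # The real player buffers small chunks when buffered=True to
        # prevent audio glitches; this toy just records the flag.
        self.buffered = buffered

    def feed(self, data, onDone=None):
        # The real player writes audio to the device; here playback is
        # instantaneous, so onDone fires immediately.
        if onDone:
            onDone()

reached = []

def notifyIndexReached(index):
    # In NVDA this would notify the synthIndexReached extension point.
    reached.append(index)

player = WavePlayer(buffered=True)
# Feed audio up to each index, then rely on onDone for accurate
# notification that the index was reached.
for index, chunk in [(1, b"\x00" * 512), (2, b"\x00" * 512)]:
    player.feed(chunk, onDone=lambda i=index: notifyIndexReached(i))

print(reached)  # [1, 2]
```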
…within speech sequences.
1. Allow triggers to specify that handlers watching for config profile switches should not be notified. In the case of profile switches during speech sequences, we only want to apply speech settings, not switch braille displays.
2. Add some debug logging for when profiles are activated and deactivated.
…nce splits during speech sequences, as well as prioritized queuing.

Changes for synth drivers:
- SynthDrivers must now accurately notify when the synth reaches an index or finishes speaking, using the new `synthIndexReached` and `synthDoneSpeaking` extension points in the `synthDriverHandler` module. The `lastIndex` property is deprecated. See below regarding backwards compatibility for SynthDrivers which do not support these notifications.
- SynthDrivers must now support `PitchCommand` if they wish to support capital pitch change.
- SynthDrivers now have `supportedCommands` and `supportedNotifications` attributes which specify what they support.
- Because there are some speech commands which trigger behaviour unrelated to synthesizers (e.g. beeps, callbacks and profile switches), commands which are passed to synthesizers are now subclasses of `speech.SynthCommand`.

Central speech manager:
- The core of this new functionality is the `speech._SpeechManager` class. It is intended for internal use only. It is called by higher level functions such as `speech.speak`.
- It manages queuing of speech utterances, calling callbacks at desired points in the speech, profile switching, prioritization, etc. It relies heavily on index reached and done speaking notifications from synths. These notifications alone trigger the next task in the flow.
- It maintains separate queues (`speech._ManagerPriorityQueue`) for each priority. As well as holding the pending speech sequences for that priority, each queue holds other information necessary to restore state (profiles, etc.) when that queue is preempted by a higher priority queue.
- See the docstring for the `speech._SpeechManager` class for a high level summary of the flow of control.

New/enhanced speech commands:
- `EndUtteranceCommand` ends the current utterance at this point in the speech. This allows you to have two utterances in a single speech sequence.
- `CallbackCommand` calls a function when speech reaches the command.
- `BeepCommand` produces a beep when speech reaches the command.
- `WaveFileCommand` plays a wave file when speech reaches the command.
- The above three commands are all subclasses of `BaseCallbackCommand`. You can subclass this to implement other commands which run a pre-defined function.
- `ConfigProfileTriggerCommand` applies (or stops applying) a configuration profile trigger to subsequent speech. This is the basis for switching profiles (and thus synthesizers, speech rates, etc.) for specific languages, math, etc.
- `PitchCommand`, `RateCommand` and `VolumeCommand` can now take either a multiplier or an offset. In addition, they can convert between the two on demand, which makes it easier to handle these commands in synth drivers based on the synth's requirements. They also have an `isDefault` attribute which specifies whether this is returning to the default value (as configured by the user).

Speech priorities:
`speech.speak` now accepts a `priority` argument specifying one of three priorities: `SPRI_NORMAL` (normal priority), `SPRI_NEXT` (speak after next utterance of lower priority) or `SPRI_NOW` (speech is very important and should be spoken right now, interrupting lower priority speech). Interrupted lower priority speech resumes after any higher priority speech is complete.

Refactored functionality to use the new framework:
- Rather than using a polling generator, spelling is now sent as a single speech sequence, including `EndUtteranceCommand`s, `BeepCommand`s and `PitchCommand`s as appropriate. This can be created and incorporated elsewhere using the `speech.getSpeechForSpelling` function.
- Say all has been completely rewritten to use `CallbackCommand`s instead of a polling generator. The code should also be a lot more readable now, as it is now classes with methods for the various stages in the process.

Backwards compatibility for old synths:
- For synths that don't support index and done speaking notifications, we don't use the speech manager at all. This means none of the new functionality (callbacks, profile switching, etc.) will work.
- This means we must fall back to the old code for speak spelling, say all, etc. This code is in the `speechCompat` module.
- This compatibility fallback is considered deprecated and will be removed eventually. Synth drivers should be updated ASAP.

Deprecated/removed:
- `speech.getLastIndex` is deprecated and will simply return None.
- `IndexCommand` should no longer be used in speech sequences passed to `speech.speak`. Use a subclass of `speech.BaseCallbackCommand` instead.
- In the `speech` module, `speakMessage`, `speakText`, `speakTextInfo`, `speakObjectProperties` and `speakObject` no longer take an `index` argument. No add-ons in the official repository use this, so I figured it was safe to just remove it rather than having it do nothing.
- `speech.SpeakWithoutPausesBreakCommand` has been removed. Use `speech.EndUtteranceCommand` instead. No add-ons in the official repository use this.
- `speech.speakWithoutPauses.lastSentIndex` has been removed. Use a subclass of `speech.BaseCallbackCommand` instead. No add-ons in the official repository use this.
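To make the command types concrete, here is a self-contained toy (stand-in classes, not NVDA's actual `speech` module) showing how a sequence mixing text with callback commands might be walked. In NVDA the manager converts these commands into synth indexes and runs them when the synth notifies that the index was reached; this sketch just processes the sequence in order.

```python
# Toy illustration of a speech sequence mixing text with callback
# commands. Class names mirror those described above; the process()
# loop is illustrative only, not NVDA's speech._SpeechManager.

events = []

class BaseCallbackCommand:
    """Base for commands which run a pre-defined function."""
    def run(self):
        raise NotImplementedError

class CallbackCommand(BaseCallbackCommand):
    def __init__(self, callback):
        self.callback = callback
    def run(self):
        self.callback()

class BeepCommand(BaseCallbackCommand):
    def __init__(self, hz, length):
        self.hz, self.length = hz, length
    def run(self):
        # Real code would produce an audible beep (e.g. tones.beep).
        events.append(("beep", self.hz))

def process(sequence):
    # Walk the sequence: run callback commands, "speak" plain text.
    for item in sequence:
        if isinstance(item, BaseCallbackCommand):
            item.run()
        else:
            events.append(("speak", item))

process([
    "Before the beep.",
    BeepCommand(440, 50),
    CallbackCommand(lambda: events.append(("callback", "reached"))),
    "After the beep.",
])
```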
This is necessary to handle events from SAPI 5, as one of the parameters is a decimal, which is not supported by our existing (very outdated) version of comtypes. comtypes has now been added as a separate git submodule.
Urgh. Accidentally hit submit before I was ready. I've updated the PR description with (very lengthy) details. :) |
Here's a test I was using with say all to make sure it was moving the cursor correctly and breaking utterances where I expected. The text "New utterance" should literally be at the start of a new utterance when you hear it.
|
Some notes re unit testing:
|
Somehow, /source/comInterfaces/_944DE083_8FB8_45CF_BCB7_C477ACB2F897_0_1_0.py ended up in this. I thought those were created upon building and that they were part of gitignore, but it seems it already existed in the tracked source tree. |
No, that comInterface is intentionally part of the repo now, since we can't rely on everyone running the latest Windows 10 build and thus having all of the interfaces in their typelib. It had to be re-built for updated comtypes. |
I posted a brain dump on the wiki with implementation ideas for some of the more tricky use cases. I don't think these should be considered for this PR, but I'm linking it here so we have a reference. |
Say all isn't working for me in Firefox with this.
As soon as a is called, escape, and start a sayall. |
Recording of the above say all repeating forever.
Lower priority speech has to restart from some known point, otherwise speech could be lost. Currently I believe it restarts from the most recently reached index. The idea will be in future to simply add more indexes within the text.
Having said this though, queuing high priority speech on repeat with such a short gap is not a great experience either way.
|
This fails for me. This is meant to simulate the reporting of a notification. |
Does anyone else running this have problems with sayall in browsers? |
@derekriemer commented on 14 Sep 2017, 08:13 GMT+10:
Can you be more specific about how it "fails"? It works just fine for me. I hear the notification sound with the message. Tested with eSpeak and oneCore.

@derekriemer commented on 14 Sep 2017, 08:14 GMT+10:
Can you be more specific? Again, it works just fine for me. Tested in Firefox with eSpeak. |
Oh blerg. Both of those things fail with eSpeak if you have automatic language switching turned on. If you use oneCore (or turn off auto language switching with eSpeak), it works as expected. It looks like eSpeak fails to notify about marks (indexes) if they're immediately followed by a language change. That'll need to be fixed in eSpeak (or worked around somehow). |
@jcsteh commented on Sep 13, 2017, 5:48 PM MDT:
I hear nothing but the text |
@jcsteh commented on Sep 13, 2017, 5:54 PM MDT:
confirm |
I'm not sure, but some time ago I mentioned that eSpeak kept switching languages when not appropriate, and turned the language switching off as a result.
Brian
|
Hello!
Trying the new version, threshold-17245, with the Vocalizer Expressive driver 3.0.14, the following error occurs after changing the synth to Vocalizer Expressive...
WARNING - eventHandler._EventExecuter.next (01:12:20.823):
Could not execute function event_gainFocus defined in
appModules.thunderbird module; kwargs: {}
Traceback (most recent call last):
File "eventHandler.pyc", line 100, in next
File "appModules\thunderbird.pyc", line 28, in event_gainFocus
File "eventHandler.pyc", line 107, in next
File "extensionPoints\util.pyc", line 185, in callWithSupportedKwargs
File "NVDAObjects\behaviors.pyc", line 179, in event_gainFocus
File "NVDAObjects\__init__.pyc", line 1030, in event_gainFocus
File "NVDAObjects\__init__.pyc", line 918, in reportFocus
File "speech.pyc", line 384, in speakObject
File "speech.pyc", line 320, in speakObjectProperties
TypeError: patchedSpeak() got an unexpected keyword argument 'priority'
ERROR - scriptHandler.executeScript (01:12:23.177):
error executing script: <bound method
Dynamic_EditableTextWithAutoSelectDetectionBrokenFocusedStateDocumentEditorMozillaIAccessible.script_caret_moveByLine
of
<NVDAObjects.Dynamic_EditableTextWithAutoSelectDetectionBrokenFocusedStateDocumentEditorMozillaIAccessible
object at 0x04FADFF0>> with gesture u'seta acima'
Traceback (most recent call last):
File "scriptHandler.pyc", line 192, in executeScript
File "editableText.pyc", line 185, in script_caret_moveByLine
File "editableText.pyc", line 144, in _caretMovementScriptHelper
File "editableText.pyc", line 130, in _caretScriptPostMovedHelper
File "speech.pyc", line 1032, in speakTextInfo
TypeError: patchedSpeak() got an unexpected keyword argument 'priority'
Curiously, Braille also stops working...
Rui Fontes
|
Hello!
Some more bugs, now using Windows Core:
1 - When typing a capital letter, NVDA gives the following error:
Input: kb(desktop):shift+d
DEBUG - synthDrivers.oneCore.SynthDriver.cancel (01:23:44.095):
Cancelling
IO - speech.speak (01:23:44.095):
Speaking [PitchCommand(offset=30), CharacterModeCommand(True), u'D',
PitchCommand(), EndUtteranceCommand()]
ERROR - eventHandler.executeEvent (01:23:44.095):
error executing event: typedCharacter on
<NVDAObjects.Dynamic_EditableTextWithAutoSelectDetectionBrokenFocusedStateDocumentEditorMozillaIAccessible
object at 0x04F89670> with extra args of {'ch': u'D'}
Traceback (most recent call last):
File "eventHandler.pyc", line 155, in executeEvent
File "eventHandler.pyc", line 92, in __init__
File "eventHandler.pyc", line 100, in next
File "NVDAObjects\__init__.pyc", line 977, in event_typedCharacter
File "speech.pyc", line 713, in speakTypedCharacters
File "speech.pyc", line 148, in speakSpelling
File "speech.pyc", line 570, in speak
File "speech.pyc", line 2260, in speak
File "speech.pyc", line 2389, in _pushNextSpeech
File "synthDrivers\oneCore.pyc", line 206, in speak
File "speechXml.pyc", line 229, in convertToXml
File "speechXml.pyc", line 156, in generateXml
File "speechXml.pyc", line 242, in generateBalancerCommands
File "speechXml.pyc", line 218, in generateBalancerCommands
File "synthDrivers\oneCore.pyc", line 49, in convertPitchCommand
AttributeError: '_OcSsmlConverter' object has no attribute '_pitch'
2 - In spite of having NVDA configured to announce typed words, nothing is announced. I have verified this through the log file, set for debug.
Rui Fontes
|
@ruifontes, the traceback you are reporting for Vocalizer is a problem with your Vocalizer driver. Your driver patches speech.speak, but speech.speak has changed the arguments it takes. Specifically, your patched speak function should take an optional keyword argument called 'priority'.
Note that we don't really support people patching functions like this... but in this case, adding that argument should fix your problem.
I will investigate the OneCore traceback.
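The shape of the fix being suggested might look like this. This is a hedged sketch, not the Vocalizer driver's actual code: `originalSpeak` stands in for the saved reference to the real `speech.speak`, and the stub body only records the call.

```python
# Hedged sketch: an add-on that patches speech.speak must now accept the
# new `priority` keyword argument and forward it, so that callers using
# the new signature don't break. originalSpeak is a stand-in for the
# saved reference to the real speech.speak.

calls = []

def originalSpeak(speechSequence, priority=None, **kwargs):
    # Stub for the real speech.speak; just record what was passed.
    calls.append((tuple(speechSequence), priority))

def patchedSpeak(speechSequence, priority=None, **kwargs):
    # ... driver-specific preprocessing of speechSequence would go here ...
    # Forward the new keyword so the new framework's callers still work.
    return originalSpeak(speechSequence, priority=priority, **kwargs)

patchedSpeak(["hello"], priority=1)
```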
|
The OneCore issue is most probably a mistake on my end, so I can also look into it.
|
Hmm, OneCore is an interesting one. It tries to send pitch using ssml, but in this case, it shouldn't. I think it should be fixed as follows:
However, even then, I'm afraid things won't work as expected when sending commands, as they aren't processed in the speak function. I'm afraid I'm too unfamiliar with synthesizer drivers to fix this ASAP. |
Are you saying OneCore shouldn't use SSML for PitchCommand, etc.? If so, why? SSML is the ideal fit for inline speech prosody commands.
|
Isn't the problem with SSML that it doesn't support the full rate and pitch range that is supported with the prosody commands? Or am I just misunderstanding something? |
Using SSML to set the base pitch/rate is certainly weird, since SSML is meant to be relative. And the fact that it is relative also means we're affected by Windows settings, which is bad.
On the other hand, for speech commands, SSML is totally appropriate, since those commands are meant for inline, relative adjustments.
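To make the distinction concrete: an inline pitch offset maps naturally onto SSML's relative `prosody` pitch attribute. A minimal sketch of the kind of conversion a driver's SSML converter might perform; the percentage mapping here is illustrative, not NVDA's actual maths.

```python
# Hedged sketch: mapping a relative pitch offset (e.g. +30 for capital
# pitch change) onto SSML's <prosody> element, which expresses relative
# pitch as a signed percentage. Illustrative only.

def pitchOffsetToSsml(offset):
    # SSML wants an explicit sign; negative values already carry one.
    sign = "+" if offset >= 0 else ""
    return '<prosody pitch="%s%d%%">' % (sign, offset)

print(pitchOffsetToSsml(30))   # <prosody pitch="+30%">
print(pitchOffsetToSsml(-10))  # <prosody pitch="-10%">
```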
|
Thanks for the clarification. I will provide a PR that fixes this.
|
Link to issue number:
Fixes #4877. Fixes #1229.
Summary of the issue:
We want to be able to easily and accurately perform various actions (beep, play sounds, switch profiles, etc.) during speech. We also want to be able to have prioritized speech which interrupts lower priority speech and then have the lower priority speech resume. This is required for a myriad of use cases, including switching to specific synths for specific languages (#279), changing speeds for different languages (#4738), audio indication of spelling errors when reading text (#4233), indication of links using beeps (#905), reading of alerts without losing other speech forever (#3807, #6688) and changing speech rate for math (#7274). Our old speech code simply sends utterances to the synthesizer; there is no ability to do these things. Say all and speak spelling continually poll the last index, but this is ugly and not feasible for other features.
Description of how this pull request fixes the issue:
Enhance nvwave to simplify accurate indexing for speech synthesizers.
Enhancements to config profile triggers needed for profile switching within speech sequences.
Add support for callbacks, beeps, sounds, profile switches and utterance splits during speech sequences, as well as prioritized queuing.
Changes for synth drivers:
- SynthDrivers must now accurately notify when the synth reaches an index or finishes speaking, using the new `synthIndexReached` and `synthDoneSpeaking` extension points in the `synthDriverHandler` module. The `lastIndex` property is deprecated. See below regarding backwards compatibility for SynthDrivers which do not support these notifications.
- SynthDrivers must now support `PitchCommand` if they wish to support capital pitch change.
- SynthDrivers now have `supportedCommands` and `supportedNotifications` attributes which specify what they support.
- Because there are some speech commands which trigger behaviour unrelated to synthesizers (e.g. beeps, callbacks and profile switches), commands which are passed to synthesizers are now subclasses of `speech.SynthCommand`.

Central speech manager:
- The core of this new functionality is the `speech._SpeechManager` class. It is intended for internal use only. It is called by higher level functions such as `speech.speak`.
- It manages queuing of speech utterances, calling callbacks at desired points in the speech, profile switching, prioritization, etc. It relies heavily on index reached and done speaking notifications from synths. These notifications alone trigger the next task in the flow.
- It maintains separate queues (`speech._ManagerPriorityQueue`) for each priority. As well as holding the pending speech sequences for that priority, each queue holds other information necessary to restore state (profiles, etc.) when that queue is preempted by a higher priority queue.
- See the docstring for the `speech._SpeechManager` class for a high level summary of the flow of control.

New/enhanced speech commands:
- `EndUtteranceCommand` ends the current utterance at this point in the speech. This allows you to have two utterances in a single speech sequence.
- `CallbackCommand` calls a function when speech reaches the command.
- `BeepCommand` produces a beep when speech reaches the command. This is the basis for features such as #905 (produce beeps when indicating links during say all on webpages and other HTML documents).
- `WaveFileCommand` plays a wave file when speech reaches the command. This is the basis for features such as #4233 (indication options for reporting spelling errors) and #4089 (trigger a sound for control types and states).
- The above three commands are all subclasses of `BaseCallbackCommand`. You can subclass this to implement other commands which run a pre-defined function.
- `ConfigProfileTriggerCommand` applies (or stops applying) a configuration profile trigger to subsequent speech. This is the basis for switching profiles (and thus synthesizers, speech rates, etc.) for specific languages, math, etc.; see #279 (automatically recognise and switch between certain languages/synths), #4738 (changing speeds for different languages), #4433 (voice aliases) and #7274 (a special math speech rate).
- `PitchCommand`, `RateCommand` and `VolumeCommand` can now take either a multiplier or an offset. In addition, they can convert between the two on demand, which makes it easier to handle these commands in synth drivers based on the synth's requirements. They also have an `isDefault` attribute which specifies whether this is returning to the default value (as configured by the user).

Speech priorities:
`speech.speak` now accepts a `priority` argument specifying one of three priorities: `SPRI_NORMAL` (normal priority), `SPRI_NEXT` (speak after next utterance of lower priority) or `SPRI_NOW` (speech is very important and should be spoken right now, interrupting lower priority speech).

Refactored functionality to use the new framework:
- Rather than using a polling generator, spelling is now sent as a single speech sequence, including `EndUtteranceCommand`s, `BeepCommand`s and `PitchCommand`s as appropriate. This can be created and incorporated elsewhere using the `speech.getSpeechForSpelling` function. This fixes #1229 (since it's a single sequence) and is also the basis for fixing issues such as #3286 (optional reporting of capitals while reading full text), #4874 (capital letter options don't apply when selecting text) and #4661 (no language switching when selecting text).
- Say all has been completely rewritten to use `CallbackCommand`s instead of a polling generator. The code should also be a lot more readable now, as it is now classes with methods for the various stages in the process.

Backwards compatibility for old synths:
- For synths that don't support index and done speaking notifications, we don't use the speech manager at all. This means none of the new functionality (callbacks, profile switching, etc.) will work.
- This means we must fall back to the old code for speak spelling, say all, etc. This code is in the `speechCompat` module.
- This compatibility fallback is considered deprecated and will be removed eventually. Synth drivers should be updated ASAP.

Deprecated/removed:
- `speech.getLastIndex` is deprecated and will simply return None.
- `IndexCommand` should no longer be used in speech sequences passed to `speech.speak`. Use a subclass of `speech.BaseCallbackCommand` instead.
- In the `speech` module, `speakMessage`, `speakText`, `speakTextInfo`, `speakObjectProperties` and `speakObject` no longer take an `index` argument. No add-ons in the official repository use this, so I figured it was safe to just remove it rather than having it do nothing.
- `speech.SpeakWithoutPausesBreakCommand` has been removed. Use `speech.EndUtteranceCommand` instead. No add-ons in the official repository use this.
- `speech.speakWithoutPauses.lastSentIndex` has been removed. Instead, `speakWithoutPauses` returns True if something was actually spoken, False if only buffering occurred.

Update comtypes to version 1.1.3.
Updated synth drivers
The espeak, oneCore and sapi5 synth drivers have all been updated to support the new speech framework.
Testing performed:
Unfortunately, I'm out of time to write unit tests for this, though much of this should be suitable for unit testing. I've been testing with the Python console test cases below. Note that the `wx.CallLater` is necessary so that speech doesn't get silenced straight away; that's just an artefact of testing with the console.

For the profile tests, you'll need to set up two profiles, one triggered for say all and the other triggered for the notepad app.
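The preempt-and-resume behaviour those tests exercise can be modelled in isolation. Below is a self-contained toy simulation of the three priorities; it is not NVDA's `speech._SpeechManager`, and the `ToyManager` class and its methods are invented for illustration.

```python
# Toy model of the three speech priorities: SPRI_NOW speech interrupts
# lower priority speech, which then resumes. Illustrative only.

SPRI_NORMAL, SPRI_NEXT, SPRI_NOW = 0, 1, 2

class ToyManager:
    def __init__(self):
        self.queues = {p: [] for p in (SPRI_NORMAL, SPRI_NEXT, SPRI_NOW)}
        self.spoken = []

    def speak(self, utterances, priority=SPRI_NORMAL):
        self.queues[priority].extend(utterances)

    def pump(self):
        # Always speak from the highest priority non-empty queue first.
        # Interrupted lower priority speech stays queued, so it resumes
        # once higher priority speech is complete.
        while any(self.queues.values()):
            for p in (SPRI_NOW, SPRI_NEXT, SPRI_NORMAL):
                if self.queues[p]:
                    self.spoken.append(self.queues[p].pop(0))
                    break

mgr = ToyManager()
mgr.speak(["reading line 1", "reading line 2"], SPRI_NORMAL)
mgr.speak(["ALERT!"], SPRI_NOW)  # preempts the normal priority reading
mgr.pump()
print(mgr.spoken)  # ['ALERT!', 'reading line 1', 'reading line 2']
```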
Python Console test cases:
Known issues with pull request:
No issues with the code that I know of. There are two issues for the project, though:
Change log entry:
Bug Fixes:
Changes for Developers: