New speech framework including callbacks, beeps, sounds, profile switches and prioritized queuing #7599

Merged May 15, 2019 (34 commits)

jcsteh
Contributor

@jcsteh jcsteh commented Sep 13, 2017

Link to issue number:

Fixes #4877. Fixes #1229.

Summary of the issue:

We want to be able to easily and accurately perform various actions (beep, play sounds, switch profiles, etc.) during speech. We also want to be able to have prioritized speech which interrupts lower priority speech and then have the lower priority speech resume. This is required for a myriad of use cases, including switching to specific synths for specific languages (#279), changing speeds for different languages (#4738), audio indication of spelling errors when reading text (#4233), indication of links using beeps (#905), reading of alerts without losing other speech forever (#3807, #6688) and changing speech rate for math (#7274). Our old speech code simply sends utterances to the synthesizer; there is no ability to do these things. Say all and speak spelling continually poll the last index, but this is ugly and not feasible for other features.

Description of how this pull request fixes the issue:

Enhance nvwave to simplify accurate indexing for speech synthesizers.

  1. Add an onDone argument to WavePlayer.feed which accepts a function to be called when the provided chunk of audio has finished playing. Speech synths can simply feed audio up to an index and use the onDone callback to be accurately notified when the index is reached.
  2. Add a buffered argument to the WavePlayer constructor. If True, small chunks of audio will be buffered to prevent audio glitches. This avoids the need for tricky buffering across calls in the synth driver if the synth provides fixed size chunks and an index lands near the end of a previous chunk. It is also useful for synths which always provide very small chunks.
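
As a rough illustration of the `onDone` idea in item 1, the sketch below uses a toy player rather than the real `nvwave.WavePlayer`. The class `ToyWavePlayer` and its `pump` method are hypothetical names invented for this example; only the `feed(data, onDone=...)` shape comes from the description above.

```python
from collections import deque


class ToyWavePlayer:
    """Hypothetical stand-in for an audio player; this is not nvwave.WavePlayer."""

    def __init__(self):
        # (chunk, onDone) pairs that have been fed but not yet "played".
        self._pending = deque()

    def feed(self, data, onDone=None):
        # Queue a chunk together with an optional completion callback.
        self._pending.append((data, onDone))

    def pump(self):
        # Pretend the audio device has just finished the oldest chunk.
        data, onDone = self._pending.popleft()
        if onDone is not None:
            onDone()


reached = []
player = ToyWavePlayer()
# The audio that precedes index 1 carries the callback; trailing audio has none.
player.feed(b"\x00" * 320, onDone=lambda: reached.append(1))
player.feed(b"\x00" * 320)
player.pump()
player.pump()
```

The point of the design is that the driver never has to poll: the index is reported exactly when the chunk that ends at it has finished playing.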

Enhancements to config profile triggers needed for profile switching within speech sequences.

  1. Allow triggers to specify that handlers watching for config profile switches should not be notified. In the case of profile switches during speech sequences, we only want to apply speech settings, not switch braille displays.
  2. Add some debug logging for when profiles are activated and deactivated.

Add support for callbacks, beeps, sounds, profile switches and utterance splits during speech sequences, as well as prioritized queuing.

Changes for synth drivers:

  • SynthDrivers must now accurately notify when the synth reaches an index or finishes speaking using the new synthIndexReached and synthDoneSpeaking extension points in the synthDriverHandler module. The lastIndex property is deprecated. See below regarding backwards compatibility for SynthDrivers which do not support these notifications.
  • SynthDrivers must now support PitchCommand if they wish to support capital pitch change.
  • SynthDrivers now have supportedCommands and supportedNotifications attributes which specify what they support.
  • Because there are some speech commands which trigger behaviour unrelated to synthesizers (e.g. beeps, callbacks and profile switches), commands which are passed to synthesizers are now subclasses of speech.SynthCommand.
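
A minimal sketch of what these driver changes amount to, using toy classes only. `ToyNotification`, `ToyIndexCommand` and `ToySynth` are hypothetical; the real extension points live in `synthDriverHandler` and their exact call signatures are not shown in this description, so the `notify(...)` keyword arguments here are an assumption.

```python
class ToyNotification:
    """Minimal observer list; a stand-in for an extension point."""

    def __init__(self):
        self._handlers = []

    def register(self, handler):
        self._handlers.append(handler)

    def notify(self, **kwargs):
        for handler in self._handlers:
            handler(**kwargs)


# Hypothetical equivalents of synthIndexReached and synthDoneSpeaking.
indexReached = ToyNotification()
doneSpeaking = ToyNotification()


class ToyIndexCommand:
    def __init__(self, index):
        self.index = index


class ToySynth:
    # The driver declares what it understands and what it reports.
    supportedCommands = {ToyIndexCommand}
    supportedNotifications = {indexReached, doneSpeaking}

    def speak(self, sequence):
        for item in sequence:
            if isinstance(item, ToyIndexCommand):
                # A real driver would fire this when audio playback reaches the index.
                indexReached.notify(synth=self, index=item.index)
        doneSpeaking.notify(synth=self)


events = []
indexReached.register(lambda synth, index: events.append(("index", index)))
doneSpeaking.register(lambda synth: events.append(("done", None)))
ToySynth().speak(["hello", ToyIndexCommand(1), "world"])
```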

Central speech manager:

  • The core of this new functionality is the speech._SpeechManager class. It is intended for internal use only. It is called by higher level functions such as speech.speak.
  • It manages queuing of speech utterances, calling callbacks at desired points in the speech, profile switching, prioritization, etc. It relies heavily on index reached and done speaking notifications from synths. These notifications alone trigger the next task in the flow.
  • It maintains separate queues (speech._ManagerPriorityQueue) for each priority. As well as holding the pending speech sequences for that priority, each queue holds other information necessary to restore state (profiles, etc.) when that queue is preempted by a higher priority queue.
  • See the docstring for the speech._SpeechManager class for a high level summary of the flow of control.
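
The per-priority queuing described above can be sketched roughly as follows. This is a toy model, not `speech._SpeechManager`: the class name, the `queue`/`popNext` methods and the numeric priority values are assumptions, and it only switches between queues at utterance boundaries, whereas the real manager also interrupts speech in progress and restores state such as profiles.

```python
from collections import deque

# Illustrative values only; the real constants are defined in the speech module.
SPRI_NORMAL, SPRI_NEXT, SPRI_NOW = 0, 1, 2


class ToySpeechManager:
    """One queue per priority; the highest non-empty priority is served first."""

    _ORDER = (SPRI_NOW, SPRI_NEXT, SPRI_NORMAL)

    def __init__(self):
        self._queues = {priority: deque() for priority in self._ORDER}

    def queue(self, utterance, priority=SPRI_NORMAL):
        self._queues[priority].append(utterance)

    def popNext(self):
        # Called when the synth reports that it has finished speaking.
        for priority in self._ORDER:
            if self._queues[priority]:
                return self._queues[priority].popleft()
        return None


manager = ToySpeechManager()
manager.queue("first normal")
manager.queue("second normal")
spoken = [manager.popNext()]
# A higher priority utterance arrives while normal speech is still pending.
manager.queue("interruption", priority=SPRI_NOW)
while True:
    utterance = manager.popNext()
    if utterance is None:
        break
    spoken.append(utterance)
```

Because lower priority items stay in their own queue, nothing is lost when higher priority speech arrives; it simply resumes afterwards.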

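New/enhanced speech commands:

  • EndUtteranceCommand ends the current utterance at this point in the speech. This allows you to have two utterances in a single speech sequence.
  • CallbackCommand calls a function when speech reaches the command.
  • BeepCommand produces a beep when speech reaches the command.
  • WaveFileCommand plays a wave file when speech reaches the command.
  • The above three commands are all subclasses of BaseCallbackCommand. You can subclass this to implement other commands which run a pre-defined function.
  • ConfigProfileTriggerCommand applies (or stops applying) a configuration profile trigger to subsequent speech. This is the basis for switching profiles (and thus synthesizers, speech rates, etc.) for specific languages, math, etc.
  • PitchCommand, RateCommand and VolumeCommand can now take either a multiplier or an offset. In addition, they can convert between the two on demand, which makes it easier to handle these commands in synth drivers based on the synth's requirements. They also have an isDefault attribute which specifies whether this is returning to the default value (as configured by the user).

Speech priorities:

speech.speak now accepts a priority argument specifying one of three priorities: SPRI_NORMAL (normal priority), SPRI_NEXT (speak after next utterance of lower priority) or SPRI_NOW (speech is very important and should be spoken right now, interrupting lower priority speech). Interrupted lower priority speech resumes after any higher priority speech is complete.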

Backwards compatibility for old synths:

  • For synths that don't support index and done speaking notifications, we don't use the speech manager at all. This means none of the new functionality (callbacks, profile switching, etc.) will work.
  • This means we must fall back to the old code for speak spelling, say all, etc. This code is in the speechCompat module.
  • This compatibility fallback is considered deprecated and will be removed eventually. Synth drivers should be updated ASAP.
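
The fallback decision can be pictured as a simple capability check. The sketch below is an assumption about the shape of that check, not the actual code: the function name `usesSpeechManager` and the string labels are hypothetical, and real drivers list extension point objects rather than strings.

```python
# Hypothetical labels standing in for the two required notifications.
REQUIRED_NOTIFICATIONS = frozenset({"indexReached", "doneSpeaking"})


def usesSpeechManager(synth):
    """Return True if the synth reports everything the new framework needs."""
    supported = getattr(synth, "supportedNotifications", ())
    return REQUIRED_NOTIFICATIONS.issubset(supported)


class OldSynth:
    """A driver written before the new framework; it declares nothing."""


class NewSynth:
    supportedNotifications = {"indexReached", "doneSpeaking"}


oldResult = usesSpeechManager(OldSynth())
newResult = usesSpeechManager(NewSynth())
```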

Deprecated/removed:

  • speech.getLastIndex is deprecated and will simply return None.
  • IndexCommand should no longer be used in speech sequences passed to speech.speak. Use a subclass of speech.BaseCallbackCommand instead.
  • In the speech module, speakMessage, speakText, speakTextInfo, speakObjectProperties and speakObject no longer take an index argument. No add-ons in the official repository use this, so I figured it was safe to just remove it rather than having it do nothing.
  • speech.SpeakWithoutPausesBreakCommand has been removed. Use speech.EndUtteranceCommand instead. No add-ons in the official repository use this.
  • speech.speakWithoutPauses.lastSentIndex has been removed. Instead, speakWithoutPauses returns True if something was actually spoken, False if only buffering occurred.

Update comtypes to version 1.1.3.

  • This is necessary to handle events from SAPI 5, as one of the parameters is a decimal, which is not supported by our existing (very outdated) version of comtypes.
  • comtypes has now been added as a separate git submodule.

Updated synth drivers

The espeak, oneCore and sapi5 synth drivers have all been updated to support the new speech framework.

Testing performed:

Unfortunately, I'm out of time to write unit tests for this, though much of this should be suitable for unit testing. I've been testing with the Python console test cases below. Note that the wx.CallLater is necessary so that speech doesn't get silenced straight away; that's just an artefact of testing with the console.

For the profile tests, you'll need to set up two profiles, one triggered for say all and the other triggered for the notepad app.

Python Console test cases:

# Text, beep, beep, sound, text.
wx.CallLater(500, speech.speak, [u"This is some speech and then comes a", speech.BeepCommand(440, 10), u"beep. If you liked that, let's ", speech.BeepCommand(880, 10), u"beep again. I'll speak the rest of this in a ", speech.PitchCommand(offset=50), u"higher pitch. And for the finale, let's ", speech.WaveFileCommand(r"waves\browseMode.wav"), u"play a sound."])
# Text, end utterance, text.
wx.CallLater(500, speech.speak, [u"This is the first utterance", speech.EndUtteranceCommand(), u"And this is the second"])
# Change pitch, text, end utterance, text. Expected: All should be higher pitch.
wx.CallLater(500, speech.speak, [speech.PitchCommand(offset=50), u"This is the first utterance in a higher pitch", speech.EndUtteranceCommand(), u"And this is the second"])
# Text, pitch, text, enter profile1, enter profile2, text, exit profile1, text. Expected: All text after 1 2 3 4 should be higher pitch. 5 6 7 8 should have profile 1 and 2. 9 10 11 12 should be just profile 2.
import sayAllHandler, appModuleHandler; t1 = sayAllHandler.SayAllProfileTrigger(); t2 = appModuleHandler.AppProfileTrigger("notepad"); wx.CallLater(500, speech.speak, [u"Testing testing ", speech.PitchCommand(offset=100), "1 2 3 4", speech.ConfigProfileTriggerCommand(t1, True), speech.ConfigProfileTriggerCommand(t2, True), u"5 6 7 8", speech.ConfigProfileTriggerCommand(t1, False), u"9 10 11 12"])
# Enter profile, text, exit profile. Expected: 5 6 7 8 in different profile, 9 10 11 12 with base config.
import sayAllHandler; trigger = sayAllHandler.SayAllProfileTrigger(); wx.CallLater(500, speech.speak, [speech.ConfigProfileTriggerCommand(trigger, True), u"5 6 7 8", speech.ConfigProfileTriggerCommand(trigger, False), u"9 10 11 12"])
# Two utterances at SPRI_NORMAL in same sequence. Two separate sequences at SPRI_NEXT. Expected result: numbers in order from 1 to 20.
wx.CallLater(500, speech.speak, [u"1 2 3 ", u"4 5", speech.EndUtteranceCommand(), u"16 17 18 19 20"]); wx.CallLater(510, speech.speak, [u"6 7 8 9 10"], priority=speech.SPRI_NEXT); wx.CallLater(520, speech.speak, [u"11 12 13 14 15"], priority=speech.SPRI_NEXT)
# Utterance at SPRI_NORMAL including a beep. Utterance at SPRI_NOW. Expected: Text before the beep, beep, Text after..., This is an interruption., Text after the beep, text...
wx.CallLater(500, speech.speak, [u"Text before the beep ", speech.BeepCommand(440, 10), u"text after the beep, text, text, text, text"]); wx.CallLater(1500, speech.speak, [u"This is an interruption"], priority=speech.SPRI_NOW)
# Utterance with two sequences at SPRI_NOW. Utterance at SPRI_NOW. Expected result: First utterance, second utterance
wx.CallLater(500, speech.speak, [u"First ", u"utterance"], priority=speech.SPRI_NOW); wx.CallLater(510, speech.speak, [u"Second ", u"utterance"], priority=speech.SPRI_NOW)
# Utterance with two sequences at SPRI_NOW. Utterance at SPRI_NEXT. Expected result: First utterance, second utterance
wx.CallLater(500, speech.speak, [u"First ", u"utterance"], priority=speech.SPRI_NOW); wx.CallLater(501, speech.speak, [u"Second ", u"utterance"], priority=speech.SPRI_NEXT)
# Utterance at SPRI_NORMAL. Utterance at SPRI_NOW with profile switch. Expected: Normal speaks but gets interrupted, interruption with different profile, normal speaks again
import sayAllHandler; trigger = sayAllHandler.SayAllProfileTrigger(); wx.CallLater(500, speech.speak, [u"This is a normal utterance, text, text"]); wx.CallLater(1000, speech.speak, [speech.ConfigProfileTriggerCommand(trigger, True), u"This is an interruption with a different profile"], priority=speech.SPRI_NOW)
# Utterance at SPRI_NORMAL with profile switch. Utterance at SPRI_NOW. Expected: Normal speaks with different profile but gets interrupted, interruption speaks with base config, normal speaks again with different profile
import sayAllHandler; trigger = sayAllHandler.SayAllProfileTrigger(); wx.CallLater(500, speech.speak, [speech.ConfigProfileTriggerCommand(trigger, True), u"This is a normal utterance with a different profile"]); wx.CallLater(1000, speech.speak, [u"This is an interruption"], priority=speech.SPRI_NOW)
# Utterance at SPRI_NORMAL with profile 1. Utterance at SPRI_NOW with profile 2. Expected: Normal speaks with profile 1 but gets interrupted, interruption speaks with profile 2, normal speaks again with profile 1
import sayAllHandler, appModuleHandler; t1 = sayAllHandler.SayAllProfileTrigger(); t2 = appModuleHandler.AppProfileTrigger("notepad"); wx.CallLater(500, speech.speak, [speech.ConfigProfileTriggerCommand(t1, True), u"This is a normal utterance with profile 1"]); wx.CallLater(1000, speech.speak, [speech.ConfigProfileTriggerCommand(t2, True), u"This is an interruption with profile 2"], priority=speech.SPRI_NOW)
# Utterance at SPRI_NORMAL including a pitch change and beep. Utterance at SPRI_NOW. Expected: Text speaks with higher pitch, beep, text gets interrupted, interruption speaks with normal pitch, text after the beep speaks again with higher pitch
wx.CallLater(500, speech.speak, [speech.PitchCommand(offset=100), u"Text before the beep ", speech.BeepCommand(440, 10), u"text after the beep, text, text, text, text"]); wx.CallLater(1500, speech.speak, [u"This is an interruption"], priority=speech.SPRI_NOW)

Known issues with pull request:

No issues with the code that I know of. There are two issues for the project, though:

  1. All third party synth drivers need to be updated in order to support the new functionality. Old drivers will still work for now thanks to the compat code, but they get none of the new functionality. Getting third parties to do this will take some time.
  2. While this PR forms the basis for a lot of functionality, it doesn't provide many user visible changes. That means merging it is risky without immediate benefit. That said, putting anything more in this PR would make it even more insane than it already is.

Change log entry:

Bug Fixes:

- When spelling text, reported tool tips are no longer interjected in the middle of the spelling. Instead, they are reported after spelling finishes. (#1229)

Changes for Developers:

- nvwave has been enhanced to simplify accurate indexing for speech synthesizers: (#4877)
 - `WavePlayer.feed` now takes an `onDone` argument specifying a function to be called when the provided chunk of audio has finished playing. Speech synths can simply feed audio up to an index and use the onDone callback to be accurately notified when the index is reached.
 - The `WavePlayer` constructor now takes a `buffered` argument. If True, small chunks of audio will be buffered to prevent audio glitches. This avoids the need for tricky buffering across calls in the synth driver if the synth provides fixed size chunks and an index lands near the end of a previous chunk. It is also useful for synths which always provide very small chunks.
- Several major changes related to synth drivers: (#4877)
 - SynthDrivers must now accurately notify when the synth reaches an index or finishes speaking using the new `synthIndexReached` and `synthDoneSpeaking` extension points in the `synthDriverHandler` module.
  - The `lastIndex` property is deprecated.
  - For drivers that don't support these, old speech code will be used. However, this means new functionality will be unavailable, including callbacks, beeps, playing audio, profile switching and prioritized speech. This old code will eventually be removed.
 - SynthDrivers must now support `PitchCommand` if they wish to support capital pitch change.
 - SynthDrivers now have `supportedCommands` and `supportedNotifications` attributes which specify what they support.
 - Because there are some speech commands which trigger behaviour unrelated to synthesizers (e.g. beeps, callbacks and profile switches), commands which are passed to synthesizers are now subclasses of `speech.SynthCommand`.
- New/enhanced speech commands: (#4877)
 - `EndUtteranceCommand` ends the current utterance at this point in the speech. This allows you to have two utterances in a single speech sequence.
 - `CallbackCommand` calls a function when speech reaches the command.
 - `BeepCommand` produces a beep when speech reaches the command.
 - `WaveFileCommand` plays a wave file when speech reaches the command.
 - The above three commands are all subclasses of `BaseCallbackCommand`. You can subclass this to implement other commands which run a pre-defined function.
 - `ConfigProfileTriggerCommand` applies (or stops applying) a configuration profile trigger to subsequent speech. This is the basis for switching profiles (and thus synthesizers, speech rates, etc.) for specific languages, math, etc.
 - `PitchCommand`, `RateCommand` and `VolumeCommand` can now take either a multiplier or an offset. In addition, they can convert between the two on demand, which makes it easier to handle these commands in synth drivers based on the synth's requirements. They also have an `isDefault` attribute which specifies whether this is returning to the default value (as configured by the user).
- `speech.speak` now accepts a `priority` argument specifying one of three priorities: `SPRI_NORMAL` (normal priority), `SPRI_NEXT` (speak after next utterance of lower priority) or `SPRI_NOW` (speech is very important and should be spoken right now, interrupting lower priority speech). Interrupted lower priority speech resumes after any higher priority speech is complete. (#4877)
- Deprecated/removed speech functionality: (#4877)
 - `speech.getLastIndex` is deprecated and will simply return None.
 - `IndexCommand` should no longer be used in speech sequences passed to `speech.speak`. Use a subclass of `speech.BaseCallbackCommand` instead.
 - In the `speech` module, `speakMessage`, `speakText`, `speakTextInfo`, `speakObjectProperties` and `speakObject` no longer take an `index` argument.
 - `speech.SpeakWithoutPausesBreakCommand` has been removed. Use `speech.EndUtteranceCommand` instead.
 - `speech.speakWithoutPauses.lastSentIndex` has been removed. Instead, `speakWithoutPauses` returns True if something was actually spoken, False if only buffering occurred.
- Updated comtypes to version 1.1.3. (#4877)
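
The change log entry for `PitchCommand`, `RateCommand` and `VolumeCommand` mentions converting between a multiplier and an offset on demand. One plausible way to do that is sketched below. It assumes the conversion is made relative to the user's configured default value, which the entry does not spell out; `ToyProsodyCommand` and its `default` parameter are hypothetical, and a default of 0 is not handled.

```python
class ToyProsodyCommand:
    """Toy command holding either an offset or a multiplier, never both."""

    def __init__(self, default, offset=0, multiplier=1.0):
        self.default = default  # the user's configured value (assumed non-zero)
        self._offset = offset
        self._multiplier = multiplier

    @property
    def offset(self):
        if self._offset != 0:
            return self._offset
        # Derive the offset from the multiplier.
        return self.default * self._multiplier - self.default

    @property
    def multiplier(self):
        if self._multiplier != 1.0:
            return self._multiplier
        # Derive the multiplier from the offset.
        return (self.default + self._offset) / self.default

    @property
    def isDefault(self):
        # True when the command simply returns to the configured value.
        return self._offset == 0 and self._multiplier == 1.0


byOffset = ToyProsodyCommand(default=50, offset=25)
byMultiplier = ToyProsodyCommand(default=50, multiplier=2.0)
reset = ToyProsodyCommand(default=50)
```

A driver whose synth takes relative values can read `multiplier`, while one that takes absolute adjustments can read `offset`, regardless of how the caller built the command.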

Refactored functionality to use the new framework:

- Rather than using a polling generator, spelling is now sent as a single speech sequence, including `EndUtteranceCommand`s, `BeepCommand`s and `PitchCommand`s as appropriate. This can be created and incorporated elsewhere using the `speech.getSpeechForSpelling` function.
- Say all has been completely rewritten to use `CallbackCommand`s instead of a polling generator. The code should also be a lot more readable now, as it is now classes with methods for the various stages in the process.
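
To illustrate the move from polling to callbacks, here is a toy reader that queues each line followed by a callback which queues the next one. Everything in it (`ToyCallback`, `toySpeak`, `ToySayAll`) is hypothetical and runs synchronously; the real say all code works with `CallbackCommand` and the speech manager, and also moves the cursor.

```python
class ToyCallback:
    """Stand-in for a callback command embedded in a speech sequence."""

    def __init__(self, func):
        self.func = func


def toySpeak(sequence, spoken):
    # "Speak" text immediately and run callbacks when they are reached.
    for item in sequence:
        if isinstance(item, ToyCallback):
            item.func()
        else:
            spoken.append(item)


class ToySayAll:
    def __init__(self, lines):
        self.lines = list(lines)
        self.pos = 0
        self.spoken = []

    def nextLine(self):
        if self.pos >= len(self.lines):
            return  # nothing left to read
        line = self.lines[self.pos]
        self.pos += 1
        # The callback at the end of the line requests the following line,
        # so no generator has to keep polling the last index.
        toySpeak([line, ToyCallback(self.nextLine)], self.spoken)


reader = ToySayAll(["Line 1", "Line 2", "Line 3"])
reader.nextLine()
```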

@jcsteh
Contributor Author

jcsteh commented Sep 13, 2017

Urgh. Accidentally hit submit before I was ready. I've updated the PR description with (very lengthy) details. :)

@jcsteh
Contributor Author

jcsteh commented Sep 13, 2017

Here's a test I was using with say all to make sure it was moving the cursor correctly and breaking utterances where I expected. The text "New utterance" should literally be at the start of a new utterance when you hear it.

Line 1
Line 2
Line 3
Line 4
Line 5
Line 6
Line 7
Line 8
Line 9
Line 10
New utterance  line 11
Line 12
Line 13
Line 14
Line 15
Line 16
Line 17
Line 18
Line 19
Line 20
New utterance line 21
Line 22.
New utterance line 23
Line 24
Line 25
Line 26
Line 27
Line 28
Line 29
Line 30
Line 31
Line 32
New utterance line 33
Last line

@jcsteh
Contributor Author

jcsteh commented Sep 13, 2017

Some notes re unit testing:

  • I think the Python console test cases above can be used as the basis for unit tests.
  • A lot of this can be unit tested by running various methods and making assertions on the output or based on the state of the queue.
  • However, it's also necessary to check whether being notified about a particular index or done speaking will cause certain text to be sent to the synth, certain profiles to be switched, etc. This is a bit trickier.
  • Some of these (e.g. _switchProfile) can just be replaced with mocks when setting up the speech manager. We could probably replace the calls to send text to the synth and to exit all/restore all profile triggers with tiny functions which we can similarly mock. This is pretty trivial to do, but I didn't want to do this without being certain it was actually helpful.
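
The mocking approach in the last bullet might look something like the sketch below. `ToyManager`, `_sendToSynth` and `onIndexReached` are hypothetical names made up for this example; only `_switchProfile` is mentioned above, and its real signature may differ.

```python
from unittest import mock


class ToyManager:
    """Toy manager with the side-effecting steps split into small methods."""

    def __init__(self):
        self.sent = []

    def _switchProfile(self, trigger, enter):
        raise RuntimeError("would touch the real configuration")

    def _sendToSynth(self, text):
        # Hypothetical tiny function wrapping the call into the synth.
        self.sent.append(text)

    def onIndexReached(self, pending):
        # Decide what the next task is once the synth reports an index.
        kind, payload = pending
        if kind == "profile":
            self._switchProfile(*payload)
        else:
            self._sendToSynth(payload)


manager = ToyManager()
# Replace the profile switch with a mock so the test never touches config.
with mock.patch.object(manager, "_switchProfile") as switchProfile:
    manager.onIndexReached(("profile", ("sayAll", True)))
    manager.onIndexReached(("text", "5 6 7 8"))

switchProfile.assert_called_once_with("sayAll", True)
```

With the side effects isolated like this, a test can simulate an index notification and assert on both the mock calls and the text that would have been sent to the synth.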

@LeonarddeR
Collaborator

Somehow, /source/comInterfaces/_944DE083_8FB8_45CF_BCB7_C477ACB2F897_0_1_0.py ended up in this. I thought those were created upon building and that they were part of gitignore, but it seems it already existed in the tracked source tree.

@jcsteh
Contributor Author

jcsteh commented Sep 13, 2017

No, that comInterface is intentionally part of the repo now, since we can't rely on everyone running the latest Windows 10 build and thus having all of the interfaces in their typelib. It had to be re-built for updated comtypes.

@jcsteh
Contributor Author

jcsteh commented Sep 13, 2017

I posted a brain dump on the wiki with implementation ideas for some of the more tricky use cases. I don't think these should be considered for this PR, but I'm linking it here so we have a reference.

@jcsteh jcsteh mentioned this pull request Sep 13, 2017
@derekriemer
Collaborator

Say all isn't working for me in Firefox with this.
Also, it is now possible to make something preempt with priority and make NVDA read the same thing forever. I'm not sure it needs fixing, as it's more of a DoS of the user, but run this in the Python console, then go and read some text in Notepad++.
expected: speech starts where it ended.
actual: speech starts at the beginning of the line, preempted at some point, then starts at the beginning of the line, preempted at the same place, ... forever.
Paste this into python console.

def a():
	wx.CallLater(4000, speech.speak, ["Interruption! You are in the path of a tornado, evacuate immediately!", speech.CallbackCommand(a)], priority=speech.SPRI_NOW)
a()

As soon as a() is called, press Escape and start a say all.

@derekriemer
Collaborator

Recording of above sayall repeat forever.
https://files.derekriemer.com/tornado.flac

@michaelDCurran
Member

michaelDCurran commented Sep 13, 2017 via email

@derekriemer
Collaborator

This fails for me. This is meant to simulate the reporting of a notification.
wx.CallLater(1000, speech.speak, [speech.WaveFileCommand(r"C:\Windows\Media\Windows Notify System Generic.wav"), "You have a meeting in ten minutes!"])

@derekriemer
Collaborator

Does anyone else running this have problems with sayall in browsers?

@jcsteh
Contributor Author

jcsteh commented Sep 13, 2017

@derekriemer commented on 14 Sep 2017, 08:13 GMT+10:

This fails for me. This is meant to simulate the reporting of a notification.
wx.CallLater(1000, speech.speak, [speech.WaveFileCommand(r"C:\Windows\Media\Windows Notify System Generic.wav"), "You have a meeting in ten minutes!"])

Can you be more specific about how it "fails"? It works just fine for me. I hear the notification sound with the message. Tested with espeak and oneCore.

@derekriemer commented on 14 Sep 2017, 08:14 GMT+10:

Does anyone else running this have problems with sayall in browsers?

Can you be more specific? Again, it works just fine for me. Tested in Firefox with eSpeak.

@jcsteh
Contributor Author

jcsteh commented Sep 13, 2017

Oh blerg. Both of those things fail with eSpeak if you have automatic language switching turned on. If you use oneCore (or turn off auto language switching with eSpeak), it works as expected.

It looks like eSpeak fails to notify about marks (indexes) if they're immediately followed by a language change. That'll need to be fixed in eSpeak (or worked around somehow).

@derekriemer
Collaborator

@jcsteh commented on Sep 13, 2017, 5:48 PM MDT:

@derekriemer commented on 14 Sep 2017, 08:13 GMT+10:

This fails for me. This is meant to simulate the reporting of a notification.
wx.CallLater(1000, speech.speak, [speech.WaveFileCommand(r"C:\Windows\Media\Windows Notify System Generic.wav"), "You have a meeting in ten minutes!"])

Can you be more specific about how it "fails"? It works just fine for me. I hear the notification sound with the message. Tested with espeak and oneCore.

@derekriemer commented on 14 Sep 2017, 08:14 GMT+10:

Does anyone else running this have problems with sayall in browsers?

Can you be more specific? Again, it works just fine for me. Tested in Firefox with eSpeak.

I hear nothing but the text

@derekriemer
Collaborator

@jcsteh commented on Sep 13, 2017, 5:54 PM MDT:

Oh blerg. Both of those things fail with eSpeak if you have automatic language switching turned on. If you use oneCore (or turn off auto language switching with eSpeak), it works as expected.

It looks like eSpeak fails to notify about marks (indexes) if they're immediately followed by a language change. That'll need to be fixed in eSpeak (or worked around somehow).

confirm

@Brian1Gaff

Brian1Gaff commented Sep 14, 2017 via email

@ruifontes
Contributor

ruifontes commented May 16, 2019 via email

@ruifontes
Contributor

ruifontes commented May 16, 2019 via email

@michaelDCurran
Member

michaelDCurran commented May 16, 2019 via email

@LeonarddeR
Collaborator

LeonarddeR commented May 16, 2019 via email

@LeonarddeR
Collaborator

Hmm, OneCore is an interesting one. It tries to send pitch using ssml, but in this case, it shouldn't.

I think it should be fixed as follows:

  1. _convertProsody, convertRateCommand, convertPitchCommand and convertVolumeCommand should be moved from _OcSsmlConverter to _OcPreAPI5SsmlConverter
  2. I believe convertRateCommand, convertPitchCommand and convertVolumeCommand should all return None on _OcSsmlConverter, so that these commands are not handled using SSML.

Even then, I'm afraid things won't work as expected when sending commands, as they aren't processed in the speak function. I'm too unfamiliar with synthesizer drivers to fix this ASAP.

@jcsteh
Contributor Author

jcsteh commented May 16, 2019 via email

@LeonarddeR
Collaborator

Are you saying OneCore shouldn't use SSML for PitchCommand, etc.? If so, why? SSML is the ideal fit for inline speech prosody commands.

Isn't the problem with SSML that it doesn't support the full rate and pitch range that is supported with the prosody commands? Or am I just misunderstanding something?

@jcsteh
Contributor Author

jcsteh commented May 16, 2019 via email

@LeonarddeR
Collaborator

LeonarddeR commented May 16, 2019 via email

@feerrenrut feerrenrut modified the milestones: 2019.2, 2019.3 Jul 30, 2019