Releases: Uberi/speech_recognition
Version 3.8.1
Lots of changes since June! Summary below. Get all of these and more with a quick `pip install --upgrade SpeechRecognition`.
- Snowboy hotwords support for highly efficient, performant listening (thanks @beeedy!). This is implemented as the `snowboy_configuration` parameter of `recognizer_instance.listen`.
- Configurable Pocketsphinx models - you can now specify your own acoustic parameters, language model, and phoneme dictionary, using the `language` parameter of `recognizer_instance.recognize_sphinx` (thanks @frawau!).
- `audio_data_instance.get_segment(start_ms=None, end_ms=None)` is a new method that can be called on any `AudioData` instance to get a segment of the audio starting at `start_ms` and ending at `end_ms`. This is really useful when you want to get, say, only the first five seconds of some audio.
- The `stopper` function returned by `listen_in_background` now accepts one parameter, `wait_for_stop` (defaulting to `True` for backwards compatibility), which determines whether the function will wait for the background thread to fully shut down before returning. One advantage is that if `wait_for_stop` is `False`, you can call the `stopper` function from any thread!
- New example, demonstrating how to simultaneously listen to and recognize speech with the threaded producer/consumer pattern: threaded_workers.py.
- Various improvements and bugfixes:
  - Python 3 style type annotations in library documentation.
  - `recognize_google_cloud` now uses the v1 rather than the beta API (thanks @oort7!).
  - `recognize_google_cloud` now returns timestamp info when the `show_all` parameter is `True`.
  - `recognize_bing` won't time out as often on credential requests, due to a longer default timeout.
  - `recognize_google_cloud` timeouts respect `recognizer_instance.operation_timeout` now (thanks @reefactor!).
  - Any recognizers using FLAC audio were broken inside Linux on Docker - this is now fixed (thanks @reefactor!).
  - Various documentation and lint fixes (thanks @josh-hernandez-exe!).
  - Lots of small build system improvements.
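To illustrate the `wait_for_stop` behaviour described in the 3.8.1 notes, here is a minimal sketch of a background worker that returns a stopper function. It uses only the standard library; `run_in_background` and its `work` callback are hypothetical names for this example and are not the library's implementation of `listen_in_background`.

```python
import threading
import time


def run_in_background(work, poll_interval=0.01):
    """Call work() repeatedly on a daemon thread and return a stopper function.

    Hypothetical helper for illustration; not part of SpeechRecognition.
    """
    running = threading.Event()
    running.set()

    def loop():
        while running.is_set():
            work()
            time.sleep(poll_interval)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()

    def stopper(wait_for_stop=True):
        running.clear()      # ask the loop to exit at its next check
        if wait_for_stop:
            thread.join()    # block until the thread has fully finished

    stopper.thread = thread  # exposed only so the demo can inspect the thread
    return stopper


ticks = []
stop = run_in_background(lambda: ticks.append(1))
time.sleep(0.05)
stop(wait_for_stop=True)
stopped_cleanly = not stop.thread.is_alive()
```

With `wait_for_stop=False`, the stopper in this sketch only clears the flag and returns immediately, so it never has to join a thread. That matters because a Python thread cannot join itself.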
Version 3.7.1
As usual, get it with `pip install --upgrade SpeechRecognition`.
- New `grammar` parameter for `recognizer_instance.recognize_sphinx` - now, you can specify a JSGF or FSG grammar to PocketSphinx (thanks @aleneum!).
- Update PyAudio to version 0.2.11 - this fixes a couple memory management issues users have been experiencing.
- Update FLAC to 1.3.2 on all platforms - this will make it easier to support more audio formats in the near future.
- Fixes for various APIs on Python 3.6+ - small changes in `urllib.request` behavior made requests fail in certain situations.
- Fixes for Bing Speech API timing out due to some backwards incompatible changes to their API.
- Restore original IBM audio segmentation behaviour - previously, it would stop recognizing after the first pause. Now, it will recognize all speech in the input audio, as it did before IBM's changes.
- Fix links in PocketSphinx docs and library reference. Add-on language models now available from Google Drive, including the now-officially-supported Italian model.
- New troubleshooting entries for JACK server in README.
- Documentation and build process updates.
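As a rough sketch of how the `grammar` parameter might be used, the snippet below writes a small JSGF grammar to a temporary file and defines a helper that passes that path to `recognize_sphinx`. The grammar contents, the `.gram` suffix, and the helper names are assumptions made for illustration. The helper is defined but not called here, since calling it requires SpeechRecognition and PocketSphinx to be installed.

```python
import os
import tempfile

# A tiny illustrative JSGF grammar (the rule and phrases are made up).
JSGF_GRAMMAR = """#JSGF V1.0;
grammar commands;
public <command> = (turn on | turn off) (the light | the fan);
"""


def write_grammar(text):
    """Write grammar text to a temporary .gram file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".gram")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    return path


def recognize_with_grammar(recognizer, audio, grammar_path):
    """Assumes `recognizer` is a Recognizer and `audio` an AudioData instance."""
    return recognizer.recognize_sphinx(audio, grammar=grammar_path)


grammar_path = write_grammar(JSGF_GRAMMAR)
```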
Version 3.6.5
Quick bugfix for `PortableNamedTemporaryFile`:
- Fix file descriptor opening on Python 2.
- Add tests for Sphinx keyword matching.
Version 3.6.4
Bugfix release!
- Fix `tempfile.NamedTemporaryFile` on Windows, by replacing it with a `PortableNamedTemporaryFile` class. Previously, it didn't necessarily support the file being re-opened after it was originally opened.
- Documentation/troubleshooting improvements (thanks @hassanmian!).
- Add support for 24-bit FLAC audio files (thanks @sudevschiz!).
- Fix `phrase_time_limit` being ignored for `listen_in_background` (thanks @dodysw!)
- Added lots of new audio regression tests.
- Code cleanup for tests and examples.
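The re-opening problem mentioned in the 3.6.4 notes can be sketched with the standard library alone. The class below is a hypothetical stand-in, not the library's `PortableNamedTemporaryFile`: it creates the file with `tempfile.mkstemp`, so the path can be opened a second time while the first handle is still open, and removes the file on exit.

```python
import os
import tempfile


class ReopenableTempFile:
    """Hypothetical temp file whose path can be re-opened while it is in use."""

    def __init__(self, mode="w+b"):
        self.mode = mode

    def __enter__(self):
        # mkstemp returns an open descriptor plus a path we manage ourselves.
        fd, self.name = tempfile.mkstemp()
        self._file = os.fdopen(fd, self.mode)
        return self

    def __exit__(self, exc_type, exc, tb):
        self._file.close()
        os.remove(self.name)
        return False  # do not swallow exceptions

    def write(self, data):
        return self._file.write(data)

    def flush(self):
        return self._file.flush()


with ReopenableTempFile() as tmp:
    tmp.write(b"RIFF")
    tmp.flush()
    # Open the same path again while the first handle is still open.
    with open(tmp.name, "rb") as reader:
        reopened = reader.read()
    path = tmp.name
```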
Version 3.6.3
Version 3.6.0
This is more of a maintenance release, but a few features slipped in as well:
- Support for the Google Cloud Speech API with `recognizer_instance.recognize_google_cloud` (thanks @Thynix!), plus documentation and examples.
- Automatic sample rate detection in `speech_recognition.Microphone` - this should fully resolve all the "Invalid sample rate" issues from PyAudio.
- Project now has automated tests and continuous integration with TravisCI. It's pretty nifty, and has already caught a few things during development!
- Keywords example for `recognizer_instance.recognize_sphinx`.
- Documentation improvements and updated advice in troubleshooting and library reference.
- Bugfix - Google Speech Recognition sometimes didn't return the text with the highest confidence (thanks @akabraham!).
- Bugfix - `EOFError` upon encountering malformed audio files; a proper exception message is now given.
- Updated FLAC binaries for OS X.
- Bugfix - invalid FLAC binary path on OS X (thanks @akabraham!).
- Code cleanup.
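The highest-confidence bugfix in the 3.6.0 notes concerns choosing among several candidate transcripts. A minimal sketch of that selection follows. It assumes the candidates arrive as a list of dicts with `transcript` and optional `confidence` keys, which is an assumption for this example and not a documented response format; `best_transcript` is a hypothetical helper.

```python
def best_transcript(alternatives):
    """Return the transcript with the highest confidence.

    Entries without a "confidence" key are treated as confidence 0.0.
    """
    if not alternatives:
        raise ValueError("no alternatives to choose from")
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    return best["transcript"]


alternatives = [
    {"transcript": "hello word", "confidence": 0.61},
    {"transcript": "hello world", "confidence": 0.93},
    {"transcript": "hollow world"},  # no confidence reported
]
chosen = best_transcript(alternatives)
```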
Version 3.5.0
- Support for the Houndify API with `recognizer_instance.recognize_houndify` (thanks @tb0hdan!).
- `recognize_sphinx` now supports keyword-based matching via the `keywords=[("cat", 30), ("potato", 45)]` parameter.
  - The second number in each pair is the sensitivity, which determines how loosely Sphinx will interpret speech to be those keywords - higher numbers mean more false positives, while lower numbers mean a lower detection rate.
  - A new example for keyword matching is now available.
- BREAKING CHANGE: API.AI STT API IS BEING SHUT DOWN SOON. (source)
  - For now, the `recognize_api` function will keep working if you're on a paid API.AI plan, and we will not be removing it until the service is shut down entirely.
  - It is best to transition to another backend as soon as possible. I recommend Microsoft Bing Voice Recognition or Wit.ai for previous API.AI users.
- New `phrase_time_limit` option for listening functions, to limit phrase lengths to a certain number of seconds.
- Support for operation timeouts with `recognizer_instance.operation_timeout` - this can be used to ensure long requests always take finite time.
- `recognize_ibm` now opts out of request logging by default, for improved user privacy (thanks @michellemorales!). This is a breaking change if you previously relied on request logging behaviour.
- Bugfix - `listen()` sometimes didn't terminate on finite-length streams.
- Bugfix - Microsoft Bing Voice Recognition changed their authentication API endpoint, so that required some small code updates (thanks @tmator!).
- Bugfix - 24-bit audio now works correctly on Python 2.
- Update Wit.ai API version from deprecated version.
- A bunch of documentation updates, fixes, and improvements.
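To show what a phrase time limit means in practice, here is a simplified sketch that collects fixed-size audio chunks until either the stream ends or the limit is reached. `collect_phrase` and its parameters are hypothetical and chosen for illustration; the library's own listening loop also handles things like silence detection that are left out here.

```python
def collect_phrase(chunks, sample_rate, chunk_size, phrase_time_limit=None):
    """Collect chunks until the stream ends or the time limit is reached.

    chunks: iterable of byte strings, each holding chunk_size frames.
    Returns the joined audio bytes and the seconds of audio collected.
    """
    seconds_per_chunk = chunk_size / sample_rate
    elapsed = 0.0
    frames = []
    for chunk in chunks:
        if phrase_time_limit is not None and elapsed >= phrase_time_limit:
            break
        frames.append(chunk)
        elapsed += seconds_per_chunk
    return b"".join(frames), elapsed


# 16 kHz, 16-bit mono: 4000 frames per chunk is 0.25 s and 8000 bytes.
stream = [b"\x00" * 8000 for _ in range(20)]  # 5 seconds of silence
audio, seconds = collect_phrase(stream, sample_rate=16000, chunk_size=4000,
                                phrase_time_limit=2)
full_audio, full_seconds = collect_phrase(stream, 16000, 4000)
```

Because the loop iterates over the stream directly, it also ends on its own when a finite stream runs out, even with no limit set.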
Version 3.4.6
Bugfix release.
Changes:
- api.ai now requires the `sessionId` field, so we'll just add that in (thanks @jhoelzl!).
- Improve documentation a bit.
- Various other small fixes.
Version 3.4.5
Changes:
- Bug fix: non-24-bit audio wasn't converted properly to 16-bit audio on Python 2, due to the new 24-bit audio shim. Thanks to @jhoelzl for reporting!
Version 3.4.4
Maintenance release:
- Python versions less than 3.4 don't support 24-bit audio properly. We now have pure-Python shims that will allow 24-bit audio to work on those old Python versions, though they will be somewhat slower. Thanks to @danse for reporting the issue!
- Added updated Pocketsphinx binaries and Pocketsphinx installation procedures to match improvements on their end.
- Fix Unicode file paths on Windows.
- Fix caching in `recognizer_instance.recognize_bing`.
- We now use the Manylinux Docker image for building FLAC. Hopefully, this will make building universal Linux binaries easier for packagers.
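The pure-Python 24-bit shims mentioned in the 3.4.4 notes can be pictured with a small sketch. The function below widens little-endian signed 24-bit PCM to 32-bit by adding a zero low byte to each sample. It illustrates the general idea only and is not the library's shim; `pcm24_to_pcm32` is a hypothetical name.

```python
import struct


def pcm24_to_pcm32(data):
    """Widen little-endian signed 24-bit PCM samples to 32-bit.

    Each 3-byte sample gains a zero low-order byte, which scales the
    value by 256 and keeps the sign (the top byte is unchanged).
    """
    if len(data) % 3 != 0:
        raise ValueError("24-bit PCM data must be a multiple of 3 bytes")
    out = bytearray()
    for i in range(0, len(data), 3):
        out += b"\x00" + data[i:i + 3]
    return bytes(out)


# Two 24-bit samples: 1 and -1.
samples_24 = (1).to_bytes(3, "little", signed=True) + \
             (-1).to_bytes(3, "little", signed=True)
widened = pcm24_to_pcm32(samples_24)
values_32 = struct.unpack("<2i", widened)
```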