breandan edited this page May 9, 2020 · 1 revision

Handsfree Acoustic Development Interface

  • A general-purpose voice user interface for IntelliJ IDEA, intended for blind users and users with RSI (and for distracting coworkers).

  • Uses CMU Sphinx-4 for speech recognition and MaryTTS for speech synthesis, giving end-to-end voice control in pure Java.

  • Pretrained language models give good recognition accuracy for small-vocabulary grammars.

  • Check out this presentation: Using Python to Code by Voice

Features

Idear is a work in progress. These are some of the features we have implemented or are currently working on:

Activation

  • User presses a button or activates voice control by saying something like “Okay __, help me.”
  • “Hello , welcome to the handsfree audio development interface for IntelliJ IDEA.”
  • “There are a number of commands you can use, for example ‘Open settings’, ‘Find action’, ‘Open file’...”
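
Spoken commands such as ‘Open settings’, ‘Find action’ and ‘Open file’ eventually have to be bound to IDE actions. A minimal sketch of such a lookup table follows. The class name `VoiceCommandTable` is hypothetical, and the action ID strings are assumptions for illustration that should be checked against the IDE's action registry; this is not how the plugin itself is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Maps recognized phrases to action identifiers (hypothetical helper, not plugin code). */
final class VoiceCommandTable {
    private final Map<String, String> phraseToActionId = new LinkedHashMap<>();

    VoiceCommandTable() {
        // Assumed action IDs for illustration; verify each one before relying on it.
        phraseToActionId.put("open settings", "ShowSettings");
        phraseToActionId.put("find action", "GotoAction");
        phraseToActionId.put("open file", "GotoFile");
    }

    /** Trims, lower-cases, and collapses whitespace so recognizer output matches the table keys. */
    static String normalize(String utterance) {
        return utterance.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    /** Returns the bound action ID, or an empty Optional for an unknown phrase. */
    Optional<String> lookup(String utterance) {
        return Optional.ofNullable(phraseToActionId.get(normalize(utterance)));
    }

    public static void main(String[] args) {
        VoiceCommandTable table = new VoiceCommandTable();
        System.out.println(table.lookup("  Open   Settings "));
        System.out.println(table.lookup("make coffee"));
    }
}
```

Returning an `Optional` lets the caller decide how to handle an unrecognized phrase, for example by reading the list of available commands back to the user.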

Visually Impaired Mode

  • Action reader. When the user enables a flag, any selected menu options or actions are read back to the user.
  • Status updates. User says, “Run application”. Plugin responds, “building project”, “compiling application”, “running project”.
  • Text selection. Plugin reads back selected region (rapidly).
  • User says, "Where am I?". Plugin responds, "You are inside method X, on line Y".

Interactive Features

  • User says, “open Analyze”. Plugin responds, “Would you like to ‘Inspect Code’, ‘Code Cleanup’...”
  • User says, “open tip of the day”. Plugin responds, “Did you know that... ”
  • User says, “activate intentions”. Plugin responds, “Would you like to ‘Invert if condition’, ‘Remove braces’,...”

IDE Features

  • Understand numbers (one, two, three, four, five, six…)
    • Jump to text inside the editor window
    • Go to line numbers
  • Understand free form language
    • Finding text in the editor
    • Performing arbitrary actions
  • Menus (open + file, edit, view, navigate, code, analyze, refactor, build, run, tools, version control)
  • Navigation keys (“page up”, “page down”, “line up”, “line down”, “go left”, “go right”)
  • Fixed actions (“extract method”, “expand selection”, “shrink selection”, “focus project”)
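
To go to a line number, the recognized number words first have to be turned into an integer. The following is a minimal sketch of one way to do that, assuming the recognizer returns English number words separated by spaces. `SpokenNumberParser` is a hypothetical name, and the approach is an illustration rather than the plugin's actual implementation.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.OptionalInt;

/** Converts phrases like "one hundred five" to integers (hypothetical helper). */
final class SpokenNumberParser {
    private static final Map<String, Integer> SMALL = new HashMap<>();

    static {
        String[] units = {"zero", "one", "two", "three", "four", "five", "six", "seven",
            "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
            "sixteen", "seventeen", "eighteen", "nineteen"};
        for (int i = 0; i < units.length; i++) SMALL.put(units[i], i);
        String[] tens = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy",
            "eighty", "ninety"};
        for (int i = 0; i < tens.length; i++) SMALL.put(tens[i], (i + 2) * 10);
    }

    /** Returns the parsed value, or empty if the phrase contains a non-number word. */
    static OptionalInt parse(String phrase) {
        String cleaned = phrase.trim().toLowerCase(Locale.ROOT).replace('-', ' ');
        if (cleaned.isEmpty()) return OptionalInt.empty();
        int total = 0;             // completed thousands
        int current = 0;           // running group below one thousand
        boolean sawNumber = false; // guards against phrases made only of "and"
        for (String word : cleaned.split("\\s+")) {
            if (word.equals("and")) continue;
            Integer small = SMALL.get(word);
            if (small != null) {
                current += small;
            } else if (word.equals("hundred")) {
                current = Math.max(current, 1) * 100;
            } else if (word.equals("thousand")) {
                total += Math.max(current, 1) * 1000;
                current = 0;
            } else {
                return OptionalInt.empty();
            }
            sawNumber = true;
        }
        return sawNumber ? OptionalInt.of(total + current) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("forty two"));
        System.out.println(parse("two thousand twenty"));
    }
}
```

The sketch does not validate word order and simply adds consecutive small numbers, so a line number dictated digit by digit ("one two three") would need separate handling.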

Code Features

  • Code generation (generate for-loop, getter, setter…)
  • Refactorings
    • Extract method
    • Extract parameter
  • Show intention actions
  • Auto-completion
  • Speech typing

To-do List

  1. Define a grammar & vocabulary

    1. For example: dialog.gram
  2. Binding speech results to Actions

    1. Current hack: VoiceControlAction.java

    2. How to trigger actions programmatically

  3. Speech synthesis API

    1. Convert text (in code, comments, menus) to audio

    2. SplitCamelCaseText -> Split Camel Case Text

    3. Interruptible audio input / output

  4. GUI Navigation

    1. Extracting menu text

    2. Selecting menu items

  5. Code creation

    1. Java language features

    2. Refactoring actions

    3. Code selection actions

    4. Spelling letter-by-letter

  6. Code navigation

    1. File based navigation

    2. Line based navigation

    3. Searching for symbols
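
For the speech synthesis item above (SplitCamelCaseText -> Split Camel Case Text), identifiers need to be broken into words before they are passed to the synthesizer. Here is a minimal sketch using only `java.lang.String` and regular-expression lookarounds. `IdentifierSpeller` is a hypothetical name, and the handling of acronyms and underscores is an assumption beyond what the to-do item specifies.

```java
/** Splits camelCase and snake_case identifiers into speakable words (hypothetical helper). */
final class IdentifierSpeller {

    static String splitCamelCase(String identifier) {
        return identifier
            // Boundary between a lowercase letter or digit and an uppercase letter:
            // "splitCamel" -> "split Camel"
            .replaceAll("(?<=[a-z0-9])(?=[A-Z])", " ")
            // Boundary between an acronym and the following capitalized word:
            // "XMLFile" -> "XML File"
            .replaceAll("(?<=[A-Z])(?=[A-Z][a-z])", " ")
            // Treat underscores as word separators as well.
            .replace('_', ' ')
            .trim();
    }

    public static void main(String[] args) {
        System.out.println(splitCamelCase("SplitCamelCaseText")); // prints "Split Camel Case Text"
        System.out.println(splitCamelCase("parseXMLFile"));       // prints "parse XML File"
    }
}
```

The lookarounds match zero-width positions, so no characters are lost from the identifier; only spaces are inserted at word boundaries.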

Reference Materials