Patents by James Chase

6,775,358: Method and system for enhanced interactive playback of audio content to telephone callers.

One of the core inventions in this patent was the “state language” as described in the Voice Engine section of the Technology page. This allowed us to:

  • Track the actual number of seconds of audio content played. This mattered because the user could issue a voice command and interrupt playback at any time, and our agreements with content licensees were that we paid for actual seconds of play.
  • Review the entire session afterward. This was a powerful tool for finding issues with the voice interface design and for debugging.
  • Determine whether a particular audio segment had a problem, such as users consistently issuing a voice command and interrupting playback before listening to all of it.
  • Allow users who were disconnected to resume content play exactly where they left off.
  • Build stateless servers. Because the full session state traveled with each request, a user session could float from server to server within a single session, dramatically improving the scalability of the overall service (a sketch of such a session record follows this list).
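
Everything in the list above falls out of one design decision: the complete session state rides along with each request instead of living on a server. The patent, not this page, defines the real structure; purely as an illustration, with every name invented, a session record in that spirit might look like this:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PlaybackEvent:
    """One logged playback event in the session record (names hypothetical)."""
    segment_id: str                       # which audio segment played
    seconds_played: float                 # actual seconds of play (drives licensee billing)
    interrupted_by: Optional[str] = None  # voice command that cut playback short, if any

@dataclass
class SessionState:
    """The full session record. Serialized with every request, so any
    server can pick up the session: the servers themselves stay stateless."""
    session_id: str
    current_segment: Optional[str] = None
    resume_offset: float = 0.0            # restart point after a disconnect
    events: list[PlaybackEvent] = field(default_factory=list)

    def record_play(self, segment_id: str, seconds: float,
                    interrupted_by: Optional[str] = None) -> None:
        """Log a playback event and remember where to resume."""
        self.events.append(PlaybackEvent(segment_id, seconds, interrupted_by))
        self.current_segment = segment_id
        self.resume_offset = seconds

    def billable_seconds(self) -> float:
        """Total seconds of play across the session, for licensee payment."""
        return sum(e.seconds_played for e in self.events)
```

In this sketch, replaying `events` gives the after-the-fact session review, summing `seconds_played` gives the per-second licensee bill, and because the record is self-contained, any server in the pool can handle the next request.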


7,028,252: System and method for construction, storage, and transportation of presentation-independent multi-media content.

The long-term vision for Indicast was a platform for presenting content to the user regardless of the original content type. For instance, textual content could be converted to concatenated speech. We also wanted richer metadata on content so we could make better content choices for the user (e.g., I want to hear any content mentioning “Google” and “cars”). This patent was based on our “Omni-View XML” schema and its related persistence format, which could describe any type of data in one or more forms (e.g., audio and a text transcript) and, most importantly, its related metadata. We leveraged work from the Dublin Core metadata initiative and others.
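
As a concrete illustration only, the snippet below builds the kind of record such a schema might describe: one content item carried in two presentation forms, plus Dublin Core-style metadata that a query like “Google” and “cars” could match against. This is not the actual Omni-View schema; every element name here is invented.

```python
# Hypothetical sketch of a presentation-independent content record.
# Element and attribute names are invented; only the Dublin Core
# namespace is real.
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

item = ET.Element("contentItem", id="story-4711")

# Metadata about the content, reusing Dublin Core element names.
meta = ET.SubElement(item, "metadata")
ET.SubElement(meta, f"{{{DC}}}title").text = "Search-Engine Cars Hit the Road"
ET.SubElement(meta, f"{{{DC}}}subject").text = "Google"
ET.SubElement(meta, f"{{{DC}}}subject").text = "cars"

# The same content in two presentation-independent forms.
forms = ET.SubElement(item, "forms")
ET.SubElement(forms, "form", type="audio/mpeg", href="story-4711.mp3")
ET.SubElement(forms, "form", type="text/plain", href="story-4711.txt")

print(ET.tostring(item, encoding="unicode"))
```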


7,440,555: Method and apparatus for rendering audio streams from textual content for delivery to a user.

We devised a powerful new way of generating concatenated speech. Described simply, we scanned the text for the longest sequences of words that existed as human-voice recordings and played those recordings as the audio. This tool allowed us to launch early versions of our “voice user interfaces” (VUI) that played text-to-speech (TTS) at first. As we added pre-recorded audio to our libraries, the TTS was gradually replaced by human-voice pre-recorded content. An example of this was our movie times: we presented movie titles in TTS at first and later introduced human-voice pre-recorded titles. We only had to add the pre-recorded audio segment to the appropriate library; the service picked it up automatically, and the user experience improved. The Marketing team could do this on their own, as no code needed to be modified.
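
The matching step reduces to a greedy longest-match scan. The patent covers a full rendering pipeline; the sketch below shows only that scan, with invented names, as one plausible reading of the idea:

```python
# Hypothetical sketch: take the longest run of words that exists as a
# human recording in the library, and fall back to TTS for the rest.
def plan_audio(text: str, library: set[str]) -> list[tuple[str, str]]:
    """Return an ordered plan of ("recorded" | "tts", phrase) segments."""
    words = text.split()
    plan: list[tuple[str, str]] = []
    i = 0
    while i < len(words):
        # Greedily try the longest candidate phrase starting at word i.
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in library:
                plan.append(("recorded", phrase))
                i = j
                break
        else:
            # No recording starts here; synthesize this word with TTS.
            plan.append(("tts", words[i]))
            i += 1
    return plan

# Adding "The Matrix" to the library later upgrades that phrase from TTS
# to a human voice with no code change, as described above.
library = {"now playing", "The Matrix"}
print(plan_audio("now playing The Matrix at 7 PM", library))
# [('recorded', 'now playing'), ('recorded', 'The Matrix'),
#  ('tts', 'at'), ('tts', '7'), ('tts', 'PM')]
```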