6,775,358: Method and system for enhanced interactive playback of audio content to telephone callers.
One of the core inventions in this patent was the “state language” described in the Voice Engine section of the Technology page.
7,028,252: System and method for construction, storage, and transportation of presentation-independent multi-media content.
The long-term vision for Indicast was a platform for presenting content to the user regardless of its original form. For instance, textual content could be converted to concatenated speech. We also wanted richer metadata so the service could select content more intelligently for the user (e.g., “I want to hear any content mentioning ‘Google’ and ‘cars’”). This patent was based on our “Omni-View XML” schema and its related persistence form, which could describe any type of data in one or more forms (e.g., audio and a text transcript) and, most importantly, its associated metadata. We leveraged work from the Dublin Core metadata initiative and others.
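The idea of a presentation-independent content item with multiple renditions plus queryable metadata can be sketched as follows. This is a minimal illustration, not the actual Omni-View XML schema; all field and function names here are hypothetical.

```python
# Hedged sketch: a presentation-independent content item carrying multiple
# renditions (e.g., audio and a text transcript) plus Dublin Core-style
# subject metadata. Names are illustrative, not the Omni-View XML schema.

from dataclasses import dataclass, field

@dataclass
class ContentItem:
    identifier: str
    renditions: dict                     # form -> location, e.g. {"audio": "...", "text": "..."}
    subjects: set = field(default_factory=set)   # dc:subject-style keywords

def matching(items, required_subjects):
    """Select items whose metadata mentions every required subject."""
    req = {s.lower() for s in required_subjects}
    return [it for it in items if req <= {s.lower() for s in it.subjects}]

library = [
    ContentItem("story-1", {"audio": "story1.mp3", "text": "story1.txt"},
                {"Google", "cars", "technology"}),
    ContentItem("story-2", {"text": "story2.txt"}, {"weather"}),
]

print([it.identifier for it in matching(library, {"Google", "cars"})])
# prints ['story-1']
```

Because the renditions and the metadata travel together, the same item can be delivered as audio on the phone or as text elsewhere, and the metadata query works identically either way.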
7,440,555: Method and apparatus for rendering audio streams from textual content for delivery to a user.
We devised a powerful new way of generating concatenated speech. Described simply, we scanned the text for the longest sequences of words for which we had human-voice recordings and presented those as the audio. This technique let us launch early versions of our “voice user interfaces” (VUI) that initially relied on text-to-speech (TTS). As we added pre-recorded audio to our libraries, the TTS was gradually replaced by human-voice content. An example of this was our movie times, where we presented the movie title in TTS at first and later introduced human-voice pre-recorded titles. We only had to add the pre-recorded audio segment to the appropriate library, the service picked it up automatically, and the user experience improved. The Marketing team could do this themselves, as no code needed to be modified.
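The longest-sequence matching described above can be sketched as a greedy scan over a library of recorded phrases, falling back to TTS word by word. This is a simplified illustration of the approach; the library contents and names are hypothetical.

```python
# Hedged sketch: greedy longest-match concatenated speech. Scan the text,
# find the longest run of words covered by a pre-recorded clip, and fall
# back to TTS for anything uncovered. Library entries are illustrative.

RECORDED = {
    ("movie", "times"): "movie_times.wav",
    ("now", "playing"): "now_playing.wav",
    ("the", "matrix"): "the_matrix.wav",
}

def render(text):
    """Return a playlist of (source, payload) segments for the given text."""
    words = text.lower().split()
    segments = []
    i = 0
    while i < len(words):
        # Find the longest run of words starting at i that has a recording.
        match = None
        for j in range(len(words), i, -1):
            key = tuple(words[i:j])
            if key in RECORDED:
                match = (j, RECORDED[key])
                break
        if match:
            j, wav = match
            segments.append(("recorded", wav))
            i = j
        else:
            # No recording covers this word: synthesize it with TTS.
            segments.append(("tts", words[i]))
            i += 1
    return segments

print(render("Now playing The Matrix movie times"))
# prints [('recorded', 'now_playing.wav'), ('recorded', 'the_matrix.wav'),
#         ('recorded', 'movie_times.wav')]
```

Note how adding a new recording to `RECORDED` changes the output with no code changes, which is the property that let a non-engineering team upgrade the user experience simply by dropping audio into the library.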