March 17, 2016 Techidesi

A primer on Speech to Text technology

You can use your voice to control your computer. With speech to text functionality, you can speak commands and the computer will respond to them.

A couple of years ago, technologies converting speech to text were considered futuristic. However, current advancements in software and hardware technologies have enabled the successful integration of speech into everyday appliances.

Speech to text systems need high-speed computing, robust performance, and high memory capacity.

Application of Speech to Text Systems

So, where and by whom can speech to text systems, also known as speech recognition systems, be used?

Let’s take the example of hospitals. These systems can be used for medical documentation. The healthcare provider dictates to the voice recognition application. The words that are captured and recognized are displayed as they are being spoken. Before signing off, the provider can verify the data and edit, as required.

Speech recognition systems would work very well for generating narrative text as part of, let’s say, interpretation of a report, progress note, or discharge summary.

The main problem, however, is that most Electronic Health Record systems have not been set up to take advantage of voice to text capabilities and depend mostly on keyboard and mouse interactions. Therein lies the opportunity.

There is scope for customization, as has already been done for pathology dictation in some cases. These systems make use of voice macros. For instance, saying the words “normal report” triggers the filling in of some default values.
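To make the idea of a voice macro concrete, here is a minimal sketch in Python. It assumes the speech engine has already turned the audio into a text transcript. The macro table, the field names, the default values, and the function name apply_voice_macros are all made up for illustration and are not taken from any real dictation product.

```python
import re

# Hypothetical macro table: spoken trigger phrase -> default field values.
# The field names and wording are placeholders, not real clinical defaults.
VOICE_MACROS = {
    "normal report": {
        "findings": "No abnormality detected.",
        "impression": "Normal study.",
    },
}


def apply_voice_macros(transcript, macros=VOICE_MACROS):
    """Expand trigger phrases found in a recognized transcript.

    Returns a tuple (remaining_text, fields): the transcript with the
    trigger phrases removed, and the default values they filled in.
    """
    fields = {}
    remaining = transcript
    for phrase, defaults in macros.items():
        # Match the phrase as whole words, ignoring letter case.
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        if pattern.search(remaining):
            fields.update(defaults)
            remaining = pattern.sub("", remaining)
    # Collapse the extra whitespace left behind by removed phrases.
    remaining = " ".join(remaining.split())
    return remaining, fields


if __name__ == "__main__":
    text, fields = apply_voice_macros("Patient seen today. Normal report")
    print(text)
    print(fields)
```

In a real system the provider would still review the filled-in values before signing off, just as with any other dictated text.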

Another example where speech recognition systems could add significant value is in the education of children with disabilities. Using speech-to-text programs, students with disabilities will be able to work on school assignments independent of a scribe and will not have to worry about their handwriting or typing.

These systems can also be deployed for language learning, where students can be taught proper pronunciation and helped to develop speaking fluency.

Advancements in voice to text technologies will also help develop more sophisticated security systems.

Future of Speech to Text – Challenges and Opportunities

Will the keyboard and mouse be dead a few years from now? Are we on the cusp of a revolution, with voice recognition slowly becoming a bigger part of people’s lives?

For years, Bill Gates has been steadfast in saying touch screens and voice to text systems are really the future.

To that end, the four top players in the post-PC market – Apple, Google, Microsoft, and Facebook – possess significant infrastructure for hands-free communication with devices.

Rumour has it that Google and Apple have major plans for implementing voice recognition technology in television products.

There will come a time when users will be able to have a conversation with their device to solve a problem rather than madly tapping the screen or keyboard, or clicking the mouse.

However, as it stands today, voice to text systems may not work well in every situation and role.

Good voice recognition software has been available for under $100 for a long time, yet it has still to gain acceptance and popularity.

Imagine an office with 200-odd employees talking to their computers at the same time. It would be quite like the scenes we see in movies depicting telemarketers and stock brokers. Even if the noise level were to somehow become acceptable, there would be other challenges in using voice to text systems. Some of these would be:

Dependence on training to speak in a particular tone and at a particular speed, and sensitivity to noise, are some of the other challenges that will need to be overcome.
Users will need to develop a separate set of skills for instructing the machine to, let’s say, rephrase a sentence, or to correct the spelling of a word that has a homophone.
Until users invest time in picking up the required skills, the old ways will be quicker. Most people have developed a reasonably good typing speed with practice. Most software displays a squiggly line under misspelled words and offers help in the form of autocorrect and a thesaurus. Locating errors and correcting them happens quickly. Users can change the order of a sentence and format the document, all while they write.
Users who program macros and use them frequently may find using the keyboard faster and more efficient.

The process of blending of the physical and the digital world has already begun.

While there are many speech to text applications, sophisticated language understanding systems are yet to be developed, that is, systems that are able to understand exactly what a sentence means.