Speech-to-text. Which option suits me best?


A business, department or individual will at times need an efficient way to convert speech to text. This may be a healthcare professional documenting patient care, a legal professional producing accurate file notes or a student conducting interview research. With remote working looking like it is here to stay, meetings conducted via Zoom, Teams or other video platforms are now part of a typical working week. When a transcript is required, Accuro’s outsourced speech-to-text solution provides a cost-effective and secure way of documenting the meeting accurately.

Presently there are two viable choices: an outsourced transcription business with a panel of experienced transcribers, or an automated speech recognition solution.

For the last 18 months, Accuro has analysed the performance of speech recognition software and compared it with the high volumes of transcriber output that make up its core business. The analysis concentrated specifically on commercial audio files produced in a work environment, including telephone and video calls.

The speech recognition system we finally opted for was the Microsoft Azure engine, which offers two major benefits: speed and cost. The ‘rub’ is quality. A typical 5-minute single-voice audio file returned an accuracy rate of between 55% and 65%, a figure that does not take into account punctuation errors or the ability to paragraph and follow a template. Moving forward, the Azure system allows Accuro to create specialised profiles within the platform which perform better, so medical audio is converted to text within the medical profile, legal audio within the legal profile, and so on.
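For readers who want to check an accuracy figure on their own audio, the sketch below shows one common way of scoring a transcript: count the word-level edits (substitutions, insertions and deletions) needed to turn the machine output into a human-checked reference, then express the result as a percentage of the reference length. This is an assumption for illustration only; the function names are hypothetical and it is not necessarily how the 55% to 65% figure above was calculated.

```python
def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of words in the reference)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Dynamic-programming Levenshtein distance, computed over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the machine output
                curr[j - 1] + 1,     # extra word in the machine output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1], len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 minus the word error rate, floored at zero."""
    errors, total = word_errors(reference, hypothesis)
    if total == 0:
        raise ValueError("reference transcript is empty")
    return max(0.0, 1.0 - errors / total)


if __name__ == "__main__":
    # Hypothetical example sentences, not taken from Accuro's test data.
    checked = "the patient was seen in clinic today"
    machine = "the patient seen in clinic to day"
    print(f"{word_accuracy(checked, machine):.0%}")
```

Like the figure quoted above, this measure ignores punctuation placement, paragraphing and template layout, so a formatted document would need to be assessed separately.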

We concluded that customers who need single-voice audio transcribed as a data dump of unformatted block text, which they can then manipulate themselves, can get it cheaply and very quickly. We refer to this as a ‘rough draft’.

So how can we make speech-to-text software relevant to business?

A customer who decides to outsource transcription probably wants to avoid having to heavily manipulate or edit documents. On that assumption, Accuro has launched a speech recognition + edit service. In this hybrid, audio is run through Microsoft Azure (profile based) and the output is then edited by professional proofreaders. It returns block text documents with accuracy rates over 98%. This is ideal for customers who do not need formatting, e.g. for notes entered into a CMS, or who can reformat documents into their own bespoke templates in the knowledge that the content is accurate.

The speech recognition + edit service gives businesses reduced transcription costs with quick delivery times.

It should be noted that whilst we are developing multi-voice speech-to-text solutions, presently the quality of multi-voice audio via telephone and video calls limits the accuracy of output within speech recognition solutions.

In summary:

  • A profile-based ‘rough draft’ is available for customers requiring a single-voice data dump in block text format, which needs amending for final copy. Pros: quick and cheap.
  • Speech recognition + edit produces accurate text, again in block text which requires formatting. Pros: quicker and cheaper than outsourced transcription.
  • Outsourced transcription produces documents ‘right first time’ which do not need proofreading or reformatting. It is also suitable for audio of poor quality, regional or strong dialects, technical content and multi-voice recordings.

Examples of templates can be found here: https://accuro.co.uk/individuals/

The 3 options, at a glance:

Accuro provides excellent medical transcription services, legal transcription services, academic transcription services, property transcription services, captioning services, Zoom & Teams transcription services and translation services.

To find out more, please do not hesitate to contact us below.
