One possible solution to John’s challenge is a speech recognition package. These generally come in two forms: speech recognition software that is trained to a particular voice, and real-time speech-to-text software. John was happy to make some corrections afterwards, comparable to the number needed after OCR (optical character recognition).
Speech recognition software requires training so that the software can recognise your voice and interpret what you're saying accurately. For most programs, this means reading a specific passage of text (say, three pages of Alice in Wonderland) out loud in the voice you want recognised. That way, the software calibrates the words it expects to hear against the speaker’s voice. The problem is that calibration isn’t usually possible for pre-recorded speech.
Additionally, most packages cost more than $100, and John would prefer not to pay that much. Although there’s work on an open-source speech recognition engine, it’s not yet available.
Speech to text
Another technology, real-time speech-to-text, usually handles only a small range of words, and is better suited to computer commands and similar short phrases than to long passages of speech. What's more, the cheap and shareware versions of these programs often only recognise American accents, and would have difficulty with the noisiness of John’s recordings.
Transcribe your tapes
Another option is transcription. Many secretarial services also provide transcription, usually priced from $20–40 per hour of audio. While this is an option for small amounts of audio, it can be expensive if you have many hours of recordings.
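To get a feel for how quickly that adds up, here is a rough sketch of the arithmetic in Python. The $20 and $40 rates are the ones quoted above; the function name and the 10-hour figure in the usage note are our own illustrative assumptions, not numbers from John.

```python
def transcription_cost(hours_of_audio, low_rate=20, high_rate=40):
    """Return the (lowest, highest) likely bill in dollars.

    The default rates are the per-hour-of-audio prices quoted in the
    article; the function itself is only an illustration.
    """
    return hours_of_audio * low_rate, hours_of_audio * high_rate
```

Calling transcription_cost(10) for a hypothetical 10 hours of tape gives the low and high ends of the bill, which can then be compared with the price of a software package.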
Dictation software provides the functionality of a dictaphone: it slows the recorded speech down so that it can be transcribed more easily. We found a free program called Express Scribe at www.nch.com.au/scribe.
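For the curious, the crudest way to slow a recording down can be sketched with Python's standard wave module: write the same samples back out with a lower sample rate. This is only an illustration and is not how Express Scribe works internally (we don't know its method). It also lowers the pitch of the voice, just as a tape played at reduced speed would. The function name and the 0.75 factor are hypothetical.

```python
import wave

def slow_down(src, dst, factor=0.75):
    """Copy a WAV file, playing it back at `factor` times normal speed.

    `src` and `dst` may be file names or open binary file objects.
    The samples are untouched; only the sample rate in the header
    changes, so the pitch drops along with the speed.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        audio = reader.readframes(reader.getnframes())
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)
        # Override the sample rate before any audio is written.
        writer.setframerate(int(params.framerate * factor))
        writer.writeframes(audio)
```

With a hypothetical recording saved as interview.wav, slow_down("interview.wav", "interview_slow.wav") would produce a copy that plays at three-quarter speed.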
The current best option: clean up the audio
The final option is to clean up the current audio files so that they can be transferred to CD with a sound quality that's good enough to listen to while driving.
To clean up audio, the sound file must be imported into a sound-editing software package before adjustments can be made. Some sound-editing programs allow you to remove hisses, pops and background noise, or isolate the vocals (often to remove them, so you can make your own karaoke tracks, for example). This requires a bit of fine-tuning, but it’s considerably less hassle than transcribing text or using a speech-to-text software package.
We tried two speech recognition packages using a poem that John sent to us, without success, so we suggested that he try some sound-editing software. We recommended the open-source package, Audacity, and a freeware program, Waverepair. Both these packages feature specialised tools to help clean up audio recordings and are highly recommended on audio enthusiast sites.
To remove hiss, John provided the programs with a few minutes of hiss with no signal, so that the software knew what to remove. Then he processed his sound files to remove enough hiss without affecting the audio signal he wanted to keep. Older sound-editing packages that John had tried used a filter spectrum, but these were harder to use and much less effective at eliminating hiss.
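The idea of learning the hiss from a signal-free stretch and then taking it out of the whole recording can be sketched in a few lines of Python. What follows is a bare-bones spectral subtraction, written with the standard library only. It is an assumption-laden illustration, not the actual algorithm used by Audacity or Waverepair. The frame size, the function names, the strength setting and the synthetic test tone are all our own inventions.

```python
import cmath
import math
import random

FRAME = 64  # samples per analysis frame (illustrative choice)

def dft(frame):
    """Naive discrete Fourier transform; slow but dependency-free."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse of dft(); returns the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def frames(samples):
    """Split samples into consecutive full frames, dropping any remainder."""
    return [samples[i:i + FRAME] for i in range(0, len(samples) - FRAME + 1, FRAME)]

def noise_profile(hiss_only):
    """Average magnitude per frequency bin over a hiss-only stretch."""
    spectra = [dft(f) for f in frames(hiss_only)]
    return [sum(abs(s[k]) for s in spectra) / len(spectra) for k in range(FRAME)]

def remove_hiss(samples, profile, strength=1.0):
    """Subtract the profiled hiss magnitude from every bin, keeping the phase."""
    cleaned = []
    for f in frames(samples):
        out = []
        for k, value in enumerate(dft(f)):
            magnitude = max(abs(value) - strength * profile[k], 0.0)
            out.append(cmath.rect(magnitude, cmath.phase(value)))
        cleaned.extend(idft(out))
    return cleaned

def mean_square_error(a, b):
    """Average squared difference between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Synthetic stand-ins for a tape: a pure tone plus random hiss.
rng = random.Random(0)
tone = [math.sin(2 * math.pi * t / 8) for t in range(2048)]
hiss_only = [rng.uniform(-0.1, 0.1) for _ in range(1024)]
noisy = [s + rng.uniform(-0.1, 0.1) for s in tone]

cleaned = remove_hiss(noisy, noise_profile(hiss_only))
```

The strength parameter plays the role of the fine-tuning John had to do: a higher value removes more hiss but starts to eat into the signal he wants to keep, while a lower value leaves more hiss behind.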
John has so far tried both sound-editing packages, and has found that they provide a sharper-sounding audio file than his original taped recording. Apart from removing the hiss, the processed audio is more suitable for recording to CD. He hopes that with a little more practice, he’ll be ready to listen to his recordings while driving.