Transcribing Video Interviews – My Approach

Posted in Documentary, Editing, Filmmaking on January 21st, 2014 by Dave


I recently had over four hours of interviews to log and transcribe for a documentary film I’m working on. Although it is a tedious job, it’s a great way to re-listen to what the subject had to say, away from the distractions of filming, running audio, doing the lighting, etc. For shorter interviews in shorter projects, I’ll often just put the footage in the timeline, review the clips, make notes, and set markers or make subclips to flag the parts of a given clip I want to keep, and then just edit away…

Back in the 1980s I used to log all my footage with a pen and a yellow legal pad. Spreadsheets and word processing documents followed, but I found this approach cumbersome: you typed on a laptop while pressing play/pause on a video deck playing a VHS copy of your footage with burned-in time code.
When I decided to log and transcribe the footage for this current film, I was sure technology would come to the rescue with some new software tool, ideally one that integrated with my editing software. I wanted to create a transcript keyed to a particular clip by timecode, and to add descriptors like type of shot, camera angle, etc. Above all, I hoped for a way to make it all searchable by keywords I created. My web search yielded very little; in fact, I was astonished that the topic was barely discussed online at all. There were certainly plenty of online transcription services that would do the work, but I wanted to kill two birds with one stone: review the footage and save some money.

The editing software I use, Sony Vegas Pro 11, has built-in media management software that helps you create a database of all your media assets across multiple projects. But I found it prone to crashing, its interface wasn’t one I was interested in learning, and it didn’t really give me what I needed anyway: a transcript of the interviewee’s narrative with time code locations, along with other descriptors I tend to invent to suit the job. Vegas does let you name clips and create named subclips, which are searchable, but that did nothing to solve my transcription needs.

I wanted the transcriptions so I could develop a “paper edit” before the actual editing. The paper edit would basically involve cutting, pasting, rearranging, and deleting parts of the transcript. If I wanted, I could go “war room”: spread all the pages out on the floor to see the big picture, pace around, rearrange, muse, ponder… you get the idea.

I tried software called InqScribe, which was actually pretty nice. It lets you import your clips into its interface, view them in a preview window, and insert the time code of the portion of the clip you’re working on into a text box. You then type away as you watch and listen to the clip playing back. It also lets you slow the speech down while maintaining pitch, another nice feature. At $100 it wasn’t bad, but to really speed up the process you needed to buy a foot pedal that inserts the time code into the text box when pressed. Without a foot pedal, inserting the time code takes a two-key shortcut (CTRL+; in Windows). That’s another $60 to $100 for a foot pedal… and then you still have to slave away at the typing.

Frustrated with my web searches and not wanting to plunk down cash if I didn’t have to, I began to cast about for a home-grown solution.

I own an iPod Touch 4. I also had on hand the Home Edition of Dragon NaturallySpeaking speech recognition software, which cost me $50 a couple of years ago. So… I tried an experiment. I laid out all three hours and forty-five minutes of interview clips in the Vegas timeline, butted together end to end, and rendered the footage out to an MP3 file so I had the audio track only.

I swear, I may be alone out there, but it took me a solid two hours to import that MP3 file into iTunes and then sync it to the iPod. I’m still not sure of the exact sequence I used, and I regret not writing the steps down as I worked, because what should have been a simple task turned out to be unnecessarily frustrating, and I thank Apple for that. Basically, I had to use the iTunes File menu to “Add a file to Library”, sync the iPod, and then search for the file on the iPod. I couldn’t find the file unless I searched for it.

Doubtless others out there have had no problem transferring files to their iPod or phone, but I sure as heck did. I felt that iTunes wanted me to live only in its little world and that my own content was not welcome. I guess if I had uploaded my interviews to their store to sell for 99 cents, I would have been able to buy (or ransom) the dern thing back. Grrrrrrrrrr.

Well, I finally was able to play the interviews back on the iPod. I fired up the Dragon, opened Word, and created a two-column table with one column labeled Time-code and the other labeled Audio. Keeping it simple for now. I put on the headphones that came with Dragon… and immediately took them off because they made me feel like I was in an Iron Maiden. I thought to myself, why do I need headphones? I took out my USB mike, just a $20 cheapie, plugged it in, pressed play on the iPod, and started dictating. And away I went. I did a paragraph, then two, speaking more naturally (and faster) as I went along. The Dragon was picking up my speech nearly perfectly! Unbelievable. My wife was preparing dinner, so the room I was in wasn’t entirely quiet, but it didn’t seem to matter. To move from one column to the other, I said “Press Tab”. Dragon has a lot of commands for moving around in your document, editing, correcting mistakes, etc., but I only needed a couple of them for this work.

Then I noticed something else about the iPod interface: it has jump-forward and jump-back 15-second buttons. Without them it would have been nearly impossible to nudge the playback a few seconds, given that the iPod’s timeline was nearly 4 hours long. So if I didn’t catch the last few words spoken by my interviewee, I could just touch the 15-second review icon and try again. Another great thing: in podcast mode (which is where I placed my MP3), you can slow the speech down to half speed or speed it up all the way to 2x. That feature I knew about because I listen to a lot of podcasts sped up. For my transcribing job, I set playback to 0.5x and just narrated, slowly but accurately, with no typing! Just me holding the iPod in my left hand and pressing play and pause as I moved along in the dictation. After a while I switched back to normal playback speed and had no problem keeping up with the dictation, and neither did Dragon. As a matter of fact, I found the whole process relaxing, productive, and FAST. Dictating the nearly 4 hours of interviews took most of the day, but it wasn’t the monster I had resigned myself to at all.

Now maybe you’re wondering how I noted time code, given that I was listening to an audio-only file. Remember I said I butted all my video clips together on the Vegas timeline? Well, the iPod has its own little time counter, and I noted its reading in the transcript as I went. After I finished, I checked those readings at various points against my clips in Vegas. I was within a couple of seconds of my notation in the transcript all the way through the document. Not frame accurate, but definitely close enough for government work! And really, I wouldn’t have been counting frames anyway even if I could see the time code.
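Because the clips were butted end to end with no gaps, a counter reading from the player can in principle be mapped back to a specific clip with simple arithmetic: subtract the lengths of the clips that come before it. The post doesn't describe doing this step in code; the sketch below is only an illustration, assuming you know each clip's duration in whole seconds, in timeline order. The helper names `parse_counter`, `locate`, and `format_hms` are hypothetical and are not part of Vegas, iTunes, or Dragon. Like the iPod counter, it works to the second, not the frame.

```python
from bisect import bisect_right
from itertools import accumulate


def parse_counter(text):
    """Convert a player counter reading such as '1:02:03' or '16:40' to seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def locate(clip_durations, elapsed):
    """Map a position in the concatenated audio to (clip_index, offset_seconds).

    Assumes clip_durations lists each clip's length in seconds, in the order
    the clips were butted together on the timeline, with no gaps between them.
    """
    # Start time of each clip within the concatenated file: 0, d0, d0+d1, ...
    starts = [0, *accumulate(clip_durations)][:-1]
    total = sum(clip_durations)
    if not 0 <= elapsed < total:
        raise ValueError(f"{elapsed} s is outside the {total} s timeline")
    # The containing clip is the last one whose start time is <= elapsed.
    index = bisect_right(starts, elapsed) - 1
    return index, elapsed - starts[index]


def format_hms(seconds):
    """Format whole seconds as H:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    # Hypothetical example: three clips of 10, 15 and 5 minutes.
    durations = [600, 900, 300]
    clip, offset = locate(durations, parse_counter("16:40"))
    print(f"clip {clip + 1}, {format_hms(offset)} into the clip")
```

With those made-up durations, a counter reading of 16:40 lands 6:40 into the second clip. A sorted list of start times plus `bisect` keeps the lookup trivial even for dozens of clips.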

This has proved to be a great solution for me. Others may have different solutions to this task, and I would love to hear them. I consider the Dragon software fantastic. Amazing. It recognized my dictation correctly at least 98 percent of the time. I had used the software before, but under different circumstances, and found I just didn’t need it when I was actually composing a document. That’s because I’m thinking, pausing, and then typing a little. Speech recognition didn’t seem all that useful for that kind of work: there would be errors I’d have to go back and correct anyway, so the time savings just didn’t add up.

But for rote transcription, the combination of the iPod, USB mike, and Dragon software worked beautifully. For my purposes, I didn’t care about small errors as long as the context was clear. My sweet little Dragon. I feel listened to. I feel heard. Sweet.