TC Sound - Transcribing Audio Collections Instructions

The Smithsonian holds over 150,000 audio recordings - in various formats, and dating from the late 19th century to the present - in its many museums, archives, and libraries. Transcription of these collections will increase accessibility, unlock hidden content, and preserve historical audio assets for future generations. What will we learn when #WeListenTogether? Learn more about the background & impact of #TCSound -- and share your tips, discoveries, & questions -- by following along with us on social media using our campaign hashtags. 




TC Sound - Transcribing Audio Collections Printable Instructions Cheat Sheet - Click Here

You can also find an example of a fully transcribed TC Sound project here for reference.



Our Transcription & Review Process


How the process works

We seek to balance quality and speed with our transcription process - which of course is still evolving as we continue to develop this service. At the moment, this is how our system works:

1) Anyone can start transcribing or add to a transcription of a document.

2) Once a volunteer decides they’ve “finished” and they’re ready for review, a different volunteer (who must have an account on the site) can review the transcription and either send it back for edits, or complete the transcription.

3) The finished transcript is sent to the Smithsonian, where it may be used immediately, or undergo additional work.




Navigating the Sound Transcription Project Pages

Audio recordings are broken into manageable segments for transcription, with each segment becoming a project page (much like textual collections in Transcription Center). The segments are determined by breaks (silences) in the recording. Each project page displays a player on the top half of the page, which includes the audio WAV file and buttons for manually playing, pausing, and skipping forward and backward in the recording. Directly below this are the keyboard shortcuts available for inserting speakers and timestamps and for manipulating the recording (e.g., start, stop, go back 10 seconds, go forward 10 seconds, and increase or decrease playback speed).







Transcribing Audio Recordings

Just as with textual projects in the Transcription Center, volunteers can choose to begin working on any segment/project page within an audio collections project. Just be aware that if you start transcribing or reviewing at an audio segment in the middle or end of a recording, you may need to listen to the previous segment to make sense of what is being said. Transcriptions and notes left in other project pages may also include helpful information, such as identification of which voice in the recording matches which speaker tag, or the correct transcription of a tricky word that might be difficult to hear. 

Once you've chosen a project page/segment you can begin listening and transcribing.

All of the words spoken must be transcribed, your transcription must include correctly placed speaker tags and timestamps, and the transcribed text captions must align with the recording as it plays in the player. As long as these requirements are met, however, the order in which you transcribe, apply tags, etc. does not matter. Feel free to transcribe the recording in any order that works best for you. You may want to listen to the entire segment before transcribing in order to get comfortable with the recording. You may want to enter timestamps and speaker tags as you transcribe, or you may prefer to add them in after you've transcribed a section. Find a rhythm that works for you, and be sure to share with your fellow volunpeers any tips and tricks you have using #TCSound.


Here's a video tutorial on getting started with TC Sound projects:



Example Steps 

Here is an example step-by-step process for transcribing an audio segment. For an overview, please also see the TC audio instructional video above.

STEP ONE: Click inside the "Transcription Form" box and press ENTER. This will automatically insert the beginning timestamp of the segment you're working on.

STEP TWO: Click the play button with your mouse OR use the keyboard shortcut Ctrl + Spacebar to begin playing the recording, then type what you hear. Use the various keyboard shortcuts to start/stop the recording if you need time to type the words you've heard, go back 10 seconds to re-listen, or increase or decrease playback speed (this can be especially helpful when trying to catch everything that is said!).

STEP THREE: After you have transcribed a few sentences or a section of the segment, SAVE your work. With each save, your transcription will appear as a caption in the player. Preview this by starting the segment over and reading along with your transcribed captions as you listen to the recording. 

STEP FOUR: As you listen and read, identify any missed words, speaker tags, sound tags (such as [[laughter]]), or timestamps that need to be added to your transcription, and edit these in the transcription box.

STEP FIVE: When the entire segment is transcribed and the recording has reached the end of the segment, insert a final timestamp.

STEP SIX: Preview the segment. If the captions align with the recording as it plays, speakers are identified and tagged if applicable, and all words are transcribed, hit "COMPLETE AND MARK FOR REVIEW."


Inserting Timestamps

This can be tricky, but correctly inserted timestamps are essential to synchronizing the transcribed text with the audio recording. In other words, timestamps tell the system where to place the transcribed text alongside the recording, at the point where that text can be heard. This means you can listen to the recording and read the words being said at the same time (thus making these recordings accessible!). Timestamps are also included in the PDF transcriptions, where they indicate where in the tape/recording the transcribed text can be heard.


Check out our video on inserting timestamps:



To make sure that transcription of audio recordings can be collaborative and that transcriptions can be edited, we decided not to make timestamps automatic in TC Sound projects. Instead, we ask that you insert timestamps manually using the CTRL + i keyboard shortcut.

Because every audio recording is different, there is no minimum requirement for how many timestamps should be inserted per TC Sound segment. A good rule of thumb, though, is to insert a timestamp every 3-5 seconds (or every 1-2 phrases spoken). The easiest way we've found to do this is to transcribe the text without worrying about timestamps, then go back, playing the segment from the beginning and following along with your transcription as you listen. Watch the text in the player preview and insert timestamps in your transcription where there are natural pauses. If there is too much text (e.g. the text is taking up the entire player box), then you need to enter more timestamps to break up the text. If the text changes too quickly as you listen and disappears from the player too often as the recording plays, then you may have too many timestamps. Just be sure to insert/edit timestamps at the correct corresponding point in the recording. Save your work and check it by playing the segment and following along with your transcription again.
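
For volunteers who like to double-check their work with a script, the 3-5 second rule of thumb can be sketched as a simple gap check. This is an illustration only, not a Transcription Center tool: the function name `find_long_gaps`, the 5-second threshold, and the idea of listing timestamp positions as seconds from the start of the segment are all assumptions made for this example.

```python
def find_long_gaps(timestamps, max_gap=5.0):
    """Return (start, end) pairs of neighboring timestamps more than max_gap seconds apart."""
    ordered = sorted(timestamps)
    # Compare each timestamp with the one that follows it.
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


# Hypothetical timestamp positions, in seconds from the start of a segment.
example = [0.0, 4.0, 8.5, 20.0, 23.0]
print(find_long_gaps(example))
```

Any pair the check reports marks a stretch of the recording where the captions may hold too much text and an extra timestamp could help. The player preview remains the real test.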


Transcribing Speaker Sounds, Background Noise, Interruptions, & Unknowns, etc. 


Filler sounds and reflexive speech (ums, uhs, ughs): You are not required to transcribe every filler sound you hear, especially if there are many (try to limit this to 2-4 instances per segment). Please only transcribe these sounds if they are used for stalling or mark a significant pause in the conversation.

Punctuation and Grammar: Please include punctuation as best you can. If it is clear where sentences end and begin, indicate this with corresponding punctuation, and include ? at the end of questions. Commas, semi-colons, etc. are not absolutely necessary, but can be included if you hear pauses or other speech indicators that would result in this kind of punctuation. If a phrase or sentence is being yelled or spoken emphatically, you can indicate this with an ! or you can put [[yelling]] etc. in brackets to signal the tone/volume. Please also capitalize proper nouns, and include apostrophes in contractions that are spoken. Watch out for homonyms as well, and make sure to transcribe the correct spelling of the word spoken (for example, is the speaker saying "too" or "to," "your" or "you're," "there" or "their"?). Remember, we're not looking for perfection. Punctuation and grammar do not have to be perfect, especially since it can often be difficult to determine correct punctuation when transcribing an audio recording, and we do not always speak with perfect grammar. Aim for readability and accessibility. In other words, transcribe the content in a way that not only reflects all the words that are said, but also conveys to the reader the meaning/intention in the recording (i.e. are the speakers making statements, asking questions, or making emphatic demands, etc.). Reach out to the TC team or your fellow volunpeers anytime with questions.


Numbers, dates, ordinals: Please do not spell out numbers and dates unless the speaker does so. For example, normally transcribe "1941" or "22 years ago," unless the speaker says "in nineteen-hundred and forty-one." In that case, it's fine to spell out the date to indicate that is how it is being spoken. For instances of "first in line" or "1st in line" you can transcribe either way you'd like. The same goes for the "third of December" or "3rd of December." As long as it is clear to the reader what is being said, the spelling out of these words is up to you. 


Feedback words, agreement/disagreement in conversation: If an affirmation such as “mm-hmm” or “uh-huh” is spoken as a clear response/answer to a question, please transcribe it. If the speaker is simply saying this over and over again along with the conversation, you do not need to mark it each time. You may also note/tag whether the feedback word is affirmative or negative. Ex: uh-huh [[affirmative]] or Mm-mm [[negative]].


Incomplete sentence, hanging phrase, parenthetic statement, or interruption: End the phrase or statement with an em dash ( -- ). Please do not use ellipses (…) as this may indicate to a reader that something was left out of the transcription.


Sounds (other than speaker words): Please indicate significant noises and sounds with brackets. Examples: [[laughter]], [[eerie music playing]], [[clapping]]. Coughing, sneezing, etc. do not have to be included in the transcription, nor does continuous background noise (if desired, continuous background noise could be noted at the beginning of the segment, such as [[background talking, in crowded area throughout interview]] etc.). Please do note significant background talking or noise as [[background noise]] or [[background voice]], or you're welcome to abbreviate using [[BGV]] or [[BG]].


Unintelligible words: Please indicate these with [[??]] so that other transcribers can easily locate missing areas of text. If it is impossible to hear what is being spoken, please indicate this with [[inaudible]].


Cross talk/speaking over each other: If speakers are consistently speaking over one another, please transcribe their speech as best you can on separate lines with their speaker tags, and indicate that they are speaking over each other with [[Cross Talk]] or, if applicable, [[side conversation]].


Unsure who's speaking? You can always use curly brackets { } and Speaker 1, Speaker 2, or Unknown Speaker 1, etc. if it's unclear who is speaking, or if there are no specific speaker names listed for the project. Ex: {Speaker 1} {Unknown Speaker}. If you are able to identify a speaker that is not specifically named/listed for a project, let us know! We'll then be able to update the project information and create a speaker tag. 


Insert spaces and lines between each speaker/phrase as you transcribe – make it readable
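
As an optional illustration of the tagging conventions above, here is a minimal sketch that counts the double-bracket tags and curly-bracket speaker tags in a piece of transcribed text. It is not part of the Transcription Center site: the function name `summarize_tags` and the sample text are made up for this example, and the sketch assumes that tags never span more than one line.

```python
import re
from collections import Counter

# [[...]] marks sounds and notes; {...} marks speakers (the conventions described above).
SOUND_TAG = re.compile(r"\[\[(.+?)\]\]")
SPEAKER_TAG = re.compile(r"\{(.+?)\}")


def summarize_tags(transcript):
    """Count each sound/note tag and each speaker tag found in a transcript string."""
    sounds = Counter(SOUND_TAG.findall(transcript))
    speakers = Counter(SPEAKER_TAG.findall(transcript))
    return sounds, speakers


# Made-up sample text that follows the conventions above.
sample = (
    "{Speaker 1} Where were you born?\n\n"
    "{Speaker 2} In [[??]], back in 1941. [[laughter]]\n\n"
    "{Speaker 1} Uh-huh [[affirmative]]. And then --\n"
)
sounds, speakers = summarize_tags(sample)
print("Unresolved [[??]] markers:", sounds["??"])
print("Speakers:", dict(speakers))
```

A count of [[??]] markers gives a quick sense of how many unclear words still need a second listen before a segment is marked for review.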


Transcribing Song Lyrics


Many historic audio recordings include music and songs. Please refer to these instructions from the Center for Folklife and Cultural Heritage (CFCH) Ralph Rinzler Folklife Archives and Collections to transcribe lyrics and musical content in TC Sound projects. 


Reviewing TC Sound Projects


The process of transcribing and reviewing TC Sound projects is the same as for textual projects. Pages or segments are transcribed and marked for review, and those pages or segments are then reviewed. If mistakes are found or edits are needed, the reviewer can "reopen for editing" and fix any errors. When everything is corrected to the best of volunteers' abilities, the page can be "marked complete."

As you are reviewing TC Sound projects, we suggest first playing the recording and following along with the transcribed captions in the player. This way, you can review the entire segment and watch for any words missed or transcribed incorrectly, any missing speaker tags or background noise tags, and any missing timestamps. For instance, as you listen and read along, is there too much text displayed in the player, making it cramped and blocking the WAV file image? If so, then the segment probably needs more timestamps. Then you can reopen the segment for editing, fixing any errors and inserting tags and timestamps where needed. Leave notes for other volunteers in the notes box, and reach out to the TC team with any questions!