Notes on collecting conversations for the ICE-NZ Corpus

Janet Holmes
Corpus Director, Professor of Linguistics
Victoria University of Wellington

Here are some notes from our experience of collecting conversations. We hope they will be useful to others.

The most difficult data to collect is in many respects the most important, namely "natural" relaxed conversational interaction.

A. No surreptitious recording

A firm decision was taken at the start of the project that all contributors would know that they were being recorded. There was to be no surreptitious taping. However, it was sometimes possible to collect recordings without the people recorded being aware that they were being recorded on that particular occasion. This involved the data collector asking a person in advance whether they would agree to be recorded at some future date without necessarily being told at the time of the recording. They would be told afterwards and would have the right to veto the use of the tape. This strategy was used for some of the face-to-face and telephone conversations. It is worth noting, however, that the quality of surreptitious recordings is often dubious, since the microphone cannot always be placed in the best position for collecting the data.

B. Collecting face-to-face conversations

1. "Natural" situations

2. Data collection strategies

C. Collecting telephone conversations

1. What did not work

2. What did work

D. Transcription

For further methodological discussion see:

Holmes, Janet. 1994. "Methodological problems in collecting spoken New Zealand English." ICE Newsletter 19.

Holmes, Janet. 1996. "The New Zealand spoken component of ICE: some methodological challenges." In Sidney Greenbaum (ed.), Comparing English World-Wide: The International Corpus of English. Oxford: Clarendon Press, 163-181.

Here is the wording of the assignment:


Most sociolinguistic research involves collecting data on the way people use language in different contexts. One of the most widely used methods of collecting data involves using a tape recorder to collect samples of speech.  One of the skills you are expected to develop in this course is the ability to collect speech data of a sufficiently high quality to permit detailed phonetic analysis.

One of the terms requirements for this course is therefore that you provide a tape recording of a 30-minute conversation between two adult New Zealanders (defined for this purpose as people over 16 who were born in New Zealand or who arrived in NZ before the age of ten). You will need a tape recorder of reasonable quality in order to collect this speech data; you may arrange to borrow one if necessary. The data you collect may be included in a Corpus of New Zealand English, so the accompanying sheets should be filled in for all speakers you record. (Note, however, that contributors to the Corpus remain anonymous, since we use pseudonyms in the Corpus.) This tape recording should be handed in to your tutor by July 30th.


1. The conversation you record should be as relaxed and "natural" as possible. We particularly want to include grassroots New Zealanders in the Corpus: ie where possible we prefer people who have no connection with the university. If you can tape such people we will be very appreciative.

2. The recorded conversation should ideally involve only two or three people, because speech becomes increasingly difficult to transcribe as the number of participants increases. You may be a participant and contribute to the conversation if you are a native New Zealander (defined as someone born in New Zealand or who arrived in NZ before the age of ten).

3. The topic of the conversation is entirely open, but if you need ideas, the following topics have proved successful in social dialect research:

        (a) a situation in which you were in danger of death
        (b) your first boyfriend/girlfriend/best friend at school
        (c) your worst holiday job
        (d) your worst day at school

4. Use C60 tapes. You should aim to collect data of the highest possible recording quality, so that you can easily transcribe sections and analyse the pronunciation of the speakers as required. Avoid background noise, such as motor mowers, canaries, and television sets, which will reduce the quality of the recording.

5. Ethics

You must inform people if you are tape recording them. In general you should not collect data surreptitiously. However, it is sometimes possible to collect data from close friends by obtaining a general agreement to tape them surreptitiously at some time in the future, and then checking after the recording that they agree to the Corpus including the precise material collected from them.

6. Background information sheets must be filled in for each speaker and for the interaction as a whole. Please put your name at the top of each of these sheets so we can link them to your tape. Please also indicate on each sheet which speaker on the tape it refers to (eg ideally a brief "quote" from the relevant speaker taken from the beginning of the tape, or a label such as speaker 1, speaker 2 etc).

We would be very grateful for any useful additional background information you can provide: eg regional background, details of the relationship between the speakers, the total number of speakers, any audience present, etc.

Although you provide all this information, the anonymity of contributors to the Corpus is protected. We need the information for classification purposes only.


The tape recording will be collected by your tutor, checked for quality, and then returned to you for transcription. This will be used in tutorials and as background material for lectures later in the course.

Do not forget to hand in the accompanying background information sheets on each of your speakers. Label your tape and background information sheets clearly with your name and the date you collected the data.


BEFORE you start
*       Tape recorder and microphone: check they are working.
*       Batteries: check they are OK and carry a spare set.
                If you are using a lead, check that it is plugged in.
*       Switches: check the tape recorder is switched on, and any wall switches are on.
*       Microphone: check it is switched on if necessary, and switched off at the end.
*       Tapes: use C60s. They are more robust than longer tapes when replayed.
*       Labelling tapes is crucial: your name and the date of recording are essential information.
*       Background information sheets providing relevant information on the person recorded are essential: eg age, ethnicity, where brought up, years overseas etc. See example attached. Make sure you get them filled in fully at the time of recording.
*       Include a line on the sheet asking the person being recorded to consent to the tape being used for linguistic research purposes.

When you begin
* Do a sound (voice level) check to make sure the recorder and microphone are working and the level is high enough but not too high: ask those being recorded to count up to ten.
* Don't record with little children present.  Sit towards a corner of the room if possible since this reduces boom effects.
* It is very important to get the TV and radio turned off. Say something like "I wonder if you'd mind if we turned the TV/radio off. It's just that this microphone picks up TV/radio better than anything else so it will be hard for me to hear what you have said."
* Keep an eye on your tape recorder. When the tape is approaching the end, turn it over.
* Give people topics or tasks that will absorb their attention.


Things that can go wrong
* Microphone plugged into the earphone hole or auxiliary output hole instead of the microphone input.
* People don't keep still, so sound quality varies.
* People eat or drink while being recorded.
* Motor mowers interfere with sound quality.
* Tape runs out.
* Someone switches off the tape recorder without you noticing.

When you have finished

* Switch off the microphone if necessary.
* Make sure your tape is accurately labelled.
* Make sure your Background Information (BI) sheets are filled in.
* Identify each Background Information sheet with your name at the top so you don't lose it.
* Push out the write-protect tabs on the cassette (so the recording cannot be accidentally erased) and copy your tape before you begin using it for analysis.

Ethical issues
No surreptitious recording
Never record surreptitiously. Always tell people that you would like to record them well in advance so that they can think about the idea and let you know if they are agreeable.

It is sometimes possible to collect data from close friends by obtaining a general agreement to tape them surreptitiously at some time in the future. You must then tell them immediately after the recording, so that they can confirm they are agreeable to your using the material collected from them, or veto the use of the tape if they wish to do so. Note that this strategy, which is aimed at collecting more "natural" speech, does not always yield usable data: the quality of surreptitious recordings is often dubious, since the microphone is rarely in the best position for collecting the data.


The BI sheets ask contributors to sign their name to a clause giving permission to use the recording for linguistic research.  When transcribing material, names should be changed to names of equivalent length and phonological structure to protect the identity of people referred to.


If you use published written or recorded material from radio/TV you should check copyright. Most organisations do not object to use of small amounts of material for student projects.