November 1998, no. 26
This newsletter contains a discussion of the following topics:
In attendance: Bas Aarts (London), Jan Aarts (Nijmegen), Francisco Gonzalvez Garcia (Spain), Sylviane Granger (Louvain), John Kirk (Belfast), Graeme Kennedy (New Zealand), Christian Mair (Freiburg), Charles Meyer (Boston), Gerry Nelson (London), Yiben Ni (Singapore), Vincent Ooi (Singapore), Nelleke Oostdijk (Nijmegen), Josef Schmied (Chemnitz)
The 1998 ICE meeting took place at the annual ICAME meeting, this year hosted by John Kirk at the beautiful Slieve Donard Hotel located in Newcastle, Northern Ireland. We discussed a number of issues at the meeting, including the release of ICE-GB, the status of ICE CDs, Edgar Schneider's proposal to publish ICE Handbooks in a series he edits for John Benjamins, ideas to publish another edited book on ICE, and my proposals for future releases of ICE components.
Below I detail the results of our discussion of these agenda items.
The release of ICE-GB and other ICE components
As I'm sure many of you have heard, ICE-GB has been released and is being distributed (with ICECUP) by the Survey of English Usage. Each ICE team will receive one free copy of the CD containing ICE-GB and ICECUP. By now the director of each ICE team should have received a letter from Gerry Nelson with a release form that needs to be signed and returned to the Survey before a copy of ICE-GB will be sent out. For more details about ICE-GB and ICECUP, see the Survey Web page, where you can also download the latest version of ICECUP as well as an ICE-GB sampler.
There are two other components of ICE that will soon be released: ICE-New Zealand and ICE-East Africa. I have already received ICE-NZ and expect to receive ICE-East Africa in December. I am presently in negotiations with the Norwegian Computing Centre for the Humanities to distribute these two components in lexical form on a single CD. As soon as I have more details, I'll let everyone know.
I'm hoping to arrange some kind of social event at next year's ICAME conference in Freiburg to celebrate the release of these three components of ICE. Congratulations are in order to all of those responsible for the first release of these ICE components!
Gerry Nelson reports that work on ICE CDs is ongoing. The CD for ICE-Australia has been completed, and other CDs are in the works. ICECUP is being designed to be able to read the CDs and play back sound files aligned with their transcription. The format of the CDs is described in last year's newsletter. If you'd like to contribute a CD, read the section below on collecting and transcribing texts, which describes how to create sound files that ICECUP will be able to read.
Last year, all ICE teams received a letter from Edgar Schneider (Regensburg, Germany) in which Schneider proposed to include ICE handbooks in a series that he edits for John Benjamins. Each regional component of ICE would have its own handbook, which would contain a listing of texts, speakers, and writers included in the component; information about the status of English in the country the component represents; and perhaps reports of linguistic analyses based on the component.
There was general support for Schneider's proposal, with some concern about the cost of books in the series. I spoke with Schneider at a recent conference, and he mentioned that a number of ICE teams had been in contact with him. If you haven't contacted Schneider yet, and are interested in doing a handbook for the series, you can write Schneider at: email@example.com
Tips for collecting and transcribing texts for spoken components of ICE
For those teams still collecting samples of speech for inclusion in their component of ICE, Janet Holmes (Corpus Director, Professor of Linguistics, Victoria University of Wellington) has some suggestions for collecting speech ("Notes on collecting conversations for the ICE-NZ Corpus") based on her experiences creating ICE-New Zealand. In this paper, Janet also discusses how collecting tapes of spoken English can be worked into class assignments.
For the transcription of speech, I have had great success using a program that Gerry Nelson recommended: "Cool Edit 96". Gerry has used this program to digitize all of the spoken samples included in ICE-GB, and to prepare the ICE CDs that he has been working on.
"Cool Edit 96" is a shareware program that is quite inexpensive (US $25-$50, depending upon whether you want the full or lite version). The program requires a Windows-based computer with a soundboard. Not only does the program enable you to digitize tape recordings by simply patching your cassette recorder into your soundboard, but once a tape is digitized the program allows you to replay short segments of a recording for purposes of transcription. A demonstration version of the program can be downloaded from Syntrillium Software Corporation.
I have found this a much better way of doing transcription than using a traditional transcription machine. These machines, in my experience, have bad audio quality, are overpriced, and break down after a year or two of use. The trick to digitizing speech is not to make the quality of a recording too high. If the quality is too high, you end up creating huge sound files that take up many megabytes of disk space. "Cool Edit 96" saves in various file formats at varying levels of quality. All my digital recordings are mono, 8 bit, 16000 Hz. This is more than adequate quality for transcription.
The screen capture below illustrates how "Cool Edit 96" looks when a digitized sample is loaded into the program. A wave form for the sample appears, and you can select parts of the wave form (the highlighted blue vertical line) and play only this section of the sample. Selected sections can be replayed until an accurate transcription is achieved, and transcriptions can be done with any word processing program opened simultaneously with "Cool Edit 96". The screen capture below contains a 2,000 word monologue from ICE-USA. Saved according to the file specifications listed above, this sample takes up about 12.5 MB of disk space.
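For anyone who wants to estimate disk space before digitizing, the size of an uncompressed PCM .wav file is roughly duration x sampling rate x bytes per sample x channels. The short Python sketch below shows the arithmetic. The function name is made up for illustration, and the 13-minute duration is an assumption (about what a 2,000 word monologue might run at 150 words per minute), not a measured figure for the ICE-USA sample.

```python
def wav_data_bytes(duration_s, sample_rate_hz=16000, bits_per_sample=8, channels=1):
    """Approximate size of uncompressed PCM audio, ignoring the small .wav header."""
    bytes_per_sample = bits_per_sample // 8
    return int(duration_s * sample_rate_hz * bytes_per_sample * channels)


# Assumed duration: 13 minutes (not taken from the actual sample).
duration_s = 13 * 60

size_8bit = wav_data_bytes(duration_s)                       # mono, 8 bit, 16000 Hz
size_16bit = wav_data_bytes(duration_s, bits_per_sample=16)  # mono, 16 bit, 16000 Hz

print(f"8 bit:  {size_8bit / 1_000_000:.1f} MB")   # prints "8 bit:  12.5 MB"
print(f"16 bit: {size_16bit / 1_000_000:.1f} MB")  # prints "16 bit: 25.0 MB"
```

Under these assumptions, the same recording saved at 16 bit takes about twice the space of the 8 bit version, which is worth keeping in mind when planning storage.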
"Cool Edit 96" can also be used to digitize speech from analog cassettes. You simply patch your recorder into your soundboard, turn on the cassette player, and click "record". After you've recorded a sample, the program will ask you what file format you want to save the recording in, and you simply select the specifications you want (in this case, Windows PCM .wav, mono, 8 bit, 16000 Hz).
To use "Cool Edit 96" to prepare sound files for inclusion on ICE CDs, Gerry Nelson recommends the following procedure (adapted from a recent e-mail message that Gerry sent me):
1) Save all files in Windows PCM format (as .wav files), mono, 16 bit (not 8 bit, as described above for transcribing texts), 16000 Hz.
2) Each text unit from each sample needs to be selected, copied, and saved into a separate file. These files are named f001.wav, f002.wav, etc. You have to experiment when selecting text to see whether what you've selected corresponds to a given text unit, but with practice, I've found that this gets easier and easier to do.
3) It is often necessary to select several text units together - usually in dialogues, where units may be very short, speakers talk fast, etc., making it impossible to separate the units of sound. Also, with overlapping speech, units must be combined. These are saved as, say, f001-006.wav (corresponding to text units 1 to 6 inclusive). For ICE-GB, it was often necessary to combine up to 20 units in a single file, where there is a lot of overlap.
4) The file names must always have 3 digits after the 'f', and 3 digits after the hyphen. So f020-029.wav, not f20-9.wav or f020-29.wav, or any other combination. These files are then placed into a directory named after the sample they're taken from (e.g. S1A-001).
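For teams that would rather script the naming and saving steps than do them by hand, here is a minimal Python sketch of the conventions in steps 1 to 4. It is not part of Gerry's procedure or of ICECUP: the function names are hypothetical, it assumes text unit numbers never exceed 999 (the most that three digits can hold), and it assumes you already have the audio for each unit as raw 16 bit mono PCM data.

```python
import re
import wave
from pathlib import Path

# "f" + 3 digits, optionally a hyphen + 3 more digits, then ".wav".
NAME_PATTERN = re.compile(r"^f\d{3}(-\d{3})?\.wav$")


def unit_filename(first, last=None):
    """Return f001.wav for one text unit, or f001-006.wav for units combined."""
    if not 1 <= first <= 999:
        raise ValueError("text unit numbers must be between 1 and 999")
    if last is None:
        return f"f{first:03d}.wav"
    if not first < last <= 999:
        raise ValueError("last unit must be greater than first and at most 999")
    return f"f{first:03d}-{last:03d}.wav"


def is_valid_unit_filename(name):
    """Check a file name against the three-digit naming rule."""
    return NAME_PATTERN.match(name) is not None


def save_unit(sample_dir, first, last, pcm_frames):
    """Write raw PCM data as a mono, 16 bit, 16000 Hz .wav file.

    sample_dir is the directory named after the sample, e.g. "S1A-001".
    """
    out_dir = Path(sample_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / unit_filename(first, last)
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes per sample = 16 bit
        wav.setframerate(16000)  # 16000 Hz
        wav.writeframes(pcm_frames)
    return path


print(unit_filename(1))                       # prints "f001.wav"
print(unit_filename(20, 29))                  # prints "f020-029.wav"
print(is_valid_unit_filename("f020-29.wav"))  # prints "False"
```

Running a check like is_valid_unit_filename over a finished directory is a quick way to catch names such as f20-9.wav before a CD is assembled.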
If you have any questions about using "Cool Edit 96" for either transcribing speech or preparing an ICE CD, please contact either me or Gerry.
ICE papers at next year's ICAME conference in Freiburg, Germany
Christian Mair (ICE-Caribbean) is hosting next year's ICAME conference in Freiburg, Germany. Christian has told me that he is willing to have a special ICE session at the conference if there is enough interest from ICE participants. The first circular for the conference has been sent out. If you wish to attend and did not receive a circular, you can contact Christian at: firstname.lastname@example.org
I would encourage all ICE participants planning on attending the conference to consider giving an ICE-related presentation. Since many components of ICE are fairly well developed, I suggest that people give presentations reporting the results of actual analyses of ICE corpora, and where possible, comparisons with other ICE components. I'm planning to discuss pseudo-titles (constructions like Panamanian strongman Manuel Noriega) in the genre of press reportage from as many ICE components as possible. This construction is very common in American press reportage, and I'm interested in determining the extent to which it can be found in the reportage from other countries.
A proposal for an edited book on ICE
Because many ICE components are either finished or nearing completion, I think it would be a good idea for us to consider an edited volume in which we report actual analyses of ICE corpora. The papers given at the ICAME conference in Freiburg could be the basis for the volume, as well as additional papers from those not planning on attending the conference.
If you would be interested in contributing to the volume, please send me an e-mail message in which you give me a tentative title, plus a brief description of your paper. Even if you don't have a specific paper in mind at this time, let me know if you are at least potentially interested in contributing. Analyses of individual ICE components or comparisons across components are most welcome.
Once I receive titles and descriptions from people, I'll begin contacting potential publishers. My plan is to first contact Oxford University Press to see if they would be interested in a companion volume to the book that Sidney Greenbaum edited a few years ago. If Oxford is not interested in considering a proposal, I'll contact other publishers.
This volume is a good way for us to show the corpus linguistics community examples of the kinds of linguistic analyses that can be conducted on ICE corpora.
Future developments in the ICE Project
Last spring I circulated an e-mail message discussing the future of the ICE Project. In this message, I sketched what I saw as three possibilities:
1) We continue the status quo, with each team working on its component of ICE indefinitely and releasing it when it is complete.
2) We set an ending date for the ICE Project (say, 2000) at which time we release whatever is finished and call an end to the project.
3) We set a date for an interim release of ICE (again, say, 2000) at which time we release what is available. After this release, teams that wish to can continue working on their components until they are complete.
At the ICE meeting, we discussed all of these options, and the consensus was that we go with option (3) and release an interim version of ICE in 2000. I should also add that many ICE members who could not attend the meeting sent me an e-mail also in support of option (3). What this means is that each ICE team should set as a target date the year 2000 for an interim release of the entire ICE corpus. We need to work out the logistics of this, but we have time to do this.
University of Massachusetts at Boston
100 Morrissey Blvd.
Boston, MA 02125-3393