Basic-level classes recognition in a broad-coverage ontology

Massimiliano Ciaramita
Department of Cognitive and Linguistic Sciences
Brown University

Lexical semantic information in natural language processing (NLP) 
is often expressed as category membership, e.g., that "Albert 
Einstein" is a "person." This information can be useful for 
dealing with sparse data problems in several applications such as 
word sense classification or syntactic parsing. Categories 
typically used in NLP are mainly of two kinds: very specific 
(word sense level) or very general (named-entity level).
In this study we investigate the problem of finding levels of 
abstraction that lie in between these two extremes. Our goal is to 
find an intermediate, or "basic", level that is the most 
informative according to speakers' judgments. We present results 
from an experiment based on WordNet, an existing broad-coverage 
ontology of English, which show that speakers are often very 
consistent with one another in deciding which level is more 
informative. We then formalize the task of finding this 
informative level automatically in the WordNet ontology as a 
ranking problem and present several methods of varying complexity 
whose output correlates well with the speakers' data.
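
As a rough illustration of the ranking formulation, the sketch below scores each node on a hypernym chain, picks the top-ranked node as the candidate "basic" level, and compares the automatic ranking with speakers' preferences. Everything in it is an assumption made for illustration: the chain, both sets of scores, and the choice of Spearman rank correlation as the agreement measure are hypothetical and are not taken from the study.

```python
def ranks(values):
    """Return 1-based ranks, highest value first (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation for two equal-length lists without ties."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Toy hypernym chain, most specific first (illustrative only).
chain = ["physicist", "scientist", "person", "organism", "entity"]

# Hypothetical scores from some automatic method (higher = more informative).
method_scores = [0.40, 0.55, 0.90, 0.30, 0.10]

# Hypothetical speaker preferences (higher = chosen more often).
speaker_scores = [0.20, 0.50, 0.95, 0.25, 0.05]

# The top-ranked node is the method's guess at the basic level.
best = chain[ranks(method_scores).index(1)]

# Agreement between the automatic ranking and the speakers' ranking.
rho = spearman(method_scores, speaker_scores)

print(best, round(rho, 2))
```

With these toy values the method selects "person" and the two rankings agree closely; a real evaluation would average such a measure over many chains drawn from the ontology.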