Scalable Ontology Systems
Event Type: Seminar
Date: March 06, 2008
Time:
10:00AM - 11:30AM
Venue:
S-3-028
Abstract:
Ontologies have become commonplace as a way to represent both knowledge
and data. The bio-medical field is a clear success story, with many
bio-medical ontologies testing the limits of current knowledge
representation systems.
Such systems typically use a relational database representation to store
ontological data. However, access patterns associated with querying and
reasoning about ontologies are substantially different from those of
traditional database queries, to the extent that performance degrades
significantly when using relational models.
In this talk, I will address three of the main challenges in building a
scalable ontology system. First, I will describe an efficient
alternative to reification for RDF data annotated with information such
as probabilities, validity intervals or provenance. The Annotated RDF
framework allows a user to add any type of partially-ordered metadata to
an ontology, while keeping query processing times short compared to
reified representations.
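As a rough illustration of why reification is costly, the sketch below contrasts the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object) with a compact annotated form. The function names, the ex: resources and the 4-tuple layout are assumptions made for illustration only and are not taken from the Annotated RDF framework itself.

```python
# Illustrative sketch only. reify(), annotate() and all ex: names are
# hypothetical; only the rdf: reification vocabulary is standard.

def reify(stmt_id, s, p, o, annotation_prop, value):
    """Standard RDF reification: four triples describe the statement,
    and a fifth attaches the annotation to it."""
    return [
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
        (stmt_id, annotation_prop, value),
    ]

def annotate(s, p, o, value):
    """Assumed compact form: the annotation travels with the triple
    as a fourth field, so no extra statements are needed."""
    return [(s, p, o, value)]

reified = reify("ex:stmt1", "ex:geneA", "ex:regulates", "ex:geneB",
                "ex:probability", 0.8)
annotated = annotate("ex:geneA", "ex:regulates", "ex:geneB", 0.8)

# One annotated fact costs five stored statements when reified, one otherwise.
print(len(reified), len(annotated))
```

Under these assumptions, every annotated fact multiplies storage by five in the reified form, and any query over the annotation must first reassemble the original triple.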
Second, I will describe methods of indexing RDF ontologies which are
several times faster than their relational counterparts. Our GRIN
indexing method avoids the computationally complex self-joins inherent
in a relational-backed representation by relying on the locality
property of queries.
More specifically, we show that we only need to iterate over a small
subset of RDF resources to locate the smallest portion of the ontology
guaranteed to contain the answers to a given query. I will also describe
several experimental findings from comparisons to leading systems such
as Jena2, RDFBroker and 3store.
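The self-joins mentioned above can be seen in a minimal sketch of the relational baseline, assuming a single three-column triples table in SQLite. This is not the GRIN index, and the table layout and ex: data are hypothetical. A query with two triple patterns that share a variable already needs the table joined to itself once.

```python
import sqlite3

# Hypothetical single-table triple store (an assumed layout, for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:geneA", "ex:regulates", "ex:geneB"),
    ("ex:geneB", "rdf:type", "ex:Gene"),
    ("ex:geneA", "ex:regulates", "ex:proteinC"),
    ("ex:proteinC", "rdf:type", "ex:Protein"),
])

# Pattern 1: ?x ex:regulates ?y    Pattern 2: ?y rdf:type ex:Gene
# The shared variable ?y forces a self-join of the triples table.
rows = conn.execute("""
    SELECT t1.s, t1.o
    FROM triples AS t1
    JOIN triples AS t2 ON t2.s = t1.o
    WHERE t1.p = 'ex:regulates'
      AND t2.p = 'rdf:type'
      AND t2.o = 'ex:Gene'
""").fetchall()

print(rows)
```

In this layout a query with k triple patterns needs k - 1 such joins, which is the cost the abstract says an index exploiting query locality avoids.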
Third, I will present a novel ontology integration algorithm called
ILIADS that combines statistical and logical inference to improve the
quality of integrated ontologies. Some of our most interesting findings
show that (i) matching schema and data at the same time yields
significantly better recall than existing leading algorithms; (ii) the
robustness of integrating two ontologies depends on how similar
their characteristics are; and (iii) a little logical inference goes a
long way in improving result quality.
Speaker:
Octavian Udrea
Speaker Bio:
Octavian Udrea is currently a PhD student at the University of Maryland,
College Park. His primary research interests include knowledge
representation, automated reasoning and heterogeneous databases. He has
also published several papers on activity-based querying of video
databases and on automated code verification.