  1. Large-scale Knowledge Acquisition and Representation (Publications)

    • Corpora: Gutenberg, Wikipedia, 1800-word web collection, WordNet all-word web collection

    • Cleaned corpora: Gutenberg, Wikipedia, 1800-word web collection, WordNet all-word web collection

    • Parsed corpora: Gutenberg, Wikipedia, 1800-word web collection, WordNet all-word web collection

    • Node set: 12-07-2007 set (~133,000 nodes), 1800-word web collection node set

    • Chi-square node set: 12-07-2007 set (~133,000 nodes), 1800-word web collection node set

    • Software packages: HTML cleaning, Chi-square calculation, node-based web crawler, Minipar, node building

  2. Word Sense Disambiguation (Publications)

    • Dictionary: WordNet (four text files: noun, verb, adjective, adverb)

    • Corpora: SemEval 4 Task 7 test corpus, SemEval 4 Task 17 test corpus

    • Software packages: Tree matching-based WSD

  3. Question Answering (Publications)

    • Corpora: TAC 2008 QA track corpus, BLOG06, AQUAINT 1 & 2, UIUC question categories and set

    • Software packages: Tree matching-based QA, question classification, named entity recognition, GATE

  4. Search Engine (Publications)

    • Node set: Wikipedia node set

    • A simple interface

    • Software packages: Tree matching-based search engine

  5. Coreference Resolution

    • Node set: 12-07-2007 set

    • Software packages: Anaphora resolution, Event node generation, Event flow generation, Spelling error detection, Notepad Plus

  6. Word Similarity

    • Dataset: WordNet synsets, 353 similar words

    • Software packages: 2D lexical semantic space building

  7. Semantic Association Rule Mining (Publications)

  8. Text Summarization (Publications)

Artificial Intelligence Lab, S-3-015, Science Building
University of Massachusetts Boston | Computer Science Department