CLEAR (Computational Language and EducAtion Research)

Lexical Resources

OntoNotes Sense Groups

Word sense ambiguity is a continuing major obstacle to accurate information extraction, summarization and machine translation. While WordNet has been an important resource in this area, the subtle fine-grained sense distinctions in it have not lent themselves to high agreement between human annotators or high automatic tagging performance. Building on results in grouping fine-grained WordNet senses into more coarse-grained senses that led to improved inter-annotator agreement (ITA) and system performance (Palmer et al., 2004; Palmer et al., 2006), we have developed a process for rapid sense inventory creation and annotation that also provides critical links between the grouped word senses and the Omega ontology.

This process is based on recognizing that linguists can represent sense distinctions in a hierarchical structure, similar to a decision tree, that is rooted in very coarse-grained distinctions and becomes increasingly fine-grained until it reaches WordNet senses at the leaves. Sets of senses under specific nodes of the tree are grouped together into single entries, which are presented to the annotators along with the syntactic and semantic criteria for their groupings.
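The tree-and-grouping idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SenseNode` class, the `group_at` helper, and the shape of the toy tree are hypothetical and are not the project's actual tooling. The two node labels and their WordNet sense numbers are taken from the entries for drive in Table 1.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SenseNode:
    """A node in a hypothetical sense-distinction tree."""
    label: str
    children: list[SenseNode] = field(default_factory=list)
    wn_sense: Optional[int] = None  # set only on leaves (WordNet sense number)

    def leaves(self) -> list[int]:
        """Collect the WordNet sense numbers at the leaves under this node."""
        if self.wn_sense is not None:
            return [self.wn_sense]
        collected: list[int] = []
        for child in self.children:
            collected.extend(child.leaves())
        return collected


def group_at(nodes: list[SenseNode]) -> dict[str, list[int]]:
    """Turn each chosen node into one grouped entry covering all its leaves."""
    return {node.label: sorted(node.leaves()) for node in nodes}


# Toy tree: the intermediate structure is an assumption made for illustration.
root = SenseNode("drive", [
    SenseNode("cause object to move rapidly by striking it", [
        SenseNode("WN9", wn_sense=9),
        SenseNode("WN17", wn_sense=17),
        SenseNode("WN18", wn_sense=18),
    ]),
    SenseNode("a directed course of conversation", [
        SenseNode("WN11", wn_sense=11),
    ]),
])

# Grouping at the children of the root yields one entry per coarse sense.
groups = group_at(root.children)
```

Choosing a node closer to the root produces a coarser group; choosing a node closer to the leaves produces a finer one.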

As shown in Figure 1, a 50-sentence sample of instances is annotated and immediately checked for inter-annotator agreement. ITA scores below 90% lead to a revision and clarification of the groupings by the linguist. Only after the groupings have passed the ITA hurdle is each individual group linked to a conceptual node in the ontology. In addition to higher accuracy, we find at least a three-fold increase in annotator productivity. On the English side, in our first year, we are annotating the most frequent noun and verb senses in a 300K subset of the PropBank.


Figure 1. Annotation Procedure
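The ITA check in this procedure can be sketched as follows. The text does not say how ITA is computed, so this sketch assumes simple percent agreement between two annotators over the same instances. The function names and the example tags are hypothetical; only the 50-instance sample size and the 90% threshold come from the description above.

```python
ITA_THRESHOLD = 0.90  # groupings scoring below this go back to the linguist


def inter_annotator_agreement(tags_a, tags_b):
    """Fraction of instances on which two annotators chose the same sense group.

    Assumption: plain percent agreement, with no chance correction.
    """
    if len(tags_a) != len(tags_b):
        raise ValueError("both annotators must tag the same instances")
    if not tags_a:
        raise ValueError("at least one annotated instance is required")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


def needs_revision(tags_a, tags_b, threshold=ITA_THRESHOLD):
    """True if the groupings should be revised and clarified."""
    return inter_annotator_agreement(tags_a, tags_b) < threshold


# Made-up sample of 50 instances on which the annotators disagree 4 times.
annotator_a = ["G1"] * 50
annotator_b = ["G1"] * 46 + ["G2"] * 4

ita = inter_annotator_agreement(annotator_a, annotator_b)
revise = needs_revision(annotator_a, annotator_b)
```

With 46 matching tags out of 50, the groupings clear the threshold and would move on to being linked to the ontology; with 44 matches they would be returned for revision.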

Verbs

Subcategorization frames and semantic classes of arguments play major roles in determining the groupings for verbs, as illustrated by the grouping of the 22 WordNet (WN) 2.1 senses of drive in Table 1. In addition to improved annotator productivity and accuracy, we predict a corresponding improvement in system performance. Training on this new data, Chen and Palmer (2005) report 80.7% accuracy for verbs using a smoothed maximum entropy model and rich linguistic features. They also report state-of-the-art performance on fine-grained senses, but those results are more than 10% lower.

G1: operating or traveling via a vehicle NP (Agent) drive NP, NP drive PP	WN1: "Can you drive a truck?" WN2: "drive to school" WN3: "drive her to school" WN12: "this truck drives well" WN13: "he drives a taxi" WN14: "the car drove around the corner" WN16: "drive the turnpike to work"

G2: force to a position or stance NP drive NP/PP/infinitival	WN4: "he drives me mad" WN6: "drive back the invaders" WN7: "she finally drove him to change jobs" WN8: "drive a nail" WN15: "drive the herd" WN22: "drive the game"

G3: to exert energy on behalf of something NP drive NP/infinitival	WN5: "her passion drives her" WN10: "he is driving away at his thesis"

G4: cause object to move rapidly by striking it NP drive NP	WN9: "drive the ball into the outfield" WN17: "drive a golf ball" WN18: "drive a ball"

G5: a directed course of conversation	WN11: "What are you driving at?"

G6: excavate horizontally, as in mining	WN19: "drive a tunnel through the mountain"

G7: cause to function or operate	WN20: "steam drives the engine"

Table 1. Grouping of WordNet Senses for "drive"
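As a data structure, Table 1 amounts to a many-to-one mapping from fine-grained WordNet senses to coarse sense groups. The sketch below encodes exactly the sense numbers listed in the table; the names `DRIVE_GROUPS`, `WN_TO_GROUP`, and `coarse_sense` are hypothetical. WN21 does not appear in the table as given here, so the lookup returns `None` for it rather than guessing a group.

```python
# Sense groups for "drive", copied from Table 1 (WordNet 2.1 sense numbers).
DRIVE_GROUPS = {
    "G1": [1, 2, 3, 12, 13, 14, 16],  # operating or traveling via a vehicle
    "G2": [4, 6, 7, 8, 15, 22],       # force to a position or stance
    "G3": [5, 10],                    # to exert energy on behalf of something
    "G4": [9, 17, 18],                # cause object to move rapidly by striking it
    "G5": [11],                       # a directed course of conversation
    "G6": [19],                       # excavate horizontally, as in mining
    "G7": [20],                       # cause to function or operate
}

# Invert the table so that each fine-grained sense points to its group.
WN_TO_GROUP = {
    wn_sense: group
    for group, senses in DRIVE_GROUPS.items()
    for wn_sense in senses
}


def coarse_sense(wn_sense):
    """Return the sense group for a WordNet sense number, or None if unlisted."""
    return WN_TO_GROUP.get(wn_sense)


golf_group = coarse_sense(17)   # "drive a golf ball"
unlisted_group = coarse_sense(21)  # not present in Table 1 as given
```

A mapping like this lets fine-grained WordNet tags be collapsed to the grouped inventory, for example when comparing coarse-grained and fine-grained tagging results.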

Nouns

We follow a similar procedure for the annotation of nouns. The same individual who groups WordNet verb senses also creates noun senses, starting with WordNet and other dictionaries. We aim to double-annotate the 1100 most frequent polysemous nouns in the initial corpus by the end of 2006, while maximizing overlap with the sentences containing annotated verbs.

Certain nouns carry predicate structure. These include nominalizations (whose structure is derived from their verbal form) and various types of relational nouns (like father, President, and believer, which express relations between entities, often stated using of). We have identified a limited set of these whose structural relations can be semi-automatically annotated with high accuracy.

The word sense annotation for verbs is being carried out at the University of Colorado, under the supervision of Prof. Martha Palmer, and the annotation for nouns is being carried out at the Information Sciences Institute, under the supervision of Prof. Eduard Hovy.