Coyote Papers:
Working Papers in Linguistics


Modeling Semantic Coherence from Corpus Data: The Fact and the Frequency of a Co-occurrence

Viktor Pekar
Bashkir State University

The paper presents a preliminary evaluation of a corpus-based representation of individual words and a method for generalizing over these representations. The vector space is represented in a way that gives weight to the fact that words co-occur rather than to the frequency of their co-occurrence. This format is hypothesized to reduce the vector space, minimize the negative effects of data sparseness, and enhance the model's ability to generalize words to novel contexts. The model is assessed by comparing computer-calculated probabilities of different verb-argument combinations with human subjects' judgements about the appropriateness of these combinations. The results indicate that the probabilities calculated by the model correlate with the subjects' evaluations.
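The distinction between the fact and the frequency of a co-occurrence can be illustrated with a minimal sketch. It is not the paper's implementation: the toy verb-argument pairs, the function names, and the use of Jaccard overlap as a similarity measure are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict

# Toy (verb, argument) pairs, invented for illustration; not data from the paper.
pairs = [
    ("drink", "water"), ("drink", "water"), ("drink", "water"),
    ("drink", "tea"), ("pour", "water"), ("pour", "tea"),
    ("read", "book"),
]

def frequency_vectors(pairs):
    """Map each argument to counts of how often it occurs with each verb."""
    vectors = defaultdict(Counter)
    for verb, arg in pairs:
        vectors[arg][verb] += 1
    return vectors

def binary_vectors(pairs):
    """Map each argument to the set of verbs it occurs with at least once."""
    vectors = defaultdict(set)
    for verb, arg in pairs:
        vectors[arg].add(verb)
    return vectors

def jaccard(a, b):
    """Overlap of two binary vectors (an assumed measure, not the paper's)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

freq = frequency_vectors(pairs)
binary = binary_vectors(pairs)

print(dict(freq["water"]))      # counts per verb: the frequency view
print(sorted(binary["water"]))  # verbs seen at least once: the fact view
print(jaccard(binary["water"], binary["tea"]))
```

In the binary view, "water" and "tea" have identical representations even though "water" is far more frequent with "drink" in the toy data, which is the kind of insensitivity to raw counts that the abstract hypothesizes will help with sparse data.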

If you have any comments or questions, or would like to place an order, please contact:
Email: coyote@u.arizona.edu

University of Arizona Linguistics Circle
ATTN: Coyote Papers
200E Douglass Building
Tucson, AZ 85721
U.S.A.
