Gregory Garretson, Boston University, will speak on the topic "Studying meaning by profiling words in corpora: The Corpus-Derived Profiles framework".

Time and place: 25 February at 15:00 in room C307.


One way of viewing meaning in language is to see it as operating along two dimensions, the horizontal and the vertical, also called the syntagmatic and the paradigmatic. Paradigmatic word relations include synonymy, antonymy, and the like, in which words are grouped on the basis of similarities and differences between them. Syntagmatic word relations include collocation (the tendency of words to occur together) as well as colligation and semantic preference, which have both a horizontal and a vertical dimension. Many linguists have argued that the field has historically neglected the syntagmatic side; in the words of Sinclair (2004:141), "the tradition of linguistic theory has been massively biased in favour of the paradigmatic rather than the syntagmatic dimension". Corpus linguistic methodologies help us to rectify this imbalance.

In this seminar I will present the result of my recent doctoral work (Garretson 2010): a theoretical and technological framework for studying word meaning, called the Corpus-Derived Profiles (CDP) framework. Specifically, it focuses on the syntagmatic relations of collocation, colligation, and semantic prosody. These relations are not new, but in the framework they are defined precisely, interrelated, and broken down into sub-relations. Even more importantly, the framework has been implemented computationally, so that all of these relations can be measured automatically. The result of such measurement is a "lexical profile", which collects thousands of pieces of information about the syntagmatic relations of a given word in a given corpus.
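
As a rough illustration of what measuring collocation automatically can mean, the sketch below counts the words that occur within a fixed window around a node word. It is not the CDP framework's method or CenDiPede's interface: the function name `collocate_counts`, the `window` parameter, and the toy sentence are all assumptions made for this example, and a real lexical profile would record far more than raw co-occurrence counts.

```python
from collections import Counter


def collocate_counts(tokens, node, window=2):
    """Count words found within `window` positions of each occurrence of `node`.

    Hypothetical helper for illustration only; not part of CenDiPede.
    """
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts


# Toy "corpus" (invented for this example).
tokens = (
    "the strong tea and the strong coffee were served with strong tea biscuits"
).split()

profile = collocate_counts(tokens, "strong", window=1)
for word, freq in profile.most_common():
    print(word, freq)
```

Comparing the counts produced for two near-synonyms in the same corpus would be one simple way to begin looking at how their collocational behavior differs.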

CenDiPede is a Java program that provides a simple but powerful interface to the CDP framework. CenDiPede allows the user to create a profile of any word in a corpus and then execute queries on the resulting profile. The CDP query language is a relatively intuitive system for asking questions about lexical profiles, and is especially valuable for comparing profiles. It can be used to compare two or more words in the same corpus, or to compare the same word's profile in different corpora, which makes the program a useful and flexible tool for lexical analysis. CenDiPede will soon be available free of charge to anyone who wishes to use it.

In my Ph.D. thesis, I present three empirical studies, involving cases of polysemy, synonymy, and antonymy, based on an analysis of frequent nouns in the British National Corpus. In this seminar, I will present the basics of the CDP framework, including descriptions of collocation, colligation, and semantic preference. I will demonstrate CenDiPede, showing what a lexical profile looks like and how the query language can be used. I will also briefly present one of the three studies mentioned, showing how collocational behavior may be studied to identify the differences between synonymous words.


Garretson, G. 2010. Corpus-Derived Profiles: A framework for studying word meaning in text. Unpublished Ph.D. dissertation, Boston University.

Sinclair, J. M. 2004. Trust the text: language, corpus and discourse. London: Routledge.