(Support verb constructions and Natural Language Processing)
Centrum für Informations- und Sprachverarbeitung, Ludwig-Maximilians-Universität München
This book discusses support verb constructions and light verbs from the viewpoint of Natural Language Processing and suggests how to formalize their lexical and semantic description. A wide variety of language-specific and cross-language theoretical approaches to the phenomenon are discussed. Based on the analysis of such constructions by M. Gross and I. Melcuk, the main concepts are distilled, and an example formalization is given in FrameNet. The approach to formalization is described with only a few minimal theoretical requirements, namely the distinction between a semantic and a surface layer of description, so that the basic concepts can potentially be used in a wide variety of grammar frameworks.
In a separate chapter, a collection of tests is presented that makes it possible to delineate different types of verb-noun constructions situated between fully compositional and completely frozen constructions. The result is a test battery for support verb constructions. Following the description of the linguistic tests, automated approaches to detecting support verb construction candidates in large corpora are presented and discussed. Although a large number of experiments have been described in various papers and for various languages, none of the presented algorithms achieves satisfactory results, which means that manual lexicographic coding of the constructions is still needed to select support verb constructions and to capture their properties.
The main focus of this book is on the description and classification of German support verb constructions, but English and French examples are abundant, and the linguistic concepts are described in a language-independent way.
ISBN 9783929075649. Linguistic Resources for Natural Language Processing 03. 207pp. 2009.
RESOURCES FOR NATURAL LANGUAGE PROCESSING
Applications of natural language processing in a growing variety of technical, industrial and e-commerce domains have become commonplace. Yet there is still little agreement among theoretically and practically minded computational linguists about the basic assumptions and working
The monographs in this series address the role and the form of linguistic resources in all areas and applications of natural language processing. Even though it is widely admitted that such resources are an important prerequisite for serious progress in the construction, there has been little consensus about the details of these resources. There have also been very few systematic attempts to outline and to pursue large-scale programs in this field. Beyond the enumeration of all the morphological forms of a language, the central resources are still outstanding; in particular, the need for very large dictionaries of "complex forms" is widely underestimated. These range from dictionaries of nominal compounds to dictionaries of predicate-argument schemas as expressed, for instance, by verbs, predicative nouns and adjectives. In particular, specific attention needs to be directed towards the construction of exhaustive dictionaries of "frozen predicates", which in fact outnumber the other types.
On the basis of such dictionaries, even more adequate representative structures can be envisaged, in the form of local grammars and transducers that can deal with the ubiquitous variations of these predicate-argument structure schemas.
Once such extensive linguistic databases are available, we will be able to benefit from the insight that the central goal of linguistic analysis is to identify linguistic units of different degrees of complexity on the basis of pre-existing lexico-grammatical structures. Only then will we be able to tackle the challenging tasks concerning language learning by humans and machines in an adequate way.