|
Computational Linguistics, or Natural Language
Processing (NLP), is not a new field. As early as
1946, attempts were made to use
computers to process natural language. These
attempts concentrated mainly on Machine
Translation and, due to the political
situation at the time, almost exclusively on the
translation from Russian into English.
Considerable resources were dedicated to this
task, both in the U.S.A. and in Great Britain,
during the fifties and sixties. Other countries,
mainly in continental Europe, joined the
enterprise, and the first systems (such as SYSTRAN)
became operational at the end of this period.
However, the limited performance of these systems
made it clear that the underlying theoretical
difficulties of the task had been grossly
underestimated, and in the following years and
decades much effort was spent on basic research
in formal linguistics. Today, a number of Machine
Translation systems are available commercially,
although there is still no system that produces
fully automatic high-quality translations (and
there probably will not be for some time). Human
intervention in the form of pre- and/or post-editing
is still required in all cases.
Another application that has become
commercially viable in recent years is the
analysis and synthesis of spoken language, i.e., speech
understanding and speech generation.
Potential applications range from aids for people
with disabilities (e.g., text-to-speech systems for the
blind) to telephony-based information systems (e.g.,
inquiry systems for train or plane connections,
telebanking) and further on to office dictation
systems (as offered by several vendors). Several
text-to-speech systems are commercially
available, and are in daily use in many places.
The difficulties of speech understanding are much
greater than those of speech generation, yet some
of the speech understanding systems are also
entering the marketplace.
An application that will become at least as
important as those already mentioned is the creation,
administration, and presentation of texts by
computer. Even reliable access to written texts
is a major bottleneck in science and commerce.
The amount of textual information is enormous (and
growing steadily), and traditional word-based
information retrieval methods are becoming
increasingly inadequate because either precision or
recall tends to be low (i.e., you get either a
large number of irrelevant documents together
with the relevant ones, or else you fail to get a
large number of the relevant ones in the
collection). Linguistically based retrieval
methods, taking into account the meaning of
sentences as encoded in the syntactic structure
of natural language, promise to be a way out of
this quandary. However, the creation of
texts is also becoming a problem. Manuals for
complex technical systems (airplanes, computers,
etc.) are constantly out of date as the systems
themselves are upgraded ever faster. Writing
manuals by hand is thus getting ever more
expensive and unreliable, and if manuals have to
be maintained in different languages, manual
production becomes increasingly unmanageable. If
different versions of the manuals have to be
written (for service users, for technicians, for
auditors, etc.), things get out of hand altogether.
The automatic creation of manuals from a common
knowledge base, in different languages and for
different types of readers, is a possible solution
to this cluster of problems. The creation of
natural language texts has always been a bit of a
"poor cousin" in the field of
Computational Linguistics. The situation
described above is about to change this
fundamentally.
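The precision and recall trade-off mentioned earlier in this section for word-based retrieval can be made concrete with a small sketch. The document IDs, the two result lists, and the function name precision_recall below are all hypothetical and chosen only for illustration; the code simply applies the standard definitions (precision = relevant retrieved / all retrieved, recall = relevant retrieved / all relevant).

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document IDs returned by the retrieval system
    relevant:  document IDs that are actually relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: four documents are relevant to the query.
relevant_docs = {"d1", "d2", "d3", "d4"}

# A broad word-based query finds every relevant document,
# but buries them among irrelevant ones.
broad_result = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]

# A narrow query returns only relevant material, but misses most of it.
narrow_result = ["d1"]

broad = precision_recall(broad_result, relevant_docs)
narrow = precision_recall(narrow_result, relevant_docs)
print("broad :", broad)
print("narrow:", narrow)
```

In this toy setting the broad query has perfect recall but low precision, and the narrow query has perfect precision but low recall, which is the quandary described above.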
Another topic that might come to the forefront
of research in Computational Linguistics is the presentation
of textual information. Traditionally, text
generation systems have created standard, i.e.,
linear, text. If the amount of text is large, and/or
if different types of readers must be addressed,
hypertext is a better medium of presentation. The
automatic creation of hypertext from an
underlying knowledge base calls for an extension
of this traditional approach.
|
|
Many people with a degree in Computational
Linguistics work in research groups at
universities, in governmental research labs, or in
large enterprises. For example, in Sweden,
Computational Linguists work in research groups
at the various universities that offer courses in
linguistics (like Göteborg or Uppsala), at
research labs like SICS (The Swedish Institute of
Computer Science), or for companies like Telia or
IBM.
In addition, there are development groups
working on commercial products. These range from
software houses like Microsoft, which employs
Computational Linguists for its work on grammar
checkers and automatic summarization, to the
Munich-based SailLabs, which develops a machine
translation system, to Caterpillar, which employs
Computational Linguists for the translation of
technical manuals.
In recent years, the demand for Computational
Linguists has risen with the increase in language
technology products on the Internet. Job offers
come from developers who are improving Internet search
engines by linguistic means or enhancing
user interfaces with lingubots. Others are
integrating speech recognition with language
processing techniques.
In general, the job market for Computational
Linguists is currently good.
|