Learning by Reading and Learning to Read

Workshop at ICSC-08

The majority of human knowledge is encoded in text, and much of this text is available in machine-readable form. But to machines, the knowledge encoded in the texts they read remains inaccessible. Machines of today can recognize textual strings, compare them to other strings, and recognize named entities and the like. These capabilities have already advanced the state of the art in information extraction and filtering, as a variety of current application systems demonstrate. However, understanding texts and reasoning about their meaning is still left to people.

One widely accepted reason for this state of affairs is that high-end reasoning requires vast amounts of knowledge in machine-tractable form, and acquiring such knowledge is an expensive undertaking. Automating the acquisition of general knowledge about the world and about language is a natural way of making it cheaper. Addressing this difficult task may require modifications, or at least extensions, to the current machine learning paradigm. Indeed, while current approaches predominantly address acquisition of facts and opinions and the breadth of text/corpus coverage, attention must also be paid to acquisition of concepts and to the depth to which text is processed: how much is understood, how that understood knowledge can be automatically converted into useful, unambiguous knowledge, and how reasoners can be configured to operate over that knowledge.

One attractive paradigm modification would involve generating and manipulating semantic structures rather than (often multiply ambiguous) strings when learning new knowledge of the world (e.g., ontological concepts) and language (e.g., lexicon entries). Such semantically informed learning is what has sometimes been referred to as learning by reading, with machine reading having the same implications as human reading: that the text is understood. Of course, once a new knowledge element is learned, it should be added to the knowledge resources of the system that generates and manipulates the semantic representations. This will facilitate improvement of the quality of subsequent semantic analysis. This gradual process has been called learning to read. There is a clear symbiotic relationship between these two tasks: expanding knowledge resources enables systems that extract knowledge from text to improve over time and consequently improve the quality of the resources. The workshop will concentrate specifically on these two tasks and their interplay.

The goal of the workshop is to facilitate discussion and debate of core issues, obstacles, prerequisites, and alternatives on the road to overcoming the knowledge bottleneck by using and improving automatic, semantically oriented text analysis. Of particular interest is work not covered in other meetings devoted to computational semantics, data mining, or knowledge extraction from text, where the emphasis is typically on "supply-side" issues of broad coverage, evaluation regimens, or formalisms rather than on the "demand-side" issues related to depth of coverage of concepts and phenomena.

Topics of interest include but are not limited to:

Submissions

Submit papers in PDF format by e-mail to the organizers (addresses below). Authors should follow the formatting instructions for ICSC 2008 long papers. Please consult http://icsc.eecs.uci.edu/submissions.html

Dates

Paper submission: April 7, 2008
Notification of acceptance: April 30, 2008
Workshop date: August 4, 2008

Organizers

Sergei Nirenburg and Tim Oates
Computer Science and Electrical Engineering Department
University of Maryland Baltimore County
{sergei, oates}@umbc.edu

Program Committee

Ken Barker University of Texas
Lynn Carlson Department of Defense
Graeme Hirst University of Toronto
Eduard Hovy USC ISI
David Israel SRI
Sergei Nirenburg UMBC
Tim Oates UMBC
Lenhart Schubert University of Rochester