Title: The Open Mind Initiative: Large-scale Knowledge Acquisition from Non-experts via the Web
David G. Stork
stork@OpenMind.org
www.OpenMind.org
Abstract:
The Open Mind Initiative is a web-based collaborative framework for collecting
large knowledge bases from non-expert contributors. Such knowledge bases are
vital for a wide range of 'intelligent' software such as speech and handwriting
recognizers, commonsense reasoners, and natural language
understanding systems. This talk begins by examining several important trends
that underlie Open Mind:
  the rise in open source software
  the expansion of opportunities for less-skilled users to contribute knowledge
  the increase in scientific collaboration over the internet
  the growing need for large sets of 'informal' data from non-experts
Next we contrast the Open
Mind approach with traditional data mining, and then describe ongoing projects
collecting common sense, natural language and handwriting recognition knowledge
bases. Our largest project, Open Mind Common Sense, has collected 750,000
simple assertions from over 12,000 non-expert contributors. Two important
considerations are speeding the collection of data (by interactive learning techniques)
and ensuring data quality (by identifying and filtering unreliable or even
'hostile' contributions). We derive information-theoretic algorithms and
perform simple simulations that justify our approach to these two problems.
The talk concludes with a vision of future directions and opportunities.
[Including work by P. Singh, T. Chklovski, C. Lam and N. Aron]
---------------------------------------------------------------------
David G. Stork is Chief Scientist of Ricoh Innovations as well as Consulting
Professor of Electrical Engineering at