Title: The Open Mind Initiative: Large-Scale Knowledge Acquisition from Non-Experts via the Web

David G. Stork
stork@OpenMind.org
www.OpenMind.org

 

Abstract:
The Open Mind Initiative is a web-based collaborative framework for collecting large knowledge bases from non-expert contributors. Such knowledge bases are vital for a wide range of 'intelligent' software such as speech and handwriting recognizers, commonsense reasoners, and natural language understanding systems. This talk begins by examining several important trends that underlie Open Mind:

 

the rise in open source software

the expansion of opportunities for less-skilled users to contribute knowledge

the increase in scientific collaboration over the internet

the growing need for large sets of 'informal' data from non-experts

 

Next we contrast the Open Mind approach with traditional data mining, and then describe ongoing projects that collect knowledge bases for common sense, natural language, and handwriting recognition. Our largest project, Open Mind Common Sense, has collected 750,000 simple assertions from over 12,000 non-expert contributors. Two important considerations are speeding the collection of data (by interactive learning techniques) and ensuring data quality (by identifying and filtering unreliable or even 'hostile' contributions). We derive information-theoretic algorithms and perform simple simulations that justify our approach to these two problems.
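
As a rough illustration of the two considerations above, the following Python sketch shows one generic way each could be approached. It is not the algorithm derived in the talk, which this abstract does not specify. It assumes that "interactive learning" can be approximated by asking contributors about the item whose current label distribution has the highest Shannon entropy, and that unreliable contributors can be flagged by how rarely they agree with the per-item majority. All function names, the data layout, and the 0.5 threshold are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def most_informative(candidates):
    """Return the item whose current label distribution is most uncertain.

    candidates maps item -> list of label probabilities (hypothetical layout).
    """
    return max(candidates, key=lambda item: entropy(candidates[item]))


def contributor_agreement(labels):
    """Fraction of each contributor's labels that match the per-item majority.

    labels is a list of (contributor, item, label) tuples (hypothetical layout).
    """
    by_item = defaultdict(list)
    for _, item, label in labels:
        by_item[item].append(label)
    majority = {item: Counter(ls).most_common(1)[0][0]
                for item, ls in by_item.items()}

    hits, totals = Counter(), Counter()
    for contributor, item, label in labels:
        totals[contributor] += 1
        hits[contributor] += int(label == majority[item])
    return {c: hits[c] / totals[c] for c in totals}


def filter_unreliable(labels, threshold=0.5):
    """Keep only rows from contributors whose agreement score meets the threshold."""
    scores = contributor_agreement(labels)
    return [row for row in labels if scores[row[0]] >= threshold]
```

Majority agreement is only a simple stand-in for a reliability estimate: it can fail when hostile contributors outnumber honest ones on an item, which is one reason a more principled information-theoretic treatment is of interest.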

The talk concludes with a vision of future directions and opportunities.

[Including work by P. Singh, T. Chklovsky, C. Lam and N. Aron]

---------------------------------------------------------------------
David G. Stork is Chief Scientist of Ricoh Innovations as well as Consulting Professor of Electrical Engineering at Stanford University. His primary interests lie in pattern recognition, machine learning, neural networks and novel uses of the internet; he is the creator and leader of the Open Mind Initiative. He sits on the editorial boards of four international journals. His five books include HAL's Legacy: 2001's Computer as Dream and Reality (MIT Press), written for general audiences, and the second edition of Pattern Classification, with R. Duda and P. Hart (Wiley).