Redwood Neuroscience
Title: "Compositional memory for recognition of complex objects: A proposal"
Ross Gayler
Abstract:
The problem of extracting an invariant
representation from perceptual inputs has long been recognised
(e.g. Lashley, 1942).
More recently, various proposals have been made and implemented for
recurrent connectionist systems that simultaneously settle on a mapping and
retrieve an item from memory (Arathorn, 2002; Hinton, 1981; Olshausen,
Anderson, & Van Essen, 1993). The
mapping (which captures the variant aspect of the input) transforms the input
into the cue that retrieves the item (which is the invariant representation)
from memory.
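
The joint choice of a mapping and a memory item can be illustrated with a small sketch. It is not any of the cited models: the recurrent settling process is replaced by an exhaustive search over (mapping, item) pairs, the palette of transformations is assumed to be cyclic shifts, and the names `memory`, `observed` and `recognise` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy memory: 5 stored items, each a 256-element pattern.
memory = rng.standard_normal((5, 256))

# The input is a stored item seen under an unknown transformation
# (here, item 3 cyclically shifted by 10 positions).
observed = np.roll(memory[3], 10)

def recognise(observed, memory):
    """Return the (shift, item) pair whose inverse-mapped input best matches memory."""
    n = observed.shape[0]
    # scores[s, k]: match between the input with shift s undone and stored item k.
    scores = np.array([memory @ np.roll(observed, -s) for s in range(n)])
    shift, item = np.unravel_index(np.argmax(scores), scores.shape)
    return int(shift), int(item)

shift, item = recognise(observed, memory)
print(shift, item)
```

The recovered shift plays the role of the variant aspect of the input and the recovered item index the invariant representation; the cited architectures reach a comparable answer by settling rather than by enumerating every pair.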
Coming from a completely different direction, I have
been developing a connectionist memory architecture to
support high level cognition (Gayler, 2000). Surprisingly, this architecture is also based
on simultaneous transformation and recognition and is abstractly isomorphic to
the perceptual invariance architectures.
The similarity between the perceptual and cognitive architectures
suggests that there may be a fundamental unity between them. The difference lies in the details: the
perceptual architectures use localist representations
and a fixed palette of geometric transformations, while the cognitive
architecture uses distributed connectionist representations capable of
representing recursive structures, and transformations that are arbitrary
structural substitutions. I propose that
the architectures could be unified and devote the remainder of this
presentation to exploring how this may enable the recognition of composite
objects.
The perceptual architectures mentioned earlier recognise a single item at a time. They can be persuaded to attend to multiple
items serially, but they do not allow for representation of the relations
between items. These architectures do
represent the relations between the elements (pixels or feature vectors) within
an item, but these relations are fixed.
Each item is recognised holistically and
treated as atomic (having no internal compositional structure). Thus, multi-level composite items cannot be
represented.
The representational advantage offered by the
distributed approach is that transformations are “first-class” entities, having
the same status as the content mapped by the transformations. This means that representations of
transformations can be included in the representations of objects. In particular, two serially fixated items and
the attentional transformation between the fixations
could be represented on the same set of connectionist units used to represent
just one item. Thus, it should be possible
to represent complex entities as a network of components with transformations
between them. This leads naturally to
graph structures as representations of objects – a common choice in computer
vision systems.
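
As a rough illustration of transformations being "first-class", the following sketch stores two items and the transformation between them in a single vector of the same size as one item. The abstract does not specify the operators, so everything here is an assumption: random bipolar vectors, elementwise multiplication as the binding operator, addition as superposition, and hypothetical role vectors `ROLE_FIRST`, `ROLE_SECOND` and `ROLE_SHIFT`.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 10_000  # assumed dimensionality of the distributed representation

def random_vector():
    """Random bipolar (+1/-1) vector standing in for an item, role or transformation."""
    return rng.choice([-1, 1], size=DIM)

def bind(a, b):
    """Elementwise multiplication; self-inverse for bipolar vectors."""
    return a * b

def bundle(*vectors):
    """Superpose several vectors on the same set of units."""
    return np.sum(vectors, axis=0)

def similarity(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical roles and fillers: two fixated items and the shift between them.
ROLE_FIRST, ROLE_SECOND, ROLE_SHIFT = (random_vector() for _ in range(3))
item_a, item_b, shift_right = (random_vector() for _ in range(3))

# The composite occupies the same DIM units as a single item.
composite = bundle(
    bind(ROLE_FIRST, item_a),
    bind(ROLE_SECOND, item_b),
    bind(ROLE_SHIFT, shift_right),
)

# Unbinding with a role yields a noisy copy of whatever filled that role.
recovered = bind(composite, ROLE_SHIFT)
print(similarity(recovered, shift_right), similarity(recovered, item_a))
```

In this sketch the transformation is retrieved from the composite in exactly the same way as an item would be, which is the sense in which it has the same status as the content it maps.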
The process advantage of such an approach is that it
should be possible to build a connectionist memory that simultaneously recalls
multiple items while settling on mappings between them. These mappings would serve to unify the
retrieved items into a representation of a novel composite object. Memory systems of this sort should be able to
recognise novel compositions of familiar components
as readily as they recognise the components
themselves. The distributed
connectionist implementation of this recognition process can be construed as an
indirect implementation of Pelillo's (1999)
approximate graph matching via replicator equations,
by embedding his algorithm in a fixed high-dimensional vector space.
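
For orientation, here is a minimal localist sketch of graph matching with discrete replicator dynamics, in the spirit of the approach attributed to Pelillo above. It is not the distributed implementation proposed in this abstract, and several details are assumptions: the association-graph construction, the 0.5 added to the diagonal, the slightly perturbed starting point, the number of steps and the decoding threshold. The function names are hypothetical.

```python
import numpy as np

def association_graph(g, h):
    """Adjacency matrix over candidate matches (i, j) of node i in g to node j in h.

    Two candidate matches are linked when they use distinct nodes and agree on
    whether the corresponding node pairs are connected.
    """
    n, m = len(g), len(h)
    a = np.zeros((n * m, n * m))
    for i in range(n):
        for j in range(m):
            for k in range(n):
                for l in range(m):
                    if i != k and j != l and g[i, k] == h[j, l]:
                        a[i * m + j, k * m + l] = 1.0
    return a

def replicator_step(x, w):
    """One discrete replicator update; x stays on the probability simplex."""
    fitness = w @ x
    return x * fitness / (x @ fitness)

def match(g, h, steps=500, seed=0):
    """Run the dynamics and decode the strongly supported candidate matches."""
    w = association_graph(g, h) + 0.5 * np.eye(len(g) * len(h))
    rng = np.random.default_rng(seed)
    x = 1.0 + 0.01 * rng.random(len(w))  # near-uniform start, slightly perturbed
    x /= x.sum()
    for _ in range(steps):
        x = replicator_step(x, w)
    m = len(h)
    keep = np.flatnonzero(x > 0.5 * x.max())  # assumed decoding threshold
    return x, [(int(p) // m, int(p) % m) for p in keep]

# Toy graphs: a 3-node path (centre node 1) and a relabelled copy (centre node 2).
g = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
h = np.array([[0, 0, 1], [0, 0, 1], [1, 1, 0]])
x, pairs = match(g, h)
print(pairs)
```

The state vector `x` holds one weight per candidate match, so the weights that survive the dynamics pick out a mutually consistent set of node correspondences. The dynamics can settle on a maximal rather than a maximum set of matches, which is why the matching is described as approximate.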
Arathorn, D. W. (2002). Map-seeking circuits in visual cognition: A computational mechanism for biological and machine vision.
Gayler, R. W. (2000). Multiplicative Binding, Representation Operators & Analogical Inference. Presented at Cognitive Science Conference.
Hinton, G. E. (1981). A parallel computation that assigns canonical object-based frames of reference. Proceedings of the Seventh International Joint Conference on Artificial Intelligence Vol. 2.
Lashley, K. S. (1942). The problem of cerebral organization in vision. Biological Symposia, 7, 301-322.
Olshausen, B. A., Anderson, C. H., & Van Essen, D. C. (1993). A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information. The Journal of Neuroscience, 13, 4700-4719.
Pelillo, M. (1999). Replicator equations, maximal cliques, and graph isomorphism. Neural Computation, 11, 1933-1955.