Paper Commentary
--summary and analysis of a formal research paper--
Erica Bolan


"Motion-Based Autonomous Grounding: Inferring External World Properties from Encoded Sensory States Alone" builds off of previous work suggesting that invariance-driven motor action is a possible explanation for how the brain is able to decipher stimuli from the outside world based on its sensory states alone. Specifically, the research focuses on visual orientation: how do the visual receptors in the brain orient themselves do an external visual stimulus, without explicit knowledge of what that stimulus is? The paper provides more substantial documentation than previous work, and bases its conclusions on more realistic examples (ie, a natural image). It also introduces the use of Difference-of-Gaussian filters and Gabor filters to prepare a natural image for simulated sensory processing, and the use of the stochastic Q-learning algorithm to simulate "learning" by the sensory system.

The main contribution of this paper is that it takes a promising theory, previously supported by primarily theoretical (and somewhat artificial) experiments and results, and provides more concrete, world-based evidence for it. Namely, it suggests that it is possible for an autonomous agent's sensory system to determine its external stimuli without ever having direct access to those stimuli. These contributions fall within the area of Artificial Intelligence and the ongoing attempt to model the human brain computationally. Researchers have long been aware that the brain is somehow able to determine the nature of environmental stimuli based entirely on its sensory states, using only its own innate (built-in) capabilities, but have thus far been relatively unsuccessful at replicating that ability in computational models of the brain. Other theories as to how the brain accomplishes this remarkable feat exist and are certainly plausible, but the work presented in this paper provides the basis for a possible computational model of one of the many facets of the human brain.

The main limitation of this paper is that it addresses only one aspect (vision) of the human sensory system, and is restricted to a highly specific piece of that aspect. Extending the theory to other, more complex areas of the visual system, as well as to the rest of the human senses, remains a substantial undertaking and will clearly require modifications and additions to the theory presented. The paper does not profess to solve this problem; rather, it helps to focus attention on this important direction for future research and provides a framework for it. Another limitation of the paper lies in the fact that the algorithm presented does seem, in some sense, to provide the agent with external input: namely, the Q function used in the learning process. Whether the brain has an innate learning algorithm or some built-in, predefined process for proper stimulus identification is unclear, but the theory presented in this paper remains somewhat synthetic in that it imposes an algorithmic learning method on the agents. Although it is difficult to imagine a way to make a computational model of the brain "learn" without giving it an algorithm at some level, this is an important factor to keep in mind when evaluating whether the agents in the model are truly autonomous. Finally, the paper does not address where its model of the human visual input process came from: whether it is a theoretical (synthetic) model, or an accepted representation of how human vision works (or, at least, of how the human visual system processes an image). All told, however, this paper makes a significant contribution to AI research's understanding of how the brain identifies external stimuli based on its sensory states alone, and to how future work might replicate that ability in computer-based models.