Humans possess a remarkable ability: peripheral vision, which lets them perceive shapes outside their direct line of sight, albeit with reduced detail. This expanded field of view proves useful in many scenarios, such as noticing a vehicle approaching from the side while driving.
In contrast, artificial intelligence lacks peripheral vision. Equipping computer vision models with this capability could enhance their ability to detect approaching hazards or predict how humans will react to oncoming obstacles.
Researchers at MIT have made strides in this direction by creating an image dataset that enables the simulation of peripheral vision in machine learning models. Training these models with the dataset has shown improvements in detecting objects in the visual periphery, although their performance still falls short of human capabilities.
Interestingly, the study revealed that factors like object size and visual clutter did not significantly influence the AI’s performance, unlike in humans. Vasha DuTell, a postdoc involved in the research, highlights the persistent disparity between AI models and human vision, prompting a deeper exploration into the missing elements in these models.
Understanding these discrepancies could pave the way for developing AI models that mimic human vision more closely, potentially enhancing driver safety and optimizing display interfaces for better user experience.
Anne Harrington MEng '23, the lead author, emphasizes that accurately modeling peripheral vision could offer insights into the visual features that drive human eye movements and information gathering.
The research team, whose co-authors include Mark Hamilton, Ayush Tewari, Simon Stent, William T. Freeman, and Ruth Rosenholtz, will present its findings at the International Conference on Learning Representations. Rosenholtz underscores the critical role of peripheral vision in human-machine interactions, emphasizing its importance in understanding what people can perceive.
To simulate peripheral vision, the researchers adopted a technique called the texture tiling model, originally developed in human vision research to capture the information loss that occurs away from the point of fixation. The team adapted the method so it can transform images more flexibly, without requiring prior knowledge of where the viewer's gaze will land.
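The published texture tiling model synthesizes local texture statistics within pooling regions that grow with distance from fixation; its details go beyond a short example. As a rough illustration of the core idea, the Python sketch below approximates eccentricity-dependent information loss with a blur that strengthens away from a chosen fixation point. The function name, parameters, and blur-based simplification are assumptions for illustration, not the researchers' actual implementation.

```python
import numpy as np
from PIL import Image, ImageFilter

def peripheral_transform(img, fixation=None, max_radius=8.0, levels=4):
    """Coarsen detail with distance from a fixation point: a crude
    stand-in for texture tiling. Assumes an RGB input image."""
    w, h = img.size
    fx, fy = fixation if fixation is not None else (w / 2, h / 2)

    # Normalized distance of every pixel from the fixation point.
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    dist = np.hypot(xs - fx, ys - fy)
    dist /= dist.max()

    # Precompute a few blur levels; stronger blur stands in for
    # greater information loss farther into the periphery.
    stack = [np.asarray(img.filter(ImageFilter.GaussianBlur(r)), dtype=np.float32)
             for r in np.linspace(0.0, max_radius, levels)]
    idx = np.minimum((dist * levels).astype(int), levels - 1)

    # Per pixel, select the blur level matching its eccentricity.
    out = np.zeros_like(stack[0])
    for i in range(levels):
        out += stack[i] * (idx == i)[..., None]
    return Image.fromarray(out.astype(np.uint8))
```

A call such as peripheral_transform(Image.open("scene.jpg")) leaves the center sharp and progressively degrades the edges; the actual model degrades images in a far more principled, texture-statistics-based way.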
The team used this approach to generate a large dataset of transformed images that reflect the loss of textural detail in the visual periphery, then trained computer vision models on it to improve their object detection capabilities.
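For a concrete sense of the training step, here is a minimal, hypothetical fine-tuning loop in PyTorch. The folder name transformed_dataset/, the ResNet-18 backbone, and all hyperparameters are placeholders; the paper's actual tasks and architectures may differ.

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Assumes each class folder holds images already passed through a
# peripheral transform like the sketch above.
data = datasets.ImageFolder("transformed_dataset/", transform=preprocess)
loader = torch.utils.data.DataLoader(data, batch_size=32, shuffle=True)

# Fine-tune a standard backbone on the periphery-like images.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(data.classes))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one epoch shown for brevity
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```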
Despite these gains, the models still lagged behind human performance, particularly at detecting objects in the far periphery. Harrington notes that the models also appear to use contextual information differently than humans do in these tasks, suggesting the two rely on distinct strategies.
Future investigations will delve deeper into these differences, aiming to develop models that align more closely with human visual capabilities. This endeavor could lead to AI systems that provide timely alerts to drivers regarding potential hazards. Moreover, the researchers aim to inspire further studies in computer vision using their publicly available dataset.
The significance of this research lies in challenging the notion of peripheral vision as merely limited vision and instead recognizing it as an optimized representation for real-world tasks. Justin Gardner, an associate professor at Stanford University, commends the study for shedding light on the disparities between neural network models and human vision, emphasizing the need for AI research to draw insights from human neuroscience.
The study is supported, in part, by the Toyota Research Institute and the MIT CSAIL METEOR Fellowship.