Dynamic Object Comprehension

Augmented and Mixed Reality are emerging as likely successors to the mobile internet. However, many technical challenges remain. One of the key requirements of these systems is the ability to create continuity between the physical and virtual worlds, with the user's visual perception as the primary interface medium. Building this continuity requires the system to develop a visual understanding of the physical world. While there has been significant recent progress in computer vision and AI techniques such as image classification and object detection, success in these areas has not yet delivered the visual perception that these critical MR and AR applications require. Why is that the case?

In our paper “Dynamic Object Comprehension: A Framework for Evaluating Artificial Visual Perception” (just submitted to ICIP2022), we explore the metrics that have guided the tremendous breakthroughs in computer vision and deep learning over the past decade, breakthroughs that have given us household names (well … for us techies at least) like AlexNet, Inception, ResNet, and YOLO, to name a few. We then dive in to see why these existing metrics, while seemingly appropriate, are just not quite the right fit for evaluating the artificial visual perception required for next-generation mixed-reality applications.
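For context, the benchmark scores behind those breakthroughs are per-image measures. As a minimal sketch (the data here is hypothetical, not from the paper), the two most common ones are top-1 classification accuracy and bounding-box intersection-over-union (IoU):

```python
def top1_accuracy(predictions, labels):
    """Fraction of samples whose predicted class matches the ground truth,
    as used in ImageNet-style classification benchmarks."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2),
    the overlap score underlying detection metrics like COCO mAP."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Hypothetical example: 2 of 3 class predictions correct,
# and two boxes that overlap in a 1x1 region.
acc = top1_accuracy([1, 2, 3], [1, 2, 0])        # 2/3
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))        # 1 / (4 + 4 - 1) = 1/7
```

Both scores grade a single static frame in isolation, which is precisely the assumption that breaks down for the continuous, dynamic perception MR and AR demand.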

We propose that for AR and MR applications we need to step back from the existing Image Classification and Object Detection metrics that have dominated recent AI-based computer vision and rethink the metrics of success. By defining a new paradigm, Dynamic Object Comprehension, we can set a new bar and drive the innovation required for a new generation of computer vision applications.