
Sunday, February 27, 2022

Pretraining for Autonomous Systems

More on Microsoft's pretraining system 

Microsoft Research Blog

COMPASS: COntrastive Multimodal Pretraining for AutonomouS Systems

Published February 23, 2022

By Shuang Ma, Senior Researcher; Sai Vemprala, Senior Researcher; Wenshan Wang, Project Scientist; Jayesh Gupta, Senior Researcher; Yale Song, Senior Researcher; Daniel McDuff, Principal Researcher; and Ashish Kapoor, Partner Research Manager

Humans have the fundamental cognitive ability to perceive the environment through multimodal sensory signals and to use these perceptions to accomplish a wide variety of tasks. It is crucial that an autonomous agent can similarly perceive the underlying state of an environment from different sensors and appropriately decide how to accomplish a task. For example, localization (or “where am I?”) is a fundamental question an autonomous agent must answer before navigating, and it is often addressed via visual odometry. Highly dynamic tasks, such as vehicle racing, require collision avoidance and an understanding of how the agent’s state evolves over time with respect to the environment. Agents must learn perceptual representations of geometric and semantic information from the environment so that their actions can influence the world.

Task-driven approaches are appealing, but learning representations that are suitable only for a specific task limits their ability to generalize to new scenarios, thus confining their utility. For example, as shown in Figure 1, tasks such as drone navigation and vehicle racing usually require separately designed models, each encoding representations from very different sensor modalities, environments, sensory signals, and sampling rates. Such models must also cope with different dynamics and controls for each application scenario. We therefore ask whether it is possible to build general-purpose pretrained models for autonomous systems that are agnostic to tasks and individual form factors.

In our recent work, COMPASS: Contrastive Multimodal Pretraining for Autonomous Systems, we introduce a general-purpose pretraining pipeline built to overcome such limitations arising from task-specific models. The code can be viewed on GitHub. ...
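To make the idea of contrastive multimodal pretraining more concrete, below is a minimal sketch of a symmetric InfoNCE-style objective between two sensor modalities, the general family of losses that such pipelines build on. This is an illustrative assumption, not COMPASS's actual objective or code: the function name contrastive_loss, the temperature value, and the rgb_emb/depth_emb placeholders are all hypothetical.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_a, z_b, temperature=0.07):
    """Symmetric InfoNCE-style loss between two batches of modality embeddings.

    z_a, z_b: (batch, dim) embeddings of the same scenes from two sensors
    (e.g., RGB and depth). Matching rows are positive pairs; every other
    row in the batch serves as a negative.
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature  # (batch, batch) pairwise similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Cross-entropy in both directions pulls paired embeddings together
    # and pushes mismatched pairs apart.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage with random stand-ins for modality-specific encoder outputs.
rgb_emb = torch.randn(8, 128)    # hypothetical RGB encoder output
depth_emb = torch.randn(8, 128)  # hypothetical depth encoder output
loss = contrastive_loss(rgb_emb, depth_emb)
```

The appeal of this kind of objective for autonomous systems is that it requires only paired sensor streams, not task labels, so the learned representations are not tied to any single downstream task.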
