Machine Learning Vision
MIT Advances Unsupervised Computer Vision with ‘STEGO’
By Oliver Peckham
Training machine learning models often means working with labeled data. For computer vision tasks, this might look, for instance, like an hour of camera footage from a car, meticulously sectioned by humans to designate roads, road signs, vehicles, pedestrians and so forth. But labeling even this small amount of data could take hundreds of hours for a human, bottlenecking the training process. Now, researchers from MIT’s Computer Science & Artificial Intelligence Laboratory (CSAIL) are introducing a new, state-of-the-art algorithm for unsupervised computer vision tasks that operates without any human labels.
The model is called STEGO, short for “Self-supervised Transformer with Energy-based Graph Optimization.” STEGO is an algorithm for semantic segmentation, the process of assigning a label to every pixel in an image. Historically, semantic segmentation has been easiest for discrete objects like people or vehicles and harder for more amorphous, blended elements of the environment like clouds, bushes, or even cancers.
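To make “labeling the pixels” concrete: a semantic segmentation output is a grid with the same height and width as the image, holding one class id per pixel. This is a minimal sketch; the toy map, the class names, and the helper functions `class_pixel_counts` and `class_mask` are hypothetical and do not come from STEGO or any real dataset.

```python
from collections import Counter

# Hypothetical class names, invented for this illustration only.
CLASSES = {0: "sky", 1: "cloud", 2: "road", 3: "car"}

# A toy 4x6 segmentation map: one class id per pixel of a 4x6 image.
# The cloud region is irregular, while the car is a compact block.
seg_map = [
    [0, 0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 0],
    [2, 2, 3, 3, 2, 2],
    [2, 2, 3, 3, 2, 2],
]

def class_pixel_counts(label_map):
    """Count how many pixels were assigned to each class, keyed by class name."""
    counts = Counter(label for row in label_map for label in row)
    return {CLASSES[k]: v for k, v in sorted(counts.items())}

def class_mask(label_map, class_id):
    """Return a binary mask (1 = pixel belongs to class_id, 0 = it does not)."""
    return [[1 if label == class_id else 0 for label in row] for row in label_map]

print(class_pixel_counts(seg_map))
# {'sky': 6, 'cloud': 6, 'road': 8, 'car': 4}
```

A supervised model learns to produce maps like this from human-drawn examples; the point of an unsupervised method is to produce a comparable grouping of pixels without those examples.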
“If you’re looking at oncological scans, the surface of planets, or high-resolution biological images, it’s hard to know what objects to look for without expert knowledge. In emerging domains, sometimes even human experts don’t know what the right objects should be,” explained Mark Hamilton, a research affiliate of MIT CSAIL, software engineer at Microsoft, and lead author of the paper describing STEGO, in an interview with MIT’s Rachel Gordon. “In these types of situations where you want to design a method to operate at the boundaries of science, you can’t rely on humans to figure it out before machines do.”
STEGO is built on top of the DINO algorithm, which was itself trained on 14 million images. The researchers tested STEGO on a variety of datasets, including the highly diverse COCO-Stuff image dataset. They reported that STEGO doubled the performance of prior unsupervised computer vision models on the COCO-Stuff benchmark and performed similarly well on driverless-car and space-imagery datasets. ...
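The article does not describe STEGO's internals beyond its reliance on DINO. As a rough illustration of what unsupervised segmentation means in practice, this sketch groups per-pixel feature vectors into k segments with plain k-means, assuming such features have already been extracted by some pretrained backbone. The feature values, the function names, and the use of k-means are hypothetical stand-ins; this is not STEGO's energy-based graph optimization.

```python
import math

def init_centroids(points, k):
    """Farthest-first init: start at the first point, then keep adding the
    point farthest from all centroids chosen so far (deterministic)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(points, k, iters=20):
    """Plain k-means (Lloyd's algorithm) using Euclidean distance."""
    centroids = init_centroids(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assign

def segment(feature_map, k):
    """Cluster an H x W grid of per-pixel feature vectors into k unlabeled segments."""
    h, w = len(feature_map), len(feature_map[0])
    flat = [feature_map[i][j] for i in range(h) for j in range(w)]
    labels = kmeans(flat, k)
    return [labels[i * w:(i + 1) * w] for i in range(h)]

# Hypothetical 3x4 feature map: the left half and right half of the "image"
# have clearly different 2-D features (real backbone features are much larger).
features = [
    [(0.9, 0.1), (1.0, 0.0), (0.1, 0.9), (0.0, 1.0)],
    [(1.0, 0.1), (0.9, 0.0), (0.0, 0.9), (0.1, 1.0)],
    [(0.8, 0.2), (1.0, 0.2), (0.2, 0.8), (0.1, 0.8)],
]

print(segment(features, 2))
# [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]]
```

No labels are used anywhere: pixels are grouped only because their features are similar. The resulting cluster ids carry no class names, so scoring them against a labeled benchmark such as COCO-Stuff requires a separate step that matches clusters to human categories.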