Google Presents MultiModel: A Neural Network Capable of Learning Multiple Tasks in Multiple Domains, by Roland Meertens on InfoQ
Google created an algorithm that can take input from multiple modalities and generate output in multiple modalities.
Currently, many machine learning applications focus on a single domain. Machine translation builds models for one language pair, and image recognition algorithms perform only one task (e.g., describing an image, assigning an image to a category, or finding objects in an image). The human brain, by contrast, performs well across all of these tasks and transfers knowledge from one domain to another: it can even apply what we learn by listening to other domains, such as things we see or read.
Google built a model that performs 8 tasks across multiple domains: speech recognition, image classification and captioning, sentence parsing, and translation between English and German and between English and French. It consists of an encoder, a decoder, and an "input-output mixer" that feeds previous inputs and outputs to the decoder. In the image below, each "petal" indicates a modality (sound, text, or an image). The network can learn every task with any of these input and output modalities.
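To make the encoder/decoder/mixer structure concrete, here is a minimal PyTorch sketch of the idea described above: modality-specific "petals" map raw inputs into a shared representation, a single shared encoder and decoder process that representation, and an input-output mixer combines the encoded input with previously generated outputs. All layer choices, sizes, and names here are illustrative assumptions, not the configuration used in Google's MultiModel paper.

```python
# Minimal sketch of a MultiModel-style architecture (assumed, not Google's actual code).
import torch
import torch.nn as nn


class MultiModelSketch(nn.Module):
    def __init__(self, d_model=128, text_vocab=1000, image_channels=3):
        super().__init__()
        # Modality "petals": each converts one input type into d_model-sized vectors.
        self.text_petal = nn.Embedding(text_vocab, d_model)
        self.image_petal = nn.Sequential(
            nn.Conv2d(image_channels, d_model, kernel_size=3, padding=1),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),  # -> (batch, d_model)
        )
        # Shared encoder over the unified representation.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        # Input-output mixer: attends from previously generated outputs to the encoded input.
        self.mixer = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        # Shared decoder producing text tokens (one of the possible output modalities).
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.to_vocab = nn.Linear(d_model, text_vocab)

    def forward(self, modality, inputs, prev_outputs):
        # 1. Route the input through its modality petal.
        if modality == "text":
            x = self.text_petal(inputs)                # (batch, seq, d_model)
        elif modality == "image":
            x = self.image_petal(inputs).unsqueeze(1)  # (batch, 1, d_model)
        else:
            raise ValueError(f"unknown modality: {modality}")
        # 2. Encode the unified representation.
        encoded = self.encoder(x)
        # 3. Mix previous outputs with the encoded input.
        prev = self.text_petal(prev_outputs)
        mixed, _ = self.mixer(prev, encoded, encoded)
        # 4. Decode and project to the output vocabulary.
        decoded = self.decoder(mixed, encoded)
        return self.to_vocab(decoded)


if __name__ == "__main__":
    model = MultiModelSketch()
    image = torch.randn(2, 3, 32, 32)              # toy image batch
    prev_tokens = torch.randint(0, 1000, (2, 5))   # previously generated caption tokens
    logits = model("image", image, prev_tokens)
    print(logits.shape)  # torch.Size([2, 5, 1000])
```

The point of the sketch is the weight sharing: only the small petals are modality-specific, while the encoder, mixer, and decoder are reused for every task, which is what lets one network serve speech, vision, and text.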