Authors: Yehezkel S. Resheff (1), Itay Lieder (2) and Tom Hope (2)
Affiliations: (1) Intuit Tech Futures, Israel; (2) Intel Advanced Analytics, Israel
Keyword(s): Deep Learning, Fusion.
Related Ontology Subjects/Areas/Topics: Feature Selection and Extraction; Pattern Recognition; Theory and Methods
Abstract:
Pre-trained deep neural networks, powerful models trained on large datasets, have become a popular tool in computer vision for transfer learning. However, the standard approach of using a single network potentially misses out on valuable information contained in other readily available models. In this work, we study the Mixture of Experts (MoE) approach for adaptively fusing multiple pre-trained models for each individual input image. In particular, we explore how far we can get by combining diverse pre-trained representations in a customized way that maximizes their potential in a lightweight framework. Our approach is motivated by an empirical study of the predictions made by popular pre-trained nets across various datasets, finding that both performance and agreement between models vary across datasets. We further propose a miniature CNN gating mechanism operating on a thumbnail version of the input image, and show that this is enough to guide a good fusion. Finally, we explore a multi-modal blend of visual and natural-language representations, using a label-space embedding to inject pre-trained word vectors. Across multiple datasets, we demonstrate that an adaptive fusion of pre-trained models can obtain favorable results.
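The following is a rough, assumed-PyTorch sketch of the fusion scheme the abstract describes, not the paper's exact architecture: a tiny gating CNN looks at a thumbnail of the input and produces soft weights over frozen pre-trained "expert" classifiers, whose outputs are then mixed per image. The class names (ThumbnailGate, AdaptiveFusion), the gating layer sizes, the 32-pixel thumbnail, and the choice of ResNet-18 and MobileNetV2 as experts are all illustrative assumptions.

```python
# Illustrative sketch (assumed PyTorch/torchvision) of adaptive MoE fusion of
# frozen pre-trained nets; all sizes and backbone choices are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class ThumbnailGate(nn.Module):
    """Miniature CNN mapping a small thumbnail to a distribution over experts."""

    def __init__(self, num_experts, thumb_size=32):
        super().__init__()
        self.thumb_size = thumb_size
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(16, num_experts)

    def forward(self, x):
        thumb = F.interpolate(x, size=self.thumb_size)  # thumbnail version of the input
        h = self.features(thumb).flatten(1)
        return F.softmax(self.fc(h), dim=1)             # per-image gating weights


class AdaptiveFusion(nn.Module):
    """Mixture-of-experts fusion of frozen pre-trained classifiers."""

    def __init__(self, experts):
        super().__init__()
        self.experts = nn.ModuleList(experts)
        for p in self.experts.parameters():
            p.requires_grad = False                     # experts stay frozen; only the gate trains
        self.gate = ThumbnailGate(num_experts=len(experts))

    def forward(self, x):
        w = self.gate(x)                                                          # (batch, K)
        preds = torch.stack([F.softmax(e(x), dim=1) for e in self.experts], dim=1)  # (batch, K, classes)
        return (w.unsqueeze(-1) * preds).sum(dim=1)                               # weighted blend per image


# Example usage: fuse two ImageNet-pre-trained backbones (arbitrary choice).
model = AdaptiveFusion([models.resnet18(weights="IMAGENET1K_V1"),
                        models.mobilenet_v2(weights="IMAGENET1K_V1")])
model.eval()
probs = model(torch.randn(4, 3, 224, 224))              # (4, 1000) fused class probabilities
```

In this reading, only the lightweight gate is learned on the target data, which keeps the framework cheap while letting each image draw on whichever pre-trained representation suits it best.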