Automatic speechreading systems have increasingly exploited deep learning advances, resulting in dramatic gains over traditional methods. State-of-the-art systems typically employ convolutional neural networks (CNNs), operating on a video …
We introduce co-occurring directions sketching, a deterministic algorithm for approximate matrix product (AMM), in the streaming model. We show that co-occurring directions achieves a better error ...
Joint modeling of language and vision has been drawing increasing interest. A multimodal data representation allowing for bidirectional retrieval of images by sentences and vice versa is a key aspect. In this paper we present three contributions in …
In this paper, we present methods in deep multimodal learning for fusing speech and visual modalities for Audio-Visual Automatic Speech Recognition (AV-ASR). First, we study an approach where uni-modal deep networks are trained separately and their …