On the Construction of an Embodied Brain via Group Lasso Regularization

This report proposes and evaluates a novel model for visual attention: a convolutional neural network that performs attention based on a sparsely coded vector. The network models the joint distribution of the attention maps of the two attention channels together with the joint distribution of the input image vectors. The resulting optimization problem is solved by a supervised gradient-descent method. Two experiments are conducted to evaluate the effectiveness of the model, and the results show that both joint distributions can be estimated by the proposed network. To the best of our knowledge, this is the first model to perform the joint distribution estimation task with a CNN using both feature-based and sparse coding.
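The group lasso named in the title is a standard regularizer that zeroes out entire groups of weights at once, which is one way a sparsely coded vector like the one above could arise. As a hypothetical illustration only (the function name, group layout, and penalty weight are our own, not from the paper), a minimal NumPy sketch of the penalty:

```python
import numpy as np

def group_lasso_penalty(w, groups, lam=0.1):
    """Group lasso regularizer: lam * sum_g sqrt(|g|) * ||w_g||_2.

    Unlike the plain L1 penalty, the L2 norm is taken per group, so the
    optimizer is pushed to drop whole groups of weights to exactly zero.
    """
    return lam * sum(np.sqrt(len(g)) * np.linalg.norm(w[g]) for g in groups)

# Example: three groups over a 6-dimensional weight vector.
# Groups 1 and 3 are already zero, so only group 2 contributes.
w = np.array([0.0, 0.0, 1.0, -2.0, 0.5, 0.0])
groups = [[0, 1], [2, 3], [4, 5]]
penalty = group_lasso_penalty(w, groups, lam=0.1)
```

In practice this term is added to the task loss; the sqrt(|g|) factor is the usual scaling that keeps groups of different sizes comparably penalized.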
Learning to Know the Rules of Learning
A Unified Approach for Optimizing Conditional Models
Stereoscopic Video Object Parsing by Multi-modal Transfer Learning

We propose a new class of 3D motion models for action recognition and video object retrieval based on visualizing objects in low-resolution images. These 3D motion models capture different aspects of the scene, such as pose, scale, and lighting; these aspects are not only pertinent when learning 3D object models but can also be exploited for learning 2D objects. In this paper, we present a novel method, Multi-modal Motion Transcription (m-MNT), that encodes spatial information in a new 3D pose space using deep convolutional neural networks. The resulting 3D data is used to learn both the semantics and the pose variations of objects. We compare the performance of m-MNT on the challenging ROUGE 2017 dataset and on 3D motion datasets such as WER and SLIDE. Our method yields competitive speed and accuracy; hence, the m-MNT class holds promise for action recognition.