Second- and Higher-order Representations in Computer Vision (ICCV 2019)

This tutorial aims at promoting discussions among researchers investigating innovative second-order (bilinear), kernel and tensor-based approaches to computer vision problems. Specifically, we will stimulate discussions on recent advances, ongoing developments, and novel applications of bilinear, kernel and multilinear algebra, optimization, and feature representations using matrices and tensors in the context of CNN learning.

TOPICS

We addressed a wide range of theoretical and practical issues including, but not limited to, the following topics:

Bilinear, kernel and tensor methods in low-level feature design and deep learning
Power Normalisation, non-linearities and their formulations
Mid-level representations with co-occurrence matrices and tensors
Low-rank factorisation methods and denoising approaches
Latent topic models using matrices, kernels and tensor methods
Co-occurrence matrices, kernels and tensors in optimization and dictionary learning
Advancements in Riemannian geometry, kernel methods and multilinear algebra
Dimensionality reduction, similarity learning, metric learning, and other machine learning topics
Applications of co-occurrences, kernels and tensors for:
- Object recognition
- Scene understanding
- Fine-grained classification
- Action recognition
- Industrial and medical applications
- Other CV and ML problems
Other related topics not listed above

SCHEDULE

Below is the program of the tutorial, which took place on the 2nd of November, 2019. The Detailed Program further down lists the abstracts and biographies of our speakers (or click on the links in the table).

Afternoon Session

Time	Speaker	Title
13:30	Organizers	Welcome
13:35	Prof. René Vidal	Invited Talk I: Global Optimality in Separable Dictionary Learning
14:15	Prof. Richard Hartley	Invited Talk II: Kernels on Manifolds
14:55	Dr. Piotr Koniusz	Tutorial Part I. Foundations of Second- and Higher-order Representations /slides/
15:35	Coffee break	Venue
16:10	Assoc. Prof. Lei Wang	Tutorial Part II. Learning SPD-matrix-based Representation for Visual Recognition /slides/
16:50	Dr. Subhransu Maji	Invited Talk III: Improving the Generalization and Efficiency of Second-order Representations /slides/
17:30	Prof. Ruiping Wang	Tutorial Part III. Riemannian Metric Learning and its Vision Applications /slides/
18:10	Organizers	Closing remarks

INFORMATION

~~Kindly note that registration at ICCV'19 webpage is mandatory for everyone participating in the workshop (at least a registration for pre-conference workshop).~~
The workshop took place on the 2nd of November, 2019.
You can find an archive listing related papers on second-order pooling here

DETAILED PROGRAM

Below is the list of speakers (in no particular order) who gave a talk during the tutorial (including organizers):

Prof. Richard Hartley (Australian National University)
Title: Kernels on Manifolds
Abstract: Data often has a structure in which it is constrained to lie on certain Riemannian manifolds, such as Grassmannian manifolds or the space of rotations. Kernel methods, such as SVMs, can be adapted to such data by defining suitable positive definite kernels on these manifolds. I will discuss the existence and properties of various kernels on manifolds, and their use.
Biography: Richard Hartley is a member of the computer vision group in the Research School of Engineering at the Australian National University, where he has been since January 2001. He is a joint leader of the Computer Vision group in NICTA, a government-funded research laboratory. Dr. Hartley worked at the General Electric Research and Development Center from 1985 to 2001, first in VLSI design and later in computer vision. He became involved with Image Understanding and Scene Reconstruction while working with GE's Simulation and Control Systems Division. In 1991, he began an extended research effort in applying projective geometry techniques to reconstruction using calibrated and semi-calibrated cameras. This research direction was one of the dominant themes in computer vision research throughout the 1990s. In 2000, he co-authored (with Andrew Zisserman) the book Multiple View Geometry in Computer Vision, summarizing the previous decade’s research in this area.
Prof. René Vidal (Johns Hopkins University)
Title: Global Optimality in Separable Dictionary Learning
Abstract: Sparse dictionary learning is a popular method for representing signals as linear combinations of a few elements from a dictionary that is learned from the data. In many applications in computer vision and medical imaging, signals are better represented as matrices or tensors (e.g., images or videos), and it may be beneficial to exploit the multi-dimensional structure of the data to learn a more compact representation. One such approach is separable dictionary learning, where one learns a separate dictionary for different dimensions of the data (e.g., spatial and temporal dimensions of a video). However, typical formulations of separable dictionary learning involve solving a non-convex optimization problem; thus guaranteeing global optimality remains a challenge. In this work, we propose a framework that builds upon recent developments in matrix factorization to provide theoretical and numerical guarantees of global optimality for separable dictionary learning. Specifically, we prove that local minima are guaranteed to be global when some dictionary atoms and the corresponding coefficients are zero. We also propose an algorithm to find such a globally optimal solution, which alternates between following local descent steps and checking a certificate for global optimality.
Biography: Professor Vidal received his B.S. degree in Electrical Engineering from the Pontificia Universidad Católica de Chile in 1997 and his M.S. and Ph.D. degrees in Electrical Engineering and Computer Sciences from the University of California at Berkeley in 2000 and 2003, respectively. In 2004 he joined the Johns Hopkins University, where he is currently a Professor in the Center for Imaging Science and the Department of Biomedical Engineering. Dr. Vidal is co-author of the book “Generalized Principal Component Analysis” (2016), co-editor of the book “Dynamical Vision” (2006), and co-author of more than 200 articles in machine learning, computer vision, biomedical image analysis, hybrid systems, robotics and signal processing. Dr. Vidal is Associate Editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence, the SIAM Journal on Imaging Sciences, Computer Vision and Image Understanding, and Medical Image Analysis. He has been Program Chair for ICCV 2015 and CVPR 2014, and Area Chair for all major conferences in machine learning, computer vision, and medical image analysis. Dr. Vidal has received many awards for his work including the 2012 J.K. Aggarwal Prize, the 2009 ONR Young Investigator Award, the 2009 Sloan Research Fellowship, the 2005 NSF CAREER Award, and best paper awards in computer vision (ICCV-3DRR 2013, PSIVT 2013, ECCV 2004), controls (CDC 2012, CDC 2011) and medical robotics (MICCAI 2012). Dr. Vidal was elected fellow of the IEEE in 2014 and fellow of the IAPR in 2016.
Dr. Subhransu Maji (UMass Amherst)
Title: Improving the Generalization and Efficiency of Second-order Representations
Abstract: I will present techniques for improving the robustness of second-order representations based on iterative techniques for spectral scaling (e.g., matrix square root or logarithm) and feature reweighting (e.g., democratic pooling). These can be easily integrated with existing deep networks and allow efficient forward and backward operations by 'unrolling' the iterations as 'layers'. I will discuss the memory and computational tradeoffs these offer and some open questions. I'll then present some empirical analysis of these on standard image classification datasets, as well as how these methods perform when integrated with the latest deep networks such as ResNets and DenseNets.
Biography: Since September 2014, Dr. Maji has been an Assistant Professor in the College of Information and Computer Sciences at the University of Massachusetts, Amherst and the co-director of the Computer Vision Lab. He is affiliated with the Center of Data Science and AWS AI. Prior to this, Dr. Maji spent three years as a Research Assistant Professor at TTI Chicago, a philanthropically endowed academic computer science institute on the University of Chicago campus. He obtained his Ph.D. under the supervision of Jitendra Malik from the University of California at Berkeley in 2011, and a B.Tech. in Computer Science and Engineering from IIT Kanpur in 2006. In the past Dr. Maji has enjoyed working at Google, INRIA LEAR group, Microsoft Research India, the CLSP center at Johns Hopkins University, and Oxford University. His research is funded by the National Science Foundation, as well as faculty grants from Facebook, NVIDIA, and Adobe. His research focuses on computer vision with a particular emphasis on algorithms for high-level recognition. His goal is to enable cheap and robust sensing of the visual world using cameras powered by computer vision.
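As an illustration of the iterative spectral scaling mentioned in the abstract above, the following minimal sketch approximates a matrix square root with the Newton-Schulz iteration, which uses only matrix products. The choice of Newton-Schulz, the function name newton_schulz_sqrt, and the parameter num_iters are assumptions made for this example; the talk itself may have used different iterations or settings.

```python
import numpy as np


def newton_schulz_sqrt(a, num_iters=10):
    """Approximate the square root of an SPD matrix with a fixed number
    of Newton-Schulz iterations (hypothetical helper, illustration only)."""
    identity = np.eye(a.shape[0])
    norm = np.linalg.norm(a)      # Frobenius norm; scales eigenvalues into (0, 1]
    y = a / norm                  # y converges to sqrt(a / norm)
    z = identity.copy()           # z converges to the inverse of that root
    for _ in range(num_iters):    # each pass could be unrolled as one "layer"
        t = 0.5 * (3.0 * identity - z @ y)
        y = y @ t
        z = t @ z
    return y * np.sqrt(norm)      # undo the initial scaling


# Small symmetric positive definite example.
a = np.array([[4.0, 1.0], [1.0, 3.0]])
root = newton_schulz_sqrt(a)
```

Because every step is a matrix multiplication, a fixed number of iterations can be differentiated by standard automatic differentiation, which is one reason such schemes are attractive inside deep networks.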
Assoc. Prof. Lei Wang (University of Wollongong)
Title: Learning SPD-matrix-based Representation for Visual Recognition
Abstract: Learning high-order feature representations has recently attracted much attention in computer vision. As a second-order pooled representation, the covariance matrix has played an important role in this research trend. This talk will report our recent work on learning covariance matrices to achieve better recognition. The first part presents a method called the discriminative Stein kernel, which utilises label information to adjust covariance matrices for better discriminative capability. The second part explores the sparsity structure among features to compute a sparse inverse covariance matrix as the representation, achieving better recognition performance in the case of high-dimensional features but small samples. The last part moves beyond the covariance matrix and employs a kernel matrix as the feature representation. It develops a deep learning network that jointly learns local descriptors and a kernel-matrix-based pooled representation in an end-to-end manner. An extensive experimental study is conducted on visual classification tasks to demonstrate the efficacy and advantage of the proposed methods.
Biography: Lei Wang received his PhD degree from Nanyang Technological University, Singapore. He is now an Associate Professor at the School of Computing and Information Technology of the University of Wollongong, Australia. His research interests include machine learning, pattern recognition, and computer vision. Lei Wang has published 150+ peer-reviewed papers, including those in highly regarded journals and conferences such as IEEE TPAMI, IJCV, CVPR, ICCV and ECCV. He was awarded the Early Career Researcher Award by the Australian Academy of Science and the Australian Research Council. He served as the General Co-Chair of DICTA 2014, Area Chair of ICIP 2019, and on the Technical Program Committees of 20+ international conferences and workshops. Lei Wang is a senior member of the IEEE.
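For readers unfamiliar with the Stein kernel named in the abstract above, the sketch below computes the plain Stein (Jensen-Bregman LogDet) divergence between two SPD matrices and the kernel built from it. It does not include the label-driven adjustment of covariance matrices that makes the kernel discriminative; the function names and the bandwidth parameter theta are assumptions for illustration only.

```python
import numpy as np


def stein_divergence(x, y):
    """S(X, Y) = log det((X + Y) / 2) - 0.5 * (log det X + log det Y)."""
    _, logdet_mid = np.linalg.slogdet(0.5 * (x + y))
    _, logdet_x = np.linalg.slogdet(x)
    _, logdet_y = np.linalg.slogdet(y)
    return logdet_mid - 0.5 * (logdet_x + logdet_y)


def stein_kernel(x, y, theta=1.0):
    """k(X, Y) = exp(-theta * S(X, Y)); theta is a hypothetical bandwidth."""
    return np.exp(-theta * stein_divergence(x, y))


# Two small SPD matrices standing in for covariance descriptors.
x = np.array([[2.0, 0.5], [0.5, 1.0]])
y = np.array([[1.0, 0.0], [0.0, 3.0]])
k_xy = stein_kernel(x, y)
k_yx = stein_kernel(y, x)
k_xx = stein_kernel(x, x)
```

The divergence needs only determinants, so it avoids the eigendecompositions required by some other distances between SPD matrices.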
Prof. Ruiping Wang (Chinese Academy of Sciences)
Title: Riemannian Metric Learning and its Vision Applications
Abstract: This part will introduce recent advances in Riemannian metric learning and its applications to many vision tasks, such as video face recognition/retrieval, action recognition, and fine-grained image classification. By using visual representations based on second-order statistics (e.g. SPD covariance matrix, linear subspace model, and Gaussian distribution), a typical visual classification task can be formulated as metric learning on specific Riemannian manifolds (correspondingly the SPD Riemannian manifold, Grassmann manifold, and Gaussian statistical manifold). This tutorial will give an overview of such metric learning algorithms, focusing on their mathematical formulations and derivations as well as their connections with the traditional linear metric learning paradigm.
Biography: Ruiping Wang is with the faculty of the Institute of Computing Technology, Chinese Academy of Sciences, where he was an Assistant Professor (Jul. 2012-Sep. 2012), an Associate Professor (Oct. 2012-Sep. 2017), and is currently a Professor. He works in the Visual Information Processing and Learning (VIPL) group, focusing on research projects related to visual scene understanding and human face analysis. From July 2010 to June 2012, he was a Postdoctoral Researcher working with Prof. Qionghai Dai, in the Department of Automation, Tsinghua University, Beijing. From Nov. 2010 to Oct. 2011, he also spent one year working as a Research Associate with Prof. Larry S. Davis in the Institute for Advanced Computer Studies (UMIACS), University of Maryland, College Park. He received the B.S. degree in Applied Mathematics from Beijing Jiaotong University, Beijing, China, in July 2003, and the Ph.D. degree in Computer Science from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, in July 2010, under the supervision of Prof. Wen Gao.
Dr. Mehrtash Harandi (Monash University)

Biography: Mehrtash Harandi received the Ph.D. degree in artificial intelligence from the University of Tehran. He is currently a Senior Lecturer with the Department of Electrical and Computer Systems Engineering, Monash University. He is also a Contributing Research Scientist with the Machine Learning Research Group, Data61/CSIRO, and an Associate Investigator with the Australian Center for Robotic Vision. With broad interests in machine learning, computer vision, and signal processing, he develops algorithms and mathematical models to equip machines with intelligence.
Dr. Piotr Koniusz (Data61/CSIRO and Australian National University)
Title: Foundations of Second- and Higher-order Representations
Abstract: This tutorial will outline the foundations of second- and higher-order representations such as covariance and auto-correlation matrices, their derivations from sum kernels, their connection to statistical moments, derivations of power normalising functions, the connection of Power Normalisation and Eigenvalue Power Normalisation to the CDF of Binomial distributions, and aggregation over RBF feature maps in the context of image classification, fine-grained recognition, action recognition, domain adaptation and few-shot learning.
Biography: Dr. Koniusz is a senior research scientist in the Machine Learning Research Group at Data61/CSIRO (formerly NICTA). He is also a senior honorary lecturer at the Australian National University (ANU). Previously, he worked as a post-doctoral researcher in the team LEAR, INRIA, France. He received his BSc degree in Telecommunications and Software Engineering in 2004 from the Warsaw University of Technology, Poland, and completed his PhD degree in Computer Vision in 2013 at CVSSP, University of Surrey, UK. His interests include visual concept detection, visual category recognition, action recognition, zero-, one- and few-shot learning, domain adaptation, image-to-image translation, feature and representation learning, invariance learning and understanding, feature pooling, spectral learning and graphs, as well as tensor and kernel methods, linearisations, sparsity and deep learning methods.
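To make the terms in the abstract above concrete, the following minimal sketch forms an auto-correlation matrix from local descriptors and applies two simple normalisations: an element-wise power function and a power applied to the eigenvalues. The plain power function with exponent gamma stands in for the family of power normalising functions covered in the tutorial; the function names, the value of gamma, and the random descriptors are all assumptions for illustration.

```python
import numpy as np


def autocorrelation(features):
    """Second-order pooling: (n, d) local descriptors -> (d, d) matrix."""
    return features.T @ features / features.shape[0]


def power_norm(m, gamma=0.5):
    """Element-wise signed power, one simple power normalising function."""
    return np.sign(m) * np.abs(m) ** gamma


def eigenvalue_power_norm(m, gamma=0.5):
    """Raise the eigenvalues of a symmetric PSD matrix to the power gamma."""
    eigvals, eigvecs = np.linalg.eigh(m)
    eigvals = np.clip(eigvals, 0.0, None)   # guard against tiny negative values
    return (eigvecs * eigvals ** gamma) @ eigvecs.T


# Random descriptors stand in for CNN feature vectors at 50 spatial locations.
rng = np.random.default_rng(0)
features = rng.standard_normal((50, 4))
m = autocorrelation(features)
m_pn = power_norm(m)
m_epn = eigenvalue_power_norm(m)
```

With gamma = 0.5 the eigenvalue variant coincides with the matrix square root, whereas the element-wise variant acts on each co-occurrence independently.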

CITATION

If you wish to cite any topics raised during the tutorial, refer to specific papers of our speakers. Additionally, you are welcome to cite the tutorial itself:

@misc{secordcv_tutorial_2019,
  title = {Second- and Higher-order Representations in Computer Vision},
  author = {P. Koniusz and M. Harandi and L. Wang and R. Wang},
  howpublished = {ICCV Tutorial, \url{https://www.koniusz.com/secordcv-iccv19}},
  note = {Accessed: 02-11-2019},
  year = {2019},
}

ORGANISERS

Dr. Piotr Koniusz (Data61/CSIRO and the Australian National University)
Dr. Mehrtash Harandi (Monash University)
Dr. Lei Wang (University of Wollongong)
Dr. Ruiping Wang (Chinese Academy of Sciences)