1. Canonical Correlation Analysis for Misaligned Satellite Image Change Detection
Canonical correlation analysis (CCA) is a statistical learning method that seeks to build view-independent latent representations from multi-view data. This method has been successfully applied to several pattern analysis tasks such as image-to-text mapping and view-invariant object/action recognition. However, this success is highly dependent on the quality of data pairing (i.e., alignments) and mispairing adversely affects the generalization ability of the learned CCA representations. In this paper, we address the issue of alignment errors using a new variant of canonical correlation analysis referred to as alignment-agnostic (AA) CCA. Starting from erroneously paired data taken from different views, this CCA finds transformation matrices by optimizing a constrained maximization problem that mixes a data correlation term with context regularization; the particular design of these two terms mitigates the effect of alignment errors when learning the CCA transformations. Experiments conducted on multi-view tasks, including multi-temporal satellite image change detection, show that our AA CCA method is highly effective and resilient to mispairing errors.
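For readers who want the baseline machinery, below is a minimal sketch of classical CCA (the alignment-assuming formulation, not the paper's AA variant), computed via the SVD of the whitened cross-covariance; the function name and the ridge term `reg` are illustrative choices. The cross-covariance `Cxy` is exactly the statistic that mispairing corrupts, which is what the paper's context-regularization term is designed to counteract.

```python
import numpy as np

def cca(X, Y, k, reg=1e-6):
    """Classical CCA sketch: project paired views X (n x dx) and Y (n x dy)
    onto a shared k-dimensional latent space maximizing their correlation."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])  # within-view covariances
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n   # cross-covariance: the statistic corrupted by mispairing
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))   # whitening transforms
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    U, _, Vt = np.linalg.svd(Wx @ Cxy @ Wy.T)
    A = Wx.T @ U[:, :k]     # view-1 projection matrix
    B = Wy.T @ Vt.T[:, :k]  # view-2 projection matrix
    return A, B             # X @ A and Y @ B live in the shared latent space
```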
2. Polygonal approximation of digital planar curve using novel significant measure
This paper presents an iterative smoothing technique for the polygonal approximation of digital image boundaries. The technique starts with the finest initial segmentation points of a curve. The contribution of each initial segmentation point towards preserving the original shape of the boundary is determined by computing a significance measure that is sensitive to sharp turns, which are easily missed when conventional significance measures are used for detecting dominant points. The proposed method distinguishes between two situations: a curve point lying between two other points may project directly onto the line segment joining them, or beyond that segment. The method not only identifies these situations but also computes the point's significance contribution differently in each, as illustrated in the sketch below. This situation-specific treatment preserves points of high curvature even as the revised set of dominant points is derived. Experimental results show that the proposed technique competes well with state-of-the-art techniques.
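As a concrete illustration of the two situations, here is a small sketch of the deviation computation; it is an illustrative helper under the stated geometry, not the paper's exact significance measure.

```python
import numpy as np

def point_segment_deviation(p, a, b):
    """Deviation of curve point p from the chord a-b (a != b), separating the
    two cases in the text: the foot of the perpendicular falls on the segment
    (perpendicular distance) or beyond an endpoint (endpoint distance)."""
    p, a, b = (np.asarray(v, dtype=float) for v in (p, a, b))
    ab = b - a
    t = np.dot(p - a, ab) / np.dot(ab, ab)       # normalized projection parameter
    if 0.0 <= t <= 1.0:
        return np.linalg.norm(p - (a + t * ab))  # projects directly onto the chord
    # projects beyond an endpoint, typical of a sharp turn: use the nearer endpoint
    return min(np.linalg.norm(p - a), np.linalg.norm(p - b))
```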
3. Learning from Web Data: the Benefit of Unsupervised Object Localization
Annotating a large number of training images is very time-consuming. Against this background, this paper focuses on learning from easy-to-acquire web data and uses the learned model for fine-grained image classification on labeled datasets. Currently, the performance gain from training with web data is incremental, like the common saying: "better than nothing, but not by much". Conventionally, the community has looked to correcting noisy web labels in order to select informative samples. In this work, we first systematically study the built-in gap between web and standard datasets, i.e., the different data distributions of the two kinds of data. Then, in addition to using web labels, we present an unsupervised object localization method that provides critical insights into object density and scale in web images. Specifically, we design two constraints on web data that substantially reduce the difference between the data distributions of the web and standard datasets. First, we present a method to control the scale, localization and number of objects in the detected region. Second, we propose to select the regions containing objects that are consistent with the web tag. Based on the two constraints, we are able to process web images to reduce the gap, and the processed web data better assists the standard datasets in training CNNs. Experiments on several fine-grained image classification datasets confirm that our method performs favorably against state-of-the-art methods.
4. Learning Compositional Representations for Few-Shot Recognition
One of the key limitations of modern deep learning based approaches lies in the amount of data required to train them. Humans, on the other hand, can learn to recognize novel categories from just a few examples. Instrumental to this rapid learning ability is the compositional structure of concept representations in the human brain, something that deep learning models lack. In this work we take a step towards bridging this gap between human and machine learning by introducing a simple regularization technique that allows the learned representation to be decomposable into parts. We evaluate the proposed approach on three datasets, CUB-200-2011, SUN397, and ImageNet, and demonstrate that our compositional representations require fewer examples to learn classifiers for novel categories, outperforming state-of-the-art few-shot learning approaches by a significant margin.
5. Quicker ADC: Unlocking the hidden potential of Product Quantization with SIMD
Efficient Nearest Neighbor (NN) search in high-dimensional spaces is a foundation of many multimedia retrieval systems. A common approach is to rely on Product Quantization, which allows storing large vector databases in memory while also enabling efficient distance computations. Yet, implementations of nearest neighbor search with Product Quantization have their performance limited by the many memory accesses they perform. Following this observation, André et al. proposed more efficient implementations of $m\times{}4$ product quantizers (PQ) leveraging specific SIMD instructions. Quicker ADC contributes additional implementations that are not limited to $m\times{}4$ codes and that rely on AVX-512, the latest revision of the SIMD instruction set. In doing so, Quicker ADC faces the challenge of efficiently using 5-, 6- and 7-bit shuffles that do not align to computer bytes or words. To this end, we introduce (i) irregular product quantizers combining sub-quantizers of different granularity and (ii) split tables allowing lookup tables larger than registers. We evaluate Quicker ADC with multiple indexes, including Inverted Multi-Indexes and IVF HNSW, and show that it outperforms the FAISS PQ implementation and optimization (i.e., Polysemous codes) for numerous configurations. Finally, we open-source at this http URL a fork of FAISS that includes Quicker ADC.
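For context, here is a plain-Python sketch of the table-based asymmetric distance computation (ADC) that Product Quantization relies on; shapes and names are illustrative assumptions. The per-code gather in the last line is the memory access that Quick(er) ADC replaces with in-register SIMD shuffles.

```python
import numpy as np

def adc_distances(query, codes, codebooks):
    """ADC sketch for an m-sub-quantizer PQ index.
    codebooks: (m, 2**b, d//m) sub-quantizer centroids
    codes:     (n, m) integer codes of the database vectors"""
    m, ksub, dsub = codebooks.shape
    q = query.reshape(m, dsub)
    # one distance table per sub-quantizer: ||q_j - c||^2 for every centroid c
    tables = ((codebooks - q[:, None, :]) ** 2).sum(axis=-1)  # shape (m, ksub)
    # distance to each database vector = sum of m table lookups (the gathers)
    return sum(tables[j, codes[:, j]] for j in range(m))      # shape (n,)
```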
6. A Multiscale Image Denoising Algorithm Based On Dilated Residual Convolution Network
Image denoising is a classical problem in low-level computer vision. Model-based optimization methods and deep learning approaches have been the two main strategies for solving it. Model-based optimization methods are flexible in handling different inverse problems but are usually time-consuming; in contrast, deep learning methods are fast at test time, but the performance of these CNNs is still inferior. To address this issue, we propose a novel deep residual learning model that combines dilated residual convolutions with multi-scale convolution groups. Because of the complex patterns and structures inside an image, the multiscale convolution group is used to learn those patterns and enlarge the receptive field. Residual connections and batch normalization are used to speed up the training process while maintaining denoising performance. To reduce gridding artifacts, we integrate the hybrid dilated convolution design into our model (sketched below). Overall, this paper aims to train a lightweight and effective denoiser based on multiscale convolution groups. Experimental results demonstrate that the enhanced denoiser not only achieves promising denoising results but is also a strong competitor in practical applications.
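As a hedged illustration of the hybrid dilated convolution design mentioned above, here is a PyTorch-style residual block whose dilation rates share no common factor (1, 2, 5), so the stacked receptive field has no holes; the channel count and exact rate schedule are assumptions, not the paper's configuration.

```python
import torch.nn as nn

class HDCResidualBlock(nn.Module):
    """Sketch: hybrid dilated convolutions inside a residual block."""
    def __init__(self, channels=64):
        super().__init__()
        layers = []
        for d in (1, 2, 5):  # HDC rate schedule; padding=d preserves spatial size
            layers += [nn.Conv2d(channels, channels, 3, padding=d, dilation=d),
                       nn.BatchNorm2d(channels),   # BN speeds up training
                       nn.ReLU(inplace=True)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return x + self.body(x)  # residual connection
```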
7. Cascaded Coarse-to-Fine Deep Kernel Networks for Efficient Satellite Image Change Detection
Deep networks are nowadays becoming popular in many computer vision and pattern recognition tasks. Among these networks, deep kernels are particularly interesting and effective; however, their computational complexity is a major issue, especially on cheap hardware resources. In this paper, we address the issue of efficient computation in deep kernel networks. We propose a novel framework that dramatically reduces the complexity of evaluating these deep kernels. Our method is based on a coarse-to-fine cascade of networks designed for efficient computation: early stages of the cascade are cheap and reject many patterns efficiently, while deep stages are more expensive and accurate. The design principle of these reduced-complexity networks is based on a variant of the cross-entropy criterion that reduces the complexity of the networks in the cascade while preserving all the positive responses of the original kernel network. Experiments conducted on the challenging and time-demanding task of change detection in very large satellite images show that our proposed coarse-to-fine approach is both effective and highly efficient.
8. 3DSRnet: Video Super-resolution using 3D Convolutional Neural Networks
In video super-resolution, the spatio-temporal coherence between and among frames must be exploited appropriately for accurate prediction of the high-resolution frames. Although 2D convolutional neural networks (CNNs) are powerful in modelling images, 3D-CNNs are more suitable for spatio-temporal feature extraction because they can preserve temporal information. To this end, we propose an effective 3D-CNN for video super-resolution, called 3DSRnet, that does not require motion alignment as preprocessing. 3DSRnet maintains the temporal depth of the spatio-temporal feature maps to maximally capture the temporally nonlinear characteristics between low- and high-resolution frames, and adopts residual learning in conjunction with sub-pixel outputs; a toy sketch of this design appears below. It outperforms the best state-of-the-art method by an average of 0.45 and 0.36 dB in PSNR for scales 3 and 4, respectively, on the Vidset4 benchmark. 3DSRnet is also the first to deal with the performance drop due to scene change, which is important in practice but had not been previously considered.
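To make the design concrete, here is a toy PyTorch sketch of a 3D-CNN super-resolver in this spirit: the 3D convolutions preserve temporal depth, and a sub-pixel (PixelShuffle) layer produces the upscaled frame. All layer sizes are hypothetical; this is not the published 3DSRnet architecture.

```python
import torch.nn as nn

class Toy3DSR(nn.Module):
    """Toy video SR sketch: 3D feature extraction, then sub-pixel upscaling."""
    def __init__(self, scale=4, ch=64, frames=5):
        super().__init__()
        self.body = nn.Sequential(                      # padding=1 on all three
            nn.Conv3d(1, ch, 3, padding=1),             # axes keeps the temporal
            nn.ReLU(inplace=True),                      # depth intact
            nn.Conv3d(ch, ch, 3, padding=1),
            nn.ReLU(inplace=True))
        self.fuse = nn.Conv2d(ch * frames, scale * scale, 3, padding=1)
        self.up = nn.PixelShuffle(scale)                # sub-pixel output layer

    def forward(self, x):             # x: (batch, 1, frames, H, W), grayscale
        f = self.body(x)              # (batch, ch, frames, H, W)
        f = f.flatten(1, 2)           # merge time into channels
        return self.up(self.fuse(f))  # (batch, 1, H*scale, W*scale)
```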
9. 3D multirater RCNN for multimodal multiclass detection and characterisation of extremely small objects
Extremely small objects (ESO) have become observable on clinical routine magnetic resonance imaging acquisitions, thanks to a reduction in acquisition time at higher resolution. Despite their small size (usually $<$10 voxels per object for an image of more than $10^6$ voxels), these markers reflect tissue damage and need to be accounted for to investigate the complete phenotype of complex pathological pathways. In addition to their very small size, variability in shape and appearance leads to high labelling variability across human raters, resulting in a very noisy gold standard. Such objects are notably present in the context of cerebral small vessel disease, where enlarged perivascular spaces and lacunes, commonly observed in the ageing population, are thought to be associated with acceleration of cognitive decline and risk of dementia onset. In this work, we redesign the RCNN model to scale to 3D data and to jointly detect and characterise these important markers of age-related neurovascular changes. We also propose training strategies enforcing the detection of extremely small objects, ensuring a tractable and stable training process.
10. Detection of distal radius fractures trained by a small set of X-ray images and Faster R-CNN
Distal radius fractures are the most common fractures of the upper extremity in humans. As such, they account for a significant portion of the injuries that present to emergency rooms and clinics throughout the world. We trained a Faster R-CNN, a machine-vision neural network for object detection, to identify and locate distal radius fractures in anteroposterior X-ray images. We achieved an accuracy of 96% in identifying fractures and a mean Average Precision (mAP) of 0.866. This is significantly more accurate than the detection achieved by physicians and radiologists. These results were obtained by training the deep learning network with only 38 original anteroposterior hand X-ray images with fractures. This opens the possibility of using this type of neural network to detect rare diseases, or rare symptoms of common diseases, for which only a small set of diagnosed X-ray images can be collected.
11. Face Hallucination Revisited: An Exploratory Study on Dataset Bias
Contemporary face hallucination (FH) models exhibit considerable ability to reconstruct high-resolution (HR) details from low-resolution (LR) face images. This ability is commonly learned from examples of corresponding HR-LR image pairs, created by artificially down-sampling the HR ground-truth data. This down-sampling (or degradation) procedure not only defines the characteristics of the LR training data, but also determines the type of image degradations the learned FH models are eventually able to handle. If the image characteristics encountered with real-world LR images differ from the ones seen during training, FH models are still expected to perform well, but in practice may not produce the desired results. In this paper we study this problem and explore the bias introduced into FH models by the characteristics of the training data. We systematically analyze the generalization capabilities of several FH models in various scenarios where the image degradation function does not match the training setup, and conduct experiments with synthetically downgraded as well as real-life low-quality images. We make several interesting findings that provide insight into existing problems with FH models and point to future research directions.
12. A Deep Four-Stream Siamese Convolutional Neural Network with Joint Verification and Identification Loss for Person Re-detection
State-of-the-art person re-identification systems that employ a triplet-based deep network suffer from poor generalization capability. In this paper, we propose a four-stream Siamese deep convolutional neural network for person re-detection that jointly optimises verification and identification losses over a four-image input group. Specifically, the proposed method overcomes the weakness of the typical triplet formulation by using groups of four images featuring two matched (i.e., the same identity) and two mismatched images. This allows us to jointly increase the inter-class variations and reduce the intra-class variations in the learned feature space. The proposed approach also optimises over both the identification and verification losses, further minimising intra-class variation and maximising inter-class variation, improving overall performance. Extensive experiments on four challenging datasets, VIPeR, CUHK01, CUHK03 and PRID2011, demonstrate that the proposed approach achieves state-of-the-art performance.
13. Multi-component Image Translation for Deep Domain Generalization
Domain adaptation (DA) and domain generalization (DG) are two closely related methods that are both concerned with the task of assigning labels to an unlabeled data set. The only difference between these approaches is that DA can access the target data during the training phase, while in DG the target data are totally unseen during training. The task of DG is challenging as we have no earlier knowledge of the target samples. If DA methods are applied directly to DG by simply excluding the target data from training, poor performance will result. In this paper, we tackle the domain generalization challenge in two ways. In our first approach, we propose a novel deep domain generalization architecture utilizing synthetic data generated by a Generative Adversarial Network (GAN). The discrepancy between the generated images and synthetic images is minimized using existing domain discrepancy metrics such as maximum mean discrepancy or correlation alignment. In our second approach, we introduce a protocol for applying DA methods to a DG scenario by excluding the target data from the training phase, splitting the source data into training and validation parts, and treating the validation data as target data for DA. We conduct extensive experiments on four cross-domain benchmark datasets. Experimental results show that our proposed model outperforms the current state-of-the-art methods for DG.
14. Saliency Guided Hierarchical Robust Visual Tracking
A saliency guided hierarchical visual tracking (SHT) algorithm containing global and local search phases is proposed in this paper. In the global search, a novel top-down saliency model is developed to handle abrupt motion and appearance variation problems. Nineteen feature maps are extracted first and combined with online-learnt weights to produce the final saliency map and estimated target locations. After evaluation by the integration mechanism, the optimal candidate patch is passed to the local search. In the local search, a superpixel-based HSV histogram matching is performed jointly with an L2-RLS tracker to take both the color distribution and the holistic appearance feature of the object into consideration. Furthermore, a linear refinement search process with a fast iterative solver is implemented to attenuate the possible negative influence of dominant particles. Both qualitative and quantitative experiments are conducted on a series of challenging image sequences. A comparative study demonstrates the superior performance of the proposed method over other state-of-the-art algorithms.
15. Densely Semantically Aligned Person Re-Identification
We propose a densely semantically aligned person re-identification (re-ID) framework. It fundamentally addresses the body misalignment problem caused by pose/viewpoint variations, imperfect person detection, occlusion, etc. By leveraging the estimation of the dense semantics of a person image, we construct a set of densely semantically aligned part images (DSAP-images), where the same spatial positions have the same semantics across different person images. We design a two-stream network that consists of a main full-image stream (MF-Stream) and a densely semantically-aligned guiding stream (DSAG-Stream). The DSAG-Stream, with the DSAP-images as input, acts as a regulator to guide the MF-Stream to learn densely semantically aligned features from the original image. At inference time, the DSAG-Stream is discarded and only the MF-Stream is needed, which makes the inference system computationally efficient and robust. To the best of our knowledge, we are the first to make use of fine-grained semantics to address misalignment problems in re-ID. Our method achieves rank-1 accuracy of 78.9% (new protocol) on the CUHK03 dataset, 90.4% on the CUHK01 dataset, and 95.7% on the Market1501 dataset, outperforming state-of-the-art methods.
16. ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation
This paper proposes an efficient neural network (NN) architecture design methodology called Chameleon that honors given resource constraints. Instead of developing new building blocks or using computationally-intensive reinforcement learning algorithms, our approach leverages existing efficient network building blocks and focuses on exploiting hardware traits and adapting computation resources to fit target latency and/or energy constraints. We formulate platform-aware NN architecture search in an optimization framework and propose a novel algorithm to search for optimal architectures aided by efficient accuracy and resource (latency and/or energy) predictors. At the core of our algorithm lies an accuracy predictor built atop Gaussian Process with Bayesian optimization for iterative sampling. With a one-time building cost for the predictors, our algorithm produces state-of-the-art model architectures on different platforms under given constraints in just minutes. Our results show that adapting computation resources to building blocks is critical to model performance. Without the addition of any bells and whistles, our models achieve significant accuracy improvements against state-of-the-art hand-crafted and automatically designed architectures. We achieve 73.8% and 75.3% top-1 accuracy on ImageNet at 20ms latency on a mobile CPU and DSP. At reduced latency, our models achieve up to 8.5% (4.8%) and 6.6% (9.3%) absolute top-1 accuracy improvements compared to MobileNetV2 and MnasNet, respectively, on a mobile CPU (DSP), and 2.7% (4.6%) and 5.6% (2.6%) accuracy gains over ResNet-101 and ResNet-152, respectively, on an Nvidia GPU (Intel CPU).
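A minimal sketch of the predictor-driven search loop (an assumed interface, not Chameleon's implementation): fit a Gaussian Process to (architecture encoding, measured accuracy) pairs and pick the next architecture to evaluate by expected improvement.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def next_candidate(X_seen, acc_seen, X_pool):
    """X_seen: (n, d) encodings already trained; acc_seen: (n,) accuracies;
    X_pool: (p, d) candidate encodings satisfying the resource constraint."""
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X_seen, acc_seen)
    mu, sigma = gp.predict(X_pool, return_std=True)
    best = acc_seen.max()
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    return X_pool[np.argmax(ei)]  # encoding to train and measure next
```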
17. Slimmable Neural Networks
We present a simple and general method to train a single neural network executable at different widths (numbers of channels in a layer), permitting instant and adaptive accuracy-efficiency trade-offs at runtime. Instead of training individual networks with different width configurations, we train a shared network with switchable batch normalization (sketched below). At runtime, the network can adjust its width on the fly according to on-device benchmarks and resource constraints, rather than downloading and offloading different models. Our trained networks, named slimmable neural networks, achieve ImageNet classification accuracy similar to (and in many cases better than) that of individually trained models of MobileNet v1, MobileNet v2, ShuffleNet and ResNet-50 at the corresponding widths. We also demonstrate better performance of slimmable models compared with individual ones across a wide range of applications, including COCO bounding-box object detection, instance segmentation and person keypoint detection, without tuning hyper-parameters. Lastly, we visualize and discuss the learned features of slimmable networks. Code and models are available at: this https URL
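A hedged sketch of the switchable batch normalization idea: each width configuration gets a private BatchNorm, because feature statistics differ across widths even when the convolution weights are shared. Real slimmable networks also slice the channel dimension per width; that detail is omitted here for brevity.

```python
import torch.nn as nn

class SwitchableBatchNorm2d(nn.Module):
    """Sketch: one private BN per supported width configuration."""
    def __init__(self, num_features, num_widths):
        super().__init__()
        self.bns = nn.ModuleList(nn.BatchNorm2d(num_features)
                                 for _ in range(num_widths))
        self.active = 0          # set externally when the width is switched

    def forward(self, x):
        return self.bns[self.active](x)  # route through the active width's BN
```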
18. Efficient Misalignment-Robust Multi-Focus Microscopical Images Fusion
In this paper we propose a very efficient method to fuse unregistered multi-focus microscopical images based on speeded-up robust features (SURF). Our method follows the pipeline of registration followed by fusion. However, instead of treating registration and fusion as two completely independent stages, we propose to reuse the determinant of the approximate Hessian generated in the SURF detection stage as the corresponding salient response for the final image fusion, which enables nearly cost-free saliency map generation. In addition, thanks to the SURF scale-space representation, our method can generate a scale-invariant saliency map, which is desired for scale-invariant image fusion. We present an extensive evaluation on a dataset consisting of several groups of unregistered multi-focus 4K ultra-HD microscopic images of size 4112 x 3008. Compared with state-of-the-art multi-focus image fusion methods, our method is much faster and achieves better visual results. It provides a flexible and efficient way to integrate complementary and redundant information from multiple multi-focus ultra-HD unregistered images into a fused image that contains a better description than any of the individual input images. Code is available at this https URL.
19. Deep Learning to Assess Glaucoma Risk and Associated Features in Fundus Images
Glaucoma is the leading cause of preventable, irreversible blindness worldwide. The disease can remain asymptomatic until severe, and an estimated 50%-90% of people with glaucoma remain undiagnosed. Thus, glaucoma screening is recommended for early detection and treatment. A cost-effective tool to detect glaucoma could expand healthcare access to a much larger patient population, but such a tool is currently unavailable. We trained a deep learning (DL) algorithm using a retrospective dataset of 58,033 images, assessed for gradability, glaucomatous optic nerve head (ONH) features, and referable glaucoma risk. The resultant algorithm was validated using 2 separate datasets. For referable glaucoma risk, the algorithm had an AUC of 0.940 (95% CI, 0.922-0.955) in validation dataset "A" (1,205 images, 1 image/patient; 19% referable, where images were adjudicated by panels of fellowship-trained glaucoma specialists) and 0.858 (95% CI, 0.836-0.878) in validation dataset "B" (17,593 images from 9,643 patients; 9.2% referable, where images were from the Atlanta Veterans Affairs Eye Clinic diabetic teleretinal screening program using clinical referral decisions as the reference standard). Additionally, we found that the presence of vertical cup-to-disc ratio >= 0.7, neuroretinal rim notching, retinal nerve fiber layer defect, and bared circumlinear vessels contributed most to referable glaucoma risk assessment by both glaucoma specialists and the algorithm. Algorithm AUCs ranged from 0.608 to 0.977 for glaucomatous ONH features. The DL algorithm was significantly more sensitive than 6 of 10 graders, including 2 of 3 glaucoma specialists, with comparable or higher specificity relative to all graders. A DL algorithm trained on fundus images alone can detect referable glaucoma risk with higher sensitivity and comparable specificity to eye care providers.
20. An Optical Flow-Based Approach for Minimally-Divergent Velocimetry Data Interpolation
Three-dimensional (3D) biomedical image sets are often acquired with in-plane pixel spacings that are far less than the out-of-plane spacings between images. The resultant anisotropy, which can be detrimental in many applications, can be decreased using image interpolation. Optical flow and/or other registration-based interpolators have proven useful in such interpolation roles in the past. When acquired images are comprised of signals that describe the flow velocity of fluids, additional information is available to guide the interpolation process. In this paper, we present an optical-flow based framework for image interpolation that also minimizes resultant divergence in the interpolated data.
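For reference, the divergence in question is the standard differential operator on the velocity field $\mathbf{v} = (u, v, w)$, which vanishes for incompressible flow:

$$\nabla \cdot \mathbf{v} \;=\; \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z}, \qquad \nabla \cdot \mathbf{v} = 0 \ \text{ for incompressible flow,}$$

so the interpolator can penalize any nonzero divergence it would otherwise introduce between acquired slices.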
21. Three Dimensional Reconstruction of Botanical Trees with Simulatable Geometry
We tackle the challenging problem of creating full and accurate three dimensional reconstructions of botanical trees with the topological and geometric accuracy required for subsequent physical simulation, e.g. in response to wind forces. Although certain aspects of our approach would benefit from various improvements, our results exceed the state of the art especially in geometric and topological complexity and accuracy. Starting with two dimensional RGB image data acquired from cameras attached to drones, we create point clouds, textured triangle meshes, and a simulatable and skinned cylindrical articulated rigid body model. We discuss the pros and cons of each step of our pipeline, and in order to stimulate future research we make the raw and processed data from every step of the pipeline as well as the final geometric reconstructions publicly available.
22. SMILER: Saliency Model Implementation Library for Experimental Research
The Saliency Model Implementation Library for Experimental Research (SMILER) is a new software package which provides an open, standardized, and extensible framework for maintaining and executing computational saliency models. This work drastically reduces the human effort required to apply saliency algorithms to new tasks and datasets, while also ensuring consistency and procedural correctness for results and conclusions produced by different parties. At its launch SMILER already includes twenty-three saliency models (fourteen models based in MATLAB and nine supported through containerization), and the open design of SMILER encourages this number to grow with future contributions from the community. The project may be downloaded and contributed to through its GitHub page: this https URL
23. Multimodal Sensor Fusion In Single Thermal Image Super-Resolution
With the fast growth of the visual surveillance and security sectors, thermal infrared images have become increasingly necessary in a large variety of industrial applications. This is true even though IR sensors are still more expensive than RGB counterparts of the same resolution. In this paper, we propose a deep learning solution to enhance thermal image resolution. The following results are given: (I) introduction of a multimodal, visual-thermal fusion model that addresses thermal image super-resolution by integrating high-frequency information from the visual image; (II) investigation of different network architecture schemes in the literature, their up-sampling methods, learning procedures, and optimization functions, showing their beneficial contributions to the super-resolution problem; (III) presentation of a benchmark ULB17-VT dataset that contains thermal images and their visual-image counterparts; and (IV) a qualitative evaluation on a large test set with 58 samples and 22 raters showing that our proposed model performs better than the state of the art.
24. A Smart Security System with Face Recognition
Web-based technology has improved drastically in the past decade. As a result, security technology has become a major help in protecting our daily life. In this paper, we propose a robust face-recognition-based security system (SoF). In particular, we develop this system to grant home access to authenticated users. The classifier is trained using a new adaptive learning method. The training data are initially collected from social networks, and the accuracy of the classifier is incrementally improved as the user starts using the system. A novel method is introduced to improve the classifier model through human interaction and social media. By building on a deep learning framework, TensorFlow, the system can easily be reused and adapted to many devices and applications.
25. Generative Models from the perspective of Continual Learning
Which generative model is the most suitable for Continual Learning? This paper aims at evaluating and comparing generative models on disjoint sequential image generation tasks. We investigate how several models learn and forget, considering various strategies: rehearsal, regularization, generative replay and fine-tuning. We use two quantitative metrics to estimate generation quality and memory ability. We experiment with sequential tasks on three commonly used benchmarks for Continual Learning (MNIST, Fashion MNIST and CIFAR10). We find that, among all models, the original GAN performs best and, among Continual Learning strategies, generative replay outperforms all other methods. Even though we found satisfactory combinations on MNIST and Fashion MNIST, training generative models sequentially on CIFAR10 is particularly unstable and remains a challenge. Our code is available online \footnote{\url{this https URL_Continual_Learning}}.
26. A Multi-task Neural Approach for Emotion Attribution, Classification and Summarization
Emotional content is a crucial ingredient in user-generated videos. However, the sparsity of emotional expression in user-generated videos makes emotion analysis difficult. In this paper, we propose a new neural approach, the Bi-stream Emotion Attribution-Classification Network (BEAC-Net), to solve three related emotion analysis tasks, emotion recognition, emotion attribution and emotion-oriented summarization, in an integrated framework. BEAC-Net has two major constituents, an attribution network and a classification network. The attribution network extracts the main emotional segment that classification should focus on in order to mitigate the sparsity problem. The classification network utilizes both the extracted segment and the original video in a bi-stream architecture. We contribute a new dataset for the emotion attribution task with human-annotated ground-truth labels for emotion segments. Experiments on two video datasets demonstrate the superior performance of the proposed framework and the complementary nature of the dual classification streams.
27. Non-Adversarial Image Synthesis with Generative Latent Nearest Neighbors
Unconditional image generation has recently been dominated by generative adversarial networks (GANs). GAN methods train a generator, which regresses images from random noise vectors, as well as a discriminator that attempts to differentiate between the generated images and a training set of real images. GANs have shown amazing results at generating realistic-looking images. Despite their success, GANs suffer from critical drawbacks including unstable training and mode-dropping. The weaknesses of GANs have motivated research into alternatives including variational auto-encoders (VAEs), latent embedding learning methods (e.g., GLO) and nearest-neighbor based implicit maximum likelihood estimation (IMLE). Unfortunately, at the moment GANs still significantly outperform these alternative methods for image generation. In this work, we present a novel method, Generative Latent Nearest Neighbors (GLANN), for training generative models without adversarial training. GLANN combines the strengths of IMLE and GLO in a way that overcomes the main drawbacks of each method. Consequently, GLANN generates images that are far better than those of GLO and IMLE. Our method does not suffer from the mode collapse that plagues GAN training and is much more stable. Qualitative results show that GLANN outperforms a baseline consisting of 800 GANs and VAEs on commonly used datasets. Our models are also shown to be effective for training truly non-adversarial unsupervised image translation.
28. Cluster validity index based on Jeffrey divergence
Cluster validity indexes are very important tools designed for two purposes: comparing the performance of clustering algorithms and determining the number of clusters that best fits the data. These indexes are in general constructed by combining a measure of compactness with a measure of separation. A classical measure of compactness is the variance; as for separation, the distance between cluster centers is typically used. However, such a distance does not always reflect the quality of the partition between clusters and sometimes gives misleading results. In this paper, we propose a new cluster validity index in which Jeffrey divergence is used to measure the separation between clusters. Experiments are conducted using different types of data, and comparison with widely used cluster validity indexes demonstrates that the proposed index outperforms them.
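For reference, a short sketch of the separation measure named above: Jeffrey divergence is the symmetrized Kullback-Leibler divergence, $J(p, q) = \mathrm{KL}(p \,\|\, q) + \mathrm{KL}(q \,\|\, p)$. The smoothing constant `eps` below is an illustrative safeguard against empty bins, not part of the paper's definition.

```python
import numpy as np

def jeffrey_divergence(p, q, eps=1e-12):
    """Symmetrized KL divergence between two discrete distributions.
    Expands to sum((p - q) * log(p / q)) after normalization."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum((p - q) * np.log(p / q)))
```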
29. Animating Arbitrary Objects via Deep Motion Transfer
This paper introduces a novel deep learning framework for image animation. Given an input image with a target object and a driving video sequence depicting a moving object, our framework generates a video in which the target object is animated according to the driving sequence. This is achieved through a deep architecture that decouples appearance and motion information. Our framework consists of three main modules: (i) a Keypoint Detector, trained without supervision to extract object keypoints; (ii) a Dense Motion prediction network for generating dense heatmaps from sparse keypoints, in order to better encode motion information; and (iii) a Motion Transfer Network, which uses the motion heatmaps and appearance information extracted from the input image to synthesize the output frames. We demonstrate the effectiveness of our method on several benchmark datasets, spanning a wide variety of object appearances, and show that our approach outperforms state-of-the-art image animation and video generation methods.
30. A Scale Invariant Approach for Sparse Signal Recovery
In this paper, we study the ratio of the $L_1$ and $L_2$ norms, denoted $L_1/L_2$, to promote sparsity. Due to the non-convexity and non-linearity, little attention has been paid to this scale-invariant metric. Compared to popular models in the literature such as the $L_p$ model for $p\in(0,1)$ and the transformed $L_1$ (TL1), this ratio model is parameter-free. Theoretically, we present a weak null space property (wNSP) and prove that any sparse vector is a local minimizer of the $L_1/L_2$ model provided this wNSP condition holds. Computationally, we focus on a constrained formulation that can be solved via the alternating direction method of multipliers (ADMM); the formulation is written out below. Experiments show that the proposed approach is comparable to state-of-the-art methods in sparse recovery. In addition, a variant of the $L_1/L_2$ model applied to the gradient is also discussed, with a proof-of-concept example of MRI reconstruction.
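Assuming noiseless linear measurements $b = Ax$, the constrained formulation referred to above has the generic form

$$\min_{x \in \mathbb{R}^n} \; \frac{\|x\|_1}{\|x\|_2} \quad \text{subject to} \quad Ax = b,$$

which is scale-invariant because rescaling $x$ leaves the ratio unchanged; per the abstract, this problem is tackled with ADMM.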