
ArXiv cs.CV --Mon, 8 Oct 2018

1.RCCNet: An Efficient Convolutional Neural Network for Histological Routine Colon Cancer Nuclei Classification pdf

Efficient and precise classification of histological cell nuclei is of utmost importance due to its potential applications in medical image analysis. It would help medical practitioners to better understand and explore various factors in cancer treatment. The classification of histological cell nuclei is a challenging task due to cellular heterogeneity. This paper proposes an efficient Convolutional Neural Network (CNN) based architecture for the classification of histological routine colon cancer nuclei, named RCCNet. The main objective of this network is to keep the CNN model as simple as possible. The proposed RCCNet model consists of only 1,512,868 learnable parameters, significantly fewer than popular CNN models such as AlexNet, CIFAR-VGG, GoogLeNet, and WRN. The experiments are conducted on the publicly available routine colon cancer histological dataset "CRCHistoPhenotypes". The results of the proposed RCCNet model are compared with five state-of-the-art CNN models in terms of accuracy, weighted average F1 score, and training time. The proposed method achieves a classification accuracy of 80.61% and a weighted average F1 score of 0.7887. The proposed RCCNet is more efficient in terms of training time and generalizes better with respect to data over-fitting.
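As a rough illustration of what a parameter-budget-conscious classifier of this kind can look like, here is a minimal PyTorch sketch of a compact CNN for small nuclei patches. The layer sizes, 32x32 patch size, and class count are illustrative assumptions, not the paper's exact RCCNet configuration.

```python
import torch
import torch.nn as nn

class SmallNucleiCNN(nn.Module):
    """Hypothetical compact CNN for 32x32 RGB nuclei patches, 4 classes."""
    def __init__(self, num_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 256), nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmallNucleiCNN()
print(sum(p.numel() for p in model.parameters()))  # ~1.07M with these sizes
```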

2.Interpretable Convolutional Neural Networks via Feedforward Design pdf

The model parameters of convolutional neural networks (CNNs) are determined by backpropagation (BP). In this work, we propose an interpretable feedforward (FF) design without any BP as a reference. The FF design adopts a data-centric approach: it derives the network parameters of the current layer based on data statistics from the output of the previous layer in a one-pass manner. To construct convolutional layers, we develop a new signal transform, called the Saab (Subspace Approximation with Adjusted Bias) transform. It is a variant of principal component analysis (PCA) with an added bias vector to annihilate the nonlinearity of the activation. Multiple Saab transforms in cascade yield multiple convolutional layers. As to fully-connected (FC) layers, we construct them using a cascade of multi-stage linear least-squares regressors (LSRs). The classification and robustness (against adversarial attacks) performances of BP- and FF-designed CNNs applied to the MNIST and CIFAR-10 datasets are compared. Finally, we comment on the relationship between BP and FF designs.
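To make the one-pass idea concrete, here is a minimal numpy sketch of a PCA-based convolutional layer in the spirit of the Saab transform. The DC/AC split and data-driven kernels follow the description above, while the bias handling is deliberately simplified to a single constant that shifts all responses into the non-negative range; patch size, stride, and kernel count are illustrative.

```python
import numpy as np

def pca_conv_layer(images, patch=4, stride=4, n_kernels=8):
    """One-pass, statistics-driven convolutional layer (no backprop).

    images: (N, H, W) grayscale array; returns responses of shape
    (N, out_h, out_w, n_kernels + 1).
    """
    N, H, W = images.shape
    # Collect patches as flat vectors.
    patches = []
    for i in range(0, H - patch + 1, stride):
        for j in range(0, W - patch + 1, stride):
            patches.append(images[:, i:i + patch, j:j + patch].reshape(N, -1))
    X = np.concatenate(patches, axis=0)          # (P*N, patch*patch)
    dc = X.mean(axis=1, keepdims=True)           # DC component per patch
    ac = X - dc
    # PCA on the AC part yields the data-driven convolution kernels.
    eigvals, eigvecs = np.linalg.eigh(np.cov(ac, rowvar=False))
    kernels = eigvecs[:, ::-1][:, :n_kernels]    # top principal directions
    resp = ac @ kernels
    # Simplified "adjusted bias": one constant large enough to make every
    # response non-negative, so a subsequent ReLU would be inactive.
    resp = resp + np.abs(resp).max()
    out = np.concatenate([dc, resp], axis=1)
    out_h = (H - patch) // stride + 1
    out_w = (W - patch) // stride + 1
    return out.reshape(len(patches), N, -1).transpose(1, 0, 2).reshape(
        N, out_h, out_w, n_kernels + 1)
```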

3.Hierarchical Recurrent Filtering for Fully Convolutional DenseNets pdf

Generating a robust representation of the environment is a crucial ability of learning agents. Deep learning based methods have greatly improved perception systems but still fail in challenging situations. These failures are often not solvable on the basis of a single image. In this work, we present a parameter-efficient temporal filtering concept which extends an existing single-frame segmentation model to work with multiple frames. The resulting recurrent architecture temporally filters representations on all abstraction levels in a hierarchical manner, while decoupling temporal dependencies from scene representation. Using a synthetic dataset, we show the ability of our model to cope with data perturbations and highlight the importance of recurrent and hierarchical filtering.

4.Automatic Detection of Arousals during Sleep using Multiple Physiological Signals pdf

The visual scoring of arousals during sleep routinely conducted by sleep experts is a challenging task warranting an automatic approach. This paper presents an algorithm for automatic detection of arousals during sleep. Using the PhysioNet/CinC Challenge dataset, an 80-20% subject-level split was performed to create in-house training and test sets, respectively. The data for each subject in the training set was split into 30-second epochs with no overlap. A total of 428 features from EEG, EMG, EOG, airflow, and SaO2 in each epoch were extracted and used to create subject-specific models based on an ensemble of bagged classification trees, resulting in 943 models. To mark arousal and non-arousal regions in the test set, the test data was split into 30-second epochs with 50% overlap. The average of the arousal probabilities from the different patient-specific models was assigned to each 30-second epoch, and a sample-wise probability vector with the same length as the test data was then created for model evaluation. Using the PhysioNet/CinC Challenge 2018 scoring criteria, AUPRCs of 0.25 and 0.21 were achieved for the in-house test and blind test sets, respectively.
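A simplified scikit-learn sketch of the per-subject ensemble scheme described above; the 428-feature extraction is stubbed out, and the epoching and averaging helpers are illustrative assumptions rather than the authors' exact pipeline.

```python
import numpy as np
# BaggingClassifier's default base estimator is a decision tree,
# matching the "bagged classification trees" described above.
from sklearn.ensemble import BaggingClassifier

def split_epochs(signal, fs, win_s=30, overlap=0.0):
    """Split a 1-D signal into fixed-length, possibly overlapping epochs."""
    size = int(fs * win_s)
    step = int(size * (1 - overlap))
    return [signal[s:s + size] for s in range(0, len(signal) - size + 1, step)]

def train_subject_model(features, labels):
    """One bagged-trees model per training subject (943 models in the paper)."""
    return BaggingClassifier(n_estimators=50).fit(features, labels)

def score_epoch(models, feature_row):
    """Average arousal probability (class 1) over all subject-specific models."""
    probs = [m.predict_proba(feature_row[None, :])[0, 1] for m in models]
    return float(np.mean(probs))
```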

5.Learning Depth with Convolutional Spatial Propagation Network pdf

Depth prediction is one of the fundamental problems in computer vision. In this paper, we propose a simple yet effective convolutional spatial propagation network (CSPN) to learn the affinity matrix for various depth estimation tasks. Specifically, it is an efficient linear propagation model in which the propagation is performed as a recurrent convolutional operation and the affinity among neighboring pixels is learned through a deep convolutional neural network (CNN). This module can be appended to the output of any state-of-the-art (SOTA) depth estimation network to improve its performance. In practice, we further extend CSPN in two aspects: 1) taking a sparse depth map as additional input, which is useful for the task of depth completion; and 2) proposing 3D CSPN, analogous to the commonly used 3D convolution in CNNs, to handle features with one additional dimension, which is effective for stereo matching using a 3D cost volume. For the sparse-to-dense task, a.k.a. depth completion, we evaluated the proposed CSPN-based algorithms on the popular NYU v2 and KITTI datasets, where our algorithms not only produce higher-quality results (e.g., a further 30% reduction in depth error) but also run faster (e.g., 2 to 5x) than the previous SOTA spatial propagation network. We also evaluated our stereo matching algorithm on the Scene Flow and KITTI Stereo datasets and rank 1st on both the KITTI Stereo 2012 and 2015 benchmarks, which demonstrates the effectiveness of the proposed module. The code of CSPN proposed in this work will be released at this https URL
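The core propagation step is compact enough to sketch. Below is a hedged PyTorch version of one possible CSPN-style linear propagation: each pixel's depth is repeatedly updated as an affinity-weighted combination of its 3x3 neighbourhood, with the affinities assumed to come from a CNN that is not shown. The normalization (neighbour weights bounded, centre keeps the remainder) follows the standard linear-propagation construction; the iteration count is illustrative.

```python
import torch
import torch.nn.functional as F

def cspn_refine(depth, affinity, n_iters=12):
    """depth: (B,1,H,W) initial depth; affinity: (B,8,H,W) raw neighbour scores."""
    B, _, H, W = depth.shape
    # Normalize so the neighbour weights sum to at most 1 in absolute value;
    # the centre pixel keeps the remainder, giving a stable linear update.
    aff = affinity / (affinity.abs().sum(dim=1, keepdim=True) + 1e-8)
    centre = 1.0 - aff.sum(dim=1, keepdim=True)
    refined = depth
    for _ in range(n_iters):
        # Gather each pixel's 3x3 neighbourhood: (B, 9, H, W), index 4 = centre.
        nb = F.unfold(refined, kernel_size=3, padding=1).view(B, 9, H, W)
        neighbours = torch.cat([nb[:, :4], nb[:, 5:]], dim=1)   # drop centre
        refined = centre * refined + (aff * neighbours).sum(dim=1, keepdim=True)
    return refined
```

For the depth-completion variant, the known sparse depth values would additionally be re-imposed at their valid pixels after each iteration so the propagation never overwrites trusted measurements.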

6.Generating Diffusion MRI scalar maps from T1 weighted images using generative adversarial networks pdf

Diffusion magnetic resonance imaging (diffusion MRI) is a non-invasive method for assessing tissue microstructure. Scalar measures quantifying micro-structural tissue properties can be obtained using diffusion models and data processing pipelines. However, collecting high-quality diffusion data is costly and time consuming. We demonstrate how Generative Adversarial Networks (GANs) can be used to generate diffusion scalar measures from structural MR images in a single optimized step, without diffusion models or diffusion data. We show that the CycleGAN model used here can synthesize visually realistic and quantitatively accurate diffusion-derived scalar measures.
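A schematic PyTorch sketch of the CycleGAN objective implied above: a generator G maps T1-weighted images to diffusion scalar maps, a second generator F maps them back, and a cycle-consistency term keeps the unpaired translation faithful. The placeholder networks, loss weighting, and BCE adversarial form are generic CycleGAN assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

def cycle_gan_losses(G, F_, D_scalar, t1, scalar, lam=10.0):
    """t1, scalar: unpaired batches from the two image domains."""
    l1, bce = nn.L1Loss(), nn.BCEWithLogitsLoss()
    fake_scalar = G(t1)
    logits = D_scalar(fake_scalar)
    # Adversarial term: generated scalar maps should fool the discriminator.
    adv = bce(logits, torch.ones_like(logits))
    # Cycle terms: T1 -> scalar -> T1 and scalar -> T1 -> scalar round trips.
    cyc = l1(F_(fake_scalar), t1) + l1(G(F_(scalar)), scalar)
    return adv + lam * cyc
```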

7.ReTiCaM: Real-time Human Performance Capture from Monocular Video pdf

We present the first real-time human performance capture approach that reconstructs dense, space-time coherent deforming geometry of entire humans in general everyday clothing from just a single RGB video. We propose a novel two-stage analysis-by-synthesis optimization whose formulation and implementation are designed for high performance. In the first stage, a skinned template model is jointly fitted to background-subtracted input video, 2D and 3D skeleton joint positions found using a deep neural network, and a set of sparse facial landmark detections. In the second stage, dense non-rigid 3D deformations of skin and even loose apparel are captured based on a novel real-time capable algorithm for non-rigid tracking using dense photometric and silhouette constraints. Our novel energy formulation leverages automatically identified material regions on the template to model the differing non-rigid deformation behavior of skin and apparel. The two resulting per-frame non-linear optimization problems are solved with specially tailored data-parallel Gauss-Newton solvers. In order to achieve real-time performance of over 25 Hz, we design a pipelined parallel architecture using the CPU and two commodity GPUs. Our method is the first real-time monocular approach for full-body performance capture. It yields accuracy comparable to off-line performance capture techniques while being orders of magnitude faster.

8.SLIC Based Digital Image Enlargement pdf

Low-resolution image enhancement is a classical computer vision problem. Selecting the best method to reconstruct an image at a higher resolution with the limited data available in the low-resolution image is quite a challenge. A major drawback of existing enlargement techniques is the introduction of color bleeding while interpolating pixels over the edges that separate distinct colors in an image. Color bleeding accentuates the edges with new colors as a result of blending multiple colors over adjacent regions. This paper proposes a novel approach to mitigate color bleeding by segmenting the homogeneous color regions of the image using Simple Linear Iterative Clustering (SLIC) and applying a higher-order interpolation technique separately on the isolated segments. The interpolation at the boundaries of each isolated segment is handled using a morphological operation. The approach is evaluated against several frequently used image enlargement methods, such as bilinear and bicubic interpolation, using the Peak Signal-to-Noise Ratio (PSNR). The results show that the proposed method outperforms the baseline methods in terms of PSNR and also mitigates the color bleeding at the edges, which improves the overall appearance.
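A rough scikit-image sketch of the idea: segment with SLIC, upscale each segment separately with bicubic interpolation, and composite using a nearest-neighbour-upscaled label map so colors never blend across segment boundaries. The paper's morphological boundary handling is omitted here (so segment edges may darken slightly), and all parameter values are illustrative.

```python
import numpy as np
from skimage.segmentation import slic
from skimage.transform import resize

def slic_enlarge(img, scale=2, n_segments=200):
    """img: float RGB array in [0, 1], shape (H, W, 3)."""
    labels = slic(img, n_segments=n_segments, compactness=10)
    out_shape = (img.shape[0] * scale, img.shape[1] * scale)
    # Upscale the label map without interpolation so segments stay crisp.
    big_labels = resize(labels, out_shape, order=0, preserve_range=True,
                        anti_aliasing=False).astype(int)
    out = np.zeros(out_shape + (3,), dtype=img.dtype)
    for seg in np.unique(labels):
        # Bicubic interpolation applied to one isolated segment at a time.
        masked = img * (labels == seg)[..., None]
        big = resize(masked, out_shape + (3,), order=3)
        out[big_labels == seg] = big[big_labels == seg]
    return out
```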

9.Spatially-weighted Anomaly Detection pdf

Many types of anomaly detection methods have been proposed recently and applied to a wide variety of fields, including medical screening and production quality checking. Some methods utilize images, and in some cases part of the anomaly images is known beforehand. However, this kind of information is dismissed by previous methods, because they can only utilize a normal pattern. Moreover, previous methods suffer a decrease in accuracy due to negative effects from surrounding noise. In this study, we propose a spatially-weighted anomaly detection method (SPADE) that utilizes all of the known patterns and lessens the vulnerability to ambient noise by applying Grad-CAM, a CNN visualization method. We evaluated our method quantitatively using two datasets: the MNIST dataset with noise and a dataset based on a brief screening test for dementia.
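A simplified numpy sketch of the spatial-weighting step implied above: a Grad-CAM attention map (computed elsewhere) reweights a per-pixel anomaly score map so that ambient noise in irrelevant regions contributes less to the image-level score. The function names and normalization are illustrative assumptions.

```python
import numpy as np

def spatially_weighted_score(error_map, cam):
    """error_map: (H, W) per-pixel anomaly scores; cam: (H, W) Grad-CAM weights."""
    w = cam / (cam.sum() + 1e-8)          # normalize attention to sum to 1
    return float((w * error_map).sum())   # attention-weighted image-level score
```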

10.AIRNet: Self-Supervised Affine Registration for 3D Medical Images using Neural Networks pdf

In this work, we propose a self-supervised learning method for affine image registration of 3D medical images. Unlike optimisation-based methods, our affine image registration network (AIRNet) is designed to directly estimate the transformation parameters between two input images without using any metric of registration quality as the optimising function. Since it is costly to manually identify the transformation parameters between any two images, we leverage the abundance of cheap unlabelled data to generate a synthetic dataset for training the model. Additionally, the structure of AIRNet enables us to learn discriminative features of the images which are useful for registration purposes. Our proposed method was evaluated on magnetic resonance images of the axial view of the human brain and compared with a conventional image registration method. Experiments demonstrate that our approach achieves better overall performance on registration of images from different patients and modalities, with a 100x speed-up in execution time.
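A 2D toy sketch of the self-supervised training scheme described above: pairs are synthesized by warping an image with a random affine transform, and a network regresses the affine parameters directly. The paper works on 3D volumes and its network architecture is not shown here; the perturbation scale and MSE loss are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def random_affine(batch):
    """Random 2x3 affine matrices near the identity: (B, 2, 3)."""
    B = batch.shape[0]
    return torch.eye(2, 3).repeat(B, 1, 1) + 0.1 * torch.randn(B, 2, 3)

def training_step(net, images, optimizer):
    """images: (B, 1, H, W); net maps a 2-channel image pair to 6 parameters."""
    theta = random_affine(images)                        # ground-truth parameters
    grid = F.affine_grid(theta, images.shape, align_corners=False)
    moved = F.grid_sample(images, grid, align_corners=False)
    pred = net(torch.cat([images, moved], dim=1))        # regress 6 parameters
    loss = F.mse_loss(pred, theta.reshape(-1, 6))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```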

11.Dark Model Adaptation: Semantic Image Segmentation from Daytime to Nighttime pdf

This work addresses the problem of semantic image segmentation of nighttime scenes. Although considerable progress has been made in semantic image segmentation, it mainly concerns daytime scenarios. This paper proposes a novel method to progressively adapt semantic models trained on daytime scenes, along with the large-scale annotations therein, to nighttime scenes via the bridge of twilight -- the time between dawn and sunrise, or between sunset and dusk. The goal of the method is to alleviate the cost of human annotation for nighttime images by transferring knowledge from standard daytime conditions. In addition to the method, a new dataset of road scenes is compiled, consisting of 35,000 images ranging from daytime to twilight to nighttime. A subset of the nighttime images is densely annotated for method evaluation. Our experiments show that our method is effective for model adaptation from daytime scenes to nighttime scenes without using extra human annotation.

12.Weakly Supervised Object Detection in Artworks pdf

We propose a method for the weakly supervised detection of objects in paintings. At training time, only image-level annotations are needed. This, combined with the efficiency of our multiple-instance learning method, enables one to learn new classes on-the-fly from globally annotated databases, avoiding the tedious task of manually marking objects. We show on several databases that dropping the instance-level annotations only yields mild performance losses. We also introduce a new database, IconArt, on which we perform detection experiments on classes that could not be learned on photographs, such as Jesus Child or Saint Sebastian. To the best of our knowledge, these are the first experiments dealing with the automatic (and in our case weakly supervised) detection of iconographic elements in paintings. We believe that such a method is of great benefit for helping art historians to explore large digital databases.
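As a schematic illustration of multiple-instance learning with image-level labels of this kind, here is a minimal PyTorch sketch: per-region scores are pooled into one image-level score, and only the image-level label is supervised. Max-pooling is one common MIL choice and is an assumption here, not necessarily the paper's exact pooling.

```python
import torch
import torch.nn.functional as F

def mil_loss(region_scores, image_labels):
    """region_scores: (B, R, C) logits for R box proposals; image_labels: (B, C) floats."""
    # Max over regions: the most confident proposal explains the image label.
    image_scores, _ = region_scores.max(dim=1)
    return F.binary_cross_entropy_with_logits(image_scores, image_labels)
```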

13.Medical Images Analysis in Cancer Diagnostic pdf

This paper presents results of computer-based image analysis aimed at finding differences between medical images in order to classify them, i.e., to separate malignant tissue from normal and benign tissue. The diagnosis of malignant tissue is of crucial importance in medicine. Therefore, establishing a correlation between multifractal parameters and "chaotic" cells could be of great practical use. This paper shows the application of multifractal analysis as an additional aid in cancer diagnosis, as well as for reducing the subjective factor and the probability of error.

14.Co-Learning Feature Fusion Maps from PET-CT Images of Lung Cancer pdf

The analysis of multi-modality positron emission tomography and computed tomography (PET-CT) images requires combining the sensitivity of PET for detecting abnormal regions with the anatomical localization of CT. However, current methods for PET-CT image analysis either process the modalities separately or fuse information from each modality based on knowledge about the image analysis task. These methods generally do not consider the spatially varying visual characteristics that encode different information across the different modalities, which have different priorities at different locations. For example, high abnormal PET uptake in the lungs is more meaningful for tumor detection than physiological PET uptake in the heart. Our aim is to improve the fusion of the complementary information in multi-modality PET-CT with a new supervised convolutional neural network (CNN) that learns to fuse complementary information for multi-modality medical image analysis. Our CNN first encodes modality-specific features and then uses them to derive a spatially varying fusion map that quantifies the relative importance of each modality's features across different spatial locations. These fusion maps are then multiplied with the modality-specific feature maps to obtain a representation of the complementary multi-modality information at different locations, which can then be used for image analysis, e.g. region detection. We evaluated our CNN on a region detection problem using a dataset of PET-CT images of lung cancer. We compared our method to baseline techniques for multi-modality image analysis (pre-fused inputs, multi-branch techniques, multi-channel techniques) and demonstrated that our approach had a significantly higher accuracy ($p < 0.05$) than the baselines.
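A schematic PyTorch sketch of the fusion mechanism described above: each modality is encoded separately, a 1x1 convolution predicts a per-location fusion weight for each modality, and the feature maps are reweighted before downstream analysis. The placeholder encoders and the softmax-based convex combination are assumptions for illustration.

```python
import torch
import torch.nn as nn

class FusionMapNet(nn.Module):
    def __init__(self, enc_pet: nn.Module, enc_ct: nn.Module, channels: int):
        super().__init__()
        self.enc_pet, self.enc_ct = enc_pet, enc_ct
        # One fusion logit per modality at every spatial location.
        self.fuse = nn.Conv2d(2 * channels, 2, kernel_size=1)

    def forward(self, pet, ct):
        f_pet, f_ct = self.enc_pet(pet), self.enc_ct(ct)
        # Spatially varying weights that sum to 1 across the two modalities.
        maps = torch.softmax(self.fuse(torch.cat([f_pet, f_ct], dim=1)), dim=1)
        return maps[:, :1] * f_pet + maps[:, 1:] * f_ct
```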

15.FashionNet: Personalized Outfit Recommendation with Deep Neural Network pdf

With the rapid growth of fashion-focused social networks and online shopping, intelligent fashion recommendation is in great demand. We design algorithms that automatically suggest outfits to users (e.g., a shirt together with a skirt and a pair of high-heeled shoes) that fit their personal fashion preferences. Recommending sets, each composed of multiple interacting items, is relatively new to recommender systems, which usually recommend individual items to users. We explore the use of deep networks for this challenging task. Our system, dubbed FashionNet, consists of two components: a feature network for feature extraction and a matching network for compatibility computation. The former is realized as a deep convolutional network, and for the latter we adopt a multi-layer fully-connected network structure. We design and compare three alternative architectures for FashionNet. To achieve personalized recommendation, we develop a two-stage training strategy that uses fine-tuning to transfer a general compatibility model to a model that embeds personal preference. Experiments on a large-scale dataset collected from a popular fashion-focused social network validate the effectiveness of the proposed networks.

16.Relative Saliency and Ranking: Models, Metrics, Data, and Benchmarks pdf

Salient object detection is a problem that has been considered in detail, and many solutions have been proposed. In this paper, we argue that work to date has addressed a problem that is relatively ill-posed. Specifically, there is no universal agreement about what constitutes a salient object when multiple observers are queried. This implies that some objects are more likely to be judged salient than others, and that a relative rank exists among salient objects. Initially, we present a novel deep learning solution based on a hierarchical representation of relative saliency and stage-wise refinement. Furthermore, we present data, analysis, and benchmark baseline results towards addressing the problem of salient object ranking. Methods for deriving suitable ranked salient object instances are presented, along with metrics suitable for measuring algorithm performance. In addition, we show how a derived dataset can be successively refined to provide cleaned results that correlate well with pristine ground truth. Finally, we provide a comparison among prevailing algorithms that address salient object ranking or detection to establish initial baselines.

17.Towards High Resolution Video Generation with Progressive Growing of Sliced Wasserstein GANs pdf

The extension of image generation to video generation turns out to be a very difficult task, since the temporal dimension of videos introduces an extra challenge during the generation process. Besides, due to limitations of memory and training stability, generation becomes increasingly challenging as the resolution/duration of videos increases. In this work, we exploit the idea of progressively growing Generative Adversarial Networks (GANs) for higher-resolution video generation. In particular, we begin by producing video samples of low resolution and short duration, and then progressively increase both resolution and duration, separately or jointly, by adding new spatiotemporal convolutional layers to the current networks. Starting from learning the coarse spatial appearance and temporal movement of the video distribution, the proposed progressive method learns spatiotemporal information incrementally to generate higher-resolution videos. Furthermore, we introduce a sliced version of the Wasserstein GAN (SWGAN) loss to improve distribution learning on high-dimensional video data with a mixed spatiotemporal distribution. The SWGAN loss replaces the distance between joint distributions by that of one-dimensional marginal distributions, making the loss easier to compute. We evaluate the proposed model on our collected face video dataset of 10,900 videos to generate photorealistic face videos of 256x256x32 resolution. In addition, our model reaches a record inception score of 14.57 on the unsupervised action recognition dataset UCF-101.
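The sliced idea is simple enough to sketch directly. Below is a minimal PyTorch version of a sliced Wasserstein distance between two equally sized sample batches: project onto random unit directions, sort, and compare the resulting one-dimensional marginals. The projection count and squared-difference form are generic choices, not necessarily the paper's exact SWGAN loss.

```python
import torch

def sliced_wasserstein(x, y, n_proj=64):
    """x, y: (N, D) flattened real and generated samples (equal batch sizes)."""
    d = x.shape[1]
    proj = torch.randn(d, n_proj, device=x.device)
    proj = proj / proj.norm(dim=0, keepdim=True)   # random unit directions
    px = (x @ proj).sort(dim=0).values             # sorted 1-D marginals
    py = (y @ proj).sort(dim=0).values
    return ((px - py) ** 2).mean()
```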

18.A method to Suppress Facial Expression in Posed and Spontaneous Videos pdf

We address the problem of suppressing facial expressions in videos, because expressions can hinder the retrieval of important information in applications such as face recognition. To achieve this, we present an optical strain suppression method that removes any facial expression without requiring training for a specific expression. For each frame in a video, an optical strain map providing the strain magnitude at each pixel is generated; this strain map is then used to neutralize the expression by replacing pixels of high strain values with pixels from a reference face frame. Experimental results of testing the method on various expressions, namely happiness, sadness, and anger, on two publicly available datasets (BU-4DFE and AM-FED) show the ability of our method to suppress facial expressions.
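A numpy sketch of the two steps implied above: differentiate a dense optical flow field to obtain an optical strain magnitude map, then copy high-strain pixels from a neutral reference frame. The strain formula is the standard infinitesimal-strain magnitude; the flow estimator and threshold value are assumptions.

```python
import numpy as np

def strain_magnitude(u, v):
    """u, v: (H, W) horizontal/vertical optical flow components."""
    uy, ux = np.gradient(u)                   # derivatives along rows, cols
    vy, vx = np.gradient(v)
    exy = 0.5 * (uy + vx)                     # shear component
    return np.sqrt(ux**2 + vy**2 + 2.0 * exy**2)

def suppress_expression(frame, reference, u, v, thresh=0.05):
    """Keep low-strain pixels from `frame`; take high-strain ones from `reference`."""
    mask = strain_magnitude(u, v) > thresh
    out = frame.copy()
    out[mask] = reference[mask]
    return out
```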

19.Learning To Simulate pdf

Simulation is a useful tool in situations where training data for machine learning models is costly to annotate or even hard to acquire. In this work, we propose a reinforcement learning-based method for automatically adjusting the parameters of any (non-differentiable) simulator, thereby controlling the distribution of synthesized data in order to maximize the accuracy of a model trained on that data. In contrast to prior art that hand-crafts these simulation parameters or adjusts only parts of the available parameters, our approach fully controls the simulator with the actual underlying goal of maximizing accuracy, rather than mimicking the real data distribution or randomly generating a large volume of data. We find that our approach (i) quickly converges to the optimal simulation parameters in controlled experiments and (ii) can indeed discover good sets of parameters for an image rendering simulator in actual computer vision applications.
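As a toy illustration of the idea, here is a numpy sketch of a score-function (REINFORCE) loop: a Gaussian policy over simulator parameters is nudged toward settings that maximize downstream validation accuracy. `simulate_and_train` is a stand-in for the real render-then-train pipeline, and the toy reward, dimensions, and learning rate are all assumptions.

```python
import numpy as np

def simulate_and_train(params):
    """Placeholder: synthesize data with `params`, train a model, return val accuracy."""
    return float(-np.sum((params - 0.3) ** 2))   # toy reward, optimum at 0.3

mu, log_sigma = np.zeros(4), np.zeros(4)         # Gaussian policy over 4 parameters
lr, baseline = 0.05, 0.0
for step in range(200):
    sigma = np.exp(log_sigma)
    params = mu + sigma * np.random.randn(4)     # sample simulator parameters
    reward = simulate_and_train(params)
    baseline = 0.9 * baseline + 0.1 * reward     # moving-average variance reduction
    adv = reward - baseline
    # Score-function gradients of log N(params; mu, sigma^2).
    g_mu = (params - mu) / sigma**2
    g_ls = ((params - mu) / sigma) ** 2 - 1.0
    mu += lr * adv * g_mu
    log_sigma += lr * adv * g_ls
```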

20.Transfer Learning via Unsupervised Task Discovery for Visual Question Answering pdf

We study how to leverage off-the-shelf visual and linguistic data to cope with out-of-vocabulary answers in visual question answering. Existing large-scale visual data with annotations such as image class labels, bounding boxes, and region descriptions are good sources for learning rich and diverse visual concepts. However, it is not straightforward how these visual concepts should be captured and transferred to visual question answering models, due to the missing link between question-dependent answering models and visual data without question or task specification. We tackle this problem in two steps: 1) learning a task-conditional visual classifier based on unsupervised task discovery, and 2) transferring and adapting the task-conditional visual classifier to visual question answering models. Specifically, we employ linguistic knowledge sources such as a structured lexical database (e.g., WordNet) and visual descriptions for unsupervised task discovery, and adapt the learned task-conditional visual classifier to the answering unit in a visual question answering model. We empirically show that the proposed algorithm generalizes successfully to unseen answers using the knowledge transferred from the visual data.

21.Unsupervised Learning via Meta-Learning pdf

A central goal of unsupervised learning is to acquire representations from unlabeled data or experience that can be used for more effective learning of downstream tasks from modest amounts of labeled data. Many prior unsupervised learning works aim to do so by developing proxy objectives based on reconstruction, disentanglement, prediction, and other metrics. Instead, we develop an unsupervised learning method that explicitly optimizes for the ability to learn a variety of tasks from small amounts of data. To do so, we construct tasks from unlabeled data in an automatic way and run meta-learning over the constructed tasks. Surprisingly, we find that relatively simple mechanisms for task design, such as clustering unsupervised representations, lead to good performance on a variety of downstream tasks. Our experiments across four image datasets indicate that our unsupervised meta-learning approach acquires a learning algorithm without any labeled data that is applicable to a wide range of downstream classification tasks, improving upon the representation learned by four prior unsupervised learning methods.
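A scikit-learn sketch of the task-construction mechanism highlighted above: cluster unsupervised embeddings with k-means, treat cluster ids as pseudo-labels, and sample N-way k-shot tasks for a meta-learner (not shown). The cluster count, task shape, and sampling details are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def make_tasks(embeddings, n_clusters=50, n_way=5, k_shot=1, n_tasks=100, seed=0):
    """Return tasks as lists of (example_index, task_label) pairs."""
    rng = np.random.default_rng(seed)
    pseudo = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embeddings)
    tasks = []
    for _ in range(n_tasks):
        classes = rng.choice(n_clusters, size=n_way, replace=False)
        support = []
        for label, c in enumerate(classes):
            idx = np.flatnonzero(pseudo == c)
            # Sample with replacement only if the cluster is too small.
            pick = rng.choice(idx, size=k_shot, replace=len(idx) < k_shot)
            support += [(int(i), label) for i in pick]
        tasks.append(support)
    return tasks
```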