Today's Academic Horizons (2018.8.8)

Author: ZQtGe6 | Source: published 2018-08-08 05:02

    cs.AI - Artificial Intelligence
    cs.CE - Computational Engineering, Finance, and Science
    cs.CL - Computation and Language
    cs.CR - Cryptography and Security
    cs.CV - Computer Vision and Pattern Recognition
    cs.CY - Computers and Society
    cs.DC - Distributed, Parallel, and Cluster Computing
    cs.HC - Human-Computer Interaction
    cs.IR - Information Retrieval
    cs.IT - Information Theory
    cs.LG - Machine Learning
    cs.NE - Neural and Evolutionary Computing
    cs.RO - Robotics
    cs.SD - Sound
    cs.SI - Social and Information Networks
    cs.SY - Systems and Control
    eess.AS - Audio and Speech Processing
    eess.SP - Signal Processing
    math.PR - Probability
    math.ST - Statistics Theory
    quant-ph - Quantum Physics
    stat.AP - Applied Statistics
    stat.ME - Statistical Methodology
    stat.ML - (Statistical) Machine Learning

    • [cs.AI]An Efficient Approach to Learning Chinese Judgment Document Similarity Based on Knowledge Summarization
    • [cs.AI]An Efficient Deep Reinforcement Learning Model for Urban Traffic Control
    • [cs.AI]Combining Graph-based Dependency Features with Convolutional Neural Network for Answer Triggering
    • [cs.AI]Error Detection in a Large-Scale Lexical Taxonomy
    • [cs.AI]Logical Semantics and Commonsense Knowledge: Where Did we Go Wrong, and How to Go Forward, Again
    • [cs.AI]Reasoning with Justifiable Exceptions in Contextual Hierarchies (Appendix)
    • [cs.AI]Smart City Development with Urban Transfer Learning
    • [cs.CE]Stock Price Correlation Coefficient Prediction with ARIMA-LSTM Hybrid Model
    • [cs.CL]Abstractive Summarization Improved by WordNet-based Extractive Sentences
    • [cs.CL]Instantiation
    • [cs.CL]LISA: Explaining Recurrent Neural Network Judgments via Layer-wIse Semantic Accumulation and Example to Pattern Transformation
    • [cs.CL]Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis
    • [cs.CL]Residual Memory Networks: Feed-forward approach to learn long temporal dependencies
    • [cs.CL]Using Linguistic Cues for Analyzing Social Movements
    • [cs.CR]Active Learning for Wireless IoT Intrusion Detection
    • [cs.CR]Am I Responsible for End-User's Security? A Programmer's Perspective
    • [cs.CR]Assessing and countering reaction attacks against post-quantum public-key cryptosystems based on QC-LDPC codes
    • [cs.CR]Signal Jamming Attacks Against Communication-Based Train Control: Attack Impact and Countermeasure
    • [cs.CR]Understanding Software Developers' Approach towards Implementing Data Minimization
    • [cs.CV]3D Conceptual Design Using Deep Learning
    • [cs.CV]3D Depthwise Convolution: Reducing Model Parameters in 3D Vision Tasks
    • [cs.CV]A Multi-task Framework for Skin Lesion Detection and Segmentation
    • [cs.CV]A Study of Deep Feature Fusion based Methods for Classifying Multi-lead ECG
    • [cs.CV]Classification of Dermoscopy Images using Deep Learning
    • [cs.CV]Deep Learning Advances on Different 3D Data Representations: A Survey
    • [cs.CV]Deep Multi-Center Learning for Face Alignment
    • [cs.CV]Deep Shape Analysis on Abdominal Organs for Diabetes Prediction
    • [cs.CV]Deep Transfer Learning for EEG-based Brain Computer Interface
    • [cs.CV]DeepTAM: Deep Tracking and Mapping
    • [cs.CV]Defense Against Adversarial Attacks with Saak Transform
    • [cs.CV]Detailed Dense Inference with Convolutional Neural Networks via Discrete Wavelet Transform
    • [cs.CV]Dilated Convolutions in Neural Networks for Left Atrial Segmentation in 3D Gadolinium Enhanced-MRI
    • [cs.CV]Error Correction Maximization for Deep Image Hashing
    • [cs.CV]Gray-box Adversarial Training
    • [cs.CV]Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association
    • [cs.CV]Improving Temporal Interpolation of Head and Body Pose using Gaussian Process Regression in a Matrix Completion Setting
    • [cs.CV]Incorporating Scalability in Unsupervised Spatio-Temporal Feature Learning
    • [cs.CV]Is Robustness the Cost of Accuracy? -- A Comprehensive Study on the Robustness of 18 Deep Image Classification Models
    • [cs.CV]Language Model Supervision for Handwriting Recognition Model Adaptation
    • [cs.CV]Learning Multi-scale Features for Foreground Segmentation
    • [cs.CV]Learning monocular depth estimation with unsupervised trinocular assumptions
    • [cs.CV]Learning to Align Images using Weak Geometric Supervision
    • [cs.CV]Liquid Pouring Monitoring via Rich Sensory Inputs
    • [cs.CV]Metal Artifact Reduction in Cone-Beam X-Ray CT via Ray Profile Correction
    • [cs.CV]Multi-Scale Supervised Network for Human Pose Estimation
    • [cs.CV]Non-locally Enhanced Encoder-Decoder Network for Single Image De-raining
    • [cs.CV]Occlusions, Motion and Depth Boundaries with a Generic Network for Disparity, Optical Flow or Scene Flow Estimation
    • [cs.CV]Pixel-level Semantics Guided Image Colorization
    • [cs.CV]Purely Geometric Scene Association and Retrieval - A Case for Macro Scale 3D Geometry
    • [cs.CV]Rethinking Pose in 3D: Multi-stage Refinement and Recovery for Markerless Motion Capture
    • [cs.CV]Self-Attention Recurrent Network for Saliency Detection
    • [cs.CV]Simultaneous Edge Alignment and Learning
    • [cs.CV]Skin Lesion Diagnosis using Ensembles, Unscaled Multi-Crop Evaluation and Loss Weighting
    • [cs.CV]Spherical Harmonic Residual Network for Diffusion Signal Harmonization
    • [cs.CV]Structure-Aware Shape Synthesis
    • [cs.CV]T2Net: Synthetic-to-Realistic Translation for Solving Single-Image Depth Estimation Tasks
    • [cs.CV]Teacher Guided Architecture Search
    • [cs.CV]Too many secants: a hierarchical approach to secant-based dimensionality reduction on large data sets
    • [cs.CV]Towards Closing the Gap in Weakly Supervised Semantic Segmentation with DCNNs: Combining Local and Global Models
    • [cs.CV]Tracklet Association Tracker: An End-to-End Learning-based Association Approach for Multi-Object Tracking
    • [cs.CV]Traits & Transferability of Adversarial Examples against Instance Segmentation & Object Detection
    • [cs.CV]Video Re-localization
    • [cs.CV]Visual Question Generation for Class Acquisition of Unknown Objects
    • [cs.CY]On Robot Revolution and Taxation
    • [cs.CY]Predicting Learning Status in MOOCs using LSTM
    • [cs.CY]Where The Light Gets In: Analyzing Web Censorship Mechanisms in India
    • [cs.DC]Edge Based Data-Driven Pipelines (Technical Report)
    • [cs.DC]Rapido: A Layer2 Payment System for Decentralized Currencies
    • [cs.HC]Kid on The Phone! Toward Automatic Detection of Children on Mobile Devices
    • [cs.IR]Automated Extraction of Personal Knowledge from Smartphone Push Notifications
    • [cs.IR]Evaluating Wikipedia as a source of information for disease understanding
    • [cs.IT]A Blockchain Example for Cooperative Interference Management
    • [cs.IT]A Flip-Syndrome-List Polar Decoder Architecture for Ultra-Low-Latency Communications
    • [cs.IT]Designing molecular circuit for approximate maximum a posteriori demodulation of concentration modulated signals
    • [cs.IT]Energy-Age Tradeoff in Status Update Communication Systems with Retransmission
    • [cs.IT]Fundamentals of Simultaneous Wireless Information and Power Transmission in Heterogeneous Networks: A Cell Load Perspective
    • [cs.IT]GLSE Precoders for Massive MIMO Systems: Analysis and Applications
    • [cs.IT]Improper Signaling versus Time-Sharing in the Two-User Gaussian Interference Channel with TIN
    • [cs.IT]Linearly Precoded Rate Splitting: Optimality and Non-Optimality for MIMO Broadcast Channels
    • [cs.IT]Millimeter Wave Location-Based Beamforming using Compressive Sensing
    • [cs.IT]Model-Aided Wireless Artificial Intelligence: Embedding Expert Knowledge in Deep Neural Networks Towards Wireless Systems Optimization
    • [cs.IT]New Viewpoint and Algorithms for Water-Filling Solutions in Wireless Communications
    • [cs.IT]On Lipschitz Bounds of General Convolutional Neural Networks
    • [cs.IT]On the Duality and File Size Hierarchy of Fractional Repetition Codes
    • [cs.IT]On the Optimality of the Kautz-Singleton Construction in Probabilistic Group Testing
    • [cs.IT]Robust Secrecy Energy Efficient Beamforming in MISOME-SWIPT Systems With Proportional Fairness
    • [cs.IT]Scalability Analysis of a LoRa Network under Imperfect Orthogonality
    • [cs.IT]Stability and Throughput Analysis of Multiple Access Networks with Finite Blocklength Constraints
    • [cs.IT]Super Resolution Phase Retrieval for Sparse Signals
    • [cs.IT]Two Practical Random-Subcarrier-Selection Methods for Secure Precise Wireless Transmission
    • [cs.LG]A Review of Learning with Deep Generative Models from perspective of graphical modeling
    • [cs.LG]A Review on Image- and Network-based Brain Data Analysis Techniques for Alzheimer's Disease Diagnosis Reveals a Gap in Developing Predictive Methods for Prognosis
    • [cs.LG]A Survey on Deep Transfer Learning
    • [cs.LG]A Survey on Surrogate Approaches to Non-negative Matrix Factorization
    • [cs.LG]Adversarial Vision Challenge
    • [cs.LG]Autoencoder Based Sample Selection for Self-Taught Learning
    • [cs.LG]Beyond 1/2-Approximation for Submodular Maximization on Massive Data Streams
    • [cs.LG]Concentration bounds for empirical conditional value-at-risk: The unbounded case
    • [cs.LG]DELIMIT PyTorch - An extension for Deep Learning in Diffusion Imaging
    • [cs.LG]Deep Reinforcement One-Shot Learning for Artificially Intelligent Classification Systems
    • [cs.LG]Designing Adaptive Neural Networks for Energy-Constrained Image Classification
    • [cs.LG]Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN
    • [cs.LG]Global Convergence to the Equilibrium of GANs using Variational Inequalities
    • [cs.LG]Hashing with Binary Matrix Pursuit
    • [cs.LG]Hybrid Subspace Learning for High-Dimensional Data
    • [cs.LG]Large Scale Language Modeling: Converging on 40GB of Text in Four Hours
    • [cs.LG]Learning disentangled representation from 12-lead electrograms: application in localizing the origin of Ventricular Tachycardia
    • [cs.LG]Missing Value Imputation Based on Deep Generative Models
    • [cs.LG]Multi-objective optimization to explicitly account for model complexity when learning Bayesian Networks
    • [cs.LG]NIMFA: A Python Library for Nonnegative Matrix Factorization
    • [cs.LG]Regret Bounds for Reinforcement Learning via Markov Chain Concentration
    • [cs.LG]Structured Adversarial Attack: Towards General Implementation and Better Interpretability
    • [cs.LG]Using Machine Learning Safely in Automotive Software: An Assessment and Adaption of Software Process Requirements in ISO 26262
    • [cs.LG]code2seq: Generating Sequences from Structured Representations of Code
    • [cs.NE]A Cooperative Group Optimization System
    • [cs.NE]Geared Rotationally Identical and Invariant Convolutional Neural Network Systems
    • [cs.NE]GeneSys: Enabling Continuous Learning through Neural Network Evolution in Hardware
    • [cs.NE]On Optimizing Deep Convolutional Neural Networks by Evolutionary Computing
    • [cs.RO]Momentum-Based Topology Estimation of Articulated Objects
    • [cs.RO]Nonlinear disturbance attenuation control of hydraulic robotics
    • [cs.SD]Audio Tagging With Connectionist Temporal Classification Model Using Sequential Labelled Data
    • [cs.SI]CredSaT: Credibility Ranking of Users in Big Social Data incorporating Semantic Analysis and Temporal Factor
    • [cs.SY]Bionic Reflex Control Strategy for Robotic Finger with Kinematic Constraints
    • [eess.AS]Triplet Network with Attention for Speaker Diarization
    • [eess.SP]Effective Resource Sharing in Mobile-Cell Environments
    • [eess.SP]Spatial Deep Learning for Wireless Scheduling
    • [math.PR]About the Stein equation for the generalized inverse Gaussian and Kummer distributions
    • [math.PR]Beyond the Central Limit Theorem: Universal and Non-universal Simulations of Random Variables by General Mappings
    • [math.ST]α-Ball divergence and its applications to change-point problems for Banach-valued sequences
    • [math.ST]Bounded Statistics
    • [math.ST]Dynamical multiple regression in function spaces, under kernel regressors, with ARH(1) errors
    • [math.ST]Nuisance Parameters Free Changepoint Detection in Non-stationary Series
    • [math.ST]Prediction in Riemannian metrics derived from divergence functions
    • [math.ST]Sampling-based randomized designs for causal inference under the potential outcomes framework
    • [math.ST]Statistical Windows in Testing for the Initial Distribution of a Reversible Markov Chain
    • [math.ST]Strongly consistent autoregressive predictors in abstract Banach spaces
    • [quant-ph]Amortized Channel Divergence for Asymptotic Quantum Channel Discrimination
    • [quant-ph]One-Shot Coherence Distillation: The Full Story
    • [stat.AP]An Extreme Value Analysis of the Urban Skyline
    • [stat.AP]Associating Growth in Infancy and Cognitive Performance in Early Childhood: A functional data analysis approach
    • [stat.AP]Computationally efficient model selection for joint spikes and waveforms decoding
    • [stat.AP]Spline Regression with Automatic Knot Selection
    • [stat.ME]A hierarchical independent component analysis model for longitudinal Neuroimaging studies
    • [stat.ME]Diffusion approximations and control variates for MCMC
    • [stat.ME]Improved Estimation of Average Treatment Effects on the Treated: Local Efficiency, Double Robustness, and Beyond
    • [stat.ME]Inverse Conditional Probability Weighting with Clustered Data in Causal Inference
    • [stat.ME]Regularized matrix data clustering and its application to image analysis
    • [stat.ML]Multi-Objective Cognitive Model: a supervised approach for multi-subject fMRI analysis
    • [stat.ML]V-FCNN: Volumetric Fully Convolution Neural Network For Automatic Atrial Segmentation

    ·····································

    • [cs.AI]An Efficient Approach to Learning Chinese Judgment Document Similarity Based on Knowledge Summarization
    Yinglong Ma, Peng Zhang, Jiangang Ma
    http://arxiv.org/abs/1808.01843v1

    A previous similar case in common law systems can be used as a reference for the current case, so that identical situations are treated alike. However, current approaches to judgment document similarity computation fail to capture the core semantics of judgment documents and therefore suffer from low accuracy and high computational complexity. In this paper, a knowledge block summarization based machine learning approach is proposed to compute the semantic similarity of Chinese judgment documents. By utilizing domain ontologies for judgment documents, the core semantics of Chinese judgment documents is summarized into knowledge blocks. The WMD algorithm is then used to calculate the similarity between knowledge blocks. Finally, experiments illustrate that our approach is effective and efficient, achieving higher accuracy and faster computation than traditional approaches.
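
    As a rough illustration of the similarity step, the sketch below computes Word Mover's Distance (our reading of "WMD") between two token lists with gensim. The tiny corpus, the "knowledge blocks", and the mapping from distance to similarity are illustrative assumptions rather than the paper's pipeline, and `wmdistance` needs the POT or pyemd backend depending on the gensim version.

    ```python
    # Minimal WMD similarity sketch; the corpus and "knowledge blocks" are
    # made-up stand-ins for the paper's ontology-driven summarization step.
    from gensim.models import Word2Vec

    corpus = [
        ["defendant", "stole", "vehicle", "at", "night"],
        ["defendant", "took", "car", "in", "evening"],
        ["contract", "breach", "payment", "overdue"],
    ]
    model = Word2Vec(corpus, vector_size=50, min_count=1, seed=0)

    block_a = ["defendant", "stole", "vehicle"]  # knowledge block, document A
    block_b = ["defendant", "took", "car"]       # knowledge block, document B
    distance = model.wv.wmdistance(block_a, block_b)
    print(f"similarity: {1.0 / (1.0 + distance):.3f}")  # map distance to (0, 1]
    ```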

    • [cs.AI]An Efficient Deep Reinforcement Learning Model for Urban Traffic Control
    Yilun Lin, Xingyuan Dai, Li Li, Fei-Yue Wang
    http://arxiv.org/abs/1808.01876v1

    Urban Traffic Control (UTC) plays an essential role in Intelligent Transportation Systems (ITS) but remains difficult. Since model-based UTC methods may not accurately describe the complex nature of traffic dynamics in all situations, model-free data-driven UTC methods, especially reinforcement learning (RL) based ones, have received increasing interest in the last decade. However, existing RL approaches lack an efficient algorithm for the complicated multi-intersection control problem, whose state-action space is vast. To solve this problem, we propose a Deep Reinforcement Learning (DRL) algorithm that combines several tricks to master an appropriate control strategy within an acceptable time. This new algorithm relaxes the fixed traffic demand pattern assumption and reduces human intervention in parameter tuning. Simulation experiments show that our method outperforms traditional rule-based approaches and has the potential to handle more complex traffic problems in the real world.
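
    To make the RL framing concrete, here is a deliberately tiny tabular Q-learning loop for a single signalized intersection. The state encoding, simulator stub, and reward are invented placeholders, far simpler than the paper's DRL agent.

    ```python
    # Toy signal-control Q-learning loop; `step` is a hypothetical simulator hook.
    import random
    from collections import defaultdict

    ACTIONS = [0, 1]                   # 0: keep current phase, 1: switch phase
    Q = defaultdict(float)
    alpha, gamma, eps = 0.1, 0.95, 0.1

    def step(state, action):
        # Invented dynamics: switching tends to shorten the queue bucket (0..9);
        # reward is the negative queue length (less delay is better).
        nxt = max(0, min(9, state + (-1 if action == 1 else 1)))
        return nxt, -nxt

    state = 5
    for _ in range(1000):
        if random.random() < eps:
            action = random.choice(ACTIONS)                     # explore
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])  # exploit
        nxt, reward = step(state, action)
        target = reward + gamma * max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        state = nxt
    ```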

    • [cs.AI]Combining Graph-based Dependency Features with Convolutional Neural Network for Answer Triggering
    Deepak Gupta, Sarah Kohail, Pushpak Bhattacharyya
    http://arxiv.org/abs/1808.01650v1

    Answer triggering is the task of selecting the best-suited answer for a given question from a set of candidate answers, if one exists. In this paper, we present a hybrid deep learning model for answer triggering which combines several dependency-graph based alignment features, namely graph edit distance, graph-based similarity and dependency graph coverage, with dense vector embeddings from a Convolutional Neural Network (CNN). Our experiments on the WikiQA dataset show that such a combination triggers candidate answers more accurately than the previous state-of-the-art models, with a 5.86% absolute F-score improvement at the question level.
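
    One of the named features, graph edit distance between dependency graphs, can be computed off the shelf. The toy dependency edges below are invented, and NetworkX's exact algorithm (which by default matches nodes structurally, ignoring labels) is practical only for small graphs like these.

    ```python
    # Graph edit distance between two toy dependency graphs (NetworkX).
    import networkx as nx

    q = nx.DiGraph([("wrote", "who"), ("wrote", "book"), ("book", "the")])
    a = nx.DiGraph([("wrote", "Orwell"), ("wrote", "novel"), ("novel", "the")])
    print(nx.graph_edit_distance(q, a))  # small value -> structurally similar
    ```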

    • [cs.AI]Error Detection in a Large-Scale Lexical Taxonomy
    Sifan Liu, Hongzhi Wang
    http://arxiv.org/abs/1808.01690v1

    Knowledge bases (KBs) are an important component of artificial intelligence. One significant challenge in KB construction is that it introduces much noise, which prevents effective usage. Even though some KB cleansing algorithms have been proposed, they focus on the structure of the knowledge graph and neglect the relations between concepts, which could help discover wrong relations in a KB. Motivated by this, we measure the relation between two concepts by the distance between their corresponding instances and detect errors within the intersection of conflicting concept sets. For efficient and effective knowledge base cleansing, we first apply a distance-based model to determine the conflicting concept sets, using two different methods. Then, we propose and analyze several algorithms for detecting and repairing the errors based on our model, using hashing for efficient distance computation. Experimental results demonstrate that the proposed approaches cleanse knowledge bases efficiently and effectively.

    • [cs.AI]Logical Semantics and Commonsense Knowledge: Where Did we Go Wrong, and How to Go Forward, Again
    Walid S. Saba
    http://arxiv.org/abs/1808.01741v1

    We argue that logical semantics might have faltered due to its failure in distinguishing between two fundamentally very different types of concepts: ontological concepts, that should be types in a strongly-typed ontology, and logical concepts, that are predicates corresponding to properties of and relations between objects of various ontological types. We will then show that accounting for these differences amounts to the integration of lexical and compositional semantics in one coherent framework, and to an embedding in our logical semantics of a strongly-typed ontology that reflects our commonsense view of the world and the way we talk about it in ordinary language. We will show that in such a framework a number of challenges in natural language semantics can be adequately and systematically treated.

    • [cs.AI]Reasoning with Justifiable Exceptions in Contextual Hierarchies (Appendix)
    Loris Bozzato, Luciano Serafini, Thomas Eiter
    http://arxiv.org/abs/1808.01874v1

    This paper is an appendix to the paper "Reasoning with Justifiable Exceptions in Contextual Hierarchies" by Bozzato, Serafini and Eiter, 2018. It provides further details on the language, the complexity results and the datalog translation introduced in the main paper.

    • [cs.AI]Smart City Development with Urban Transfer Learning
    Leye Wang, Bin Guo, Qiang Yang
    http://arxiv.org/abs/1808.01552v1

    The rapid development of big data techniques has offered great opportunities to develop smart city services in public safety, transportation management, city planning, etc. Meanwhile, the smart city development levels of different cities remain unbalanced. For the large number of cities that are just starting development, governments face a critical cold-start problem: 'how to develop a new smart city service suffering from data scarcity?' To address this problem, transfer learning has recently been leveraged to accelerate smart city development, which we term the urban transfer learning paradigm. This article investigates the common process of urban transfer learning, aiming to provide city governors and relevant practitioners with guidelines for applying this novel learning paradigm. Our guidelines include common transfer strategies to take, general steps to follow, and case studies to refer to. We also summarize a few future research opportunities in urban transfer learning, and expect this article to attract more researchers into this promising area.

    • [cs.CE]Stock Price Correlation Coefficient Prediction with ARIMA-LSTM Hybrid Model
    Hyeong Kyu Choi
    http://arxiv.org/abs/1808.01560v1

    Predicting the price correlation of two assets for future time periods is important in portfolio optimization. We apply LSTM recurrent neural networks (RNNs) to predicting the stock price correlation coefficient of two individual stocks. RNNs are competent at modeling temporal dependencies, and the use of LSTM cells further enhances long-term predictive properties. To encompass both linearity and nonlinearity in the model, we adopt the ARIMA model as well. The ARIMA model filters the linear tendencies in the data and passes the residual values on to the LSTM model. The ARIMA-LSTM hybrid model is tested against traditional predictive financial models such as the full historical model, the constant correlation model, the single-index model and the multi-group model. In our empirical study, the predictive ability of the ARIMA-LSTM model turned out to be superior to all the other financial models by a significant margin. Our work implies that the ARIMA-LSTM model is worth considering for forecasting correlation coefficients in portfolio optimization.
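
    A minimal sketch of the decomposition: fit ARIMA for the linear part, then model its residuals separately. The residual stage below is a one-parameter autoregression standing in for the paper's LSTM, and the synthetic correlation series is invented.

    ```python
    # ARIMA-plus-residual-model sketch (statsmodels); the LSTM stage is stubbed
    # with a lag-1 regression on residuals to keep the example self-contained.
    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(0)
    corr = 0.5 + 0.3 * np.sin(np.linspace(0, 6, 120)) \
               + 0.05 * rng.standard_normal(120)

    arima = ARIMA(corr, order=(1, 0, 1)).fit()   # linear component
    residuals = arima.resid                      # nonlinear remainder -> "LSTM"

    X, y = residuals[:-1], residuals[1:]         # stand-in residual model
    beta = float(X @ y) / float(X @ X)
    forecast = arima.forecast(1)[0] + beta * residuals[-1]
    print(f"one-step hybrid forecast: {forecast:.4f}")
    ```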

    • [cs.CL]Abstractive Summarization Improved by WordNet-based Extractive Sentences
    Niantao Xie, Sujian Li, Huiling Ren, Qibin Zhai
    http://arxiv.org/abs/1808.01426v1

    Recently, seq2seq abstractive summarization models have achieved good results on the CNN/Daily Mail dataset. Still, how to improve abstractive methods with extractive methods is a good research direction, since extractive methods have the potential to exploit various efficient features for extracting important sentences from a text. In this paper, in order to improve the semantic relevance of abstractive summaries, we adopt a WordNet-based sentence ranking algorithm to extract the sentences most semantically similar to the original text. Then, we design a dual attentional seq2seq framework to generate summaries that take the extracted information into account. At the same time, we combine pointer-generator and coverage mechanisms to address the out-of-vocabulary (OOV) and duplicate word problems that exist in abstractive models. Experiments on the CNN/Daily Mail dataset show that our models achieve performance competitive with state-of-the-art ROUGE scores. Human evaluations also show that the summaries generated by our models have high semantic relevance to the original text.
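
    The extractive scoring might look something like the toy function below, which rates a sentence by the best WordNet path similarity of each of its words against the document's words. This assumes NLTK with the 'wordnet' corpus downloaded and is not the paper's exact ranking algorithm.

    ```python
    # Toy WordNet-based relevance scoring for extractive sentence ranking.
    from nltk.corpus import wordnet as wn

    def word_sim(w1, w2):
        s1, s2 = wn.synsets(w1), wn.synsets(w2)
        if not s1 or not s2:
            return 0.0
        # path_similarity can be None across parts of speech; treat as 0.
        return max((a.path_similarity(b) or 0.0) for a in s1 for b in s2)

    def sentence_score(sentence, doc_words):
        words = [w for w in sentence.lower().split() if w.isalpha()]
        return sum(max(word_sim(w, d) for d in doc_words)
                   for w in words) / max(len(words), 1)

    print(sentence_score("The cat sat quietly", ["dog", "animal", "home"]))
    ```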

    • [cs.CL]Instantiation
    Abhijeet Gupta, Gemma Boleda, Sebastian Pado
    http://arxiv.org/abs/1808.01662v1

    In computational linguistics, a large body of work exists on distributed modeling of lexical relations, focusing largely on relations such as hypernymy (scientist -- person) that hold between two categories, as expressed by common nouns. In contrast, computational linguistics has paid little attention to entities denoted by proper nouns (Marie Curie, Mumbai, ...). These have been investigated in detail by the Knowledge Representation and Semantic Web communities, but generally not with regard to their linguistic properties. Our paper closes this gap by investigating and modeling the lexical relation of instantiation, which holds between an entity-denoting and a category-denoting expression (Marie Curie -- scientist or Mumbai -- city). We present a new, principled dataset for the task of instantiation detection as well as experiments and analyses on this dataset. We obtain the following results: (a) entities belonging to one category form a region in distributional space, but the embedding for the category word is typically located outside this subspace; (b) it is easy to learn to distinguish entities from categories from distributional evidence, but due to (a), instantiation proper is much harder to learn when using common nouns as representations of categories; (c) this problem can be alleviated by using category representations based on entity rather than category word embeddings.

    • [cs.CL]LISA: Explaining Recurrent Neural Network Judgments via Layer-wIse Semantic Accumulation and Example to Pattern Transformation
    Pankaj Gupta, Hinrich Schütze
    http://arxiv.org/abs/1808.01591v1

    Recurrent neural networks (RNNs) are temporal networks, cumulative in nature, that have shown promising results in various natural language processing tasks. Despite their success, it remains a challenge to understand their hidden behavior. In this work, we analyze and interpret the cumulative nature of RNNs via a proposed technique named Layer-wIse-Semantic-Accumulation (LISA), for explaining decisions and detecting the most likely (i.e., saliency) patterns that the network relies on when making decisions. We demonstrate (1) LISA: "how an RNN accumulates or builds semantics during its sequential processing for a given text example and expected response", and (2) Example2pattern: "what the saliency patterns look like for each category in the data according to the network in decision making". We analyze the sensitivity of RNNs to different inputs to check the increase or decrease in prediction scores and further extract the saliency patterns learned by the network. We employ two relation classification datasets, SemEval 10 Task 8 and TAC KBP Slot Filling, to explain RNN predictions via LISA and example2pattern.

    • [cs.CL]Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis
    Daisy Stanton, Yuxuan Wang, RJ Skerry-Ryan
    http://arxiv.org/abs/1808.01410v1

    Global Style Tokens (GSTs) are a recently-proposed method to learn latent disentangled representations of high-dimensional data. GSTs can be used within Tacotron, a state-of-the-art end-to-end text-to-speech synthesis system, to uncover expressive factors of variation in speaking style. In this work, we introduce the Text-Predicted Global Style Token (TP-GST) architecture, which treats GST combination weights or style embeddings as "virtual" speaking style labels within Tacotron. TP-GST learns to predict stylistic renderings from text alone, requiring neither explicit labels during training nor auxiliary inputs for inference. We show that, when trained on a dataset of expressive speech, our system generates audio with more pitch and energy variation than two state-of-the-art baseline models. We further demonstrate that TP-GSTs can synthesize speech with background noise removed, and corroborate these analyses with positive results on human-rated listener preference audiobook tasks. Finally, we demonstrate that multi-speaker TP-GST models successfully factorize speaker identity and speaking style. We provide a website with audio samples for each of our findings.

    • [cs.CL]Residual Memory Networks: Feed-forward approach to learn long temporal dependencies
    Murali Karthick Baskar, Martin Karafiat, Lukas Burget, Karel Vesely, Frantisek Grezl, Jan Honza Cernocky
    http://arxiv.org/abs/1808.01916v1

    Training deep recurrent neural network (RNN) architectures is complicated by the increased network complexity, which disrupts the learning of higher-order abstractions in deep RNNs. In the case of feed-forward networks, training deep structures is simple and fast, but learning long-term temporal information is not possible. In this paper we propose a residual memory neural network (RMN) architecture to model short-time dependencies using deep feed-forward layers with residual and time-delayed connections. The residual connections pave the way to constructing deeper networks by enabling unhindered gradient flow, and the time delay units capture temporal information with shared weights. The number of layers in an RMN signifies both the hierarchical processing depth and the temporal depth. The computational complexity of training an RMN is significantly lower than that of deep recurrent networks. RMN is further extended to a bi-directional RMN (BRMN) to capture both past and future information. Experimental analysis on the AMI corpus substantiates the capability of RMN in learning long-term and hierarchical information. Recognition performance of an RMN trained with 300 hours of the Switchboard corpus is compared with various state-of-the-art LVCSR systems. The results indicate that RMN and BRMN gain 6% and 3.8% relative improvement over LSTM and BLSTM networks, respectively.
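
    The core layer can be pictured as below: a feed-forward transform applied with shared weights to the current frame and a time-delayed copy, plus a residual skip. This PyTorch sketch is our loose reading of the description, not the paper's exact topology.

    ```python
    # Schematic RMN-style layer: shared linear weights over current and delayed
    # frames, with a residual connection around the block.
    import torch
    import torch.nn as nn

    class RMNLayer(nn.Module):
        def __init__(self, dim, delay):
            super().__init__()
            self.linear = nn.Linear(dim, dim)  # shared across current/delayed
            self.delay = delay

        def forward(self, x):                  # x: (batch, time, dim)
            pad = x.new_zeros(x.size(0), self.delay, x.size(2))
            delayed = torch.cat([pad, x[:, :-self.delay, :]], dim=1)
            return x + torch.relu(self.linear(x) + self.linear(delayed))

    layer = RMNLayer(dim=32, delay=5)
    print(layer(torch.randn(4, 100, 32)).shape)  # torch.Size([4, 100, 32])
    ```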

    • [cs.CL]Using Linguistic Cues for Analyzing Social Movements
    Rezvaneh Rezapour
    http://arxiv.org/abs/1808.01742v1

    With the growth of social media usage, social activists try to leverage these platforms to raise awareness of social issues and engage the public worldwide. The broad use of social media platforms in recent years has made it easier for people to stay up-to-date on news related to regional and worldwide events. While social media, namely Twitter, helps social movements connect with more people and mobilize the movement, traditional media such as news articles help spread the news about events more broadly. In this study, we analyze linguistic features and cues, such as individualism vs. pluralism, sentiment and emotion, to examine the relationship between the medium and the discourse over time. We conduct this work in a specific application context, the "Black Lives Matter" (BLM) movement, and compare discussions related to this event in social media vs. news articles.

    • [cs.CR]Active Learning for Wireless IoT Intrusion Detection
    Kai Yang, Jie Ren, Yanqiao Zhu, Weiyi Zhang
    http://arxiv.org/abs/1808.01412v1

    The Internet of Things (IoT) is becoming truly ubiquitous in our everyday life, but it also faces unique security challenges. Intrusion detection is critical for the security and safety of a wireless IoT network. This paper discusses a human-in-the-loop active learning approach for wireless intrusion detection. We first present the fundamental challenges in designing a successful Intrusion Detection System (IDS) for a wireless IoT network. We then briefly review the rudimentary concepts of active learning and propose its employment in the diverse applications of wireless intrusion detection. An experimental example is also presented to show the significant performance improvement of the active learning method over the traditional supervised learning approach. While machine learning techniques have been widely employed for intrusion detection, the application of human-in-the-loop machine learning, which leverages both machine and human intelligence, to IoT intrusion detection is still in its infancy. We hope this article can assist readers in understanding the key concepts of active learning and spur further research in this area.
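
    For readers new to the idea, the loop below shows the simplest active-learning flavor, uncertainty sampling: train, pick the pool sample the classifier is least sure about, ask the "human" oracle for its label, repeat. The synthetic data and logistic model are stand-ins, assuming scikit-learn.

    ```python
    # Uncertainty-sampling active-learning loop; the data is synthetic and the
    # oracle answer is simulated by reading the true label.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 8))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # stand-in for attack labels

    labeled = [int(i) for i in np.where(y == 0)[0][:5]] + \
              [int(i) for i in np.where(y == 1)[0][:5]]
    pool = [i for i in range(500) if i not in labeled]

    clf = LogisticRegression()
    for _ in range(20):
        clf.fit(X[labeled], y[labeled])
        proba = clf.predict_proba(X[pool])[:, 1]
        query = pool[int(np.argmin(np.abs(proba - 0.5)))]  # most uncertain
        labeled.append(query)                              # "ask the analyst"
        pool.remove(query)
    print(f"accuracy after querying: {clf.score(X, y):.3f}")
    ```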

    • [cs.CR]Am I Responsible for End-User's Security? A Programmer's Perspective
    Chamila Wijayarathna, Nalin Asanka Gamagedara Arachchilage
    http://arxiv.org/abs/1808.01481v1

    Previous research has pointed out that software applications should not depend on programmers to provide security for end-users, as the majority of programmers are not experts in computer security. On the other hand, some studies have revealed that security experts believe programmers have a major role to play in ensuring end-users' security. However, there has been no investigation of what programmers perceive about their responsibility for the end-users' security of the applications they develop. In this work, through a qualitative experimental study with 40 software developers, we attempted to understand programmers' perceptions of who is responsible for ensuring end-users' security. Results revealed that the majority of programmers perceive that they are responsible for the end-users' security of the applications they develop. Furthermore, results showed that even though programmers are aware of things they need to do to ensure end-users' security, they do not often follow them. We believe these results could change the current view on the roles that different stakeholders of the software development process (i.e. researchers, security experts, programmers and Application Programming Interface (API) developers) have to play in ensuring the security of software applications.

    • [cs.CR]Assessing and countering reaction attacks against post-quantum public-key cryptosystems based on QC-LDPC codes
    Paolo Santini, Marco Baldi, Franco Chiaraluce
    http://arxiv.org/abs/1808.01945v1

    Code-based public-key cryptosystems based on QC-LDPC and QC-MDPC codes are promising post-quantum candidates to replace quantum-vulnerable classical alternatives. However, a new type of attack based on Bob's reactions has recently been introduced, and appears to significantly reduce the lifetime of any keypair used in these systems. In this paper we estimate the complexity of all known reaction attacks against QC-LDPC and QC-MDPC code-based variants of the McEliece cryptosystem. We also show how the structure of the secret key and, in particular, the secret code rate affect the complexity of these attacks. It follows from our results that QC-LDPC code-based systems can indeed withstand reaction attacks, on condition that specific decoding algorithms are used and the secret code has a sufficiently high rate.

    • [cs.CR]Signal Jamming Attacks Against Communication-Based Train Control: Attack Impact and Countermeasure
    Subhash Lakshminarayana, Jabir Shabbir Karachiwala, Sang-Yoon Chang, Girish Revadigar, Sristi Lakshmi Sravana Kumar, David K. Y. Yau, Yih-Chun Hu
    http://arxiv.org/abs/1808.01723v1

    We study the impact of signal jamming attacks against communication-based train control (CBTC) systems and develop countermeasures to limit the attacks' impact. CBTC supports train operation automation and moving-block signaling, which improves transport efficiency. We consider an attacker jamming the wireless communication between trains, or between a train and a wayside access point, which can disable CBTC and its corresponding benefits. In contrast to prior work studying jamming only at the physical or link layer, we study the real impact of such attacks on end users, namely train journey time and passenger congestion. Our analysis employs a detailed model of the leaky-medium based communication systems (leaky waveguide or leaky feeder/coaxial cable) popularly used in CBTC. To counteract the jamming attacks, we develop a mitigation approach based on frequency hopping spread spectrum (FHSS), taking into account the domain-specific structure of leaky-medium CBTC systems. Specifically, compared with existing implementations of FHSS, we apply FHSS not only between the transmitter-receiver pair but also at the track-side repeaters. To demonstrate the feasibility of implementing this technology in CBTC systems, we develop an FHSS repeater prototype using software-defined radios on both leaky-medium and open-air (free-wave) channels. We perform extensive simulations driven by realistic train running profiles and real-world passenger data to provide insights into the jamming attack's impact and the effectiveness of the proposed countermeasure.
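
    The essence of FHSS is that both ends derive the same pseudo-random channel schedule from a shared secret, so a jammer without the seed must spread its power across the whole band. A toy illustration (the channel plan and seed handling are invented):

    ```python
    # Toy frequency-hopping schedule shared by a transmitter-receiver pair;
    # a common PRNG seed stands in for the synchronized hop schedule.
    import random

    CHANNELS_MHZ = [2402 + 2 * k for k in range(40)]

    def hop_sequence(seed, n_hops):
        prng = random.Random(seed)
        return [prng.choice(CHANNELS_MHZ) for _ in range(n_hops)]

    tx_hops = hop_sequence(seed=0xC0FFEE, n_hops=5)
    rx_hops = hop_sequence(seed=0xC0FFEE, n_hops=5)
    assert tx_hops == rx_hops  # both ends stay on the same channel each slot
    print(tx_hops)
    ```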

    • [cs.CR]Understanding Software Developers' Approach towards Implementing Data Minimization
    Awanthika Senarath, Nalin Asanka Gamagedara Arachchilage
    http://arxiv.org/abs/1808.01479v1

    Data Minimization (DM) is a privacy practice that requires minimizing the use of user data in software systems. However, continuous privacy incidents that compromise user data suggest that the requirements of DM are not adequately implemented in software systems. Therefore, it is important that we understand the problems faced by software developers when they attempt to implement DM in software systems. In this study, we investigate how 24 software developers implement DM in a software system design when they are asked to. Our findings revealed that developers find it difficult to implement DM when they are not aware of the potential of data they could collect at the design phase of systems. Furthermore, developers were inconsistent in how they implemented DM in their software designs.

    • [cs.CV]3D Conceptual Design Using Deep Learning
    Zhangsihao Yang, Haoliang Jiang, Zou Lan
    http://arxiv.org/abs/1808.01675v1

    This article proposes a data-driven methodology for fast design support, in order to generate or develop novel designs covering multiple object categories. The methodology implements two state-of-the-art Variational Autoencoders operating on 3D model data, with a self-defined loss function. The loss function, which incorporates the outputs of certain layers in the autoencoder, combines latent features from different 3D model categories. Additionally, this article explains in detail how to utilize the Princeton ModelNet40 database, a comprehensive, clean collection of 3D CAD models of objects. After converting the original 3D mesh files to voxel and point cloud representations, we can feed our autoencoder with data of the same dimensionality. The novelty of this work is to leverage the power of deep learning methods as an efficient latent feature extractor to explore unknown design areas. Through this project, we expect the output to show a clear and smooth interpolation between models from different categories, providing fast design support for generating novel shapes. This final report explores 1) the theoretical ideas, 2) the progress in implementing Variational Autoencoders to attain implicit features from input shapes, 3) the resulting output shapes during training on selected domains of both 3D voxel data and 3D point cloud data, and 4) our conclusions and future work toward more ambitious goals.

    • [cs.CV]3D Depthwise Convolution: Reducing Model Parameters in 3D Vision Tasks
    Rongtian Ye, Fangyu Liu, Liqiang Zhang
    http://arxiv.org/abs/1808.01556v1

    Standard 3D convolution operations require much more memory and computation than 2D convolutions, a fact that has hindered the development of deep neural nets for many 3D vision tasks. In this paper, we investigate the possibility of applying depthwise separable convolutions in the 3D scenario and introduce 3D depthwise convolution. A 3D depthwise convolution splits a single standard 3D convolution into two separate steps, which drastically reduces the number of parameters in 3D convolutions, by more than an order of magnitude. We experiment with 3D depthwise convolution on popular CNN architectures and also compare it with a similar structure called pseudo-3D convolution. The results demonstrate that, with 3D depthwise convolutions, 3D vision tasks like classification and reconstruction can be carried out with more lightweight neural networks while still delivering comparable performance.
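
    The two-step split is easy to see in code: a channel-wise 3D convolution (via grouped convolution) followed by a 1x1x1 pointwise convolution. A minimal PyTorch sketch, assuming the standard depthwise-separable formulation:

    ```python
    # Depthwise (groups=in_ch) then pointwise (1x1x1) 3D convolution; for
    # 16->32 channels with k=3 this uses ~1k parameters vs ~13.9k for a
    # standard Conv3d, i.e. more than an order of magnitude fewer.
    import torch
    import torch.nn as nn

    class DepthwiseSeparableConv3d(nn.Module):
        def __init__(self, in_ch, out_ch, k=3):
            super().__init__()
            self.depthwise = nn.Conv3d(in_ch, in_ch, k,
                                       padding=k // 2, groups=in_ch)
            self.pointwise = nn.Conv3d(in_ch, out_ch, kernel_size=1)

        def forward(self, x):
            return self.pointwise(self.depthwise(x))

    x = torch.randn(2, 16, 8, 32, 32)  # (batch, channels, D, H, W)
    print(DepthwiseSeparableConv3d(16, 32)(x).shape)  # [2, 32, 8, 32, 32]
    ```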

    • [cs.CV]A Multi-task Framework for Skin Lesion Detection and Segmentation
    Sulaiman Vesal, Shreyas Malakarjun Patil, Nishant Ravikumar, Andreas Maier
    http://arxiv.org/abs/1808.01676v1

    Early detection and segmentation of skin lesions is crucial for timely diagnosis and treatment, necessary to improve the survival rate of patients. However, manual delineation is time consuming and subject to intra- and inter-observer variations among dermatologists. This underlines the need for an accurate and automatic approach to skin lesion segmentation. To tackle this issue, we propose a multi-task convolutional neural network (CNN) based joint detection and segmentation framework, designed to initially localize the lesion and subsequently segment it. A 'Faster region-based convolutional neural network' (Faster-RCNN), which comprises a region proposal network (RPN), is used to generate bounding boxes/region proposals for lesion localization in each image. The proposed regions are subsequently refined using a softmax classifier and a bounding-box regressor. The refined bounding boxes are finally cropped and segmented using 'SkinNet', a modified version of U-Net. We trained and evaluated the performance of our network using the ISBI 2017 challenge and the PH2 datasets, and compared it with the state-of-the-art using the official test data released as part of the challenge for the former. Our approach outperformed others in terms of Dice coefficient (>0.93), Jaccard index (>0.88), accuracy (>0.96) and sensitivity (>0.95), across five-fold cross-validation experiments.

    • [cs.CV]A Study of Deep Feature Fusion based Methods for Classifying Multi-lead ECG
    Bin Chen, Wei Guo, Bin Li, Rober K. F. Teng, Mingjun Dai, Jianping Luo, Hui Wang
    http://arxiv.org/abs/1808.01721v1

    An automatic classification method is studied to effectively detect and recognize electrocardiogram (ECG) signals. Based on the synchronization and orthogonality relationships of multiple leads, we propose a Multi-branch Convolution and Residual Network (MBCRNet) with three kinds of feature fusion methods for automatic detection of normal and abnormal ECG signals. Experiments are conducted on the Chinese Cardiovascular Disease Database (CCDD). Through 10-fold cross-validation, we achieve an average accuracy of 87.04% and a sensitivity of 89.93%, which outperforms previous methods on the same database. It is also shown that the multi-lead feature fusion network improves classification accuracy over a network using only single-lead features.

    • [cs.CV]Classification of Dermoscopy Images using Deep Learning
    Nithin D Reddy
    http://arxiv.org/abs/1808.01607v1

    Skin cancer is one of the most common forms of cancer and its incidence is projected to rise over the next decade. Artificial intelligence is a viable solution to the issue of providing quality care to patients in areas lacking access to trained dermatologists. Considerable progress has been made in the use of automated applications for accurate classification of skin lesions from digital images. In this manuscript, we discuss the design and implementation of a deep learning algorithm for classification of dermoscopy images from the HAM10000 Dataset. We trained a convolutional neural network based on the ResNet50 architecture to accurately classify dermoscopy images of skin lesions into one of seven disease categories. Using our custom model, we obtained a balanced accuracy of 91% on the validation dataset.

    • [cs.CV]Deep Learning Advances on Different 3D Data Representations: A Survey
    Eman Ahmed, Alexandre Saint, Abd El Rahman Shabayek, Kseniya Cherenkova, Rig Das, Gleb Gusev, Djamila Aouada, Bjorn Ottersten
    http://arxiv.org/abs/1808.01462v1

    3D data is a valuable asset in the field of computer vision as it provides rich information about the full geometry of sensed objects and scenes. With the recent availability of large 3D datasets and the increase in computational power, it is today possible to consider applying deep learning to learn specific tasks on 3D data such as segmentation, recognition and correspondence. Depending on the considered 3D data representation, different challenges may be foreseen in using existing deep learning architectures. In this paper, we provide a comprehensive overview of various 3D data representations, highlighting the difference between Euclidean and non-Euclidean ones. We also discuss how deep learning methods are applied to each representation, analyzing the challenges to overcome.

    • [cs.CV]Deep Multi-Center Learning for Face Alignment
    Zhiwen Shao, Hengliang Zhu, Xin Tan, Yangyang Hao, Lizhuang Ma
    http://arxiv.org/abs/1808.01558v1

    Facial landmarks are highly correlated with each other since a given landmark can be estimated from its neighboring landmarks. Most existing deep learning methods use only one fully-connected layer, called the shape prediction layer, to estimate the locations of facial landmarks. In this paper, we propose a novel deep learning framework named Multi-Center Learning, with multiple shape prediction layers, for face alignment. In particular, each shape prediction layer emphasizes the detection of a certain cluster of semantically relevant landmarks. Challenging landmarks are focused on first, and each cluster of landmarks is then further optimized. Moreover, to reduce the model complexity, we propose a model assembling method to integrate the multiple shape prediction layers into a single one. Extensive experiments demonstrate that our method is effective in handling complex occlusions and appearance variations with real-time performance. The code for our method is available at https://github.com/ZhiwenShao/MCNet-Extension.

    • [cs.CV]Deep Shape Analysis on Abdominal Organs for Diabetes Prediction
    Benjamin Gutierrez-Becker, Sergios Gatidis, Daniel Gutmann, Annette Peters, Christopher Schlett, Fabian Bamberg, Christian Wachinger
    http://arxiv.org/abs/1808.01946v1

    Morphological analysis of organs based on images is a key task in medical image computing. Several approaches have been proposed for the quantitative assessment of morphological changes, and they have been widely used to analyze the effects of aging, disease and other factors on organ morphology. In this work, we propose a deep neural network for predicting diabetes from abdominal shapes. The network operates directly on raw point clouds without requiring mesh processing or shape alignment. Instead of relying on hand-crafted shape descriptors, an optimal representation is learned in the end-to-end training stage of the network. For comparison, we extend the state-of-the-art shape descriptor BrainPrint to the AbdomenPrint. Our results demonstrate that the network learns shape representations that better separate healthy and diabetic individuals than traditional representations.

    • [cs.CV]Deep Transfer Learning for EEG-based Brain Computer Interface
    Chuanqi Tan, Fuchun Sun, Wenchang Zhang
    http://arxiv.org/abs/1808.01752v1

    The electroencephalography (EEG) classifier is the most important component of brain-computer interface based systems. Two major problems hinder its improvement. First, traditional methods do not fully exploit multimodal information. Second, large-scale annotated EEG datasets are almost impossible to acquire because biological data acquisition is challenging and quality annotation is costly. Herein, we propose a novel deep transfer learning approach to solve these two problems. First, we model cognitive events based on EEG data by characterizing the data using EEG optical flow, which is designed to preserve multimodal EEG information in a uniform representation. Second, we design a deep transfer learning framework that is suitable for transferring knowledge by joint training, containing an adversarial network and a special loss function. The experiments demonstrate that our approach, when applied to EEG classification tasks, has many advantages, such as robustness and accuracy.

    • [cs.CV]DeepTAM: Deep Tracking and Mapping
    Huizhong Zhou, Benjamin Ummenhofer, Thomas Brox
    http://arxiv.org/abs/1808.01900v1

    We present a system for keyframe-based dense camera tracking and depth map estimation that is entirely learned. For tracking, we estimate small pose increments between the current camera image and a synthetic viewpoint. This significantly simplifies the learning problem and alleviates the dataset bias for camera motions. Further, we show that generating a large number of pose hypotheses leads to more accurate predictions. For mapping, we accumulate information in a cost volume centered at the current depth estimate. The mapping network then combines the cost volume and the keyframe image to update the depth prediction, thereby effectively making use of depth measurements and image-based priors. Our approach yields state-of-the-art results with few images and is robust with respect to noisy camera poses. We demonstrate that the performance of our 6 DOF tracking competes with RGB-D tracking algorithms. We compare favorably against strong classic and deep learning powered dense depth algorithms.

    • [cs.CV]Defense Against Adversarial Attacks with Saak Transform
    Sibo Song, Yueru Chen, Ngai-Man Cheung, C. -C. Jay Kuo
    http://arxiv.org/abs/1808.01785v1

    Deep neural networks (DNNs) are known to be vulnerable to adversarial perturbations, which poses a serious threat to DNN-based decision systems. In this paper, we propose applying the lossy Saak transform to adversarially perturbed images as a preprocessing tool to defend against adversarial attacks. The Saak transform is a recently proposed state-of-the-art method for computing the spatial-spectral representations of input images. Empirically, we observe that the outputs of the Saak transform are very discriminative in differentiating adversarial examples from clean ones. Therefore, we propose a Saak transform based preprocessing method with three steps: 1) transforming an input image to a joint spatial-spectral representation via the forward Saak transform, 2) applying filtering to its high-frequency components, and 3) reconstructing the image via the inverse Saak transform. The processed image is found to be robust against adversarial perturbations. We conduct extensive experiments to investigate various settings of the Saak transform and filtering functions. Without harming the decision performance on clean images, our method outperforms state-of-the-art adversarial defense methods by a substantial margin on both the CIFAR-10 and ImageNet datasets. Importantly, our results suggest that adversarial perturbations can be effectively and efficiently defended against using state-of-the-art frequency analysis.

    • [cs.CV]Detailed Dense Inference with Convolutional Neural Networks via Discrete Wavelet Transform
    Lingni Ma, Jörg Stückler, Tao Wu, Daniel Cremers
    http://arxiv.org/abs/1808.01834v1

    Dense pixelwise prediction such as semantic segmentation is an up-to-date challenge for deep convolutional neural networks (CNNs). Many state-of-the-art approaches either tackle the loss of high-resolution information due to pooling in the encoder stage, or use dilated convolutions or high-resolution lanes to maintain detailed feature maps and predictions. Motivated by the structural analogy between multi-resolution wavelet analysis and the pooling/unpooling layers of CNNs, we introduce the discrete wavelet transform (DWT) into the CNN encoder-decoder architecture and propose WCNN. The high-frequency wavelet coefficients are computed at the encoder and later used at the decoder, where they are jointly unpooled with coarse-resolution feature maps through the inverse DWT. The DWT/iDWT is further used to develop two wavelet pyramids that capture global context, where the multi-resolution DWT is applied to successively reduce the spatial resolution and increase the receptive field. In experiments on the Cityscapes dataset, the proposed WCNNs are computationally efficient and improve accuracy for high-resolution dense pixelwise prediction.
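
    The pooling/unpooling analogy is easy to demonstrate: a single 2D DWT halves spatial resolution while storing the high-frequency bands needed for exact reconstruction. A sketch with PyWavelets (the Haar wavelet and the feature-map size are illustrative):

    ```python
    # DWT "pooling" and iDWT "unpooling" of a feature map (PyWavelets).
    import numpy as np
    import pywt

    feature_map = np.random.rand(64, 64).astype(np.float32)
    low, (lh, hl, hh) = pywt.dwt2(feature_map, "haar")  # encoder: 64 -> 32
    restored = pywt.idwt2((low, (lh, hl, hh)), "haar")  # decoder: 32 -> 64
    assert np.allclose(restored, feature_map, atol=1e-5)
    ```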

    • [cs.CV]Dilated Convolutions in Neural Networks for Left Atrial Segmentation in 3D Gadolinium Enhanced-MRI
    Sulaiman Vesal, Nishant Ravikumar, Andreas Maier
    http://arxiv.org/abs/1808.01673v1

    Segmentation of the left atrial chamber and assessment of its morphology are essential for improving our understanding of atrial fibrillation, the most common type of cardiac arrhythmia. Automation of this process in 3D gadolinium enhanced-MRI (GE-MRI) data is desirable, as manual delineation is time-consuming, challenging and observer-dependent. Recently, deep convolutional neural networks (CNNs) have gained tremendous traction and achieved state-of-the-art results in medical image segmentation. However, it is difficult to incorporate both local and global information without using contracting (pooling) layers, which in turn reduces segmentation accuracy for smaller structures. In this paper, we propose a 3D CNN for volumetric segmentation of the left atrial chamber in LGE-MRI. Our network is based on the well-known U-Net architecture. We employ a 3D fully convolutional network, with dilated convolutions in the lowest level of the network, and residual connections between encoder blocks to incorporate local and global knowledge. The results show that including global context through the use of dilated convolutions helps in domain adaptation, and the overall segmentation accuracy improves in comparison to a 3D U-Net.
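
    Dilated convolutions enlarge the receptive field without any pooling, keeping the feature-map resolution intact. A one-liner in PyTorch makes the point (the tensor sizes are illustrative):

    ```python
    # Dilated 3D convolution: dilation=2 with padding=2 preserves spatial size
    # while the 3x3x3 kernel effectively covers a 5x5x5 neighborhood.
    import torch
    import torch.nn as nn

    conv = nn.Conv3d(1, 8, kernel_size=3, dilation=2, padding=2)
    x = torch.randn(1, 1, 16, 64, 64)
    print(conv(x).shape)  # torch.Size([1, 8, 16, 64, 64])
    ```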

    • [cs.CV]Error Correction Maximization for Deep Image Hashing
    Xiang Xu, Xiaofang Wang, Kris M. Kitani
    http://arxiv.org/abs/1808.01942v1

    We propose to use the concept of the Hamming bound to derive the optimal criteria for learning hash codes with a deep network. In particular, when the number of binary hash codes (typically the number of image categories) and code length are known, it is possible to derive an upper bound on the minimum Hamming distance between the hash codes. This upper bound can then be used to define the loss function for learning hash codes. By encouraging the margin (minimum Hamming distance) between the hash codes of different image categories to match the upper bound, we are able to learn theoretically optimal hash codes. Our experiments show that our method significantly outperforms competing deep learning-based approaches and obtains top performance on benchmark datasets.
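
    The bound-derivation step can be sketched numerically: given K codes (image categories) of length L bits, search for the largest minimum distance d compatible with the sphere-packing (Hamming) bound. This form of the bound is our assumption for illustration, not necessarily the paper's exact criterion.

    ```python
    # Largest minimum Hamming distance d allowed by the sphere-packing bound
    # K * V(L, t) <= 2^L with t = floor((d - 1) / 2).
    from math import comb

    def ball_volume(L, t):
        return sum(comb(L, r) for r in range(t + 1))

    def max_min_distance(K, L):
        best = 1
        for d in range(1, L + 1):
            if K * ball_volume(L, (d - 1) // 2) <= 2 ** L:
                best = d
        return best

    print(max_min_distance(K=100, L=16))  # target margin: 100 classes, 16 bits
    ```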

    • [cs.CV]Gray-box Adversarial Training
    Vivek B. S., Konda Reddy Mopuri, R. Venkatesh Babu
    http://arxiv.org/abs/1808.01753v1

    Adversarial samples are perturbed inputs crafted to mislead machine learning systems. A training mechanism called adversarial training, which presents adversarial samples along with clean samples, has been introduced to learn robust models. In order to scale adversarial training to large datasets, the perturbations can only be crafted using fast and simple methods (e.g., gradient ascent). However, it has been shown that adversarial training converges to a degenerate minimum, where the model appears robust merely because it generates weaker adversaries; as a result, the models are vulnerable to simple black-box attacks. In this paper we (i) demonstrate the shortcomings of the existing evaluation policy, (ii) introduce novel variants of white-box and black-box attacks, dubbed "gray-box adversarial attacks", based on which we propose a novel evaluation method to assess the robustness of the learned models, and (iii) propose a novel variant of adversarial training, named "Gray-box Adversarial Training", that uses intermediate versions of the models to seed the adversaries. Experimental evaluation demonstrates that models trained using our method exhibit better robustness compared to both undefended and adversarially trained models.
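
    For reference, the fast single-step perturbation typically used to seed adversarial training looks like the FGSM-style helper below (PyTorch); the gray-box detail of drawing adversaries from intermediate checkpoints is a training-loop choice not shown here.

    ```python
    # FGSM-style perturbation: one gradient-sign step of size eps on the input.
    import torch
    import torch.nn.functional as F

    def fgsm(model, x, y, eps):
        x = x.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x), y).backward()
        return (x + eps * x.grad.sign()).detach()  # perturbed input
    ```

    Seeding from an intermediate checkpoint would simply mean calling fgsm(old_model, ...) while training the current model.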

    • [cs.CV]Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association
    Dapeng Chen, Hongsheng Li, Xihui Liu, Yantao Shen, Zejian Yuan, Xiaogang Wang
    http://arxiv.org/abs/1808.01571v1

    Person re-identification is an important task that requires learning discriminative visual features for distinguishing different person identities. Diverse auxiliary information has been utilized to improve the visual feature learning. In this paper, we propose to exploit natural language description as additional training supervisions for effective visual features. Compared with other auxiliary information, language can describe a specific person from more compact and semantic visual aspects, thus is complementary to the pixel-level image data. Our method not only learns better global visual feature with the supervision of the overall description but also enforces semantic consistencies between local visual and linguistic features, which is achieved by building global and local image-language associations. The global image-language association is established according to the identity labels, while the local association is based upon the implicit correspondences between image regions and noun phrases. Extensive experiments demonstrate the effectiveness of employing language as training supervisions with the two association schemes. Our method achieves state-of-the-art performance without utilizing any auxiliary information during testing and shows better performance than other joint embedding methods for the image-language association.

    • [cs.CV]Improving Temporal Interpolation of Head and Body Pose using Gaussian Process Regression in a Matrix Completion Setting
    Stephanie Tan, Hayley Hung
    http://arxiv.org/abs/1808.01837v1

    This paper presents a model for head and body pose estimation (HBPE) when labelled samples are highly sparse. The current state-of-the-art multimodal approach to HBPE utilizes the matrix completion method in a transductive setting to predict pose labels for unobserved samples. Based on this approach, the proposed method tackles HBPE when manually annotated ground truth labels are temporally sparse. We posit that the current state-of-the-art approach oversimplifies the temporal sparsity assumption by using Laplacian smoothing. Our final solution uses: i) Gaussian process regression in place of Laplacian smoothing, ii) head and body coupling, and iii) nuclear norm minimization in the matrix completion setting. The model is applied to the challenging SALSA dataset to benchmark against the state-of-the-art method. Our formulation outperforms the state of the art significantly in this setting: e.g., with 5% of ground truth labels as training data, head pose accuracy and body pose accuracy are approximately 62% and 70%, respectively. As well as fitting a more flexible model to missing labels in time, our approach also loosens the head and body coupling constraint, allowing for a more expressive model of the head and body pose typically seen during conversational interaction in groups. This provides a new baseline to improve upon for future integration of multimodal sensor data for the purpose of HBPE.
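
    The replacement of Laplacian smoothing with Gaussian process regression can be illustrated directly with scikit-learn. The sketch below, with made-up timestamps and pose angles, interpolates temporally sparse pose labels and also returns an uncertainty estimate, which Laplacian smoothing does not provide.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Sparse, noisy head-pose angles labelled at a few frames (illustrative values).
t_labeled = np.array([[0.0], [40.0], [90.0], [140.0], [200.0]])
pose = np.array([10.0, 25.0, 20.0, -5.0, 0.0])  # degrees

gp = GaussianProcessRegressor(kernel=RBF(length_scale=30.0) + WhiteKernel(1.0),
                              normalize_y=True)
gp.fit(t_labeled, pose)

t_all = np.arange(0, 201, dtype=float).reshape(-1, 1)
mean, std = gp.predict(t_all, return_std=True)  # smooth pose + per-frame uncertainty
```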

    • [cs.CV]Incorporating Scalability in Unsupervised Spatio-Temporal Feature Learning
    Sujoy Paul, Sourya Roy, Amit K. Roy-Chowdhury
    http://arxiv.org/abs/1808.01727v1

    Deep neural networks are efficient learning machines that leverage large amounts of manually labeled data to learn discriminative features. However, acquiring a substantial amount of supervised data, especially for videos, can be a tedious job across various computer vision tasks. This necessitates learning visual features from videos in an unsupervised setting. In this paper, we propose a computationally simple, yet effective, framework to learn spatio-temporal feature embeddings from unlabeled videos. We train a convolutional 3D Siamese network using positive and negative pairs mined from videos under certain probabilistic assumptions. Experimental results on three datasets demonstrate that our proposed framework is able to learn weights that can be used for the same as well as cross-dataset tasks.
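
    The abstract does not give the pairwise training objective, but a standard margin-based contrastive loss on a shared (Siamese) encoder is one plausible instantiation; the sketch below assumes the 3D ConvNet has already mapped each clip to an embedding vector.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(emb_a, emb_b, is_positive, margin=1.0):
    # emb_*: embeddings of two video clips from a shared (Siamese) 3D ConvNet;
    # is_positive: 0/1 float tensor marking mined positive pairs.
    d = F.pairwise_distance(emb_a, emb_b)
    pos = is_positive * d.pow(2)                                 # pull positives together
    neg = (1.0 - is_positive) * torch.clamp(margin - d, min=0.0).pow(2)  # push negatives apart
    return 0.5 * (pos + neg).mean()
```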

    • [cs.CV]Is Robustness the Cost of Accuracy? -- A Comprehensive Study on the Robustness of 18 Deep Image Classification Models
    Dong Su, Huan Zhang, Hongge Chen, Jinfeng Yi, Pin-Yu Chen, Yupeng Gao
    http://arxiv.org/abs/1808.01688v1

    Prediction accuracy has long been the sole standard for comparing the performance of different image classification models, including in the ImageNet competition. However, recent studies have highlighted the lack of robustness of well-trained deep neural networks to adversarial examples: visually imperceptible perturbations to natural images can easily be crafted to mislead image classifiers into misclassification. To demystify the trade-offs between robustness and accuracy, in this paper we thoroughly benchmark 18 ImageNet models using multiple robustness metrics, including the distortion, success rate and transferability of adversarial examples between 306 pairs of models. Our extensive experimental results reveal several new insights: (1) linear scaling law - the empirical \ell_2 and \ell_\infty distortion metrics scale linearly with the logarithm of classification error; (2) model architecture is a more critical factor to robustness than model size, and the disclosed accuracy-robustness Pareto frontier can be used as an evaluation criterion for ImageNet model designers; (3) for a similar network architecture, increasing network depth slightly improves robustness in \ell_\infty distortion; (4) there exist models (in the VGG family) that exhibit high adversarial transferability, while most adversarial examples crafted from one model can only be transferred within the same family. Experiment code is publicly available at https://github.com/huanzhang12/Adversarial_Survey.

    • [cs.CV]Language Model Supervision for Handwriting Recognition Model Adaptation
    Chris Tensmeyer, Curtis Wigington, Brian Davis, Seth Stewart, Tony Martinez, William Barrett
    http://arxiv.org/abs/1808.01423v1

    Training state-of-the-art offline handwriting recognition (HWR) models requires large labeled datasets, but unfortunately such datasets are not available in all languages and domains due to the high cost of manual labeling. We address this problem by showing how high-resource languages can be leveraged to help train models for low-resource languages. We propose a transfer learning methodology where we adapt HWR models trained on a source language to a target language that uses the same writing script. This methodology only requires labeled data in the source language, unlabeled data in the target language, and a language model of the target language. The language model is used in a bootstrapping fashion to refine predictions in the target language for use as ground truth in training the model. Using this approach we demonstrate improved transferability among French, English, and Spanish using both historical and modern handwriting datasets. In the best case, transferring with the proposed methodology results in character error rates nearly as good as fully supervised training.

    • [cs.CV]Learning Multi-scale Features for Foreground Segmentation
    Long Ang Lim, Hacer Yalim Keles
    http://arxiv.org/abs/1808.01477v1

    Foreground segmentation algorithms aim to segment moving objects from the background in a robust way under various challenging scenarios. Encoder-decoder type deep neural networks used in this domain have recently achieved impressive segmentation results. In this work, we propose a novel robust encoder-decoder neural network that can be trained end-to-end using only a few training examples. The proposed method extends the Feature Pooling Module (FPM) of FgSegNet by introducing feature fusion inside this module, which is capable of extracting multi-scale features within images, resulting in feature pooling that is robust against camera motion and alleviates the need for multi-scale inputs to the network. Our method outperforms all existing state-of-the-art methods on the CDnet2014 dataset with an average overall F-Measure of 0.9847. We also evaluate the effectiveness of our method on the SBI2015 and UCSD Background Subtraction datasets. The source code of the proposed method is made available at https://github.com/lim-anggun/FgSegNet_v2 .
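
    A generic PyTorch sketch of the idea of fusing parallel dilated convolutions inside a pooling module is given below. The dilation rates and channel counts are illustrative assumptions; the actual FPM in FgSegNet_v2 is defined in the authors' Keras code linked above.

```python
import torch
import torch.nn as nn

class FeaturePoolingModule(nn.Module):
    # Parallel dilated convolutions over a shared feature map, fused by
    # concatenation and a 1x1 convolution; rates here are illustrative.
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates)
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]  # one scale per branch
        return self.fuse(torch.cat(feats, dim=1))
```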

    • [cs.CV]Learning monocular depth estimation with unsupervised trinocular assumptions
    Matteo Poggi, Fabio Tosi, Stefano Mattoccia
    http://arxiv.org/abs/1808.01606v1

    Obtaining accurate depth measurements from a single image represents a fascinating solution to 3D sensing. CNNs have led to considerable improvements in this field, and recent trends have replaced the need for ground-truth labels with geometry-guided image reconstruction signals enabling unsupervised training. Currently, for this purpose, state-of-the-art techniques rely on images acquired with a binocular stereo rig to predict inverse depth (i.e., disparity) according to the aforementioned supervision principle. However, these methods suffer from well-known problems near occlusions, at the left image border, etc., inherited from the stereo setup. Therefore, in this paper, we tackle these issues by moving to a trinocular domain for training. Taking the central image as the reference, we train a CNN to infer disparity representations pairing this image with frames on its left and right side. This strategy allows us to obtain depth maps not affected by typical stereo artifacts. Moreover, since trinocular datasets are seldom available, we introduce a novel interleaved training procedure that enforces the trinocular assumption using current binocular datasets. Exhaustive experimental results on the KITTI dataset confirm that our proposal outperforms state-of-the-art methods for unsupervised monocular depth estimation trained on binocular stereo pairs as well as any known methods relying on other cues.

    • [cs.CV]Learning to Align Images using Weak Geometric Supervision
    Jing Dong, Byron Boots, Frank Dellaert, Ranveer Chandra, Sudipta N. Sinha
    http://arxiv.org/abs/1808.01424v1

    Image alignment tasks require accurate pixel correspondences, which are usually recovered by matching local feature descriptors. Such descriptors are often derived using supervised learning on existing datasets with ground truth correspondences. However, the cost of creating such datasets is usually prohibitive. In this paper, we propose a new approach to aligning two images related by an unknown 2D homography, in which the local descriptor is learned from scratch from the images and the homography is estimated simultaneously. Our key insight is that a Siamese convolutional neural network can be trained jointly while iteratively updating the homography parameters by optimizing a single loss function. Our method is currently weakly supervised because the input images need to be roughly aligned. We have used this method to align images of different modalities, such as RGB and near-infrared (NIR), without using any prior labeled data. Images automatically aligned by our method were then used to train descriptors that generalize to new images. We also evaluated our method on RGB images. On the HPatches benchmark, our method achieves accuracy comparable to deep local descriptors that were trained offline in a supervised setting.

    • [cs.CV]Liquid Pouring Monitoring via Rich Sensory Inputs
    Tz-Ying Wu, Juan-Ting Lin, Tsun-Hsuang Wang, Chan-Wei Hu, Juan Carlos Niebles, Min Sun
    http://arxiv.org/abs/1808.01725v1

    Humans have the amazing ability to perform very subtle manipulation tasks using a closed-loop control system with imprecise mechanics (i.e., our body parts) but rich sensory information (e.g., vision, tactile, etc.). In such a closed-loop system, the ability to monitor the state of the task via rich sensory information is important but often less studied. In this work, we take liquid pouring as a concrete example and aim at learning to continuously monitor whether liquid pouring is successful (e.g., no spilling) or not via rich sensory inputs. We mimic humans' rich senses using synchronized observations from a chest-mounted camera and a wrist-mounted IMU sensor. Given many success and failure demonstrations of liquid pouring, we train a hierarchical LSTM with late fusion for monitoring. To improve the robustness of the system, we propose two auxiliary tasks during training: (1) inferring the initial state of containers and (2) forecasting the one-step-future 3D trajectory of the hand with an adversarial training procedure. These tasks encourage our method to learn representations sensitive to container states and to how objects are manipulated in 3D. With these novel components, our method achieves ~8% and ~11% better monitoring accuracy than the baseline method without auxiliary tasks on unseen containers and unseen users, respectively.
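
    The late-fusion design can be sketched as two independent recurrent encoders whose final states are concatenated only at the classification head. Feature dimensions below are placeholders, and the hierarchical structure and auxiliary heads of the actual model are omitted.

```python
import torch
import torch.nn as nn

class LateFusionMonitor(nn.Module):
    # Sketch of late fusion: separate recurrent encoders for the visual and
    # IMU streams, fused only at the success/failure classification head.
    def __init__(self, d_vis=512, d_imu=6, hidden=128):
        super().__init__()
        self.vis_rnn = nn.LSTM(d_vis, hidden, batch_first=True)
        self.imu_rnn = nn.LSTM(d_imu, hidden, batch_first=True)
        self.head = nn.Linear(2 * hidden, 2)   # logits: success / failure

    def forward(self, vis, imu):               # each input: (batch, time, dim)
        _, (h_vis, _) = self.vis_rnn(vis)
        _, (h_imu, _) = self.imu_rnn(imu)
        return self.head(torch.cat([h_vis[-1], h_imu[-1]], dim=-1))
```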

    • [cs.CV]Metal Artifact Reduction in Cone-Beam X-Ray CT via Ray Profile Correction
    Sungsoo Ha, Klaus Mueller
    http://arxiv.org/abs/1808.01853v1

    In computed tomography (CT), metal implants increase the inconsistencies between the measured data and the linear attenuation assumption made by analytic CT reconstruction algorithms. The inconsistencies give rise to dark and bright bands and streaks in the reconstructed image, collectively called metal artifacts. These artifacts make it difficult for radiologists to render correct diagnostic decisions. We describe a data-driven metal artifact reduction (MAR) algorithm for image-guided spine surgery that applies to scenarios in which a prior CT scan of the patient is available. We tested the proposed method with two clinical datasets that were both obtained during spine surgery. Using the proposed method, we were not only able to remove the dark and bright streaks caused by the implanted screws but we also recovered the anatomical structures hidden by these artifacts. This results in an improved capability of surgeons to confirm the correctness of the implanted pedicle screw placements.

    • [cs.CV]Multi-Scale Supervised Network for Human Pose Estimation
    Lipeng Ke, Ming-Ching Chang, Honggang Qi, Siwei Lyu
    http://arxiv.org/abs/1808.01623v1

    Human pose estimation is an important topic in computer vision with many applications, including gesture and activity recognition. However, pose estimation from images is challenging due to appearance variations, occlusions, cluttered backgrounds, and complex activities. To alleviate these problems, we develop a robust pose estimation method based on recent deep conv-deconv modules with two improvements: (1) multi-scale supervision of body keypoints, and (2) a global regression to improve the structural consistency of keypoints. We refine keypoint detection heatmaps using layer-wise multi-scale supervision to better capture local contexts. Pose inference via keypoint association is optimized globally using a regression network at the end. Our method can effectively disambiguate keypoint matches in close proximity, including mismatches of left-right body parts, and better infer occluded parts. Experimental results show that our method achieves competitive performance among state-of-the-art methods on the MPII and FLIC datasets.
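
    Layer-wise multi-scale supervision amounts to attaching a loss to each intermediate heatmap prediction, resizing the ground truth to each prediction's resolution. A minimal sketch, assuming 4D (batch, keypoints, H, W) tensors:

```python
import torch.nn.functional as F

def multiscale_heatmap_loss(pred_heatmaps, gt_heatmap):
    # pred_heatmaps: list of keypoint heatmaps from intermediate layers,
    # possibly at different resolutions; each is supervised against the
    # ground truth resized to its own resolution.
    loss = 0.0
    for pred in pred_heatmaps:
        gt = F.interpolate(gt_heatmap, size=pred.shape[-2:], mode="bilinear",
                           align_corners=False)
        loss = loss + F.mse_loss(pred, gt)
    return loss / len(pred_heatmaps)
```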

    • [cs.CV]Non-locally Enhanced Encoder-Decoder Network for Single Image De-raining
    Guanbin Li, Xiang He, Wei Zhang, Huiyou Chang, Le Dong, Liang Lin
    http://arxiv.org/abs/1808.01491v1

    Single-image rain streak removal has recently witnessed substantial progress due to the development of deep convolutional neural networks. However, existing deep learning based methods either focus on the entrance and exit of the network, by decomposing the input image into high- and low-frequency information and employing residual learning to reduce the mapping range, or focus on introducing a cascaded learning scheme that decomposes the task of rain streak removal into multiple stages. These methods treat the convolutional neural network as an encapsulated end-to-end mapping module without delving into the rationality and superiority of the neural network design. In this paper, we investigate an effective end-to-end neural network structure for stronger feature expression and spatial correlation learning. Specifically, we propose a non-locally enhanced encoder-decoder network framework, which consists of a pooling-indices-embedded encoder-decoder network that efficiently learns increasingly abstract feature representations for more accurate rain streak modeling while preserving image detail. The proposed encoder-decoder framework is composed of a series of non-locally enhanced dense blocks designed not only to fully exploit hierarchical features from all the convolutional layers but also to capture long-distance dependencies and structural information. Extensive experiments on synthetic and real datasets demonstrate that the proposed method can effectively remove rain streaks from rainy images of various densities while preserving image details, achieving significant improvements over recent state-of-the-art methods.
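
    The non-local enhancement presumably follows the embedded-Gaussian non-local operation of Wang et al. (2018), in which every spatial position attends to every other position. A compact PyTorch sketch of such a block (channel sizes are illustrative; the paper embeds it inside dense blocks):

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    # Embedded-Gaussian non-local operation: each position attends to all
    # other positions in the feature map, with a residual connection.
    def __init__(self, ch):
        super().__init__()
        self.theta = nn.Conv2d(ch, ch // 2, 1)
        self.phi = nn.Conv2d(ch, ch // 2, 1)
        self.g = nn.Conv2d(ch, ch // 2, 1)
        self.out = nn.Conv2d(ch // 2, ch, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (b, hw, c/2)
        k = self.phi(x).flatten(2)                     # (b, c/2, hw)
        v = self.g(x).flatten(2).transpose(1, 2)       # (b, hw, c/2)
        attn = torch.softmax(q @ k, dim=-1)            # (b, hw, hw)
        y = (attn @ v).transpose(1, 2).reshape(b, c // 2, h, w)
        return x + self.out(y)                         # residual connection
```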

    • [cs.CV]Occlusions, Motion and Depth Boundaries with a Generic Network for Disparity, Optical Flow or Scene Flow Estimation
    Eddy Ilg, Tonmoy Saikia, Margret Keuper, Thomas Brox
    http://arxiv.org/abs/1808.01838v1

    Occlusions play an important role in disparity and optical flow estimation, since matching costs are not available in occluded areas and occlusions indicate depth or motion boundaries. Moreover, occlusions are relevant for motion segmentation and scene flow estimation. In this paper, we present an efficient learning-based approach to estimate occlusion areas jointly with disparities or optical flow. The estimated occlusions and motion boundaries clearly improve over the state-of-the-art. Moreover, we present networks with state-of-the-art performance on the popular KITTI benchmark and good generic performance. Making use of the estimated occlusions, we also show improved results on motion segmentation and scene flow estimation.

    • [cs.CV]Pixel-level Semantics Guided Image Colorization
    Jiaojiao Zhao, Li Liu, Cees G. M. Snoek, Jungong Han, Ling Shao
    http://arxiv.org/abs/1808.01597v1

    While many image colorization algorithms have recently shown the capability of producing plausible color versions from gray-scale photographs, they still suffer from the problems of context confusion and edge color bleeding. To address context confusion, we propose to incorporate pixel-level object semantics to guide the image colorization. The rationale is that human beings perceive and distinguish colors based on objects' semantic categories. We propose a hierarchical neural network with two branches: one branch learns what the object is, while the other learns the object's colors. The network jointly optimizes a semantic segmentation loss and a colorization loss. To attack edge color bleeding, we generate more continuous color maps with sharp edges by adopting a joint bilateral upsampling layer at inference. Our network is trained on PASCAL VOC2012 and COCO-stuff with semantic segmentation labels, and it produces more realistic and finer results than the colorization state of the art.

    • [cs.CV]Purely Geometric Scene Association and Retrieval - A Case for Macro Scale 3D Geometry
    Rahul Sawhney, Fuxin Li, Henrik I. Christensen, Charles L. Isbell
    http://arxiv.org/abs/1808.01343v1

    We address the problems of measuring geometric similarity between 3D scenes, represented through point clouds or range data frames, and associating them. Our approach leverages macro-scale 3D structural geometry - the relative configuration of arbitrary surfaces and the relationships among structures that are potentially far apart. We express such discriminative information in a viewpoint-invariant feature space. These features are subsequently encoded in a frame-level signature that can be utilized to measure geometric similarity. Such a characterization is robust to noise and to incomplete and partially overlapping data, as well as to viewpoint changes. We show how it can be employed to select a diverse set of data frames which have structurally similar content, and how to validate whether views with similar geometric content are from the same scene. The problem is formulated as one of general-purpose retrieval from an unannotated, spatio-temporally unordered database. Empirical analysis indicates that the presented approach thoroughly outperforms baselines on depth/range data. Its depth-only performance is competitive with state-of-the-art approaches with RGB or RGB-D inputs, including ones based on deep learning. Experiments show retrieval performance to hold up well with much sparser databases, which is indicative of the approach's robustness. The approach generalized well: it did not require dataset-specific training and scaled up in our experiments. Finally, we also demonstrate how a geometrically diverse selection of views can result in richer 3D reconstructions.

    • [cs.CV]Rethinking Pose in 3D: Multi-stage Refinement and Recovery for Markerless Motion Capture
    Denis Tome, Matteo Toso, Lourdes Agapito, Chris Russell
    http://arxiv.org/abs/1808.01525v1

    We propose a CNN-based approach for multi-camera markerless motion capture of the human body. Unlike existing methods that first perform pose estimation on individual cameras and generate 3D models as post-processing, our approach makes use of 3D reasoning throughout a multi-stage approach. This novelty allows us to use provisional 3D models of human pose to rethink where the joints should be located in the image and to recover from past mistakes. Our principled refinement of 3D human poses lets us make use of image cues, even from images where we previously misdetected joints, to refine our estimates as part of an end-to-end approach. Finally, we demonstrate how the high-quality output of our multi-camera setup can be used as an additional training source to improve the accuracy of existing single camera models.

    • [cs.CV]Self-Attention Recurrent Network for Saliency Detection
    Fengdong Sun, Wenhui Li, Yuanyuan Guan
    http://arxiv.org/abs/1808.01634v1

    Feature maps in deep neural networks generally contain different semantics. Existing methods often ignore these characteristics, which may lead to sub-optimal results. In this paper, we propose a novel end-to-end deep saliency network that can effectively utilize multi-scale feature maps according to their characteristics. Shallow layers often contain more local information, and deep layers have advantages in global semantics. Therefore, the network generates elaborate saliency maps by enhancing the local and global information of feature maps in different layers. On one hand, local information in shallow layers is enhanced by a recurrent structure that shares convolution kernels across time steps. On the other hand, global information in deep layers is utilized by a self-attention module, which generates different attention weights for salient objects and backgrounds, thus achieving better performance. Experimental results on four widely used datasets demonstrate that our method has advantages in performance over existing algorithms.

    • [cs.CV]Simultaneous Edge Alignment and Learning
    Zhiding Yu, Weiyang Liu, Yang Zou, Chen Feng, Srikumar Ramalingam, B. V. K. Vijaya Kumar, Jan Kautz
    http://arxiv.org/abs/1808.01992v1

    Edge detection is among the most fundamental vision problems, owing to its role in perceptual grouping and its wide applications. Recent advances in representation learning have led to considerable improvements in this area. Many state-of-the-art edge detection models are learned with fully convolutional networks (FCNs). However, FCN-based edge learning tends to be vulnerable to misaligned labels due to the delicate structure of edges. While this problem has been considered in evaluation benchmarks, a similar issue has not been explicitly addressed in general edge learning. In this paper, we show that label misalignment can considerably degrade edge learning quality, and we address this issue by proposing a simultaneous edge alignment and learning framework. To this end, we formulate a probabilistic model where edge alignment is treated as latent variable optimization and is learned end-to-end during network training. Experiments show several applications of this work, including improved edge detection with state-of-the-art performance, and automatic refinement of noisy annotations.

    • [cs.CV]Skin Lesion Diagnosis using Ensembles, Unscaled Multi-Crop Evaluation and Loss Weighting
    Nils Gessert, Thilo Sentker, Frederic Madesta, Rüdiger Schmitz, Helge Kniep, Ivo Baltruschat, René Werner, Alexander Schlaefer
    http://arxiv.org/abs/1808.01694v1

    In this paper we present the methods of our submission to the ISIC 2018 challenge for skin lesion diagnosis (Task 3). The dataset consists of 10,000 images with seven image-level classes to be distinguished by an automated algorithm. We employ an ensemble of convolutional neural networks for this task. In particular, we fine-tune pretrained state-of-the-art deep learning models such as DenseNet, SENet and ResNeXt. We identify heavy class imbalance as a key problem for this challenge and consider multiple balancing approaches such as loss weighting and balanced batch sampling. Another important feature of our pipeline is the use of a vast number of unscaled crops for evaluation. Lastly, we consider meta-learning approaches for the final predictions. Our team placed second at the challenge while being the best approach that used only publicly available data.
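
    Of the balancing approaches mentioned, loss weighting is the simplest to sketch: weight each class inversely to its frequency and hand the weights to the loss. The class counts below are placeholders, not the challenge's actual statistics.

```python
import numpy as np
import torch
import torch.nn as nn

# Inverse-frequency loss weighting for seven imbalanced classes
# (the per-class counts here are illustrative placeholders).
counts = np.array([6705, 1113, 1099, 514, 327, 142, 115], dtype=np.float64)
weights = counts.sum() / (len(counts) * counts)        # rare classes get larger weights
criterion = nn.CrossEntropyLoss(
    weight=torch.tensor(weights, dtype=torch.float32))
# `criterion(logits, labels)` is then used in place of the unweighted loss.
```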

    • [cs.CV]Spherical Harmonic Residual Network for Diffusion Signal Harmonization
    Simon Koppers, Luke Bloy, Jeffrey I. Berman, Chantal M. W. Tax, J. Christopher Edgar, Dorit Merhof
    http://arxiv.org/abs/1808.01595v1

    Diffusion imaging is an important method in the field of neuroscience, as it is sensitive to changes within the tissue microstructure of the human brain. However, a major challenge when using MRI to derive quantitative measures is that the use of different scanners, as in multi-site group studies, introduces measurement variability. This can lead to an increased variance in quantitative metrics, even if the same brain is scanned. Contrary to the assumption that these characteristics are comparable and similar, small changes in these values are observed in many clinical studies; hence harmonization of the signals is essential. In this paper, we present a method that does not require additional preprocessing, such as segmentation or registration, and harmonizes the signal based on a deep learning residual network. For this purpose, a training database is required, consisting of the same subjects scanned on different scanners. The results show that harmonized signals are significantly more similar to the ground truth signal compared to no harmonization, and also improve upon another deep learning method. The same effect is also demonstrated for commonly used metrics derived from the diffusion MRI signal.

    • [cs.CV]Structure-Aware Shape Synthesis
    Elena Balashova, Vivek Singh, Jiangping Wang, Brian Teixeira, Terrence Chen, Thomas Funkhouser
    http://arxiv.org/abs/1808.01427v1

    We propose a new procedure to guide the training of a data-driven shape generative model using a structure-aware loss function. Complex 3D shapes can often be summarized by a coarsely defined structure that is consistent and robust across a variety of observations. However, existing synthesis techniques do not account for structure during training, and thus often generate implausible and structurally unrealistic shapes. During training, we impose structural constraints in order to encourage consistency and structure across the entire manifold. We propose a novel methodology for training 3D generative models that incorporates structural information into an end-to-end training pipeline.

    • [cs.CV]T2Net: Synthetic-to-Realistic Translation for Solving Single-Image Depth Estimation Tasks
    Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai
    http://arxiv.org/abs/1808.01454v1

    Current methods for single-image depth estimation use training datasets with real image-depth pairs or stereo pairs, which are not easy to acquire. We propose a framework, trained on synthetic image-depth pairs and unpaired real images, that comprises an image translation network for enhancing realism of input images, followed by a depth prediction network. A key idea is having the first network act as a wide-spectrum input translator, taking in either synthetic or real images, and ideally producing minimally modified realistic images. This is done via a reconstruction loss when the training input is real, and GAN loss when synthetic, removing the need for heuristic self-regularization. The second network is trained on a task loss for synthetic image-depth pairs, with extra GAN loss to unify real and synthetic feature distributions. Importantly, the framework can be trained end-to-end, leading to good results, even surpassing early deep-learning methods that use real paired data.

    • [cs.CV]Teacher Guided Architecture Search
    Pouya Bashivan, Mark Tensen, James J DiCarlo
    http://arxiv.org/abs/1808.01405v1

    Strong improvements in network performance in vision tasks have resulted from the search for alternative network architectures, and prior work has shown that this search process can be automated and guided by evaluating candidate network performance following limited training (Performance Guided Architecture Search, or PGAS). However, because of the large architecture search spaces and the high computational cost associated with evaluating each candidate model, further gains in computational efficiency are needed. Here we present a method termed Teacher Guided Search for Architectures by Generation and Evaluation (TG-SAGE) that yields up to an order of magnitude improvement in search efficiency over PGAS methods. Specifically, TG-SAGE guides each step of the architecture search by evaluating the similarity between the internal representations of the candidate networks and those of the (fixed) teacher network. We show that this procedure leads to a significant reduction in the required per-sample training, and that this advantage holds for two different architecture search spaces and two different search algorithms. We further show that, in the space of convolutional cells for visual categorization, TG-SAGE finds a cell structure with performance similar to those previously found using other methods, but at a total computational cost that is two orders of magnitude lower than Neural Architecture Search (NAS) and more than four times lower than Progressive Neural Architecture Search (PNAS). These results suggest that TG-SAGE can be used to accelerate network architecture search whenever one has access to some or all of the internal representations of a teacher network of interest, such as the brain.
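
    The abstract does not name the representational similarity measure, so the sketch below uses linear CKA purely as a stand-in; any score comparing candidate activations with teacher (or neural) recordings could take its place in the search loop.

```python
import numpy as np

def linear_cka(X, Y):
    # Linear Centered Kernel Alignment between two activation matrices of
    # shape (samples, features); a common representational-similarity score.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

def teacher_guided_score(candidate_acts, teacher_acts):
    # Average similarity across matched layers/recording sites; candidates
    # whose representations are closer to the teacher's are explored first.
    return float(np.mean([linear_cka(c, t)
                          for c, t in zip(candidate_acts, teacher_acts)]))
```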

    • [cs.CV]Too many secants: a hierarchical approach to secant-based dimensionality reduction on large data sets
    Henry Kvinge, Elin Farnell, Michael Kirby, Chris Peterson
    http://arxiv.org/abs/1808.01686v1

    A fundamental question in many data analysis settings is the problem of discerning the "natural" dimension of a data set. That is, when a data set is drawn from a manifold (possibly with noise), a meaningful aspect of the data is the dimension of that manifold. Various approaches exist for estimating this dimension, such as the method of Secant-Avoidance Projection (SAP). Intuitively, the SAP algorithm seeks to determine a projection which best preserves the lengths of all secants between points in a data set; by applying the algorithm to find the best projections to vector spaces of various dimensions, one may infer the dimension of the manifold of origination. That is, one may learn the dimension at which it is possible to construct a diffeomorphic copy of the data in a lower-dimensional Euclidean space. Using Whitney's embedding theorem, we can relate this information to the natural dimension of the data. A drawback of the SAP algorithm is that a data set with T points has O(T^2) secants, making the computation and storage of all secants infeasible for very large data sets. In this paper, we propose a novel algorithm that generalizes the SAP algorithm with an emphasis on addressing this issue. That is, we propose a hierarchical secant-based dimensionality-reduction method, which can be employed for data sets where explicitly calculating all secants is not feasible.
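
    The O(T^2) blow-up is easy to see in code: the sketch below materializes all normalized secants of a point cloud, which is exactly the step that becomes infeasible for large T and that the proposed hierarchical method avoids.

```python
import numpy as np

def unit_secants(points):
    # All normalized secants between pairs of points in a (T, d) array.
    # There are T * (T - 1) / 2 of them, i.e. O(T^2) -- the bottleneck
    # motivating the hierarchical approach.
    T = len(points)
    i, j = np.triu_indices(T, k=1)
    sec = points[i] - points[j]
    return sec / np.linalg.norm(sec, axis=1, keepdims=True)
```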

    • [cs.CV]Towards Closing the Gap in Weakly Supervised Semantic Segmentation with DCNNs: Combining Local and Global Models
    Christoph Mayer, Radu Timofte, Grégory Paul
    http://arxiv.org/abs/1808.01625v1

    Generating training sets for deep convolutional neural networks is a bottleneck for modern real-world applications. This is a demanding task for applications where annotating training data is costly, such as semantic segmentation. In the literature, there is still a gap between the performance achieved by a network trained on full and on weak annotations. In this paper, we establish a strategy to measure this gap and to identify the ingredients necessary to close it. On scribbles, we establish state-of-the-art results comparable to the latest published ones (Tang et al., 2018, arXiv:1804.01346): we obtain a gap in mIoU of 2.4% without CRF (2.8% in Tang et al., 2018), and 2.9% with CRF post-processing (2.3% in Tang et al., 2018). However, we use completely different ideas: combining local and global annotator models and regularising their predictions to train DeepLabV2. Finally, closing the gap was reported only recently for bounding boxes in Khoreva et al. (arXiv:1603.07485v2), at the cost of requiring 10x more training images. By simulating varying amounts of pixel-level annotations that respect the statistics of human scribble annotations, we show that our training strategy reacts to small increases in the amount of annotations and requires only 2-5x more annotated pixels, closing the gap with only 3.1% of all pixels annotated. This work contributes new ideas towards closing the gap in real-world applications.

    • [cs.CV]Tracklet Association Tracker: An End-to-End Learning-based Association Approach for Multi-Object Tracking
    Han Shen, Lichao Huang, Chang Huang, Wei Xu
    http://arxiv.org/abs/1808.01562v1

    Traditional multiple object tracking methods divide the task into two parts: affinity learning and data association. This separation requires defining a hand-crafted training goal for the affinity learning stage and a hand-crafted cost function for the data association stage, which prevents the tracking objectives from being learned directly from features. In this paper, we present a new multiple object tracking (MOT) framework with a data-driven association method, named the Tracklet Association Tracker (TAT). The framework aims at gluing feature learning and data association into a unity via a bi-level optimization formulation, so that the association results can be learned directly from features. To boost performance, we also adopt the popular hierarchical association and perform the necessary alignment and selection of raw detection responses. Our model trains over 20X faster than a similar approach and achieves state-of-the-art performance on both the MOT2016 and MOT2017 benchmarks.

    • [cs.CV]Traits & Transferability of Adversarial Examples against Instance Segmentation & Object Detection
    Raghav Gurbaxani, Shivank Mishra
    http://arxiv.org/abs/1808.01452v1

    Despite recent advancements in deploying neural networks for image classification, it has been found that adversarial examples are able to fool these models, leading them to misclassify images. Since these models are now widely deployed, we provide insight into the threat posed by these adversarial examples by evaluating their characteristics and their transferability to more complex models that utilize image classification as a subtask. We demonstrate the ineffectiveness of adversarial examples when applied to instance segmentation and object detection models. We show that this ineffectiveness arises from the inability of adversarial examples to withstand transformations such as scaling or a change in lighting conditions. Moreover, we show that there exists a small threshold below which the adversarial property is retained while applying these input transformations. Additionally, these attacks demonstrate weak cross-network transferability across neural network architectures, e.g., VGG16 and ResNet50; however, an attack may fool both networks if it is passed sequentially through both during its formation. The lack of scalability and transferability raises the question of how effective adversarial images would be in the real world.

    • [cs.CV]Video Re-localization
    Yang Feng, Lin Ma, Wei Liu, Tong Zhang, Jiebo Luo
    http://arxiv.org/abs/1808.01575v1

    Many methods have been developed to help people find the video content they want efficiently. However, there are still some unsolved problems in this area. For example, given a query video and a reference video, how can we accurately localize a segment in the reference video that semantically corresponds to the query video? We define a distinctively new task, namely video re-localization, to address this scenario. Video re-localization is an important emerging technology with many applications, such as fast seeking in videos, video copy detection, and video surveillance. Meanwhile, it is also a challenging research task because the visual appearance of a semantic concept in videos can vary greatly. The first hurdle to clear for the video re-localization task is the lack of existing datasets. It is labor-intensive to collect pairs of videos with semantic coherence or correspondence and to label the corresponding segments. We first exploit and reorganize the videos in ActivityNet to form a new dataset for video re-localization research, which consists of about 10,000 videos of diverse visual appearances associated with localized boundary information. Subsequently, we propose an innovative cross gated bilinear matching model in which every time step in the reference video is matched against the attentively weighted query video. Consequently, the prediction of the starting and ending times is formulated as a classification problem based on the matching results. Extensive experimental results show that the proposed method outperforms the competing methods. Our code is available at: https://github.com/fengyang0317/video_reloc.

    • [cs.CV]Visual Question Generation for Class Acquisition of Unknown Objects
    Kohei Uehara, Antonio Tejero-De-Pablos, Yoshitaka Ushiku, Tatsuya Harada
    http://arxiv.org/abs/1808.01821v1

    Traditional image recognition methods only consider objects belonging to already learned classes. However, since training a recognition model with every object class in the world is unfeasible, a way of getting information about unknown objects (i.e., objects whose class has not been learned) is necessary. One way for an image recognition system to learn new classes is to ask a human about objects that are unknown. In this paper, we propose a method for generating questions about unknown objects in an image as a means of obtaining information about classes that have not been learned. Our method consists of a module for proposing objects, a module for identifying unknown objects, and a module for generating questions about them. Experimental results via human evaluation show that our method can successfully obtain information about unknown objects in an image dataset. Our code and dataset are available at https://github.com/mil-tokyo/vqg-unknown.

    • [cs.CY]On Robot Revolution and Taxation
    Tshilidzi Marwala
    http://arxiv.org/abs/1808.01666v1

    Advances in artificial intelligence are resulting in the rapid automation of the work force, and the tools that are used to automate are called robots. Bill Gates proposed that, in order to deal with the loss of jobs and the reduction of tax revenue, we ought to tax robots. The problem with taxing robots is that it is not easy to define what a robot is. This article studies the definition of a robot and the implications of advances in robotics for taxation. It shows that establishing what is and is not a robot is a difficult task, and concludes that taxing robots is the same as increasing corporate tax.

    • [cs.CY]Predicting Learning Status in MOOCs using LSTM
    Zhemin Liu, Feng Xiong, Kaifa Zou, Hongzhi Wang
    http://arxiv.org/abs/1808.01616v1

    The real-time and open online course resources of MOOCs have attracted a large number of learners in recent years. However, new questions have emerged about the high dropout rate of learners. For a MOOC platform, predicting the learning status of learners in real time with high accuracy is a crucial task, and it also helps improve the quality of MOOC teaching. The prediction task in this paper is inherently a time series prediction problem and can be treated as a time series classification problem; hence this paper proposes a prediction model based on LSTM recurrent neural networks and optimization techniques, which can be used to predict learners' learning status. Using datasets provided by Chinese University MOOCs as the input to the model, the average accuracy of the model's outputs was about 90%.
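
    A minimal sketch of such an LSTM classifier over weekly activity sequences is shown below; the feature dimension, hidden size and the binary status labels are assumptions, as the abstract does not specify them.

```python
import torch
import torch.nn as nn

class LearnerStatusLSTM(nn.Module):
    # Sketch: classify a learner's activity sequence (e.g. videos watched,
    # quizzes attempted per week) into learning-status labels.
    def __init__(self, n_features=8, hidden=64, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):            # x: (batch, weeks, n_features)
        _, (h, _) = self.lstm(x)
        return self.head(h[-1])      # logits from the final hidden state
```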

    • [cs.CY]Where The Light Gets In: Analyzing Web Censorship Mechanisms in India
    Tarun Kumar Yadav, Akshat Sinha, Devashish Gosain, Piyush Sharma, Sambuddho Chakravarty
    http://arxiv.org/abs/1808.01708v1

    This paper presents a detailed study of Internet censorship in India. We consolidated a list of potentially blocked websites from various public sources to assess the censorship mechanisms used by nine major ISPs. To begin with, we demonstrate that existing censorship detection tools like OONI are grossly inaccurate. We thus developed various techniques and heuristics to correctly assess censorship and study the underlying mechanisms involved in these ISPs. At every step we corroborated our findings manually to test the efficacy of our approach, a step largely ignored by others. We fortify our findings by adjudging the coverage and consistency of the censorship infrastructure, broadly in terms of the average number of network paths and requested domains the infrastructure surveils. Our results indicate a clear disparity among the ISPs in how they install censorship infrastructure. For instance, in the Idea network we observed censorious middleboxes on over 90% of our tested intra-AS paths, whereas for Vodafone it is as low as 2.5%. We conclude our research by devising our own novel anti-censorship strategies that do not depend on third-party tools (such as proxies, Tor, VPNs, etc.). We managed to circumvent the blocking of all blocked websites in all ISPs under test.

    • [cs.DC]Edge Based Data-Driven Pipelines (Technical Report)
    Eduard Gibert Renart, Daniel Balouek-Thomert, Manish Parashar
    http://arxiv.org/abs/1808.01353v1

    This report investigates an edge on-device stream processing platform, which extends the serverless computing model to the edge to help facilitate real-time data analytics across the cloud and edge in a uniform manner. We investigate associated use cases and architectural design. We deployed and tested our system on edge devices (a Raspberry Pi and an Android phone), demonstrating that stream processing analytics can be performed at the edge of the network with single-board computers in real time.

    • [cs.DC]Rapido: A Layer2 Payment System for Decentralized Currencies
    Changting Lin, Ming Ma, Xun Wang, Zhenguang Liu, Jianhai Chen, Shouling Ji
    http://arxiv.org/abs/1808.01561v1

    The Bitcoin blockchain faces the Bitcoin scalability problem: Bitcoin's blocks contain the transactions on the Bitcoin network, and the network's on-chain transaction processing capacity is limited by the average block creation time of 10 minutes and the block size limit. These jointly constrain the network's throughput; the maximum transaction processing capacity is estimated between 3.3 and 7 transactions per second (TPS). A Layer2 network, named the Lightning Network (LN), has been proposed and activated to address this problem. LN operates on top of the Bitcoin network as a cache, allowing payments to be effected without being immediately put on the blockchain. However, it also brings some drawbacks. In this paper, we observe a specific payment issue in the current LN, which requires additional claims to the blockchain and is time-consuming; we call this the shares issue. We therefore propose Rapido to explicitly address the shares issue, together with a new smart contract, D-HTLC, which serves as its payment protocol. We finally provide a proof-of-concept implementation and simulation of both Rapido and LN, in which Rapido not only mitigates the shares issue but also mitigates the skewness issue, and is thus shown to be more applicable to various transactions than LN.

    • [cs.HC]Kid on The Phone! Toward Automatic Detection of Children on Mobile Devices
    Toan Nguyen, Aditi Roy, Nasir Memon
    http://arxiv.org/abs/1808.01680v1

    Studies have shown that children can be exposed to smart devices at a very early age. This has important implications for research in child-computer interaction, children's online safety and early education. Many systems have been built based on such research. In this work, we present multiple techniques to automatically detect the presence of a child on a smart device, which could be used as the first step in such systems. Our methods distinguish children from adults based on behavioral differences while operating a touch-enabled modern computing device. Behavioral differences are extracted from data recorded by the touchscreen and built-in sensors. To evaluate the effectiveness of the proposed methods, a new dataset was created from 50 children and adults who interacted with off-the-shelf applications on smartphones. Results show that it is possible to achieve 99% accuracy and less than 0.5% error rate after 8 consecutive touch gestures using only touch information, or with 5 seconds of sensor readings. If information from multiple sensors is used, similar performance can be achieved after only 3 gestures.

    • [cs.IR]Automated Extraction of Personal Knowledge from Smartphone Push Notifications
    Yuanchun Li, Ziyue Yang, Yao Guo, Xiangqun Chen, Yuvraj Agarwal, Jason Hong
    http://arxiv.org/abs/1808.02013v1

    Personalized services need a rich and powerful personal knowledge base, i.e., a knowledge base containing information about the user. This paper proposes an approach to extracting personal knowledge from smartphone push notifications, which are used by mobile systems and apps to inform users of a rich range of information. Our solution is based on the insight that most notifications are formatted using templates, while knowledge entities can usually be found within the parameters of the templates. As defining all the notification templates and their semantic rules is impractical due to the huge number of notification templates used by potentially millions of apps, we propose an automated approach for personal knowledge extraction from push notifications. We first discover notification templates through pattern mining, then use machine learning to understand the template semantics. Based on the templates and their semantics, we are able to translate notification text into knowledge facts automatically. Users' privacy is preserved as we only need to upload the templates to the server for model training, and the templates do not contain any personal information. In our experiments with about 120 million push notifications from 100,000 smartphone users, our system was able to extract personal knowledge accurately and efficiently.
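
    As a toy illustration of template discovery (far cruder than the paper's pattern mining), one can mask likely parameter values and keep masked strings that recur across many notifications:

```python
import re
from collections import Counter

def mine_templates(notifications, min_support=5):
    # Crude sketch: mask likely parameter values (here, just digit runs)
    # and keep masked strings that recur often enough to look like
    # templates. The real system mines patterns far more carefully.
    masked = [re.sub(r"\d+", "<NUM>", text) for text in notifications]
    return [t for t, c in Counter(masked).items() if c >= min_support]

# Example: "Your parcel 1234 has arrived" and "Your parcel 98 has arrived"
# both collapse to the template "Your parcel <NUM> has arrived".
```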

    • [cs.IR]Evaluating Wikipedia as a source of information for disease understanding
    Eduardo P. Garcia del Valle, Gerardo Lagunes Garcia, Lucia Prieto Santamaria, Massimiliano Zanin, Alejandro Rodriguez-Gonzalez, Ernestina Menasalvas Ruiz
    http://arxiv.org/abs/1808.01459v1

    The increasing availability of biological data is improving our understanding of diseases and providing new insight into their underlying relationships. Thanks to the improvements on both text mining techniques and computational capacity, the combination of biological data with semantic information obtained from medical publications has proven to be a very promising path. However, the limitations in the access to these data and their lack of structure pose challenges to this approach. In this document we propose the use of Wikipedia - the free online encyclopedia - as a source of accessible textual information for disease understanding research. To check its validity, we compare its performance in the determination of relationships between diseases with that of PubMed, one of the most consulted data sources of medical texts. The obtained results suggest that the information extracted from Wikipedia is as relevant as that obtained from PubMed abstracts (i.e. the free access portion of its articles), although further research is proposed to verify its reliability for medical studies.

    • [cs.IT]A Blockchain Example for Cooperative Interference Management
    Aly El Gamal, Hesham El Gamal
    http://arxiv.org/abs/1808.01538v1

    We present an example where a distributed coordinated protocol supported by a blockchain-enabled monetary mechanism achieves optimal information-theoretic degrees-of-freedom gains. The considered setting is that of a linear interference network, where cooperative transmission is allowed, but at no cost in terms of the overall backhaul load; in other words, the average number of messages assigned to a transmitter is one. We show that a simple monetary mechanism consisting of only one coin type can enable the achievability of the optimal centralized solution. The proposed greedy distributed algorithm relies on incentivizing the users to share their resources in one channel use, in return for credit they receive for maximizing their rate gains in future channel uses. This example is the first of its class, and it opens the door to constructing a unified framework for blockchain-enabled monetary mechanisms for optimal interference management and spectrum sharing.

    • [cs.IT]A Flip-Syndrome-List Polar Decoder Architecture for Ultra-Low-Latency Communications
    Huazi Zhang, Jiajie Tong, Rong Li, Pengcheng Qiu, Yourui Huangfu, Chen Xu, Xianbin Wang, Jun Wang
    http://arxiv.org/abs/1808.01756v1

    We consider the practical hardware implementation of polar decoders. To reduce the latency due to the serial nature of successive cancellation (SC), existing optimizations improve parallelism with two approaches, i.e., multi-bit decision or reduced path splitting. In this paper, we combine the two procedures into one with an error-pattern-based architecture. It simultaneously generates a set of candidate paths for multiple bits using pre-stored patterns. For rate-1 (R1) or single parity-check (SPC) nodes, we prove that a small number of deterministic patterns are required to guarantee performance preservation. For general nodes, low-weight error patterns are indexed by syndrome in a look-up table and retrieved in O(1) time. The proposed flip-syndrome-list (FSL) decoder fully parallelizes all constituent code blocks without sacrificing performance, and is thus suitable for ultra-low-latency applications. Meanwhile, two code construction optimizations are presented to further reduce complexity and improve performance, respectively.

    • [cs.IT]Designing molecular circuit for approximate maximum a posteriori demodulation of concentration modulated signals
    Chun Tung Chou
    http://arxiv.org/abs/1808.01543v1

    Motivated by the fact that living cells use molecular circuits (i.e., a set of chemical reactions) for information processing, this paper investigates the problem of designing molecular circuits for demodulation. In our earlier work, we used a Markovian approach to derive a demodulator for diffusion-based molecular communication. The demodulation filters take the form of an ordinary differential equation which computes the log-posterior probability of observing a transmission symbol. This work considers the realisation of these demodulation filters using molecular circuits, assuming the transmission symbols are rectangular pulses of the same duration but different amplitudes, i.e., concentration modulation. This paper makes a number of contributions. First, we use time-scale separation and renewal theory to analytically derive an approximation of the demodulation filter from our earlier work. Second, we present a method to turn this approximation into a molecular circuit. Using simulation, we show that the output of the derived molecular circuit is approximately equal to the log-posterior probability calculated by the exact demodulation filter whenever the log-posterior probability is positive. Third, we demonstrate that a biochemical circuit in yeast behaves similarly to the derived molecular circuit and is therefore a candidate for implementing it.

    • [cs.IT]Energy-Age Tradeoff in Status Update Communication Systems with Retransmission
    Jie Gong, Xiang Chen, Xiao Ma
    http://arxiv.org/abs/1808.01720v1

    Age-of-information is a novel performance metric in communication systems that indicates the freshness of the most recently received data, with wide applications in monitoring and control scenarios. Another important performance metric in these applications is energy consumption, since monitors or sensors are usually energy constrained. In this paper, we study the energy-age tradeoff in a status update system where data transmission from a source to a receiver may encounter failures due to channel errors. As the status sensing process consumes energy, when a transmission failure happens the source may either retransmit the existing data to save sensing energy, or sense and transmit a new update to minimize age-of-information. A threshold-based retransmission policy is considered, where each update is allowed to be transmitted no more than M times. Closed-form expressions for the average age-of-information and energy consumption are derived as functions of the channel failure probability and the maximum number of retransmissions M. Numerical simulations validate our analytical results and illustrate the tradeoff between average age-of-information and energy consumption.
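
    The tradeoff is easy to reproduce with a toy simulation of the threshold-M policy. The slotted model and unit energy costs below are our own simplifying assumptions, not the paper's analytical setup:

```python
import random

def simulate(p_fail, M, n_slots=200_000, seed=1):
    # Toy slotted model: each slot is one transmission attempt of unit
    # duration. Sensing a fresh update costs E_SENSE; every attempt costs
    # E_TX. On success, the receiver's age resets to the number of slots
    # the delivered update has been in the system.
    E_SENSE, E_TX = 1.0, 1.0          # illustrative energy units
    rng = random.Random(seed)
    age, attempts = 0, 0              # attempts == 0 -> sense a new update
    age_sum = energy = 0.0
    for _ in range(n_slots):
        if attempts == 0:
            energy += E_SENSE         # sense a fresh status update
        energy += E_TX
        attempts += 1
        age += 1
        if rng.random() > p_fail:     # delivery succeeded
            age = attempts            # delivered update is `attempts` slots old
            attempts = 0
        elif attempts >= M:           # threshold reached: discard, re-sense
            attempts = 0
        age_sum += age
    return age_sum / n_slots, energy / n_slots

# Sweeping M exposes the tradeoff: larger M saves sensing energy but
# lets staler updates through when the channel is bad.
for M in (1, 2, 4, 8):
    print(M, simulate(p_fail=0.4, M=M))
```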

    • [cs.IT]Fundamentals of Simultaneous Wireless Information and Power Transmission in Heterogeneous Networks: A Cell Load Perspective
    Chun-Hung Liu, Chi-Sheng Hsu
    http://arxiv.org/abs/1808.01323v1

    In a heterogeneous cellular network (HetNet) consisting of multiple different types (tiers) of base stations (BSs), the void cell event in which a BS does not have any users has been shown to exist due to user-centric BS association and its probability is dominated by the cell load of each tier. Such a void cell phenomenon has not been well characterized in the modeling and analytical framework of simultaneous wireless information and power transmission (SWIPT) in a HetNet. This paper aims to accurately exploit the fundamental performance limits of the SWIPT between a BS and its user by modeling the cell-load impact on the downlink and uplink transmissions of each BS. We first characterize the power-splitting receiver architecture at a user and analyze the statistical properties and limits of its harvested power and energy, which reveals how much of the average energy can be harvested by users and how likely the self-powered sustainability of users can be achieved. We then derive the downlink and uplink rates that characterize the cell-load and user association effects and use them to define the energy efficiency of a user. The optimality of the energy efficiency is investigated, which maximizes the SWIPT performance of the receiver architecture for different user association and network deployment scenarios.

    • [cs.IT]GLSE Precoders for Massive MIMO Systems: Analysis and Applications
    Ali Bereyhi, Mohammad Ali Sedaghat, Ralf R. Müller, Georg Fischer
    http://arxiv.org/abs/1808.01880v1

    This paper proposes the class of Generalized Least-Square-Error (GLSE) precoders for multiuser massive MIMO systems. For a generic transmit constellation, GLSE precoders minimize the interference at user terminals assuring that given constraints on the transmit signals are satisfied. The general form of these precoders enables us to impose multiple restrictions at the transmit signal such as limited peak power and restricted number of active transmit antennas. Invoking the replica method from statistical mechanics, we study the performance of GLSE precoders in the large-system limit. We show that the output symbols of these precoders are identically distributed and their statistics are described with an equivalent scalar GLSE precoder. Using the asymptotic results, we further address some applications of the GLSE precoders; namely forming transmit signals over a restricted alphabet and transmit antenna selection. Our investigations demonstrate that a computationally efficient GLSE precoder requires 41\% fewer active transmit antennas than conventional selection protocols in order to achieve a desired level of input-output distortion.

    • [cs.IT]Improper Signaling versus Time-Sharing in the Two-User Gaussian Interference Channel with TIN
    Christoph Hellings, Wolfgang Utschick
    http://arxiv.org/abs/1808.01611v1

    So-called improper complex signals have been shown to be beneficial in the single-antenna two-user Gaussian interference channel under the assumptions that all input signals are Gaussian and that we treat interference as noise (TIN). This result has been obtained under a restriction to pure strategies without time-sharing, and it was extended to the case where the rates, but not the transmit powers, may be averaged over several transmit strategies. In this paper, we drop such restrictions and discuss the most general case of time-sharing where both the rates and the powers may be averaged. Since this information theoretic notion of time-sharing cannot be expressed by means of a convex hull of the rate region, we have to account for the possibility of time-sharing already during the optimization of the transmit strategy. By studying the properties of the resulting optimization problem using Lagrange duality, we obtain a surprising result: proper signals can be proven to be optimal if time-sharing is allowed.

    • [cs.IT]Linearly Precoded Rate Splitting: Optimality and Non-Optimality for MIMO Broadcast Channels
    Zheng Li, Sheng Yang, Shlomo Shamai
    http://arxiv.org/abs/1808.01810v1

    In this paper, we consider a general K-user multiple-input multiple-output (MIMO) broadcast channel (BC). We assume that the channel state is deterministic and known to all the nodes. While the capacity region is well known to be achievable with dirty paper coding (DPC), we are interested in the simpler linearly precoded transmission schemes. First, using a simple two-user example, we show that any linear precoding scheme with only private streams can have an unbounded gap to the sum capacity of the channel. Second, we propose a rate-splitting (RS) scheme with minimum mean square error (MMSE) precoding, and demonstrate that the proposed scheme achieves the whole capacity region to within a constant gap in the two-user case. Third, we prove that the proposed scheme does not enjoy the same optimality in the three-user case, which shows the non-optimality of the proposed RS scheme in general. Through a simple pathological example, our study reveals a fundamental gap between the transmitter-side and the receiver-side interference mitigation.

    • [cs.IT]Millimeter Wave Location-Based Beamforming using Compressive Sensing
    Ahmed Abdelreheem, Ehab Mahmoud Mohamed, Hamada Esmaiel
    http://arxiv.org/abs/1808.01512v1

    This paper develops a location-based analog beamforming (BF) technique using compressive sensing (CS) that is feasible for millimeter wave (mmWave) wireless communication systems. The proposed scheme exploits the benefits of CS and localization to reduce mmWave BF complexity and enhance its performance compared with conventional mmWave analog BF techniques. CS theory is used to exploit the sparse nature of the mmWave propagation channel to estimate both the angles of departure (AoDs) and the angles of arrival (AoAs) of the mmWave channel, and knowing the node location effectively reduces the number of BF vectors required for constructing the sensing matrix. Hence, a highly accurate mmWave BF with a low set-up time can be obtained. Simulation analysis confirms the high effectiveness of the proposed mmWave BF technique compared to the conventional exhaustive search BF and the CS-based BF without localization using random measurements.

    • [cs.IT]Model-Aided Wireless Artificial Intelligence: Embedding Expert Knowledge in Deep Neural Networks Towards Wireless Systems Optimization
    Alessio Zappone, Marco Di Renzo, Mérouane Debbah, Thanh Tu Lam, Xuewen Qian
    http://arxiv.org/abs/1808.01672v1

    Deep learning based on artificial neural networks is a powerful machine learning method that, in the last few years, has been successfully used to realize tasks, e.g., image classification, speech recognition, translation of languages, etc., that are usually simple to execute by human beings but extremely difficult to perform by machines. This is one of the reasons why deep learning is considered to be one of the main enablers to realize the notion of artificial intelligence. The current methodology in deep learning methods consists of employing a data-driven approach in order to identify the best architecture of an artificial neural network that allows one to fit input-output data pairs. Once the artificial neural network is trained, it is capable of responding to never-observed inputs by providing the optimum output based on past acquired knowledge. In this context, a recent trend in the deep learning community is to complement pure data-driven approaches with prior information based on expert knowledge. This work describes two methods that implement this strategy in the context of wireless communications, also providing specific case-studies to assess the performance compared to pure data-driven approaches.

    • [cs.IT]New Viewpoint and Algorithms for Water-Filling Solutions in Wireless Communications
    Chengwen Xing, Yindi Jing, Shuai Wang, Shaodan Ma, H. Vincent Poor
    http://arxiv.org/abs/1808.01707v1

    Water-filling solutions play an important role in the designs for wireless communications, e.g., transmit covariance matrix design. A traditional physical understanding is to use the analogy of pouring water over a pool with a fluctuating bottom. Numerous variants of water-filling solutions have been discovered during the evolution of wireless networks. To obtain the solution values, iterative computations are required, even for simple cases with compact mathematical formulations. Thus, algorithm design is a key issue for the practical use of water-filling solutions, which however has been given marginal attention in the literature. Many existing algorithms are designed on a case-by-case basis for the variations of water-filling solutions and/or with overly complex logic. In this paper, a new viewpoint for water-filling solutions is proposed to understand the problem dynamically by considering changes in the increasing rates on different subchannels. This fresh viewpoint provides a useful mechanism and fundamental information for finding the optimal solution values. Based on the new understanding, a novel and comprehensive method for practical water-filling algorithm design is proposed, which can be used for systems with various performance metrics and power constraints, even for systems with imperfect channel state information.
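
    For context, the textbook bisection routine below computes the classic water-filling allocation (the gains and power budget are toy values); it is the kind of iterative baseline the paper's rate-increase viewpoint aims to streamline, not the authors' proposed method.

```python
import numpy as np

def waterfill(g, P, tol=1e-9):
    """Allocate total power P over subchannels with gains g by water-filling:
    p_i = max(mu - 1/g_i, 0), with the water level mu set so sum(p) = P."""
    g = np.asarray(g, dtype=float)
    lo, hi = 0.0, P + (1.0 / g).max()       # the water level mu lies in [lo, hi]
    while hi - lo > tol:
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - 1.0 / g, 0.0).sum() > P:
            hi = mu                          # too much water poured: lower mu
        else:
            lo = mu
    return np.maximum(0.5 * (lo + hi) - 1.0 / g, 0.0)

p = waterfill([2.0, 1.0, 0.2], P=1.0)
print(p, p.sum())                            # stronger subchannels get more power
```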

    • [cs.IT]On Lipschitz Bounds of General Convolutional Neural Networks
    Dongmian Zou, Radu Balan, Maneesh Singh
    http://arxiv.org/abs/1808.01415v1

    Many convolutional neural networks (CNNs) have a feed-forward structure. In this paper, a linear program that estimates the Lipschitz bound of such CNNs is proposed. Several CNNs, including the scattering networks, the AlexNet and the GoogleNet, are studied numerically and compared to the theoretical bounds. Next, concentration inequalities of the output distribution to a stationary random input signal expressed in terms of the Lipschitz bound are established. The Lipschitz bound is further used to establish a nonlinear discriminant analysis designed to measure the separation between features of different classes.
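
    For orientation, a standard crude upper bound for a feed-forward network with 1-Lipschitz activations is the product of the layers' spectral norms; the sketch below computes that baseline on random toy weights. The paper's linear program is designed to be tighter than this product bound.

```python
import numpy as np

def naive_lipschitz_bound(weights):
    """Product of spectral norms: a crude Lipschitz upper bound for a
    feed-forward net whose activations are 1-Lipschitz (e.g., ReLU)."""
    bound = 1.0
    for W in weights:
        bound *= np.linalg.norm(W, 2)        # largest singular value of W
    return bound

rng = np.random.default_rng(0)
layers = [rng.normal(size=(64, 32)), rng.normal(size=(32, 10))]
print(naive_lipschitz_bound(layers))
```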

    • [cs.IT]On the Duality and File Size Hierarchy of Fractional Repetition Codes
    Bing Zhu, Kenneth W. Shum, Hui Li
    http://arxiv.org/abs/1808.01933v1

    Distributed storage systems that deploy erasure codes can provide better features such as lower storage overhead and higher data reliability. In this paper, we focus on fractional repetition (FR) codes, which are a class of storage codes characterized by the features of uncoded exact repair and minimum repair bandwidth. We study the duality of FR codes, and investigate the relationship between the supported file size of an FR code and its dual code. Based on the established relationship, we derive an improved dual bound on the supported file size of FR codes. We further show that FR codes constructed from t-designs are optimal when the size of the stored file is sufficiently large. Moreover, we present the tensor product technique for combining FR codes, and elaborate on the file size hierarchy of resulting codes.

    • [cs.IT]On the Optimality of the Kautz-Singleton Construction in Probabilistic Group Testing
    Huseyin A. Inan, Peter Kairouz, Mary Wootters, Ayfer Ozgur
    http://arxiv.org/abs/1808.01457v1

    We consider the probabilistic group testing problem where d random defective items in a large population of N items are identified with high probability by applying binary tests. It is known that \Theta(d \log N) tests are necessary and sufficient to recover the defective set with vanishing probability of error. However, to the best of our knowledge, there is no explicit (deterministic) construction achieving \Theta(d \log N) tests in general. In this work, we show that a famous construction introduced by Kautz and Singleton for the combinatorial group testing problem (which is known to be suboptimal for combinatorial group testing for moderate values of d) achieves the order optimal \Theta(d \log N) tests in the probabilistic group testing problem. This provides the first strongly explicit construction achieving the order optimal result in the probabilistic group testing setting. To prove the order-optimality of Kautz and Singleton's construction in the probabilistic setting, we provide a novel analysis of the probability of a non-defective item being covered by a random defective set directly, rather than arguing from combinatorial properties of the underlying code, which has been the main approach in the literature. Furthermore, we use a recursive technique to convert this construction into one that can also be efficiently decoded with only a log-log factor increase in the number of tests.
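
    To make the construction concrete, the sketch below builds the Kautz-Singleton test matrix for a prime field size q: items are degree-<k polynomials over F_q, their Reed-Solomon codewords are the evaluations at all q field points, and each q-ary symbol is expanded into a one-hot block. The sizes are toy assumptions and the decoding step is omitted.

```python
import itertools
import numpy as np

def kautz_singleton(q, k):
    """Return a (q*q) x (q**k) binary test matrix; q must be prime."""
    items = list(itertools.product(range(q), repeat=k))   # polynomial coefficients
    M = np.zeros((q * q, len(items)), dtype=np.uint8)
    for col, coeffs in enumerate(items):
        for pos in range(q):                               # evaluation point
            val = sum(c * pos**e for e, c in enumerate(coeffs)) % q
            M[pos * q + val, col] = 1                      # row = test (pos, val)
    return M

M = kautz_singleton(q=5, k=2)      # 25 tests for 25 items (toy sizes)
print(M.shape, M.sum(axis=0))      # every item participates in exactly q tests
```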

    • [cs.IT]Robust Secrecy Energy Efficient Beamforming in MISOME-SWIPT Systems With Proportional Fairness
    Yanjie Dong, Md. Jahangir Hossain, Julian Cheng, Victor C. M. Leung
    http://arxiv.org/abs/1808.02004v1

    The joint design of the beamforming vector and artificial noise covariance matrix is investigated for multiple-input-single-output-multiple-eavesdropper simultaneous wireless information and power transfer (MISOME-SWIPT) systems. A secrecy energy efficiency (SEE) maximization problem is formulated in the MISOME-SWIPT system with imperfect channel state information and proportional secrecy rate constraints. Since the formulated SEE maximization problem is non-convex, it is first recast into a series of convex problems in order to obtain the optimal solution with a reasonable computational complexity. Numerical results are used to verify the performance of the proposed algorithm and to reveal practical insights.

    • [cs.IT]Scalability Analysis of a LoRa Network under Imperfect Orthogonality
    Aamir Mahmood, Emiliano Sisinni, Lakshmikanth Guntupalli, Raúl Rondón, Syed Ali Hassan, Mikael Gidlund
    http://arxiv.org/abs/1808.01761v1

    Low-power wide-area network (LPWAN) technologies are gaining momentum for internet-of-things (IoT) applications since they promise wide coverage to a massive number of battery-operated devices using grant-free medium access. LoRaWAN, with its physical (PHY) layer design and regulatory efforts, has emerged as the widely adopted LPWAN solution. By using chirp spread spectrum modulation with quasi-orthogonal spreading factors (SFs), LoRa PHY offers coverage to wide-area applications while supporting a high density of devices. However, thus far its scalability performance has been inadequately modeled and the effect of interference resulting from the imperfect orthogonality of the SFs has not been considered. In this paper, we present an analytical model of a single-cell LoRa system that accounts for the impact of interference among transmissions over the same SF (co-SF) as well as different SFs (inter-SF). By modeling the interference field as a Poisson point process under duty-cycled ALOHA, we derive the signal-to-interference ratio (SIR) distributions for several interference conditions. Results show that, for a duty cycle as low as 0.33%, the network performance under co-SF interference alone is considerably optimistic, as the inclusion of inter-SF interference unveils a further drop in the success probability and the coverage probability of approximately 10% and 15%, respectively, for 1500 devices in a LoRa channel. Finally, we illustrate how our analysis can characterize the critical device density with respect to cell size for a given reliability target.

    • [cs.IT]Stability and Throughput Analysis of Multiple Access Networks with Finite Blocklength Constraints
    Christos K. Kourtellaris, Constantinos Psomas, Ioannis Krikidis
    http://arxiv.org/abs/1808.01986v1

    Motivated by the demand for ultra-reliable and low-latency communications, we employ tools from information theory, stochastic processes and queueing theory in order to provide a comprehensive framework for the analysis of a Time Division Multiple Access (TDMA) network with bursty traffic in the finite blocklength regime. Specifically, we re-examine the stability conditions, evaluate the optimal throughput, and identify the optimal trade-off between data packet size and latency. The evaluation is performed both numerically and via the proposed approximations that result in closed-form expressions. Then, we examine the stability conditions and the performance of the Multiple Access Relay Channel with TDMA scheduling, subject to finite blocklength constraints, by applying a cognitive cooperation protocol that assumes relaying is enabled when sources are idle. Finally, we propose the novel Batch-And-Forward (BAF) strategy, which can significantly enhance the performance of cooperative networks in the finite blocklength regime, as well as reduce the requirement in metadata. The BAF strategy is quite versatile; thus, it can be embedded in existing cooperative protocols without imposing additional complexity on the overall scheme.
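
    As background for the finite blocklength regime, the snippet below evaluates the well-known normal approximation to the maximal coding rate, R ≈ C - sqrt(V/n) Q^{-1}(eps) + log2(n)/(2n), for an AWGN channel. The capacity C and dispersion V formulas are the standard ones from the finite blocklength literature; this is context, not the paper's own derivation.

```python
import math
from scipy.stats import norm

def achievable_rate(snr, n, eps):
    """Bits per channel use at blocklength n and error probability eps (AWGN)."""
    C = math.log2(1 + snr)
    V = (snr * (snr + 2) / (2 * (1 + snr) ** 2)) * math.log2(math.e) ** 2
    return C - math.sqrt(V / n) * norm.ppf(1 - eps) + math.log2(n) / (2 * n)

print(achievable_rate(snr=1.0, n=200, eps=1e-5))   # noticeably below C = 1 bit/use
```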

    • [cs.IT]Super Resolution Phase Retrieval for Sparse Signals
    Gilles Baechler, Miranda Kreković, Juri Ranieri, Amina Chebira, Yue M. Lu, Martin Vetterli
    http://arxiv.org/abs/1808.01961v1

    In a variety of fields, in particular those involving imaging and optics, we often measure signals whose phase is missing or has been irremediably distorted. Phase retrieval attempts to recover the phase information of a signal from the magnitude of its Fourier transform to enable the reconstruction of the original signal. Solving the phase retrieval problem is equivalent to recovering a signal from its auto-correlation function. In this paper, we assume the original signal to be sparse; this is a natural assumption in many applications, such as X-ray crystallography, speckle imaging and blind channel estimation. We propose an algorithm that resolves the phase retrieval problem in three stages: i) we leverage the finite rate of innovation sampling theory to super-resolve the auto-correlation function from a limited number of samples, ii) we design a greedy algorithm that identifies the locations of a sparse solution given the super-resolved auto-correlation function, iii) we recover the amplitudes of the atoms given their locations and the measured auto-correlation function. Unlike traditional approaches that recover a discrete approximation of the underlying signal, our algorithm estimates the signal on a continuous domain, which makes it the first of its kind. Along with the algorithm, we derive its performance bound with a theoretical analysis and propose a set of enhancements to improve its computational complexity and noise resilience. Finally, we demonstrate the benefits of the proposed method via a comparison against Charge Flipping, a notable algorithm in crystallography.

    • [cs.IT]Two Practical Random-Subcarrier-Selection Methods for Secure Precise Wireless Transmission
    Tong Shen, Shuo Zhang, Riqing Chen, Jin Wang, Jinsong Hu, Feng Shu, Jiangzhou Wang
    http://arxiv.org/abs/1808.01896v1

    In secure precise directional modulation (DM) networks, two practical random-subcarrier-selection (RSS) methods are proposed to transmit a confidential message to the desired user per orthogonal frequency division multiplexing (OFDM) symbol, with only a single receive power peak formed by constructing a random subcarrier set and performing a randomization procedure. This scheme completely addresses the crucial problem facing secure precise wireless transmission (SPWT): how to achieve SPWT per OFDM symbol, whereas traditional SPWT holds only in a statistically average sense. Several necessary conditions for SPWT per OFDM symbol are derived and proposed: the subcarriers should be randomly distributed, sparse, and have distinct subcarrier distances between any two pairs of adjacent antennas. Two random subcarrier sets (RSSs), the quadratic subcarrier set (QSS) and the prime subcarrier set (PSS), are constructed, where the former means the subcarrier index associated with any antenna is the square of the antenna index, and the latter implies that the subcarrier indices over all antennas are prime numbers. Subsequently, following the conditions for SPWT per OFDM symbol, a random factor is defined and a randomization procedure (RP) is proposed. Its detailed process includes the following steps: integer mod, ordering, blocking, and block interleaving (BI), where the BI is repeated until the random factor is greater than the predefined threshold. This yields a single main receive energy peak (SMREP) at the desired position, with positions outside the SMREP harvesting only weak receive energy seriously corrupted by artificial noise (AN).

    • [cs.LG]A Review of Learning with Deep Generative Models from perspective of graphical modeling
    Zhijian Ou
    http://arxiv.org/abs/1808.01630v1

    This document aims to provide a review on learning with deep generative models (DGMs), which is a highly active area in machine learning and, more generally, artificial intelligence. This review is not meant to be a tutorial, but when necessary, we provide self-contained derivations for completeness. This review has two features. First, though there are different perspectives from which to classify DGMs, we choose to organize this review from the perspective of graphical modeling, because the learning methods for directed DGMs and undirected DGMs are fundamentally different. Second, we differentiate model definitions from model learning algorithms, since different learning algorithms can be applied to solve the learning problem on the same model, and an algorithm can be applied to learn different models. We thus separate model definition and model learning, with more emphasis on reviewing, differentiating and connecting different learning algorithms. We also discuss promising future research directions. This review is by no means comprehensive as the field is evolving rapidly. The authors apologize in advance for any missed papers and inaccuracies in descriptions. Corrections and comments are highly welcome.

    • [cs.LG]A Review on Image- and Network-based Brain Data Analysis Techniques for Alzheimer's Disease Diagnosis Reveals a Gap in Developing Predictive Methods for Prognosis
    Mayssa Soussia, Islem Rekik
    http://arxiv.org/abs/1808.01951v1

    Unveiling pathological brain changes associated with Alzheimer's disease (AD) is a challenging task, especially since people do not show symptoms of dementia until it is late. Over the past years, neuroimaging techniques paved the way for computer-based diagnosis and prognosis to facilitate the automation of medical decision support and help clinicians identify cognitively intact subjects that are at high risk of developing AD. As AD is a progressive neurodegenerative disorder, researchers have investigated how it affects the brain using different approaches: 1) image-based methods, where mainly neuroimaging modalities are used to provide early AD biomarkers, and 2) network-based methods, which focus on functional and structural brain connectivities to give insights into how AD alters brain wiring. In this study, we reviewed neuroimaging-based technical methods developed for AD and mild cognitive impairment (MCI) classification and prediction tasks, selected by screening all MICCAI proceedings published between 2010 and 2016. We included papers that fit into image-based or network-based categories. The majority of papers focused on classifying MCI vs. AD brain states, which has enabled the discovery of discriminative or altered brain regions and connections. However, very few works aimed to predict MCI progression based on early neuroimaging-based observations. Despite the high importance of reliably identifying which early MCI patient will convert to AD, remain stable, or reverse to normal over months/years, predictive models are still lagging behind.

    • [cs.LG]A Survey on Deep Transfer Learning
    Chuanqi Tan, Fuchun Sun, Tao Kong, Wenchang Zhang, Chao Yang, Chunfang Liu
    http://arxiv.org/abs/1808.01974v1

    As a new classification platform, deep learning has recently received increasing attention from researchers and has been successfully applied to many domains. In some domains, like bioinformatics and robotics, it is very difficult to construct a large-scale well-annotated dataset due to the expense of data acquisition and costly annotation, which limits its development. Transfer learning relaxes the hypothesis that the training data must be independent and identically distributed (i.i.d.) with the test data, which motivates us to use transfer learning to solve the problem of insufficient training data. This survey focuses on reviewing current research on transfer learning using deep neural networks and its applications. We define deep transfer learning, categorize it, and review recent research works based on the techniques used in deep transfer learning.

    • [cs.LG]A Survey on Surrogate Approaches to Non-negative Matrix Factorization
    Pascal Fernsel, Peter Maaß
    http://arxiv.org/abs/1808.01975v1

    Motivated by applications in hyperspectral imaging, we investigate methods for approximating a high-dimensional non-negative matrix \mathbf{\mathit{Y}} by a product of two lower-dimensional, non-negative matrices \mathbf{\mathit{K}} and \mathbf{\mathit{X}}. This so-called non-negative matrix factorization is based on defining suitable Tikhonov functionals, which combine a discrepancy measure for \mathbf{\mathit{Y}}\approx\mathbf{\mathit{KX}} with penalty terms for enforcing additional properties of \mathbf{\mathit{K}} and \mathbf{\mathit{X}}. The minimization is based on alternating minimization with respect to \mathbf{\mathit{K}} or \mathbf{\mathit{X}}, where in each iteration step one replaces the original Tikhonov functional by a locally defined surrogate functional. The choice of surrogate functionals is crucial: it should allow a comparatively simple minimization and, simultaneously, its first-order optimality condition should lead to multiplicative update rules, which automatically preserve the non-negativity of the iterates. We review the most standard construction principles for surrogate functionals for Frobenius-norm and Kullback-Leibler discrepancy measures. We extend the known surrogate constructions by a general framework that allows adding a large variety of penalty terms. The paper finishes by deriving the corresponding alternating minimization schemes explicitly and by applying these methods to MALDI imaging data.
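
    For the Frobenius discrepancy without penalty terms, the surrogate machinery reduces to the familiar multiplicative updates; the sketch below shows that penalty-free special case on random toy data, with a small eps guarding against division by zero.

```python
import numpy as np

def nmf(Y, r, iters=200, eps=1e-12):
    """Multiplicative updates for min ||Y - K X||_F^2 with K, X >= 0."""
    rng = np.random.default_rng(0)
    K = rng.random((Y.shape[0], r))
    X = rng.random((r, Y.shape[1]))
    for _ in range(iters):
        X *= (K.T @ Y) / (K.T @ K @ X + eps)   # updates preserve non-negativity
        K *= (Y @ X.T) / (K @ X @ X.T + eps)
    return K, X

Y = np.abs(np.random.default_rng(1).normal(size=(30, 20)))
K, X = nmf(Y, r=5)
print(np.linalg.norm(Y - K @ X))               # reconstruction error
```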

    • [cs.LG]Adversarial Vision Challenge
    Wieland Brendel, Jonas Rauber, Alexey Kurakin, Nicolas Papernot, Behar Veliqi, Marcel Salathé, Sharada P. Mohanty, Matthias Bethge
    http://arxiv.org/abs/1808.01976v1

    The NIPS 2018 Adversarial Vision Challenge is a competition to facilitate measurable progress towards robust machine vision models and more generally applicable adversarial attacks. This document is an updated version of our competition proposal that was accepted in the competition track of 32nd Conference on Neural Information Processing Systems (NIPS 2018).

    • [cs.LG]Autoencoder Based Sample Selection for Self-Taught Learning
    Siwei Feng, Marco F. Duarte
    http://arxiv.org/abs/1808.01574v1

    Self-taught learning is a technique that uses a large amount of unlabeled data as source samples to improve task performance on target samples. Compared with other transfer learning techniques, self-taught learning can be applied to a broader set of scenarios due to the loose restrictions on source data. However, knowledge transferred from source samples that are not sufficiently related to the target domain may negatively influence the target learner, which is referred to as negative transfer. In this paper, we propose a metric for the relevance between a source sample and target samples. To be more specific, both source and target samples are reconstructed through a single-layer autoencoder with a linear relationship between source samples and target samples simultaneously enforced. An l_{2,1}-norm sparsity constraint is imposed on the transformation matrix to identify source samples relevant to the target domain. Source domain samples that are deemed relevant are assigned pseudo-labels reflecting their relevance to target domain samples, and are combined with target samples in order to provide an expanded training set for classifier training. Local data structures are also preserved during source sample selection through spectral graph analysis. Promising results in extensive experiments show the advantages of the proposed approach.

    • [cs.LG]Beyond 1/2-Approximation for Submodular Maximization on Massive Data Streams
    Ashkan Norouzi-Fard, Jakub Tarnawski, Slobodan Mitrović, Amir Zandieh, Aida Mousavifar, Ola Svensson
    http://arxiv.org/abs/1808.01842v1

    Many tasks in machine learning and data mining, such as data diversification, non-parametric learning, kernel machines, clustering etc., require extracting a small but representative summary from a massive dataset. Often, such problems can be posed as maximizing a submodular set function subject to a cardinality constraint. We consider this question in the streaming setting, where elements arrive over time at a fast pace and thus we need to design an efficient, low-memory algorithm. One such method, proposed by Badanidiyuru et al. (2014), always finds a 0.5-approximate solution. Can this approximation factor be improved? We answer this question affirmatively by designing a new algorithm SALSA for streaming submodular maximization. It is the first low-memory, single-pass algorithm that improves the factor 0.5, under the natural assumption that elements arrive in a random order. We also show that this assumption is necessary, i.e., that there is no such algorithm with better than 0.5-approximation when elements arrive in arbitrary order. Our experiments demonstrate that SALSA significantly outperforms the state of the art in applications related to exemplar-based clustering, social graph analysis, and recommender systems.
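
    For reference, the 0.5-approximation baseline of Badanidiyuru et al. boils down to a single-pass thresholding rule. The sketch below assumes the optimum value opt is known (in practice it is guessed on a geometric grid) and uses a toy coverage function; it illustrates the baseline SALSA improves upon, not SALSA itself.

```python
def threshold_stream(stream, f, k, opt):
    """Keep an arriving element when its marginal gain clears
    (opt/2 - f(S)) / (k - |S|); yields a 0.5-approximate solution."""
    S = []
    for e in stream:
        if len(S) == k:
            break
        gain = f(S + [e]) - f(S)
        if gain >= (opt / 2 - f(S)) / (k - len(S)):
            S.append(e)
    return S

# Toy coverage function: f(S) counts the distinct points covered by S.
sets = [{1, 2}, {2, 3}, {3}, {4, 5}, {1, 5}]
f = lambda S: len(set().union(*S)) if S else 0
print(threshold_stream(sets, f, k=2, opt=4))
```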

    • [cs.LG]Concentration bounds for empirical conditional value-at-risk: The unbounded case
    Ravi Kumar Kolla, Prashanth L. A., Sanjay P. Bhat, Krishna Jagannathan
    http://arxiv.org/abs/1808.01739v1

    In several real-world applications involving decision making under uncertainty, the traditional expected value objective may not be suitable, as it may be necessary to control losses in the case of a rare but extreme event. Conditional Value-at-Risk (CVaR) is a popular risk measure for modeling the aforementioned objective. We consider the problem of estimating CVaR from i.i.d. samples of an unbounded random variable, which is either sub-Gaussian or sub-exponential. We derive a novel one-sided concentration bound for a natural sample-based CVaR estimator in this setting. Our bound relies on a concentration result for a quantile-based estimator for Value-at-Risk (VaR), which may be of independent interest.
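
    The estimator being analyzed is, in essence, the standard sample-based CVaR estimator sketched below: the empirical alpha-quantile estimates VaR, and the average excess over it estimates CVaR. The exponential toy data is sub-exponential, matching the unbounded setting above.

```python
import numpy as np

def empirical_cvar(x, alpha=0.95):
    """Sample CVaR: VaR is the empirical alpha-quantile of the losses x,
    and CVaR adds the mean excess over VaR, scaled by 1/(1 - alpha)."""
    x = np.asarray(x, dtype=float)
    var = np.quantile(x, alpha)                    # Value-at-Risk estimate
    return var + np.maximum(x - var, 0.0).mean() / (1.0 - alpha)

losses = np.random.default_rng(0).exponential(size=10_000)
print(empirical_cvar(losses, alpha=0.95))          # true value: 1 + ln 20 ≈ 3.996
```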

    • [cs.LG]DELIMIT PyTorch - An extension for Deep Learning in Diffusion Imaging
    Simon Koppers, Dorit Merhof
    http://arxiv.org/abs/1808.01517v1

    DELIMIT is a framework extension for deep learning in diffusion imaging, which extends the basic framework PyTorch towards spherical signals. Based on several novel layers, deep learning can be applied to spherical diffusion imaging data in a very convenient way. First, two spherical harmonic interpolation layers are added to the extension, which allow to transform the signal from spherical surface space into the spherical harmonic space, and vice versa. In addition, a local spherical convolution layer is introduced that adds the possibility to include gradient neighborhood information within the network. Furthermore, these extensions can also be utilized for the preprocessing of diffusion signals.

    • [cs.LG]Deep Reinforcement One-Shot Learning for Artificially Intelligent Classification Systems
    Anton Puzanov, Kobi Cohen
    http://arxiv.org/abs/1808.01527v1

    In recent years there has been a sharp rise in networking applications, in which significant events need to be classified but only a few training instances are available. These are known as cases of one-shot learning. Examples include analyzing network traffic under zero-day attacks, and computer vision tasks by sensor networks deployed in the field. To handle this challenging task, organizations often use human analysts to classify events under high uncertainty. Existing algorithms use a threshold-based mechanism to decide whether to classify an object automatically or send it to an analyst for deeper inspection. However, this approach leads to a significant waste of resources since it does not take the practical temporal constraints of system resources into account. Our contribution is threefold. First, we develop a novel Deep Reinforcement One-shot Learning (DeROL) framework to address this challenge. The basic idea of the DeROL algorithm is to train a deep-Q network to obtain a policy which is oblivious to the unseen classes in the testing data. Then, in real-time, DeROL maps the current state of the one-shot learning process to operational actions based on the trained deep-Q network, to maximize the objective function. Second, we develop the first open-source software for practical artificially intelligent one-shot classification systems with limited resources for the benefit of researchers in related fields. Third, we present an extensive experimental study using the OMNIGLOT dataset for computer vision tasks and the UNSW-NB15 dataset for intrusion detection tasks that demonstrates the versatility and efficiency of the DeROL framework.

    • [cs.LG]Designing Adaptive Neural Networks for Energy-Constrained Image Classification
    Dimitrios Stamoulis, Ting-Wu Chin, Anand Krishnan Prakash, Haocheng Fang, Sribhuvan Sajja, Mitchell Bognar, Diana Marculescu
    http://arxiv.org/abs/1808.01550v1

    As convolutional neural networks (CNNs) enable state-of-the-art computer vision applications, their high energy consumption has emerged as a key impediment to their deployment on embedded and mobile devices. Towards efficient image classification under hardware constraints, prior work has proposed adaptive CNNs, i.e., systems of networks with different accuracy and computation characteristics, where a selection scheme adaptively selects the network to be evaluated for each input image. While previous efforts have investigated different network selection schemes, we find that they do not necessarily result in energy savings when deployed on mobile systems. The key limitation of existing methods is that they learn only how data should be processed among the CNNs and not the network architectures, with each network being treated as a blackbox. To address this limitation, we pursue a more powerful design paradigm where the architecture settings of the CNNs are treated as hyper-parameters to be globally optimized. We cast the design of adaptive CNNs as a hyper-parameter optimization problem with respect to energy, accuracy, and communication constraints imposed by the mobile device. To efficiently solve this problem, we adapt Bayesian optimization to the properties of the design space, reaching near-optimal configurations in a few tens of function evaluations. Our method reduces the energy consumed for image classification on a mobile device by up to 6x, compared to the best previously published work that uses CNNs as blackboxes. Finally, we evaluate two image classification practices, i.e., classifying all images locally versus over the cloud under energy and communication constraints.

    • [cs.LG]Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN
    Dror Freirich, Ron Meir, Aviv Tamar
    http://arxiv.org/abs/1808.01960v1

    The recently proposed distributional approach to reinforcement learning (DiRL) is centered on learning the distribution of the reward-to-go, often referred to as the value distribution. In this work, we show that the distributional Bellman equation, which drives DiRL methods, is equivalent to a generative adversarial network (GAN) model. In this formulation, DiRL can be seen as learning a deep generative model of the value distribution, driven by the discrepancy between the distribution of the current value, and the distribution of the sum of current reward and next value. We use this insight to propose a GAN-based approach to DiRL, which leverages the strengths of GANs in learning distributions of high-dimensional data. In particular, we show that our GAN approach can be used for DiRL with multivariate rewards, an important setting which cannot be tackled with prior methods. The multivariate setting also allows us to unify learning the distribution of values and state transitions, and we exploit this idea to devise a novel exploration method that is driven by the discrepancy in estimating both values and states.

    • [cs.LG]Global Convergence to the Equilibrium of GANs using Variational Inequalities
    Ian Gemp, Sridhar Mahadevan
    http://arxiv.org/abs/1808.01531v1

    In optimization, the negative gradient of a function denotes the direction of steepest descent. Furthermore, traveling in any direction orthogonal to the gradient maintains the value of the function. In this work, we show that these orthogonal directions that are ignored by gradient descent can be critical in equilibrium problems. Equilibrium problems have drawn heightened attention in machine learning due to the emergence of the Generative Adversarial Network (GAN). We use the framework of Variational Inequalities to analyze popular training algorithms for a fundamental GAN variant: the Wasserstein Linear-Quadratic GAN. We show that the steepest descent direction causes divergence from the equilibrium, and guaranteed convergence to the equilibrium is achieved through following a particular orthogonal direction. We call this successful technique Crossing-the-Curl, named for its mathematical derivation as well as its intuition: identify the game's axis of rotation and move "across" space in the direction towards smaller "curling".

    • [cs.LG]Hashing with Binary Matrix Pursuit
    Fatih Cakir, Kun He, Stan Sclaroff
    http://arxiv.org/abs/1808.01990v1

    We propose theoretical and empirical improvements for two-stage hashing methods. We first provide a theoretical analysis on the quality of the binary codes and show that, under mild assumptions, a residual learning scheme can construct binary codes that fit any neighborhood structure with arbitrary accuracy. Secondly, we show that with high-capacity hash functions such as CNNs, binary code inference can be greatly simplified for many standard neighborhood definitions, yielding smaller optimization problems and more robust codes. Incorporating our findings, we propose a novel two-stage hashing method that significantly outperforms previous hashing studies on widely used image retrieval benchmarks.

    • [cs.LG]Hybrid Subspace Learning for High-Dimensional Data
    Micol Marchetti-Bowick, Benjamin J. Lengerich, Ankur P. Parikh, Eric P. Xing
    http://arxiv.org/abs/1808.01687v1

    The high-dimensional data setting, in which p >> n, is a challenging statistical paradigm that appears in many real-world problems. In this setting, learning a compact, low-dimensional representation of the data can substantially help distinguish signal from noise. One way to achieve this goal is to perform subspace learning to estimate a small set of latent features that capture the majority of the variance in the original data. Most existing subspace learning models, such as PCA, assume that the data can be fully represented by its embedding in one or more latent subspaces. However, in this work, we argue that this assumption is not suitable for many high-dimensional datasets; often only some variables can easily be projected to a low-dimensional space. We propose a hybrid dimensionality reduction technique in which some features are mapped to a low-dimensional subspace while others remain in the original space. Our model leads to more accurate estimation of the latent space and lower reconstruction error. We present a simple optimization procedure for the resulting biconvex problem and show synthetic data results that demonstrate the advantages of our approach over existing methods. Finally, we demonstrate the effectiveness of this method for extracting meaningful features from both gene expression and video background subtraction datasets.

    • [cs.LG]Large Scale Language Modeling: Converging on 40GB of Text in Four Hours
    Raul Puri, Robert Kirby, Nikolai Yakovenko, Bryan Catanzaro
    http://arxiv.org/abs/1808.01371v1

    Recent work has shown how to train Convolutional Neural Networks (CNNs) rapidly on large image datasets, then transfer the knowledge gained from these models to a variety of tasks. Following [Radford 2017], in this work, we demonstrate similar scalability and transfer for Recurrent Neural Networks (RNNs) for Natural Language tasks. By utilizing mixed precision arithmetic and a 32k batch size distributed across 128 NVIDIA Tesla V100 GPUs, we are able to train a character-level 4096-dimension multiplicative LSTM (mLSTM) for unsupervised text reconstruction over 3 epochs of the 40 GB Amazon Reviews dataset in four hours. This runtime compares favorably with previous work taking one month to train the same size and configuration for one epoch over the same dataset. Converging large batch RNN models can be challenging. Recent work has suggested scaling the learning rate as a function of batch size, but we find that simply scaling the learning rate as a function of batch size leads either to significantly worse convergence or immediate divergence for this problem. We provide a learning rate schedule that allows our model to converge with a 32k batch size. Since our model converges over the Amazon Reviews dataset in hours, and our compute requirement of 128 Tesla V100 GPUs, while substantial, is commercially available, this work opens up large scale unsupervised NLP training to most commercial applications and deep learning researchers. A model can be trained over most public or private text datasets overnight.
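
    As an illustration of the schedule design problem discussed above, here is a generic warmup-then-decay learning-rate function; the base rate, warmup length and decay shape are assumptions for illustration only, not the paper's exact schedule.

```python
def lr_at(step, total_steps, base_lr=1e-3, warmup=500):
    """Linear warmup from zero, then linear decay back to zero."""
    if step < warmup:
        return base_lr * step / warmup
    frac = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * (1.0 - frac)

print([round(lr_at(s, 10_000), 6) for s in (0, 250, 500, 5_000, 10_000)])
```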

    • [cs.LG]Learning disentangled representation from 12-lead electrograms: application in localizing the origin of Ventricular Tachycardia
    Prashnna K Gyawali, B. Milan Horacek, John L. Sapp, Linwei Wang
    http://arxiv.org/abs/1808.01524v1

    The increasing availability of electrocardiogram (ECG) data has motivated the use of data-driven models for automating various clinical tasks based on ECG data. The development of subject-specific models are limited by the cost and difficulty of obtaining sufficient training data for each individual. The alternative of population model, however, faces challenges caused by the significant inter-subject variations within the ECG data. We address this challenge by investigating for the first time the problem of learning representations for clinically-informative variables while disentangling other factors of variations within the ECG data. In this work, we present a conditional variational autoencoder (VAE) to extract the subject-specific adjustment to the ECG data, conditioned on task-specific representations learned from a deterministic encoder. To encourage the representation for inter-subject variations to be independent from the task-specific representation, maximum mean discrepancy is used to match all the moments between the distributions learned by the VAE conditioning on the code from the deterministic encoder. The learning of the task-specific representation is regularized by a weak supervision in the form of contrastive regularization. We apply the proposed method to a novel yet important clinical task of classifying the origin of ventricular tachycardia (VT) into pre-defined segments, demonstrating the efficacy of the proposed method against the standard VAE.

    • [cs.LG]Missing Value Imputation Based on Deep Generative Models
    Hongbao Zhang, Pengtao Xie, Eric Xing
    http://arxiv.org/abs/1808.01684v1

    Missing values widely exist in many real-world datasets, which hinders advanced data analytics. Properly filling these missing values is crucial but challenging, especially when the missing rate is high. Many approaches have been proposed for missing value imputation (MVI), but they are mostly heuristics-based, lack a principled foundation, and do not perform satisfactorily in practice. In this paper, we propose a probabilistic framework based on deep generative models for MVI. Under this framework, imputing the missing entries amounts to seeking a fixed-point solution between two conditional distributions defined on the missing entries and latent variables, respectively. These distributions are parameterized by deep neural networks (DNNs), which possess high approximation power and can capture the nonlinear relationships between missing entries and observed values. The learning of the weight parameters of the DNNs is performed by maximizing an approximation of the log-likelihood of the observed values. We conducted an extensive evaluation on 13 datasets and compared with 11 baseline methods, where our method largely outperforms the baselines.
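
    To make the fixed-point idea concrete, the simplified loop below alternates between filling missing entries and refitting a predictor; a ridge regression stands in for the paper's deep generative model, purely as an assumption for brevity.

```python
import numpy as np
from sklearn.linear_model import Ridge

def impute(X, iters=10):
    """Fill NaNs with column means, then iterate a fixed-point update that
    re-predicts each missing entry from the current completed matrix."""
    X = X.copy()
    miss = np.isnan(X)
    col_mean = np.nanmean(X, axis=0)
    X[miss] = np.take(col_mean, np.where(miss)[1])       # initial fill
    for _ in range(iters):
        for j in range(X.shape[1]):
            rows = miss[:, j]
            if not rows.any():
                continue
            other = np.delete(X, j, axis=1)
            model = Ridge().fit(other[~rows], X[~rows, j])
            X[rows, j] = model.predict(other[rows])       # fixed-point update
    return X

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)); X[rng.random(X.shape) < 0.1] = np.nan
print(np.isnan(impute(X)).any())                          # False: fully imputed
```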

    • [cs.LG]Multi-objective optimization to explicitly account for model complexity when learning Bayesian Networks
    Paolo Cazzaniga, Marco S. Nobile, Daniele Ramazzotti
    http://arxiv.org/abs/1808.01345v1

    Bayesian Networks have been widely used in the last decades in many fields to describe statistical dependencies among random variables. In general, learning the structure of such models is a problem with considerable theoretical interest that still poses many challenges. On the one hand, this is a well-known NP-complete problem, which is practically hardened by the huge search space of possible solutions. On the other hand, the phenomenon of I-equivalence, i.e., different graphical structures underpinning the same set of statistical dependencies, may lead to multimodal fitness landscapes further hindering maximum likelihood approaches to solve the task. Despite all these difficulties, greedy search methods based on a likelihood score coupled with a regularization term to account for model complexity have been shown to be surprisingly effective in practice. In this paper, we consider the formulation of the task of learning the structure of Bayesian Networks as an optimization problem based on a likelihood score. Nevertheless, our approach does not adjust this score by means of any of the complexity terms proposed in the literature; instead, it accounts directly for the complexity of the discovered solutions by exploiting a multi-objective optimization procedure. To this end, we adopt NSGA-II and define the first objective function to be the likelihood of a solution and the second to be the number of selected arcs. We thoroughly analyze the behavior of our method on a wide set of simulated data, and we discuss the performance considering the goodness of the inferred solutions both in terms of their objective functions and with respect to the retrieved structure. Our results show that NSGA-II can converge to solutions characterized by better likelihood and fewer arcs than classic approaches, although paradoxically these are frequently characterized by a lower similarity to the target network.

    • [cs.LG]NIMFA: A Python Library for Nonnegative Matrix Factorization
    Marinka Zitnik, Blaz Zupan
    http://arxiv.org/abs/1808.01743v1

    NIMFA is an open-source Python library that provides a unified interface to nonnegative matrix factorization algorithms. It includes implementations of state-of-the-art factorization methods, initialization approaches, and quality scoring. It supports both dense and sparse matrix representation. NIMFA's component-based implementation and hierarchical design should help the users to employ already implemented techniques or design and code new strategies for matrix factorization tasks.
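
    A short usage sketch against NIMFA's documented interface is given below; treat the exact parameter names as assumptions to be checked against the library's documentation, and the data is a random toy matrix.

```python
import numpy as np
import nimfa

V = np.abs(np.random.default_rng(0).normal(size=(40, 25)))   # toy non-negative data
nmf = nimfa.Nmf(V, rank=5, max_iter=100, update='euclidean', objective='fro')
fit = nmf()                            # run the factorization
W, H = fit.basis(), fit.coef()         # non-negative factors, V ≈ W H
print(np.linalg.norm(V - W @ H))       # residual of the approximation
```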

    • [cs.LG]Regret Bounds for Reinforcement Learning via Markov Chain Concentration
    Ronald Ortner
    http://arxiv.org/abs/1808.01813v1

    We give a simple optimistic algorithm for which it is easy to derive regret bounds of \tilde{O}(\sqrt{t_{\rm mix} SAT}) after T steps in uniformly ergodic MDPs with S states, A actions, and mixing time parameter t_{\rm mix}. These bounds are the first regret bounds in the general, non-episodic setting with an optimal dependence on all given parameters. They could only be improved by using an alternative mixing time parameter.

    • [cs.LG]Structured Adversarial Attack: Towards General Implementation and Better Interpretability
    Kaidi Xu, Sijia Liu, Pu Zhao, Pin-Yu Chen, Huan Zhang, Deniz Erdogmus, Yanzhi Wang, Xue Lin
    http://arxiv.org/abs/1808.01664v1

    When generating adversarial examples to attack deep neural networks (DNNs), the \ell_p norm of the added perturbation is usually used to measure the similarity between the original image and the adversarial example. However, such adversarial attacks may fail to capture key information hidden in the input. This work develops a more general attack model, i.e., the structured attack, that explores group sparsity in adversarial perturbations by sliding a mask through images aiming at extracting key structures. An ADMM (alternating direction method of multipliers)-based framework is proposed that can split the original problem into a sequence of analytically solvable subproblems and can be generalized to implement other state-of-the-art attacks. Strong group sparsity is achieved in adversarial perturbations even with the same level of distortion in terms of the \ell_p norm as the state-of-the-art attacks. Extensive experimental results on MNIST, CIFAR-10 and ImageNet show that our attack can be much stronger (in terms of smaller \ell_0 distortion) than the existing ones, and its better interpretability from group sparse structures aids in uncovering the origins of adversarial examples.

    • [cs.LG]Using Machine Learning Safely in Automotive Software: An Assessment and Adaption of Software Process Requirements in ISO 26262
    Rick Salay, Krzysztof Czarnecki
    http://arxiv.org/abs/1808.01614v1

    The use of machine learning (ML) is on the rise in many sectors of software development, and automotive software development is no different. In particular, Advanced Driver Assistance Systems (ADAS) and Automated Driving Systems (ADS) are two areas where ML plays a significant role. In automotive development, safety is a critical objective, and the emergence of standards such as ISO 26262 has helped focus industry practices to address safety in a systematic and consistent way. Unfortunately, these standards were not designed to accommodate technologies such as ML or the type of functionality that is provided by an ADS and this has created a conflict between the need to innovate and the need to improve safety. In this report, we take steps to address this conflict by doing a detailed assessment and adaption of ISO 26262 for ML, specifically in the context of supervised learning. First we analyze the key factors that are the source of the conflict. Then we assess each software development process requirement (Part 6 of ISO 26262) for applicability to ML. Where there are gaps, we propose new requirements to address the gaps. Finally we discuss the application of this adapted and extended variant of Part 6 to ML development scenarios.

    • [cs.LG]code2seq: Generating Sequences from Structured Representations of Code
    Uri Alon, Omer Levy, Eran Yahav
    http://arxiv.org/abs/1808.01400v1

    The ability to generate natural language sequences from source code snippets can be used for code summarization, documentation, and retrieval. Sequence-to-sequence (seq2seq) models, adopted from neural machine translation (NMT), have achieved state-of-the-art performance on these tasks by treating source code as a sequence of tokens. We present CODE2SEQ: an alternative approach that leverages the syntactic structure of programming languages to better encode source code. Our model represents a code snippet as the set of paths in its abstract syntax tree (AST) and uses attention to select the relevant paths during decoding, much like contemporary NMT models. We demonstrate the effectiveness of our approach for two tasks, two programming languages, and four datasets of up to 16M examples. Our model significantly outperforms previous models that were specifically designed for programming languages, as well as general state-of-the-art NMT models.
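
    As a toy illustration of this path-based view of code, the snippet below uses Python's own ast module to enumerate root-to-leaf node-type paths; the actual model samples leaf-to-leaf paths and attends over them, so this only conveys the flavor of the representation.

```python
import ast

def paths(node, prefix=()):
    """Yield root-to-leaf tuples of AST node-type names."""
    prefix = prefix + (type(node).__name__,)
    kids = list(ast.iter_child_nodes(node))
    if not kids:
        yield prefix
    for child in kids:
        yield from paths(child, prefix)

tree = ast.parse("def add(a, b):\n    return a + b")
for p in paths(tree):
    print(" -> ".join(p))       # e.g. Module -> FunctionDef -> Return -> BinOp -> ...
```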

    • [cs.NE]A Cooperative Group Optimization System
    Xiao-Feng Xie, Jiming Liu, Zun-Jing Wang
    http://arxiv.org/abs/1808.01342v1

    A cooperative group optimization (CGO) system is presented to implement CGO cases by integrating the advantages of the cooperative group and low-level algorithm portfolio design. Following the nature-inspired paradigm of a cooperative group, the agents not only explore in a parallel way with their individual memory, but also cooperate with their peers through the group memory. Each agent holds a portfolio of (heterogeneous) embedded search heuristics (ESHs), in which each ESH can drive the group into a stand-alone CGO case, and hybrid CGO cases in an algorithmic space can be defined by low-level cooperative search among a portfolio of ESHs through customized memory sharing. The optimization process might also be facilitated by a passive group leader through encoding knowledge in the search landscape. Based on a concrete framework, CGO cases are defined by a script assembling over instances of algorithmic components in a toolbox. A multilayer design of the script, with the support of the inherent updatable graph in the memory protocol, enables a simple way to address the challenge of accumulating heterogeneous ESHs and defining customized portfolios without any additional code. The CGO system is implemented for solving the constrained optimization problem with some generic components and only a few domain-specific components. Guided by the insights from algorithm portfolio design, customized CGO cases based on basic search operators can achieve competitive performance over existing algorithms as compared on a set of commonly-used benchmark instances. This work might provide a basic step toward a user-oriented development framework, since the algorithmic space might be easily evolved by accumulating competent ESHs.

    • [cs.NE]Geared Rotationally Identical and Invariant Convolutional Neural Network Systems
    ShihChung B. Lo, Ph.D., Matthew T. Freedman, M.D., Seong K. Mun, Ph.D., Heang-Ping Chan, Ph.D.
    http://arxiv.org/abs/1808.01280v1

    Theorems and techniques to form different types of transformationally invariant processing and to produce quantitatively identical outputs based on either transformationally invariant operators or symmetric operations have recently been introduced by the authors. In this study, we further propose to compose a geared rotationally identical CNN system (GRI-CNN) with a small angle increment by connecting networks of participating processes at the first flatten layer. Using an ordinary CNN structure as a base, the requirements for constructing a GRI-CNN include the use of either a symmetric input vector or kernels with an angle increment that can form a complete cycle as a "gearwheel". Four basic GRI-CNN structures were studied. Each of them can produce quantitatively identical output results when the rotation angle of the input vector is evenly divisible by the increment angle of the gear. Our study showed that when a rotated input vector does not match a gear angle, the GRI-CNN can still produce a highly consistent result. With an ultra-fine increment angle (e.g., 1 degree or 0.1 degree), a virtually isotropic CNN system can be constructed.

    • [cs.NE]GeneSys: Enabling Continuous Learning through Neural Network Evolution in Hardware
    Ananda Samajdar, Parth Mannan, Kartikay Garg, Tushar Krishna
    http://arxiv.org/abs/1808.01363v1

    Modern deep learning systems rely on (a) a hand-tuned neural network topology, (b) massive amounts of labeled training data, and (c) extensive training over large-scale compute resources to build a system that can perform efficient image classification or speech recognition. Unfortunately, we are still far away from implementing adaptive general purpose intelligent systems which would need to learn autonomously in unknown environments and may not have access to some or any of these three components. Reinforcement learning and evolutionary algorithm (EA) based methods circumvent this problem by continuously interacting with the environment and updating the models based on obtained rewards. However, deploying these algorithms on ubiquitous autonomous agents at the edge (robots/drones) demands extremely high energy-efficiency due to (i) tight power and energy budgets, (ii) continuous/lifelong interaction with the environment, (iii) intermittent or no connectivity to the cloud to run heavy-weight processing. To address this need, we present GENESYS, an HW-SW prototype of an EA-based learning system, that comprises a closed loop learning engine called EvE and an inference engine called ADAM. EvE can evolve the topology and weights of neural networks completely in hardware for the task at hand, without requiring hand-optimization or backpropagation training. ADAM continuously interacts with the environment and is optimized for efficiently running the irregular neural networks generated by EvE. GENESYS identifies and leverages multiple avenues of parallelism unique to EAs, which we term 'gene'-level parallelism and 'population'-level parallelism. We ran GENESYS with a suite of environments from OpenAI gym and observed 2-5 orders of magnitude higher energy-efficiency over state-of-the-art embedded and desktop CPU and GPU systems.

    • [cs.NE]On Optimizing Deep Convolutional Neural Networks by Evolutionary Computing
    M. U. B. Dias, D. D. N. De Silva, S. Fernando
    http://arxiv.org/abs/1808.01766v1

    Optimization for deep networks is currently a very active area of research. As neural networks become deeper, manually optimizing them becomes harder. Mini-batch normalization, identification of effective receptive fields, momentum updates, the introduction of residual blocks, learning rate adaptation, etc. have been proposed to speed up the rate of convergence in the manual training process while maintaining a high accuracy level. However, finding an optimal topological structure for a given problem remains a challenging task that needs to be addressed. A few researchers have attempted to optimize the network structure using evolutionary computing approaches. Among them, some have successfully evolved networks with reinforcement learning and long short-term memory. A very few have applied evolutionary programming to deep convolutional neural networks. These attempts mainly evolved the network structure and then subsequently optimized the hyper-parameters of the network. However, a mechanism to evolve the deep network structure using the techniques currently practiced in the manual process is still absent. Incorporating such techniques at the chromosome level of evolutionary computing can certainly lead to better topological deep structures. The paper concludes by identifying the gap between evolution-based deep neural networks and manually designed deep neural networks, and proposes some insights for optimizing deep neural networks using evolutionary computing techniques.

    • [cs.RO]Momentum-Based Topology Estimation of Articulated Objects
    Yeshasvi Tirupachuri, Silvio Traversaro, Francesco Nori, Daniele Pucci
    http://arxiv.org/abs/1808.01639v1

    Articulated objects like doors, drawers, valves, and tools are pervasive in our everyday unstructured dynamic environments. Articulation models describe the joint nature between the different parts of an articulated object. As most of these objects are passive, a robot has to interact with them to infer all the articulation models and understand the object topology. We present an algorithm to estimate the inherent articulation models by exploiting the momentum of the articulated system and the interaction wrench during manipulation. The proposed algorithm can work with objects having any number of degrees of freedom. We validate our approach with experiments in a noisy simulation environment and demonstrate that we can estimate articulation models.

    • [cs.RO]Nonlinear disturbance attenuation control of hydraulic robotics
    Peng Lu, Timothy Sandy, Jonas Buchli
    http://arxiv.org/abs/1808.01445v1

    This paper presents a novel nonlinear disturbance rejection control for hydraulic robots. The method requires two third-order filters as well as inverse dynamics in order to estimate the disturbances. All the parameters of the third-order filters are pre-defined. The proposed method is nonlinear and does not require linearization of the rigid-body dynamics. The estimated disturbances are used by the nonlinear controller to achieve disturbance attenuation. The performance of the proposed approach is compared with existing approaches. Finally, the tracking performance and robustness of the proposed approach are validated extensively on real hardware by performing different tasks under either internal or both internal and external disturbances. The experimental results demonstrate the robustness and superior tracking performance of the proposed approach.

    • [cs.SD]Audio Tagging With Connectionist Temporal Classification Model Using Sequential Labelled Data
    Yuanbo Hou, Qiuqiang Kong, Shengchen Li
    http://arxiv.org/abs/1808.01935v1

    Audio tagging aims to predict one or several labels in an audio clip. Many previous works use weakly labelled data (WLD) for audio tagging, where only the presence or absence of sound events is known, but the order of sound events is unknown. To use the order information of sound events, we propose sequential labelled data (SLD), where both the presence or absence and the order information of sound events are known. To utilize SLD in audio tagging, we propose a Convolutional Recurrent Neural Network followed by a Connectionist Temporal Classification (CRNN-CTC) objective function to map from an audio clip spectrogram to SLD. Experiments show that CRNN-CTC obtains an Area Under Curve (AUC) score of 0.986 in audio tagging, outperforming baseline CRNNs with Max Pooling (0.908) and Average Pooling (0.815). In addition, we show that CRNN-CTC has the ability to predict the order of sound events in an audio clip.
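
    A minimal sketch of how a CTC objective can be attached to frame-wise class scores for SLD-style training, assuming PyTorch; the CRNN front end, real spectrogram inputs, and all shapes are illustrative assumptions, not the paper's configuration:

    ```python
    import torch
    import torch.nn as nn

    num_classes = 5          # sound-event classes; index 0 is the CTC blank
    T, N, S = 100, 8, 3      # frames per clip, batch size, events per clip

    ctc = nn.CTCLoss(blank=0)

    # Stand-in for CRNN outputs: frame-wise log-probabilities, shape (T, N, C).
    log_probs = torch.randn(T, N, num_classes, requires_grad=True).log_softmax(2)
    targets = torch.randint(1, num_classes, (N, S))   # ordered event labels (SLD)
    input_lengths = torch.full((N,), T, dtype=torch.long)
    target_lengths = torch.full((N,), S, dtype=torch.long)

    loss = ctc(log_probs, targets, input_lengths, target_lengths)
    loss.backward()  # gradients would flow back into the CRNN
    ```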

    • [cs.SI]CredSaT: Credibility Ranking of Users in Big Social Data incorporating Semantic Analysis and Temporal Factor
    Bilal Abu-Salih, P. Wongthongtham, KY Chan, Z. Dengya
    http://arxiv.org/abs/1808.01413v1

    The widespread use of big social data has pointed the research community in several significant directions. In particular, the notion of social trust has attracted a great deal of attention from information processors (computer scientists) and information consumers (formal organizations). This is evident in various applications such as recommendation systems, viral marketing and expertise retrieval. Hence, it is essential to have frameworks that can temporally measure users' credibility in all domains categorised under big social data. This paper presents CredSaT (Credibility incorporating Semantic analysis and Temporal factor): a fine-grained users' credibility analysis framework for big social data. A novel metric that includes both new and current features, as well as the temporal factor, is harnessed to establish the credibility ranking of users. Experiments on a real-world dataset demonstrate the effectiveness and applicability of our model in identifying highly trustworthy domain-based users. Further, CredSaT shows its capacity to capture spammers and other anomalous users.

    • [cs.SY]Bionic Reflex Control Strategy for Robotic Finger with Kinematic Constraints
    Narkhede Kunal Sanjay, Shyamanta M. Hazarika
    http://arxiv.org/abs/1808.02000v1

    This paper presents a bionic reflex control strategy for a kinematically constrained robotic finger. Here, the bionic reflex is achieved through a force tracking impedance control strategy. The dynamic model of the finger is reduced subject to kinematic constraints. Thereafter, an impedance control strategy that allows exact tracking of forces is discussed. Simulation results for a single finger holding a rectangular object against a flat surface are presented. Bionic reflex response time is of the order of milliseconds.

    • [eess.AS]Triplet Network with Attention for Speaker Diarization
    Huan Song, Megan Willi, Jayaraman J. Thiagarajan, Visar Berisha, Andreas Spanias
    http://arxiv.org/abs/1808.01535v1

    In automatic speech processing systems, speaker diarization is a crucial front-end component to separate segments from different speakers. Inspired by the recent success of deep neural networks (DNNs) in semantic inferencing, triplet loss-based architectures have been successfully used for this problem. However, existing work utilizes conventional i-vectors as the input representation and builds simple fully connected networks for metric learning, thus not fully leveraging the modeling power of DNN architectures. This paper investigates the importance of learning effective representations from the sequences directly in metric learning pipelines for speaker diarization. More specifically, we propose to employ attention models to learn embeddings and the metric jointly in an end-to-end fashion. Experiments are conducted on the CALLHOME conversational speech corpus. The diarization results demonstrate that, besides providing a unified model, the proposed approach achieves improved performance when compared against existing approaches.

    • [eess.SP]Effective Resource Sharing in Mobile-Cell Environments
    Shan Jaffry, Syed Faraz Hasan, Xiang Gui
    http://arxiv.org/abs/1808.01700v1

    Mobile users on board vehicles often experience low quality of service due to the vehicular penetration effect, especially at the cell edges. So-called mobile-cells (MCs) are installed inside public transport vehicles to serve the commuters. On one end, the mobile-cells have a wireless backhaul connection with the nearest base station, and on the other, they connect wirelessly to the in-vehicle users over access links. This paper integrates mobile-cells within cellular networks by reusing their sub-channels. First, it proposes an algorithm that allows the access link to share spectrum with out-of-vehicle cellular users or with the MC's backhaul link. Second, it proposes a scheme for controlling the transmit power over the access link to mitigate interference to the backhaul link while maintaining high link quality for in-vehicle users.

    • [eess.SP]Spatial Deep Learning for Wireless Scheduling
    Wei Cui, Kaiming Shen, Wei Yu
    http://arxiv.org/abs/1808.01486v1

    The optimal scheduling of interfering links in a dense wireless network with full frequency reuse is a challenging task. The traditional method involves first estimating all the interfering channel strengths then optimizing the scheduling based on the model. This model-based method is however resource and computationally intensive, because channel estimation is expensive in dense networks; further, finding even a locally optimal solution of the resulting optimization problem may be computationally complex. This paper shows that by using a deep learning approach, it is possible to bypass channel estimation and to schedule links efficiently based solely on the geographic locations of transmitters and receivers. This is accomplished by using locally optimal schedules generated using a fractional programming method for randomly deployed device-to-device networks as training data, and by using a novel neural network architecture that takes the geographic spatial convolutions of the interfering or interfered neighboring nodes as input over multiple feedback stages to learn the optimum solution. The resulting neural network gives near-optimal performance for sum-rate maximization and is capable of generalizing to larger deployment areas and to deployments of different link densities. Finally, this paper proposes a novel scheduling approach that utilizes the sum-rate optimal scheduling heuristics over judiciously chosen subsets of links to provide fair scheduling across the network.
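
    A small sketch of the kind of geographic input such a network can consume: rasterizing the positions of neighboring interferers, relative to one link's transmitter, into a grid suitable for spatial convolution. Grid size and resolution here are illustrative assumptions, not the paper's parameters:

    ```python
    import numpy as np

    def density_grid(center, neighbors, half_width=50.0, bins=31):
        """2D histogram of neighbor positions relative to `center` (meters)."""
        rel = np.asarray(neighbors) - np.asarray(center)
        edges = np.linspace(-half_width, half_width, bins + 1)
        grid, _, _ = np.histogram2d(rel[:, 0], rel[:, 1], bins=(edges, edges))
        return grid  # (bins, bins); stack such grids as conv-net input channels

    rng = np.random.default_rng(0)
    tx = rng.uniform(0, 500, size=(64, 2))   # 64 transmitters in a 500 m square
    g = density_grid(tx[0], tx[1:])          # interferer geometry around link 0
    ```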

    • [math.PR]About the Stein equation for the generalized inverse Gaussian and Kummer distributions
    Essomanda Konzou, Angelo Koudou
    http://arxiv.org/abs/1808.01781v1

    We propose a Stein characterization of the Kummer distribution on (0, \infty). This result follows from our observation that the density of the Kummer distribution satisfies a certain differential equation, leading to a solution of the related Stein equation. A bound is derived for the solution, under a condition on the parameters. The derivation of this bound is carried out in the same framework as Gaunt (2017) [A Stein characterisation of the generalized hyperbolic distribution. ESAIM: Probability and Statistics, 21, 303--316] in the case of the generalized inverse Gaussian distribution, which we revisit by correcting a minor error in the latter paper.
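
    For orientation, a sketch under an assumed parametrization (the paper's may differ): taking the Kummer density on (0, \infty) proportional to x^{a-1}(1+x)^{-b}e^{-cx}, the logarithmic derivative of the density yields, by the density approach, a Stein operator of the form

    \mathcal{A}f(x) = f'(x) + \left(\frac{a-1}{x} - \frac{b}{1+x} - c\right)f(x),

    so that \mathbb{E}[\mathcal{A}f(X)] = 0 for suitable test functions f when X follows this law.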

    • [math.PR]Beyond the Central Limit Theorem: Universal and Non-universal Simulations of Random Variables by General Mappings
    Lei Yu
    http://arxiv.org/abs/1808.01750v1

    The Central Limit Theorem states that a standard Gaussian random variable can be simulated within any level of approximation error (measured by the Kolmogorov-Smirnov distance) from an i.i.d. real-valued random vector X^{n}\sim P_{X}^{n} by a normalized sum mapping (as n\to\infty). Moreover, given the mean and variance of X, this linear function is independent of the distribution P_{X}. Such simulation problems (in which the simulation mappings are independent of P_{X}, or equivalently P_{X} is unknown a priori) are referred to as universal. In this paper, we consider both universal and non-universal simulations of random variables with arbitrary target distributions Q_{Y} by general mappings, not limited to linear ones. We derive the fastest convergence rate of the approximation errors for such problems. Interestingly, we show that for discontinuous or absolutely continuous P_{X}, the approximation error for the universal simulation is almost as small as that for the non-universal one; moreover, for both universal and non-universal simulations, the approximation errors by general mappings are strictly smaller than those by linear mappings. Specifically, for both universal and non-universal simulations, if P_{X} is discontinuous, then the approximation error decays at least exponentially fast as n\to\infty; if P_{X} is absolutely continuous, then one-dimensional X is sufficient to simulate Y exactly or arbitrarily well. For continuous but not absolutely continuous P_{X}, using a non-universal simulator, one-dimensional X is still sufficient to simulate Y exactly; using a universal simulator, however, we only show that the approximation error decays super-exponentially fast. Furthermore, we generalize these results to simulation from Markov processes and simulation of random elements (or general random variables).
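
    A quick numerical illustration of the universal normalized-sum simulator, assuming an Exp(1) source (an arbitrary choice): the Kolmogorov-Smirnov distance to the standard Gaussian shrinks as n grows:

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    mu, sigma = 1.0, 1.0                     # mean and sd of Exp(1)

    for n in (1, 10, 100, 1000):
        x = rng.exponential(scale=1.0, size=(20000, n))
        z = (x.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))  # universal mapping
        res = stats.kstest(z, "norm")        # Kolmogorov-Smirnov distance
        print(f"n={n:5d}  KS distance = {res.statistic:.4f}")
    ```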

    • [math.ST]α-Ball divergence and its applications to change-point problems for Banach-valued sequences
    Qiang Zhang, Wenliang Pan, Xin Chen, Xueqin Wang
    http://arxiv.org/abs/1808.01544v1

    In this paper, we extend a measure of divergence between two distributions, Ball divergence, to a new one: \alpha-Ball divergence. With this new notion, we propose a sample statistic that can be used to test whether two weakly dependent sequences of Banach-valued random vectors have the same distribution. The properties enjoyed by Ball divergence are shown to carry over to \alpha-Ball divergence and its sample statistic, and to hold for random sequences that are functionals of some absolutely regular sequences. We further apply the sample statistic to change-point problems for a sequence of weakly dependent Banach-valued observations with multiple possible change-points. Our procedure does not require any assumptions on the type of change-point. It can detect the number of change-points as well as their locations. We also prove the consistency of the estimated change-point locations. Extensive simulation studies and analyses of two interesting real data sets, about wind direction and bitcoin price, illustrate that our procedure has considerable advantages over existing competitors, especially when observations are non-Euclidean or there are distributional changes in the variance.

    • [math.ST]Bounded Statistics
    Pranava Chaitanya Jayanti, Konstantina Trivisa
    http://arxiv.org/abs/1808.01393v1

    If two probability density functions (PDFs) have values for their first n moments which are quite close to each other (upper bounds of their differences are known), can it be expected that the PDFs themselves are very similar? Shown below is an algorithm to quantitatively estimate this "similarity" between the given PDFs, depending on how many moments one has information about. This method involves the concept of functions behaving "similarly" at certain "length scales", which is also precisely defined. This technique could find use in data analysis, to compare a data set with a PDF or another data set, without having to fit a functional form to the data.
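
    A minimal sketch of the moment-comparison setup, with two arbitrary example densities (the standard Gaussian and the hyperbolic secant density, which share their first two moments):

    ```python
    import numpy as np

    def moments(pdf, grid, n):
        """First n raw moments of a density tabulated on a uniform grid."""
        dx = grid[1] - grid[0]
        return np.array([np.sum(grid ** k * pdf) * dx for k in range(1, n + 1)])

    grid = np.linspace(-10, 10, 4001)
    p = np.exp(-grid ** 2 / 2) / np.sqrt(2 * np.pi)   # standard Gaussian
    q = 0.5 / np.cosh(np.pi * grid / 2)               # hyperbolic secant density

    # Upper bounds on these moment differences are the input to the
    # similarity estimate discussed above.
    print(np.abs(moments(p, grid, 4) - moments(q, grid, 4)))
    ```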

    • [math.ST]Dynamical multiple regression in function spaces, under kernel regressors, with ARH(1) errors
    M. D. Ruiz-Medina, D. Miranda, R. M. Espejo
    http://arxiv.org/abs/1808.01655v1

    A linear multiple regression model in function spaces is formulated under temporally correlated errors. The formulation involves kernel regressors. A generalized least-squares regression parameter estimator is derived, and its asymptotic normality and strong consistency are obtained under suitable conditions. The correlation analysis is based on a componentwise estimator of the residual autocorrelation operator. When the dependence structure of the functional error term is unknown, a plug-in generalized least-squares regression parameter estimator is formulated, and its strong consistency is proved as well. A simulation study illustrates the performance of the presented approach under different regularity conditions. An application to financial panel data is also considered.

    • [math.ST]Nuisance Parameters Free Changepoint Detection in Non-stationary Series
    Michal Pešta, Martin Wendler
    http://arxiv.org/abs/1808.01905v1

    Detecting abrupt changes in the mean of a time series, so-called changepoints, is important for many applications. However, many procedures rely on the estimation of nuisance parameters (like the long-run variance). Under the alternative (a change in mean), estimators might be biased and data-adaptive rules for the choice of tuning parameters might not work as expected. If the data are not stationary but heteroscedastic, this becomes more challenging. The aim of this paper is to present and investigate two changepoint tests which involve neither nuisance nor tuning parameters. This is achieved by combining self-normalization and the wild bootstrap. We study the asymptotic behavior and show the consistency of the bootstrap under the hypothesis as well as under the alternative, assuming mild conditions on the weak dependence of the time series and allowing the variance to change over time. As a by-product of the proposed tests, a changepoint estimator is introduced and its consistency is proved. The results are illustrated through a simulation study, which demonstrates the computational efficiency of the developed methods. The new tests are also applied to real data examples from finance and hydrology.

    • [math.ST]Prediction in Riemannian metrics derived from divergence functions
    Henryk Gzyl
    http://arxiv.org/abs/1808.01638v1

    Divergence functions are interesting discrepancy measures. Even though they are not true distances, we can use them to measure how separated two points are. Curiously enough, when they are applied to random variables, they lead to a notion of best predictor that coincides with usual best predictor in Euclidean distance. Given a divergence function, we can derive from it a Riemannian metric, which leads to a distance in which means and best predictors do not coincide with their Euclidean counterparts. It is the purpose of this note to study the Riemannian metric derived from the divergence function as well as its use in prediction theory.

    • [math.ST]Sampling-based randomized designs for causal inference under the potential outcomes framework
    Zach Branson, Tirthankar Dasgupta
    http://arxiv.org/abs/1808.01691v1

    We establish the inferential properties of the mean-difference estimator for the average treatment effect in randomized experiments where each unit in a population of interest is randomized to one of two treatments and then units within treatment groups are randomly sampled. The properties of this estimator are well-understood in the experimental design scenario where first units are randomly sampled and then treatment is randomly assigned, but this is not the case for the aforementioned scenario where the sampling and treatment assignment stages are reversed. We find that the mean-difference estimator under this experimental design scenario is more precise than under the sample-first-randomize-second design, but only when there is treatment effect heterogeneity in the population. We also explore to what extent pre-treatment measurements can be used to improve upon the mean-difference estimator for this experimental design.

    • [math.ST]Statistical Windows in Testing for the Initial Distribution of a Reversible Markov Chain
    Quentin Berthet, Varun Kanade
    http://arxiv.org/abs/1808.01857v1

    We study the problem of hypothesis testing between two discrete distributions, where we only have access to samples after the action of a known reversible Markov chain, playing the role of noise. We derive instance-dependent minimax rates for the sample complexity of this problem, and show how its dependence on time is related to the spectral properties of the Markov chain. We show that there exists a wide statistical window, in terms of sample complexity, for hypothesis testing between different pairs of initial distributions. We illustrate these results in several concrete examples.

    • [math.ST]Strongly consistent autoregressive predictors in abstract Banach spaces
    MD Ruiz-Medina, J. Alvarez-Liebana
    http://arxiv.org/abs/1808.01659v1

    This work derives new results on strongly consistent estimation and prediction for autoregressive processes of order 1 in a separable Banach space B. The consistency results are obtained for the componentwise estimator of the autocorrelation operator in the norm of the space L(B) of bounded linear operators on B. The strong consistency of the associated plug-in predictor then follows in the B-norm. A Gelfand triple is defined through the Hilbert space constructed in Kuelbs' (1970) lemma. A Hilbert-Schmidt embedding introduces the Reproducing Kernel Hilbert Space (RKHS), generated by the autocovariance operator, into the Hilbert space forming the rigged Hilbert space structure. This paper extends the work of Bosq (2000) and Labbas and Mourid (2002).

    • [quant-ph]Amortized Channel Divergence for Asymptotic Quantum Channel Discrimination
    Mario Berta, Christoph Hirche, Eneet Kaur, Mark M. Wilde
    http://arxiv.org/abs/1808.01498v1

    It is well known that for the discrimination of classical and quantum channels in the finite, non-asymptotic regime, adaptive strategies can give an advantage over non-adaptive strategies. However, Hayashi [IEEE Trans. Inf. Theory 55(8), 3807 (2009)] showed that in the asymptotic regime, the exponential error rate for the discrimination of classical channels is not improved in the adaptive setting. We extend this result in several ways. First, we establish the strong Stein's lemma for classical-quantum channels by showing that asymptotically the exponential error rate for classical-quantum channel discrimination is not improved by adaptive strategies. Second, we recover many other classes of channels for which adaptive strategies do not lead to an asymptotic advantage. Third, we give various converse bounds on the power of adaptive protocols for general asymptotic quantum channel discrimination. Intriguingly, it remains open whether adaptive protocols can improve the exponential error rate for quantum channel discrimination in the asymmetric Stein setting. Our proofs are based on the concept of amortized distinguishability of quantum channels, which we analyse using data-processing inequalities.

    • [quant-ph]One-Shot Coherence Distillation: The Full Story
    Qi Zhao, Yunchao Liu, Xiao Yuan, Eric Chitambar, Andreas Winter
    http://arxiv.org/abs/1808.01885v1

    The resource framework of quantum coherence was introduced by Baumgratz, Cramer and Plenio [PRL 113, 140401 (2014)] and further developed by Winter and Yang [PRL 116, 120404 (2016)]. We consider the one-shot problem of distilling pure coherence from a single instance of a given resource state. Specifically, we determine the distillable coherence with a given fidelity under incoherent operations (IO) through a generalisation of the Winter-Yang protocol. This is compared to the distillable coherence under maximal incoherent operations (MIO) and dephasing-covariant incoherent operations (DIO), which can be cast as a semidefinite programme and has been presented previously by Regula et al. [PRL 121, 010401 (2018)]. Our results are given in terms of a smoothed min-relative entropy distance from the incoherent set of states, and a variant of the hypothesis-testing relative entropy distance, respectively. The one-shot distillable coherence is also related to one-shot randomness extraction. Moreover, from the one-shot formulas under IO, MIO and DIO, we can recover the optimal distillable rate in the many-copy asymptotics, yielding the relative entropy of coherence. These results can be compared with previous work by some of the present authors [Zhao et al., PRL 120, 070403 (2018)] on one-shot coherence formation under IO, MIO, DIO and also SIO. This shows that the amount of distillable coherence is essentially the same for IO, DIO, and MIO, despite the fact that the three classes of operations are very different. We also relate the distillable coherence under strictly incoherent operations (SIO) to a constrained hypothesis-testing problem and explicitly show the existence of bound coherence under SIO in the asymptotic regime.

    • [stat.AP]An Extreme Value Analysis of the Urban Skyline
    Jonathan Auerbach, Phyllis Wan
    http://arxiv.org/abs/1808.01514v1

    The world's urban population is expected to grow fifty percent by the year 2050 and exceed six billion. The major challenges confronting cities, such as sustainability, safety, and equality, will depend on the infrastructure developed to accommodate the increase. Urban planners have long debated the consequences of vertical growth, the concentration of residents by constructing tall buildings, over horizontal growth, the dispersal of residents by expanding urban boundaries. Yet relatively little work has modeled the vertical growth of cities if present trends continue and quantified the likelihood and therefore urgency of these consequences. We regard tall buildings as random exceedances over a threshold and use extreme value analysis to characterize the skyscrapers that will dominate the urban skyline in 2050 if present trends continue. We find forty-one thousand skyscrapers will surpass 150 meters and 40 floors, an increase of eight percent a year, far outpacing the expected urban population growth of two percent a year. The typical tall skyscraper will not be noticeably taller, and the tallest will likely exceed one thousand meters but not one mile. If a mile-high skyscraper is constructed, we predict it will hold fewer occupants than many mile-highs currently designed. We predict roughly three-quarters the number of floors of the Mile-High Tower, two-thirds of Next Tokyo's Sky Mile Tower, and half the floors of Frank Lloyd Wright's The Illinois, three prominent plans of the mile-high skyscraper vision. However, we anticipate the relationship between floor and height will vary considerably across cities.
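
    A sketch of the peaks-over-threshold machinery underlying such an analysis, with synthetic heights standing in for real building data:

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    heights = rng.lognormal(mean=4.5, sigma=0.5, size=100_000)  # synthetic stock (m)

    threshold = 150.0                        # the paper's tall-building cutoff
    exceedances = heights[heights > threshold] - threshold

    # Fit a generalized Pareto distribution to the exceedances over the threshold.
    shape, _, scale = stats.genpareto.fit(exceedances, floc=0)

    # Height exceeded by one in 10,000 tall buildings, under the fitted tail:
    q = threshold + stats.genpareto.ppf(0.9999, shape, loc=0, scale=scale)
    ```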

    • [stat.AP]Associating Growth in Infancy and Cognitive Performance in Early Childhood: A functional data analysis approach
    Pantelis Z. Hadjipantelis, Kyunghee Han, Jane-Ling Wang, Seungmi Yang, Richard M. Martin, Michael S. Kramer, Emily Oken, Hans-Georg Müller
    http://arxiv.org/abs/1808.01384v1

    Physical growth traits can be naturally represented by continuous functions. In a large dataset of infancy growth patterns, we develop a practical approach to infer statistical associations between growth-trajectories and IQ performance in early childhood. The main objective of this study is to show how to assess physical growth curves and detect if particular infancy growth patterns are associated with differences in IQ (Full-scale WASI scores) in later ages using a semi-parametric functional response model. Additionally, we investigate the association between different growth measurements in terms of their cross-correlation with each other, their correlation with later IQ, as well as their time-varying dynamics. This analysis framework can easily incorporate or learn population information in a non-parametric way, rendering the existence of prior population charts partially redundant.

    • [stat.AP]Computationally efficient model selection for joint spikes and waveforms decoding
    Francesca Matano, Valérie Ventura
    http://arxiv.org/abs/1808.01693v1

    A recent paradigm for decoding behavioral variables or stimuli from neuron ensembles relies on joint models for electrode spike trains and their waveforms, which, in principle, is more efficient than decoding from electrode spike trains alone or from sorted neuron spike trains. In this paper, we decode the velocity of arm reaches of a rhesus macaque monkey to show that including waveform features indiscriminately in a joint decoding model can contribute more noise and bias than useful information about the kinematics, and thus degrade decoding performance. We also show that selecting which waveform features should enter the model to lower the prediction risk can boost decoding performance substantially. For the data analyzed here, a stepwise search for a low-risk joint model of electrode spikes and waveforms yielded a low-risk Bayesian model that is 30% more efficient than the corresponding risk-minimized Bayesian model based on electrode spike trains alone. The joint model was also comparably efficient to decoding from a risk-minimized model based only on sorted neuron spike trains and hash, confirming previous results that one can do away with the problematic spike-sorting step in decoding applications. We were able to search for low-risk joint models through a large model space thanks to a shortcut formula that accelerates large matrix inversions in stepwise searches for models based on Gaussian linear observation equations.

    • [stat.AP]Spline Regression with Automatic Knot Selection
    Vivien Goepp, Olivier Bouaziz, Grégory Nuel
    http://arxiv.org/abs/1808.01770v1

    In this paper we introduce a new method for automatically selecting knots in spline regression. The approach consists in setting a large number of initial knots and fitting the spline regression through a penalized likelihood procedure called adaptive ridge. The proposed method is similar to penalized spline regression methods (e.g. P-splines), with the noticeable difference that the output is a sparse spline regression with a small number of knots. We show that our method, called A-spline (for adaptive splines), yields sparse regression models with high interpretability, while having predictive performance similar to that of penalized spline regression methods. A-spline is applied to both simulated and real datasets. A fast and publicly available implementation in R is provided along with this paper.
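
    A rough sketch of the adaptive-ridge idea behind A-spline (the reference implementation is in R; this Python sketch simplifies the basis setup and stopping rule, and its details are assumptions rather than the paper's exact procedure):

    ```python
    import numpy as np
    from scipy.interpolate import BSpline   # design_matrix needs SciPy >= 1.8

    rng = np.random.default_rng(3)
    x = np.sort(rng.uniform(0, 1, 200))
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, x.size)

    k = 3                                    # cubic splines
    inner = np.linspace(0, 1, 42)            # deliberately many initial knots
    t = np.r_[[0.0] * k, inner, [1.0] * k]   # clamped knot vector
    n_coef = len(t) - k - 1
    B = BSpline.design_matrix(x, t, k).toarray()

    D = np.diff(np.eye(n_coef), n=2, axis=0)     # second-order differences
    w = np.ones(D.shape[0])
    lam, eps = 1e-2, 1e-5
    for _ in range(50):                          # adaptive ridge iterations
        A = B.T @ B + lam * D.T @ (w[:, None] * D)
        beta = np.linalg.solve(A, B.T @ y)
        w = 1.0 / ((D @ beta) ** 2 + eps ** 2)   # re-weighting step

    # Coefficient differences not driven to zero mark the knots to keep.
    active = np.abs(D @ beta) > 1e-3
    ```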

    • [stat.ME]A hierarchical independent component analysis model for longitudinal Neuroimaging studies
    Yikai Wang, Ying Guo
    http://arxiv.org/abs/1808.01557v1

    In recent years, longitudinal neuroimaging studies have become increasingly popular in neuroscience research for investigating disease-related changes in brain function. In the current neuroscience literature, one of the most commonly used tools to extract and characterize brain functional networks is independent component analysis (ICA). However, existing ICA methods are not suited to modelling repeatedly measured imaging data. In this paper, we propose a novel longitudinal independent component model (L-ICA) which provides a formal modeling framework for extending ICA to longitudinal studies. By incorporating subject-specific random effects and visit-specific covariate effects, L-ICA is able to provide more accurate estimates of changes in brain functional networks on both the population and individual level, borrow information across repeated scans within the same subject to increase statistical power in detecting covariate effects on the networks, and allow model-based prediction of changes in brain networks caused by disease progression, treatment or neurodevelopment. We develop a fully tractable exact EM algorithm to obtain maximum likelihood estimates of L-ICA. We further develop a subspace-based approximate EM algorithm which greatly reduces the computation time while still retaining high accuracy. Moreover, we present a statistical testing procedure for examining covariate effects on brain network changes. Simulation results demonstrate the advantages of our proposed methods. We apply L-ICA to the ADNI2 study to investigate changes in brain functional networks in Alzheimer's disease. Results from the L-ICA provide biologically insightful findings which are not revealed using existing methods.

    • [stat.ME]Diffusion approximations and control variates for MCMC
    Nicolas Brosse, Alain Durmus, Sean Meyn, Eric Moulines
    http://arxiv.org/abs/1808.01665v1

    A new methodology is presented for the construction of control variates to reduce the variance of additive functionals of Markov Chain Monte Carlo (MCMC) samplers. Our control variates are defined as linear combinations of functions whose coefficients are obtained by minimizing a proxy for the asymptotic variance. The construction is theoretically justified by two new results. We first show that the asymptotic variances of some well-known MCMC algorithms, including the Random Walk Metropolis and the (Metropolis) Unadjusted/Adjusted Langevin Algorithm, are close to the asymptotic variance of the Langevin diffusion. Second, we provide an explicit representation of the optimal coefficients minimizing the asymptotic variance of the Langevin diffusion. Several examples of Bayesian inference problems demonstrate that the corresponding reduction in the variance is significant, and that in some cases it can be dramatic.
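
    A simplified, related construction for intuition: zero-variance-style control variates built from the Langevin generator, for which E[L P(X)] = 0 under the target. The paper's coefficients instead minimize a proxy for the asymptotic variance of the chain; this sketch uses the static-variance variant for a 1D Gaussian target:

    ```python
    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.normal(size=100_000)     # stand-in for MCMC output, target pi = N(0,1)
    f = x ** 2                       # functional of interest, E[f(X)] = 1

    grad_log_pi = -x                 # d/dx log pi(x) for the standard Gaussian
    # Control functions L P_k for monomials P_k(x) = x^k:
    #   L P(x) = P''(x) + P'(x) * grad_log_pi(x), and E[L P(X)] = 0 under pi.
    cv = np.column_stack([
        grad_log_pi,                 # k = 1: P'' = 0, P' = 1
        2.0 + 2.0 * x * grad_log_pi, # k = 2: P'' = 2, P' = 2x
    ])
    theta, *_ = np.linalg.lstsq(cv, f - f.mean(), rcond=None)
    f_cv = f - cv @ theta            # same expectation, smaller variance

    print(f.var(), f_cv.var())
    ```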

    • [stat.ME]Improved Estimation of Average Treatment Effects on the Treated: Local Efficiency, Double Robustness, and Beyond
    Heng Shu, Zhiqiang Tan
    http://arxiv.org/abs/1808.01408v1

    Estimation of average treatment effects on the treated (ATT) is an important topic of causal inference in econometrics and statistics. This problem seems to be often treated as a simple modification or extension of that of estimating overall average treatment effects (ATE). However, the propensity score is no longer ancillary for estimation of ATT, in contrast with estimation of ATE. In this article, we review semiparametric theory for estimation of ATT and the use of efficient influence functions to derive augmented inverse probability weighted (AIPW) estimators that are locally efficient and doubly robust. Moreover, we discuss improved estimation over AIPW by developing calibrated regression and likelihood estimators that are not only locally efficient and doubly robust, but also intrinsically efficient in achieving smaller variances than AIPW estimators when a propensity score model is correctly specified but an outcome regression model may be misspecified. Finally, we present two simulation studies and an econometric application to demonstrate the advantage of the proposed methods when compared with existing methods.

    • [stat.ME]Inverse Conditional Probability Weighting with Clustered Data in Causal Inference
    Zhulin He
    http://arxiv.org/abs/1808.01647v1

    Estimating the average treatment causal effect in clustered data often involves dealing with unmeasured cluster-specific confounding variables. Such variables may be correlated with the measured unit covariates and outcome. When the correlations are ignored, the causal effect estimation can be biased. By utilizing sufficient statistics, we propose an inverse conditional probability weighting (ICPW) method, which is robust to both (i) the correlation between the unmeasured cluster-specific confounding variable and the covariates and (ii) the correlation between the unmeasured cluster-specific confounding variable and the outcome. Assumptions and conditions for the ICPW method are presented. We establish the asymptotic properties of the proposed estimators. Simulation studies and a case study are presented for illustration.

    • [stat.ME]Regularized matrix data clustering and its application to image analysis
    Xu Gao, Weining Shen, Hernando Ombao
    http://arxiv.org/abs/1808.01749v1

    In this paper, we propose a regularized mixture probabilistic model to cluster matrix data and apply it to brain signals. The approach is able to capture the sparsity (low rank, small/zero values) of the original signals by introducing regularization terms into the likelihood function. Through a modified EM algorithm, our method achieves the optimal solution with low computational cost. Theoretical results are also provided to establish the consistency of the proposed estimators. Simulations show the advantages of the proposed method over other existing methods. We also apply the approach to two real datasets from different experiments. Promising results imply that the proposed method successfully characterizes signals with different patterns while yielding insightful scientific interpretation.

    • [stat.ML]Multi-Objective Cognitive Model: a supervised approach for multi-subject fMRI analysis
    Muhammad Yousefnezhad, Daoqiang Zhang
    http://arxiv.org/abs/1808.01642v1

    In order to decode the human brain, Multivariate Pattern (MVP) classification generates cognitive models from functional Magnetic Resonance Imaging (fMRI) datasets. As a standard pipeline in MVP analysis, brain patterns in a multi-subject fMRI dataset must be mapped to a shared space, after which a classification model is generated from the mapped patterns. However, MVP models may not provide stable performance on a new fMRI dataset because the standard pipeline uses disjoint steps to generate these models. Indeed, each step in the pipeline includes an objective function with an independent optimization approach, where the best solution for one step may not be optimal for the next. To tackle this issue, this paper introduces the Multi-Objective Cognitive Model (MOCM), which utilizes an integrated objective function for MVP analysis rather than those disjoint steps. To solve the integrated problem, we propose a customized multi-objective optimization approach in which all possible solutions are first generated, and our method then ranks and selects the robust solutions as the final results. Empirical studies confirm that the proposed method achieves superior performance compared with other techniques.

    • [stat.ML]V-FCNN: Volumetric Fully Convolution Neural Network For Automatic Atrial Segmentation
    Nicoló Savioli, Giovanni Montana, Pablo Lamata
    http://arxiv.org/abs/1808.01944v1

    Atrial Fibrillation (AF) is a common electro-physiological cardiac disorder that causes changes in the anatomy of the atria. A better characterization of these changes is desirable for the definition of clinical biomarkers, and there is therefore a need for fully automatic segmentation from clinical images. In this work we present an architecture based on 3D-convolution kernels, a Volumetric Fully Convolution Neural Network (V-FCNN), able to segment the entire volume in one shot and consequently integrate the implicit spatial redundancy present in high-resolution images. A loss function based on a mixture of Mean Square Error (MSE) and Dice Loss (DL) is used, in an attempt to combine the ability to capture the bulk shape with a reduction of the local errors produced by over-segmentation. Results demonstrate reasonable performance in the middle region of the atria, and illustrate the challenges of capturing the variability of the pulmonary veins and of identifying the valve plane that separates the atria from the ventricles.
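
    A sketch of the mixed MSE + Dice objective described above, assuming PyTorch; the weighting alpha and the soft-Dice form are common choices, not necessarily the paper's exact ones:

    ```python
    import torch

    def mse_dice_loss(pred, target, alpha=0.5, eps=1e-6):
        """pred, target: (N, 1, D, H, W) volumes with pred in [0, 1]."""
        mse = torch.mean((pred - target) ** 2)          # bulk-shape term
        inter = torch.sum(pred * target)
        dice = 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
        return alpha * mse + (1.0 - alpha) * dice       # local-error term

    pred = torch.rand(2, 1, 16, 64, 64, requires_grad=True)
    target = (torch.rand(2, 1, 16, 64, 64) > 0.5).float()
    loss = mse_dice_loss(pred, target)
    loss.backward()
    ```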
