cs.AI - Artificial Intelligence
cs.CE - Computational Engineering, Finance, and Science
cs.CL - Computation and Language
cs.CR - Cryptography and Security
cs.CV - Computer Vision and Pattern Recognition
cs.CY - Computers and Society
cs.DC - Distributed, Parallel, and Cluster Computing
cs.GR - Computer Graphics
cs.HC - Human-Computer Interaction
cs.IR - Information Retrieval
cs.IT - Information Theory
cs.LG - Machine Learning
cs.RO - Robotics
cs.SD - Sound
econ.EM - Econometrics
eess.SP - Signal Processing
hep-lat - High Energy Physics - Lattice
math.OC - Optimization and Control
math.PR - Probability
math.ST - Statistics Theory
physics.chem-ph - Chemical Physics
physics.comp-ph - Computational Physics
physics.soc-ph - Physics and Society
q-bio.QM - Quantitative Methods
quant-ph - Quantum Physics
stat.AP - Applications
stat.CO - Computation
stat.ME - Methodology
stat.ML - Machine Learning (Statistics)
• [cs.AI]Nintendo Super Smash Bros. Melee: An "Untouchable" Agent
• [cs.AI]Social Emotion Mining Techniques for Facebook Posts Reaction Prediction
• [cs.CE]Shape optimization in laminar flow with a label-guided variational autoencoder
• [cs.CL]A Novel Way of Identifying Cyber Predators
• [cs.CL]Aspect Extraction and Sentiment Classification of Mobile Apps using App-Store Reviews
• [cs.CL]Contextualized Word Representations for Reading Comprehension
• [cs.CL]Inducing Interpretability in Knowledge Graph Embeddings
• [cs.CL]Learning Interpretable Spatial Operations in a Rich 3D Blocks World
• [cs.CL]Long-Range Correlation Underlying Childhood Language and Generative Models
• [cs.CL]Modulating and attending the source image during encoding improves Multimodal Translation
• [cs.CL]Multi-Task Learning for Mental Health using Social Media Text
• [cs.CL]On the Benefit of Combining Neural, Statistical and External Features for Fake News Identification
• [cs.CL]Scale Up Event Extraction Learning via Automatic Training Data Generation
• [cs.CL]Stochastic Answer Networks for Machine Reading Comprehension
• [cs.CL]Word Sense Disambiguation with LSTM: Do We Really Need 100 Billion Words?
• [cs.CR]Improving Malware Detection Accuracy by Extracting Icon Information
• [cs.CR]Performance Analysis and Application of Mobile Blockchain
• [cs.CV]3D Facial Expression Reconstruction using Cascaded Regression
• [cs.CV]3D Hand Pose Estimation: From Current Achievements to Future Goals
• [cs.CV]An Architecture Combining Convolutional Neural Network (CNN) and Support Vector Machine (SVM) for Image Classification
• [cs.CV]Autonomous UAV Navigation with Domain Adaptation
• [cs.CV]Bayesian Joint Matrix Decomposition for Data Integration with Heterogeneous Noise
• [cs.CV]Can We Teach Computers to Understand Art? Domain Adaptation for Enhancing Deep Networks Capacity to De-Abstract Art
• [cs.CV]CycleGAN Face-off
• [cs.CV]Deep Koalarization: Image Colorization using CNNs and Inception-ResNet-v2
• [cs.CV]Deep convolutional neural networks for brain image analysis on magnetic resonance imaging: a review
• [cs.CV]Distributed Mapper
• [cs.CV]Dynamics Transfer GAN: Generating Video by Transferring Arbitrary Temporal Dynamics from a Source Video to a Single Target Image
• [cs.CV]Error Correction for Dense Semantic Image Labeling
• [cs.CV]FHEDN: A based on context modeling Feature Hierarchy Encoder-Decoder Network for face detection
• [cs.CV]Feature Mapping for Learning Fast and Accurate 3D Pose Inference from Synthetic Images
• [cs.CV]Geometry Guided Adversarial Facial Expression Synthesis
• [cs.CV]Learning Nested Sparse Structures in Deep Neural Networks
• [cs.CV]Learning Surrogate Models of Document Image Quality Metrics for Automated Document Image Processing
• [cs.CV]MapNet: Geometry-Aware Learning of Maps for Camera Localization
• [cs.CV]NAG: Network for Adversary Generation
• [cs.CV]SPP-Net: Deep Absolute Pose Regression with Synthetic Views
• [cs.CV]Single-Shot Multi-Person 3D Body Pose Estimation From Monocular RGB Input
• [cs.CV]Sketch Layer Separation in Multi-Spectral Historical Document Images
• [cs.CV]The Effectiveness of Data Augmentation for Detection of Gastrointestinal Diseases from Endoscopical Images
• [cs.CV]Unsupervised Feature Learning for Audio Analysis
• [cs.CV]Using a single RGB frame for real time 3D hand pose estimation in the wild
• [cs.CV]Visual aesthetic analysis using deep neural network: model and techniques to increase accuracy without transfer learning
• [cs.CY]Cogniculture: Towards a Better Human-Machine Co-evolution
• [cs.CY]Fairness in Machine Learning: Lessons from Political Philosophy
• [cs.DC]An Efficient Multi-core Implementation of the Jaya Optimisation Algorithm
• [cs.GR]A Deep Recurrent Framework for Cleaning Motion Capture Data
• [cs.HC]Usability of Humanly Computable Passwords
• [cs.IR]Fast Nearest-Neighbor Classification using RNN in Domains with Large Number of Classes
• [cs.IR]Interactions between Health Searchers and Search Engines
• [cs.IR]Semi-supervised Multimodal Hashing
• [cs.IR]SneakPeek: Interest Mining of Images based on User Interaction
• [cs.IT]A Framework for Optimizing Multi-cell NOMA: Delivering Demand with Less Resource
• [cs.IT]Achieving Private Information Retrieval Capacity in Distributed Storage Using an Arbitrary Linear Code
• [cs.IT]Age Minimization in Energy Harvesting Communications: Energy-Controlled Delays
• [cs.IT]Caching and Coded Delivery over Gaussian Broadcast Channels for Energy Efficiency
• [cs.IT]Compressive Phase Retrieval of Structured Signal
• [cs.IT]Data Aggregation Over Multiple Access Wireless Sensors Network
• [cs.IT]Hybrid Analog-Digital Beamforming for Massive MIMO Systems
• [cs.IT]Low-Latency Multiuser Two-Way Wireless Relaying for Spectral and Energy Efficiencies
• [cs.IT]Multi-cell Massive MIMO Beamforming in Assuring QoS for Large Numbers of Users
• [cs.IT]Multilevel Diversity Coding with Secure Regeneration: Separate Coding Achieves the MBR Point
• [cs.IT]On Stochastic Orders and Fast Fading Multiuser Channels with Statistical CSIT
• [cs.IT]Optimal Odd Arm Identification with Fixed Confidence
• [cs.IT]Optimal locally repairable codes via elliptic curves
• [cs.IT]Overcoming Endurance Issue: UAV-Enabled Communications with Proactive Caching
• [cs.IT]Progressive Bit-Flipping Decoding of Polar Codes Over Layered Critical Sets
• [cs.IT]Short-Packet Two-Way Amplify-and-Forward Relaying
• [cs.IT]Ulam Sphere Size Analysis for Permutation and Multipermutation Codes Correcting Translocation Errors
• [cs.LG]A General Memory-Bounded Learning Algorithm
• [cs.LG]Bayesian Q-learning with Assumed Density Filtering
• [cs.LG]Cost-Sensitive Approach to Batch Size Adaptation for Gradient Descent
• [cs.LG]DGCNN: Disordered Graph Convolutional Neural Network Based on the Gaussian Mixture Model
• [cs.LG]Generalized Zero-Shot Learning via Synthesized Examples
• [cs.LG]Gradient Normalization & Depth Based Decay For Deep Learning
• [cs.LG]Learning Modality-Invariant Representations for Speech and Images
• [cs.LG]MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments
• [cs.LG]Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks
• [cs.LG]Peephole: Predicting Network Performance Before Training
• [cs.LG]Robust Deep Reinforcement Learning with Adversarial Attacks
• [cs.LG]Saving Gradient and Negative Curvature Computations: Finding Local Minima More Efficiently
• [cs.LG]StrassenNets: Deep learning with a multiplication budget
• [cs.RO]A cable-driven parallel manipulator with force sensing capabilities for high-accuracy tissue endomicroscopy
• [cs.RO]ESD CYCLOPS: A new robotic surgical system for GI surgery
• [cs.RO]Sim-to-Real Transfer of Accurate Grasping with Eye-In-Hand Observations and Continuous Control
• [cs.RO]Surgical task-space optimisation of the CYCLOPS robotic system
• [cs.RO]Towards Fully Environment-Aware UAVs: Real-Time Path Planning with Online 3D Wind Field Prediction in Complex Terrain
• [cs.SD]A Cascade Architecture for Keyword Spotting on Mobile Devices
• [cs.SD]Efficient Implementation of the Room Simulator for Training Deep Neural Network Acoustic Models
• [econ.EM]A Random Attention Model
• [eess.SP]Identifying the Mislabeled Training Samples of ECG Signals using Machine Learning
• [eess.SP]Noise Level Estimation for Overcomplete Dictionary Learning Based on Tight Asymptotic Bounds
• [eess.SP]Wireless Energy Beamforming Using Signal Strength Feedback
• [hep-lat]Towards reduction of autocorrelation in HMC by machine learning
• [math.OC]A Non-Cooperative Game Approach to Autonomous Racing
• [math.OC]Novel model-based heuristics for energy optimal motion planning of an autonomous vehicle using A*
• [math.PR]Statistical manifolds from optimal transport
• [math.ST]Asymptotically optimal empirical Bayes inference in a piecewise constant sequence model
• [math.ST]False Discovery Control for Pairwise Comparisons - An Asymptotic Solution to Williams, Jones and Tukey's Conjecture
• [math.ST]Finite sample Bernstein - von Mises theorems for functionals and spectral projectors of covariance matrix
• [math.ST]Posterior distribution existence and error control in Banach spaces
• [math.ST]Stochastic Restricted Biased Estimators in misspecified regression model with incomplete prior information
• [math.ST]Testing homogeneity of proportions from sparse binomial data with a large number of groups
• [physics.chem-ph]Reinforced dynamics of large atomic and molecular systems
• [physics.comp-ph]DeePMD-kit: A deep learning package for many-body potential energy representation and molecular dynamics
• [physics.soc-ph]Crowdsourcing accurately and robustly predicts Supreme Court decisions
• [q-bio.QM]Variational auto-encoding of protein sequences
• [quant-ph]A Characterization of Antidegradable Qubit Channels
• [stat.AP]A practical guide and software for analysing pairwise comparison experiments
• [stat.AP]Examining the Effects of Objective Hurricane Risks and Community Resilience on Risk Perceptions of Hurricanes at the County Level in the U.S. Gulf Coast: An Innovative Approach
• [stat.AP]Reliability-centered maintenance: analyzing failure in harvest sugarcane machine using some generalizations of the Weibull distribution
• [stat.CO]Fast nonparametric near-maximum likelihood estimation of a mixing density
• [stat.ME]Analysis-of-marginal-Tail-Means - a new method for robust parameter optimization
• [stat.ME]Comparative analysis of criteria for filtering time series of word usage frequencies
• [stat.ME]Comparing Graph Spectra of Adjacency and Laplacian Matrices
• [stat.ME]Dynamic Mixed Frequency Synthesis for Economic Nowcasting
• [stat.ME]Ensembles of Regularized Linear Models
• [stat.ME]Exceedance as a measure of sparsity
• [stat.ME]Maximum entropy low-rank matrix recovery
• [stat.ML]Capsule Network Performance on Complex Data
• [stat.ML]Causal Inference for Observational Time-Series with Encoder-Decoder Networks
• [stat.ML]Elastic-net regularized High-dimensional Negative Binomial Regression: Consistency and Weak Signals Detection
• [stat.ML]Fast Low-Rank Matrix Estimation without the Condition Number
• [stat.ML]Identifiability of Kronecker-structured Dictionaries for Tensor Data
• [stat.ML]On Quadratic Penalties in Elastic Weight Consolidation
• [stat.ML]Sensitivity Analysis for Predictive Uncertainty in Bayesian Neural Networks
• [stat.ML]The PhaseLift for Non-quadratic Gaussian Measurements
• [stat.ML]Variational Inference over Non-differentiable Cardiac Simulators using Bayesian Optimization
·····································
• [cs.AI]Nintendo Super Smash Bros. Melee: An "Untouchable" Agent
Ben Parr, Deepak Dilipkumar, Yuan Liu
http://arxiv.org/abs/1712.03280v1
Nintendo's Super Smash Bros. Melee fighting game can be emulated on modern hardware, allowing us to inspect internal memory states such as character positions. We created an AI that learns to avoid being hit: it trains on these internal memory states and outputs controller button presses. After training on a month's worth of Melee matches, our best agent learned to avoid the toughest AI built into the game for a full minute 74.6% of the time.
• [cs.AI]Social Emotion Mining Techniques for Facebook Posts Reaction Prediction
Florian Krebs, Bruno Lubascher, Tobias Moers, Pieter Schaap, Gerasimos Spanakis
http://arxiv.org/abs/1712.03249v1
As of February 2016, Facebook allows users to express their experienced emotions about a post by using five so-called `reactions'. This research paper proposes and evaluates alternative methods for predicting these reactions to user posts on the public pages of firms/companies (like supermarket chains). For this purpose, we collected posts (and their reactions) from the Facebook pages of large supermarket chains and constructed a dataset which is available to other researchers. To predict the distribution of reactions for a new post, neural network architectures (convolutional and recurrent neural networks) were tested using pretrained word embeddings. The results of the neural networks were improved by introducing a bootstrapping approach for sentiment and emotion mining on the comments of each post. The final model (a combination of a neural network and a baseline emotion miner) is able to predict the reaction distribution on Facebook posts with a mean squared error (or misclassification rate) of 0.135.
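The evaluation metric above can be made concrete with a short sketch (not the paper's code; the reaction names and probabilities below are illustrative, not from the paper's dataset): the predicted and observed reaction distributions are compared with mean squared error.

```python
def reaction_mse(predicted, observed):
    """Mean squared error between two reaction distributions.

    Both arguments map reaction names to probabilities summing to 1.
    """
    reactions = ["like", "love", "haha", "wow", "sad", "angry"]
    return sum((predicted.get(r, 0.0) - observed.get(r, 0.0)) ** 2
               for r in reactions) / len(reactions)

# Hypothetical distributions for a single post.
pred = {"like": 0.6, "love": 0.2, "haha": 0.1, "wow": 0.05, "sad": 0.03, "angry": 0.02}
obs  = {"like": 0.5, "love": 0.3, "haha": 0.1, "wow": 0.05, "sad": 0.03, "angry": 0.02}
print(round(reaction_mse(pred, obs), 4))  # → 0.0033
```

A lower score means the predicted mix of reactions is closer to what users actually clicked.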
• [cs.CE]Shape optimization in laminar flow with a label-guided variational autoencoder
Stephan Eismann, Stefan Bartzsch, Stefano Ermon
http://arxiv.org/abs/1712.03599v1
Computational design optimization in fluid dynamics usually requires solving non-linear partial differential equations numerically. In this work, we explore a Bayesian optimization approach to minimizing an object's drag coefficient in laminar flow by predicting drag directly from the object shape. Jointly training an architecture combining a variational autoencoder, which maps shapes to latent representations, with Gaussian process regression allows us to generate improved shapes in the two-dimensional case we consider.
• [cs.CL]A Novel Way of Identifying Cyber Predators
Dan Liu, Ching Yee Suen, Olga Ormandjieva
http://arxiv.org/abs/1712.03903v1
Recurrent Neural Networks with Long Short-Term Memory cells (LSTM-RNNs) have an impressive ability to process sequence data, particularly for language model building and text classification. This research proposes the combination of sentiment analysis, a new approach to sentence vectors, and an LSTM-RNN as a novel way to perform Sexual Predator Identification (SPI). An LSTM-RNN language model is applied to generate sentence vectors, which are the last hidden states in the language model. The sentence vectors are fed into another LSTM-RNN classifier to capture suspicious conversations. The hidden state makes it possible to generate vectors for sentences never seen before. Fasttext is used to filter the contents of conversations and generate a sentiment score to identify potential predators. The experiment achieves a record-breaking accuracy and precision of 100% with a recall of 81.10%, exceeding the top-ranked result in the SPI competition.
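The "last hidden state as sentence vector" idea can be sketched with a toy recurrent network (a plain tanh RNN, not an LSTM, and with random weights standing in for a trained language model; the tiny vocabulary is made up):

```python
import math
import random

random.seed(0)

VOCAB = {"<unk>": 0, "hello": 1, "how": 2, "are": 3, "you": 4}
DIM = 4
# Random embedding and recurrent weights stand in for a trained language model.
EMB = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in VOCAB]
W_H = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]

def sentence_vector(tokens):
    """Run a toy tanh RNN over the tokens and return the last hidden state,
    which serves as a fixed-size sentence vector."""
    h = [0.0] * DIM
    for tok in tokens:
        x = EMB[VOCAB.get(tok, 0)]          # unknown words map to <unk>
        h = [math.tanh(x[i] + sum(W_H[i][j] * h[j] for j in range(DIM)))
             for i in range(DIM)]
    return h

vec = sentence_vector("how are you".split())
print(len(vec))  # one fixed-size vector, regardless of sentence length
```

The key property used by the paper is that any sentence, including one never seen in training, is mapped to a vector of the same dimension, which a downstream classifier can consume.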
• [cs.CL]Aspect Extraction and Sentiment Classification of Mobile Apps using App-Store Reviews
Sharmistha Dey
http://arxiv.org/abs/1712.03430v1
Understanding customer sentiment can be useful for product development. Moreover, if the priorities for the development order are known, the development procedure becomes simpler. This work addresses this issue in the mobile app domain. Along with aspect and opinion extraction, this work also categorizes the extracted aspects according to their importance. This can help developers focus their time and energy in the right place.
• [cs.CL]Contextualized Word Representations for Reading Comprehension
Shimi Salant, Jonathan Berant
http://arxiv.org/abs/1712.03609v1
Reading a document and extracting an answer to a question about its content has attracted substantial attention recently, where most work has focused on the interaction between the question and the document. In this work we evaluate the importance of context when the question and the document are each read on their own. We take a standard neural architecture for the task of reading comprehension, and show that by providing rich contextualized word representations from a large language model, and allowing the model to choose between context dependent and context independent word representations, we can dramatically improve performance and reach state-of-the-art performance on the competitive SQuAD dataset.
• [cs.CL]Inducing Interpretability in Knowledge Graph Embeddings
Chandrahas, Tathagata Sengupta, Cibi Pragadeesh, Partha Pratim Talukdar
http://arxiv.org/abs/1712.03547v1
We study the problem of inducing interpretability in KG embeddings. Specifically, we explore the Universal Schema (Riedel et al., 2013) and propose a method to induce interpretability. Many vector space models have been proposed for the problem; however, most of these methods do not address the interpretability (semantics) of individual dimensions. In this work, we study this problem and propose a method for inducing interpretability in KG embeddings using entity co-occurrence statistics. The proposed method significantly improves interpretability while maintaining comparable performance on other KG tasks.
• [cs.CL]Learning Interpretable Spatial Operations in a Rich 3D Blocks World
Yonatan Bisk, Kevin J. Shih, Yejin Choi, Daniel Marcu
http://arxiv.org/abs/1712.03463v1
In this paper, we study the problem of mapping natural language instructions to complex spatial actions in a 3D blocks world. We first introduce a new dataset that pairs complex 3D spatial operations to rich natural language descriptions that require complex spatial and pragmatic interpretations such as "mirroring", "twisting", and "balancing". This dataset, built on the simulation environment of Bisk, Yuret, and Marcu (2016), attains language that is significantly richer and more complex, while also doubling the size of the original dataset in the 2D environment with 100 new world configurations and 250,000 tokens. In addition, we propose a new neural architecture that achieves competitive results while automatically discovering an inventory of interpretable spatial operations (Figure 5).
• [cs.CL]Long-Range Correlation Underlying Childhood Language and Generative Models
Kumiko Tanaka-Ishii
http://arxiv.org/abs/1712.03645v1
Long-range correlation, a property of time series exhibiting long-term memory, is mainly studied in the statistical physics domain and has been reported to exist in natural language. Using a state-of-the-art method for such analysis, long-range correlation is first shown to occur in long CHILDES data sets. To understand why, Bayesian generative models of language, originally proposed in the cognitive scientific domain, are investigated. Among representative models, the Simon model was found to exhibit surprisingly good long-range correlation, but not the Pitman-Yor model. Since the Simon model is known not to correctly reflect the vocabulary growth of natural language, a simple new model is devised as a conjunct of the Simon and Pitman-Yor models, such that long-range correlation holds with a correct vocabulary growth rate. The investigation overall suggests that uniform sampling is one cause of long-range correlation and could thus have a relation with actual linguistic processes.
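The Simon model discussed above admits a minimal simulation (a sketch, not the author's code; the growth parameter `a = 0.1` is an arbitrary choice): with probability `a` a new word type is emitted, otherwise a uniformly random earlier token is copied, giving the rich-get-richer dynamics.

```python
import random

def simon_sequence(n, a=0.1, seed=0):
    """Generate n tokens by Simon's model: with probability a emit a new
    word type, otherwise copy a uniformly random earlier token."""
    rng = random.Random(seed)
    seq = [0]          # the first token introduces the first word type
    next_type = 1
    while len(seq) < n:
        if rng.random() < a:
            seq.append(next_type)
            next_type += 1
        else:
            seq.append(rng.choice(seq))
    return seq

seq = simon_sequence(10000, a=0.1)
print(len(set(seq)))  # vocabulary grows roughly linearly, about a*n types
```

This linear vocabulary growth is exactly the property the abstract flags as unrealistic for natural language, motivating the conjunct with the Pitman-Yor model.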
• [cs.CL]Modulating and attending the source image during encoding improves Multimodal Translation
Jean-Benoit Delbrouck, Stéphane Dupont
http://arxiv.org/abs/1712.03449v1
We propose a new and fully end-to-end approach for multimodal translation where the source text encoder modulates the entire visual input processing using conditional batch normalization, in order to compute the most informative image features for our task. Additionally, we propose a new attention mechanism derived from this original idea, where the attention model for the visual input is conditioned on the source text encoder representations. In the paper, we detail our models as well as the image analysis pipeline. Finally, we report experimental results. They are, as far as we know, the new state of the art on three different test sets.
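The conditioning mechanism can be sketched in miniature (a simplification, not the authors' implementation; the feature values, text encoding, and projection weights below are invented): normalize the visual features, then scale and shift them with a gamma and beta predicted from the source-text encoding.

```python
import math

def conditional_batch_norm(features, text_vec, w_gamma, w_beta, eps=1e-5):
    """Normalize visual features, then scale/shift with gamma and beta
    predicted linearly from the source-text encoding (the conditioning)."""
    mean = sum(features) / len(features)
    var = sum((f - mean) ** 2 for f in features) / len(features)
    gamma = sum(w * t for w, t in zip(w_gamma, text_vec))
    beta = sum(w * t for w, t in zip(w_beta, text_vec))
    return [gamma * (f - mean) / math.sqrt(var + eps) + beta for f in features]

# Hypothetical 3-channel feature statistics and a 2-d text encoding.
out = conditional_batch_norm([1.0, 2.0, 3.0], [0.5, 1.0],
                             w_gamma=[2.0, 0.0], w_beta=[0.0, 1.0])
print([round(v, 3) for v in out])
```

The point is that the same visual input is normalized differently depending on the source sentence, so the text steers which image features are emphasized.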
• [cs.CL]Multi-Task Learning for Mental Health using Social Media Text
Adrian Benton, Margaret Mitchell, Dirk Hovy
http://arxiv.org/abs/1712.03538v1
We introduce initial groundwork for estimating suicide risk and mental health in a deep learning framework. By modeling multiple conditions, the system learns to make predictions about suicide risk and mental health at a low false positive rate. Conditions are modeled as tasks in a multi-task learning (MTL) framework, with gender prediction as an additional auxiliary task. We demonstrate the effectiveness of multi-task learning by comparison to a well-tuned single-task baseline with the same number of parameters. Our best MTL model predicts potential suicide attempt, as well as the presence of atypical mental health, with AUC > 0.8. We also find additional large improvements using multi-task learning on mental health tasks with limited training data.
• [cs.CL]On the Benefit of Combining Neural, Statistical and External Features for Fake News Identification
Gaurav Bhatt, Aman Sharma, Shivam Sharma, Ankush Nagpal, Balasubramanian Raman, Ankush Mittal
http://arxiv.org/abs/1712.03935v1
Identifying the veracity of a news article is an interesting problem, while automating this process can be a challenging task. Detection of a news article as fake is still an open question, as it is contingent on many factors which the current state-of-the-art models fail to incorporate. In this paper, we explore a subtask of fake news identification: stance detection. Given a news article, the task is to determine the relevance of the body to its claim. We present a novel idea that combines neural, statistical and external features to provide an efficient solution to this problem. We compute the neural embedding from a deep recurrent model, statistical features from the weighted n-gram bag-of-words model, and handcrafted external features with the help of feature engineering heuristics. Finally, using a deep neural layer, all the features are combined, thereby classifying the headline-body news pair as agree, disagree, discuss, or unrelated. We compare our proposed technique with the current state-of-the-art models on the fake news challenge dataset. Through extensive experiments, we find that the proposed model outperforms all the state-of-the-art techniques, including the submissions to the fake news challenge.
• [cs.CL]Scale Up Event Extraction Learning via Automatic Training Data Generation
Ying Zeng, Yansong Feng, Rong Ma, Zheng Wang, Rui Yan, Chongde Shi, Dongyan Zhao
http://arxiv.org/abs/1712.03665v1
The task of event extraction has long been investigated in a supervised learning paradigm, which is bound by the number and the quality of the training instances. Existing training data must be manually generated through a combination of expert domain knowledge and extensive human involvement. However, due to the drastic effort required in annotating text, the resultant datasets are usually small, which severely affects the quality of the learned model, making it hard to generalize. Our work develops an automatic approach for generating training data for event extraction. Our approach allows us to scale up event extraction training instances from thousands to hundreds of thousands, and it does this at a much lower cost than a manual approach. We achieve this by employing distant supervision to automatically create event annotations from unlabelled text using existing structured knowledge bases or tables. We then develop a neural network model with post inference to transfer the knowledge extracted from structured knowledge bases to automatically annotate typed events with corresponding arguments in text. We evaluate our approach by using the knowledge extracted from Freebase to label texts from Wikipedia articles. Experimental results show that our approach can generate a large number of high quality training instances. We show that this large volume of training data not only leads to a better event extractor, but also allows us to detect multiple typed events.
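The distant-supervision step can be caricatured in a few lines (a sketch of the general idea, not the paper's system; the knowledge-base tuples and sentences are invented): a sentence is given a noisy event label whenever all arguments of a KB tuple co-occur in it.

```python
# Hypothetical knowledge base of (event_type, argument_1, argument_2) tuples.
KB = [
    ("marriage", "alice", "bob"),
    ("acquisition", "acme", "widgetco"),
]

def distant_label(sentences):
    """Label a sentence with an event type if all KB arguments co-occur in it."""
    labeled = []
    for sent in sentences:
        toks = set(sent.lower().split())
        for event, arg1, arg2 in KB:
            if arg1 in toks and arg2 in toks:
                labeled.append((sent, event, (arg1, arg2)))
    return labeled

sents = [
    "Alice married Bob in 2010",
    "Acme announced it will buy WidgetCo",
    "Bob went home",
]
for sent, event, args in distant_label(sents):
    print(event, args)
```

The resulting labels are noisy (co-occurrence does not guarantee the event actually happened in that sentence), which is why the paper follows this step with a neural model and post inference.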
• [cs.CL]Stochastic Answer Networks for Machine Reading Comprehension
Xiaodong Liu, Yelong Shen, Kevin Duh, Jianfeng Gao
http://arxiv.org/abs/1712.03556v1
We propose a simple yet robust stochastic answer network (SAN) that simulates multi-step reasoning in machine reading comprehension. Compared to previous work such as ReasoNet, the unique feature is the use of a kind of stochastic prediction dropout on the answer module (final layer) of the neural network during the training. We show that this simple trick improves robustness and achieves results competitive to the state-of-the-art on the Stanford Question Answering Dataset (SQuAD).
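Stochastic prediction dropout can be sketched as follows (a simplification of the idea, not the SAN code; the step predictions and drop probability are made up): during training, each reasoning step's answer distribution is dropped with some probability and the survivors are averaged; at test time all steps are averaged.

```python
import random

def average_predictions(step_preds, drop_prob=0.4, training=True, rng=None):
    """Average per-step answer distributions, stochastically dropping whole
    steps during training (keeping at least one so the average is defined)."""
    rng = rng or random.Random(0)
    kept = [p for p in step_preds if not (training and rng.random() < drop_prob)]
    if not kept:
        kept = [step_preds[-1]]
    dim = len(step_preds[0])
    return [sum(p[i] for p in kept) / len(kept) for i in range(dim)]

# Three reasoning steps, each a distribution over two answer candidates.
steps = [[0.9, 0.1], [0.6, 0.4], [0.7, 0.3]]
print(average_predictions(steps, training=False))  # simple mean at test time
```

Dropping whole steps forces the model not to rely on any single reasoning step, which is the robustness effect the abstract describes.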
• [cs.CL]Word Sense Disambiguation with LSTM: Do We Really Need 100 Billion Words?
Minh Le, Marten Postma, Jacopo Urbani
http://arxiv.org/abs/1712.03376v1
Recently, Yuan et al. (2016) have shown the effectiveness of using Long Short-Term Memory (LSTM) for performing Word Sense Disambiguation (WSD). Their proposed technique outperformed the previous state of the art on several benchmarks, but neither the training data nor the source code was released. This paper presents the results of a reproduction study of this technique using only openly available datasets (GigaWord, SemCor, OMSTI) and software (TensorFlow). From this study, it emerged that state-of-the-art results can be obtained with much less data than hinted at by Yuan et al. All code and trained models are made freely available.
• [cs.CR]Improving Malware Detection Accuracy by Extracting Icon Information
Pedro Silva, Sepehr Akhavan-Masouleh, Li Li
http://arxiv.org/abs/1712.03483v1
Detecting PE malware files is now commonly approached using statistical and machine learning models. While these models commonly use features extracted from the structure of PE files, we propose that icons from these files can also help better predict malware. We propose an innovative machine learning approach to extract information from icons. Our proposed approach consists of two steps: 1) extracting icon features using summary statistics, histograms of gradients (HOG), and a convolutional autoencoder, and 2) clustering icons based on the extracted icon features. Using publicly available data and machine learning experiments, we show that our proposed icon clusters significantly boost the efficacy of malware prediction models. In particular, our experiments show an average accuracy increase of 10% when icon clusters are used in the prediction model.
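The clustering stage can be illustrated with a tiny two-cluster k-means over per-icon summary statistics (a sketch under invented data, not the paper's pipeline; icons here are fake 2x2 grayscale arrays and the features are just pixel mean and variance):

```python
def features(icon):
    """Summary statistics of a grayscale icon: (mean, variance) of its pixels."""
    pixels = [p for row in icon for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return (mean, var)

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans2(points, iters=10):
    """Two-cluster k-means with deterministic init: the first point and the
    point farthest from it seed the two centers."""
    centers = [points[0], max(points, key=lambda p: dist2(p, points[0]))]
    assign = []
    for _ in range(iters):
        assign = [0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1
                  for p in points]
        for c in (0, 1):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return assign

# Two dark icons and two bright icons (hypothetical 2x2 grayscale values).
icons = [[[10, 12], [11, 9]], [[8, 10], [12, 11]],
         [[240, 250], [245, 248]], [[250, 252], [249, 247]]]
labels = kmeans2([features(i) for i in icons])
print(labels)  # → [0, 0, 1, 1]
```

Each cluster id then becomes a categorical feature of the PE file for the downstream malware predictor.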
• [cs.CR]Performance Analysis and Application of Mobile Blockchain
Kongrath Suankaewmanee, Dinh Thai Hoang, Dusit Niyato, Suttinee Sawadsitang, Ping Wang, Zhu Han
http://arxiv.org/abs/1712.03659v1
Mobile security has become more and more important due to the boom of mobile commerce (m-commerce). However, the development of m-commerce faces many challenges regarding data security. Recently, blockchain has been introduced as an effective security solution deployed successfully in many practical applications, such as Bitcoin, cloud computing, and the Internet of Things. However, blockchain technology has not been widely adopted and implemented in m-commerce because its mining processes usually need to be performed on standard computing units, e.g., computers. Therefore, in this paper, we introduce a new m-commerce application using blockchain technology, namely MobiChain, to secure transactions in m-commerce. In particular, in the MobiChain application, the mining processes can be executed efficiently on mobile devices using our proposed Android core module. Through real experiments, we evaluate the performance of the proposed model and show that blockchain will be an efficient security solution for future m-commerce.
• [cs.CV]3D Facial Expression Reconstruction using Cascaded Regression
Fanzi Wu, Songnan Li, Tianhao Zhao, King Ngi Ngan
http://arxiv.org/abs/1712.03491v1
This paper proposes a novel model fitting algorithm for 3D facial expression reconstruction from a single image. Face expression reconstruction from a single image is a challenging task in computer vision. Most state-of-the-art methods fit the input image to a 3D Morphable Model (3DMM). These methods need to solve a stochastic problem and cannot deal with expression and pose variations. To solve this problem, we adopt a 3D face expression model and use a combined feature which is robust to scale, rotation and different lighting conditions. The proposed method applies a cascaded regression framework to estimate parameters for the 3DMM. 2D landmarks are detected and used to initialize the 3D shape and mapping matrices. In each iteration, residues between the current 3DMM parameters and the ground truth are estimated and then used to update the 3D shapes. The mapping matrices are also calculated based on the updated shapes and 2D landmarks. HOG features of the local patches and displacements between 3D landmark projections and 2D landmarks are exploited. Compared with existing methods, the proposed method is robust to expression and pose changes and can reconstruct higher fidelity 3D face shape.
• [cs.CV]3D Hand Pose Estimation: From Current Achievements to Future Goals
Shanxin Yuan, Guillermo Garcia-Hernando, Bjorn Stenger, Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee, Pavlo Molchanov, Jan Kautz, Sina Honari, Liuhao Ge, Junsong Yuan, Xinghao Chen, Guijin Wang, Fan Yang, Kai Akiyama, Yang Wu, Qingfu Wan, Meysam Madadi, Sergio Escalera, Shile Li, Dongheui Lee, Iason Oikonomidis, Antonis Argyros, Tae-Kyun Kim
http://arxiv.org/abs/1712.03917v1
In this paper, we strive to answer two questions: What is the current state of 3D hand pose estimation? And, what are the next challenges that need to be tackled? Following the successful Hands In the Million Challenge (HIM2017), we investigate 11 state-of-the-art methods on three tasks: single frame 3D pose estimation, 3D hand tracking, and hand pose estimation during object interaction. We analyze the performance of different CNN structures with regard to hand shape, joint visibility, view point and articulation distributions. Our findings include: (1) isolated 3D hand pose estimation achieves low mean errors (10 mm) in the view point range of [40, 150] degrees, but it is far from being solved for extreme view points; (2) 3D volumetric representations outperform 2D CNNs, better capturing the spatial structure of the depth data; (3) Discriminative methods still generalize poorly to unseen hand shapes; (4) While joint occlusions pose a challenge for most methods, explicit modeling of structure constraints can significantly narrow the gap between errors on visible and occluded joints.
• [cs.CV]An Architecture Combining Convolutional Neural Network (CNN) and Support Vector Machine (SVM) for Image Classification
Abien Fred Agarap
http://arxiv.org/abs/1712.03541v1
Convolutional neural networks (CNNs) are similar to "ordinary" neural networks in the sense that they are made up of hidden layers consisting of neurons with "learnable" parameters. These neurons receive inputs, perform a dot product, and follow it with a non-linearity. The whole network expresses the mapping between raw image pixels and their class scores. Conventionally, the Softmax function is the classifier used at the last layer of this network. However, there have been studies (Alalshekmubarak and Smith, 2013; Agarap, 2017; Tang, 2013) conducted to challenge this norm. The cited studies introduce the usage of a linear support vector machine (SVM) in an artificial neural network architecture. This project is yet another take on the subject, and is inspired by Tang (2013). Empirical data show that the CNN-SVM model was able to achieve a test accuracy of ~99.04% using the MNIST dataset (LeCun, Cortes, and Burges, 2010). On the other hand, the CNN-Softmax was able to achieve a test accuracy of ~99.23% using the same dataset. Both models were also tested on the recently-published Fashion-MNIST dataset (Xiao, Rasul, and Vollgraf, 2017), which is supposed to be a more difficult image classification dataset than MNIST (Zalandoresearch, 2017). This proved to be the case, as CNN-SVM reached a test accuracy of ~90.72%, while CNN-Softmax reached a test accuracy of ~91.86%. These results may be improved if data preprocessing techniques were employed on the datasets, and if the base CNN model were more sophisticated than the one used in this study.
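The difference between the two output layers comes down to the training loss: softmax cross-entropy versus the multiclass hinge loss minimized by a linear SVM. A minimal sketch on a single score vector (the scores are made-up numbers, not from the paper):

```python
import math

def softmax_cross_entropy(scores, true_idx):
    """Cross-entropy of the softmax distribution at the true class."""
    m = max(scores)                              # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    return -math.log(exps[true_idx] / sum(exps))

def multiclass_hinge(scores, true_idx, margin=1.0):
    """L1 multiclass hinge loss, as minimized by a linear SVM output layer."""
    return sum(max(0.0, margin + s - scores[true_idx])
               for i, s in enumerate(scores) if i != true_idx)

scores = [2.0, 1.0, -1.0]   # hypothetical class scores; true class is index 0
print(round(softmax_cross_entropy(scores, 0), 3))
print(multiclass_hinge(scores, 0))  # → 0.0 (the margin is already satisfied)
```

Note the qualitative difference: the hinge loss is exactly zero once the true class wins by the margin, while cross-entropy keeps pushing the scores apart.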
• [cs.CV]Autonomous UAV Navigation with Domain Adaptation
Jaeyoon Yoo, Yongjun Hong, Sungrho Yoon
http://arxiv.org/abs/1712.03742v1
Autonomous driving for unmanned aerial vehicles (UAVs) has attracted considerable attention in the machine learning field. Autonomous navigation in outdoor environments is especially difficult, since acquiring a massive dataset covering various environments is hard and the environment always changes dynamically. In this paper, we apply domain adaptation with an adversarial learning framework to UAV autonomous navigation. We achieve UAV navigation on various courses without assigning corresponding label information to real outdoor images. We also present empirical and theoretical results that verify why our approach is feasible.
• [cs.CV]Bayesian Joint Matrix Decomposition for Data Integration with Heterogeneous Noise
Chihao Zhang, Shihua Zhang
http://arxiv.org/abs/1712.03337v1
Matrix decomposition is a popular and fundamental approach in machine learning and data mining that has been successfully applied in various fields. Most matrix decomposition methods focus on decomposing a data matrix from one single source. However, it is common that data come from different sources with heterogeneous noise. A few matrix decomposition methods have been extended to such multi-view data integration and pattern discovery, but only a few explicitly consider the heterogeneity of noise in multi-view data. To this end, we propose a Bayesian joint matrix decomposition framework (BJMD), which models the heterogeneity of noise by Gaussian distributions in a Bayesian framework. We develop two algorithms to solve this model: one is a variational Bayesian inference algorithm, which makes full use of the posterior distribution; the other is a maximum a posteriori algorithm, which is more scalable and can be easily parallelized. Extensive experiments on synthetic and real-world datasets demonstrate that BJMD, by considering the heterogeneity of noise, is superior or competitive to the state-of-the-art methods.
• [cs.CV]Can We Teach Computers to Understand Art? Domain Adaptation for Enhancing Deep Networks Capacity to De-Abstract Art
Mihai Badea, Corneliu Florea, Laura Florea, Constantin Vertan
http://arxiv.org/abs/1712.03727v1
Humans comprehend a natural scene at a single glance; painters and other visual artists, through their abstract representations, stressed this capacity to the limit. The performance of computer vision solutions has matched that of humans in many problems of visual recognition. In this paper we address the problem of recognizing the genre (subject) in digitized paintings using Convolutional Neural Networks (CNNs), as part of the more general problem of dealing with abstract and/or artistic representations of scenes. Initially we establish the state-of-the-art performance by training a CNN from scratch. At the next level of evaluation, we identify aspects that hinder the CNNs' recognition, such as artistic abstraction. Further, we test various domain adaptation methods that could enhance the subject recognition capabilities of the CNNs. The evaluation is performed on a database of 80,000 annotated digitized paintings, which is tentatively extended with artistic photographs, either original or stylized, in order to emulate artistic representations. Surprisingly, the most efficient domain adaptation is not neural style transfer. Finally, the paper provides an experiment-based assessment of the abstraction level that CNNs are able to achieve.
• [cs.CV]CycleGAN Face-off
Xiaohan Jin, Ye Qi, Shangxuan Wu
http://arxiv.org/abs/1712.03451v1
Face-off is an interesting case of style transfer where the facial expressions and attributes of one person could be fully transformed to another face. We are interested in the unsupervised training process which only requires two sequences of unaligned video frames from each person and learns what shared attributes to extract automatically. In this project, we explored various improvements for adversarial training (i.e. CycleGAN [Zhu et al., 2017]) to capture details in facial expressions and head poses and thus generate transformation videos of higher consistency and stability.
• [cs.CV]Deep Koalarization: Image Colorization using CNNs and Inception-ResNet-v2
Federico Baldassarre, Diego González Morín, Lucas Rodés-Guirao
http://arxiv.org/abs/1712.03400v1
We review some of the most recent approaches to colorize gray-scale images using deep learning methods. Inspired by these, we propose a model which combines a deep Convolutional Neural Network trained from scratch with high-level features extracted from the Inception-ResNet-v2 pre-trained model. Thanks to its fully convolutional architecture, our encoder-decoder model can process images of any size and aspect ratio. Other than presenting the training results, we assess the "public acceptance" of the generated images by means of a user study. Finally, we present a carousel of applications on different types of images, such as historical photographs.
• [cs.CV]Deep convolutional neural networks for brain image analysis on magnetic resonance imaging: a review
Jose Bernal, Kaisar Kushibar, Daniel S. Asfaw, Sergi Valverde, Arnau Oliver, Robert Martí, Xavier Lladó
http://arxiv.org/abs/1712.03747v1
In recent years, deep convolutional neural networks (CNNs) have shown record-shattering performance in a variety of computer vision problems, such as visual object recognition, detection and segmentation. These methods have also been utilized in the medical image analysis domain for lesion segmentation, anatomical segmentation and classification. We present an extensive literature review of CNN techniques applied in brain magnetic resonance imaging (MRI) analysis, focusing on the architectures, pre-processing, data-preparation and post-processing strategies available in these works. The aim of this study is three-fold. Our primary goal is to report how different CNN architectures have evolved into today's state-of-the-art methods, through an extensive discussion of the architectures and an examination of the pros and cons of the models when evaluating their performance on public datasets. Second, this paper is intended to be a detailed reference for the research activity in deep CNNs for brain MRI analysis. Finally, our goal is to present a perspective on the future of CNNs, which we believe will be among the growing approaches in brain image analysis in subsequent years.
• [cs.CV]Distributed Mapper
Mustafa Hajij, Basem Assiri, Paul Rosen
http://arxiv.org/abs/1712.03660v1
The construction of Mapper has emerged in the last decade as a powerful and effective topological data analysis tool that approximates and generalizes other topological summaries, such as the Reeb graph, the contour tree, and split and join trees. In this paper we study the parallel construction of Mapper. We give a provably correct algorithm to distribute Mapper on a set of processors, and report performance experiments comparing our approach to a reference sequential Mapper implementation, demonstrating the efficiency of our method.
• [cs.CV]Dynamics Transfer GAN: Generating Video by Transferring Arbitrary Temporal Dynamics from a Source Video to a Single Target Image
Wissam J. Baddar, Geonmo Gu, Sangmin Lee, Yong Man Ro
http://arxiv.org/abs/1712.03534v1
In this paper, we propose Dynamics Transfer GAN; a new method for generating video sequences based on generative adversarial learning. The spatial constructs of a generated video sequence are acquired from the target image. The dynamics of the generated video sequence are imported from a source video sequence, with arbitrary motion, and imposed onto the target image. To preserve the spatial construct of the target image, the appearance of the source video sequence is suppressed and only the dynamics are obtained before being imposed onto the target image. That is achieved using the proposed appearance suppressed dynamics feature. Moreover, the spatial and temporal consistencies of the generated video sequence are verified via two discriminator networks. One discriminator validates the fidelity of the generated frames appearance, while the other validates the dynamic consistency of the generated video sequence. Experiments have been conducted to verify the quality of the video sequences generated by the proposed method. The results verified that Dynamics Transfer GAN successfully transferred arbitrary dynamics of the source video sequence onto a target image when generating the output video sequence. The experimental results also showed that Dynamics Transfer GAN maintained the spatial constructs (appearance) of the target image while generating spatially and temporally consistent video sequences.
• [cs.CV]Error Correction for Dense Semantic Image Labeling
Yu-Hui Huang, Xu Jia, Stamatios Georgoulis, Tinne Tuytelaars, Luc Van Gool
http://arxiv.org/abs/1712.03812v1
Pixelwise semantic image labeling is an important, yet challenging, task with many applications. Typical approaches to tackle this problem involve either the training of deep networks on vast amounts of images to directly infer the labels or the use of probabilistic graphical models to jointly model the dependencies of the input (i.e. images) and output (i.e. labels). Yet, the former approaches do not capture the structure of the output labels, which is crucial for the performance of dense labeling, and the latter rely on carefully hand-designed priors that require costly parameter tuning via optimization techniques, which in turn leads to long inference times. To alleviate these restrictions, we explore how to arrive at dense semantic pixel labels given both the input image and an initial estimate of the output labels. We propose a parallel architecture that: 1) exploits the context information through a LabelPropagation network to propagate correct labels from nearby pixels to improve the object boundaries, 2) uses a LabelReplacement network to directly replace possibly erroneous, initial labels with new ones, and 3) combines the different intermediate results via a Fusion network to obtain the final per-pixel label. We experimentally validate our approach on two different datasets for the semantic segmentation and face parsing tasks respectively, where we show improvements over the state-of-the-art. We also provide both a quantitative and qualitative analysis of the generated results.
• [cs.CV]FHEDN: A based on context modeling Feature Hierarchy Encoder-Decoder Network for face detection
Zexun Zhou, Zhongshi He, Ziyu Chen, Yuanyuan Jia, Haiyan Wang, Jinglong Du, Dingding Chen
http://arxiv.org/abs/1712.03687v1
Affected by weather conditions, camera pose, range, and other factors, objects in images gathered from outdoor surveillance cameras or access control systems are usually small, blurred, occluded and in diverse poses. Detecting faces precisely under these conditions is a challenging and important problem for face recognition systems in the field of public security. In this paper, we design a context-modeling-based structure named the Feature Hierarchy Encoder-Decoder Network (FHEDN) for face detection, which detects small, blurred and occluded faces hierarchy by hierarchy, from the end to the beginning, like an encoder-decoder, within a single network. The proposed network consists of multiple context modeling and prediction modules designed to detect small, blurred, occluded and diversely posed faces. In addition, we analyse the influence of the training set distribution, default box scale and receptive field size on detection performance at the implementation stage. As demonstrated by experiments, our network achieves promising performance on the WIDER FACE and FDDB benchmarks.
• [cs.CV]Feature Mapping for Learning Fast and Accurate 3D Pose Inference from Synthetic Images
Mahdi Rad, Markus Oberweger, Vincent Lepetit
http://arxiv.org/abs/1712.03904v1
We propose a simple and efficient method for exploiting synthetic images when training a Deep Network to predict a 3D pose from an image. The ability of using synthetic images for training a Deep Network is extremely valuable as it is easy to create a virtually infinite training set made of such images, while capturing and annotating real images can be very cumbersome. However, synthetic images do not resemble real images exactly, and using them for training can result in suboptimal performance. It was recently shown that for exemplar-based approaches, it is possible to learn a mapping from the exemplar representations of real images to the exemplar representations of synthetic images. In this paper, we show that this approach is more general, and that a network can also be applied after the mapping to infer a 3D pose: At run time, given a real image of the target object, we first compute the features for the image, map them to the feature space of synthetic images, and finally use the resulting features as input to another network which predicts the 3D pose. Since this network can be trained very effectively by using synthetic images, it performs very well in practice, and inference is faster and more accurate than with an exemplar-based approach. We demonstrate our approach on the LINEMOD dataset for 3D object pose estimation from color images, and the NYU dataset for 3D hand pose estimation from depth maps. We show that it allows us to outperform the state-of-the-art on both datasets.
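The mapping step this abstract describes, taking features computed on a real image and mapping them into the feature space of synthetic images before pose regression, can be illustrated with a linear toy version. The paper learns the map with a small network; everything below, including the feature dimensions and the fixed "distortion" standing in for the real/synthetic domain gap, is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: "synthetic" features are a fixed linear
# distortion of paired "real" features (dimensions are illustrative).
real_feats = rng.normal(size=(200, 16))
distortion = rng.normal(size=(16, 16))
synth_feats = real_feats @ distortion      # paired exemplars for training the map

# Learn a map from real-feature space to synthetic-feature space.
# (Least squares is the simplest analogue of the paper's small network.)
W, *_ = np.linalg.lstsq(real_feats, synth_feats, rcond=None)

# At run time: compute features of a real image, map them, then feed the
# mapped features to a pose network trained purely on synthetic images.
mapped = real_feats @ W
```

Because the toy domain gap here is exactly linear, the learned map recovers it perfectly; in practice the gap is non-linear and the mapping network is trained jointly with the pose predictor.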
• [cs.CV]Geometry Guided Adversarial Facial Expression Synthesis
Lingxiao Song, Zhihe Lu, Ran He, Zhenan Sun, Tieniu Tan
http://arxiv.org/abs/1712.03474v1
Facial expression synthesis has drawn much attention in the fields of computer graphics and pattern recognition. It has been widely used in face animation and recognition. However, it remains challenging due to the high-level semantics involved in large and non-linear face geometry variations. This paper proposes a Geometry-Guided Generative Adversarial Network (G2-GAN) for photo-realistic and identity-preserving facial expression synthesis. We employ facial geometry (fiducial points) as a controllable condition to guide facial texture synthesis with a specific expression. A pair of generative adversarial subnetworks is jointly trained towards opposite tasks: expression removal and expression synthesis. The paired networks form a mapping cycle between the neutral expression and arbitrary expressions, which also facilitates other applications such as face transfer and expression-invariant face recognition. Experimental results show that our method can generate compelling perceptual results on various facial expression synthesis databases. An expression-invariant face recognition experiment is also performed to further show the advantages of our proposed method.
• [cs.CV]Learning Nested Sparse Structures in Deep Neural Networks
Eunwoo Kim, Chanho Ahn, Songhwai Oh
http://arxiv.org/abs/1712.03781v1
Recently, there have been increasing demands to construct compact deep architectures to remove unnecessary redundancy and to improve the inference speed. While many recent works focus on reducing the redundancy by eliminating unneeded weight parameters, it is not possible to apply a single deep architecture for multiple devices with different resources. When a new device or circumstantial condition requires a new deep architecture, it is necessary to construct and train a new network from scratch. In this work, we propose a novel deep learning framework, called a nested sparse network, which exploits an n-in-1-type nested structure in a neural network. A nested sparse network consists of multiple levels of networks with a different sparsity ratio associated with each level, and higher level networks share parameters with lower level networks to enable stable nested learning. The proposed framework realizes a resource-aware versatile architecture as the same network can meet diverse resource requirements. Moreover, the proposed nested network can learn different forms of knowledge in its internal networks at different levels, enabling multiple tasks using a single network, such as coarse-to-fine hierarchical classification. In order to train the proposed nested sparse network, we propose efficient weight connection learning and channel and layer scheduling strategies. We evaluate our network in multiple tasks, including adaptive deep compression, knowledge distillation, and learning class hierarchy, and demonstrate that nested sparse networks perform competitively, but more efficiently, than existing methods.
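The n-in-1 nesting idea above can be pictured as prefix-nested masks over a single shared weight tensor, so that every higher-sparsity sub-network reuses all parameters of the levels below it. This is a toy sketch of the sharing principle only; the paper's actual scheme, including channel and layer scheduling, is more involved, and the level sizes below are my own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))   # one weight tensor shared by all levels

# Three nested sparsity levels: each activates a prefix block of W,
# so level 0's weights are contained in level 1's, and so on.
levels = [2, 4, 8]            # illustrative active-channel counts
masks = []
for k in levels:
    m = np.zeros_like(W)
    m[:k, :k] = 1.0
    masks.append(m)

def forward(x, level):
    """Run the sub-network selected by `level` on input x; a device with
    fewer resources simply picks a lower level of the same network."""
    return x @ (W * masks[level]).T
```

The resource-aware property falls out of the nesting: serving a smaller device means evaluating with a lower-level mask, with no retraining and no second set of weights.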
• [cs.CV]Learning Surrogate Models of Document Image Quality Metrics for Automated Document Image Processing
Prashant Singh, Ekta Vats, Anders Hast
http://arxiv.org/abs/1712.03738v1
Computation of document image quality metrics often depends upon the availability of a ground truth image corresponding to the document. This limits the applicability of quality metrics in applications such as hyperparameter optimization of image processing algorithms that operate on-the-fly on unseen documents. This work proposes the use of surrogate models to learn the behavior of a given document quality metric on existing datasets where ground truth images are available. The trained surrogate model can later be used to predict the metric value on previously unseen document images without requiring access to ground truth images. The surrogate model is empirically evaluated on the Document Image Binarization Competition (DIBCO) and the Handwritten Document Image Binarization Competition (H-DIBCO) datasets.
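The surrogate idea, fit a cheap regressor to the metric where ground truth exists and then predict the metric on unseen documents, can be sketched with ridge regression. The features, true weights and noise level below are illustrative assumptions, not the paper's model or datasets.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: rows of X are hand-crafted features of document
# images; q is a quality metric computed against ground-truth images.
X = rng.uniform(size=(100, 5))
true_w = np.array([0.3, -0.2, 0.5, 0.1, -0.4])
q = X @ true_w + 0.01 * rng.normal(size=100)   # metric with small noise

# Surrogate model: ridge regression fitted on the ground-truth split...
lam = 1e-3
w = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ q)

# ...then used to predict the metric on unseen documents where no
# ground-truth image is available (e.g. inside a hyperparameter loop).
X_new = rng.uniform(size=(10, 5))
q_pred = X_new @ w
```

Any regressor can play the surrogate role; the point is that once trained, evaluating it requires only the document's features, never a reference image.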
• [cs.CV]MapNet: Geometry-Aware Learning of Maps for Camera Localization
Samarth Brahmbhatt, Jinwei Gu, Kihwan Kim, James Hays, Jan Kautz
http://arxiv.org/abs/1712.03342v1
Maps are a key component in image-based camera localization and visual SLAM systems: they are used to establish geometric constraints between images, correct drift in relative pose estimation, and relocalize cameras after lost tracking. The exact definitions of maps, however, are often application-specific and hand-crafted for different scenarios (e.g., 3D landmarks, lines, planes, bags of visual words). We propose to represent maps as a deep neural net called MapNet, which enables learning a data-driven map representation. Unlike prior work on learning maps, MapNet exploits cheap and ubiquitous sensory inputs like visual odometry and GPS in addition to images and fuses them together for camera localization. Geometric constraints expressed by these inputs, which have traditionally been used in bundle adjustment or pose-graph optimization, are formulated as loss terms in MapNet training and also used during inference. In addition to directly improving localization accuracy, this allows us to update the MapNet (i.e., maps) in a self-supervised manner using additional unlabeled video sequences from the scene. We also propose a novel parameterization for camera rotation which is better suited for deep-learning based camera pose regression. Experimental results on both the indoor 7-Scenes dataset and the outdoor Oxford RobotCar dataset show significant performance improvement over prior work.
• [cs.CV]NAG: Network for Adversary Generation
Konda Reddy Mopuri, Utkarsh Ojha, Utsav Garg, R. Venkatesh Babu
http://arxiv.org/abs/1712.03390v1
Adversarial perturbations can pose a serious threat to the deployment of machine learning systems. Recent works have shown the existence of image-agnostic perturbations that can fool classifiers over most natural images. Existing methods present optimization approaches that solve for a fooling objective with an imperceptibility constraint to craft the perturbations. However, for a given classifier, they generate one perturbation at a time, which is a single instance from the manifold of adversarial perturbations. Also, in order to build robust models, it is essential to explore the manifold of adversarial perturbations. In this paper, we propose, for the first time, a generative approach to model the distribution of adversarial perturbations. The architecture of the proposed model is inspired by that of GANs and is trained using fooling and diversity objectives. Our trained generator network attempts to capture the distribution of adversarial perturbations for a given classifier and readily generates a wide variety of such perturbations. Our experimental evaluation demonstrates that perturbations crafted by our model (i) achieve state-of-the-art fooling rates, (ii) exhibit wide variety and (iii) deliver excellent cross-model generalizability. Our work can be deemed an important step towards understanding the complex manifolds of adversarial perturbations.
• [cs.CV]SPP-Net: Deep Absolute Pose Regression with Synthetic Views
Pulak Purkait, Cheng Zhao, Christopher Zach
http://arxiv.org/abs/1712.03452v1
Image based localization is one of the important problems in computer vision due to its wide applicability in robotics, augmented reality, and autonomous systems. There is a rich set of methods described in the literature on how to geometrically register a 2D image w.r.t. a 3D model. Recently, methods based on deep (and convolutional) feedforward networks (CNNs) became popular for pose regression. However, these CNN-based methods are still less accurate than geometry based methods despite being fast and memory efficient. In this work we design a deep neural network architecture based on sparse feature descriptors to estimate the absolute pose of an image. Our choice of using sparse feature descriptors has two major advantages: first, our network is significantly smaller than the CNNs proposed in the literature for this task, thereby making our approach more efficient and scalable. Second, and more importantly, the usage of sparse features allows us to augment the training data with synthetic viewpoints, which leads to substantial improvements in generalization performance to unseen poses. Thus, our proposed method aims to combine the best of the two worlds, feature-based localization and CNN-based pose regression, to achieve state-of-the-art performance in absolute pose estimation. A detailed analysis of the proposed architecture and a rigorous evaluation on the existing datasets are provided to support our method.
• [cs.CV]Single-Shot Multi-Person 3D Body Pose Estimation From Monocular RGB Input
Dushyant Mehta, Oleksandr Sotnychenko, Franziska Mueller, Weipeng Xu, Srinath Sridhar, Gerard Pons-Moll, Christian Theobalt
http://arxiv.org/abs/1712.03453v1
We propose a new efficient single-shot method for multi-person 3D pose estimation in general scenes from a monocular RGB camera. Our fully convolutional DNN-based approach jointly infers 2D and 3D joint locations on the basis of an extended 3D location map supported by body part associations. This new formulation enables the readout of full body poses at a subset of visible joints without the need for explicit bounding box tracking. It therefore succeeds even under strong partial body occlusions by other people and objects in the scene. We also contribute the first training data set showing real images of sophisticated multi-person interactions and occlusions. To this end, we leverage multi-view video-based performance capture of individual people for ground truth annotation and a new image compositing approach for user-controlled synthesis of large corpora of real multi-person images. We also propose a new video-recorded multi-person test set with ground truth 3D annotations. Our method achieves state-of-the-art performance on challenging multi-person scenes.
• [cs.CV]Sketch Layer Separation in Multi-Spectral Historical Document Images
AmirAbbas Davari, Armin Häberle, Vincent Christlein, Andreas Maier, Christian Riess
http://arxiv.org/abs/1712.03596v1
High-resolution imaging has delivered new prospects for detecting the material composition and structure of cultural treasures. Despite the various techniques for analysis, a significant diagnostic gap remained in the range of available research capabilities for works on paper. Old master drawings were mostly composed in a multi-step manner with various materials. This resulted in the overlapping of different layers which made the subjacent strata difficult to differentiate. The separation of stratified layers using imaging methods could provide insights into the artistic work processes and help answer questions about the object, its attribution, or in identifying forgeries. The pattern recognition procedure was tested with mock replicas to achieve the separation and the capability of displaying concealed red chalk under ink. In contrast to RGB-sensor based imaging, multi- or hyperspectral technology allows accurate layer separation by recording the characteristic signatures of the material's reflectance. The risk of damage to the artworks as a result of the examination can be reduced by using combinations of defined spectra for lighting and image capturing. By guaranteeing the maximum level of readability, our results suggest that the technique can be applied to a broader range of objects and assist in diagnostic research into cultural treasures in the future.
• [cs.CV]The Effectiveness of Data Augmentation for Detection of Gastrointestinal Diseases from Endoscopical Images
Andrea Asperti, Claudio Mastronardo
http://arxiv.org/abs/1712.03689v1
The lack, due to privacy concerns, of large public databases of medical pathologies is a well-known and major problem, substantially hindering the application of deep learning techniques in this field. In this article, we investigate the possibility of compensating for the deficiency in the amount of data by means of data augmentation techniques, working on the recent Kvasir dataset of endoscopical images of gastrointestinal diseases. The dataset comprises 4,000 colored images labeled and verified by medical endoscopists, covering a few common pathologies at different anatomical landmarks: Z-line, pylorus and cecum. We show how the application of data augmentation techniques allows us to achieve considerable improvements in classification with respect to previous approaches, both in terms of precision and recall.
• [cs.CV]Unsupervised Feature Learning for Audio Analysis
Matthias Meyer, Jan Beutel, Lothar Thiele
http://arxiv.org/abs/1712.03835v1
Identifying acoustic events from a continuously streaming audio source is of interest for many applications, including environmental monitoring for basic research. In this scenario, neither the different event classes nor what distinguishes one class from another is known beforehand. Therefore, an unsupervised feature learning method for the exploration of audio data is presented in this paper. It incorporates the following two novel contributions: First, an audio frame predictor based on a Convolutional LSTM autoencoder is demonstrated, which is used for unsupervised feature extraction. Second, a training method for autoencoders is presented, which leads to distinct features by amplifying event similarities. In comparison to standard approaches, the features extracted from the audio frame predictor trained with the novel approach show 13% better results when used with a classifier and 36% better results when used for clustering.
• [cs.CV]Using a single RGB frame for real time 3D hand pose estimation in the wild
Paschalis Panteleris, Iason Oikonomidis, Antonis Argyros
http://arxiv.org/abs/1712.03866v1
We present a method for the real-time estimation of the full 3D pose of one or more human hands using a single commodity RGB camera. Recent work in the area has displayed impressive progress using RGBD input. However, since the introduction of RGBD sensors, there has been little progress for the case of monocular color input. We capitalize on the latest advancements of deep learning, combining them with the power of generative hand pose estimation techniques to achieve real-time monocular 3D hand pose estimation in unrestricted scenarios. More specifically, given an RGB image and the relevant camera calibration information, we employ a state-of-the-art detector to localize hands. Given a crop of a hand in the image, we run the pretrained network of OpenPose for hands to estimate the 2D location of hand joints. Finally, non-linear least-squares minimization fits a 3D model of the hand to the estimated 2D joint positions, recovering the 3D hand pose. Extensive experimental results provide comparison to the state of the art as well as qualitative assessment of the method in the wild.
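The final step this abstract describes, non-linear least-squares minimization fitting a 3D hand model to estimated 2D joint positions, can be shown in miniature. In this sketch the "pose" is reduced to a single 3D translation of a rigid point set under a pinhole camera, solved by Gauss-Newton; the focal length, model points and solver are all illustrative assumptions, not the paper's hand model.

```python
import numpy as np

f = 500.0   # illustrative pinhole focal length (pixels)
# Tiny rigid "model" standing in for the articulated 3D hand skeleton.
model = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])

def project(points3d):
    """Pinhole projection of Nx3 camera-frame points to Nx2 pixels."""
    return f * points3d[:, :2] / points3d[:, 2:3]

t_true = np.array([0.2, -0.1, 5.0])
joints2d = project(model + t_true)          # "detected" 2D joint positions

# Gauss-Newton on the reprojection residual r(t) = project(model+t) - joints2d
t = np.array([0.0, 0.0, 4.0])               # rough initial pose
for _ in range(20):
    r = (project(model + t) - joints2d).ravel()
    J = np.zeros((r.size, 3))               # numerical Jacobian wrt t
    eps = 1e-6
    for k in range(3):
        dt = np.zeros(3); dt[k] = eps
        J[:, k] = ((project(model + t + dt) - joints2d).ravel() - r) / eps
    t = t - np.linalg.lstsq(J, r, rcond=None)[0]
```

The real system minimizes the same kind of reprojection error, but over full articulated hand pose parameters rather than a single translation.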
• [cs.CV]Visual aesthetic analysis using deep neural network: model and techniques to increase accuracy without transfer learning
Muktabh Mayank Srivastava, Sonaal Kant
http://arxiv.org/abs/1712.03382v1
We train a deep Convolutional Neural Network (CNN) from scratch for visual aesthetic analysis of images and discuss techniques we adopt to improve accuracy. We avoid the prevalent transfer learning approach of using pretrained weights to perform the task, and instead train a model from scratch, obtaining an accuracy of 78.7% on the AVA2 dataset, close to the best available models (85.6%). We further show that accuracy increases to 81.48% when the training set is enlarged by an incremental 10 percent of the entire AVA dataset, showing that our algorithm gets better with more data.
• [cs.CY]Cogniculture: Towards a Better Human-Machine Co-evolution
Rakesh R Pimplikar, Kushal Mukherjee, Gyana Parija, Harit Vishwakarma, Ramasuri Narayanam, Sarthak Ahuja, Rohith D Vallam, Ritwik Chaudhuri, Joydeep Mondal
http://arxiv.org/abs/1712.03724v1
Research in Artificial Intelligence is breaking technology barriers every day. New algorithms and high performance computing are making things possible which we could only have imagined earlier. Though the enhancements in AI are making life easier for human beings day by day, there is a constant fear that AI-based systems will pose a threat to humanity. People in the AI community have a diverse set of opinions regarding the pros and cons of AI mimicking human behavior. Instead of worrying about AI advancements, we propose a novel idea of cognitive agents, including both humans and machines, living together in a complex adaptive ecosystem, collaborating on human computation for producing essential social goods while promoting the sustenance, survival and evolution of the agents' life cycles. We highlight several research challenges and technology barriers in achieving this goal. We propose a governance mechanism around this ecosystem to ensure the ethical behavior of all cognitive agents. Along with a novel set of use-cases of Cogniculture, we discuss the road map ahead for this journey.
• [cs.CY]Fairness in Machine Learning: Lessons from Political Philosophy
Reuben Binns
http://arxiv.org/abs/1712.03586v1
What does it mean for a machine learning model to be 'fair', in terms which can be operationalised? Should fairness consist of ensuring everyone has an equal probability of obtaining some benefit, or should we aim instead to minimise the harms to the least advantaged? Can the relevant ideal be determined by reference to some alternative state of affairs in which a particular social pattern of discrimination does not exist? Various definitions proposed in recent literature make different assumptions about what terms like discrimination and fairness mean and how they can be defined in mathematical terms. Questions of discrimination, egalitarianism and justice are of significant interest to moral and political philosophers, who have expended significant efforts in formalising and defending these central concepts. It is therefore unsurprising that attempts to formalise 'fairness' in machine learning contain echoes of these old philosophical debates. This paper draws on existing work in moral and political philosophy in order to elucidate emerging debates about fair machine learning.
• [cs.DC]An Efficient Multi-core Implementation of the Jaya Optimisation Algorithm
Panagiotis D. Michailidis
http://arxiv.org/abs/1712.03366v1
In this work, we propose a hybrid parallel Jaya optimisation algorithm for a multi-core environment with the aim of solving large-scale global optimisation problems. The proposed algorithm is called HHCPJaya, and combines the hyper-population approach with the hierarchical cooperation search mechanism. The HHCPJaya algorithm divides the population into many small subpopulations, each of which focuses on a distinct block of the original population dimensions. In the hyper-population approach, we increase the small subpopulations by assigning more than one subpopulation to each core, and each subpopulation evolves independently to enhance the explorative and exploitative nature of the population. We combine this hyper-population approach with the two-level hierarchical cooperative search scheme to find global solutions from all subpopulations. Furthermore, we incorporate an additional updating phase on the respective subpopulations based on global solutions, with the aim of further improving the convergence rate and the quality of solutions. Several experiments applying the proposed parallel algorithm in different settings prove that it demonstrates sufficient promise in terms of the quality of solutions and the convergence rate. Furthermore, a relatively small computational effort is required to solve complex and large-scale optimisation problems.
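The core Jaya move that each subpopulation in this framework performs, shifting every candidate toward the current best solution and away from the worst, is simple enough to sketch. Below is a sequential toy run on the sphere function; the population size, iteration count and objective are illustrative choices, and the paper's actual contribution (the multi-core hyper-population and two-level hierarchical cooperation) sits on top of this basic rule.

```python
import numpy as np

rng = np.random.default_rng(42)

def sphere(x):
    """Classic unimodal benchmark: global minimum 0 at the origin."""
    return float(np.sum(x ** 2))

pop = rng.uniform(-5, 5, size=(20, 4))      # one subpopulation
init_best = min(sphere(x) for x in pop)

for _ in range(300):
    fit = np.array([sphere(x) for x in pop])
    best, worst = pop[fit.argmin()], pop[fit.argmax()]
    r1, r2 = rng.random(pop.shape), rng.random(pop.shape)
    # Jaya update: attract toward the best, repel from the worst.
    cand = pop + r1 * (best - np.abs(pop)) - r2 * (worst - np.abs(pop))
    cand_fit = np.array([sphere(x) for x in cand])
    improved = cand_fit < fit               # greedy acceptance per individual
    pop[improved] = cand[improved]

final_best = min(sphere(x) for x in pop)
```

In the HHCPJaya setting, many such subpopulations run in parallel on different dimension blocks and cores, periodically exchanging their best solutions through the hierarchical cooperation scheme.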
• [cs.GR]A Deep Recurrent Framework for Cleaning Motion Capture Data
Utkarsh Mall, G. Roshan Lal, Siddhartha Chaudhuri, Parag Chaudhuri
http://arxiv.org/abs/1712.03380v1
We present a deep, bidirectional, recurrent framework for cleaning noisy and incomplete motion capture data. It exploits temporal coherence and joint correlations to infer adaptive filters for each joint in each frame. A single model can be trained to denoise a heterogeneous mix of action types, under substantial amounts of noise. A signal that has both noise and gaps is preprocessed with a second bidirectional network that synthesizes missing frames from surrounding context. The approach handles a wide variety of noise types and long gaps, does not rely on knowledge of the noise distribution, and operates in a streaming setting. We validate our approach through extensive evaluations on noise both in joint angles and in joint positions, and show that it improves upon various alternatives.
• [cs.HC]Usability of Humanly Computable Passwords
Samira Samadi, Santosh Vempala, Adam Tauman Kalai
http://arxiv.org/abs/1712.03650v1
Reusing passwords across multiple websites is a common practice that compromises security. Recently, Blum and Vempala proposed password strategies to help people calculate, in their heads, passwords for different sites without dependence on third-party tools or external devices. Thus far, the security and efficiency of these "mental algorithms" have been analyzed only theoretically. But are such methods usable? We present the first usability study of humanly computable password strategies, involving a learning phase (to learn a password strategy), a rehearsal phase (to log in to a few websites), and multiple follow-up tests. In our user study, with training, participants were able to calculate a deterministic eight-character password for an arbitrary new website in under 20 seconds.
• [cs.IR]Fast Nearest-Neighbor Classification using RNN in Domains with Large Number of Classes
Gautam Singh, Gargi Dasgupta, Yu Deng
http://arxiv.org/abs/1712.03941v1
In scenarios involving text classification where the number of classes is large (in multiples of 10000s) and training samples for each class are few and often verbose, nearest neighbor methods are effective but very slow in computing a similarity score with training samples of every class. On the other hand, machine learning models are fast at runtime but training them adequately is not feasible using the few available training samples per class. In this paper, we propose a hybrid approach that cascades 1) a fast but less-accurate recurrent neural network (RNN) model and 2) a slow but more-accurate nearest-neighbor model using a bag of syntactic features. Using the cascaded approach, our experiments, performed on a data set from IT support services where customer complaint text needs to be classified to return the top-$N$ possible error codes, show that the query time of the slow system is reduced to $1/6^{th}$ while its accuracy improves. Our approach outperforms an LSH-based baseline for query-time reduction. We also derive a lower bound on the accuracy of the cascaded model in terms of the accuracies of the individual models. In any two-stage approach, choosing the right number of candidates to pass on to the second stage is crucial. We prove a result that aids in choosing this cutoff number for the cascaded system.
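Independent of the specific RNN and bag-of-syntactic-features models, the cascade itself is simple: a cheap scorer shortlists the top-$N$ candidate classes, and the expensive nearest-neighbour scorer runs only on that shortlist. The sketch below uses toy stand-in scorers (illustrative assumptions, not the paper's implementation):

```python
def cascade_classify(query, fast_scores, slow_score, n_candidates):
    """Stage 1: keep the n_candidates classes ranked highest by the fast model.
    Stage 2: re-rank only that shortlist with the slow, more accurate scorer."""
    shortlist = sorted(fast_scores, key=fast_scores.get,
                       reverse=True)[:n_candidates]
    return max(shortlist, key=lambda c: slow_score(query, c))

# Toy stand-ins: fast scores from a (pretend) RNN; slow score = word overlap
train = {"E101": "disk full error", "E202": "network timeout", "E303": "login failed"}
fast = {"E101": 0.5, "E202": 0.4, "E303": 0.1}
overlap = lambda q, c: len(set(q.split()) & set(train[c].split()))
label = cascade_classify("network request timeout", fast, overlap, n_candidates=2)
# stage 2 overrides stage 1's top guess: label == "E202"
```

The cutoff `n_candidates` is exactly the quantity the paper's bound helps choose: too small and the true class may miss the shortlist; too large and the slow stage dominates query time.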
• [cs.IR]Interactions between Health Searchers and Search Engines
George Philipp, Ryen W. White
http://arxiv.org/abs/1712.03622v1
The Web is an important resource for understanding and diagnosing medical conditions. Based on exposure to online content, people may develop undue health concerns, believing that common and benign symptoms are explained by serious illnesses. In this paper, we investigate potential strategies to mine queries and searcher histories for clues that could help search engines choose the most appropriate information to present in response to exploratory medical queries. To do this, we performed a longitudinal study of health search behavior using the logs of a popular search engine. We found that query variations which might appear innocuous (e.g., "bad headache" vs. "severe headache") may hold valuable information about the searcher that could be used by search engines to improve performance. Furthermore, we investigated how medically concerned users respond differently to search engine result pages (SERPs) and found that their disposition for clicking on concerning pages is pronounced, potentially leading to a self-reinforcement of concern. Finally, we studied to what degree variations in the SERP impact future search and real-world health-seeking behavior, and obtained some surprising results (e.g., viewing concerning pages may lead to a short-term reduction in real-world health seeking).
• [cs.IR]Semi-supervised Multimodal Hashing
Dayong Tian, Maoguo Gong, Deyun Zhou, Jiao Shi, Yu Lei
http://arxiv.org/abs/1712.03404v1
Retrieving nearest neighbors across correlated data in multiple modalities, such as image-text pairs on Facebook and video-tag pairs on YouTube, has become a challenging task due to the huge amount of data. Multimodal hashing methods that embed data into binary codes can boost retrieval speed and reduce storage requirements. As unsupervised multimodal hashing methods are usually inferior to supervised ones, while supervised ones require too much manually labeled data, the method proposed in this paper uses a subset of the labels to design a semi-supervised multimodal hashing method. It first computes the transformation matrices for the data matrices and the label matrix. Then, with these transformation matrices, fuzzy logic is introduced to estimate a label matrix for the unlabeled data. Finally, it uses the estimated label matrix to learn hashing functions for the data in each modality to generate a unified binary code matrix. Experiments show that with 50% of the labels the proposed semi-supervised method performs in the middle of the compared supervised methods, and with 90% of the labels it approaches the performance of the best supervised method. With only 10% of the labels, the proposed method can still compete with the worst of the compared supervised methods.
• [cs.IR]SneakPeek: Interest Mining of Images based on User Interaction
Daniyal Shahrokhian, Alejandro Vera de Juan
http://arxiv.org/abs/1712.03585v1
Nowadays, eye tracking is the most widely used technology to detect areas of interest. This kind of technology requires specialized equipment that records the user's eyes. In this paper, we propose SneakPeek, a different approach that detects areas of interest on images displayed in web pages based on the zooming and panning actions of users on the image. We have validated our proposed solution with a group of test subjects who performed a test on our on-line prototype. As this is the first iteration of the algorithm, we found both good and bad results, depending on the type of image. Specifically, SneakPeek works best with medium/big objects in medium/big sized images. The reason is that detection of small objects remains limited, even as smartphone screens keep getting bigger. SneakPeek can be adapted to any website by simply adapting the controller interface for the specific case.
• [cs.IT]A Framework for Optimizing Multi-cell NOMA: Delivering Demand with Less Resource
Lei You, Lei Lei, Di Yuan, Sumei Sun, Symeon Chatzinotas, Björn Ottersten
http://arxiv.org/abs/1712.03757v1
Non-orthogonal multiple access (NOMA) allows multiple users to simultaneously access the same time-frequency resource by using superposition coding and successive interference cancellation (SIC). Thus far, most papers on NOMA have focused on performance gains for one or sometimes two base stations. In this paper, we study multi-cell NOMA and provide a general framework for user clustering and power allocation that takes inter-cell interference into account, for optimizing resource allocation of NOMA in multi-cell networks of arbitrary topology. We provide a series of theoretical analyses to enable algorithmic optimization approaches. The resulting algorithmic notion is very general: we prove that for any performance metric that monotonically increases in the cells' resource consumption, convergence to the global optimum is guaranteed. We apply the framework with its algorithmic concept to a multi-cell scenario to demonstrate the gain of NOMA in achieving significantly higher efficiency.
• [cs.IT]Achieving Private Information Retrieval Capacity in Distributed Storage Using an Arbitrary Linear Code
Siddhartha Kumar, Hsuan-Yin Lin, Eirik Rosnes, Alexandre Graell i Amat
http://arxiv.org/abs/1712.03898v1
We propose three private information retrieval (PIR) protocols for distributed storage systems (DSSs) where data is stored using an arbitrary linear code. The first two protocols, named Protocol 1 and Protocol 2, achieve privacy for the scenario with non-colluding nodes. Protocol 1 requires a file size that is exponential in the number of files in the system, while the file size required for Protocol 2 is independent of the number of files, making it simpler. We prove that, for certain linear codes, Protocol 1 achieves the PIR capacity, i.e., its PIR rate (the ratio of the amount of retrieved stored data per unit of downloaded data) is the maximum possible for any number of files (finite or infinite), and that Protocol 2 achieves the \emph{asymptotic} PIR capacity (with an infinitely large number of files in the DSS). In particular, we provide a sufficient and a necessary condition for a code to be PIR capacity achieving, and prove that cyclic codes, Reed-Muller (RM) codes, and optimal information locality local reconstruction codes achieve both the \emph{finite} PIR capacity (i.e., with any given number of files) and the asymptotic PIR capacity with Protocols 1 and 2, respectively. Furthermore, we present a third protocol, Protocol 3, for the scenario with multiple colluding nodes, which can be seen as an improvement of a protocol recently introduced by Freij-Hollanti \emph{et al.} We also present an algorithm to optimize the PIR rate of the proposed protocol. Finally, we provide a particular class of codes that is suitable for this protocol and show that RM codes achieve the maximum possible PIR rate for the protocol.
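For context, the privacy mechanism behind PIR can be illustrated with the classic two-server XOR scheme (a textbook construction for replicated, non-colluding servers, not one of the three coded protocols proposed in the paper): each server sees only a uniformly random index set, so neither learns which record is wanted, yet the XOR of the two answers recovers it.

```python
import secrets

def pir_two_server(db, i):
    """Retrieve db[i] from two non-colluding replicas without revealing i."""
    n = len(db)
    q1 = {j for j in range(n) if secrets.randbits(1)}  # uniformly random set
    q2 = q1 ^ {i}              # symmetric difference: flips membership of i
    a1 = 0
    for j in q1:               # server 1 XORs the records it was asked for
        a1 ^= db[j]
    a2 = 0
    for j in q2:               # server 2 does the same with its query set
        a2 ^= db[j]
    return a1 ^ a2             # all records except db[i] cancel out

db = [5, 9, 12, 7]
assert all(pir_two_server(db, k) == db[k] for k in range(len(db)))
```

Note the poor PIR rate here: two full answers are downloaded to recover one record. Capacity-achieving protocols like the ones in the paper minimise exactly this download overhead.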
• [cs.IT]Age Minimization in Energy Harvesting Communications: Energy-Controlled Delays
Ahmed Arafa, Sennur Ulukus
http://arxiv.org/abs/1712.03945v1
We consider an energy harvesting source that is collecting measurements from a physical phenomenon and sending updates to a destination within a communication session time. Updates incur transmission delays that are a function of the energy used in their transmission. The more transmission energy used per update, the faster it reaches the destination. The goal is to transmit updates in a timely manner, namely, such that the total age of information is minimized by the end of the communication session, subject to energy causality constraints. We consider two variations of this problem. In the first setting, the source controls the number of measurement updates, their transmission times, and the amounts of energy used in their transmission (which govern their delays, or service times, incurred). In the second setting, measurement updates externally arrive over time, and therefore the number of updates becomes fixed, at the expense of adding data causality constraints to the problem. We characterize age-minimal policies in the two settings, and discuss the relationship of the age of information metric to other metrics used in the energy harvesting literature.
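The age-of-information objective being minimised is simple bookkeeping: age grows with slope one between receptions and, when an update is delivered, resets to that update's transmission delay. A minimal sketch of evaluating the total (integrated) age over a session, assuming the age starts at zero (illustrative bookkeeping, not the paper's optimisation):

```python
def total_age(updates, T):
    """Integral of the age-of-information curve over [0, T].
    updates: list of (delivery_time, delay) pairs; on delivery the age
    resets to the transmission delay of the delivered update."""
    t, age, area = 0.0, 0.0, 0.0
    for d, delay in sorted(updates):
        dt = d - t
        area += age * dt + dt * dt / 2.0   # linear growth with slope 1
        age, t = delay, d                  # reset to this update's delay
    dt = T - t
    area += age * dt + dt * dt / 2.0       # tail segment up to session end
    return area

# two updates: delivered at t=2 with delay 0.5, at t=5 with delay 1.0; T=8
total = total_age([(2, 0.5), (5, 1.0)], T=8)   # 15.5
```

More transmission energy shortens an update's delay and hence the reset value at each delivery; the paper optimises exactly this trade-off under energy causality.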
• [cs.IT]Caching and Coded Delivery over Gaussian Broadcast Channels for Energy Efficiency
Mohammad Mohammadi Amiri, Deniz Gunduz
http://arxiv.org/abs/1712.03433v1
A cache-aided $K$-user Gaussian broadcast channel (BC) is considered. The transmitter has a library of $N$ equal-rate files, from which each user demands one. The impact of the equal-capacity receiver cache memories on the minimum required transmit power to satisfy all user demands is studied. Considering uniformly random demands across the library, both the minimum average power (averaged over all demand combinations) and the minimum peak power (minimum power required to satisfy all demand combinations) are studied. Upper bounds are presented on the minimum required average and peak transmit power as a function of the cache capacity considering centralized caching. The proposed scheme is then extended to the decentralized caching scenario. The lower bounds on the minimum required average and peak power values are also derived assuming uncoded cache placement. The bounds for both the peak and average power are shown to be tight in the centralized scenario through numerical results. The results in this paper show that proactive caching and coded delivery can provide significant energy savings in wireless networks.
• [cs.IT]Compressive Phase Retrieval of Structured Signal
Milad Bakhshizadeh, Arian Maleki, Shirin Jalali
http://arxiv.org/abs/1712.03278v1
Compressive phase retrieval is the problem of recovering a structured vector $\boldsymbol{x} \in \mathbb{C}^n$ from its phaseless linear measurements. A compression algorithm aims to represent structured signals with as few bits as possible. As a result of extensive research devoted to compression algorithms, in many signal classes, compression algorithms are capable of employing sophisticated structures in signals and compress them efficiently. This raises the following important question: Can a compression algorithm be used for the compressive phase retrieval problem? To address this question, COmpressive PhasE Retrieval (COPER) optimization is proposed, which is a compression-based phase retrieval method. For a family of compression codes with rate-distortion function denoted by $r(\delta)$, in the noiseless setting, COPER is shown to require slightly more than $\lim\limits_{\delta \rightarrow 0} \frac{r(\delta)}{\log(1/\delta)}$ observations for an almost accurate recovery of $\boldsymbol{x}$.
• [cs.IT]Data Aggregation Over Multiple Access Wireless Sensors Network
Alejandro Cohen, Asaf Cohen, Omer Gurewitz
http://arxiv.org/abs/1712.03314v1
Data collection in Wireless Sensor Networks (WSN) draws significant attention, due to emerging interest in technologies ranging from Internet of Things (IoT) networks to simple "Presence" applications, which identify the status of devices (active or inactive). Numerous Medium Access Control (MAC) protocols for WSN, which can address the challenge of data collection in dense networks, have been suggested over the years. Most of these protocols utilize the traditional layering approach, in which the MAC layer is unaware of the encapsulated packet payload, and therefore there is no connection between the data collected, the physical layer, and the signaling mechanisms. Nonetheless, in many of the applications that intend to utilize such protocols, nodes may need to exchange very little information, and do so only sporadically; that is, while the number of devices in the network can be very large, only a subset wishes to transmit at any given time. Thus, a tailored protocol, which matches the signaling, physical layer and access control to traffic patterns, is required. In this work, we design and analyze a data collection protocol based on information theoretic principles. In the suggested protocol, the sink collects messages from up to K sensors simultaneously, out of a large population of sensors, without knowing in advance which sensors will transmit, and without requiring any synchronization, coordination or management overhead. In other words, neither the sink nor the other sensors need to know which sensors are actively transmitting, and this data is decoded directly from the channel output. We provide a simple codebook construction with very simple encoding and decoding procedures. We further design a secure version of the protocol.
• [cs.IT]Hybrid Analog-Digital Beamforming for Massive MIMO Systems
Shahar Stein Ioushua, Yonina C. Eldar
http://arxiv.org/abs/1712.03485v1
In massive MIMO systems, hybrid beamforming is an essential technique for exploiting the potential array gain without using a dedicated RF chain for each antenna. In this work, we consider the data phase in a massive MIMO communication process, where the transmitter and receiver use fewer RF chains than antennas. We examine several different fully- and partially connected schemes and consider the design of hybrid beamformers that minimize the estimation error in the data. For the hybrid precoder, we introduce a framework for approximating the optimal fully-digital precoder with a feasible hybrid one. We exploit the fact that the fully-digital precoder is unique only up to a unitary matrix, and optimize over this matrix and the hybrid precoder alternately. Our alternating minimization of approximation gap (Alt-MaG) framework improves the performance over state-of-the-art methods with no substantial increase in complexity. In addition, we present a special case of Alt-MaG, minimal gap iterative quantization (MaGiQ), that results in low complexity and lower mean squared error (MSE) than other common methods in the case of very few RF chains. MaGiQ is shown to coincide with the optimal fully-digital solution in some scenarios. For combiner design, we exploit the structure of the MSE objective and develop a greedy ratio trace maximization technique that achieves low MSE under various settings. All of our algorithms can be used with multiple hardware architectures.
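The alternation exploited here can be sketched with two standard ingredients: an orthogonal Procrustes step for the unitary rotation of the fully-digital precoder, and a phase-projection / least-squares fit of the unit-modulus analog and unconstrained digital factors. This is a simplified illustration of the alternating idea under assumed update rules, not the paper's exact Alt-MaG or MaGiQ algorithm:

```python
import numpy as np

def alt_unitary_hybrid(F_opt, n_rf, iters=50, seed=0):
    """Alternately fit (F_rf, F_bb) to F_opt @ T and re-solve the unitary T."""
    rng = np.random.default_rng(seed)
    n_tx, n_s = F_opt.shape
    F_rf = np.exp(1j * rng.uniform(0, 2 * np.pi, (n_tx, n_rf)))
    T = np.eye(n_s, dtype=complex)
    for _ in range(iters):
        target = F_opt @ T
        # digital factor: unconstrained least-squares fit
        F_bb = np.linalg.lstsq(F_rf, target, rcond=None)[0]
        # analog factor: project onto unit-modulus entries (keep phases only)
        F_rf = np.exp(1j * np.angle(target @ F_bb.conj().T))
        F_bb = np.linalg.lstsq(F_rf, target, rcond=None)[0]
        # Procrustes step: best unitary T aligning F_opt with the hybrid product
        U, _, Vh = np.linalg.svd(F_opt.conj().T @ (F_rf @ F_bb))
        T = U @ Vh
    return F_rf, F_bb, T
```

Because F_opt T attains the same objective for any unitary T, optimising over T enlarges the set of fully-digital targets the constrained hybrid factors can approximate.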
• [cs.IT]Low-Latency Multiuser Two-Way Wireless Relaying for Spectral and Energy Efficiencies
Zhichao Sheng, Hoang Duong Tuan, Trung Q. Duong, H. Vincent Poor, Yong Fang
http://arxiv.org/abs/1712.03756v1
The paper considers two possible approaches that enable multiple pairs of users to exchange information via multiple multi-antenna relays within one time slot, so as to save communication bandwidth in low-latency communications. The first approach is to deploy full-duplexing for both users and relays, making simultaneous signal transmission and reception possible. In the second approach, the users use a fraction of a time slot to send their information to the relays, and the relays use the remaining complementary fraction of the time slot to send the beamformed signals to the users. The inherent loop self-interference at the full-duplex nodes and the inter-full-duplexing-user interference of the first approach are absent in the second approach. Under both approaches, jointly optimizing the users' power allocation and the relays' beamformers, to either maximize the users' exchange of information or maximize the energy efficiency subject to user quality-of-service (QoS) in terms of information-throughput thresholds, leads to complex nonconvex optimization problems. Path-following algorithms are developed for their computational solution. The provided numerical examples show the advantages of the second approach over the first.
• [cs.IT]Multi-cell Massive MIMO Beamforming in Assuring QoS for Large Numbers of Users
Long D. Nguyen, Hoang D. Tuan, Trung Q. Duong, H. Vincent Poor
http://arxiv.org/abs/1712.03548v1
Massive multi-input multi-output (MIMO) uses a very large number of low-power transmit antennas to serve much smaller numbers of users. The most widely proposed type of massive MIMO transmit beamforming is zero-forcing, which is based on the right inverse of the overall MIMO channel matrix to force the inter-user interference to zero. The performance of massive MIMO is then analyzed based on the throughput of cell-edge users. This paper reassesses this beamforming philosophy, to instead consider the maximization of the energy efficiency of massive MIMO systems in assuring the quality-of-service (QoS) for as many users as possible. The bottleneck of serving small numbers of users by a large number of transmit antennas is unblocked by a new time-fraction-wise beamforming technique, which focuses signal transmission in fractions of a time slot. Accordingly, massive MIMO can deliver better quality-of-experience (QoE) in assuring QoS for much larger numbers of users. The provided simulations show that the numbers of users served by massive MIMO with the required QoS may be twice or more than the number of its transmit antennas.
• [cs.IT]Multilevel Diversity Coding with Secure Regeneration: Separate Coding Achieves the MBR Point
Shuo Shao, Tie Liu, Chao Tian, Cong Shen
http://arxiv.org/abs/1712.03326v1
The problem of multilevel diversity coding with secure regeneration (MDC-SR) is considered, which includes the problems of multilevel diversity coding with regeneration (MDC-R) and secure regenerating code (SRC) as special cases. Two outer bounds are established, showing that separate coding of different messages using the respective SRCs can achieve the minimum-bandwidth-regeneration (MBR) point of the achievable normalized storage-capacity repair-bandwidth tradeoff regions for the general MDC-SR problem. The core of the new converse results is an exchange lemma, which can be established using Han's subset inequality.
• [cs.IT]On Stochastic Orders and Fast Fading Multiuser Channels with Statistical CSIT
Pin-Hsun Lin, Eduard A. Jorswieck, Rafael F. Schaefer, Martin Mittelbach, Carsten R. Janda
http://arxiv.org/abs/1712.03692v1
In this paper, we investigate the ergodic capacity of fast fading Gaussian multiuser channels when only the statistics of the channel state are known at the transmitter. In general, the characterization of capacity regions of multiuser channels with only statistical channel state information at the transmitter (CSIT) is open. Instead of directly matching achievable rate regions and the corresponding outer bounds, in this work we resort to classifying the random channels through their probability distributions. To be more precise, in order to attain capacity results, we first derive sufficient conditions to attain some information theoretic channel orders such as degraded and strong/very strong interference, by applying the usual stochastic order and exploiting the same marginal property, such that the capacity regions of the memoryless Gaussian multiuser channels can be characterized. These include Gaussian interference channels, Gaussian broadcast channels, and Gaussian wiretap channels/secret key generation. We also extend the framework to channels with a specific memory structure, namely finite-state channels, wherein the Markov fading channel is discussed as a special case. Several practical examples, such as Rayleigh fading and Nakagami-\textit{m} fading, illustrate the application of the derived results.
• [cs.IT]Optimal Odd Arm Identification with Fixed Confidence
Gayathri R Prabhu, Srikrishna Bhashyam, Aditya Gopalan, Rajesh Sundaresan
http://arxiv.org/abs/1712.03682v1
The problem of detecting an odd arm from a set of K arms of a multi-armed bandit, with fixed confidence, is studied in a sequential decision-making scenario. Each arm's signal follows a distribution from a vector exponential family. All arms have the same parameters except the odd arm. The actual parameters of the odd and non-odd arms are unknown to the decision maker. Further, the decision maker incurs a cost whenever the decision maker switches from one arm to another. This is a sequential decision-making problem where the decision maker gets only a limited view of the true state of nature at each stage, but can control his view by choosing the arm to observe at each stage. Of interest are policies that satisfy a given constraint on the probability of false detection. An information-theoretic lower bound on the total cost (expected time for a reliable decision plus total switching cost) is first identified, and a variation on a sequential policy based on the generalised likelihood ratio statistic is then studied. Thanks to the vector exponential family assumption, the signal processing in this policy at each stage turns out to be very simple, in that the associated conjugate prior enables easy updates of the posterior distribution of the model parameters. The policy, with a suitable threshold, is shown to satisfy the given constraint on the probability of false detection. Further, the proposed policy is asymptotically optimal in terms of the total cost among all policies that satisfy the constraint on the probability of false detection.
• [cs.IT]Optimal locally repairable codes via elliptic curves
Xudong Li, Liming Ma, Chaoping Xing
http://arxiv.org/abs/1712.03744v1
Constructing locally repairable codes achieving the Singleton-type bound (which we call optimal codes in this paper) is a challenging task that has attracted great attention in the last few years. Tamo and Barg \cite{TB14} gave the first breakthrough result on this topic by cleverly considering subcodes of Reed-Solomon codes. The $q$-ary optimal locally repairable codes from subcodes of Reed-Solomon codes given in \cite{TB14} thus have length upper bounded by $q$. Recently, it was shown in \cite{JMX17}, through an extension of the construction in \cite{TB14}, that the length of $q$-ary optimal locally repairable codes can be $q+1$. Surprisingly, it was shown in \cite{BHHMV16} that, unlike classical MDS codes, $q$-ary optimal locally repairable codes can have length greater than $q+1$. Thus, it becomes an interesting and challenging problem to construct $q$-ary optimal locally repairable codes of length greater than $q+1$. In the present paper, we make use of the rich algebraic structure of elliptic curves to construct a family of $q$-ary optimal locally repairable codes of length up to $q+2\sqrt{q}$. It turns out that the locality of our codes can be as large as $23$ and the distance can be linear in the length.
• [cs.IT]Overcoming Endurance Issue: UAV-Enabled Communications with Proactive Caching
Xiaoli Xu, Yong Zeng, Yong Liang Guan, Rui Zhang
http://arxiv.org/abs/1712.03542v1
Wireless communication enabled by unmanned aerial vehicles (UAVs) has emerged as an appealing technology for many application scenarios in future wireless systems. However, the limited endurance of UAVs greatly hinders the practical implementation of UAV-enabled communications. To overcome this issue, this paper proposes a novel scheme for UAV-enabled communications by utilizing the promising technique of proactive caching at the users. Specifically, we focus on content-centric communication systems, where a UAV is dispatched to serve a group of ground nodes (GNs) with random and asynchronous requests for files drawn from a given set. With the proposed scheme, at the beginning of each operation period, the UAV pro-actively transmits the files to a subset of selected GNs that cooperatively cache all the files in the set. As a result, when requested, a file can be retrieved by each GN either directly from its local cache or from its nearest neighbor that has cached the file via device-to-device (D2D) communications. It is revealed that there exists a fundamental trade-off between the file caching cost, which is the total time required for the UAV to transmit the files to their designated caching GNs, and the file retrieval cost, which is the average time required for serving one file request. To characterize this trade-off, we formulate an optimization problem to minimize the weighted sum of the two costs, via jointly designing the file caching policy, the UAV trajectory and communication scheduling. As the formulated problem is NP-hard in general, we propose efficient algorithms to find high-quality approximate solutions for it. Numerical results are provided to corroborate our study and show the great potential of proactive caching for overcoming the limited endurance issue in UAV-enabled communications.
• [cs.IT]Progressive Bit-Flipping Decoding of Polar Codes Over Layered Critical Sets
Zhaoyang Zhang, Kangjian Qin, Liang Zhang, Huazi Zhang, Guo Tai Chen
http://arxiv.org/abs/1712.03332v1
In successive cancellation (SC) polar decoding, an incorrect estimate of any prior unfrozen bit may bring about severe error propagation in the subsequent decoding; thus it is desirable to find and correct an error as early as possible. In this paper, we first construct a critical set $S$ of unfrozen bits, which with high probability (typically $>99\%$) includes the bit where the first error happens. Then we develop a progressive multi-level bit-flipping decoding algorithm to correct multiple errors over multiple layered critical sets, each of which is constructed using the remaining undecoded subtree associated with the previous layer. The \emph{level} in fact indicates the number of \emph{independent} errors that can be corrected. We show that as the level increases, the block error rate (BLER) performance of the proposed progressive bit-flipping decoder competes with that of the corresponding cyclic redundancy check (CRC) aided successive cancellation list (CA-SCL) decoder; e.g., a level-4 progressive bit-flipping decoder is comparable to the CA-SCL decoder with a list size of $L=32$. Furthermore, the average complexity of the proposed algorithm is much lower than that of an SCL decoder (and is similar to that of SC decoding) at medium to high signal-to-noise ratio (SNR).
• [cs.IT]Short-Packet Two-Way Amplify-and-Forward Relaying
Yifan Gu, He Chen, Yonghui Li, Lingyang Song, Branka Vucetic
http://arxiv.org/abs/1712.03653v1
This letter investigates an amplify-and-forward two-way relay network (TWRN) for short-packet communications. We consider a classical three-node TWRN consisting of two sources and one relay. Both a two-time-slot (2TS) scheme and a three-time-slot (3TS) scheme are studied under the finite-blocklength regime. We derive approximate closed-form expressions of the sum block error rate (BLER) for both schemes. Simple asymptotic expressions for the sum-BLER at high signal-to-noise ratio (SNR) are also derived. Based on the asymptotic expressions, we analytically compare the sum-BLER performance of the 2TS and 3TS schemes and obtain an expression for the critical blocklength, which determines which of the two schemes is superior in terms of sum-BLER. Extensive simulations are provided to validate our theoretical analysis. Our results show that the 3TS scheme is more suitable for systems with lower relay transmission power, larger differences between the average SNRs of the two links, and relatively lower requirements on data rate and latency.
• [cs.IT]Ulam Sphere Size Analysis for Permutation and Multipermutation Codes Correcting Translocation Errors
Justin Kong
http://arxiv.org/abs/1712.03639v1
Permutation and multipermutation codes in the Ulam metric have been suggested for use in non-volatile memory storage systems such as flash memory devices. In this paper we introduce a new method to calculate permutation sphere sizes in the Ulam metric using Young Tableaux and prove the non-existence of non-trivial perfect permutation codes in the Ulam metric. We then extend the study to multipermutations, providing tight upper and lower bounds on multipermutation Ulam sphere sizes and resulting upper and lower bounds on the maximal size of multipermutation codes in the Ulam metric.
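A useful fact for working with these spheres: the Ulam distance between two permutations of length $n$ equals $n$ minus the length of their longest common subsequence, so it can be computed in $O(n \log n)$ with a patience-sorting longest-increasing-subsequence routine. The helper below is a standard utility (included for illustration, not taken from the paper):

```python
from bisect import bisect_left

def ulam_distance(s, t):
    """Minimum number of translocations turning permutation s into t."""
    pos = {v: i for i, v in enumerate(t)}
    seq = [pos[v] for v in s]      # rewrite s in t's coordinate system
    tails = []                     # tails[k] = smallest tail of an increasing
                                   # subsequence of length k+1
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(s) - len(tails)     # n minus longest common subsequence

d = ulam_distance([1, 2, 3, 4, 5], [2, 3, 4, 5, 1])   # one translocation: d == 1
```

Sphere sizes in this metric then count permutations within a given `ulam_distance` of a center, which is the quantity the paper bounds via Young tableaux.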
• [cs.LG]A General Memory-Bounded Learning Algorithm
Michal Moshkovitz, Naftali Tishby
http://arxiv.org/abs/1712.03524v1
In an era of big data there is a growing need for memory-bounded learning algorithms. In the last few years researchers have investigated what cannot be learned under memory constraints. In this paper we focus on the complementary question of what can be learned under memory constraints. We show that if a hypothesis class fulfills a combinatorial condition defined in this paper, there is a memory-bounded learning algorithm for this class. We prove that certain natural classes fulfill this combinatorial property and thus can be learned under memory constraints.
• [cs.LG]Bayesian Q-learning with Assumed Density Filtering
Heejin Jeong, Daniel D. Lee
http://arxiv.org/abs/1712.03333v1
While off-policy temporal difference (TD) methods have been widely used in reinforcement learning due to their efficiency and simple implementation, their Bayesian counterparts have received comparatively little study. This is mainly because the max operator in the Bellman optimality equation introduces non-linearity and inconsistent distributions over the value function. In this paper, we introduce a new Bayesian approach to off-policy TD methods based on Assumed Density Filtering, called ADFQ, which updates beliefs on action-values (Q) through an online Bayesian inference method. Uncertainty measures in the beliefs are not only used for exploration but also provide a natural regularization of the belief updates. We also present a connection between ADFQ and Q-learning. Our empirical results show that the proposed ADFQ algorithms outperform competing algorithms in several task domains. Moreover, our algorithms mitigate general drawbacks of Bayesian RL such as computational complexity, usage of uncertainty, and non-linearity.
• [cs.LG]Cost-Sensitive Approach to Batch Size Adaptation for Gradient Descent
Matteo Pirotta, Marcello Restelli
http://arxiv.org/abs/1712.03428v1
In this paper, we propose a novel approach to automatically determine the batch size in stochastic gradient descent methods. The choice of the batch size induces a trade-off between the accuracy of the gradient estimate and the cost, in terms of samples, of each update. We propose to determine the batch size by optimizing the ratio between a lower bound to a linear or quadratic Taylor approximation of the expected improvement and the number of samples used to estimate the gradient. The performance of the proposed approach is empirically compared with related methods on popular classification tasks. This work was presented at the NIPS Workshop on Optimizing the Optimizers, Barcelona, Spain, 2016.
• [cs.LG]DGCNN: Disordered Graph Convolutional Neural Network Based on the Gaussian Mixture Model
Bo Wu, Yang Liu, Bo Lang, Lei Huang
http://arxiv.org/abs/1712.03563v1
Convolutional neural networks (CNNs) can be applied to graph similarity matching, in which case they are called graph CNNs. Graph CNNs are attracting increasing attention due to their effectiveness and efficiency. However, the existing convolution approaches focus only on regular data forms and require the transfer of the graph or key node neighborhoods of the graph into the same fixed form. During this transfer process, structural information of the graph can be lost, and some redundant information can be incorporated. To overcome this problem, we propose the disordered graph convolutional neural network (DGCNN) based on the Gaussian mixture model, which extends the CNN by adding a preprocessing layer called the disordered graph convolutional layer (DGCL). The DGCL uses a Gaussian mixture function to realize the mapping between the convolution kernel and the nodes in the neighborhood of the graph. The output of the DGCL is the input of the CNN. We further implement a backward-propagation optimization process for the convolutional layer, by which we incorporate the feature-learning model of the irregular node neighborhood structure into the network. Thereafter, the optimization of the convolution kernel becomes part of the neural network learning process. The DGCNN can accept arbitrarily scaled and disordered neighborhood graph structures as the receptive fields of CNNs, which reduces information loss during graph transformation. Finally, we perform experiments on multiple standard graph datasets. The results show that the proposed method outperforms the state-of-the-art methods in graph classification and retrieval.
• [cs.LG]Generalized Zero-Shot Learning via Synthesized Examples
Gundeep Arora, Vinay Kumar Verma, Ashish Mishra, Piyush Rai
http://arxiv.org/abs/1712.03878v1
We present a generative framework for generalized zero-shot learning where the training and test classes are not necessarily disjoint. Built upon a variational autoencoder based architecture, consisting of a probabilistic encoder and a probabilistic conditional decoder, our model can generate novel exemplars from seen/unseen classes, given their respective class attributes. These exemplars can subsequently be used to train any off-the-shelf classification model. One of the key aspects of our encoder-decoder architecture is a feedback-driven mechanism in which a discriminator (a multivariate regressor) learns to map the generated exemplars to the corresponding class attribute vectors, leading to an improved generator. Our model's ability to generate and leverage examples from unseen classes to train the classification model naturally helps to mitigate the bias towards predicting seen classes in generalized zero-shot learning settings. Through a comprehensive set of experiments, we show that our model outperforms several state-of-the-art methods, on several benchmark datasets, for both standard as well as generalized zero-shot learning.
• [cs.LG]Gradient Normalization & Depth Based Decay For Deep Learning
Robert Kwiatkowski, Oscar Chang
http://arxiv.org/abs/1712.03607v1
In this paper we introduce a novel method of gradient normalization and decay with respect to depth. Our method leverages the simple concept of normalizing all gradients in a deep neural network and then decaying said gradients with respect to their depth in the network. The proposed normalization and decay techniques can be used in conjunction with most current state-of-the-art optimizers and are a very simple addition to any network. This method, although simple, showed improvements in convergence time on state-of-the-art networks such as DenseNet and ResNet on image classification tasks, as well as on an LSTM for natural language processing tasks.
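The two-step idea, normalize every layer's gradient, then scale it by a depth-dependent factor, can be sketched in a few lines. The exponential decay schedule and the output-to-input depth ordering below are illustrative assumptions, not necessarily the paper's exact choices:

```python
import numpy as np

def normalize_and_decay(grads, decay=0.9):
    """Normalize each layer's gradient to unit L2 norm, then scale it
    by decay**depth, where depth is the layer's index counted from the
    output. `grads` is a list of per-layer gradient arrays, ordered
    output -> input. The schedule here is an illustrative assumption."""
    out = []
    for depth, g in enumerate(grads):
        norm = np.linalg.norm(g)
        unit = g / norm if norm > 0 else g  # guard against zero gradients
        out.append(unit * decay ** depth)
    return out
```

The processed gradients can then be handed to any optimizer in place of the raw ones.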
• [cs.LG]Learning Modality-Invariant Representations for Speech and Images
Kenneth Leidal, David Harwath, James Glass
http://arxiv.org/abs/1712.03897v1
In this paper, we explore the unsupervised learning of a semantic embedding space for co-occurring sensory inputs. Specifically, we focus on the task of learning a semantic vector space for both spoken and handwritten digits using the TIDIGITs and MNIST datasets. Current techniques encode image and audio/textual inputs directly to semantic embeddings. In contrast, our technique maps an input to the mean and log variance vectors of a diagonal Gaussian from which sample semantic embeddings are drawn. In addition to encouraging semantic similarity between co-occurring inputs, our loss function includes a regularization term borrowed from variational autoencoders (VAEs) which drives the posterior distributions over embeddings to be unit Gaussian. We can use this regularization term to filter out modality information while preserving semantic information. We speculate this technique may be more broadly applicable to other areas of cross-modality/domain information retrieval and transfer learning.
• [cs.LG]MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments
Manolis Savva, Angel X. Chang, Alexey Dosovitskiy, Thomas Funkhouser, Vladlen Koltun
http://arxiv.org/abs/1712.03931v1
We present MINOS, a simulator designed to support the development of multisensory models for goal-directed navigation in complex indoor environments. The simulator leverages large datasets of complex 3D environments and supports flexible configuration of multimodal sensor suites. We use MINOS to benchmark deep-learning-based navigation methods, to analyze the influence of environmental complexity on navigation performance, and to carry out a controlled study of multimodality in sensorimotor learning. The experiments show that current deep reinforcement learning approaches fail in large realistic environments. The experiments also indicate that multimodality is beneficial in learning to navigate cluttered scenes. MINOS is released open-source to the research community at http://minosworld.org . A video that shows MINOS can be found at https://youtu.be/c0mL9K64q84
• [cs.LG]Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks
Shankar Krishnan, Ying Xiao, Rif A. Saurous
http://arxiv.org/abs/1712.03298v1
Progress in deep learning is slowed by the days or weeks it takes to train large models. The natural solution of using more hardware is limited by diminishing returns, and leads to inefficient use of additional resources. In this paper, we present a large batch, stochastic optimization algorithm that is both faster than widely used algorithms for fixed amounts of computation, and also scales up substantially better as more computational resources become available. Our algorithm implicitly computes the inverse Hessian of each mini-batch to produce descent directions; we do so without either an explicit approximation to the Hessian or Hessian-vector products. We demonstrate the effectiveness of our algorithm by successfully training large ImageNet models (Inception-V3, Resnet-50, Resnet-101 and Inception-Resnet-V2) with mini-batch sizes of up to 32000 with no loss in validation error relative to current baselines, and no increase in the total number of steps. At smaller mini-batch sizes, our optimizer improves the validation error in these models by 0.8-0.9%. Alternatively, we can trade off this accuracy to reduce the number of training steps needed by roughly 10-30%. Our work is practical and easily usable by others -- only one hyperparameter (learning rate) needs tuning, and furthermore, the algorithm is as computationally cheap as the commonly used Adam optimizer.
• [cs.LG]Peephole: Predicting Network Performance Before Training
Boyang Deng, Junjie Yan, Dahua Lin
http://arxiv.org/abs/1712.03351v1
The quest for performant networks has been a significant force that drives the advancements of deep learning in recent years. While rewarding, improving network design has never been an easy journey. The large design space combined with the tremendous cost required for network training poses a major obstacle to this endeavor. In this work, we propose a new approach to this problem, namely, predicting the performance of a network before training, based on its architecture. Specifically, we develop a unified way to encode individual layers into vectors and bring them together to form an integrated description via LSTM. Taking advantage of the recurrent network's strong expressive power, this method can reliably predict the performances of various network architectures. Our empirical studies showed that it not only achieved accurate predictions but also produced consistent rankings across datasets -- a key desideratum in performance prediction.
• [cs.LG]Robust Deep Reinforcement Learning with Adversarial Attacks
Anay Pattanaik, Zhenyi Tang, Shuijing Liu, Gautham Bommannan, Girish Chowdhary
http://arxiv.org/abs/1712.03632v1
This paper proposes adversarial attacks for Reinforcement Learning (RL) and then uses these attacks to improve the robustness of Deep Reinforcement Learning (DRL) algorithms to parameter uncertainties. We show that even a naively engineered attack successfully degrades the performance of a DRL algorithm. We further improve the attack using gradient information of an engineered loss function, which leads to further degradation in performance. These attacks are then leveraged during training to improve the robustness of RL within a robust control framework. We show that this adversarial training of DRL algorithms such as Deep Double Q-learning and Deep Deterministic Policy Gradients leads to a significant increase in robustness to parameter variations on RL benchmarks such as the Cart-pole, Mountain Car, Hopper and Half-Cheetah environments.
• [cs.LG]Saving Gradient and Negative Curvature Computations: Finding Local Minima More Efficiently
Yaodong Yu, Difan Zou, Quanquan Gu
http://arxiv.org/abs/1712.03950v1
We propose a family of nonconvex optimization algorithms that are able to save gradient and negative curvature computations to a large extent, and are guaranteed to find an approximate local minimum with improved runtime complexity. At the core of our algorithms is the division of the entire domain of the objective function into small and large gradient regions: our algorithms only perform gradient descent based procedure in the large gradient region, and only perform negative curvature descent in the small gradient region. Our novel analysis shows that the proposed algorithms can escape the small gradient region in only one negative curvature descent step whenever they enter it, and thus they only need to perform at most $N_{\epsilon}$ negative curvature direction computations, where $N_{\epsilon}$ is the number of times the algorithms enter small gradient regions. For both deterministic and stochastic settings, we show that the proposed algorithms can potentially beat the state-of-the-art local minima finding algorithms. For the finite-sum setting, our algorithm can also outperform the best algorithm in a certain regime.
• [cs.LG]StrassenNets: Deep learning with a multiplication budget
Michael Tschannen, Aran Khanna, Anima Anandkumar
http://arxiv.org/abs/1712.03942v1
A large fraction of the arithmetic operations required to evaluate deep neural networks (DNNs) are due to matrix multiplications, both in convolutional and fully connected layers. Matrix multiplications can be cast as $2$-layer sum-product networks (SPNs) (arithmetic circuits), disentangling multiplications and additions. We leverage this observation for end-to-end learning of low-cost (in terms of multiplications) approximations of linear operations in DNN layers. Specifically, we propose to replace matrix multiplication operations by SPNs, with widths corresponding to the budget of multiplications we want to allocate to each layer, and learning the edges of the SPNs from data. Experiments on CIFAR-10 and ImageNet show that this method applied to ResNet yields significantly higher accuracy than existing methods for a given multiplication budget, or leads to the same or higher accuracy compared to existing methods while using significantly fewer multiplications. Furthermore, our approach allows fine-grained control of the tradeoff between arithmetic complexity and accuracy of DNN models. Finally, we demonstrate that the proposed framework is able to rediscover Strassen's matrix multiplication algorithm, i.e., it can learn to multiply $2 \times 2$ matrices using only $7$ multiplications instead of $8$.
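The Strassen construction the learned SPNs rediscover can be written out directly: seven multiplications of linear combinations of the matrix entries, followed by additions, in place of the naive eight multiplications. A reference sketch:

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using Strassen's algorithm:
    7 scalar multiplications instead of the naive 8."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    # Recombine the seven products with additions only
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]
```

In the SPN view, the pre-multiplication combinations and the post-multiplication recombinations are exactly the two layers of additions whose edges are learned from data.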
• [cs.RO]A cable-driven parallel manipulator with force sensing capabilities for high-accuracy tissue endomicroscopy
Kiyoteru Miyashita, Timo Oude Vrielink, George Mylonas
http://arxiv.org/abs/1712.03374v1
This paper introduces a new surgical end-effector probe that can accurately apply a contact force to tissue while allowing high-resolution, highly repeatable probe movement. These capabilities are achieved by implementing a cable-driven parallel manipulator arrangement, which is deployed at the distal end of a robotic instrument. The combination of the offered qualities can be advantageous in several ways, with possible applications including large-area endomicroscopy and multi-spectral imaging, micro-surgery, tissue palpation, and safe energy-based and conventional tissue resection. To demonstrate the concept and its adaptability, the probe is integrated with a modified da Vinci robot instrument.
• [cs.RO]ESD CYCLOPS: A new robotic surgical system for GI surgery
Timo J. C. Oude Vrielink, Ming Zhao, Ara Darzi, George P. Mylonas
http://arxiv.org/abs/1712.03388v1
Gastrointestinal (GI) cancers account for 1.5 million deaths worldwide. Endoscopic Submucosal Dissection (ESD) is an advanced therapeutic endoscopy technique with superior clinical outcomes due to the minimally invasive and en bloc removal of tumours. In the western world, ESD is seldom carried out due to its complex and challenging nature. Various surgical systems are being developed to make this therapy accessible; however, these solutions have shown limited operational workspace, dexterity, or low force exertion capabilities. The current paper presents the ESD CYCLOPS system, a bimanual surgical robotic attachment that can be mounted at the end of any flexible endoscope. The system is able to exert forces of up to 46 N, and showed a mean error of 0.217 mm during an elliptical tracing task. The workspace and instrument dexterity are demonstrated by pre-clinical ex vivo trials, in which ESD was successfully performed by a GI surgeon. The system is currently undergoing pre-clinical in vivo validation.
• [cs.RO]Sim-to-Real Transfer of Accurate Grasping with Eye-In-Hand Observations and Continuous Control
M. Yan, I. Frosio, S. Tyree, J. Kautz
http://arxiv.org/abs/1712.03303v1
In the context of deep learning for robotics, we present an effective method of training a real robot to grasp a tiny sphere (1.37 cm in diameter), with an original combination of system design choices. We decompose the end-to-end system into a vision module and a closed-loop controller module. The two modules use target object segmentation as their common interface. The vision module extracts information from the robot end-effector camera in the form of a binary segmentation mask of the target. We train it to achieve effective domain transfer by composing real background images with simulated images of the target. The controller module takes as input the binary segmentation mask and is thus agnostic to visual discrepancies between simulated and real environments. We train our closed-loop controller in simulation using imitation learning and show it is robust to discrepancies between the dynamic models of the simulated and real robot: when combined with eye-in-hand observations, we achieve a 90% success rate in grasping a tiny sphere with a real robot. The controller can generalize to unseen scenarios where the target is moving, and even learns to recover from failures.
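The domain-transfer trick for the vision module, compositing a simulated rendering of the target onto a real background photo via the target's binary mask, can be sketched as follows. The array shapes and the hard (alpha-free) paste are illustrative assumptions, not the authors' exact pipeline:

```python
import numpy as np

def composite_training_image(real_bg, sim_img, sim_mask):
    """Paste a simulated target onto a real background image.
    real_bg, sim_img: (H, W, 3) uint8 arrays; sim_mask: (H, W) binary mask.
    Returns the composited image plus the mask, which doubles as the
    segmentation label the vision module is trained to predict."""
    mask = sim_mask.astype(bool)[..., None]  # (H, W, 1) for broadcasting
    return np.where(mask, sim_img, real_bg), sim_mask.astype(np.uint8)
```

Because the controller only ever sees the mask, any residual visual gap between simulated and real imagery is absorbed entirely by the vision module.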
• [cs.RO]Surgical task-space optimisation of the CYCLOPS robotic system
T. J. C. Oude Vrielink, Y. W. Pang, M. Zhao, S. -L. Lee, A. Darzi, G. P. Mylonas
http://arxiv.org/abs/1712.03720v1
The CYCLOPS is a cable-driven parallel mechanism used for minimally invasive applications, with the ability to be customised to different surgical needs, allowing it to be made procedure- and patient-specific. For adequate optimisation, however, appropriate data on clinical constraints and the task-space is required. Whereas the former can be provided through preoperative planning and imaging, the latter remains a problem, primarily for highly dexterous MIS systems. The current work focuses on the development of a task-space optimisation method for the CYCLOPS system and of a data collection method in a simulation environment for minimally invasive task-spaces. The same data collection method can be used for the development of other minimally invasive platforms. A case study illustrates the developed method for Endoscopic Submucosal Dissection (ESD). This paper shows that, using this method, the system can be successfully optimised for this application.
• [cs.RO]Towards Fully Environment-Aware UAVs: Real-Time Path Planning with Online 3D Wind Field Prediction in Complex Terrain
Philipp Oettershagen, Florian Achermann, Benjamin Müller, Daniel Schneider, Roland Siegwart
http://arxiv.org/abs/1712.03608v1
Today, low-altitude fixed-wing Unmanned Aerial Vehicles (UAVs) are largely limited to primitively following user-defined waypoints. To allow fully autonomous remote missions in complex environments, real-time environment-aware navigation is required with respect to both terrain and strong wind drafts. This paper presents two relevant initial contributions. First, we present the literature's first 3D wind field prediction method that can run in real time onboard a UAV. The approach retrieves low-resolution global weather data and uses potential flow theory to adjust the wind field such that terrain boundaries, mass conservation, and the atmospheric stratification are observed. A comparison with 1D LIDAR data shows an overall wind error reduction of 23% with respect to the zero-wind assumption mostly used for UAV path planning today. However, the vertical winds are not yet resolved accurately enough, and further research directions are identified. Second, we present a sampling-based path planner that considers the aircraft dynamics in non-uniform wind iteratively via Dubins airplane paths. Performance optimizations, e.g. obstacle-aware sampling and fast 2.5D-map collision checks, render the planner 50% faster than the Open Motion Planning Library (OMPL) implementation. Test cases in Alpine terrain show that the wind-aware planner performs up to 50x fewer iterations than shortest-path planning and is thus slower in low winds, but that it tends to deliver lower-cost paths in stronger winds. More importantly, in contrast to the shortest-path planner, it always delivers collision-free paths. Overall, our initial research demonstrates the feasibility of 3D wind field prediction onboard a UAV and the advantages of wind-aware planning. This paves the way for follow-up research on fully autonomous environment-aware navigation of UAVs in real-life missions and complex terrain.
• [cs.SD]A Cascade Architecture for Keyword Spotting on Mobile Devices
Alexander Gruenstein, Raziel Alvarez, Chris Thornton, Mohammadali Ghodrat
http://arxiv.org/abs/1712.03603v1
We present a cascade architecture for keyword spotting with speaker verification on mobile devices. By pairing a small computational footprint with specialized digital signal processing (DSP) chips, we are able to achieve low power consumption while continuously listening for a keyword.
• [cs.SD]Efficient Implementation of the Room Simulator for Training Deep Neural Network Acoustic Models
Chanwoo Kim, Ehsan Variani, Arun Narayanan, Michiel Bacchiani
http://arxiv.org/abs/1712.03439v1
In this paper, we describe how to efficiently implement an acoustic room simulator to generate large-scale simulated data for training deep neural networks. Even though the Google Room Simulator in [1] was shown to be quite effective in reducing Word Error Rates (WERs) for far-field applications by generating simulated far-field training sets, it requires a very large number of large Fast Fourier Transforms (FFTs). The Room Simulator in [1] consumed approximately 80 percent of the CPU in our CPU + Graphics Processing Unit (GPU) training architecture [2]. In this work, we implement efficient OverLap-Add (OLA) based filtering using the open-source FFTW3 library. Further, we investigate the effects of Room Impulse Response (RIR) lengths. Experimentally, we conclude that we can cut the tail portions of RIRs whose power is more than 20 dB below the maximum power without sacrificing speech recognition accuracy. However, cutting the RIR tail beyond this threshold harms speech recognition accuracy on re-recorded test sets. Using these approaches, we reduced the CPU usage of the room simulator to 9.69 percent in the CPU/GPU training architecture. Profiling shows that we obtain a 22.4-times speed-up on a single machine and a 37.3-times speed-up on Google's distributed training infrastructure.
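The RIR-truncation rule, drop tail samples whose power falls more than a threshold (20 dB in the paper's finding) below the peak power, is simple to state in code. A minimal sketch (sample-wise power and the exact cut point are illustrative choices):

```python
import numpy as np

def truncate_rir(rir, threshold_db=20.0):
    """Cut the tail of a room impulse response: keep samples up to the
    last one whose power is within `threshold_db` dB of the maximum
    sample power. Mirrors the 20 dB rule discussed above; illustrative."""
    power = rir.astype(float) ** 2
    cutoff = power.max() * 10.0 ** (-threshold_db / 10.0)
    keep = np.nonzero(power >= cutoff)[0]  # indices above the cutoff
    return rir[: keep[-1] + 1]
```

Shorter RIRs mean shorter convolutions, which is where the FFT savings come from.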
• [econ.EM]A Random Attention Model
Matias D. Cattaneo, Xinwei Ma, Yusufcan Masatlioglu, Elchin Suleymanov
http://arxiv.org/abs/1712.03448v1
We introduce a Random Attention Model (RAM) allowing for a large class of stochastic consideration maps in the context of an otherwise canonical limited attention model for decision theory. The model relies on a new restriction on the unobserved, possibly stochastic consideration map, termed \textit{Monotonic Attention}, which is intuitive and nests many recent contributions in the literature on limited attention. We develop revealed preference theory within RAM and obtain precise testable implications for observable choice probabilities. Using these results, we show that a set (possibly a singleton) of strict preference orderings compatible with RAM is identifiable from the decision maker's choice probabilities, and establish a representation of this identified set of unobserved preferences as a collection of inequality constraints on her choice probabilities. Given this nonparametric identification result, we develop uniformly valid inference methods for the (partially) identifiable preferences. We showcase the performance of our proposed econometric methods using simulations, and provide a general-purpose software implementation of our estimation and inference results in the \texttt{R} software package \texttt{ramchoice}. Our proposed econometric methods are computationally very fast to implement.
• [eess.SP]Identifying the Mislabeled Training Samples of ECG Signals using Machine Learning
Yaoguang Li, Wei Cui, Cong Wang
http://arxiv.org/abs/1712.03792v1
The classification accuracy of electrocardiogram (ECG) signals is often affected by diverse factors, among which mislabeled training samples are one of the most influential. To mitigate this negative effect, we introduce a cross-validation method to identify the mislabeled samples. The method exploits the complementary strengths of different classifiers to act as a filter on the training samples: the filter removes the mislabeled training samples and retains the correctly labeled ones with the help of 10-fold cross-validation. Consequently, a new training set is provided to the final classifiers, which acquire higher classification accuracy. Finally, we numerically demonstrate the effectiveness of the proposed method on the MIT-BIH arrhythmia database.
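The filtering idea can be sketched with a single stand-in classifier: train on all folds but one, predict the held-out fold, and drop samples whose held-out prediction disagrees with their given label. The nearest-centroid model and fold count below are illustrative stand-ins for the paper's cooperating classifiers:

```python
import numpy as np

def cv_filter(X, y, n_folds=10, seed=0):
    """Return indices of training samples kept after cross-validation
    filtering: a sample survives only if a classifier trained on the
    other folds (here a simple nearest-centroid model, an illustrative
    stand-in) reproduces its given label on the held-out fold."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, n_folds)
    keep = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        classes = np.unique(y[train])
        centroids = np.stack(
            [X[train][y[train] == c].mean(axis=0) for c in classes])
        for i in fold:
            pred = classes[np.argmin(((centroids - X[i]) ** 2).sum(axis=1))]
            if pred == y[i]:
                keep.append(int(i))
    return np.sort(np.array(keep))
```

The surviving indices define the cleaned training set handed to the final classifier.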
• [eess.SP]Noise Level Estimation for Overcomplete Dictionary Learning Based on Tight Asymptotic Bounds
Rui Chen, Changshui Yang, Huizhu Jia, Xiaodong Xie
http://arxiv.org/abs/1712.03381v1
In this letter, we address the problem of estimating the Gaussian noise level from trained dictionaries in the update stage. We first provide a rigorous statistical analysis of the eigenvalue distributions of a sample covariance matrix. Then we propose an interval-bounded estimator for the noise variance in the high-dimensional setting. To this end, an effective estimation method for the noise level is devised based on the boundedness and asymptotic behavior of the noise eigenvalue spectrum. The estimation performance of our method is guaranteed both theoretically and empirically. The analysis and experimental results demonstrate that the proposed algorithm can reliably infer true noise levels and outperforms the relevant existing methods.
• [eess.SP]Wireless Energy Beamforming Using Signal Strength Feedback
Samith Abeywickrama, Tharaka Samarasinghe, Chin Keong Ho
http://arxiv.org/abs/1712.03531v1
Multiple-antenna techniques that allow energy beamforming have been regarded as a possible candidate for increasing the efficiency of the transfer process between the energy transmitter (ET) and the energy receiver (ER) in wireless energy transfer. This paper introduces a novel scheme that facilitates energy beamforming by utilizing Received Signal Strength Indicator (RSSI) values to estimate the channel. First, in the training stage, the ET transmits sequentially using each beamforming vector in a codebook, which is pre-defined using a Cramer-Rao lower bound analysis. The RSSI value corresponding to each beamforming vector is fed back to the ET, and these values are used to estimate the channel through a maximum likelihood analysis. The obtained results are remarkably simple, require minimal processing, and can be easily implemented. The results are also general and hold for all well-known fading models. The paper validates the analytical results numerically as well as experimentally, and shows that the proposed method achieves impressive results in wireless energy transfer.
• [hep-lat]Towards reduction of autocorrelation in HMC by machine learning
Akinori Tanaka, Akio Tomiya
http://arxiv.org/abs/1712.03893v1
In this paper we propose a new algorithm to reduce autocorrelation in Markov chain Monte Carlo algorithms for Euclidean field theories on the lattice. The proposed algorithm combines the Hybrid Monte Carlo algorithm (HMC) with a restricted Boltzmann machine. We examine the validity of the algorithm on phi-fourth theory in three dimensions, and observe a reduction of the autocorrelation in both the symmetric and broken phases. The proposed algorithm yields central values of the action density and the one-point Green's function consistent with those from the original HMC in both phases, within statistical error. On the other hand, the two-point Green's functions calculated by the HMC and by the proposed algorithm differ slightly in the symmetric phase. Furthermore, near criticality, the distribution of the one-point Green's function differs from that of the HMC. We discuss the origin of these discrepancies and possible improvements.
• [math.OC]A Non-Cooperative Game Approach to Autonomous Racing
Alexander Liniger, John Lygeros
http://arxiv.org/abs/1712.03913v1
We consider autonomous racing of two cars and present an approach to formulate the decision making as a non-cooperative non-zero-sum game. The game is formulated by restricting both players to fulfill static track constraints as well as collision constraints which depend on the combined actions of the two players. At the same time the players try to maximize their own progress. In the case where the action space of the players is finite, the racing game can be reformulated as a bimatrix game. For this bimatrix game, we show that the actions obtained by a sequential maximization approach where only the follower considers the action of the leader are identical to a Stackelberg and a Nash equilibrium in pure strategies. Furthermore, we propose a game promoting blocking, by additionally rewarding the leading car for staying ahead at the end of the horizon. We show that this changes the Stackelberg equilibrium, but has a minor influence on the Nash equilibria. For an online implementation, we propose to play the games in a moving horizon fashion, and we present two methods for guaranteeing feasibility of the resulting coupled repeated games. Finally, we study the performance of the proposed approaches in simulation for a set-up that replicates the miniature race car tested at the Automatic Control Laboratory of ETH Zurich. The simulation study shows that the presented games can successfully model different racing behaviors and generate interesting racing situations.
• [math.OC]Novel model-based heuristics for energy optimal motion planning of an autonomous vehicle using A*
Zlatan Ajanovic, Michael Stolz, Martin Horn
http://arxiv.org/abs/1712.03719v1
Predictive motion planning is key to achieving energy-efficient driving, which is one of the main benefits of automated driving. Researchers have studied the planning of velocity trajectories, a simpler form of motion planning, for over a decade, and many different methods are available. Dynamic programming has been the most common choice due to its numerical background and its ability to include nonlinear constraints and models. Although it plans the optimal trajectory in a systematic way, dynamic programming does not use any knowledge about the problem at hand to guide the exploration, and therefore explores all possible trajectories. A* is an algorithm that uses knowledge about the problem to guide the exploration to the most promising solutions first. This knowledge has to be represented in the form of a heuristic function, which gives an optimistic estimate of the cost of transitioning between two states, and constructing such a function is not a straightforward task. This paper presents novel heuristics incorporating air drag, auxiliary power, and the operational costs of the vehicle, in addition to the kinetic and potential energy and rolling resistance known from the literature. Furthermore, the optimal cruising velocity, which depends on the vehicle's aerodynamic properties and auxiliary power, is derived. Results are compared for different variants of the heuristic functions as well as for dynamic programming.
• [math.PR]Statistical manifolds from optimal transport
Ting-Kam Leonard Wong
http://arxiv.org/abs/1712.03610v1
Divergences, also known as contrast functions, are distance-like quantities defined on manifolds of non-negative or probability measures and they arise in various theoretical and applied problems. Using ideas in optimal transport, we introduce and study a parameterized family of $L^{(\pm \alpha)}$-divergences which includes the Bregman divergence corresponding to the Euclidean quadratic cost, and the $L$-divergence introduced by Pal and Wong in connection with portfolio theory and a logarithmic cost function. Using this unified framework which elucidates the arguments in our previous work, we prove that these divergences induce geometric structures that are dually projectively flat with constant curvatures, and the generalized Pythagorean theorem holds true. Conversely, we show that if a statistical manifold is dually projectively flat with constant curvature $\pm \alpha$ with $\alpha > 0$, then it is locally induced by an $L^{(\mp \alpha)}$-divergence. We define in this context a canonical divergence which extends the one for dually flat manifolds. Finally, we study generalizations of the exponential family and show that the $L^{(\pm \alpha)}$-divergence of the corresponding potential functions gives the Rényi divergence.
• [math.ST]Asymptotically optimal empirical Bayes inference in a piecewise constant sequence model
Ryan Martin, Weining Shen
http://arxiv.org/abs/1712.03848v1
Inference on high-dimensional parameters in structured linear models is an important statistical problem. In this paper, for the piecewise constant Gaussian sequence model, we develop a new empirical Bayes solution that enjoys adaptive minimax posterior concentration rates and, thanks to the conjugate form of the empirical prior, relatively simple posterior computations.
• [math.ST]False Discovery Control for Pairwise Comparisons - An Asymptotic Solution to Williams, Jones and Tukey's Conjecture
Weidong Liu, Dennis Leung, Qiman Shao
http://arxiv.org/abs/1712.03305v1
Under weak moment and asymptotic conditions, we offer an affirmative answer to whether the BH procedure (Benjamini and Hochberg, 1995) can control the false discovery rate in testing pairwise comparisons of means under a one-way ANOVA layout. Specifically, despite the fact that the two sample t-statistics do not exhibit positive regression dependency (Benjamini and Yekutieli, 2001), our result shows that the BH procedure can asymptotically control the directional false discovery rate as conjectured by Williams, Jones, and Tukey (1999). Such a result is useful for most general situations when the number of variables is moderately large and/or when idealistic assumptions such as normality and a balanced design are violated.
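The step-up procedure whose FDR control is analysed here is simple to state in code. The sketch below is the generic BH rule applied to a vector of p-values, not the paper's directional variant or its asymptotic argument:

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up BH procedure: find the largest rank k with
    p_(k) <= k * alpha / m, and reject the k smallest p-values.
    Returns the (sorted) indices of the rejected hypotheses."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank
    return sorted(order[:k_max])
```

Note the step-up character: a p-value above its own threshold can still be rejected if some larger p-value meets its threshold, which is exactly what makes the dependency structure of the test statistics relevant to the proof.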
• [math.ST]Finite sample Bernstein - von Mises theorems for functionals and spectral projectors of covariance matrix
Igor Silin
http://arxiv.org/abs/1712.03522v1
We demonstrate that the influence of the prior on the posterior distribution of the covariance matrix vanishes as the sample size grows. The assumptions on the prior are explicit and mild. The results are valid for a finite sample and admit the dimension $p$ growing with the sample size $n$. We exploit this fact to derive the finite sample Bernstein - von Mises theorem for functionals of the covariance matrix (e.g. eigenvalues) and to find the posterior distribution of the Frobenius distance between the spectral projector and the empirical spectral projector. This can be useful for constructing sharp confidence sets for the true value of the functional or for the true spectral projector.
• [math.ST]Posterior distribution existence and error control in Banach spaces
J. Andrés Christen
http://arxiv.org/abs/1712.03299v1
We generalize the results of \cite{Christen2017} on expected Bayes factors (BF) to control the numerical error in the posterior distribution to an infinite dimensional setting when considering Banach functional spaces. The main result is a bound on the absolute global error to be tolerated by the Forward Map numerical solver, to keep the BF of the numerical vs. the theoretical model near to 1, now in this more general setting, possibly including a truncated, finite dimensional approximate prior measure. In so doing we found a far more general setting to define and prove existence of the infinite dimensional posterior distribution than that depicted in, for example, \cite{Stuart2010}. Discretization consistency and rates of convergence are also investigated in this general setting for the Bayesian inverse problem.
• [math.ST]Stochastic Restricted Biased Estimators in misspecified regression model with incomplete prior information
Manickavasagar Kayanan, Pushpakanthie Wijekoon
http://arxiv.org/abs/1712.03358v1
In this article, the analysis of misspecification was extended to the recently introduced stochastic restricted biased estimators when multicollinearity exists among the explanatory variables. The Stochastic Restricted Ridge Estimator (SRRE), Stochastic Restricted Almost Unbiased Ridge Estimator (SRAURE), Stochastic Restricted Liu Estimator (SRLE), Stochastic Restricted Almost Unbiased Liu Estimator (SRAULE), Stochastic Restricted Principal Component Regression Estimator (SRPCR), Stochastic Restricted r-k class estimator (SRrk) and Stochastic Restricted r-d class estimator (SRrd) were examined in the misspecified regression model due to missing relevant explanatory variables when incomplete prior information of the regression coefficients is available. Further, the superiority conditions between estimators and their respective predictors were obtained in the mean square error matrix sense (MSEM). Finally, a numerical example and a Monte Carlo simulation study were done to illustrate the theoretical findings.
• [math.ST]Testing homogeneity of proportions from sparse binomial data with a large number of groups
Junyong Park
http://arxiv.org/abs/1712.03317v1
In this paper, we consider testing the homogeneity of proportions in independent binomial distributions, especially when data are sparse and the number of groups is large. We provide broad aspects of our proposed tests, including theoretical studies, simulations and a real data application. We present the asymptotic null distributions and asymptotic powers of our proposed tests and compare their performance with existing tests. Our simulation studies show that no test dominates the others; however, our proposed test and a few others are expected to control given sizes and obtain significant power. We also present a real example regarding safety concerns associated with Avandia (rosiglitazone) in Nissen and Wolski (2007).
• [physics.chem-ph]Reinforced dynamics of large atomic and molecular systems
Linfeng Zhang, Han Wang, Weinan E
http://arxiv.org/abs/1712.03461v1
A new approach for efficiently exploring the configuration space and computing the free energy of large atomic and molecular systems is proposed, motivated by an analogy with reinforcement learning. There are two major components in this new approach. Like metadynamics, it allows for an efficient exploration of the configuration space by adding an adaptively computed biasing potential to the original dynamics. Like deep reinforcement learning, this biasing potential is trained on the fly using deep neural networks, with data collected judiciously from the exploration. Applications to the full-atom, explicit solvent models of alanine dipeptide and tripeptide show the promise of this new approach.
• [physics.comp-ph]DeePMD-kit: A deep learning package for many-body potential energy representation and molecular dynamics
Han Wang, Linfeng Zhang, Jiequn Han, Weinan E
http://arxiv.org/abs/1712.03641v1
Recent developments in many-body potential energy representation via deep learning have brought new hopes to addressing the accuracy-versus-efficiency dilemma in molecular simulations. Here we describe DeePMD-kit, a package written in Python/C++ that has been designed to minimize the effort required to build deep learning based representations of potential energy and force field and to perform molecular dynamics. Potential applications of DeePMD-kit span from finite molecules to extended systems and from metallic systems to chemically bonded systems. DeePMD-kit is interfaced with TensorFlow, one of the most popular deep learning frameworks, making the training process highly automatic and efficient. On the other end, DeePMD-kit is interfaced with high-performance classical molecular dynamics and quantum (path-integral) molecular dynamics packages, i.e., LAMMPS and i-PI, respectively. Thus, upon training, the potential energy and force field models can be used to perform efficient molecular simulations for different purposes. As an example of the many potential applications of the package, we use DeePMD-kit to learn the interatomic potential energy and forces of a water model using data obtained from density functional theory. We demonstrate that the resulting molecular dynamics model accurately reproduces the structural information contained in the original model.
• [physics.soc-ph]Crowdsourcing accurately and robustly predicts Supreme Court decisions
Daniel Martin Katz, Michael James Bommarito II, Josh Blackman
http://arxiv.org/abs/1712.03846v1
Scholars have increasingly investigated "crowdsourcing" as an alternative to expert-based judgment or purely data-driven approaches to predicting the future. Under certain conditions, scholars have found that crowdsourcing can outperform these other approaches. However, despite interest in the topic and a series of successful use cases, relatively few studies have applied empirical model thinking to evaluate the accuracy and robustness of crowdsourcing in real-world contexts. In this paper, we offer three novel contributions. First, we explore a dataset of over 600,000 predictions from over 7,000 participants in a multi-year tournament to predict the decisions of the Supreme Court of the United States. Second, we develop a comprehensive crowd construction framework that allows for the formal description and application of crowdsourcing to real-world data. Third, we apply this framework to our data to construct more than 275,000 crowd models. We find that in out-of-sample historical simulations, crowdsourcing robustly outperforms the commonly-accepted null model, yielding the highest-known performance for this context at 80.8% case level accuracy. To our knowledge, this dataset and analysis represent one of the largest explorations of recurring human prediction to date, and our results provide additional empirical support for the use of crowdsourcing as a prediction method.
• [q-bio.QM]Variational auto-encoding of protein sequences
Sam Sinai, Eric Kelsic, George M. Church, Martin A. Nowak
http://arxiv.org/abs/1712.03346v1
Proteins are responsible for the most diverse set of functions in biology. The ability to extract information from protein sequences and to predict the effects of mutations is extremely valuable in many domains of biology and medicine. However, the mapping between protein sequence and function is complex and poorly understood. Here we present an embedding of natural protein sequences using a Variational Auto-Encoder and use it to predict how mutations affect protein function. We use this unsupervised approach to cluster natural variants and learn interactions between sets of positions within a protein. This approach generally performs better than baseline methods that consider no interactions within sequences, and in some cases better than state-of-the-art approaches that use the inverse-Potts model. This generative model can be used to computationally guide exploration of protein sequence space and to better inform rational and automatic protein design.
• [quant-ph]A Characterization of Antidegradable Qubit Channels
Connor Paddock, Jianxin Chen
http://arxiv.org/abs/1712.03399v1
This paper provides a characterization for the set of antidegradable qubit channels. The characterization arises from the correspondence between the antidegradability of a channel and the symmetric extendibility of its Choi operator. Using an inequality derived to describe the set of bipartite qubit states which admit symmetric extension, we are able to characterize the set of all antidegradable qubit channels. Using the characterization we investigate the antidegradability of unital qubit channels and arbitrary qubit channels with respect to the dimension of the environment. We additionally provide a condition which describes qubit channels which are simultaneously degradable and antidegradable along with a classification of self-complementary qubit channels.
• [stat.AP]A practical guide and software for analysing pairwise comparison experiments
Maria Perez-Ortiz, Rafal K. Mantiuk
http://arxiv.org/abs/1712.03686v1
Most popular strategies to capture subjective judgments from humans involve the construction of a unidimensional relative measurement scale, representing order preferences or judgments about a set of objects or conditions. This information is generally captured by means of direct scoring, either in the form of a Likert or cardinal scale, or by comparative judgments in pairs or sets. In this sense, the use of pairwise comparisons is becoming increasingly popular because of the simplicity of this experimental procedure. However, this strategy requires non-trivial data analysis to aggregate the comparison ranks into a quality scale and analyse the results, in order to take full advantage of the collected data. This paper explains the process of translating pairwise comparison data into a measurement scale, discusses the benefits and limitations of such scaling methods and introduces publicly available Matlab software. We improve on existing scaling methods by introducing outlier analysis, providing methods for computing confidence intervals and statistical testing and introducing a prior, which reduces estimation error when the number of observers is low. Most of our examples focus on image quality assessment.
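One standard way to aggregate pairwise comparisons into a unidimensional scale is Bradley-Terry maximum likelihood. The sketch below uses the classic MM (Zermelo) iteration as a generic illustration of such scaling; it is not the authors' Matlab toolbox, which adds outlier analysis, confidence intervals and a prior on top:

```python
def bradley_terry(wins, n_iter=100):
    """MM (Zermelo) iterations for Bradley-Terry worth scores.
    wins[i][j] = number of times condition i was preferred over j."""
    n = len(wins)
    p = [1.0] * n
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            w_i = sum(wins[i])                      # total wins of i
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new_p.append(w_i / denom if denom > 0 else p[i])
        total = sum(new_p)
        p = [x * n / total for x in new_p]          # fix the overall scale
    return p
```

The scores are only identified up to a common scale, which is why each iteration renormalizes; log-scores then give the kind of interval scale the paper discusses.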
• [stat.AP]Examining the Effects of Objective Hurricane Risks and Community Resilience on Risk Perceptions of Hurricanes at the County Level in the U.S. Gulf Coast: An Innovative Approach
Wanyun Shao, Maaz Gardezi, Siyuan Xian
http://arxiv.org/abs/1712.03636v1
Community risk perceptions can influence residents' ability to cope with coastal hazards such as hurricanes and coastal flooding. Our study presents an initial effort to examine the relationship between community resilience and risk perception at the county level, through innovative construction of aggregate variables. Utilizing the 2012 Gulf Coast Climate Change Survey merged with historical hurricane data and community resilience indicators, we first apply a spatial statistical model to construct a county-level risk perception indicator based on survey responses. Next, we employ regression to reveal the relationship between contextual hurricane risk factors and community resilience, on one hand, and county-level perceptions of hurricane risks, on the other. Results of this study are directly applicable in the policy-making domain, as many hazard mitigation plans and adaptation policies are designed and implemented at the county level. Specifically, two major findings stand out. First, the contextual hurricane risks, represented by the peak height of storm surge associated with the last hurricane landfall and the land area exposed to historical storm surge flooding, positively affect county-level risk perceptions. This indicates that hurricane threats other than wind risks need to be clearly communicated to the public and fully incorporated into hazard mitigation plans and adaptation policies. Second, two components of community resilience, higher levels of economic resilience and community capital, are found to lead to heightened perceptions of hurricane risks, which suggests that concerted efforts are needed to raise awareness of hurricane risks in counties with lower economic resilience and community capital.
• [stat.AP]Reliability-centered maintenance: analyzing failure in harvest sugarcane machine using some generalizations of the Weibull distribution
Pedro Luiz Ramos, Diego Nascimento, Camila Cocolo, Márcio José Nicola, Carlos Alonso, Luiz Gustavo Ribeiro, André Ennes, Francisco Louzada
http://arxiv.org/abs/1712.03304v1
In this study we considered five generalizations of the standard Weibull distribution to describe the lifetime of two important components of harvest sugarcane machines. The harvesters considered in the analysis harvest an average of 20 tons of sugarcane per hour, and their malfunction may lead to major losses; an effective maintenance approach is therefore of central interest for cost savings. For the considered distributions, the mathematical background is presented. Maximum likelihood is used for parameter estimation. Further, different discrimination procedures were used to obtain the best fit for each component. Finally, we propose a maintenance schedule for the components of the harvesters using predictive analysis.
• [stat.CO]Fast nonparametric near-maximum likelihood estimation of a mixing density
Minwoo Chae, Ryan Martin, Stephen G. Walker
http://arxiv.org/abs/1712.03852v1
Mixture models are regularly used in density estimation applications, but the problem of estimating the mixing distribution remains a challenge. Nonparametric maximum likelihood produces estimates of the mixing distribution that are discrete, and these may be hard to interpret when the true mixing distribution is believed to have a smooth density. In this paper, we investigate an algorithm that produces a sequence of smooth estimates and that has been conjectured to converge to the nonparametric maximum likelihood estimator. Here we give a rigorous proof of this conjecture and propose a new data-driven stopping rule that produces smooth near-maximum likelihood estimates of the mixing density; simulations demonstrate the strong empirical performance of this estimator.
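The discrete character of the nonparametric MLE mentioned above is easy to see empirically with a plain EM iteration over a fixed grid of support points, a common computational surrogate for the NPMLE (the smoothed algorithm analysed in the paper is different). A minimal sketch for a Gaussian location mixture:

```python
import math

def npmle_em(data, grid, n_iter=200, sd=1.0):
    """EM updates for mixing weights over a fixed grid of support points.
    The normalising constant of the Gaussian kernel cancels in the
    responsibilities, so it is omitted. The fitted weight vector is
    typically discrete: mass piles up on a few grid points."""
    m = len(grid)
    w = [1.0 / m] * m
    def phi(x, mu):
        return math.exp(-0.5 * ((x - mu) / sd) ** 2)
    for _ in range(n_iter):
        new_w = [0.0] * m
        for x in data:
            dens = [w[k] * phi(x, grid[k]) for k in range(m)]
            s = sum(dens)
            for k in range(m):
                new_w[k] += dens[k] / s      # responsibility of point k for x
        w = [v / len(data) for v in new_w]   # M-step: average responsibilities
    return w
```

Running this on data clustered at two locations leaves nearly all weight on the two nearest grid points, which is exactly the discreteness that motivates the smooth estimates studied in the paper.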
• [stat.ME]Analysis-of-marginal-Tail-Means - a new method for robust parameter optimization
Simon Mak, C. F. Jeff Wu
http://arxiv.org/abs/1712.03589v1
This paper presents a novel method, called Analysis-of-marginal-Tail-Means (ATM), for parameter optimization over a large, discrete design space. The key advantage of ATM is that it offers effective and robust optimization performance for both smooth and rugged response surfaces, using only a small number of function evaluations. This method can therefore tackle a wide range of engineering problems, particularly in applications where the performance metric to optimize is "black-box" and expensive to evaluate. The ATM framework unifies two parameter optimization methods in the literature: the Analysis-of-marginal-Means (AM) approach (Taguchi, 1986), and the Pick-the-Winner (PW) approach (Wu et al., 1990). In this paper, we show that by providing a continuum between AM and PW via the novel idea of marginal tail means, the proposed method offers a balance between three fundamental trade-offs. By adaptively tuning these trade-offs, ATM can then provide excellent optimization performance over a broad class of response surfaces using limited data. We illustrate the effectiveness of ATM using several numerical examples, and demonstrate how such a method can be used to solve two real-world engineering design problems.
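The central statistic can be illustrated for a single factor. The tail fraction `gamma` and the toy data below are illustrative choices based on the abstract, not the paper's adaptive tuning: `gamma = 1` recovers AM-style marginal means, while a small `gamma` approaches pick-the-winner:

```python
def marginal_tail_mean(levels, responses, gamma=0.25):
    """For one factor: average the top gamma-fraction of responses
    observed at each level (maximization convention).
    Returns {level: tail mean}."""
    by_level = {}
    for lev, y in zip(levels, responses):
        by_level.setdefault(lev, []).append(y)
    tail_means = {}
    for lev, ys in by_level.items():
        ys = sorted(ys, reverse=True)        # best responses first
        k = max(1, round(gamma * len(ys)))   # size of the upper tail
        tail_means[lev] = sum(ys[:k]) / k
    return tail_means
```

The best level is then the argmax of the returned dictionary; sweeping `gamma` between 0 and 1 traces out the AM-to-PW continuum the paper describes.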
• [stat.ME]Comparative analysis of criteria for filtering time series of word usage frequencies
Inna A. Belashova, Vladimir V. Bochkarev
http://arxiv.org/abs/1712.03512v1
This paper describes a method of nonlinear wavelet thresholding of time series. The Ramachandran-Ranganathan runs test is used to assess the quality of approximation. To minimize the objective function, it is proposed to use genetic algorithms - one of the stochastic optimization methods. The suggested method is tested both on the model series and on the word frequency series using the Google Books Ngram data. It is shown that method of filtering which uses the runs criterion shows significantly better results compared with the standard wavelet thresholding. The method can be used when quality of filtering is of primary importance but not the speed of calculations.
• [stat.ME]Comparing Graph Spectra of Adjacency and Laplacian Matrices
J. F. Lutzeyer, A. T. Walden
http://arxiv.org/abs/1712.03769v1
Typically, graph structures are represented by one of three different matrices: the adjacency matrix, the unnormalised and the normalised graph Laplacian matrices. The spectral (eigenvalue) properties of these different matrices are compared. For each pair, the comparison is made by applying an affine transformation to one of them, which enables comparison whilst preserving certain key properties such as normalised eigengaps. Bounds are given on the eigenvalue differences thus found, which depend on the minimum and maximum degree of the graph. The monotonicity of the bounds and the structure of the graphs are related. The bounds on a real social network graph, and on three model graphs, are illustrated and analysed. The methodology is extended to provide bounds on normalised eigengap differences which again turn out to be in terms of the graph's degree extremes. It is found that if the degree extreme difference is large, different choices of representation matrix may give rise to disparate inference drawn from graph signal processing algorithms; smaller degree extreme differences result in consistent inference, whatever the choice of representation matrix. The different inference drawn from signal processing algorithms is visualised using the spectral clustering algorithm on the three representation matrices corresponding to a model graph and a real social network graph.
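For intuition on the affine-transformation device: when the graph is d-regular, all three matrices share eigenvectors and the spectra are exact affine images of one another (the regular case is the degenerate extreme of the degree-extreme bounds above). A toy check, assuming NumPy is available:

```python
import numpy as np

def cycle_adjacency(n):
    """Adjacency matrix of the n-cycle, a 2-regular graph."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
    return A

n = 8
A = cycle_adjacency(n)
D = np.diag(A.sum(axis=1))
L = D - A                      # unnormalised graph Laplacian
eig_A = np.sort(np.linalg.eigvalsh(A))
eig_L = np.sort(np.linalg.eigvalsh(L))
# For a d-regular graph, L = dI - A, so eig(L) = d - eig(A) exactly:
print(np.allclose(eig_L, np.sort(2 - eig_A)))   # prints True
```

When minimum and maximum degree differ, no single affine map lines the spectra up exactly, and the gap between the best such map and the truth is what the paper's bounds quantify.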
• [stat.ME]Dynamic Mixed Frequency Synthesis for Economic Nowcasting
Kenichiro McAlinn
http://arxiv.org/abs/1712.03646v1
We develop a novel Bayesian framework for dynamic modeling of mixed frequency data to nowcast quarterly U.S. GDP growth. The introduced framework utilizes foundational Bayesian theory and treats data sampled at different frequencies as latent factors that are later synthesized, allowing flexible methodological specifications based on interests and utility. Time-varying inter-dependencies between the mixed frequency data are learnt and effectively mapped onto easily interpretable parameters. A macroeconomic study of nowcasting quarterly U.S. GDP growth using a number of monthly economic variables demonstrates improvements in terms of nowcast performance and interpretability compared to the standard in the literature. The study further shows that incorporating information during a quarter markedly improves the performance in terms of both point and density nowcasts.
• [stat.ME]Ensembles of Regularized Linear Models
Anthony Christidis, Laks V. S. Lakshmanan, Ezequiel Smucler, Ruben Zamar
http://arxiv.org/abs/1712.03561v1
We propose an approach for building ensembles of regularized linear models by optimizing a novel objective function, that encourages sparsity within each model and diversity among them. Our procedure works on top of a given penalized linear regression estimator (e.g., Lasso, Elastic Net, SCAD) by fitting it to possibly overlapping subsets of features, while at the same time encouraging diversity among the subsets, to reduce the correlation between the predictions that result from each fitted model. The predictions from the models are then aggregated. For the case of an Elastic Net penalty and orthogonal predictors, we give a closed form solution for the regression coefficients in each of the ensembled models. An extensive simulation study and real-data applications show that the proposed method systematically improves the prediction accuracy of the base linear estimators being ensembled. Extensions to GLMs and other models are discussed.
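The split-and-aggregate structure can be sketched as follows. The paper's diversity-encouraging objective is replaced here by hand-chosen overlapping feature subsets, and the base estimator by a plain ridge penalty, so this is a structural illustration only:

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression coefficients."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def ensemble_predict(X_train, y, X_test, subsets, lam=1.0):
    """Fit one regularized linear model per feature subset and
    average the resulting predictions."""
    preds = []
    for S in subsets:
        beta = fit_ridge(X_train[:, S], y, lam)
        preds.append(X_test[:, S] @ beta)
    return np.mean(preds, axis=0)
```

Averaging lowers variance most when the subset models make weakly correlated errors, which is precisely what the paper's objective encourages instead of leaving the subsets to manual choice.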
• [stat.ME]Exceedance as a measure of sparsity
Peter McCullagh, Nicholas Polson
http://arxiv.org/abs/1712.03889v1
Sparsity is defined as a limiting property of a sequence of probability distributions. It is characterized by a rate parameter and an exceedance measure, which may be finite or infinite. Many sparse integrals, including the signal-plus-noise convolution, are shown to depend on the signal distribution only through its rate parameter and exceedance measure. For statistical purposes, two sparse families having the same, or proportional, exceedance measures are equivalent to first order. Relative to the standard Gaussian distribution, the signal-plus-noise convolution is subject to tail inflation, the nature and extent of which is also determined by the exceedance measure and the sparsity rate. The relationship between the tail-inflation factor and the exceedance measure is given explicitly for the inverse-power measures by the convolution-mixture theorem, which expresses the signal-noise convolution as a specific two-component mixture.
• [stat.ME]Maximum entropy low-rank matrix recovery
Simon Mak, Yao Xie
http://arxiv.org/abs/1712.03310v1
We propose a novel, information-theoretic method, called MaxEnt, for efficient data requisition for low-rank matrix recovery. This proposed method has important applications to a wide range of problems in image processing, text document indexing and system identification, and is particularly effective when the desired matrix $\mathbf{X}$ is high-dimensional, and measurements from $\mathbf{X}$ are expensive to obtain. Fundamental to this design approach is the so-called maximum entropy principle, which states that the measurement masks which maximize the entropy of observations also maximize the information gain on the unknown matrix $\mathbf{X}$. Coupled with a low-rank stochastic model for $\mathbf{X}$, such a principle (i) reveals several insightful connections between information-theoretic sampling, compressive sensing and coding theory, and (ii) yields efficient algorithms for constructing initial and adaptive masks for recovering $\mathbf{X}$, which significantly outperform random measurements. We illustrate the effectiveness of MaxEnt using several simulation experiments, and demonstrate its usefulness in two real-world applications on image recovery and text document indexing.
• [stat.ML]Capsule Network Performance on Complex Data
Edgar Xi, Selina Bing, Yang Jin
http://arxiv.org/abs/1712.03480v1
In recent years, convolutional neural networks (CNNs) have played an important role in the field of deep learning. Variants of CNNs have proven to be very successful in classification tasks across different domains. However, there are two big drawbacks to CNNs: their failure to take into account important spatial hierarchies between features, and their lack of rotational invariance. As long as certain key features of an object are present in the test data, CNNs classify the test data as the object, disregarding the features' relative spatial orientation to each other. This causes false positives. The lack of rotational invariance causes the network to incorrectly assign the object another label, causing false negatives. To address this concern, Hinton et al. propose a novel type of neural network using the concept of capsules in a recent paper. With the use of dynamic routing and reconstruction regularization, the capsule network model would be both rotation invariant and spatially aware. The capsule network has shown its potential by achieving a state-of-the-art result of 0.25% test error on MNIST without data augmentation such as rotation and scaling, better than the previous baseline of 0.39%. To further test the application of capsule networks to data with higher dimensionality, we attempt to find the best set of configurations that yield the optimal test error on the CIFAR10 dataset.
• [stat.ML]Causal Inference for Observational Time-Series with Encoder-Decoder Networks
Jason Poulos
http://arxiv.org/abs/1712.03553v1
This paper proposes a method for estimating the causal effect of a discrete intervention in observational time-series data using encoder-decoder recurrent neural networks (RNNs). Encoder-decoder networks, a special class of RNNs suited to handling variable-length sequential data, are used to predict a counterfactual time-series of treated unit outcomes. The proposed method does not rely on pretreatment covariates, and encoder-decoder networks are capable of learning nonconvex combinations of control unit outcomes to construct a counterfactual. To demonstrate the proposed method, I extend a field experiment studying the effect of radio advertisements on electoral competition to observational time-series.
• [stat.ML]Elastic-net regularized High-dimensional Negative Binomial Regression: Consistency and Weak Signals Detection
Huiming Zhang, Jinzhu Jia
http://arxiv.org/abs/1712.03412v1
We study the sparse high-dimensional negative binomial regression problem for count data by showing non-asymptotic merits of the Elastic-net regularized estimator. Using the KKT conditions, we derive two types of non-asymptotic oracle inequalities for the Elastic-net estimates of negative binomial regression, utilizing the compatibility factor and the Stabil condition, respectively. Based on the proposed oracle inequalities, we first show the sign consistency property of the Elastic-net estimators, provided that the non-zero components of the sparse true vector are larger than a proper choice of the weakest-signal detection threshold. Second, we give an oracle inequality bounding the grouping effect with high probability. Third, under some assumptions on the design matrix, we can recover the true variable set with high probability if the weakest-signal detection threshold is larger than three times the value of the tuning parameter. Finally, we briefly discuss the de-biased Elastic-net estimator.
• [stat.ML]Fast Low-Rank Matrix Estimation without the Condition Number
Mohammadreza Soltani, Chinmay Hegde
http://arxiv.org/abs/1712.03281v1
In this paper, we study the general problem of optimizing a convex function $F(L)$ over the set of $p \times p$ matrices, subject to rank constraints on $L$. However, existing first-order methods for solving such problems either are too slow to converge, or require multiple invocations of singular value decompositions. On the other hand, factorization-based non-convex algorithms, while being much faster, require stringent assumptions on the \emph{condition number} of the optimum. In this paper, we provide a novel algorithmic framework that achieves the best of both worlds: asymptotically as fast as factorization methods, while requiring no dependency on the condition number. We instantiate our general framework for three important matrix estimation problems that impact several practical applications: (i) a \emph{nonlinear} variant of affine rank minimization, (ii) logistic PCA, and (iii) precision matrix estimation in probabilistic graphical model learning. We then derive explicit bounds on the sample complexity as well as the running time of our approach, and show that it achieves the best possible bounds for both cases. We also provide an extensive range of experimental results, and demonstrate that our algorithm provides a very attractive tradeoff between estimation accuracy and running time.
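The factorization-based baseline that this framework improves on is easy to sketch: write the rank-r variable as $L = UV^T$ and run plain gradient descent on the factors. The step size and iteration count below are arbitrary illustrative choices; in practice they are exactly the quantities whose dependence on the condition number the paper seeks to remove:

```python
import numpy as np

def factored_descent(M, r, step=0.05, n_iter=5000, seed=0):
    """Minimise ||U V^T - M||_F^2 over rank-r factors by plain
    gradient descent from a small random initialization."""
    rng = np.random.default_rng(seed)
    p, q = M.shape
    U = 0.1 * rng.normal(size=(p, r))
    V = 0.1 * rng.normal(size=(q, r))
    for _ in range(n_iter):
        R = U @ V.T - M                               # residual
        U, V = U - step * (R @ V), V - step * (R.T @ U)
    return U @ V.T
```

Note the design trade-off the paper targets: this avoids SVDs entirely, but its convergence slows down as the ratio of the largest to smallest retained singular value of $M$ grows.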
• [stat.ML]Identifiability of Kronecker-structured Dictionaries for Tensor Data
Zahra Shakeri, Anand D. Sarwate, Waheed U. Bajwa
http://arxiv.org/abs/1712.03471v1
This paper derives sufficient conditions for reliable recovery of coordinate dictionaries comprising a Kronecker-structured dictionary that is used for representing $K$th-order tensor data. Tensor observations are generated by a Kronecker-structured dictionary and sparse coefficient tensors that follow the separable sparsity model. This work provides sufficient conditions on the underlying coordinate dictionaries, coefficient and noise distributions, and number of samples that guarantee recovery of the individual coordinate dictionaries up to a specified error with high probability. In particular, the sample complexity to recover $K$ coordinate dictionaries with dimensions $m_k\times p_k$ up to estimation error $r_k$ is shown to be $\max_{k \in [K]}\mathcal{O}(m_k p_k^3 r_k^{-2})$.
• [stat.ML]On Quadratic Penalties in Elastic Weight Consolidation
Ferenc Huszár
http://arxiv.org/abs/1712.03847v1
Elastic weight consolidation (EWC; Kirkpatrick et al., 2017) is a novel algorithm designed to safeguard against catastrophic forgetting in neural networks. EWC can be seen as an approximation to Laplace propagation (Eskin et al., 2004), and this view is consistent with the motivation given by Kirkpatrick et al. (2017). In this note, I present an extended derivation that covers the case of more than two tasks. I show that the quadratic penalties in EWC are inconsistent with this derivation and might lead to double-counting data from earlier tasks.
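The quadratic penalty in question can be sketched schematically (function and variable names are mine, not from the note): each previous task $t$ contributes a diagonal-Fisher quadratic term anchored at that task's parameters.

```python
import numpy as np

def ewc_penalty(theta, anchors, fishers, lam=1.0):
    """EWC-style regularizer: (lam/2) * sum_t sum_i F_{t,i} (theta_i - theta*_{t,i})^2.

    anchors[t] holds the parameters found after task t and fishers[t] the
    corresponding diagonal Fisher estimates. Keeping one separate quadratic
    term per past task is the scheme the note examines: a single Laplace
    approximation would instead maintain one Gaussian whose precision
    accumulates the Fisher terms, rather than re-penalizing (and hence
    double-counting) earlier tasks' data.
    """
    return 0.5 * lam * sum(
        float(np.sum(F * (theta - a) ** 2)) for a, F in zip(anchors, fishers)
    )

# Two previous tasks with identical anchors and Fishers simply double the pull.
theta = np.array([1.0, 1.0])
anchor, fisher = np.zeros(2), np.ones(2)
print(ewc_penalty(theta, [anchor], [fisher]))                   # 1.0
print(ewc_penalty(theta, [anchor, anchor], [fisher, fisher]))   # 2.0
```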
• [stat.ML]Sensitivity Analysis for Predictive Uncertainty in Bayesian Neural Networks
Stefan Depeweg, José Miguel Hernández-Lobato, Steffen Udluft, Thomas Runkler
http://arxiv.org/abs/1712.03605v1
We derive a novel sensitivity analysis of input variables for predictive epistemic and aleatoric uncertainty. We use Bayesian neural networks with latent variables as a model class and illustrate the usefulness of our sensitivity analysis on real-world datasets. Our method increases the interpretability of complex black-box probabilistic models.
• [stat.ML]The PhaseLift for Non-quadratic Gaussian Measurements
Christos Thrampoulidis, Ankit Singh Rawat
http://arxiv.org/abs/1712.03638v1
We study the problem of recovering a structured signal $\mathbf{x}_0$ from high-dimensional measurements of the form $y=f(\mathbf{a}^T\mathbf{x}_0)$ for some nonlinear function $f$. When the measurement vector $\mathbf{a}$ is iid Gaussian, Brillinger observed in his 1982 paper that $\mu_\ell\cdot\mathbf{x}_0 = \arg\min_{\mathbf{x}}\mathbb{E}(y - \mathbf{a}^T\mathbf{x})^2$, where $\mu_\ell=\mathbb{E}_{\gamma}[\gamma f(\gamma)]$ with $\gamma$ being a standard Gaussian random variable. Based on this simple observation, he showed that, in the classical statistical setting, the least-squares method is consistent. More recently, Plan & Vershynin extended this result to the high-dimensional setting and derived error bounds for the generalized Lasso. Unfortunately, both least-squares and the Lasso fail to recover $\mathbf{x}_0$ when $\mu_\ell=0$; for example, this is the case for all even link functions. We resolve this issue by proposing and analyzing an appropriate generic semidefinite-optimization based method. In a nutshell, our idea is to treat such link functions as if they were linear in a lifted space of higher dimension. An appealing feature of our error analysis is that it captures the effect of the nonlinearity in a few simple summary parameters, which can be particularly useful in system design.
• [stat.ML]Variational Inference over Non-differentiable Cardiac Simulators using Bayesian Optimization
Adam McCarthy, Blanca Rodriguez, Ana Minchole
http://arxiv.org/abs/1712.03353v1
Performing inference over simulators is generally intractable: their long runtimes mean the marginal likelihood cannot be computed. We develop a likelihood-free inference method to infer parameters for a cardiac simulator, which replicates electrical flow through the heart to the body surface. We improve the fit of a state-of-the-art simulator to an electrocardiogram (ECG) recorded from a real patient.