Author: MasterXiong | Published 2019-05-07 19:36

Machine Learning for Combinatorial Optimization

1  Introduction

1.1  Background

Operations research -> Integer constrained optimization -> Combinatorial (NP-hard) problems

Machine learning -> Deep learning

How to formulate CO as an ML problem: view generic optimization problems as data points and ask what the relevant distribution of problems is to learn from for a given task.

1.2  Motivation

(a) The researcher assumes expert knowledge about the optimization algorithm, but wants to replace some heavy computations with a fast ML approximation, similar to value function approximation in RL.

(b) Explore the space of decisions or strategies in CO, as expert knowledge may not be sufficient and some algorithmic decisions may be unsatisfactory. 

1.3  Key Point

What makes CO with ML different from other ML tasks? What is the key point to develop a good algorithm for CO with ML? 

The answer may be to understand and exploit the structure of CO problems, which acts as a relevant prior for the model, just as we use CNNs for images and RNNs for sequential data.

CO with ML also faces the issue of generalization, an important topic for all ML models. Different CO problems have different structures and use different data for learning, which gives the learner an opportunity to specialize to sub-problems by fully exploiting their structure. It is also worth noting that traditional CO algorithms might not even work consistently across all possible instances of a problem family, but rather tend to be adapted to particular problem structures. Consequently, might this also open an opportunity to develop meta-learning methods for CO with ML?

2  Preliminaries

2.1  Combinatorial Optimization

Three key factors in a CO problem: Variables: continuous / integer; Constraints: feasible region; Objective function. 

Linear Programming (LP): objective function and constraints are all linear; solvable in polynomial time (e.g. with interior point methods; the simplex algorithm also works well in practice despite its exponential worst-case complexity).

Mixed-integer linear programming (MILP): LP with integer variables (with nonconvex feasible region); NP-hard problem. 
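For reference, a minimal statement of the canonical MILP form these notes refer to, in standard notation (not copied verbatim from the paper):

\min_{x} \; c^{\top} x \quad \text{s.t.} \quad A x \le b, \quad x \ge 0, \quad x_j \in \mathbb{Z} \;\; \forall j \in \mathcal{I},

where \mathcal{I} indexes the integer-constrained variables; dropping the integrality constraints gives the LP relaxation that B&B uses for bounding.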

2.2  Machine Learning

Supervised Learning: optimization + generalization

Reinforcement Learning: exploration - exploitation dilemma; definition of reward signal

2.3  Deep learning

Size-invariant techniques: RNN; attention mechanism (especially useful for graph models)
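A minimal sketch of why attention is size-invariant: the same computation scores any number of input elements and returns a fixed-size summary. The dimensions and variable names below are illustrative, not taken from any of the cited models.

```python
import numpy as np

def attention(query, keys, values):
    """Scaled dot-product attention over a variable-size set of elements.

    query:  (d,)   -- e.g. a decoder / context state
    keys:   (n, d) -- one row per input element; n can differ per instance
    values: (n, d)
    Returns a single (d,) vector, whatever n is.
    """
    d = query.shape[0]
    scores = keys @ query / np.sqrt(d)        # (n,) one score per element
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ values                   # (d,) weighted combination

# The same function (and, in a real model, the same learned projections)
# handles instances of any size, e.g. 5 or 50 cities.
ctx = np.random.randn(8)
for n in (5, 50):
    elems = np.random.randn(n, 8)
    print(attention(ctx, elems, elems).shape)  # (8,) both times
```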

3  Recent Approaches

3.1  Learning Methods

Learning through Demonstration: a supervised approach to approximate a given policy, where training pairs of input states and target actions are provided by the expert. Examples: cutting plane selection in SDP, branching (variable selection) and node selection in B&B.
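A minimal supervised (imitation) sketch, assuming expert-labelled (state, action) pairs are available. The linear softmax policy and the feature dimensions are placeholders, not the models used in the works cited above.

```python
import numpy as np

def train_imitation(states, expert_actions, n_actions, lr=0.1, epochs=200):
    """Fit a linear softmax policy to imitate expert decisions.

    states:         (N, d) state features (e.g. describing a B&B node)
    expert_actions: (N,)   integer action the expert chose in each state
    """
    N, d = states.shape
    W = np.zeros((d, n_actions))
    for _ in range(epochs):
        logits = states @ W                            # (N, n_actions)
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        probs[np.arange(N), expert_actions] -= 1.0     # cross-entropy gradient w.r.t. logits
        W -= lr * states.T @ probs / N                 # gradient step on W
    return W

def imitation_policy(W, state):
    """Greedy policy: pick the action the expert would most likely have picked."""
    return int(np.argmax(state @ W))
```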

Learning through Experience: learn to discover new, potentially better performing policies by trial and error. Examples: TSP with GNN under a greedy heuristic framework. 
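By contrast, a minimal policy-gradient (REINFORCE) sketch of the experience setting, for a policy that constructs a solution step by step; the linear policy and the scalar reward (e.g. negative tour length) are illustrative assumptions.

```python
import numpy as np

def reinforce_update(W, states, actions, reward, lr=0.01):
    """One REINFORCE step: push up the log-probability of the sampled actions
    in proportion to the reward of the finished solution.

    W:       (d, n_actions) linear policy parameters
    states:  (T, d) states visited while constructing one solution
    actions: (T,)   actions sampled at each construction step
    reward:  scalar return, e.g. the negative length of the completed tour
    """
    for s, a in zip(states, actions):
        logits = s @ W
        logits -= logits.max()
        probs = np.exp(logits)
        probs /= probs.sum()
        grad_logp = -np.outer(s, probs)   # d log pi(a|s) / dW, all columns
        grad_logp[:, a] += s              # plus the term for the chosen action
        W += lr * reward * grad_logp      # ascend the policy gradient
    return W
```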

The same problem can be viewed and solved with either of these two methods, depending on your prior assumptions and algorithmic framework.

3.2  Algorithmic Structure

How the learned policies (whether from demonstration or experience) are combined with traditional CO algorithms.

End-to-end learning: train the ML model to output solutions directly from the input instance. Examples:

Pointer Network (supervised learning with RNN), Bello (RL with RNN), Welling (RL with GNN and attention, stronger prior); 

Another approach: represent the solution (a permutation) as a doubly stochastic matrix

Earlier work: Hopfield neural networks for TSP

Learning meaningful properties of optimization problems: ML provides additional information to the CO algorithm. This works like an algorithm selection method, where ML acts as a meta-controller deciding whether or not to apply certain CO methods to the problem. The underlying assumption is that the relevant properties of a CO problem can be captured by an ML model.
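A schematic sketch of this algorithm-selection view: a learned model maps instance features to a choice among existing CO methods. The feature names, the linear scorer and the two candidate methods are purely illustrative.

```python
import numpy as np

def instance_features(instance):
    """Hypothetical summary of a problem instance, e.g. size and constraint density."""
    return np.array([instance["n_vars"], instance["n_cons"], instance["density"]])

def select_method(instance, W):
    """Learned algorithm selection: score the instance, then pick a CO method.

    W is a linear scorer trained offline on instances where we know which
    method performed better; the two method names are placeholders.
    """
    score = instance_features(instance) @ W
    return "exact_solver" if score > 0 else "greedy_heuristic"

# Illustrative use: small, dense instances are routed to the exact solver.
W = np.array([-0.01, -0.01, 2.0])
print(select_method({"n_vars": 50, "n_cons": 40, "density": 0.8}, W))
```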

ML alongside optimization algorithms: A master algorithm controls the high-level structure while frequently calling an ML model to assist in lower-level decisions. The same ML model is used by the CO algorithm to make the same type of decision at each iteration, with respect to the current state of the problem. It works like a meta-learning approach to replace some human-designed components in a CO algorithm. Examples (a skeleton sketch follows the examples below):

ML learns the branching decisions at every node under a B&B framework

ML learns the greedy heuristic for node selection in TSP
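A schematic skeleton of the first example: a B&B loop that delegates the branching decision at every node to a learned policy. The helper functions (solve_lp_relaxation, is_integral, node_features, branch) are placeholders for the surrounding solver machinery, so this is an outline rather than a runnable solver.

```python
import numpy as np

def branch_and_bound(root, learned_branching_policy):
    """B&B where the branching variable at every node is chosen by a learned
    policy instead of a hand-designed rule; everything else is standard."""
    best_value, best_solution = np.inf, None
    open_nodes = [root]
    while open_nodes:
        node = open_nodes.pop()                       # node selection (could also be learned)
        value, solution = solve_lp_relaxation(node)   # bounding step
        if value >= best_value:
            continue                                  # prune by bound
        if is_integral(solution):
            best_value, best_solution = value, solution
            continue                                  # new incumbent found
        # The learned component: the same model is queried at every node.
        var = learned_branching_policy(node_features(node, solution))
        open_nodes.extend(branch(node, solution, var))  # create the two child nodes
    return best_value, best_solution
```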

4  Methodology

Demonstration vs experience: In the demonstration setting, the performance of the learned policy is bounded by the expert, which is a limitation when the expert is not optimal. This means that imitation alone should be used only if the learned policy is significantly faster to compute than the expert. On the contrary, learning from experience can potentially discover better policies. Learning from a reward signal (experience) is also more flexible when multiple decisions are (almost) equally good, whereas an expert would favor one (arbitrary) decision. However, RL is more difficult to train and can easily get stuck. Starting with demonstration and refining with experience is a good combination.

Partial observability: does your representation of the current state satisfy the Markov property? For example, a single frame of a game is not sufficient to represent the current state.

Evaluation: how to evaluate an algorithm in the CO setting, and on which problems (instances) the algorithms are evaluated.

The learning process encourages the policy to improve the performance of the optimization algorithm it is embedded in, rather than to solve the whole CO problem directly.

Can an algorithm learned on specific training instances generalize within a distribution of instances? A multi-task learning approach may be considered. An important question here is how to define a distribution of CO problems, perhaps with respect to problem structure and size.
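One concrete (and purely illustrative) way to make "a distribution of instances" operational is an instance generator parameterized by structure and size; random Euclidean TSP below is just one common choice.

```python
import numpy as np

def sample_tsp_instance(n_cities, rng):
    """Draw one instance from a simple distribution over Euclidean TSP problems:
    n_cities points uniform in the unit square, returned as a distance matrix."""
    coords = rng.random((n_cities, 2))
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)              # (n, n) symmetric distances

# A "distribution of problems": the structure (uniform, Euclidean) and the
# size range (20-50 cities) are the parameters we choose to fix.
rng = np.random.default_rng(0)
train_instances = [sample_tsp_instance(rng.integers(20, 51), rng) for _ in range(100)]
```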

Exactness and approximation: getting the output of an ML model to respect advanced types of constraints is a hard task.

5  Challenges

5.1  Feasibility

How to guarantee that the output of an ML model satisfies the problem constraints?
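For simple constraints, a common trick is to mask infeasible actions before the policy samples; below is a minimal sketch for TSP-style decoding (already visited cities cannot be chosen again). More complex constraints generally cannot be handled this way, which is exactly the challenge here.

```python
import numpy as np

def masked_softmax(logits, feasible):
    """Turn raw model scores into a distribution over feasible actions only.

    logits:   (n,) scores from the ML model
    feasible: (n,) boolean, True where the action keeps the solution feasible
    """
    masked = np.where(feasible, logits, -np.inf)
    masked = masked - masked.max()
    probs = np.exp(masked)
    return probs / probs.sum()

# E.g. in TSP decoding: cities already visited are infeasible next steps.
logits = np.array([2.0, 0.5, 1.0, -1.0])
visited = np.array([True, False, False, False])
print(masked_softmax(logits, ~visited))   # probability 0 on city 0
```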

5.2  Modelling

It is crucial to find the proper prior to model CO problems. The architectures used to learn good policies in CO might be very different from what is currently used in deep learning. It is even possible that basic components like SGD may need to be modified to work well on CO problems. GNNs are a promising architecture for CO problems.
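A minimal message-passing sketch of the permutation-invariant prior a GNN provides; the single linear layer and mean aggregation are illustrative simplifications of real architectures.

```python
import numpy as np

def message_passing_layer(node_feats, adjacency, W_self, W_neigh):
    """One GNN layer: each node updates from its own features and the mean of
    its neighbours' features; the output does not depend on node ordering.

    node_feats: (n, d) features, e.g. per-variable or per-city features
    adjacency:  (n, n) 0/1 matrix of the problem graph
    """
    degree = adjacency.sum(axis=1, keepdims=True).clip(min=1)
    neigh_mean = adjacency @ node_feats / degree
    return np.maximum(node_feats @ W_self + neigh_mean @ W_neigh, 0.0)  # ReLU

# The same weights apply to graphs of any size.
rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
for n in (6, 60):
    A = (rng.random((n, n)) < 0.3).astype(float)
    X = rng.normal(size=(n, 4))
    print(message_passing_layer(X, A, W1, W2).shape)   # (n, 4)
```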

5.3  Scaling

Scaling to larger problems may lead to a problem of generalization in CO, as reported throughout the existing literature.

5.4  Data Generation

To define a problem to solve and collect data for training, we first need to define which family of instances we want to generalize to. Another key issue is data representation. Deciding how to represent the data can have a dramatic impact on learning, such as how to properly represent a B&B node, or even the whole B&B tree.
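As one illustration of the representation question, a (hypothetical) flat feature vector for a B&B node might look like the sketch below; whether such hand-picked features capture enough of the node, let alone the whole tree, is exactly the open issue.

```python
import numpy as np

def node_representation(node):
    """Hypothetical fixed-size description of a B&B node; every feature listed
    here is an illustrative choice, not a prescription from the paper."""
    return np.array([
        node["depth"],                 # depth of the node in the B&B tree
        node["lp_objective"],          # value of the node's LP relaxation
        node["gap_to_incumbent"],      # bound gap w.r.t. the best known solution
        node["frac_fractional_vars"],  # fraction of integer variables still fractional
    ])
```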

6  Conclusions

I think the first key issue in ML for CO is how to define and formulate the problem, such as how to define the problem instance distribution and how to represent the data and the state. The second key issue is how to analyze and exploit the structure of different CO problems and choose a proper architecture to model it. These factors may be much more important than employing advanced RL algorithms to train the model.

Learning a policy that generalizes to unseen problems is a challenge; this is why we believe learning should occur on a distribution small enough that the policy can fully exploit the structure of the problem and give better results.
