# COMBINING MACHINE LEARNING AND MATHEMATICAL OPTIMIZATION – Part 4

*Article four of five*

In our first article we identified four scenarios where Machine Learning can cooperate with Mathematical Optimization. In this article, we will consider Scenario D which is defined as follows:

*ML can be used to help MOPT solvers to perform better not only for finding solutions faster but for finding more good solutions or identifying structure in MOPT problems*

In a paper by Elias B. Khalil, Pierre Le Bodic, Le Song, George Nemhauser, Bistra Dilkina entitled ‘Learning to Branch in Mixed Integer Programming’ an interesting methodology is developed. The authors introduce a framework for learning to branch in MIP. They use IBM ILOG CPLEX for developing their machine learning framework.

The methodology they propose is observing and recording the rankings induced by *Strong Branching* (SB). They develop a function of the variables’ features that will rank them, without the need for the expensive SB computations.

The interesting aspect of the algorithm is that satisfies three desirable properties:

*Node-efficiency*: the learned ranking model uses SB scores and node-specific variable features as training data, yielding smaller search trees.*Time-efficiency*: SB is used in the first phase only, typically for a few hundred nodes. The time required for learning (second phase) is small and the third phase is dominated by feature computations, which are designed to be much cheaper than solving the LPs as does SB.*Adaptiveness*: the learned ranking model is instance specific, as it assigns different weights to features depending on the collected data.

The algorithm can be described thus – given a MIP instance and some parameters, the proceed in three phases:

*Collect data*: for a limited number of nodes, SB is used as a branching strategy. The collect the data and create a training dataset.*Learning*: the dataset is used as input into a learning-to-rank algorithm that outputs a vector of weights for the features, such that some loss function is minimized over the training dataset.*ML-based branching*: SB is no longer used, and the learned weight vector is used to score variables, branching on the one with maximum score until termination.

This Machine Learning framework is brilliant at improving the performance of the branch and bound algorithm. It can be used in ranking and selecting parameters in other areas of the B&B tree such as heuristics, cutting plans and node selection.