In this post, I want to discuss an observation about the root node when solving a MIP model.
Problem Description
We have a large matrix \(\color{darkblue}A=\color{darkblue}a_{i,j}\) with values 0 and 1. In addition, we have minimum values for the row and column totals, called \(\color{darkblue}r_i\) and \(\color{darkblue}c_j\). The goal is to remove as many 1's as possible from the matrix \(\color{darkblue}A\) while the remaining 1's still meet these minimum row and column totals.
Mathematical Model
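Let the binary variable \(\color{darkred}x_{i,j}\) be 1 if we keep the 1 at position \((i,j)\) and 0 if we remove it. Variables exist only for cells with \(\color{darkblue}a_{i,j}=1\). Removing as many 1's as possible is the same as keeping as few as possible, so we can write: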
Mathematical Model |
---|
\[ \begin{align} \min& \sum_{i,j|\color{darkblue}a_{i,j}=1} \color{darkred}x_{i,j} \\ & \sum_{j|\color{darkblue}a_{i,j}=1} \color{darkred}x_{i,j} \ge \color{darkblue}r_i && \forall i \\ & \sum_{i|\color{darkblue}a_{i,j}=1} \color{darkred}x_{i,j} \ge \color{darkblue}c_j && \forall j \\ & \color{darkred}x_{i,j} \in \{0,1\} \end{align}\] |
If you prefer, we can also write this as:
Mathematical Model |
---|
\[ \begin{align} \min& \sum_{i,j} \color{darkblue}a_{i,j} \cdot \color{darkred}x_{i,j} \\ & \sum_j \color{darkblue}a_{i,j}\cdot \color{darkred}x_{i,j} \ge \color{darkblue}r_i && \forall i \\ & \sum_i \color{darkblue}a_{i,j}\cdot \color{darkred}x_{i,j} \ge \color{darkblue}c_j && \forall j \\ & \color{darkred}x_{i,j} \in \{0,1\} \end{align}\] |
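To make the notation concrete, here is a tiny instance worked by hand. The \(2 \times 3\) matrix is made up for illustration and is not taken from [1]; the bounds follow the rule used in the appendix (half the row or column sum, rounded up).

\[ \color{darkblue}A = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}, \qquad \color{darkblue}r = (1,1), \qquad \color{darkblue}c = (1,1,1) \]

There are four 1's, so the model has four variables:

\[ \begin{align} \min\> & \color{darkred}x_{1,1}+\color{darkred}x_{1,2}+\color{darkred}x_{2,1}+\color{darkred}x_{2,3} \\ & \color{darkred}x_{1,1}+\color{darkred}x_{1,2} \ge 1 && \text{(row 1)} \\ & \color{darkred}x_{2,1}+\color{darkred}x_{2,3} \ge 1 && \text{(row 2)} \\ & \color{darkred}x_{1,1}+\color{darkred}x_{2,1} \ge 1 && \text{(column 1)} \\ & \color{darkred}x_{1,2} \ge 1 && \text{(column 2)} \\ & \color{darkred}x_{2,3} \ge 1 && \text{(column 3)} \\ & \color{darkred}x_{i,j} \in \{0,1\} \end{align}\]

Columns 2 and 3 force \(\color{darkred}x_{1,2}=\color{darkred}x_{2,3}=1\), which already satisfies both row constraints. Column 1 needs one of \(\color{darkred}x_{1,1}\) or \(\color{darkred}x_{2,1}\). The optimal objective is therefore 3: exactly one of the four 1's can be removed.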
The post in [1] mentions that the goal is to solve this model with a matrix size of 20,000 rows and 500 columns. I filled the matrix \(\color{darkblue}A\) with 10% ones, so the number of (discrete) variables in the model is close to a million: 999,469 (we could have predicted roughly this from \(20,000 \times 500 \times 0.1 = 1,000,000\)). The number of constraints is 20,500: one for each of the 20,000 rows and one for each of the 500 columns.
Observation
- All matrix coefficients are \(-1\) or \(+1\) (check),
- the right-hand-side should be integer valued (check),
- there are two nonzero coefficients in each column of the constraint matrix (check),
- and these two nonzero coefficients have the value \(-1\) and \(+1\).
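The four conditions above are those of a pure network LP, and the last one is the only one without a check mark: in the model as written, the two nonzero coefficients of \(\color{darkred}x_{i,j}\) (one in row constraint \(i\), one in column constraint \(j\)) are both \(+1\). A standard repair, sketched here as an assumption about the intended argument rather than something spelled out above, is to multiply the column constraints by \(-1\):

\[ \begin{align} & \sum_{j|\color{darkblue}a_{i,j}=1} \color{darkred}x_{i,j} \ge \color{darkblue}r_i && \forall i \\ & -\sum_{i|\color{darkblue}a_{i,j}=1} \color{darkred}x_{i,j} \le -\color{darkblue}c_j && \forall j \end{align}\]

Each variable now has coefficient \(+1\) in row constraint \(i\) and \(-1\) in column constraint \(j\), so it behaves like an arc from node \(i\) to node \(j\) in a bipartite graph. Such a node-arc incidence matrix is totally unimodular, so with integer right-hand sides the vertices of the LP relaxation are integer (the bounds \(0 \le \color{darkred}x_{i,j} \le 1\) do not change this). This is consistent with the results for the LP and network solves, which reach the same objective value as the MIP.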
---- 88 PARAMETER results

| | MIP | LP | NETWORK |
|---|---|---|---|
| status | Optimal | Optimal | Optimal |
| obj | 504711.000 | 504711.000 | 504711.000 |
| time | 5141.515 | 14.437 | 4.000 |
| iter | 571595.000 | 38276.000 | |
| nodes | | NA | NA |
Iteration log . . .
Iteration:     1   Dual objective     =          1079.000000
Perturbation started.
Iteration:   606   Dual objective     =        499871.000000
Iteration:  2167   Dual objective     =        499871.008110
Iteration:  3647   Dual objective     =        499871.016553
. . .
Iteration: 33697   Dual objective     =        504711.126635
Iteration: 33962   Dual objective     =        504711.126636
Removing perturbation.

Primal simplex solved model.

Root relaxation solution time = 16.92 sec. (4357.36 ticks)

        Nodes                                         Cuts/
   Node  Left     Objective  IInf  Best Integer    Best Bound    ItCnt     Gap

*     0+    0                       999469.0000        0.0000           100.00%
Found incumbent of value 999469.000000 after 20.62 sec. (6918.29 ticks)
*     0+    0                       522140.0000        0.0000           100.00%
Found incumbent of value 522140.000000 after 5140.89 sec. (6925.92 ticks)
*     0     0      integral     0   504711.0000   504711.0000   571595    0.00%
Elapsed time = 5140.97 sec. (7024.08 ticks, tree = 0.00 MB)
Found incumbent of value 504711.000000 after 5140.97 sec. (7024.08 ticks)

Root node processing (before b&c):
  Real time             = 5140.98 sec. (7037.52 ticks)
Parallel b&c, 16 threads:
  Real time             =    0.00 sec. (0.00 ticks)
  Sync time (average)   =    0.00 sec.
  Wait time (average)   =    0.00 sec.
                          ------------
Total (root+branch&cut) = 5140.98 sec. (7037.52 ticks)
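A little arithmetic on the time stamps in this log shows where the time goes. This only reads off the numbers printed by the solver; it says nothing about what the solver is doing in that interval.

\[ \begin{align} \text{wall-clock time between the first two incumbents} &= 5140.89 - 20.62 = 5120.27 \text{ sec} \\ \text{deterministic ticks over the same interval} &= 6925.92 - 6918.29 = 7.63 \text{ ticks} \\ \text{share of the total run time} &= \frac{5120.27}{5140.98} \approx 99.6\% \end{align}\]

In other words, the root relaxation itself is done after 16.92 seconds, and almost all of the remaining wall-clock time passes in the root node between two log lines while the tick counter barely moves.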
Conclusion
Update: some other solvers do not seem to have this extreme behavior. They solve the MIP and LP models in about the same time. It looks like the poor performance on the MIP model is a particularity of Cplex.
References
1. Efficiently remove a maximum amount of binary elements while keeping row and column sums above a certain level, https://stackoverflow.com/questions/76105697/efficiently-remove-a-maximum-amount-of-binary-elements-while-keeping-row-and-col
Appendix: GAMS model
$onText
A very large MIP. It takes some time to solve. However the LP is actually automatically integer valued. This means the relaxation is no good? Finally we also solve as a network model.
Reference: https://stackoverflow.com/questions/76105697/efficiently-remove-a-maximum-amount-of-binary-elements-while-keeping-row-and-col
$offText
*-----------------------------------------------------------------------------------
* Data
*-----------------------------------------------------------------------------------

set
  i /i1*i20000/
  j /j1*j500/
;

set a(i,j) 'randomly filled with 10% ones';
a(i,j)$(uniform(0,1)<0.1) = YES;

* calculate rows and column sums
parameter rowsum(i), colsum(j);
rowsum(i) = sum(a(i,j),1);
colsum(j) = sum(a(i,j),1);

parameter r(i),c(j);
r(i) = ceil(rowsum(i)/2);
c(j) = ceil(colsum(j)/2);

*-----------------------------------------------------------------------------------
* MIP Model
*-----------------------------------------------------------------------------------

binary variable x(i,j) 'only used when a(i,j)=1';
variable z 'objective';

equations
  obj
  erowsum(i) 'minimum number of ones in row'
  ecolsum(j) 'minimum number of ones in column'
;

obj.. z =e= sum(a,x(a));
erowsum(i).. sum(a(i,j),x(i,j)) =g= r(i);
ecolsum(j).. sum(a(i,j),x(i,j)) =g= c(j);

model m /all/;

*-----------------------------------------------------------------------------------
* Reporting macro
*-----------------------------------------------------------------------------------

acronym Optimal;

parameter results;
$macro report(m,label) \
   results('status',label) = m.modelstat; \
   results('status',label)$(m.modelstat=1) = Optimal; \
   results('obj',label) = z.l; \
   results('frac',label) = sum((i,j)$(x.l(i,j)>0.001 and x.l(i,j)<0.999),1); \
   results('time',label) = m.resusd; \
   results('iter',label) = m.iterusd; \
   results('nodes',label) = m.nodusd;

*-----------------------------------------------------------------------------------
* solve as MIP, LP and Network model
*-----------------------------------------------------------------------------------

m.solprint = %solprint.Silent%;
option threads = 0, bratio = 1;

solve m minimizing z using mip;
report(m,'MIP')

solve m minimizing z using rmip;
report(m,'LP')

m.optfile=1;
solve m minimizing z using rmip;
report(m,'NETWORK')

display results;

*-----------------------------------------------------------------------------------
* network option file, for use with third solve
*-----------------------------------------------------------------------------------

$onecho > cplex.opt
lpmethod 3
$offecho
Appendix: HiGHS open source solver
C:\tmp\HiGHS\build\RELEASE\bin>highs --parallel on --solver ipm \tmp\test\Free.mps