gurobi / gurobi-optimods Goto Github PK
View Code? Open in Web Editor NEWData-driven APIs for common optimization tasks
Home Page: https://gurobi-optimization-gurobi-optimods.readthedocs-hosted.com/en/stable
License: Apache License 2.0
Data-driven APIs for common optimization tasks
Home Page: https://gurobi-optimization-gurobi-optimods.readthedocs-hosted.com/en/stable
License: Apache License 2.0
Why this Mod?
What will the API be?
Input data for the basic TSP is just the complete cost/distance matrix (numpy.ndarray)
tour, objective = solve_tsp(distance_matrix)
Symmetric costs/distances allow a more efficient formulation (reducing symmetry) so a differentiation might make sense.
Variants with additional side constraints might require a time matrix for arcs, time windows for edges, other resource matrices for arcs, node pairs representing precedence relations, etc.
In those cases it makes sense to use a graph representation, e.g., via a pandas dataframe.
Additional context
Currently there are too many mods on the one page in api.rst
. This should have subsections or subpages, and be formatted to remove too much namespacing.
Guide to academic full license, maybe later freemium license too.
This should give a full example for workflow needed to take one nup, and add an individual complication to the underlying guroibpy model, and the nup data interface.
This is to reduce file clutter, e.g., 10 mods wanting to add a file 'data.csv'.
After we have a good collection of models, we should think about which dependencies are the default. Should the user have easy ways (via extras) to install specific dependencies for graphs, machine learning, OR models, without getting everything under the sun with the default install?
Why this Nup?
Maximum (weighted) matching. This is a distinctly different model (MIP) from the linear network flow of bipartite matching.
Does it fall under an existing category?
Yes, graphs
What will the API be?
scipy sparse graph input, scipy sparse output (max. matching as a subgraph)
Additional context
We want to have fit-for-purpose models where we assume:
Two use-cases to cover here:
solver_params
keyword argument so the user can override Gurobi parameterscreate_env
should take a parameter dictionary so the mod can override certain settings (in particular in relation to output)The priority should be:
silent
and logfile
Just to show how nupstup scales across models axis
The readme needs an overhaul, it is currently pitched at internal contributors. I will borrow the structure from gurobipy-pandas / gurobi-machinelearning
While we are in private mode, workflows do not run once we take PRs out of draft, so we do not see tests from new contributions. Unsure if this needs a github permissions change organization wide?
To read: https://docs.github.com/en/actions/managing-workflow-runs/approving-workflow-runs-from-private-forks
Why this Mod?
What will the API be?
node_data = pd.DataFrame([
{"node": 0, "cost": -3, "time": 4, "distance": 3},
{"node": 1, "cost": 5, "time": 3, "distance": 7},
{"node": 2, "cost": 6, "time": 1, "distance": 2},
{"node": 3, "cost": -4, "time": 5, "distance": 3}
])
arc_data = pd.DataFrame([
{"source": 0, "target": 1, "cost": 16, "distance": 1},
{"source": 0, "target": 2, "cost": -13, "distance": 3},
{"source": 1, "target": 2, "cost": 10, "distance": 2},
{"source": 2, "target": 1, "cost": -4, "distance": 6},
{"source": 1, "target": 3, "cost": 12, "distance": 3},
{"source": 3, "target": 2, "cost": -9, "distance": 2}
])
limits = {
"time": 10,
"distance": 18
}
path, objective = solve_espprc(node_data=node_data, arc_data=arc_data, source=0, target=3, limits=limits, elementary=True)
Additional context
Why this Nup?
Many practical aspects of asset allocation are of discrete nature: Minimum buy-in, maximum number of allowed trades per period, market segmentation, etc.
Does it fall under an existing category?
What will the API be?
asset_allocs = nupstup.portfolio_optimization(
factor_matrix=H # H.T@H is the variance-covariance matrix
covariance=Sigma # Sigma is the variance-covariance matrix
mu=mu # estimated first moments of return function
maxnum_transactions=20 # No more than 20 trades
maxnum_allocations=50 # No more than 50 allocations at a time (e.g., online problem)
min_invest_long=0.05 # For long allocations, need at least 5% of total investment
min_invest_short=0.01 # For short allocations, need at least 1% of total investment
max_total_short=0.1 # Maximum 10% short selling
[...]
Here the objective function would be min x.T @ H.T @ H @ x - mu @ x
Note that we focus here on the model
min x.T @ H.T @ H @ x - mu @ x
s.t. something
and another NUP should cover the model
max mu @ x
s.t. something
x.T @ H.T @ H @ x <= omega^2
Both Nups can share most code and documentation.
Other random thoughts:
After discussing with @simonbowly and @tobiasachterberg we figured that showing gurobipy code on the individual nup documentation pages doesn't help the interested much, and is confusing to people not intersted in it. We should remove these.
Instead each nup documentation should have an appropriately placed link "View code on github".
Why this Mod?
I have heard Network Design problems being discussed with a few users now so they seem to be receiving a bit of attention. Essentially you have nodes and possible arcs between the nodes as well as commodities that must flow from one specific node to another in certain quantities. There are costs to build the arcs as well as costs for commodities to flow along them. The task is to design the network to minimize the cost such that all the commodities can get where they need to be in the required quantities. This feels like a nice optimod as it can be expressed and visualised with graphs. Alternatively, it could naturally be described in terms of supply chains.
Does it fall under an existing category?
What will the API be?
The data that needs to be conveyed is:
The API could look something like
network, cost = gurobi_optimods.network_design(arc_labeled_network, commodities)
Where:
arc_labeled_network
is a scipy object representing the network with all possible arcs, and each arc would be labeled with the flow cost, fixed cost, and capacity.commodities
is just a list of tuples containing origin, destination, and flow demand.network
is just another scipy object representing an unlabelled network of nodes and arcs, and the cost is just a float representing the cost to build the network.Additional context
Started looking into Network Design problems as supposedly they lend themselves well to Benders Decomposition but have found that the default Gurobi behavior is very strong so think a basic MIP model will suffice for this.
While iterating with a few different mod proposals and PRs the number of merge conflicts there will be minimized if the list is ordered.
In #18 we removed implementation code from the docs. We should instead include a link to the source of each mod (on github) from the docs page so users can get a quick view of the implementation.
Meta issue to make sure we have consistent APIs between network-y mods #39 #36
Networkx is very clean for this, as we can have node and edge attributes and all the data for a mod is bundled into one python object. We probably want to avoid networkx/igraph/others as dependencies, but only if we can find just as clean an API for network models using the existing numpy/scipy/pandas dependencies.
For weighted or unweighted graphs we use scipy sparse arrays (e.g. mwis
and matching
):
# Graph adjacency matrix (upper triangular) as a sparse matrix.
g = nx.cubical_graph()
adjacency_matrix = sp.triu(nx.to_scipy_sparse_array(g))
# Vertex weights
weights = np.array([2**i for i in range(8)])
# Compute maximum weighted independent set.
mwis = maximum_weighted_independent_set(adjacency_matrix, weights)
however this doesn't quite suit networks which need multiple attributes per edge. We could pass multiple arrays, but this is unweildy as the user has to build two scipy objects with duplicated data for the edge pattern. The alignment between sparse indexes and positions in the demand array is not too nice either, as it's restricted to 0 ... n-1
node labels.
cost = ... # sparse array
capacity = ... # sparse array
demand = ... # dense array
min_cost_flow(cost, capacity, demand)
I also wrote a network flow function for use in bipartite matching, which uses dense arrays to pass all the edge source/target indices, costs & capacities. It's meant as a low-level utility for other mods to use though, it would be fast but not a nice API:
gurobi-optimods/src/gurobi_optimods/network_util.py
Lines 11 to 18 in ae7e1d8
So I think pandas is the better alternative here, with input data as below. It's easy to do a direct formulation of a flow problem using gurobipy-pandas as well, see networkflow.md
edge_data = pd.DataFrame([
{"source": 0, "target": 1, "capacity": 16, "cost": 0},
{"source": 0, "target": 2, "capacity": 13, "cost": 0},
{"source": 1, "target": 2, "capacity": 10, "cost": 0},
{"source": 2, "target": 1, "capacity": 4, "cost": 0},
{"source": 1, "target": 3, "capacity": 12, "cost": 0},
{"source": 3, "target": 2, "capacity": 9, "cost": 0},
{"source": 2, "target": 4, "capacity": 14, "cost": 0},
{"source": 4, "target": 3, "capacity": 7, "cost": 0},
{"source": 3, "target": 5, "capacity": 20, "cost": 0},
{"source": 4, "target": 5, "capacity": 4, "cost": 0},
])
node_data = pd.DataFrame([
{"node": 0, "demand": -23},
{"node": 5, "demand": 23}
])
@torressa @stevedwards what do you think (for the final version, ok to continue hacking with networkx for the time being)? It's also fine to use networkx in the docs examples, either to generate networks or produce figures. @maliheha did this for MWIS; her examples generate a networkx graph and convert to scipy using networksx/numpy functions. It's similarly straightforward for pandas, see to_pandas_edgelist which gives the same structure as edge_data
above from a graph with edge attributes.
Why this Nup?
Finance applications are probably plentiful enough to warrant their own nup category. If there is a dedicated Finance category, the maximal Sharpe ratio problem is popular and focused enough to be included.
Note that this is closely related to #9.
Does it fall under an existing category?
Other (Finance)
What will the API be?
asset_allocs = nupstup.max_sharpe_ratio(
cov_matrix=Q # variance-covariance matrix
mu=mu # estimated first moments of return function
rf_rate=rf # risk-free rate
)
where Q
and mu
are scipy/numpy data structures. We could alternatively require the factor matrix H
as an argument instead of the covariance matrix Q
(where Q = H.T @ H
), though I think Q
is more natural considering how the problem statement and model are written.
Additional context
One complication: to be solved as a QP, the model needs to be reformulated in a different variable space. Additional constraints must be added in the variable space of the reformulated model. The mapping between the two models is straightforward, but this does create a barrier for people interested in modifying the underlying model.
min |b - Ax|
s.t. |x|_0 <= k
At least some model where MIP shines.
There ought to be a way to run black automatically on all incoming PRs, I think it makes sense to do on the Python code parts. Whether or not it makes sense on runnable code snippets intended mostly for the documentation I don't know, it may cause issues. For example one may need to squash some stuff together so that it fits nicely on the doc page. We will have to experiment a bit.
Why this Mod?
There are two quite standard geometric problems which can be solved through quadratic optimization.
The first one is a Minimum Polytope Distance problem, where a user provides 2 sets of n-dimensional points and is interested in the minimal distance between the 2 polytopes defined by the 2 sets.
The second one is a Smallest Enclosing Ball problem, where a user provides a set of n-dimensional points and is interested in the smallest ball enclosing all of the provided points.
Does it fall under an existing category?
What will the API be?
distance, point1, point2 = gurobi_optimods.minpoldist(points1, points2)
where points1,points2
could be a pandas dataframe or alternatively a list.
center,radius = gurobi_optimods.minencball(points)
where points
could be a pandas dataframe or alternatively a list.
Additional context
We could add a function which plots the results nicely for 2D problems.
I am not sure how much those problems are used in real world applications. I can image that this could be used in clustering algorithms and statistics but I am not an expert in those fields.
Using sphinx' :param:/:type:/etc directives seem to be quite limited in terms of the formatting of the description, and the result isn't really pretty. See https://www.sphinx-doc.org/en/master/usage/extensions/napoleon.html for a discussion on how to brush this up.
The simplicity of a pure data I/O interface like a function, or a class/method call implies that no Env, Model state can be made persistent across multiple invokations. Applications that require such state, say, for dynamic processes, are not in the scope of the NupStup, and should target gurobipy instead. This should be mentioned in the contributing guideline.
Instead of read_feather for datasets, create a datasets submodule.
The returned object from each load_
function is an sklearn Bunch
(basically an attrdict).
>>> from sklearn.datasets import load_iris
>>> data = load_iris()
>>> data.target[[10, 25, 50]]
array([0, 0, 1])
>>> list(data.target_names)
['setosa', 'versicolor', 'virginica']
Could avoid the feather dependency this way if needed, use read_csv and hide the conversions in the datasets code.
We'll need to reference some academic literature for the mods. The simplest option looks like sphinxcontrib-bibtex.
Should live in the "regression problems" category
By now there is only one single place left in the documentation that includes each of the example codes. They should become self standing doc tests at exactly these places.
We could have a set of functions in the mod that compute...
Some of these have interesting weighted variants (stability number/MWIS)
The current categorization was done with the goal of sketching how alike mods may be aggregated/organized in the future. However many people interpreted the categories "real" and flamed us for the categories we have chosen. To avoid further inconvenience we should simply roll back to a flat list of mods and revisit categorization at some later point in time.
Need to decide what to point the formatters at (maybe just the nupstup code itself initially?). I'll copy the configuration over from gurobipy-pandas.
Entrypoint is one function
def diet(data1=..., data2=..):
return dataframe with solution
with data1, data2 being dataframes
Possible examples to follow:
Why this Nup?
Because of the quantum hype, there are so many solvers which solve QUBOs heuristically where the input is just a Q matrix. We can consider having this nup with a similar interface.
Does it fall under an existing category?
What will the API be?
qubo = nupstup.qubo(coeff_matrix=Q)
Where Q
is a scipy sparse graph structure. The output is an array of 0/1s.
Additional context
Downside: This can encourage comparison of Gurobi with quantum or quantum-inspired solvers for problems where we know QUBO is not necessarily the right form.
Why this Nup?
Flow problems are present in many applications.
Particularly, the following share the same formulation (with some small changes):
All single source/sink.
Given a digraph
Where
Problem | ||||
---|---|---|---|---|
1. | ||||
2. | ||||
3. | n/a |
|
Does it fall under an existing category?
Graphs
What will the API be?
nupstup.min_cost_flow(G, costs, capacities, demand, source, sink)
nupstup.max_flow(G, capacities, source, sink)
nupstup.min_cut(G, capacities, source, sink)
nupstup.shortest_path(G, costs, source, sink)
Additional context
Well-known graph problems, so graph theory terminology is fine.
There are many real-world applications and other graph problem transformations for these, so would be good to have some of these in there as well.
Problem 1 -> Minimum weight bipartite matching
Problem 3 -> Maximum cardinality bipartite matching, closure problem
If we go up another dimension and add
This is needed already for the cardinality-constrained regression doctests (quadratic model with >200 variables), and will also be needed for several OPF mod unittests.
To do:
verbose
and logfile
need to be documented in decorated mods (done via the decorator)make clean html SPHINXOPTS="-W --keep-going -n"
to checkSimilarly to the workforce data description.
Covers what people need to do in order to contribute a nup. This probably won't be a full blown doc chapter, but instead points to the resources we already have creaded (contributing guideline, template nup, etc.). We will have a better idea on how to structure this with our coming implementation kickoff.
Why this Mod?
To give a gentler alternative to the very complex ACOPF mods, a unit commitment mod would be a useful addition. However, the classic example of a few big coal plants serving some smoke-belching factories is not so nice since (a) it's a bit tired, (b) the explanation involves many dirty words, and (c) it's not so likely that a small country would build their power grid on top of the optimods.
It would be more interesting to demonstrate usability for microgrids or household scale power usage. So we could include:
Design requirements
Verify the mod will meet the following criteria:
What will the API be?
Pandas is the natural input format. It may make sense to build this mod as a class to separate static infrastructure from dynamic operation concerns to support scenario modeling, e.g.
grid = MicroGrid(loads, storage, generators)
grid.operation_plan(demand_profile, solar_profile, wind_profile)
We probably don't need network layout information for microgrids, and can focus on meeting demand over time.
Additional context
A quick search reveals plenty of recent literature we could reference:
Many papers refer also to ML for demand forecasting, so this is also a good opportunity to talk about the "what next" after you've applied all your fancy machine learning tricks.
Current example structure is docs/examples/<nup>/gurobipy.py
+ docs/examples/<nup>/nupstup.py
. We can import these files fine for the purposes of testing, but they can't be executed where they live because they partially override gurobipy
and nupstup
. Need to rename these across the board (maybe just some creative suffix like _impl
?).
We should require at least gurobipy v10. Reasoning:
Why this Nup?
The maximum weighted independent set (MWIS) problem is a famous graph theory problem. I got to appreciate this problem more where I worked on a molecular similarity application for designing new drugs. The heart of the application needed solving very big MWIS problems.
Does it fall under an existing category?
What will the API be?
mwis = nupstup.mwis(adjacency_matrix=A, weights=w)
Where A
is a scipy sparse graph structure and w
is a numpy array of weights. The output is an array of vertices in the independent set.
Additional context
This requires solving a MIP and the LP relaxation solution does not include an optimal integer solution.
We should create Env and Model objects only within their context managers.
Why this Mod?
We want to enable usage of Gurobi in the Power industry. Criteria:
What will the API be?
Custom format.
Additional context
Any other details that should be communicated in the Mod writeup to help users/readers understanding?
Add detailed information about the relaxations and terminology used for ACOPF.
@rluce @silkehorn this test run failed, does it just need a looser tolerance?
FAIL: test_risk_factors_1 (tests.test_portfolio.TestMVPFeatures)
https://github.com/Gurobi/gurobi-optimods/actions/runs/5017168216/jobs/8994948645
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.