Method for finding non-dominated options in multi-dimensional data.
import numpy as np

def pareto_frontier_multi(myArray):
    # Sort on first dimension
    myArray = myArray[myArray[:, 0].argsort()]
    # Add first row to pareto_frontier
    pareto_frontier = myArray[0:1, :]
    # Test next row against the last row in pareto_frontier
    for row in myArray[1:, :]:
        if sum([row[x] >= pareto_frontier[-1][x]
                for x in range(len(row))]) == len(row):
            # If it is better on all features add the row to pareto_frontier
            pareto_frontier = np.concatenate((pareto_frontier, [row]))
    return pareto_frontier

def test():
    myArray = np.array([[1, 1, 1], [2, 2, 2], [4, 4, 4], [3, 3, 3]])
    print(pareto_frontier_multi(myArray))

test()
This requires that your Pareto frontier is looking for maximum values on each dimension (i.e. bigger is better), so it is important that you pass a numpy array whose features have been recoded to suit this requirement.
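As a minimal sketch of that recoding step (the data and the choice of which column is a cost are illustrative assumptions, not part of the recipe): negating a column you want to minimize makes larger values better on that dimension too.

```python
import numpy as np

# Hypothetical data: column 0 is a score (maximize), column 1 is a cost (minimize)
data = np.array([[1.0, 10.0],
                 [2.0,  5.0]])

# Recode the cost column by negation so that "bigger is better" holds everywhere
recoded = data.copy()
recoded[:, 1] = -recoded[:, 1]

print(recoded)
```

After this transformation the array can be passed to pareto_frontier_multi unchanged; remember to negate the column again if you need the original costs back.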
If you have a two dimensional data set, see my other Pareto recipe which allows you to choose whether you want to find maximum or minimum values for each dimension.
More info on Pareto frontiers is available at oCo Carbon.
The algorithm above does not work for me; it has a running time of O(N), while the fastest known running time is O(N log d). Simple Cull runs in O(N^2).
This one works for partial orderings, given that you can define the dominates method for checking if one point dominates another (based on the Simple Cull algorithm described in this paper). Example usage:
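The commenter's actual example is not shown here. As a hedged sketch of the idea, a simple quadratic cull built on a user-supplied dominates predicate might look like the following (function names and the "bigger is better" convention are illustrative assumptions, not the commenter's code):

```python
def dominates(p, q):
    # p dominates q if p is at least as good on every dimension
    # and strictly better on at least one (bigger is better here)
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def simple_cull(points, dominates):
    # O(N^2) cull: keep only points not dominated by any other point
    frontier = []
    for candidate in points:
        if any(dominates(other, candidate)
               for other in points if other is not candidate):
            continue
        frontier.append(candidate)
    return frontier

print(simple_cull([(1, 1, 1), (2, 2, 2), (4, 4, 4), (3, 3, 3)], dominates))
```

Because dominates is only a partial order, mutually incomparable points such as (1, 2) and (2, 1) both survive the cull, which is exactly the Pareto-frontier behavior the comment describes.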