Welcome, guest | Sign In | My Account | Store | Cart

Method for finding non-dominated options in multi-dimensional data.

Python, 20 lines
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 import numpy as np def pareto_frontier_multi(myArray): # Sort on first dimension myArray = myArray[myArray[:,0].argsort()] # Add first row to pareto_frontier pareto_frontier = myArray[0:1,:] # Test next row against the last row in pareto_frontier for row in myArray[1:,:]: if sum([row[x] >= pareto_frontier[-1][x] for x in range(len(row))]) == len(row): # If it is better on all features add the row to pareto_frontier pareto_frontier = np.concatenate((pareto_frontier, [row])) return pareto_frontier def test() myArray = np.array([[1,1,1],[2,2,2],[4,4,4],[3,3,3]]) print pareto_frontier_multi(myArray) test()

This requires that your Pareto frontier is looking for maximum values on each dimension (e.g. bigger is better) and so it is important that you pass a numpy array where your features have been recoded to suit this requirement.

If you have a two dimensional data set, see my other Pareto recipe which allows you to choose whether you want to find maximum or minimum values for each dimension.

1 comment JvP 7 years, 5 months ago

The algorithm above does not work for me; it has a running time of O(N), while the fastest known running time is O(N log d). Simple Cull runs in O(N^2).

This one works for partial orderings, given that you can define the dominates method for checking if one point dominates another (based on the Simple Cull algorithm described in this paper).

def simple_cull(inputPoints, dominates):
paretoPoints = set()
candidateRowNr = 0
dominatedPoints = set()
while True:
candidateRow = inputPoints[candidateRowNr]
inputPoints.remove(candidateRow)
rowNr = 0
nonDominated = True
while len(inputPoints) != 0 and rowNr < len(inputPoints):
row = inputPoints[rowNr]
if dominates(candidateRow, row):
# If it is worse on all features remove the row from the array
inputPoints.remove(row)
elif dominates(row, candidateRow):
nonDominated = False
rowNr += 1
else:
rowNr += 1

if nonDominated:
# add the non-dominated point to the Pareto frontier

if len(inputPoints) == 0:
break
return paretoPoints, dominatedPoints

Example usage:

def dominates(row, candidateRow):
return sum([row[x] >= candidateRow[x] for x in range(len(row))]) == len(row)

inputPoints = [[1,1,1], [1,2,3], [3,2,1], [4,1,1]]
paretoPoints, dominatedPoints = simple_cull(inputPoints, dominates)

print "*"*8 + " non-dominated answers " + ("*"*8)
for p in paretoPoints:
print p
print "*"*8 + " dominated answers " + ("*"*8)
for p in dominatedPoints:
print p Created by Jamie Bull on Sat, 13 Oct 2012 (MIT)