Welcome, guest | Sign In | My Account | Store | Cart

Method for finding non-dominated options in multi-dimensional data.

Python, 20 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
import numpy as np

def pareto_frontier_multi(myArray):
    # Sort on first dimension
    myArray = myArray[myArray[:,0].argsort()]
    # Add first row to pareto_frontier
    pareto_frontier = myArray[0:1,:]
    # Test next row against the last row in pareto_frontier
    for row in myArray[1:,:]:
        if sum([row[x] >= pareto_frontier[-1][x]
                for x in range(len(row))]) == len(row):
            # If it is better on all features add the row to pareto_frontier
            pareto_frontier = np.concatenate((pareto_frontier, [row]))
    return pareto_frontier

def test()
    myArray = np.array([[1,1,1],[2,2,2],[4,4,4],[3,3,3]])
    print pareto_frontier_multi(myArray)

test()

This requires that your Pareto frontier is looking for maximum values on each dimension (e.g. bigger is better) and so it is important that you pass a numpy array where your features have been recoded to suit this requirement.

If you have a two dimensional data set, see my other Pareto recipe which allows you to choose whether you want to find maximum or minimum values for each dimension.

More info on Pareto frontiers is available at oCo Carbon.

1 comment

JvP 9 years, 7 months ago  # | flag

The algorithm above does not work for me; it has a running time of O(N), while the fastest known running time is O(N log d). Simple Cull runs in O(N^2).

This one works for partial orderings, given that you can define the dominates method for checking if one point dominates another (based on the Simple Cull algorithm described in this paper).

def simple_cull(inputPoints, dominates):
    paretoPoints = set()
    candidateRowNr = 0
    dominatedPoints = set()
    while True:
        candidateRow = inputPoints[candidateRowNr]
        inputPoints.remove(candidateRow)
        rowNr = 0
        nonDominated = True
        while len(inputPoints) != 0 and rowNr < len(inputPoints):
            row = inputPoints[rowNr]
            if dominates(candidateRow, row):
                # If it is worse on all features remove the row from the array
                inputPoints.remove(row)
                dominatedPoints.add(tuple(row))
            elif dominates(row, candidateRow):
                nonDominated = False
                dominatedPoints.add(tuple(candidateRow))
                rowNr += 1
            else:
                rowNr += 1

        if nonDominated:
            # add the non-dominated point to the Pareto frontier
            paretoPoints.add(tuple(candidateRow))

        if len(inputPoints) == 0:
            break
    return paretoPoints, dominatedPoints

Example usage:

def dominates(row, candidateRow):
    return sum([row[x] >= candidateRow[x] for x in range(len(row))]) == len(row)    

inputPoints = [[1,1,1], [1,2,3], [3,2,1], [4,1,1]]
paretoPoints, dominatedPoints = simple_cull(inputPoints, dominates)

print "*"*8 + " non-dominated answers " + ("*"*8)
for p in paretoPoints:
    print p
print "*"*8 + " dominated answers " + ("*"*8)
for p in dominatedPoints:
    print p