Multi-Objective Evolutionary Algorithms

Multiobjective Optimization Problems

With ecr, both single- and multi-objective optimization problems can be addressed. In the former, an attempt is made to find to find a single solution that maximizes a fitness value corresponding directly to a single underlying measure of quality. Often, however, we look at problems in which we try to optimize different (conflicting) goals at the same time. Examples of these so-called multi-objective problems (MOPs) include:

  • Hotel search: Close to the beach and at the same time reasonably priced
  • Feature Selection in Machine Learning: Few features with high accuracy
  • Real estate purchase: location, price, number of rooms, square meter, …

Different objectives can also mean different optimization goals: When buying a new apartment, one hopes for a low price (minimization problem) and a large living space (maximization problem).

In the following, we will look at the example of Feature Selection to see how multi-objective problems can be solved with ecr.

For this purpose, we will iteratively train a RandomForest-Classifier on the Wisconsin Breast Cancer data set and analyze the effect on the prediction accuracy by including or omitting features of the data set in the training. The goal is to achieve high accuracy along while using a minimum number of features. To train the RandomForest-Classifier we use the R-Package mlr. The BreastCancer data set is taken from the package mlbench.

##       Id             Cl.thickness   Cell.size     Cell.shape  Marg.adhesion
##  Length:699         1      :145   1      :384   1      :353   1      :407  
##  Class :character   5      :130   10     : 67   2      : 59   2      : 58  
##  Mode  :character   3      :108   3      : 52   10     : 58   3      : 58  
##                     4      : 80   2      : 45   3      : 56   10     : 55  
##                     10     : 69   4      : 40   4      : 44   4      : 33  
##                     2      : 50   5      : 30   5      : 34   8      : 25  
##                     (Other):117   (Other): 81   (Other): 95   (Other): 63  
##   Epith.c.size  Bare.nuclei   Bl.cromatin  Normal.nucleoli    Mitoses   
##  2      :386   1      :402   2      :166   1      :443     1      :579  
##  3      : 72   10     :132   3      :165   10     : 61     2      : 35  
##  4      : 48   2      : 30   1      :152   3      : 44     3      : 33  
##  1      : 47   5      : 30   7      : 73   2      : 36     10     : 14  
##  6      : 41   3      : 28   4      : 40   8      : 24     4      : 12  
##  5      : 39   (Other): 61   5      : 34   6      : 22     7      :  9  
##  (Other): 66   NA's   : 16   (Other): 69   (Other): 69     (Other): 17  
##        Class    
##  benign   :458  
##  malignant:241  
##                 
##                 
##                 
##                 
## 

First, we remove the observations with missing data (see Bare.nuclei’s 16 NA entries) along with the Id column which is irrelevant for the training of the model. The data set is then divided into a feature data set and a target data set. The prediction target is the column “Class”.

cancer = BreastCancer[, 2:11]
cancer = cancer[!(rowSums(is.na(cancer)) > 0),]
cancer.features = cancer[, 1:9]
cancer.target = cancer[, 10]

Next, we define the fitness function. First a few conceptual considerations: The fitness of an individual results from the number of features used for prediction and the accuracy of the prediction. In order to determine the accuracy, the model must be trained on the features encoded in the individual. Consequently, each time the fitness function is called, the model is first trained and then the fitness is determined. We use a resampling strategy to determine the performance of the learning algorithm on the selected features. The fitness corresponds to the average accuracy of the individual samples.

  fitness.fun = function(ind) {
    ind = as.logical(ind)
    # all features deselected is not a supported solution.
    # Thus, we set the accuracy to 0 and number of features to its maximum.
    if (!any(ind))
      return(c(0, length(ind)))
    # add target column to individual
    task = makeClassifTask(data = cancer[, c(ind, TRUE)],
                           target = "Class",
                           id = "Cancer")
    # Subsampling with 5 iterations and default split ratio 2/3
    rdesc = makeResampleDesc("Subsample", iters = 2)
    # Classification tree
    lrn = makeLearner("classif.randomForest")
    r = do.call(resample, list(lrn, task, rdesc, list(acc), show.info = FALSE))
    measure = r$aggr[[1]]
    nFeatures = sum(ind)
    return(c(measure, nFeatures))
  }

Black-box approach

Since ecr supports multi-objective optimization as a standard task, we can use the black-box function ecr() to execute the feature selection. We decide to use an evolutionary (5 + 10)-strategy, i.e., an algorithm that keeps a population of size mu = 5, in each generation creates lambda = 10 offspring by variation and selects the best mu out of mu + lambda individuals to survive. In the context of our multi-objective optimization problem, two important arguments should be noticed. First, n.objectives has to reflect the number of objectives and minimize has to be a vector of length = n.objectives, indicating for each objective as to whether it should be minimized or maximized. For the problem at hand, our two objectives (accuracy, number of features) are to be maximized and minimized respectively.

MU = 5; LAMBDA = 1L; MAX.ITER = 25; N.BITS = ncol(cancer.features);
res = ecr(fitness.fun = fitness.fun,
            n.objectives = 2L,
            minimize = c(FALSE, TRUE),
            representation = "binary",
            n.bits = N.BITS,
            mu = MU,
            lambda = LAMBDA,
            survival.strategy = "plus",
            mutator = setup(mutBitflip, p = 1 / N.BITS),
            p.mut = 0.3,
            p.recomb = 0.7,
            terminators = list(stopOnIters(MAX.ITER)),
            log.pop = TRUE,
            initial.solutions = list(rep(1,N.BITS)))

The resulting Pareto-set consists of all non-dominated solutions and can be plotted by using plotFront.

plotFront(res$pareto.front)
Pareto front on multi-objective optimization problem

Pareto front on multi-objective optimization problem

White-box approach

Writing the evolutionary loop by hand, we first create a control object that stores the information about the target function and the evolutionary operators. Note that the number of objectives is passed explicit.

control = initECRControl(fitness.fun, n.objectives = 2L, minimize = c(FALSE, TRUE))
control = registerECROperator(control, "mutate", mutBitflip, p = 0.3)
control = registerECROperator(control, "selectForSurvival", selNondom)

Here, we decide to perform mutation only. The best mu individuals (regarding fitness values) are going to be selected to build up the next generation. However, as we are looking at a multi-objective optimization problem, the “best mu individuals” must be determined by applying non-dominated sorting of the objective vectors and subsequent computation of the crowding distance. An alternative multi-objective ecr selection operator is the selDomHV. Furthermore, you can write your own selector via makeSelector. Now, an initial population is sampled, their respective fitness evaluated and a Pareto-archive initialized. A Pareto-archive is usually used to store all or a part of the non-dominated points during a run of an multi-objective evolutionary algorithm.

  population = genBin(MU, N.BITS)
  fitness = evaluateFitness(control, population)
  archive = initParetoArchive(control)

Finally, the evolutionary loop is implemented. In each iteration, the Pareto-set is updated in the Pareto-archive via the function updateParetoArchive.

  for (i in seq_len(MAX.ITER)) {
      # sample lambda individuals at random
      idx = sample(1:MU, LAMBDA, replace = TRUE)
      # generate offspring by mutation and evaluate their fitness
      offspring = mutate(control, population[idx], p.mut = 1)
      fitness.o = evaluateFitness(control, offspring)
      # now select the best out of the union of population and offspring
      sel = replaceMuPlusLambda(control, population, offspring, fitness, fitness.o)
      population = sel$population
      fitness = sel$fitness
      updateParetoArchive(archive, population,fitness)
  }

Let’s have a look at our Pareto-front:

pareto.front = getFront(archive)
plotFront(pareto.front)
Pareto front on multi-objective optimization problem

Pareto front on multi-objective optimization problem