Exercise 4. Discovering the fitness landscape: measures of ruggedness and convexity 

Question 1

Both in this course and earlier throughout the curriculum, the fitness landscape, its shape, convexity and smoothness were often referenced. These aspects of the landscape have been presented as an important (or perhaps the most important) determinant of successful optimization. In this exercise, we will try to discover the nature of the fitness landscape – although not by showing it in a simple "fitness for different argument values" plot because the solution space is combinatorial, but in other visualizations.

Landscape visualization poses a problem: in order to study the landscape, we must have samples of solutions from the locations where we want to study it. If these samples are only poor or very bad (e.g., randomly generated solutions), we will only discover the surroundings of these solutions – and there is not much interesting to see in such surroundings. In order to probe the more interesting parts of the landscape (the neighborhood of good solutions) we first have to obtain such solutions, and once we obtain them, looking at their neighborhood is only useful to gain "post hoc" knowledge about the nature of the landscape and its topology – not to discover good solutions, because, after all, we already have these. In other words, you have to put in a lot of effort first to reach the more interesting areas you want to see (just like in the mountains).

So first, we need a set of solutions with as much variation in fitness as possible – ranging from the worst to the best, including, for example, local optima. We can get them by running repeatedly (e.g., 50 times) a relatively greedy and fast evolution from the initial poor genotype until we reach stagnation, and periodically, during evolution, logging the best individual in the population (we don't want to unnecessarily save almost identical solutions). Alternatively, we can, for example, log all changes in a single-individual Hall of Fame. This way, from each course of evolution, we will get many possibly different samples collected during the climb. Try to obtain many (at least 200) diversified fitness values of individuals (so that, if possible, there are no large gaps between fitness values).

Choose one of the optimization problems we have covered so far:

  • "vertpos" without a neural network. Compare four representations: f0, f1, f4, f9. You can use the trick with an auxiliary objective function to quickly escape from the plateau, but once you have the solutions, only use the original non-modified "vertpos" for fitness evaluation in this exercise.
  • "velocity" with a neural network. Compare four representations: f0, f1, f4, fH. In the fH representation, a genotype consists of a set of sticks and neurons equipped with "handles" – vectors of numbers, and the elements of the structure fuse together according to the similarity of these evolving "handles"; for details, see sections 3.6 and 4.2.

Use the same sequence of the *.sim files that we used in previous lab exercises for the problem you have now chosen. From the optimization runs, generate the same plots as usual (three types, possibly without the "separate line for each optimization run" plot).

For each of the saved solutions-samples (save their genotypes and fitness values), now generate a certain number (at least 20) of its mutants-neighbors (each mutant has the same parent-sample) and evaluate each of them. Reject invalid solutions (with fitness=FITNESS_VALUE_INFEASIBLE_SOLUTION). Draw a scatter plot, where for each sample on the X-axis is its fitness, and on the Y-axis are the fitness values of its mutants-neighbors. Generate such a plot for each of the compared representations and add the y=x line. Calculate what fraction of mutants in a given representation were better than their parents. Describe the conclusions. What can you tell about the landscape (at least in the neighborhood of the collected samples) and what are the differences between the representations?

...

Question 2

Now that we have a variety of solutions-samples, try to construct an analogous plot for crossover. Crossover pairs of selected parents and evaluate their offspring. Reject invalid offspring (with fitness=FITNESS_VALUE_INFEASIBLE_SOLUTION). Using this plot, we want to answer the question of how the fitness of the offspring relates to the fitness of its parents, and for which solutions (with what fitness) the crossover is beneficial, and for which it is not. Come up with an adequate way to visualize this relationship – it can be 3D saved as a self-contained html, or maybe a properly designed 2D plot would be better. The order of the parents does not matter.

As in the previous question, you will have as many plots as the number of genetic representations being compared. Propose and calculate at least one statistic (a formula) that, based on the information collected here for a given representation, will determine a single number capturing a significant property of crossover in that representation. Describe the conclusions.

...

Question 3

For the final step, we will perform one more analysis revealing a certain characteristic of the landscape (and, at the same time, of mutation operators). Select about 20 solutions (from best discovered fitness to worst, lowest fitness) and for each of them, perform a sequence of, say, 30 mutations ("random walk" – it must be a sequence!), evaluating each mutant. If any mutant comes out invalid (its pseudo-fitness will be FITNESS_VALUE_INFEASIBLE_SOLUTION), repeat the mutation until it succeeds (until a correct mutant is produced).

Then draw simple line plots (on the horizontal axis – the number of the mutant in the sequence starting from the original sample, on the vertical axis – fitness values). Adjust colors depending on the quality of the initial solutions (for example, 4 different colors for groups of 5 solutions: the best 5 solutions – 5 green lines, the next 5 – blue, the next 5 – orange, the worst 5 – red). The goal is to see how random fitness trajectories produced by successive mutations unfold at different fitness levels [discussion during the class, solution as "memory"].

Attach the plots, describe how you constructed them, and interpret the results in the context of fitness landscapes.

...

Question 4

Summarizing the conclusions of your landscape analyses for different genetic representations (different search topologies) – do you see relationships or correlations of the landscape shape with the evolutionary efficiency (i.e., with the plots from question 1)?

During this lab class, we learned how to investigate and visualize the ruggedness/smoothness of the fitness landscape and the characteristics of the mutation and crossover operators. And how would you examine the property of global convexity of the landscape?

...