Exercise 5. Detecting, estimating and visualizing epistasis 

Question 1

Recall from the lecture how epistasis is defined and how it can be empirically estimated.

Choose one of the genetic representations: f9 or f1.

Just as before, evolve a few tall structures (the "vertpos" criterion) using only-body.sim. From these, choose a few (at least 3) dissimilar structures.

Visualize the epistasis between all pairs of genes as a 2D matrix. If disabling a gene results in an invalid genotype, exclude that gene from the analysis (do not disable it). You can use the following code as inspiration:


import argparse
import os
import sys

import numpy as np
from FramsticksLib import FramsticksLib


def setstr(s, pos, char):
	assert len(char) == 1
	assert 0 <= pos < len(s)
	return s[:pos] + char + s[pos + 1:]

def show_array(arr):
	import matplotlib.pyplot as plt
	import matplotlib.colors as colors
	from matplotlib.colors import LinearSegmentedColormap
	# custom red-white-green colormap, because matplotlib only provides red-YELLOW-green
	red_white_green = LinearSegmentedColormap.from_list('my-gradient', ["r", "w", "g"], N=256)
	fig, ax = plt.subplots(figsize=(5, 5))
	# CenteredNorm maps 0 to the middle (white) of the colormap
	im = ax.imshow(arr, cmap=red_white_green, norm=colors.CenteredNorm())
	# ...

def evaluate(prefix, geno): # TODO if needed, handle errors (due to invalid genotypes)
	return framsLib.evaluate([prefix + geno])[0]['evaluations']['']['vertpos']



# parsed_args = parseArguments()
# framsLib = FramsticksLib(parsed_args.path, parsed_args.lib, parsed_args.simsettings)

g = 'UDDDLFR...' # the genotype to test
prefix = '/*9*/'
fit = evaluate(prefix, g)
print(fit)
arr = np.full([len(g), len(g)], np.nan)
for i in range(len(g)):
	for j in range(i, len(g)): # the diagonal (i==j) stores the effect of disabling a single gene
		g2 = setstr(g, i, ' ')
		g2 = setstr(g2, j, ' ')
		g2 = g2.replace(' ', '') # remove the disabled gene(s)
		# disabling (i, j) and (j, i) yields the same genotype, so evaluate once and mirror
		arr[i][j] = arr[j][i] = evaluate(prefix, g2) - fit

# calculate epistasis...

Attach the resulting matrices and interpret the results. Do the matrices differ in their characteristics between the solutions you tested? You can visually identify which gene corresponds to which phene (a part of the 3D structure) using the gene editor in the Framsticks GUI.
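
The "# calculate epistasis..." step in the code above could be sketched as follows. This is only one possible formulation: it assumes the additive definition, in which the epistasis of a pair is the joint effect of disabling both genes minus the sum of their individual effects. If the lecture used a different formula, substitute it. The function name pairwise_epistasis and the toy matrix are hypothetical; the input has the same layout as arr (single-gene effects on the diagonal, NaN for ignored genes).

```python
import numpy as np

def pairwise_epistasis(arr):
    """Additive-model epistasis (an assumption): eps[i, j] = d_ij - (d_i + d_j).

    arr[i, j] is the fitness change after disabling genes i and j together;
    arr[i, i] is the change after disabling gene i alone. NaN entries
    (e.g., ignored genes) propagate to the result.
    """
    arr = np.asarray(arr, dtype=float)
    single = np.diag(arr)                            # d_i for every gene
    eps = arr - (single[:, None] + single[None, :])  # d_ij - d_i - d_j
    np.fill_diagonal(eps, np.nan)                    # no epistasis of a gene with itself
    return eps

# Toy example with made-up numbers (not Framsticks output):
toy = np.array([[-1.0, -3.0, -0.5],
                [-3.0, -2.0, -4.0],
                [-0.5, -4.0, -0.5]])
eps = pairwise_epistasis(toy)
```

In the toy matrix, genes 0 and 1 are purely additive (eps[0, 1] is 0), whereas the pairs (0, 2) and (1, 2) deviate from additivity in opposite directions. The result can be passed directly to show_array.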

...

Question 2

Repeat the experiment from the previous question, but this time investigate the epistasis among all triplets of genes. Estimating the amount of epistasis requires a baseline from which the difference is calculated. Compare two baselines separately, both built from values already computed in the previous question: (1) the effects of the individual genes in the triplet, and (2) the effects of all pairs of genes in the triplet. The specific formulas were discussed during the lecture; state the formulas you used in your response.
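
A minimal sketch of how the two baselines might be computed is shown below. The lecture formulas are not reproduced in this text, so the formulas here are assumptions: baseline 1 subtracts the sum of the three single-gene effects, and baseline 2 subtracts the value predicted from the three pairs by inclusion-exclusion (sum of pair effects minus sum of single effects). The names triplet_epistasis, effect and toy_effect are hypothetical; in the real experiment, effect would wrap the evaluate call and return the fitness change after disabling the given genes.

```python
import itertools
import numpy as np

def triplet_epistasis(effect, n):
    """Return (eps_single, eps_pair): two n*n*n arrays filled for i < j < k.

    effect(genes) is a hypothetical callable that returns the fitness change
    after disabling the given tuple of gene indices.
    """
    d1 = [effect((i,)) for i in range(n)]
    d2 = {p: effect(p) for p in itertools.combinations(range(n), 2)}
    eps_single = np.full((n, n, n), np.nan)
    eps_pair = np.full((n, n, n), np.nan)
    for i, j, k in itertools.combinations(range(n), 3):
        d3 = effect((i, j, k))
        singles = d1[i] + d1[j] + d1[k]
        pairs = d2[i, j] + d2[i, k] + d2[j, k]
        eps_single[i, j, k] = d3 - singles          # baseline 1: single genes only
        eps_pair[i, j, k] = d3 - (pairs - singles)  # baseline 2: prediction from pairs
    return eps_single, eps_pair

# Toy stand-in for the simulator (made-up numbers): additive single effects,
# one pairwise interaction (genes 0, 1) and one triple interaction (genes 0, 1, 2).
def toy_effect(genes):
    single = [-1.0, -2.0, -0.5, -1.5]
    value = sum(single[i] for i in genes)
    if {0, 1} <= set(genes):
        value += 1.0
    if {0, 1, 2} <= set(genes):
        value -= 2.0
    return value

eps_single, eps_pair = triplet_epistasis(toy_effect, 4)
```

In this toy setup, eps_single for the triplet (0, 1, 3) is nonzero only because it contains the interacting pair (0, 1), while eps_pair is nonzero only for (0, 1, 2), the triplet that carries a genuine three-gene interaction.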

Visualize the results as a "3D heatmap": a 3D grid of small spheres, one per triplet of genes, colored in the same way as in the 2D matrices.
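
One way to draw such a plot is a matplotlib 3D scatter plot, sketched below. It reuses the red-white-green colormap and the zero-centered normalization from show_array. The function names grid_points and show_3d, the marker size and the axis labels are arbitrary choices, and the input is assumed to be an n*n*n array with NaN in cells that should not be drawn.

```python
import numpy as np

def grid_points(eps3):
    """Return (x, y, z, values) for every non-NaN cell of a 3D array."""
    eps3 = np.asarray(eps3, dtype=float)
    x, y, z = np.nonzero(~np.isnan(eps3))
    return x, y, z, eps3[x, y, z]

def show_3d(eps3):
    """Draw one colored marker per non-NaN cell of eps3."""
    import matplotlib.pyplot as plt
    import matplotlib.colors as colors
    from matplotlib.colors import LinearSegmentedColormap
    red_white_green = LinearSegmentedColormap.from_list('my-gradient', ["r", "w", "g"], N=256)
    x, y, z, values = grid_points(eps3)
    fig = plt.figure(figsize=(6, 6))
    ax = fig.add_subplot(projection='3d')
    # thin gray edges keep near-zero (white) markers visible on a white background
    sc = ax.scatter(x, y, z, c=values, cmap=red_white_green,
                    norm=colors.CenteredNorm(), s=40,
                    edgecolors='gray', linewidths=0.3)
    ax.set_xlabel('gene i')
    ax.set_ylabel('gene j')
    ax.set_zlabel('gene k')
    fig.colorbar(sc, ax=ax, shrink=0.6)
    plt.show()
```

Calling show_3d(eps3) opens the figure. Because only cells with values are drawn, an array filled just for i < j < k shows up as a tetrahedron-shaped cloud of points.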

Analyze the results and describe the conclusions as before. For all baselines and genotypes you tested, compare the sum of raw epistasis values (i.e., sum(epistasis_3d_matrix)) and the sum of absolute values (i.e., sum(abs(epistasis_3d_matrix))). Which of the two baselines turned out to be more useful (assuming that now we want to discover new information regarding the interaction between triplets of genes, not the same information we already know about the pairs) and why?
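
The two sums can be computed as in the sketch below. It assumes the 3D array contains NaN for cells that were not computed (ignored genes or unfilled index orderings), so the NaN-aware np.nansum is used instead of the plain sum; the function name summarize and the demo values are made up for illustration.

```python
import numpy as np

def summarize(eps3):
    """Return (raw_sum, abs_sum) over the non-NaN cells of an epistasis array."""
    eps3 = np.asarray(eps3, dtype=float)
    raw_sum = float(np.nansum(eps3))
    abs_sum = float(np.nansum(np.abs(eps3)))
    return raw_sum, abs_sum

# Made-up values: two triplets with opposite epistasis of equal magnitude.
demo = np.full((4, 4, 4), np.nan)
demo[0, 1, 2] = 1.5
demo[0, 1, 3] = -1.5
raw_sum, abs_sum = summarize(demo)
print(raw_sum, abs_sum)  # prints 0.0 3.0
```

Positive and negative values cancel in the raw sum but not in the sum of absolute values, which is why the two numbers should be reported together.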

...