Section-structured, BFS approach.
Patryk Aleksander
1. Introduction
The main goal of this project was to come up with a representation of genome that is a string made of A to Z capital
letters only, and can always be converted into a correct genome in the f0 original Framstick representation. Since
the string has no restrictions on syntax the representation is called "messy" and will be denoted as f7.
2. Sections
In the process of conversion, input string is treated as a repeating sequence of four sections responsible for:
parts, joints, neurons and connections respectively. Each section starts and ends with a letter which takes the role
of a SectionTag at that particular moment of conversion. First letter of the string is the first
SectionTag. Following letters form the first parts' section (in general a consecutive section), up to the
second occurrence of the SectionTag which terminates the current section (exceptions from this rule are
explained later). Next letter takes the role of a new SectionTag (in general it can be a different letter
every time) and so on... (see example below)
Example 2.1:
Which letters are SectionTags?
(NOTE: spaces are added for clarity only)
A QWERTY... AB QWERTY... BC QWERTY... CB QWERTY... BH QWERTY... H...
SectionTags: A, B, C, B, H
Information inside each section is coded in 5-letter long 'chunks' called tokens. Tokens are treated either as
labels or values depending on the situation. In the latter case, value in the base-26 numeral system (26 capital
letters) is recalculated to base-10 numeral system (digits 0 9). A 5 letter word from 26 letter alphabet can take
over 11 million states. The assumed precision is 3 digits for the integer part and 3 digits for the fractional part
(further domain restrictions may apply) which requires 1 million states (-500.000 499.999). Since there are over
11 times more states than required, modulo function is used (see example below).
Example 2.2:
What coordinate does token ZABCD stand for?
Z = 25 (position in the alphabet),
A = 0,
B = 1,
C = 2,
D = 3.ZABCD = ( (25*26^4 + 0*26^3 + 1*26^2 + 2*26^1 + 3*26^0) % 1000000) / 1000) - 500.0 = (11425131 % 1000000) / 1000 -
500.0 = 425131 / 1000 - 500.0 = -74.869
Stage I. base-26 to base-10 conversion
ZABCD = 25*26^4 + 0*26^3 + 1*26^2 + 2*26^1 + 3*26^0 = 11425131,Stage II. range wrapping
11425131 % 1000000 = 425131,Stage III. precision and range adjustment
425131 / 1000 - 500.0 = -74.869
3. Parts' section
There are two types of definitions for parts: simplified and extended. Simplified definition consists of 4 tokens:
label and x, y, z coordinates. Label serves as part's identifier. Tokens for coordinates are recalculated to
numbers. If the distance between the next token and current part's label (after recalculating into numbers) is not
exceeding a predefined distance, we are dealing with an extended definition. Three consecutive tokens are
recalculated to numbers and serve as values for properties: friction, ingestion and assimilation respectively. To
protect part definitions from being torn apart too often, only label tokens are checked for section tags. Every part
is stored right after decoding and is created only before the creation of a joint linking it.
4. Joints' section
To ensure model consistency this section is decoded in the same way Breadth First Search algorithm goes through a
graph. Each token is recalculated to a number. Each number corresponds with an index of a stored part (if it's too
big it's wrapped). First occurrence (o1) of a specific part starts the branching process until second occurrence
(o2) of that part is reached. Joints link o1 with all parts that are between o1 and o2. If the distance between o2
and the next token is not exceeding a predefined distance, it's an extended definition. Three consecutive tokens
provide the values of: stiffness, rotation stiffness and stamina joint properties. Each part between o1 and o2 is
added to a list and becomes the new o1 after o2 is reached (see example below)
Example 4.1:
What joints will the following sequence create?
(NOTE: spaces are added for clarity only)
AAAAA AAAAB AAAAC AAAAD AAAAA AAAAB AAAAE AAAAF AAAAC AAAAG AAAAD AAAAE AAAAF AAAAG
Tokens are recalculated to indices of stored parts:
AAAAA = 0,
AAAAB = 1,
AAAAC = 2,
AAAAD = 3,
AAAAE = 4,
AAAAF = 5,
AAAAG = 6.Following joints in a (part1, part2) notation are added to the model:
(0, 1), (0, 2), (0, 3),
(2, 4), (2, 5),
(3, 6).
5. Neurons' section
There are two types of definitions for neurons: simplified and extended. Simplified definition consists of 3 tokens:
label, neuron type and attachment type. Label serves as neuron's identifier. Neuron type is recalculated to a number
corresponding with an index of active (option set in simulator) neurons (if it's too big it's wrapped). If neuron
type token matches label token, neuron of default type "N" is created. Attachment type is recalculated to a
number.
Three cases may occur:
- neuron of current type must be attached to a part,
- neuron of current type must be attached to a joint,
- neuron of current type doesn't need any assignment to body element.
Attachment type corresponds with an index of a part (in the 1st case), of a joint (in the 2nd case), of either a
part or a joint or a dummy element (in the 3rd case). The latter are added to index range to ensure a situation with
no attachment is possible. If attachment type token matches label token, neuron with no assignment to body element
is created by default (assuming 3rd case occurred). If the distance between the next token and current part's label
(after recalculating into numbers) is not exceeding a predefined distance, we are dealing with an extended
definition. Number of consecutive tokens equal to number of properties of current neuron type, are recalculated to
new values for these properties (domain restrictions apply). To ensure correctness of the model, neuron type and
attachment type tokens are not checked for section tags.
Example 5.1:
What neurons will the following sequence create?
(NOTE: spaces are added for clarity only)
Additional information: model.getPartCount() = 4, model.getJointCount() = 2, active neurons: N, G, T, S, *, | and @
Tokens are recalculated to numbers:
AAAAA = 0,
AAAAB = 1,
AAAAC = 2,
AAAAD = 3,
AAAAE = 4,
AAAAF = 5,
AAAAQ = 16.Active neurons' indices:
N = 0, G = 1, T = 2, S = 3, * = 4, | = 5, @ = 6.AAAAA AAAAA AAAAA
label = AAAAA,
neuron type == label => N, (NOTE: default, doesn't need attachment)
attachment type == label => not attached. (NOTE: default)
Created: neuron N, not attached.AAAAB AAAAD AAAAF
label = AAAAB,
neuron type = AAAAD = 3 => S, (NOTE: needs to be attached to body part)
attachment type = AAAAF = 6 => 6 % getPartCount() = 6 % 4 = 2 => part(2).
Created: neuron S, attached to part(2).AAAAC AAAAE AAAAQ
label = AAAAC,
neuron type = AAAAE = 4 => *, (NOTE: doesn't need attachment)
attachment type = AAAAQ = 16 => 16 % (1.5*(getPartCount() + getJointCount())) = 16 % 9 = 7 => not attached. (NOTE: 0-3 => to part, 4-5 => to joint, 6-8 => not attached)
Created: neuron *, not attached.
6. Connections' section
Three consecutive tokens form a connection. Each token is recalculated to a number. First token corresponds with an
index of a neuron (if it's too big it's wrapped), from list of neurons with at least one free input. This neuron
serves as a parent neuron. Second token corresponds with an index of a neuron (if it's too big it's wrapped), from
list of neurons with at least one free output. This neuron is connected as parent neuron's input. Third token serves
as weight (after normalization).
7. Square example
Convert the following genotype from f7 into f0
ZAAAAABCLQUBCLQUBCLQUYYYYBBCNDGBCLQUBCLQUAAAACBCNDGBCNDGBCLQUYYYYDBCLQUBCNDGBCLQUZZAAAAAYYYYBAAAAAAAAACYYYYBYYYYDAAAACAAAAAYYYYDAAAAAZ
Speces are added for clarity
Z AAAAA BCLQU BCLQU BCLQU YYYYB BCNDG BCLQU BCLQU AAAAC BCNDG BCNDG BCLQU YYYYD BCLQU BCNDG BCLQU Z Z AAAAA YYYYB AAAAA AAAAC YYYYB YYYYD AAAAC AAAAA YYYYD AAAAA Z
SectionTag Z, opens parts' section:
AAAAA - label, BCLQU BCLQU BCLQU - coordinates,
YYYYB - label, BCNDG BCLQU BCLQU - coordinates,
AAAAC - label, BCNDG BCNDG BCLQU - coordinates,
YYYYD - label, BCLQU BCNDG BCLQU - coordinates,
Labels are recalculated as follows:
AAAAA = 0*26^4 + 0*26^3 + 0*26^2 + 0*26^1 + 0*26^0 = 0,
YYYYB = 24*26^4 + 24*26^3 + 24*26^2 + 24*26^1 + 1*26^0 = 11406097,
AAAAC = 0*26^4 + 0*26^3 + 0*26^2 + 0*26^1 + 2*26^0 = 2,
YYYYD = 24*26^4 + 24*26^3 + 24*26^2 + 24*26^1 + 3*26^0 = 11406099,
Threshold for extended part definition is 5940688 (50% of the number of 5 letter permutations)
Distances between consequent labels exceed the threshold, which implies the use of simple part definitions.
There are two values of tokens used: BCLQU and BCNDG. They are recalculated as follows:
Stage I. base-26 to base-10 conversion
BCLQU = 1*26^4 + 2*26^3 + 11*26^2 + 16*26^1 + 20*26^0 = 500000,
BCNDG = 1*26^4 + 2*26^3 + 13*26^2 + 3*26^1 + 6*26^0 = 501000,Stage II. range wrapping
500000 % 1000000 = 500000,
501000 % 1000000 = 501000,Stage III. precision and range adjustment
500000 / 1000 - 500.0 = 0.0,
501000 / 1000 - 500.0 = 1.0.
Parts defined in the first section:
0: 0, 0, 0,
1: 1, 0, 0,
2: 1, 1, 0,
3: 0, 1, 0.
SectionTag Z, closes parts' section.
SectionTag Z, opens joints' section:
Labels are recalculated as follows:
AAAAA = 0*26^4 + 0*26^3 + 0*26^2 + 0*26^1 + 0*26^0 = 0,
AAAAB = 0*26^4 + 0*26^3 + 0*26^2 + 0*26^1 + 1*26^0 = 1,
AAAAC = 0*26^4 + 0*26^3 + 0*26^2 + 0*26^1 + 2*26^0 = 2,
AAAAD = 0*26^4 + 0*26^3 + 0*26^2 + 0*26^1 + 3*26^0 = 3,
YYYYB = 24*26^4 + 24*26^3 + 24*26^2 + 24*26^1 + 1*26^0 = 11406097,
YYYYC = 24*26^4 + 24*26^3 + 24*26^2 + 24*26^1 + 2*26^0 = 11406098,
YYYYD = 24*26^4 + 24*26^3 + 24*26^2 + 24*26^1 + 3*26^0 = 11406099,
Threshold for extended joint definition is 5940688 (50% of the number of 5 letter permutations)
Distances between consequent labels exceed the threshold, which implies the use of simple joint definitions.
Range of the labels is wrapped modulo 4 (the number of existing parts -> parts' section):
AAAAA: 0 % 4 = 0,
AAAAB: 1 % 4 = 1,
AAAAC: 2 % 4 = 2,
AAAAD: 3 % 4 = 3,
YYYYB: 11406097 % 4 = 1,
YYYYC: 11406098 % 4 = 2,
YYYYD: 11406099 % 4 = 3.
Joints' section can be rewritten using parts' indices as follows:
0 1 0 2 1 3 2 0 3 0
BFS algorithm turn this sequence into a graph as follows:
empty part list is created []
0 added to the list [0], created, and processed1 added to the list [0, 1], created, joint (0, 1) created
0 deleted from list [1]1 processed (being next on the list)
2 added to the list [1, 2], created, joint (1, 2) created
1 deleted from list [2]2 processed
3 added to the list [2, 3], created, joint (2, 3) created
2 deleted from list [3]3 processed
0 added to the list [3, 0], not created (part 0 already exists),
joint (3, 0) created
3 deleted from list [0]0 processed
0 deleted from list []
SectionTag Z, closes joints' section.
These four parts and four joints form a square.
Previous realization - description in Polish: