A syntax tree has functions as internal nodes and terminals as leaves. The evaluation proceeds bottom-up. This representation makes program structure explicit and easy to manipulate.

For example, for the expression :

Symbolic expressions can be defined by two fundamental sets.

The function set is the internal computational operators. For example:

The terminal set is the leaves of the tree:

All candidate programs are compositions built from . The design of and strongly influences search effectiveness.

We enforce a closure requirement: every function must accept any input produced by the tree. However, random trees may generate invalid expressions, like division by zero. To deal with this, we introduce protected operators. For example:

This ensures that output is always defined, and makes division closed over all inputs.

In the context of GP, a single individual can be written in several equivalent forms:

Each form is useful. Trees are good for visualization, prefix are good for implementation (Lisp?), and infix is good for human interpretation.

Each function in has a fixed number of children called its arity.

where implements “if , return , else ”:

Arity determines the legal tree structure. If the function set contains mostly binary operators, subtree crossover can be disruptive, and size will grow quickly with depth. Thus, the choice of functions shapes both representation and search behavior.

With the above components, we can formally define a GP individual to be modeled as a rooted ordered tree:

where is the set of nodes, is the set of edges, and is the root node.

We partition the node set as:

where are function nodes and are terminal nodes. This formal view helps when reasoning about tree size and depth.

Strongly Typed Representation

Standard GP allows any subtree as long as closure holds. In strongly typed GP, nodes carry data types. For example:

  • Arithmetic nodes return real values
  • Logical nodes return Boolean values
  • Conditional nodes combine both.

This prevents nonsensical constructions such as . Typing constrains the search space but often involves semantic quality.

Automatically Defined Functions

GP can evolve reusable subprograms. Instead of a single tree, an individual may contain a main program, along with one or more auxiliary functions

Conceptually, we say:

This promotes modularity and encourages code reuse. Repeated structure is represented more compactly.

Tree Execution

Tree execution is recursive. For , the evaluation order is:

  1. Read terminal
  2. Read terminal
  3. Compute
  4. Compute

We can consider this more cleanly by writing it in prefix form:

Let and . Then:

and finally

Search Space

GP search spaces are huge; even shallow trees produce enormous search spaces. To see this, let and . For a rough binary-tree estimate, the number of syntactic probabilities grows on the order of

for depth .

This implies that exhaustive search is impossible, and local structure matters greatly. Thus, good variation operators are crucial.

There is a distinction that we can make: semantic vs. syntactic search spaces. Two different trees can compute the same function. For example, and . This means that GP searches in a huge syntactic space, but a smaller semantic space. This redundancy can both help and hurt search.