\chapter{\label{cha:crossmut}Efficient Implementation of Crossover and Mutation
Operators}
Crossover and mutation are crucial parts of genetic algorithms and
biological simulations with explicit genetics. This work focuses on
the development of fast and memory-efficient implementations of a
number of operators using bit-masks to perform the operators simultaneously
on more than one locus at a time in the way described by \citet{vos99}.
These optimized operator implementations have already been successfully
incorporated in a range of biological simulations \citep{gav05a,gav07a,gav07b,gav09},
including the model described in Chapter \ref{cha:hybrids}.
Four types of crossover (one-point, two-point, within-chromosome recombination
and half-uniform) and two mutation operators (standard or \emph{bit-flip}
and step-mutation) are analyzed here. Verbal descriptions have focused
on the action of genetic operators on individuals, and have led to
straightforward implementations \citep{gol89}. These implementations
are easily checked to represent the genetic operators, but are far
from optimal. While optimized implementations may be more obscure, when
accompanied by a statistical test supporting their qualitative correctness
and numerical stability, they can be used safely and represent a significant
improvement in both running time and memory usage.
Memory efficiency refers specifically to the way individuals are stored
in memory, which usually translates into global memory savings. The
computational speed is gained mainly by a form of parallelization, achieved
through bit-masks that allow us to operate on multiple loci (usually 8)
at the same time, and by simulating a long series of independent rare
events with a single biased random number.
\section{Introduction}
Simulation of individuals, be it for a genetic algorithm or for a
biological simulation, ultimately requires encoding their genetic
information as a data structure in memory. The genetic data of individuals
can be stored as an array of bytes, where each byte represents one
locus. Alternatively, we can think of each byte as a binary string,
thus containing 8 bits and potentially being able to store more than
one locus. In the special case of binary (diallelic) loci, we can
store 8 loci per byte. In this way, an individual, still an array
of bytes, requires as little as one eighth of the memory needed in
the one-byte-per-locus case. Populations would then be arrays of individuals. Generally,
genetic algorithms and simulations with non-overlapping generations
maintain at all times two populations, usually of the same size. One
represents the parents' population, while the other represents
the offspring population.
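As an illustration, the byte-packed representation just described can be sketched in C; the accessor names below are illustrative and not taken from the original code:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char byte;

/* With diallelic loci packed 8 per byte, locus i is stored in
   bit (i % 8) of byte (i / 8) of the individual's array. */
static int get_locus(const byte *ind, size_t i) {
    return (ind[i / 8] >> (i % 8)) & 1;        /* read the allele, 0 or 1 */
}

static void set_locus(byte *ind, size_t i, int allele) {
    if (allele)
        ind[i / 8] |= (byte)(1u << (i % 8));   /* set the bit */
    else
        ind[i / 8] &= (byte)~(1u << (i % 8));  /* clear the bit */
}
```

A population is then simply an array of such byte arrays, one per individual.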
For the sake of simplicity, we will use the notation $\mathcal{B}=\{0,1,\dots,2^{L}-1\}$
to refer to the set of all possible \emph{binary words} of length
$L$. Elements of this set will be termed \emph{individuals} or \emph{mask}s,
to refer to a chromosome in the genetic algorithm or simulation, or
a bit-mask to be applied to genotypes respectively. Furthermore, since
we will be operating on binary strings, we will use the logical bit-wise
operators \emph{AND} ($\&$), \emph{OR} ($|$), \emph{NOT} ($\sim$)
and \emph{XOR} ($\wedge$), as the most elementary operations defining
the genetic operators. In addition, we will use two functions to construct
the operators: one is the \emph{number of ones} and the other is the
\emph{Hamming distance} function.
The \emph{number of ones} in the binary number $x$ is a function
\begin{eqnarray*}
\#:\mathcal{B} & \rightarrow & \mathbb{N}\\
\#(x) & = & \sum_{i=0}^{L-1}(x>>i)\,\&\,1\end{eqnarray*}
where $>>$ denotes the right-shift operation. Recall that the result
of an \emph{AND} operation has a one in the $i$-th position if and
only if both operands have a one in that position.
The \emph{Hamming distance} of two binary strings of the same size
represents the number of differing bits, and can be defined as \[
H(x,y)=\#(x\wedge y)\]
Observe that, since the \emph{XOR} operator returns a binary string
with ones in positions where the strings differ, and zero elsewhere,
by counting the number of ones in the string $x\wedge y$ we are effectively
counting the number of differing bits of two binary strings.
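Both functions translate directly into shift-and-mask loops. A minimal C sketch follows (the function names are illustrative; a production version might use a per-byte lookup table instead):

```c
#include <assert.h>

/* Number of ones #(x): inspect each bit position with a shift
   and an AND, accumulating the set bits. */
static int number_of_ones(unsigned int x) {
    int count = 0;
    for (; x != 0; x >>= 1)
        count += x & 1;
    return count;
}

/* Hamming distance H(x, y) = #(x XOR y): the XOR has a one exactly
   in the positions where x and y differ. */
static int hamming(unsigned int x, unsigned int y) {
    return number_of_ones(x ^ y);
}
```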
\section{Crossover Operators\label{sec:Crossover}}
A \emph{crossover operator} $\chi$ is a non-deterministic function
mapping pairs of binary strings to tuples of binary strings. The pair
of strings are normally called the parents, and elements of the tuple
are called offspring. Normally the number of offspring is 1 or 2 for
genetic algorithms, but is usually Poisson distributed for biological
simulations. Formally, we define $\chi$ as \[
\chi:\mathcal{B}\times\mathcal{B}\rightarrow\mathcal{B}^{n}\]
where $n$ is the number of offspring of a pair of parents. Observe
that in most biological simulations, $n$ is a random value. In these
cases, we can, alternatively, think of the crossover operator as $\chi:\mathcal{B}\times\mathcal{B}\rightarrow\mathcal{B}$,
and we construct a vector of binary strings $(\chi(x,y),\dots,\chi(x,y))\in\mathcal{B}^{n}$.
With certain crossover operators we might think of the crossover operator
as the composition of two functions, one choosing a \emph{mask}, and
the other computing the children. The mask selecting function $m_{\chi}$,
associated with a crossover operator $\chi$, is non-deterministic,
and the computation of children is then defined as \begin{eqnarray*}
C:\mathcal{B}\times\mathcal{B}\times\mathcal{B} & \rightarrow & \mathcal{B}\\
C(x,y,m) & = & (x\&m)|(y\&\sim m)\end{eqnarray*}
Typically in evolutionary algorithms where $n=2$, we use $\chi(x,y)=(C(x,y,m),C(y,x,m))$,
where $m=m_{\chi}(x,y)$ is the non-deterministic mask to be used
in this crossover. Observe that the function $C$ is independent of
the crossover operator chosen, and thus we can define the crossover
operator by defining only the function $m_{\chi}$. In particular,
by defining $\Pr[m_{\chi}(x,y)=m]$ for all $x,y$ and $m$ we uniquely
characterize the mask selecting function $m_{\chi}$, and thus, the
crossover operator $\chi$.
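Since $C$ is shared by all crossover operators, it can be written once. A C sketch for byte-sized words (the function name is illustrative):

```c
#include <assert.h>

typedef unsigned char byte;

/* C(x, y, m) = (x & m) | (y & ~m): the child inherits x's bit
   wherever the mask m has a one, and y's bit elsewhere. */
static byte child(byte x, byte y, byte m) {
    return (byte)((x & m) | (y & (byte)~m));
}
```

With $n=2$, the offspring pair is then `child(x, y, m)` and `child(y, x, m)`, which jointly preserve every parental allele.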
We are interested in four crossover operators: one-point ($1X$),
two-point ($2X$) and half-uniform ($HUX$) are typically used in
genetic algorithms; within-chromosome recombination ($WCX$) is used
in biological simulations. These operators can be defined in terms
of their $m_{\chi}$ function as follows:\begin{eqnarray*}
\Pr[m_{1X}(x,y)=m] & = & \begin{cases}
\frac{1}{L-1} & \mbox{if }m\in\{2^{i}-1:1\leq i<L\}\\
0 & \mbox{otherwise}\end{cases}\end{eqnarray*}
that is, the one-point mask is a run of ones in the low positions,
with the cut point chosen uniformly among the $L-1$ boundaries between
loci.
To sample masks and other discrete random values, a distribution structure
\verb@d@ is used, containing an array \verb@d@$-$\verb@>p@ of doubles
(with size $n$) such that the probability of obtaining the value \texttt{i}
when asking for a random value is proportional to \verb@d@$-$\verb@>p[i]@.
After defining the pseudo-probabilities, the distribution needs to
be initialized with a call to \verb@initdist(d,sum)@ which, aside
from the distribution structure, takes the value of $\sum_{j=0}^{n-1}$\verb@d@$-$\verb@>p[j]@
as its second argument. This call requires $O(n)$ time to execute
and no extra memory. After that, we can freely call the function \verb@drand(d)@
which takes the distribution as its sole argument and returns an integer
value $i$ such that $0\leq i<n$, with probability \verb@d@$-$\verb@>p[i]@$/\sum_{j=0}^{n-1}$\verb@d@$-$\verb@>p[j]@.
This call requires $O(1)$ time to execute and no extra memory allocation.
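The concrete data structure behind \verb@initdist@ and \verb@drand@ is not detailed above. One standard way to obtain $O(n)$ initialization and $O(1)$ sampling is Walker's alias method, and the sketch below assumes that approach; the struct layout and internals are illustrative, not the original code:

```c
#include <assert.h>
#include <stdlib.h>

/* A discrete distribution over {0, ..., n-1}: p[] holds the
   user-supplied pseudo-probabilities; prob[] and alias[] form
   the alias table built by initdist. */
typedef struct {
    int n;
    double *p;     /* pseudo-probabilities, set by the caller */
    double *prob;  /* acceptance probability of each column */
    int *alias;    /* fallback outcome of each column */
} dist;

/* O(n) setup: rescale p[] so the mean is 1, then pair each
   deficit column with a surplus column (Vose's variant). */
static void initdist(dist *d, double sum) {
    int n = d->n, ns = 0, nl = 0;
    double *scaled = malloc(n * sizeof *scaled);
    int *small_idx = malloc(n * sizeof *small_idx);
    int *large_idx = malloc(n * sizeof *large_idx);
    d->prob = malloc(n * sizeof *d->prob);
    d->alias = malloc(n * sizeof *d->alias);
    for (int i = 0; i < n; i++) {
        scaled[i] = d->p[i] * n / sum;
        if (scaled[i] < 1.0) small_idx[ns++] = i; else large_idx[nl++] = i;
    }
    while (ns > 0 && nl > 0) {
        int s = small_idx[--ns], l = large_idx[--nl];
        d->prob[s] = scaled[s];
        d->alias[s] = l;
        scaled[l] -= 1.0 - scaled[s];      /* surplus pays the deficit */
        if (scaled[l] < 1.0) small_idx[ns++] = l; else large_idx[nl++] = l;
    }
    while (nl > 0) d->prob[large_idx[--nl]] = 1.0;  /* full columns */
    while (ns > 0) d->prob[small_idx[--ns]] = 1.0;
    free(scaled); free(small_idx); free(large_idx);
}

/* O(1) sampling: pick a uniform column, then accept it or
   fall back to its alias. */
static int drand(dist *d) {
    int i = rand() % d->n;
    double u = (rand() + 0.5) / ((double)RAND_MAX + 1.0);
    return u < d->prob[i] ? i : d->alias[i];
}
```

Each call to `drand` then returns `i` with probability proportional to `d->p[i]`, using only two uniform deviates.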
\section{Performance and statistical analysis\label{sec:Performance}}
In this section, the straightforward implementations are compared against
the efficient implementations for all genetic operators discussed in
this chapter.
\subsection{Performance analysis}
By using the technique described in Sections \ref{sec:Crossover}
and \ref{sec:Mutation}, a speedup of over three times in crossover
operators and over ten times in mutation operators is achieved. Figures
\ref{fig:wcxTime} to \ref{fig:bfmTime} show the execution time,
in seconds, of one million crossover and mutation operators for individuals
with a number of loci ranging from 8 to 392.
%
\begin{figure}
\hfill{}\includegraphics[width=4in]{Figures/time_1}\hfill{}
\caption[~~~Execution time of two implementations of $WCX$]{\label{fig:wcxTime} Execution time of two implementations of $WCX$.
The plot shows the execution time in seconds of $10^{6}$ crossovers
for individuals with a number of loci ranging from 8 to 392. The dashed
line represents the straightforward implementation while the solid
line represents the efficient implementation proposed here.}
\end{figure}
%
\begin{figure}
\hfill{}\includegraphics[width=4in]{Figures/time_2}\hfill{}
\caption[~~~Execution time of two implementations of $1X$]{\label{fig:1xTime} Execution time of two implementations of $1X$.
The plot shows the execution time in seconds of $10^{6}$ crossovers
for individuals with a number of loci ranging from 8 to 392. The dashed
line represents the straightforward implementation while the solid
line represents the efficient implementation proposed here.}
\end{figure}
%
\begin{figure}
\hfill{}\includegraphics[width=4in]{Figures/time_3}\hfill{}
\caption[~~~Execution time of two implementations of $2X$]{\label{fig:2xTime} Execution time of two implementations of $2X$.
The plot shows the execution time in seconds of $10^{6}$ crossovers
for individuals with a number of loci ranging from 8 to 392. The dashed
line represents the straightforward implementation while the solid
line represents the efficient implementation proposed here.}
\end{figure}
%
\begin{figure}
\hfill{}\includegraphics[width=4in]{Figures/time_4}\hfill{}
\caption[~~~Execution time of two implementations of $HUX$]{\label{fig:huxTime} Execution time of two implementations of $HUX$.
The plot shows the execution time in seconds of $10^{6}$ crossovers
for individuals with a number of loci ranging from 8 to 392. The broadly
dashed line represents the straightforward implementation, the solid
line represents the efficient implementation proposed here, and the
finely dashed line represents the efficient implementation with the
precomputed distributions for the crossover masks.}
\end{figure}
%
\begin{figure}[p]
\hfill{}\subfloat[]{
\includegraphics[width=4in]{Figures/time_6}}\hfill{}
\hfill{}\subfloat[]{
\includegraphics[width=4in]{Figures/time_5}}\hfill{}
\caption[~~~Execution time of two implementations of $BFM$]{\label{fig:bfmTime} Execution time of two implementations of $BFM$.
Both plots show the execution time in seconds of $10^{6}$ mutations
for individuals with a number of loci ranging from 8 to 392. The dashed
line represents the straightforward implementation while the solid
line represents the efficient implementation proposed here. (a) Probability
of mutation $10^{-3}$; (b) probability of mutation $10^{-5}$.}
\end{figure}
\afterpage{\clearpage}
As discussed at the end of Section \ref{sec:Crossover}, the $HUX$
operator incurs an initialization cost in order to set up two distributions
for sampling from the distributions given by Equations \ref{eq:distJ} and
\ref{eq:distM}. The distribution associated with Equation \ref{eq:distJ}
is quadratic in size with respect to the number of loci, which implies
a quadratic initialization time as well. Figure \ref{fig:huxTime}
shows that this initialization cost is still small enough that the
total execution time of the efficient implementation remains below
that of the straightforward implementation. Moreover, if the distribution
is precomputed and read from a file at initialization time, a further
speedup can be achieved, as shown by the finely dashed curve in Figure
\ref{fig:huxTime}.
The most dramatic increase in performance is observed for the mutation
operator, especially when the mutation probability is very small (as
is typically the case in biologically realistic simulations). This
can be seen in the contrast between the running time of the efficient
implementation with a mutation probability of $10^{-3}$ in Figure
\ref{fig:bfmTime}(a) and with a mutation probability of $10^{-5}$
in Figure \ref{fig:bfmTime}(b).
\subsection{Statistical analysis}
One of the goals of this chapter is to create efficient implementations
of some commonly used genetic operators. The other goal is that
these implementations are guaranteed, at least from a statistical
point of view, to be equivalent to their straightforward versions.
To this end, a statistical test was used to assess the equality
of two sampled distributions. The reasons for the statistical
testing are twofold. On one hand, the correctness of the implementation
can be qualitatively ensured. On the other hand, the use of the special
random number generator, and in particular the calculation of very small
probabilities, could cause numerical instabilities in the implementations.
The test used was described by \citet{kup60}, and is a variant of
Pearson's $\chi^{2}$ goodness-of-fit test. Note that in this section,
$\chi^{2}$ will not denote a crossover operator, but the probability
distribution (chi squared) and its associated statistic.
Formally, given two samples $\{X_{i}\}_{i=1,\dots,N_{1}}$ and $\{Y_{j}\}_{j=1,\dots,N_{2}}$
of discrete random variables with the same discrete support $\{S_{k}\}_{k=1,\dots,n}$,
let \[
f_{i}=\sum_{j=1}^{N_{1}}[X_{j}=S_{i}]\,,\,\,\, g_{k}=\sum_{j=1}^{N_{2}}[Y_{j}=S_{k}]\]
where we use the following special notation, \[
[x]=\begin{cases}
1 & \mbox{if }x\mbox{ is true}\\
0 & \mbox{otherwise}\end{cases}\]
Note that $f_{i}$ and $g_{i}$ are the number of times that the value
$S_{i}$ is observed in the first and second sample, respectively.
Moreover, \[
\sum_{i=1}^{n}f_{i}=N_{1}\,,\,\,\,\sum_{i=1}^{n}g_{i}=N_{2}\]
Now, following \citet{kup60}, the statistic to test the equality
of both samples is given by \begin{equation}
\chi^{2}=\frac{(N_{1}+N_{2})^{2}}{N_{1}N_{2}}\left(\sum_{i=1}^{n}\frac{f_{i}^{2}}{f_{i}+g_{i}}-\frac{N_{1}^{2}}{N_{1}+N_{2}}\right)\label{eq:chisq}\end{equation}
and whenever the two samples come from the same distribution, then
the statistic $\chi^{2}$ has distribution $\chi^{2}$ with $n-1$
degrees of freedom.
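Equation \ref{eq:chisq} translates directly into code. A minimal C sketch, assuming the frequencies $f_{i}$ and $g_{i}$ have already been tallied (the function name is illustrative):

```c
#include <assert.h>

/* Two-sample chi-squared statistic of Equation (eq:chisq):
   f[] and g[] are the observed frequencies of the n support
   values in the first and second sample, respectively. */
static double chisq_stat(const long *f, const long *g, int n) {
    double N1 = 0.0, N2 = 0.0, s = 0.0;
    for (int i = 0; i < n; i++) { N1 += f[i]; N2 += g[i]; }
    for (int i = 0; i < n; i++)
        if (f[i] + g[i] > 0)          /* empty cells contribute nothing */
            s += (double)f[i] * f[i] / (f[i] + g[i]);
    return (N1 + N2) * (N1 + N2) / (N1 * N2)
           * (s - N1 * N1 / (N1 + N2));
}
```

Under the null hypothesis, the returned value is compared against the quantiles of the $\chi^{2}$ distribution with $n-1$ degrees of freedom; two identical samples yield a statistic of exactly zero.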
In the case of the crossover operators studied in this chapter, the
support of the distribution varies depending on the parents' genotypes.
The actual distribution also depends, for both mutation and crossover
operators, on their parameters. Note that, since a mutation can produce
any genotype from any starting genotype, the support is the whole
space of genotypes. For this reason, and for convenience, statistical
equivalence was only assessed for 8, 16 and 24 loci. This keeps the
memory requirements for calculating the observed frequencies of offspring
and mutated individuals at a reasonable size.
For all operators we conducted 20 repetitions of $10^{6}$ applications
of the genetic operator to random individuals. Table \ref{tab:stats}
summarizes the results of the statistical analysis. All differences
with a $p$-value smaller than $0.05$ were deemed significant. Therefore,
roughly one run out of the 20 is expected to be found significantly
different by chance alone. The efficient implementations actually match
better than this, with only about $2\%$ of the runs found statistically different.
%
\begin{table}
\hfill{}\begin{tabular}{|c|c|c|c|}
\hline
Operator & $L=8$ & $L=16$ & $L=24$\tabularnewline
\hline
\hline
$WCX$ & 1 & 0 & 1\tabularnewline
\hline
$1X$ & 0 & 1 & 0\tabularnewline
\hline
$2X$ & 0 & 0 & 1\tabularnewline
\hline
$HUX$ & 0 & 1 & 0\tabularnewline
\hline
$BFM$ & 1 & 0 & 1\tabularnewline
\hline
\end{tabular}\hfill{}
\vspace*{\bigskipamount}
\caption[~~~Summary of statistical results for genetic operators]{\label{tab:stats} Summary of statistical results for genetic operators.
The entries in the table represent the number of runs, out of 20,
that showed statistically significant (at $5\%$) difference in the
observed frequencies of offspring. The results are given for three
numbers of loci ($L$): 8, 16 and 24.}
\end{table}
To assess the sensitivity of this statistical test, the parameter of
one of the two implementations was changed by $0.01$, and then
the test was run again. In this case, at least 19 runs in each category
were found to be significantly different. This strongly suggests not
only that the test is effective at detecting differences in the observed
frequencies, but also that the efficient implementations match
the straightforward ones.
\section{Discussion}
Genetic operators are potential bottlenecks of genetic algorithms
and biological simulations. The efficient implementations given in
this chapter offer a speedup that varies depending on the operator
from a factor of two to over a factor of ten. It is important to
note that, while the algorithmic complexity of these operators was
not reduced, a significant improvement in both memory usage and execution
time was obtained. Moreover, the techniques used to optimize the operators
are general and can be applied to other operators. Also, since the random
number generator, the efficient implementations of the genetic operators
discussed here, and the program for statistical analysis are all open
source, any custom implementation can benefit from the same framework
and be easily tested for accuracy.
It was stated that most of the implementations discussed here benefit
from reduced memory usage by packing several loci into a byte. The
one exception is the $HUX$ crossover operator, which might
require more memory than the straightforward implementation, depending
on the number of individuals used in the simulation. Given that the
memory requirements of the distributions used by $HUX$ depend on
the number of loci of the individuals, if only a small number of individuals
with a large number of loci is required, the overhead may exceed the
benefit of packing several loci into a byte. Memory constraints, however,
were not found to be prohibitive. For instance, having 2048 loci per
individual incurs an overhead of less than 500 MB of RAM.
Aside from memory overheads, there are also initialization costs to
the approach given in this chapter. However, as discussed before,
such costs can be amortized over a long run. Since most distributions
required by the implementations are problem independent (as is the
case of the skip distribution for mutation and the distribution required
by $HUX$), it is possible to have a database in secondary storage
(like a hard drive) with the precomputed distributions.
The efficient implementations presented here are already in use in
simulations developed by \citet{gav07a}, \citet{gav07b} and \citet{due09}.
By making the programs open source, it is hoped that other researchers
will use them and benefit from the speedups obtained. This initiative
is pursued in the hope of promoting both the use of tested components
that can be easily incorporated into new simulations and evolutionary
algorithms, and the development of efficient implementations of other
genetic operators that could be contributed back for the use of the
scientific community.