Cellular Automata Implemented on FPGA Based on Totalistic Rules for Deterministic Systems

Alexander Bautista-Torres¹, Edgar Alexander Bautista-Aleman² and Helbert Eduardo Espitia-Cuchango³

¹,²,³Facultad de Ingeniería, Universidad Distrital Francisco José de Caldas, Bogotá, Colombia.

Abstract

This paper presents the design and implementation of a cellular automata based on totalistic rules for dynamic deterministic systems. The implementation is made on FPGA and the simulation results are shown via a software user interface. With this development, the parallelism of the FPGA is used for the simulation of dynamic systems by means of cellular automata. The results show that the proposed system obtains the simulation of the dynamic system using less time than conventional software of cellular automata.

Keywords: Cellular automata, dynamic system, FPGA, simulation.

I. INTRODUCTION

Commonly, phenomena and interactions among the different parts of a system (namely physical, biological, etc.) are described in models that in a global manner, explain the behavior of each part. Even though such approximations can be technically appropriate, those lack of viability in terms of implementation. In this regard, cellular automata arise as an alternative for the simulation in these models taking a group of cells at different stages as a start point and an upgrade rule aiming the prediction of behavior in time of a particular system.

Cellular Automata CA were initially proposed in the fifties by John von Neumann [1] and Konrad Zuse [2] and later developed in the eighties by the physicist Stephen Wolfram [3]. A CA is considered a dynamic system in discrete time in which the interactions among cells are governed by an upgrade or transition rule [4]. In itself, the concept of CA is associated with concepts like space and locality of influence [5], [6], and is widely employed in the simulation of dynamic systems of a different order (physics biology, sociology, and chemistry, among others) [4], [7], [8].

Currently, there are software applications that allow executing cellular automata of diverse kinds (NETLOGO, SPASIM, STARLOGO, SR-CA [9]). Such applications are useful when the number of evolutions and/or the number of cellular stages is small; nevertheless, when some of these parameters arise, the execution time of the CA gradually increases depending on the software type of sequential processing [4], which tends to be inefficient when having considerable increments [10].

There are several implementations of CA built on FPGA (Field Programmable Gate Array) [10-18]. Some of them pose models to evaluate and improve the performance while others employ code generators based on a high-level language to get VHDL/VERILOG code for a particular CA.

These implementations result to be more efficient when executed on parallel processing platforms than when using software applications [10], especially when the number of evolutions and/or stages of the CA arise; however, those lack versatility for the absence of a suitable graphic interface to visualize results and change the parameters of the simulation of the CA, moreover, it is necessary that the user interact with tools of FPGA/C++ [13].

Consequently, this document proposes the design of hardware architecture of a CA for a maximum size of 10.000 cells, which is executed in a time considerably less extended than when using software applications and dynamically displaying the results using software designed with Java. This architecture is applied to CA deterministic based on totalistic Boolean rules and cells of four stages aiming the most optimal usage of the parallelism of the FPGA for the simulation of dynamic systems through cellular automata.

II. METHODOLOGY

This section describes the design and implementation of the system for a simulation of cellular automata on FPGA, including the revision of the concepts of cellular automata, the tool for developing the FPGA, and lastly, the design and implementation of the cellular automaton is displayed in detail.

II.I Cellular Automata

A cellular automaton refers to a discrete dynamic system formed by a set of cells distributed in a grid which have a finite set of stages that change according to an upgrade rule applied in steps of discrete time [19]. CAs are uniform since the cells are upgraded from the same set of rules, parallel since all the cells evolve simultaneously, and local since the further stage of each cell depends on its current stage and the stage of the neighbors [4]. Fig. 1 shows the graphic representation of a single cellular automaton.
The CA execution algorithm consists of three steps. First, the assignment of initial stages to the cells, second, the execution of the transition rule on each one, and third, the cells upgrade according to the result obtained in the second step [19].

This process must be executed concurrently on the cells [10]. Every time that these three steps are made represents the accomplishment an evolution process.

CAAs can be classified according to their rules in deterministic and probabilistic, or according to the spatial distribution in one-dimensional, two-dimensional, or three-dimensional; thus, the main components of a CA are grid, cells [3], transition rules [4], neighborhood [20], and border conditions [21]. For practical issues, this document considers two-dimensional deterministic CAs, and border conditions fixed or recurring.

II. Development Tool

While implementing the CA the parallelism is aimed during the execution, therefore, the Micro Blaze Development Kit Spartan-3E1600E card was employed which is a robust development platform that allows multiple projects of a different kind. The board has an FPGA Spartan3E with 1600K gates. It also includes a Flash Xilinx platform, USB interfaces and parallels for programing JTAG with multiple configuration options for the FPGA using an Intel Strata Flash and a Serial Flash ST [22].

II. III Cellular Automaton Design

Two principles are necessary for the design of the hardware architecture: first, the number of cells, which is required to be large enough to have a wider concept of the dynamic of the behavior, and second, the time interval between two consecutive evolutions (time of evolution) that needs to be as small as possible to accelerate the execution (especially when the number of evolutions is high).

From the perspective of the hardware employed in the proposed architecture, each cell of the CA needs to have an arithmetic-logic unit for solving the rule of transition stages which would demand a large number of hardware resources, making almost impossible implementing a CA of 100X100 cells in an FPGA Spartan-3E1600E.

Thus, for solving this issue a semi-parallel processing architecture is proposed which executes blocks of 100 cells in a recurring way.

II. IV System General Architecture

The system consists of a software and hardware module interacting exchanging information on the process of execution of the CA. Fig. 2 presents a system general diagram of blocks.

II. V Software Module

This includes an application developed using Java, this module communicates with the FPGA using RS232 to transmit and receive information of the CA. The graphic interface permits the user simulate and configure the parameters in the CA (number of evolutions, type of neighborhood whether Von Neumman or Moore and 100X100 maximum size); likewise, it allows visualizing the results after a set of evolutions programed.

II. VI Hardware Module

This module includes an FPGA SPARTAN3E1600 and contains algorithms programmed in VHDL for the execution of the CA and communication processes with the software. Internally, the module is composed by the block of communications, which is established by a half-duplex channel between the FPGA and the software module through the RS232, and ASCII character commands; next is the processing information block that codes and decodes the signals interchanging with the software module and the cellular automaton block in charge of the execution. Fig. 3 shows the way of operation of the blocks.
II.VII Cellular Automaton Architecture

The architecture here presented is designed for executing four CA based on deterministic models with totistic rules of maximum four stages. Fig. 4 presents a diagram of the internal distribution of this block.

Fig. 4. Composition of the cellular automaton block.

Fig. 4 shows the RAM Unit Memory (RAM-UM) which storages the cells current and future stages in two volatile memory spaces, the Unit of Control of Memory (UCM) is also seen, this is in charge of the reading/writing processes, and the Arithmetic Logic Unit (ALU) corresponding to a one-dimensional of a hundred (100) cells based on logical-mathematical algorithms that concurrently execute one of the four CA.

II.VIII Execution Process

The evolutive process of a CA is made of three steps: Reading of the current stage in each and the surrounding cells, calculations of the future stage of each cell through the CA model, and data update of the different stages of each cell. These processes are sequential and are executed concurrently on a one-dimensional vector. The maximum length of the vector is 100 cells; however, it varies according to the CA executed. This process is repeated sequentially as many times as the size of the CA indicates. In cases where the CA is 50X80, the sector may have 50 cells and the three steps would be repeated 80 times. Fig. 5 shows an example of this situation.

Fig. 5. CA execution process by sectors.

In Fig. 5, the example displays a CA of 10X10 whose upgrade rule is the game of life, green cells are neighbors of the CA extremes and define the border conditions; Fig. 5-a presents the CA with zero evolutions in a time $t$, while Fig. 5-b exposes the same CA with one evolution. The step from zero to one evolution happens in a time equals to $t + 12k$, $k$ is the product of the lapse of the clock signal multiplied by the number of clock cycles used when performing the three processes.

Likewise, the time taken by a CA for a single step is given by:

$$T_{\text{Evolution}} = t_0 + (N + 2) \cdot T_{\text{Clock}} \cdot N_{\text{Cycles}}$$  \hspace{1cm} (1)

Here, $t_0$ is the instant where the automaton starts another evolution, $t_0$ which means the lapse of the system clock which comes of the Central Processing Unit (CPU) of the FPGA, and $N_{\text{Cycles}}$ is the number of cycles employed in the execution of read-write processes and cells upgrade. Such implementation takes place with a 100 MHz clock. To manage the two memory spaces during the CA process is necessary to consider that each evolution uses one memory space to read the current stages and the other to write the future stages; thus, it is proposed that the memory space limited by 0X00 and 0X65 (0 and 101) is employed to read the current stages when the number of evolutions is even, and to write the future stages when the number of evolutions is odd. On the other hand, the memory space two, limited by 0X66 and 0xCB is used in reading mode when the number of evolutions is odd and in writing mode when even. When the first evolution takes place, initial stages of the CA must be written in the memory space 1; Table 1 shows the ranges of different directions and the possibilities.

Table 1. Memory space configurations.

<table>
<thead>
<tr>
<th>Range of Directions</th>
<th>Even Evolutions</th>
<th>Odd Evolutions</th>
<th>Memory Space</th>
</tr>
</thead>
<tbody>
<tr>
<td>0X00-0X65</td>
<td>Read</td>
<td>Write</td>
<td>One</td>
</tr>
<tr>
<td>0X66-0xCB</td>
<td>Write</td>
<td>Read</td>
<td>Two</td>
</tr>
</tbody>
</table>

According to this, there must be the same number of consecutive evolutions and iterations in accordance with the number of sectors in the grid. Below the review of the steps taken in each iteration:

**Step one:** Read and store the stages of the cells of a sector which is the same number of iterations, that is, iteration one is chosen from sector one, iteration two is chosen from sector two, and so on.

**Step two:** Next, read and store the stages of the cells hosted in the previous sector and next to the chosen sector. This is done to obtain eight neighbors for each cell of the selected sector in step 1. Reading operations made in step 1 and 2 are made in the reading sector of the memory space.

**Step three:** After obtaining the stages of the cells located in the selected sector with those of the neighbors, calculations and storage can be made on the future stage of the cells using the general transition rule.
**Step four:** Once obtained the future stage of the cells, such stages are written in the same memory space predetermined for writing.

According to this, each iteration needs to generate four different directions that vary according to parameters of the number of evolutions and the number of the sector selected in each iteration; where \( N \) represents the evolution stage while \( S \) is the number of the sector selected in each iteration, thus, four directions may be generalized as:

\[
ADD_{secr}(N,S) = \begin{cases} S, N = 2n - 1 \\ S + 102, N = 2n \\ S - 1, N = 2n - 1 \\ (S - 1) + 102, N = 2n \end{cases} \\
ADD_{vec1}(N,S) = \begin{cases} S + 1, N = 2n - 1 \\ (S + 1) + 102, N = 2n \end{cases} \\
ADD_{vec2}(N,S) = \begin{cases} S + 102, N = 2n - 1 \\ S, N = 2n \end{cases} \tag{2}
\]

Where \( n \in \mathbb{Z}^+ \) and \( ADD_{secr} \) is the direction of the sector memory. Meanwhile, \( ADD_{vec1} \) and \( ADD_{vec2} \) are the directions of memory from where the neighbors of the cells are obtained and \( ADD_{secw} \) is the direction of the sector where the future stages of the cells of the sector selected are written. Table 2 displays value ranges that can take the aforementioned directions whether the number of evolutions is even or odd for the CA implemented.

**Table 2.** Range of directions in each iteration.

<table>
<thead>
<tr>
<th>Direction</th>
<th>Range of directions for even evolutions</th>
<th>Range of directions for odd evolutions</th>
</tr>
</thead>
<tbody>
<tr>
<td>ADDsecr</td>
<td>0X67 - 0XCA 103 - 202</td>
<td>0X01 - 0X64 1 - 100</td>
</tr>
<tr>
<td>AADvec1</td>
<td>0X66 - 0XC9 102 - 201</td>
<td>0X00 - 0X63 0 - 99</td>
</tr>
<tr>
<td>AADvec2</td>
<td>0X68 - 0XD5 104 - 203</td>
<td>0X02 - 0X65 2 - 101</td>
</tr>
<tr>
<td>ADDsecw</td>
<td>0X01 - 0X64 1 - 100</td>
<td>0X67 - 0XCA 103 - 202</td>
</tr>
</tbody>
</table>

III. RESULTS

Firstly, the comparison is made using conventional software to simulate cellular automata. Later, as an example, the results delivered for one of the models implemented are qualitatively shown using graphics and a table containing the associated characteristics with the respective simulation.

The execution of the experiments considered different automata models like:

- Model 1: Lotka-Volterra (predator-prey).
- Model 2: Greenberg and Hastings.
- Model 3: Population growth with limited resources.

III.I Execution Time

Software SR-CA is employed aiming to have a comparison for the time employed in the execution, this software is widely used for cellular automaton [9]. This applet is developed using Java since this permits a set of predetermined rules, including the virtual life game; this software includes nine predetermined square grids with 16X16 as minimum and 4096X4096 as the maximum size.

Using it as a pattern of comparison includes the average measurement of the time employed by the software to complete an evolution; for this purpose, a fixed lapse of 10 seconds is considered then counting the numbers of evolutions achieved during that lapse. Table 3 shows different measurements with this tool.

**Table 3.** Time evaluation of the software SR-CA.

<table>
<thead>
<tr>
<th>Time</th>
<th>10 sec</th>
<th>10 sec</th>
<th>10 sec</th>
<th>10 sec</th>
<th>10 sec</th>
</tr>
</thead>
<tbody>
<tr>
<td>Evolutions</td>
<td>7526</td>
<td>7552</td>
<td>7742</td>
<td>7292</td>
<td>7451</td>
</tr>
<tr>
<td>Time</td>
<td>10 sec</td>
<td>10 sec</td>
<td>10 sec</td>
<td>10 sec</td>
<td>10 sec</td>
</tr>
<tr>
<td>Evolutions</td>
<td>7417</td>
<td>7380</td>
<td>7574</td>
<td>7517</td>
<td>7403</td>
</tr>
</tbody>
</table>

According to the experimental data in Table 3, the average number of evolutions is 7485.4; therefore, the average lapse of one evolution is 1.33 milliseconds.

Meanwhile, Table 4 shows the times employed for an evolution in the system developed for the models considered.

**Table 4.** Times of one evolution of the models in microseconds.

<table>
<thead>
<tr>
<th>Execution</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
</tr>
</thead>
<tbody>
<tr>
<td>Model 1</td>
<td>13,194</td>
<td>15,057</td>
<td>13,624</td>
<td>13,243</td>
</tr>
<tr>
<td>Model 2</td>
<td>17,597</td>
<td>11,047</td>
<td>10,959</td>
<td>10,595</td>
</tr>
<tr>
<td>Model 3</td>
<td>17,597</td>
<td>11,047</td>
<td>10,959</td>
<td>11,047</td>
</tr>
</tbody>
</table>

From Table 4, considering that the average time for execution is 12,997 microseconds, and the time taken by the software is 1,336 milliseconds, then, it is observable the advantage provided by an architecture dedicated only for performing simulations on cellular automata.

III.II Greenberg and Hastings Simulation Model

Greenberg and Hastiong developed a model with cellular automata simulation the reaction of Belousov-Zhabotinsky, the model employs four neighbors [23], the possible stages are \( \{0, 1, 2\} \) where:

- 0 is the rest stage or global equilibrium.
- 1 or 2 is the active stage.
- 3 represent the refractory stage.

The last one is characteristics since once achieved the finite automata will be insensitive to neighbors and unable to activate them. The transition rule was established by
Greenberg and Hasting in the original model [23] according to the expression:

\[
c_{i,j}(t + 1) = R[c_{i,j}(t)] + D[c_{i-1,j}(t), c_{i+1,j}(t), c_{i,j-1}(t), c_{i,j+1}(t)]
\]  (3)

Where,

\[
R = \begin{cases} 2, & \text{if } c_{i,j}(t) = 1 \\ 0, & \text{otherwise} \end{cases} \]  (4)

\[
D = \begin{cases} k, & \text{if } c_{i,j}(t) = 0 \\ 0, & \text{otherwise} \end{cases} \]  (5)

In equation (5) \(k = 1\) if there is at least one active neighbor cell in the Von Neumman neighborhood, otherwise, \(k = 0\). It is noteworthy that \(R\) refers to the process of reaction and \(D\) to the process of diffusion of the equation reaction-diffusion of Greenberg and Hastings.

Multiple stages of refractory and excitement can be present when employing Greenberg and Hastings model. The simulations made of this model are performed using a resting 0 stage, two stages of excitement 1, 2, and a refractory stage 3.

Fig. 6 displays the configuration of initial stages where the blue cells represent the resting stage, magenta cells the stage of excitement, and the green cells represent the refractory stage. It is noteworthy that the neighborhood of Von Neumann and constant border conditions were used for the simulation.

The results obtained after subdued the initial pattern to ten evolutions are shown in Fig. 7 where it is observed how the waves are generated in the center of the grid and move toward the exterior part of it in a way that allows clear identification of the processes of reaction-diffusion of the model Greenberg and Hastings. Note that the pattern of spatial distribution shows certain homogeneity in the central part which is shown in the wave shape.

After 63 evolutions, Fig. 9 shows how the waves are generated from the center of the grid and move toward the external part. Here, the diffusion characteristic in the Greenberg and Hastings model is more notorious displaying also homogeneity in the wave generated from the central pattern of the grid. Note how the changes mold the expanded wave starting from the cells in the excitement stage. There is also a great density of cells in the refractory stage that pose a prompt stabilization of the expanded wave.

For 100 evolutions, Fig. 10 displays an identical pattern respect from Fig. 9, which indicates that the wave arrived to a
stable state; therefore, both processes of reaction and diffusion remain constant.

The convergence of the software application and hardware development allowed the creation of a tool with strengths in both sides, having a user interface to interact with the tool and hardware platform that optimizes the execution time of the cellular automaton.

An improvement of the system may well modify the standard of communication to optimize the connection between the modules of software and hardware as there exists a more universal standard than RS-232 such as USB and TCP/IP.

Acknowledgements

The authors express their gratitude to the Facultad de Ingeniería de la Universidad Distrital Francisco José de Caldas.

REFERENCES


