The primary objective of this article series is to develop a mechanism to transfer data quickly and reliably between a microcontroller (uC) and a Field Programmable Gate Array (FPGA). In this installment we describe the communications protocol and the attributes of the associated frame.
There are many ways to design a uC to FPGA interface. Let’s frame this conversation by first establishing a general set of design requirements:
- minimize the number of wires between the uC and FPGA
- allow targeted (addressable) exchange of information between the uC and FPGA
- accommodate several thousand addressable FPGA registers
- optimize the message length
- ease of programming in the uC without excessive complexity in the FPGA
- high speed
- provide a measure of data integrity
- cross-platform compatible, i.e., not Xilinx specific
Notice that minimal utilization of FPGA resources is not one of the design requirements. The reasons for this apparent omission are explored in Part 1 of this series. Recall that all FPGA module outputs are registered and synchronous within a clock domain. This design stipulation is considered an acceptable tradeoff balancing consumption of FPGA fabric against design stability and reduced troubleshooting time. While the RTL described in this article is not optimized, it consumes a small fraction of the total resources available in the Digilent Basys 3’s Xilinx Artix-7 XC7A35T-1CPG236C FPGA.
The design and its requirements are best visualized in terms of a hardware and data protocol. This is analogous to layers 1 and 2 of the Open Systems Interconnection (OSI) model. The layer 1 physical level describes the electrical specifications. In this design, low wire count implies that we could use I2C, SPI, dual SPI, or even quad SPI. From these options, SPI stands out as it allows full-duplex communication and is easy to program from within the uC. As an aside, we will assume the FPGA can process data faster than the uC, thereby eliminating the need for a flow control mechanism.
The layer 2 data link level requires us to define a protocol for the exchange of data. This layer is concerned with framing of data into appropriately sized blocks along with appropriate headers and error detection. Note that layer 2 and layer 1 are independent of each other as layer 2 is not directly concerned with the electrical interface specified in layer 1. In the future, this may allow our SPI interface to be replaced with a quad SPI or even an octa SPI provided the byte-wide interface between layers is preserved.
To my knowledge, there is no established protocol for data exchange using SPI. Device-specific implementations are the norm. As an example, a sensor may have a manufacturer- or device-specific protocol to read and write data. This would include rules for data exchange involving a single register or a block of registers. Select devices may incorporate a Cyclic Redundancy Check (CRC), allowing a measure of data integrity for data read and write operations. Finally, most SPI devices operate with a half-duplex protocol. A typical SPI transfer begins with a byte containing a read / write flag. The device will then take appropriate action on subsequent clock octets. It is rare to find full duplex with simultaneous transmission on the MOSI and MISO lines.
From the uC perspective, a full-duplex SPI master is easy to achieve. Here is an Arduino implementation including the framing of the SPI transfer within the chip select signal.
```cpp
// prepare buffer for SPI transmission
// append CRC
digitalWrite(CS_PIN, LOW);
SPI.transfer(buffer, bufferSize); // full-duplex: each uC byte in buffer is replaced with an FPGA byte
digitalWrite(CS_PIN, HIGH);
// verify and handle CRC
// unpack and handle FPGA data
```
Before we continue, it’s useful to examine an 802.3 Ethernet frame as it serves as a starting point for a SPI protocol. As seen in Figure 1, the Ethernet frame begins with a header containing the preamble, MAC destination plus source, and the length fields. Next is the payload which can vary from 46 to 1500 bytes. This is followed by a 32-bit CRC code.
Figure 1: Simplified 802.3 Ethernet frame.
We can reduce the size of the Ethernet frame to better suit the uC to FPGA interface. With SPI, there is no need for the preamble and addressing. The SPI CS_not control line and the one-to-one nature of SPI make these fields redundant. The size field and the variable length payload are desirable features that will be retained. This will allow the uC to read or write a single FPGA register. It will also allow a block of data to be transferred. Finally, the CRC will allow a measure of data validation.
For our purposes, the size of the fields may be reduced. A reasonable compromise is to use a single byte to identify the length of the payload. This naturally limits the frame to a length of 256 elements including space for the header and CRC. Considering the reduction in frame length, a 16-bit CRC is reasonable.
Recall that SPI provides a full-duplex communication channel. As has been shown, this is not overly complicated from a uC programming perspective. However, to leverage this full-duplex mode we must now determine how to address specific hardware inside the FPGA.
To use this method, the FPGA hardware must be addressable. For instance, the FPGA training boards’ LEDs may be associated with a base address – two bytes for the 16 LEDs installed on the Basys 3. The board’s 16 slide switches could be associated with another two addresses. With this construct, the FPGA acts as if it were just another peripheral to the uC. The uC reads and writes to FPGA hardware using the frame protocol via SPI.
For the remainder of this article, we will refer to the FPGA hardware as a peripheral. This reflects the design where the uC effectively uses the FPGA as a high-performance peripheral via the SPI interface.
One of the design requirements is scalability, with the ability to address a few to several thousand FPGA registers. Given this requirement, a one-byte address would be too constrictive, limiting us to 256 items. However, a 2-byte address would be acceptable as it allows the uC to address up to 65536 items within the FPGA.
We could stop at this point and identify a header for our SPI frame. It would contain a byte indicating the type of command (read or write), a byte for the payload length, two bytes for the address, a variable length payload and a CRC. The corresponding header is 4 bytes in length, the payload may be up to 250 bytes, and the CRC is 2 bytes.
This may be sufficient for many designs. However, this method carries an implicit assumption of half-duplex. To understand why, consider the causality associated with the SPI transfer. Suppose the register controlling the board’s 16 LEDs was located at base address 0x0200. If we were to initiate a SPI transfer on this register, we would effectively read the FPGA register and then write to it. The read operation would be pointless as it would contain old data. Here we assume double buffering is implemented on all FPGA registers to prevent instability associated with the update of multi-byte registers.
For full-duplex we need to add another addressing field. Suppose we have a register that holds the FPGA board’s switch status located at address 0x0202. If we add another addressing field to the frame, we can now perform a meaningful read and write operation.
We can now read the switch status while simultaneously updating the LEDs.
The new header is 5 bytes in length, the payload may be up to 249 bytes, and the CRC is 2 bytes. Note that the command byte is no longer necessary.
- Payload Length: (1 byte) number of bytes in the header and payload
- To Address Field: (2 bytes) identifies FPGA register to be written
- From Address Field: (2 bytes) identifies FPGA hardware to be read
- Payload: (1 to 249 bytes)
- CRC: (2 bytes)
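The field list above can be sketched in code from the uC side. This is a minimal sketch rather than the article's implementation: the helper name `buildCommandFrame` is hypothetical, and the byte order of the two address fields (most significant byte first) is an assumption. The CRC is deliberately left out so that it can be appended as a separate step.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Field sizes taken from the frame description.
constexpr std::size_t kHeaderBytes = 5;       // length + to address + from address
constexpr std::size_t kMaxPayloadBytes = 249;

// Hypothetical helper: packs the header and payload of a command frame.
// Assumption: both addresses are sent most significant byte first.
// The 2-byte CRC is not added here.
std::vector<std::uint8_t> buildCommandFrame(std::uint16_t toAddress,
                                            std::uint16_t fromAddress,
                                            const std::vector<std::uint8_t>& payload) {
    if (payload.empty() || payload.size() > kMaxPayloadBytes) {
        throw std::invalid_argument("payload must be 1 to 249 bytes");
    }
    std::vector<std::uint8_t> frame;
    frame.reserve(kHeaderBytes + payload.size());
    // Length field: number of bytes in the header and payload.
    frame.push_back(static_cast<std::uint8_t>(kHeaderBytes + payload.size()));
    // To address: FPGA register to be written.
    frame.push_back(static_cast<std::uint8_t>(toAddress >> 8));
    frame.push_back(static_cast<std::uint8_t>(toAddress & 0xFF));
    // From address: FPGA hardware to be read.
    frame.push_back(static_cast<std::uint8_t>(fromAddress >> 8));
    frame.push_back(static_cast<std::uint8_t>(fromAddress & 0xFF));
    frame.insert(frame.end(), payload.begin(), payload.end());
    return frame;
}
```

For example, `buildCommandFrame(0x0200, 0x0202, {0xA5, 0x5A})` writes two bytes to the LED register while reading the switches. It yields 7 bytes, which becomes 9 bytes on the wire once the CRC is appended.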
A close comparison between the two options reveals a one-byte difference. The half-duplex frame with its read / write command byte would require an 8-byte transfer to write to the FPGA board’s 16 LEDs. The corresponding full-duplex operation requires 9 bytes. However, the design can now simultaneously read the switches as it writes to the LEDs. Depending on how we look at this, we either wasted one byte in the process or saved eight plus the time overhead associated with a separate read transaction.
After considering these design trade-offs, we will focus exclusively on the full-duplex option. The uC overhead and FPGA programming overhead are inconsequential. The speed difference for small payload transfers is minimal. For large payloads the change is inconsequential.
There is one downside to this full-duplex protocol that must be identified. Every data transfer now involves a write operation. There will be times when it is inconvenient to do so. Rather than confound a write operation, we will reserve an empty / nonfunctional block of FPGA addresses from locations 0xFF00 to 0xFFFF. When a read-only operation is desired, the uC will “write” to this empty location. With this simple modification, the full-duplex protocol may be operated as a half-duplex read operation.
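A read-only transfer could be prepared as sketched here. The helper name `buildReadOnlyFrame`, the use of zero as the filler value, and the most-significant-byte-first address order are all assumptions made for illustration. The filler bytes exist only so the uC clocks out enough bytes for the FPGA to return the requested data.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Start of the reserved, nonfunctional address block (0xFF00 to 0xFFFF).
constexpr std::uint16_t kDummyWriteAddress = 0xFF00;

// Hypothetical helper: header plus filler payload for a read-only transfer.
// Assumptions: addresses are sent MSB first and the filler bytes are zero.
// The 2-byte CRC is not added here.
std::vector<std::uint8_t> buildReadOnlyFrame(std::uint16_t fromAddress,
                                             std::size_t readBytes) {
    if (readBytes == 0 || readBytes > 249) {
        throw std::invalid_argument("readBytes must be 1 to 249");
    }
    std::vector<std::uint8_t> frame;
    frame.push_back(static_cast<std::uint8_t>(5 + readBytes));  // header + payload
    frame.push_back(static_cast<std::uint8_t>(kDummyWriteAddress >> 8));
    frame.push_back(static_cast<std::uint8_t>(kDummyWriteAddress & 0xFF));
    frame.push_back(static_cast<std::uint8_t>(fromAddress >> 8));
    frame.push_back(static_cast<std::uint8_t>(fromAddress & 0xFF));
    // Filler payload; the FPGA "writes" it into the empty block.
    frame.insert(frame.end(), readBytes, std::uint8_t{0});
    return frame;
}
```

Starting the dummy write at 0xFF00 keeps even a full 249-byte payload inside the reserved block, since the last byte lands at 0xFFF8.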
The command and response frames for the full-duplex uC to FPGA protocol are shown in Figure 2. The header content is closely related between the frames. We see that the read address and length header information are echoed in the response. Note that there is a causality issue associated with the response. Since the fields are streamed with simultaneous byte-by-byte transfers, the response will be delayed. As an example, the length field, which is the first byte sent, cannot be the first field in the response because it has not yet been received when the FPGA locks in the first byte to be sent. This is not an issue as the write address is not important to the response frame. Instead, the FPGA can send program specific flags while the uC is streaming the length in the first byte.
The remainder of the frame fields are similar. The FPGA response will stream data starting with the base read address. Meanwhile, the FPGA will move the uC data to a temporary buffer. The respective CRCs are calculated and appended to the frames.
Figure 2: Relationship between the command and response frames.
A measure of data integrity is provided by the CRC. Recall that the CRC is applied to the frame contents. Let’s consider the three-step process from the uC perspective. The uC will first prepare a frame buffer with the associated data. It will then call a function to calculate and append the CRC to the buffer. Finally, the uC will call a function to transfer the data over SPI.
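The second step, calculating and appending the CRC, might look like the sketch below. The article does not name a CRC polynomial, so this example assumes CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF). It also assumes the CRC covers the header and payload and is appended most significant byte first. The function names `crc16` and `appendCrc` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitwise CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF,
// no bit reflection, no final XOR. The choice of CRC is an assumption.
std::uint16_t crc16(const std::uint8_t* data, std::size_t length) {
    std::uint16_t crc = 0xFFFF;
    for (std::size_t i = 0; i < length; ++i) {
        crc ^= static_cast<std::uint16_t>(data[i] << 8);
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 0x8000) {
                crc = static_cast<std::uint16_t>((crc << 1) ^ 0x1021);
            } else {
                crc = static_cast<std::uint16_t>(crc << 1);
            }
        }
    }
    return crc;
}

// Appends the CRC of the current frame contents, MSB first (assumed order).
void appendCrc(std::vector<std::uint8_t>& frame) {
    const std::uint16_t crc = crc16(frame.data(), frame.size());
    frame.push_back(static_cast<std::uint8_t>(crc >> 8));
    frame.push_back(static_cast<std::uint8_t>(crc & 0xFF));
}
```

With this particular CRC and byte order, running `crc16` over a complete frame including its appended CRC returns zero. That gives the receiver a convenient way to check a frame as it streams in.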
The FPGA will place the streaming data into a buffer while simultaneously calculating the command frame CRC. When the frame is complete the FPGA will compare the received and the calculated command frame CRC. If they match, the FPGA will quickly transfer the data from the buffer to the associated hardware registers. This is a relatively quick operation requiring approximately 5 µs for a full payload. If the received and calculated CRC are different the FPGA will discard the frame. It will also trigger a frame error mechanism.
A frame error counter is included for troubleshooting purposes as seen in the Figure 2 response frame. Provisions could also be added for a physical wire leading from the FPGA to the uC. This would allow the uC a fast method to detect and then respond to frame errors.
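The buffer, verify, then commit behavior can be illustrated with a small software model. This is a behavioral sketch in C++ only, not the RTL. The struct `FpgaReceiverModel`, the 32-bit width of the error counter, the MSB-first address order, and the CRC-16/CCITT-FALSE polynomial are all assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// CRC-16/CCITT-FALSE (assumed; the article does not name the polynomial).
std::uint16_t crc16(const std::uint8_t* data, std::size_t length) {
    std::uint16_t crc = 0xFFFF;
    for (std::size_t i = 0; i < length; ++i) {
        crc ^= static_cast<std::uint16_t>(data[i] << 8);
        for (int bit = 0; bit < 8; ++bit) {
            crc = static_cast<std::uint16_t>((crc & 0x8000) ? ((crc << 1) ^ 0x1021)
                                                            : (crc << 1));
        }
    }
    return crc;
}

// Behavioral model only: the whole frame acts as the temporary buffer, and
// registers are touched only after the CRC has been verified.
struct FpgaReceiverModel {
    std::vector<std::uint8_t> registers = std::vector<std::uint8_t>(65536);
    std::uint32_t frameErrorCount = 0;  // counter width is an assumption

    // Returns true when the frame was accepted and committed to the registers.
    bool receiveFrame(const std::vector<std::uint8_t>& frame) {
        const std::size_t kHeader = 5;
        const std::size_t kCrc = 2;
        bool valid = frame.size() >= kHeader + 1 + kCrc;
        if (valid) {
            const std::size_t body = frame.size() - kCrc;  // header + payload
            const std::uint16_t received =
                static_cast<std::uint16_t>((frame[body] << 8) | frame[body + 1]);
            valid = (frame[0] == body) && (crc16(frame.data(), body) == received);
        }
        if (!valid) {
            ++frameErrorCount;  // discard the frame and record the error
            return false;
        }
        // Commit the buffered payload starting at the write address (MSB first).
        const std::uint16_t toAddress =
            static_cast<std::uint16_t>((frame[1] << 8) | frame[2]);
        for (std::size_t i = kHeader; i < frame.size() - kCrc; ++i) {
            registers[(toAddress + (i - kHeader)) & 0xFFFF] = frame[i];
        }
        return true;
    }
};
```

A corrupted frame leaves `registers` untouched and only increments `frameErrorCount`, which is the property the temporary buffer is there to guarantee.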
The importance of the FPGA’s temporary frame buffer cannot be overemphasized. The FPGA will not act on new data until it has been validated by the CRC. This important topic will be covered in another article installment.
The integrity of the response frame requires careful programming in the uC. We must first recognize that the FPGA will immediately respond to the uC read request long before the command frame has been verified. As a result, it is up to the uC to verify the integrity of the FPGA developed data as well as the FPGA’s understanding of the command. This is done with an echo of the frame length and base read address in the response frame. The uC can identify errors by comparing the response frame to the data in the command frame.
For example, suppose there was a bit error in the command frame’s read address field as it crosses from the uC to the FPGA. There is a good measure of integrity in the uC to FPGA write operation as the error has a very high likelihood of detection by the FPGA’s CRC machinery. The FPGA will disregard the frame. At the same time, we must consider causality associated with the FPGA’s parallel structures.
The FPGA will stream the read data long before it knows the command frame is corrupt. In fact, it will dutifully stream data starting from the erroneous read address. It will even append a valid CRC to the frame.
When the transaction is complete the uC will have a full copy of the response frame. Before acting, it must validate the data. This is a two-step process:
- The uC must first verify that the length and read address fields echoed in the response match those it transmitted.
- It must then verify the response frame’s CRC.
When both operations are complete, the uC can either accept the data or take appropriate action to handle the communications error.
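The two-step check can be sketched as follows. The exact byte positions of the echoed fields are defined by Figure 2 and are not reproduced here, so the sketch takes them as a hypothetical `ResponseLayout` parameter. The CRC-16/CCITT-FALSE polynomial, the MSB-first byte order, and the assumption that the response CRC covers every preceding response byte are also illustrative choices.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// CRC-16/CCITT-FALSE (assumed; the article does not name the polynomial).
std::uint16_t crc16(const std::uint8_t* data, std::size_t length) {
    std::uint16_t crc = 0xFFFF;
    for (std::size_t i = 0; i < length; ++i) {
        crc ^= static_cast<std::uint16_t>(data[i] << 8);
        for (int bit = 0; bit < 8; ++bit) {
            crc = static_cast<std::uint16_t>((crc & 0x8000) ? ((crc << 1) ^ 0x1021)
                                                            : (crc << 1));
        }
    }
    return crc;
}

// Hypothetical: byte offsets of the echoed fields within the response frame.
struct ResponseLayout {
    std::size_t lengthEchoOffset;
    std::size_t readAddressEchoOffset;  // two bytes, MSB first (assumed)
};

enum class ResponseStatus { Ok, TooShort, EchoMismatch, CrcMismatch };

// command: the bytes the uC sent (length at [0], read address at [3] and [4]).
// response: the bytes clocked back from the FPGA, ending in a 2-byte CRC.
ResponseStatus validateResponse(const std::vector<std::uint8_t>& command,
                                const std::vector<std::uint8_t>& response,
                                const ResponseLayout& layout) {
    const std::size_t kCrcBytes = 2;
    if (command.size() < 5 || response.size() <= kCrcBytes) {
        return ResponseStatus::TooShort;
    }
    const std::size_t body = response.size() - kCrcBytes;
    if (layout.lengthEchoOffset >= body || layout.readAddressEchoOffset + 1 >= body) {
        return ResponseStatus::TooShort;
    }
    // Step 1: the echoed length and read address must match what was sent.
    const bool lengthMatches = response[layout.lengthEchoOffset] == command[0];
    const bool addressMatches =
        response[layout.readAddressEchoOffset] == command[3] &&
        response[layout.readAddressEchoOffset + 1] == command[4];
    if (!lengthMatches || !addressMatches) {
        return ResponseStatus::EchoMismatch;
    }
    // Step 2: the response frame's own CRC must be valid.
    const std::uint16_t received =
        static_cast<std::uint16_t>((response[body] << 8) | response[body + 1]);
    if (crc16(response.data(), body) != received) {
        return ResponseStatus::CrcMismatch;
    }
    return ResponseStatus::Ok;
}
```

Returning a distinct status for each failure lets the uC decide whether to retry the transfer or escalate. Only `ResponseStatus::Ok` means the FPGA data is safe to use.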
Your comments and suggestions are welcome. Further discussion about high-level RTL system design methodology is especially welcome.