Implementing a Robust Microcontroller to FPGA SPI Interface: Part 4 - Double Buffer

This installment continues our exploration of a microcontroller (uC) to Field Programmable Gate Array (FPGA) interface.

  • Part 1 introduces a Verilog design philosophy that guides the development of larger systems. This is a critical piece introducing Register Transfer Level (RTL) design guidelines such as clock boundaries, use of strobes, and the necessity of double buffers.

  • Part 2 presents the SPI protocol. Recall that the chosen protocol is adapted from the 802.3 Ethernet frame with concepts such as variable payload length and a Cyclic Redundancy Check (CRC) to provide a measure of data integrity.

  • Part 3 presents a high-level view of the uC to FPGA interface. The most important part of that article is the block diagram repeated here as Figure 1 for your convenience.

This installment focuses on the upper right-hand side of the block diagram, specifically the operation of the expandable double buffer.

Figure 1: Top-level FPGA block diagram showing the FPGA’s data flow.

Review of the PWM

Before introducing the double buffer, we will briefly explore the operation of the Verilog Pulse Width Modulator (PWM). This is important, as the double buffer is best viewed as an addressable interface to a hardware module such as the PWM.

The top-level interface for the PWM module is described in this Verilog code snippet. Observe that the module uses parameters to set the bit width and to limit the minimum and maximum duty cycle. Also observe that the PWM module has a [B - 1:0] input vector to set the duty cycle. Not shown is the fact that the input is read once, at the start of each PWM period.

module PWM #(parameter 
    B = 12,                             // implies a 24.4 kHz PWM assuming a 100 MHz system clock
    D_MIN_PERCENT = 0,
    D_MAX_PERCENT = 95
)
(
    input wire clk,
    input wire enable,
    input wire [B - 1:0] d_in,
    output reg PWM,
    output reg [B - 1:0] cnt
);
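
The period-start read can be pictured with the following minimal sketch. It is not the PWM from this series: the module name pwm_sketch, the d_latched register, and the omission of the D_MIN_PERCENT and D_MAX_PERCENT limits are all assumptions made for illustration. The counter wraps every 2^B clocks, so B = 12 with a 100 MHz clock gives 100 MHz / 4096 ≈ 24.4 kHz, in agreement with the comment in the listing above.

```verilog
// Hypothetical sketch, not the PWM module from this series.
module pwm_sketch #(parameter B = 12) (
    input  wire           clk,
    input  wire           enable,
    input  wire [B - 1:0] d_in,
    output reg            pwm_out,
    output reg  [B - 1:0] cnt
);
    reg  [B - 1:0] d_latched;                 // duty cycle held for the current period
    wire [B - 1:0] d_active = (cnt == 0) ? d_in : d_latched;

    initial begin
        pwm_out   = 1'b0;
        cnt       = {B{1'b0}};
        d_latched = {B{1'b0}};
    end

    always @(posedge clk) begin
        if (!enable) begin
            cnt     <= {B{1'b0}};
            pwm_out <= 1'b0;
        end else begin
            if (cnt == 0)
                d_latched <= d_in;            // read the input only at the start of a period
            pwm_out <= (cnt < d_active);      // high for d_active out of every 2^B clocks
            cnt     <= cnt + 1'b1;            // wraps naturally after 2^B clocks
        end
    end
endmodule
```

Because d_in is sampled only when cnt is zero, a change made in the middle of a period takes effect at the next period. That protects a period already in progress, but it offers no protection if d_in itself is assembled from bytes that arrive at different times.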

Simultaneous Data Presentation

As shown in Figure 1, the PWM is designed to operate within the greater uC to FPGA SPI system. Recall that SPI naturally operates with byte-width data elements. This is in stark contrast to the PWM, which operates using a data width defined by B. For convenience, let’s assume that the PWM was instantiated with a bit width (B) of 16 bits.

A problem arises when the system updates the registers associated with the PWM’s input. Without proper attention, the PWM may perform a read operation in the middle of an update. The result is a torn value assembled from one old byte and one new byte. This can cause a significant jump in duty cycle lasting for a single PWM cycle. This may go unnoticed if the PWM is used for an LED indicator. In more complex systems, the fault is equivalent to a strong impulse and could cause the system to ring or become unstable depending on when and how often the error occurs.
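
A small simulation makes the hazard concrete. The module and signal names below are hypothetical, and the example assumes the MSB is written first. Moving from 16'h0FFF to 16'h1000 is a change of a single count, yet the value visible between the two byte writes is 16'h1FFF, roughly double the intended duty cycle.

```verilog
// Simulation-only illustration with hypothetical names: two byte registers read as one word.
module torn_update_demo;
    reg  [7:0]  msb = 8'h0F;                  // old value is 16'h0FFF (4095)
    reg  [7:0]  lsb = 8'hFF;
    wire [15:0] unbuffered = {msb, lsb};      // what a consumer sees without a double buffer

    initial begin
        #10 $display("before update: %h", unbuffered);   // 0fff
        msb = 8'h10;                                     // first byte of the new value 16'h1000
        #10 $display("mid update   : %h", unbuffered);   // 1fff, neither old nor new
        lsb = 8'h00;                                     // second byte completes the update
        #10 $display("after update : %h", unbuffered);   // 1000
    end
endmodule
```

If the PWM happens to sample its input during the middle step, it runs one full period at 8191 counts instead of 4095 or 4096.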

The solution is to implement a double buffer scheme, as mentioned in the previous articles and discussed in depth below. One set of registers is used to capture the individual bytes. A second, wider register is updated when all N bytes of data have been gathered. This second register, the double buffer, is then used to drive other modules such as the representative PWM.

Double Buffer Module

The block diagram for the double buffer module is included as Figure 2. Internally, it consists of four major sections. The most important is the output register. In this example, it is 16 bits wide, making it suitable for driving the 16-bit PWM. The output register is driven by individual 8-bit registers, labeled LSB and MSB in this example. Observe that all register updates are initiated by the double buffer’s control section. This is a synchronous operation where all elements respond to the tick of the primary 100 MHz clock.

Figure 2: Block diagram of the double buffer showing the relationships between the individual 8-bit buffers and the output buffer.

It is important to understand that each double buffer module is instantiated with a specific base address and a particular byte width, as shown in this code listing. Note that the 16-bit address, 8-bit data, and write strobe are all involved in loading the buffer. The data transfer begins when the 16-bit address input matches the instantiated address.

module double_buffer #(
    parameter BYTE_WIDTH = 2,
    parameter BASE_ADDRESS = 16'h0200  // Starting address for the MSB
) (
    input wire clk,
    input wire [7:0] data,
    input wire [15:0] address,
    input wire write_strobe,
    output reg [((8 * BYTE_WIDTH) - 1): 0] double_buffer_out,
    output reg new_data_strobe
);

As implied by Figures 1 and 2, there is an underlying 8-bit transfer process for this uC to FPGA interface. There is also a consecutive write operation implicit in the command frame first introduced in Part 2. The command frame is repeated here as Figure 3 for convenience. As an example, let’s assume the double buffer associated with the PWM was instantiated with base address 0x0200. The command frame’s write address would be set to 0x0200 and the first two bytes of the payload would hold the desired 16-bit PWM value.

Figure 3: Command and response frames that form the foundation of the uC to FPGA SPI protocol.

When the command frame is received and validated, the MSG Write block (see Figure 1) will assert address 0x0200, which points to the PWM’s double buffer. It will place the first payload byte onto the data bus. Finally, it will assert the write strobe for one clock cycle. This loads the MSB as shown in Figure 2 (big-endian).

Continuing with the consecutive write, the MSG writer advances the address, asserts the next data byte, and then pulses the write strobe, thereby loading the LSB into the double buffer. This process continues for every byte in the command frame as governed by the frame’s byte length field.

Intrinsically, the message writer does not understand the length of the associated double buffer(s). It is only concerned with the three-step process of asserting address, data, and write strobe. It is up to the double buffer module(s) to understand when they are addressed and when they have received the requisite number of bytes as specified by the BYTE_WIDTH parameter.
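
The three-step process can be sketched as a simulation-only model. This is not the MSG Write block from Figure 1: the module name, the write_byte task, and the payload value 16'h1234 are assumptions chosen for illustration. The model presents address and data, pulses the strobe for one clock, and then repeats with the address advanced by one.

```verilog
`timescale 1ns/1ps
// Simulation-only model of the three-step write; all names are hypothetical.
module msg_writer_model;
    reg        clk = 1'b0;
    reg [15:0] address = 16'h0000;
    reg [7:0]  data = 8'h00;
    reg        write_strobe = 1'b0;

    always #5 clk = ~clk;                     // 100 MHz system clock

    // One byte: present address and data, then hold the strobe for one clock.
    task write_byte(input [15:0] a, input [7:0] d);
        begin
            @(negedge clk);
            address      = a;
            data         = d;
            write_strobe = 1'b1;
            @(negedge clk);
            write_strobe = 1'b0;
        end
    endtask

    // Log each byte as a listening double buffer would see it on a rising edge.
    always @(posedge clk)
        if (write_strobe) $display("write %h -> address %h", data, address);

    initial begin
        write_byte(16'h0200, 8'h12);          // first payload byte, the MSB
        write_byte(16'h0201, 8'h34);          // address advanced, the LSB
        #20 $finish;
    end
endmodule
```

A double buffer instantiated at 16'h0200 with BYTE_WIDTH = 2 would capture both bytes, while a buffer at any other address would ignore them. The writer never needs to know which is the case.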

Since the base address and byte width of the double buffer are known upon instantiation, it’s easy to determine when all bytes have been received. In this PWM example, the double buffer counts to two and then sends a strobe to load the output register.

Tech Tip: Multi-byte data may be transferred Most Significant Byte (MSB) first or Least Significant Byte (LSB) first. The term for this byte order is “endianness.” If the MSB occurs first, the system is big-endian. If the LSB is first, the system is little-endian. The double buffer and associated frame as described in this article are big-endian.
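
As a quick illustration of the two byte orders, the sketch below assembles the same two received bytes both ways. The module and signal names are hypothetical.

```verilog
// Simulation-only illustration of byte order; names are hypothetical.
module endian_demo;
    reg  [7:0]  first_byte  = 8'h12;          // byte received at the lower address
    reg  [7:0]  second_byte = 8'h34;          // byte received at the next address
    wire [15:0] big_endian    = {first_byte, second_byte};   // first byte is the MSB
    wire [15:0] little_endian = {second_byte, first_byte};   // first byte is the LSB

    initial #1 $display("big-endian: %h  little-endian: %h", big_endian, little_endian);
    // prints "big-endian: 1234  little-endian: 3412"
endmodule
```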

Double Buffer Code

The double buffer’s Verilog code is attached to the end of this note. The code closely follows the Figure 2 block diagram with the understanding that it may be expanded to an N-byte width. This is accomplished by changing the BYTE_WIDTH parameter.

The key to this code is the Verilog generate construct. Recall that generate allows hardware to be created iteratively. It operates like a factory making widgets, except that in this case we are making 8-bit registers, with the total number equal to the BYTE_WIDTH parameter. We can see this in the Vivado hierarchical design window captured here as Figure 4. These “manufactured” blocks appear with the consecutive naming scheme defined in the generate loop.

Figure 4: The generated byte-width registers are seen in the double-buffer instantiation.

Observe that each generated 8-bit register includes a corresponding local_write_strobe. This is an important design aspect as it is used by the “control” section to load the associated 8-bit register.

In addition to the registers, the generate loop manufactures an 8-bit vector to which the output of each 8-bit register is connected. These N by 8-bit bundles are then concatenated and passed to the N-byte output register.
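
A reduced example may help to see what the generate loop manufactures. The module below is a hypothetical sketch, separate from the attached listing: it builds N registers inside a named loop and packs their outputs into one flat vector. The indexed part-select [8*i +: 8] means "8 bits starting at bit 8*i and counting upward," so index 1 lands on bits 15 down to 8.

```verilog
// Hypothetical reduced example of a generate loop; not the attached double buffer.
module generate_demo #(parameter N = 3) (
    input  wire             clk,
    input  wire [N - 1:0]   load,             // one load enable per register
    input  wire [7:0]       data,
    output wire [8*N - 1:0] q_flat            // all register outputs, packed
);
    genvar i;
    generate
        for (i = 0; i < N; i = i + 1) begin : gen_regs
            reg [7:0] q = 8'h00;              // elaborates as gen_regs[0].q, gen_regs[1].q, ...

            always @(posedge clk)
                if (load[i]) q <= data;

            assign q_flat[8*i +: 8] = q;      // bits 8*i+7 down to 8*i
        end
    endgenerate
endmodule
```

The block label gen_regs and the loop index form the hierarchical names, which is the same mechanism behind the consecutive names seen in the tool's design hierarchy.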

The final part of the code determines when N bytes have been collected. It then updates the output register and sends a new_data_strobe.

The control section has four basic functions:

  1. Activate the module when the incoming address matches the instantiated base address.
  2. Maintain a counter to point to the “manufactured” 8-bit register. This counter is essential for consecutive writes.
  3. Strobe the associated 8-bit register.
  4. Strobe the output buffer when N 8-bit registers have been filled.

Tech Tip: A vector is a one-dimensional array of wires. An example is “input wire [15:0] address” which defines a 16-bit vector named address.
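
For readers new to vectors, the short sketch below declares one and picks it apart with part-selects and a bit-select. The module name and the value 16'h0201 are arbitrary choices for illustration.

```verilog
// Simulation-only illustration of vector slicing; names and values are arbitrary.
module vector_demo;
    reg  [15:0] address = 16'h0201;           // a 16-bit vector
    wire [7:0]  high_byte = address[15:8];    // part-select: upper 8 wires
    wire [7:0]  low_byte  = address[7:0];     // part-select: lower 8 wires
    wire        bit_zero  = address[0];       // bit-select: a single wire

    initial #1 $display("high=%h low=%h bit0=%b", high_byte, low_byte, bit_zero);
    // prints "high=02 low=01 bit0=1"
endmodule
```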

Closing for Part 4

While this code is certainly complicated, the Verilog generate construct allows a great deal of flexibility. It eliminates the need to construct independent modules for each desired byte width.

In the next installment we will explore the operation of a Verilog-based CRC generator. When that article is complete you will see how the frame as presented in Figure 3 is handled by the FPGA.

Your comments and suggestions are welcomed. Further discussion about high-level RTL system design methodology is especially welcome.

Best Wishes,

APDahlen

//**************************************************************************************************
//
// Module: Parameterized and Addressed Double Buffer
//
//  This RTL is subject to Terms and Conditions associated with the 
//  DigiKey TechForum. Refer to:
//  https://www.digikey.com/en/terms-and-conditions?_ga=2.92992980.817147340.1696698685-139567995.1695066003
//
//  Should you find an error, please leave a comment in the forum space below.
//  If you are able, please provide a recommendation for improvement.
//
//**************************************************************************************************
//           ______________________________________________
//          |                                              |
//          |   Module: double_buffer                      |
//          |                                              |
//          |    Parameters:                               |
//          |        BYTE_WIDTH = 4                        |
//          |        BASE_ADDRESS = 16'h0000               |
//          |______________________________________________|
//          |                                              |
//      ==8=| data                       double_buffer_out |=V==
//     ==16=| address                      new_data_strobe |----
//          |                                              |
//      ----| write_strobe                                 |
//          |                                              |
//      ----| clk                                          |
//          |______________________________________________|
//
//
//** Description ***********************************************************************************
//
//  This module works with a stream of parallel data. When the streamed address matches the 
//  parameterized address, the module stores the data byte in a buffer. The 
//  module then accepts consecutive data, storing each byte in the associated buffer. When the
//  module has received BYTE_WIDTH bytes, the data are transferred to the output buffer. 
//   
//** Instantiation *********************************************************************************
//
//  double_buffer #(.BYTE_WIDTH(2), .BASE_ADDRESS(PWM_1_address) )
//    PWM_1_driver( 
//      .clk(clk), 
//      .data(MSG_writer_data), 
//      .address(MSG_writer_address),
//      .write_strobe(MSG_writer_strobe),
//      .double_buffer_out(PWM_1_drive),
//      .new_data_strobe(new_data_strobe)               // optional
//  );
//
//** Signal Inputs: ********************************************************************************
//
//  1) clk: High speed system clock (typically 100 MHz)
//
//  2) data: An 8-bit input. Data will be captured when address is within the range
//     BASE_ADDRESS to BASE_ADDRESS + BYTE_WIDTH (non-inclusive).  
//
//  3) address: A 16-bit input. The module will respond to BYTE_WIDTH consecutive addresses
//     starting at BASE_ADDRESS and extending to BASE_ADDRESS + BYTE_WIDTH (non-inclusive).  
//
//  4) write_strobe: When pulsed, the data will be latched into the associated input buffer
//     provided the address is within BASE_ADDRESS to BASE_ADDRESS + BYTE_WIDTH (non-inclusive).  
//
//** Signal Outputs ********************************************************************************
//
//  1) double_buffer_out: This BYTE_WIDTH-byte register is the output buffer of the double buffer
//     module. It is updated in a single clock cycle after all the individual buffers have been
//     updated. There is a one clock delay between filling the last byte-wide buffer and the
//     double buffer update.  
//
//  2) new_data_strobe: a pulse active for the same clock cycle in which the double_buffer is updated.
//
//** Comments **************************************************************************************
//
//  1) TODO, Consider eliminating the one clock cycle delay and eliminating the input register
//     for the last-received byte. Use care for a single byte instantiation.
//
//**************************************************************************************************
 
module double_buffer #(
    parameter BYTE_WIDTH = 4,
    parameter BASE_ADDRESS = 16'h0000  // Starting address for the MSB
) (
    input wire clk,
    input wire [7:0] data,
    input wire [15:0] address,
    input wire write_strobe,
    output reg [((8 * BYTE_WIDTH) - 1): 0] double_buffer_out,
    output reg new_data_strobe
);
 
//** CONSTANT DECLARATIONS *************************************************************************
 
    /* General shortcuts */
        localparam T = 1'b1;
        localparam F = 1'b0;
 
//** Body *******************************************************************************************
 
wire [8*BYTE_WIDTH-1:0] concat_result;

reg delay_one_clk;
 
wire [7:0] buffer_data_out[BYTE_WIDTH-1:0]; // Array of 8-bit wires
 
// Generate N 8-bit registers

    generate
        genvar i;
        for (i = 0; i < BYTE_WIDTH; i=i+1) begin: gen_regs
 
        wire local_write_strobe = write_strobe && (address == BASE_ADDRESS + i); // Asserted only when the address matches
 
            local_8bit_reg register_inst (
                .clk(clk),
                .data(data),
                .write_strobe(local_write_strobe),
                .q(buffer_data_out[i])
            );
        end
    endgenerate
 
// Concatenate the outputs of the N 8-bit buffers. The frame is big-endian, so the
// register at BASE_ADDRESS (index 0) drives the most significant byte.

    generate
        genvar j;
        for (j = 0; j < BYTE_WIDTH; j=j+1) begin: gen_concat
            assign concat_result[(BYTE_WIDTH - 1 - j)*8 +: 8] = buffer_data_out[j];
        end
    endgenerate
 
 
    always @(posedge clk) begin           // Delay one clock for the double buffer operation to 
                                          // allow data to be clocked into the last of the 
                                          // individual registers.

        new_data_strobe <= F;             // Default 
        delay_one_clk <= F;   
 
        if (write_strobe && (address == (BASE_ADDRESS + BYTE_WIDTH - 1))) begin
            delay_one_clk <= T;
        end

        if (delay_one_clk) begin
            double_buffer_out <= concat_result;
            new_data_strobe <= T;
        end
    end
endmodule
 
module local_8bit_reg (
    input wire clk,
    input wire [7:0] data,
    input wire write_strobe,
    output reg [7:0] q
);
 
always @(posedge clk) begin
    if (write_strobe) begin
        q <= data;
    end
end
endmodule
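
A short testbench is a convenient way to watch the sequence described in this article. The sketch below is hypothetical: the testbench name, the write_byte task, and the payload bytes are assumptions, and it must be compiled together with the double_buffer listing above. It writes two consecutive bytes starting at 16'h0200, confirms that no new_data_strobe appears after the first byte alone, and confirms that exactly one appears after the second.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench; compile together with the double_buffer listing above.
module double_buffer_tb;
    reg         clk = 1'b0;
    reg  [7:0]  data = 8'h00;
    reg  [15:0] address = 16'h0000;
    reg         write_strobe = 1'b0;
    wire [15:0] pwm_drive;
    wire        new_data_strobe;
    integer     strobes = 0;

    double_buffer #(.BYTE_WIDTH(2), .BASE_ADDRESS(16'h0200)) dut (
        .clk(clk),
        .data(data),
        .address(address),
        .write_strobe(write_strobe),
        .double_buffer_out(pwm_drive),
        .new_data_strobe(new_data_strobe)
    );

    always #5 clk = ~clk;                     // 100 MHz system clock

    // Count how many clock edges see new_data_strobe asserted.
    always @(posedge clk)
        if (new_data_strobe) strobes = strobes + 1;

    // Emulate the message writer: address, data, one-clock strobe.
    task write_byte(input [15:0] a, input [7:0] d);
        begin
            @(negedge clk);
            address      = a;
            data         = d;
            write_strobe = 1'b1;
            @(negedge clk);
            write_strobe = 1'b0;
        end
    endtask

    initial begin
        write_byte(16'h0200, 8'h12);
        if (strobes !== 0) $display("FAIL: output updated after only one byte");
        write_byte(16'h0201, 8'h34);
        repeat (3) @(negedge clk);            // allow for the one clock delay
        if (strobes !== 1) $display("FAIL: expected exactly one new_data_strobe");
        $display("double_buffer_out = %h after %0d strobe(s)", pwm_drive, strobes);
        $finish;
    end
endmodule
```

The printed value also shows which of the two bytes ended up in the most significant position, which is a quick check of the byte order.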