How to Program a Blocking Delay Function in the 8051 Microcontroller

Readers may be familiar with the delay( ) function featured in the Arduino IDE. It’s a simple function that provides a blocking delay applicable to all members of the Arduino line of microcontrollers. As you transition to bare-metal microcontroller programming, you may find yourself looking for similar code. Unfortunately, you are unlikely to find such a delay function as part of the 8051’s “standard library.”

In this article we will briefly explore hardware delay methods and then present an 8051 Busy Bee solution using a rigidly defined set of assumptions.

Justification for the apparently missing 8051 delay functions

Perhaps the greatest reason for this apparent delay( ) omission is flexibility. Understand that the Arduino code is in control of a large section of the underlying microcontroller. Clock speed and dedicated timers are all predefined and hidden in the background. This hidden consistency allows a simple programming experience where the delay( ) function performs the same across all platforms. To do otherwise would require the Arduino programmer to know a great deal about the architecture and operation of peripherals. Such a requirement goes against the Arduino purpose of providing accessibility to beginners and hobbyists.

Things are very different in the typical 8051 environment. Nothing is hidden and you are generally expected to configure all the necessary peripherals. You are also in charge of the clock and free to choose from a high-speed external oscillator, an internal oscillator, a 32.768 kHz source, or even a deep sleep with only the watchdog timer to periodically wake up the microcontroller. Any one of these choices would change the timing of, or even stop, a function such as delay( ) that was written for a different clock.

Locate a copy of the Busy Bee reference manual and let’s get started.

Tech Tip: The term blocking implies that the main code of the microcontroller is blocked (does nothing) for the entire duration of the delay. This is generally acceptable for small delays and simple problems but can lead to unacceptable operation. For example, a microcontroller will be unresponsive to a button press while a blocking delay is in progress. Alternatives include interrupts and non-blocking delays.

There are many ways to construct a delay routine

Let’s start with a recognition that there are many ways to construct a delay function in a microcontroller. A short list may include:

  • carefully constructed assembler code featuring a NOP (do nothing instruction). Here the programmer would calculate the microcontroller’s clock cycles based on the characteristics of each assembled command.

  • hardware timer with a vectored interrupt. The operation associated with the delay could be embedded into the Interrupt Service Routine (ISR) or the interrupt could maintain a system time similar to the Arduino millis( ) function.

  • free running hardware timer without interrupt. Here the hardware timer is constantly running. This is similar to watching a wall clock with a modulo 60 operation. Suppose the current time is 50 seconds, and you want a 20 second delay. You would then wait until the second hand reached 10 seconds. Such a solution in the 8051 would be modulo 256 or 65536 depending on whether an 8-bit or 16-bit timer mode was selected. The “speed” of the operation is dependent on the system clock and timer prescale configuration.

  • controlled hardware timer without using interrupts. With this method the user program will stop the timer, preload the timer to a known value, enable the timer, and then wait for the overflow to occur.

There are advantages and disadvantages to each option. The choice is largely dependent on the needs of the individual project, skill of the programmer, future maintainability, and available hardware resources. For example, the hardware timer with a vectored interrupt is a very good solution if the project requires a consistent heartbeat. This precision periodic timing would be good for a Proportional Integral Derivative Controller or waveform generation using a DAC.
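The wall-clock analogy for the free-running timer can be made concrete with a few lines of portable C. This is only a sketch of the modular arithmetic, not code from the Busy Bee project: the function names ticks_elapsed and delay_expired are hypothetical, and on real hardware the "now" value would come from reading the timer's count registers.

```c
#include <stdint.h>
#include <stdbool.h>

/* Ticks elapsed between two readings of a free-running 16-bit counter.
 * Unsigned subtraction truncated to 16 bits is modulo 65536, so the
 * answer stays correct even when the counter has rolled over once
 * between the two readings. (Hypothetical helper for illustration.) */
static uint16_t ticks_elapsed(uint16_t start, uint16_t now)
{
    return (uint16_t)(now - start);
}

/* True once at least wait_ticks have passed since start.
 * Only valid for waits shorter than one full counter period. */
static bool delay_expired(uint16_t start, uint16_t now, uint16_t wait_ticks)
{
    return ticks_elapsed(start, now) >= wait_ticks;
}
```

For example, a reading of 65530 followed by a reading of 4 is an elapsed time of 10 ticks, just as 50 seconds plus 20 seconds lands on 10 seconds on the wall clock.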

Focus on the blocking delay

In this post we will present a solution using a hardware timer without interrupt. Recall that a microcontroller’s hardware timer typically counts up just like a clock’s second hand. The Silicon Labs EFM8BB1 8051 used in this code’s development features a collection of timers including four 16-bit general purpose hardware timers with backwards compatibility to the standard 8051.

The 16-bit timer has a modulo 2^{16} operation. Like a clock that counts from 0 to 59 before returning to 0, the 16-bit timer will count from 0 to 65535 before rolling back to 0. This overflow event is special as the hardware will automatically set an interrupt flag. This operation is suggested in the highly simplified Figure 1 block diagram.

Figure 1: Simplified block diagram of the Busy Bee timer #2.

Tech Tip: A peripheral’s interrupt flag does not automatically initiate an interrupt. It will only do so if the associated interrupt enable bit is set in the interrupt or extended interrupt enable register.

This interrupt flag is key to the blocking delay described in the code appended to this article. The operation of the delay is described as follows:

  1. Configure the timer with the appropriate prescale and mode, here 1:1 and 16-bit auto reload respectively. Also alias the run control and interrupt flag using the sbit operator. These operations are performed once before the while(1) superloop.

  2. Turn off the timer and clear the interrupt flag.

  3. Load the timer with the desired delay value.

  4. Enable the timer’s run bit.

  5. Spin in a while loop (do nothing) until the interrupt flag is set. This operation blocks all other main( ) code until the flag is set.
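Steps 2 through 5 can be rehearsed on a PC before touching the hardware. The sketch below is a software model only: sim_timer_t, sim_tick, and sim_blocking_delay are hypothetical names that stand in for the timer count registers, the run bit, and the overflow flag. It is not the Busy Bee code itself, which is appended to this article.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical software model of a 16-bit up-counting timer. */
typedef struct {
    uint16_t count;     /* stands in for the 16-bit timer count       */
    bool     running;   /* stands in for the run control bit          */
    bool     overflow;  /* stands in for the overflow interrupt flag  */
} sim_timer_t;

/* Advance the model by one timer clock. */
static void sim_tick(sim_timer_t *t)
{
    if (!t->running) {
        return;
    }
    t->count++;                 /* wraps from 0xFFFF to 0x0000 */
    if (t->count == 0) {
        t->overflow = true;     /* flag is set on the rollover */
    }
}

/* Steps 2 to 5 of the list above. Returns the number of simulated
 * timer clocks spent spinning in the wait loop. */
static uint32_t sim_blocking_delay(sim_timer_t *t, uint16_t ticks)
{
    uint32_t waited = 0;

    t->running  = false;                 /* step 2: stop the timer    */
    t->overflow = false;                 /*         and clear the flag */
    t->count    = (uint16_t)(0u - ticks);/* step 3: preload 65536 - ticks */
    t->running  = true;                  /* step 4: start the timer   */

    while (!t->overflow) {               /* step 5: spin until flag   */
        sim_tick(t);                     /* hardware does this for us */
        waited++;
    }
    t->running = false;
    return waited;
}
```

In the model, a preload of 65536 minus 122 makes the loop spin for exactly 122 timer clocks. On the real part the hardware advances the counter, so the loop body is empty.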

Tech Tip: The original 8051 is a unique architecture with the ability to operate on registers at the bit level. This results in fast code without the need to use masks to select a specific bit within a register. In C, this is accomplished using the Keil C51 sbit keyword.

This bit-addressing capability is retained in derivatives such as the EFM8BB1. However, the newer products contain significantly more peripherals and associated SFRs than the original 8051. Unfortunately, not all registers are bit addressable. Only registers whose addresses end in 0x0 or 0x8 can use this convenient and fast sbit operation. A careful review of the reference manual reveals that TMR2CN0, with an address of 0xC8, is bit-addressable.
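The address rule and the mask-based alternative can be illustrated in portable C. This is a sketch under stated assumptions: the helper names are hypothetical, and the registers are modeled as plain 8-bit values rather than real SFRs. The bit positions (2 for run control, 7 for overflow) follow the sbit definitions in the appended listing.

```c
#include <stdint.h>
#include <stdbool.h>

/* SFRs occupy addresses 0x80 to 0xFF. Those whose address ends in
 * 0x0 or 0x8 (a multiple of 8) are bit-addressable. */
static bool sfr_is_bit_addressable(uint8_t addr)
{
    return addr >= 0x80 && (addr & 0x07) == 0;
}

/* Masks for registers that are NOT bit-addressable, or for portable code. */
#define RUN_BIT_MASK  (1u << 2)   /* same position as TMR2CN0^2 */
#define OVF_BIT_MASK  (1u << 7)   /* same position as TMR2CN0^7 */

/* Hypothetical helpers operating on a copy of the register value. */
static uint8_t set_bits(uint8_t reg, uint8_t mask)
{
    return (uint8_t)(reg | mask);
}

static uint8_t clear_bits(uint8_t reg, uint8_t mask)
{
    return (uint8_t)(reg & (uint8_t)~mask);
}

static bool test_bits(uint8_t reg, uint8_t mask)
{
    return (reg & mask) != 0;
}
```

With sbit, each of these read-modify-write sequences collapses into a single bit instruction, which is why the bit-addressable registers are preferred for flags that are polled in a tight loop.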

Calling and math overhead associated with the blocking delay

Before concluding this note, we need to consider the overhead associated with calling the function and calculating the reload value. For example, this little piece of code poses a significant problem:

    tmr_load = -((n_us * 49) >> 1);

It attempts to be clever and account for the 24.5 clock ticks per microsecond. Instead of using type float to account for the half tick, it multiplies by 49 and then divides by 2 using shift right operations: one for the high byte and then one for the low byte with carry in.

The last step is to subtract the reload value from 2^{16}. The shorthand is to negate the result. Stated another way, in the modulo 65536 environment, 65536 – x provides the same answer as 0 – x because 65536 = 0 (modulo 65536). This is the same as a clock where 60 = 0 (modulo 60).
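The arithmetic can be checked on a PC with a short sketch. The function names us_to_ticks and reload_value are hypothetical. The explicit uint16_t casts are an assumption made so that a 32-bit host reproduces the 16-bit arithmetic of the 8051 target.

```c
#include <stdint.h>

/* Timer ticks for n_us microseconds at 24.5 MHz with a 1:1 prescale.
 * 24.5 = 49 / 2, so multiply by 49 and shift right once.
 * The inner cast truncates the product to 16 bits, as on the target. */
static uint16_t us_to_ticks(uint16_t n_us)
{
    return (uint16_t)((uint16_t)(n_us * 49u) >> 1);
}

/* Preload value: 65536 - ticks, computed as 0 - ticks modulo 65536. */
static uint16_t reload_value(uint16_t n_us)
{
    return (uint16_t)(0u - us_to_ticks(n_us));
}
```

For a request of 1000 us the product is 49000, the shift yields 24500 ticks, and the preload is 65536 – 24500 = 41036. The 16-bit product also sets the upper limit: 1337 × 49 = 65513 still fits, while 1338 × 49 = 65562 wraps around to 26 and produces a delay of only 13 ticks.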

On the line that follows in the appended listing, we add 100 to the tmr_load variable. Moving the preload 100 ticks closer to the overflow shortens the timed interval by about 4 us. This is a crude method used to account for the function calling overhead as well as the time it takes to compute the tmr_load. Recall that the Busy Bee does not have a hardware multiplier. Consequently, it takes time to perform the 8-bit by 16-bit multiplication via the 8-bit ALU.

One casualty of this function’s overhead is the ability to accept small delays. With an overhead of approximately 4 us, it is impossible to delay for anything smaller. Should these small delays be required, the aforementioned hand-coded assembler with the NOP operation should be used.
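A few lines of host-side C show where the 4 us figure and the 5 us floor come from. This is a sketch for checking the numbers, assuming the 24.5 MHz clock and the correction of 100 ticks used in the appended listing. The names ticks_to_ns and timed_ticks are hypothetical.

```c
#include <stdint.h>

#define TICKS_PER_US_X2  49u    /* 24.5 ticks per microsecond, doubled */
#define OVERHEAD_TICKS   100u   /* empirical correction from the listing */

/* Convert timer ticks to nanoseconds (integer division truncates). */
static uint32_t ticks_to_ns(uint32_t ticks)
{
    return (ticks * 2000u) / TICKS_PER_US_X2;
}

/* Ticks the timer actually counts after the correction is applied.
 * The casts reproduce the 16-bit arithmetic of the target. */
static uint16_t timed_ticks(uint16_t n_us)
{
    uint16_t requested = (uint16_t)((uint16_t)(n_us * TICKS_PER_US_X2) >> 1);
    return (uint16_t)(requested - OVERHEAD_TICKS);
}
```

The correction of 100 ticks corresponds to 4081 ns. A 5 us request leaves 122 – 100 = 22 ticks (897 ns) for the timer itself. A 4 us request would need 98 – 100 ticks, which wraps around to 65534 ticks, so requests below 5 us must be clamped.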

The real-world results are shown in Figure 2 where we see a pin high for 5 us, low for 5 us, and then high again. Similar results were obtained for the associated blk_ms_delay where a call of blk_ms_delay(2000) was within 0.01 ms as measured by the Digilent Analog Discovery. Do you agree that the blocking function provides reasonable performance?

Figure 2: Oscilloscope measurement for real-world signal with programmed 5 us on and 5 us off.


As previously mentioned, the attached code is highly dependent on the Busy Bee’s clock, prescale, and timer configuration. You will need to modify the code to account for any conditions that deviate from the assumptions made in this code’s development.

Perhaps you will modify the code to use a slower oscillator to save energy. There are also tricks to place the microcontroller in a deep sleep. Then again, you may want to wake up the microcontroller with the highest speed possible to quickly perform an operation before returning to sleep.

That flexibility is what makes microcontroller bare-metal programming more difficult than the high-level programming you may have used in the past. It’s also the key to unlocking the performance.

Please leave any comments or suggestions below.

Best Wishes,


/*
 * This code was developed on the EFM8BB1 featuring a 24.5 MHz internal clock.
 * Note that the TMR2CN0 SFR is located at 0xC8. Since the address ends in 0x8, it is one of the
 * bit-addressable memory locations. This is convenient as it allows the Keil C51
 * compiler to use the sbit construct.
 * Be sure to include these statements:
 *     sbit TR2 = TMR2CN0^2;              // Timer 2 run control
 *     sbit TF2 = TMR2CN0^7;              // TMR2 16-bit overflow on the 0xFFFF to 0x0000 transition.
 * Also, don't forget to configure timer 2:
 *     TMR2CN0 = 0x00;                    // default for timer 2: clear overflow flag, 16-bit, timer off
 *     CKCON0 |= CKCON0_T2ML__SYSCLK;     // use system clock
 */

/**
 * @brief Microsecond blocking delay based on T2
 * Given a 24.5 MHz system clock, this function provides a delay between 5 and 1000 us.
 * The actual max value without an overflow error is 1337. However, it's easier and
 * less error prone to remember 5 to 1000.
 * Note that this function cannot be used for single us delays as it takes longer than that to
 * perform the function calls and math to calculate the appropriate timer delay.
 * An empirical correction is added to account for the calling overhead and delay calculations.
 * This function assumes a 24.5 MHz system clock. It also assumes a 1:1 pre-scale for Timer 2.
 * For fast computation:
 *     1) multiply n_us by 49
 *     2) divide by 2 using a shift right
 * @param n_us Identifies the number of microseconds to delay.
 * @warning There are no guard rails for overflow.
 * @warning Delays less than 5 us will be extended to 5 us.
 * @warning For improved performance replace the reload calculation with a lookup table.
 */

void blk_us_delay (uint16_t n_us){

    uint16_t  tmr_load;

    if (n_us < 5){                // Extend small delays to 5 us
        n_us = 5;
    }

    tmr_load = -((n_us * 49) >> 1);

    tmr_load += 100;              // Estimate accounting for the calling overhead
                                  // and the machine cycles to perform the previous
                                  // 8 by 16-bit multiplication.

    TR2 = 0;                      // Stop timer
    TF2 = 0;                      // Clear timer overflow flag

    TMR2L = tmr_load;             // Low byte of the preload
    TMR2H = tmr_load >> 8;        // High byte of the preload

    TR2 = 1;                      // Start timer
    while (!TF2);                 // Wait for the overflow
    TR2 = 0;                      // Stop timer to save power
}

/**
 * @brief Millisecond blocking delay based on blk_us_delay which in turn is based on T2.
 * Provides a delay between 1 and 65,534 ms.
 * @param n_ms Identifies the number of milliseconds to delay.
 * @warning For improved accuracy be sure to use a hardware timer with a large pre-scale value.
 */
void blk_ms_delay (uint16_t n_ms){

  uint16_t i;

  for(i = 0; i < n_ms; i++){
      blk_us_delay (1000);        // 1000 us per pass gives 1 ms
  }
}