ESP32-S3 Experiencing Rare Freezes During Long-Term WiFi and MQTT Operation

Hello everyone,

I’m currently developing an ESP32-S3 based monitoring and control system that collects sensor data, displays local status information, and periodically transmits measurements to a remote server using MQTT.

The system performs well during normal testing, but during long-duration operation I’m occasionally seeing rare freezes that are difficult to reproduce consistently. Sometimes the device runs continuously for several days, while at other times it may stop responding after several hours of heavy network activity.

Current setup:

  • ESP32-S3

  • WiFi connectivity

  • MQTT communication

  • Multiple I2C sensors

  • OLED display

  • FreeRTOS tasks

  • External 12V power supply with local regulation

Observed behavior:

  • Device becomes unresponsive

  • MQTT communication stops

  • Web interface no longer responds

  • No obvious reset or watchdog event

I’ve reviewed memory usage, task priorities, and network handling, but I’m still trying to determine whether the issue is software-related, EMI-related, or caused by some hardware design limitation.

Questions:

  1. Have you encountered similar long-term stability issues with ESP32-S3 devices?

  2. What debugging techniques have been most useful for identifying rare freezes?

  3. Are there common FreeRTOS or WiFi pitfalls that only appear after extended runtime?

  4. Have you found hardware issues to be responsible for problems that initially appeared to be software-related?

PCB Design Question:

The next revision will move away from development boards and onto a dedicated PCB integrating the ESP32-S3, power management circuitry, and sensor interfaces. The board will likely be manufactured through PCBWay.

For those who have taken ESP32 designs into production:

  • What PCB layout practices have had the biggest impact on system stability?

  • How important is RF layout compared to power supply design?

  • Have you found additional ESD or surge protection worthwhile?

  • Which fabrication or testing options have provided the most value in terms of reliability?

I’d appreciate hearing from anyone who has deployed ESP32 systems in real-world environments and encountered similar long-term stability challenges.

Hello @jovelyngentallan02,

You’re asking a bit too much for a single TechForum post. The topics you’re raising have been covered by detailed articles, reference designs, and even entire books. I’d suggest seeking those out. That said, I’ll offer the following:

Have you encountered similar long-term stability issues with ESP32-S3 devices?

Yes. All embedded devices, including ESP32-based ones, are subject to long-term stability issues.

What debugging techniques have been most useful for identifying rare freezes?

You need to isolate the issue. Start by determining whether it’s software, EMI, or hardware:

  • Hardware: Set up a duplicate and see if the problem reproduces.
  • EMI: Move the device to a quieter environment and see if the issues go away.
  • Software: Run an extended debugging session and try to catch the fault. Is it a hard fault, a race condition, heap exhaustion, etc.?
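To make the heap exhaustion check concrete, here is a minimal sketch, not a definitive implementation. It assumes you log the free heap at a fixed interval (on ESP-IDF the value could come from esp_get_free_heap_size()) and then test whether the samples trend downward. The names heap_slope and heap_leak_suspected and the threshold are hypothetical, chosen only for illustration. The logic is plain C, so it can be checked on a PC before going onto the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Least-squares slope of free-heap samples, in bytes per sample interval.
 * Hypothetical helper: the samples are assumed to be taken at a fixed period,
 * for example once per minute from a low-priority task. */
static double heap_slope(const uint32_t *samples, size_t n)
{
    assert(samples != NULL);
    if (n < 2) {
        return 0.0; /* not enough data to estimate a trend */
    }

    double mean_x = (double)(n - 1) / 2.0;
    double mean_y = 0.0;
    for (size_t i = 0; i < n; i++) {
        mean_y += (double)samples[i];
    }
    mean_y /= (double)n;

    double num = 0.0;
    double den = 0.0;
    for (size_t i = 0; i < n; i++) {
        double dx = (double)i - mean_x;
        num += dx * ((double)samples[i] - mean_y);
        den += dx * dx;
    }
    return num / den;
}

/* Returns 1 when free heap shrinks faster than `threshold` bytes per sample.
 * The threshold is an assumption to tune against your own normal jitter. */
static int heap_leak_suspected(const uint32_t *samples, size_t n, double threshold)
{
    return heap_slope(samples, n) < -threshold;
}
```

A steady negative slope over many hours points toward a leak or growing fragmentation. A flat trend before a freeze makes heap exhaustion less likely and shifts attention to the other candidates, such as a race condition or a hardware cause.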

Are there common FreeRTOS or WiFi pitfalls that only appear after extended runtime?

If there are, they’ll have been discussed in the specialized forums and support channels for your specific components. I’d suggest seeking those out and checking there.

Have you found hardware issues to be responsible for problems that initially appeared to be software-related?

Yes, and vice versa. If you have a specific question, feel free to ask.

What PCB layout practices have had the biggest impact on system stability?

Nearly every aspect of PCB layout has an impact on system stability.

How important is RF layout compared to power supply design?

They are both very important.

Have you found additional ESD or surge protection worthwhile?

This is highly dependent on your system and environment. I can’t imagine it would hurt…

Which fabrication or testing options have provided the most value in terms of reliability?

Not sure what you’re asking here. Please be more specific.

There is another option if you are running into design problems like this. If you aren’t sure how to isolate the issue, you could look into our Design and Integration services. There are also many companies with years of experience that can help, either through consultation or by pointing you to a source that could design something for your application.

What version of ESP-IDF are you using? The framework is hosted on GitHub at espressif/esp-idf (Espressif IoT Development Framework, the official development framework for Espressif SoCs).

This framework is under continued development; post any issues there.

Regards,