ESP8266 Reboot Cycled Caused by Flash Memory Corruption — Fixed!

After I released the ESP8266-based OpenGarage, there has been a weird bug that kept bugging me: a small fraction of the controllers (let’s say 1 out of 10) would crash on restart, and that leads to an infinite reboot cycle. Once this happens, it seems there is no way to stop it until you reflash the controller with NodeMCU firmware, and then flash OpenGarage firmware back. It’s completely crazy because first, this issue is hard to reproducing — most controllers would work just fine, but once in a while, one would go nuts; second, upon crashing you can check the stack dump but the information is basically gibberish and does not help debugging at all; third, just reflashing the OpenGarage firmware itself is not sufficient — something seems to have been permanently altered in the flash memory such that you have to flash a completely different firmware (NodeMCU for example) to stop it; finally, even this is not a permanent fix — if you reboot the controller regularly for a few times, this issue will eventually happen again.

For the longest time I thought there is a bug in the OpenGarage firmware code that causes this to happen. But the above symptom defies any standard reasoning and explanation. This has kept bugging me until yesterday, I finally sat down and did a number of reproducible tests that helped me finally figure out the cause. It turns out this has to do partly with the ESP8266 hardware quality and partly software. The short story is that every time ESP8266 restarts, the firmware (more specifically, ESP8266 Arduino library) writes the current WiFi SSID and password to flash memory (along with other flash memory operations such as reading config files); over time this may cause a corruption at some point (possibly due to flash memory quality); once this happens, it will keep crashing on restart, and the only way to stop it is to erase the corrupted region (that’s what reflashing NodeMCU firmware accomplishes).

In fact, this issue has already been reported in the ESP8266 Arduino forum. I wish I had discovered this earlier. The solution is actually quite straightforward: simply add WiFi.persistent(false); right at the beginning of the code, that will stop the library from writing SSID and password to flash memory every time. This is unnecessary anyways since OpenGarage firmware already stores SSID and password in the config file.

This simple trick seems to have magically fixed the annoying bug. To be fair, I believe this bug only happens when you manage WiFi SSID and password separately in a config file stored in flash memory space — if the SSID and password are hardcoded into the firmware the issue wouldn’t happen. In any case, I wrote this blog post to document the issue and fix, and hope this may be useful for others who encounter similar issues with ESP8266.