AM3358-based board will not reboot; no issue with cold boot

Hi there!

We have custom hardware based on the OSD3358 C-SiP. I have updated to Linux 5.10.56-bone-rt-r48. We are using U-Boot v2021.07 with a config closely related to am335x_evm_defconfig (slightly modified to suit our needs). Previously, using U-Boot 2019.04 with the eewiki U-Boot patches and various kernel versions (up to a slightly older 5.10.x), everything was going swimmingly. Sometime recently (I am not sure when, which is unfortunate because I have no clue what changed to cause this issue), the board became unable to reboot. Cold boot is fine, but when Linux requests a warm reset, U-Boot fails to boot with no indication of an error. The amount of console output varies from one attempt to the next: the most I have seen is the U-Boot SPL version string (so 1 line), and sometimes it doesn’t even print that. In all cases, it hangs at this point.

The only patches to U-Boot change the partition used for boot and add our board’s data to the board file, much like the BBB and co. These patches have not changed since 2019.04, except that we no longer apply the eewiki patch, the majority of which seems to have been mainlined since 2019.04. I am not sure if this is a Linux issue or a U-Boot issue, but given that it at least manages to make it to the SPL before hanging, I am reasonably convinced this is a U-Boot issue. Any assistance would be appreciated, as our system must be able to reboot. I can provide further information as necessary.

Thanks!

Hi @will_eccles, this kinda smells like the RTC’s oscillator… (RTC is used for reset/poweroff on the am335x)…

Try switching it to internal 32KCLK…

Regards,

Hi Robert,

The code you linked is the result of your patches to U-Boot and does not exist in 2021.07 mainline U-Boot. However, as far as I can work out, I should be able to fix this by replacing (1 << 3) with (0 << 3) here:

I will try this and see what happens. However, previously, when we were using 2019.04 with the eewiki patches, I believe we would have been using the external clock, and rebooting seemed fine there. Any thoughts?
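
For anyone following along, here is a minimal sketch of the bit arithmetic behind that change. It assumes the line in question builds the value written to the RTC oscillator register, and that (per my reading of the AM335x TRM) bit 3 selects the 32 kHz clock source and bit 6 enables the clock. The constant and function names are hypothetical and exist only for this illustration; it is written in Python so it can be run on a host, not as U-Boot code.

```python
# Hypothetical names for illustration; bit positions are an assumption
# based on the AM335x TRM description of the RTC oscillator register.
RTC_OSC_32KCLK_SEL = 1 << 3  # 1 = external 32 kHz oscillator, 0 = internal clock
RTC_OSC_32KCLK_EN = 1 << 6   # 1 = 32 kHz clock to the RTC is enabled


def rtc_osc_value(use_external: bool) -> int:
    """Return the value to write to the RTC oscillator register."""
    value = RTC_OSC_32KCLK_EN
    if use_external:
        value |= RTC_OSC_32KCLK_SEL
    return value


def describe(value: int) -> str:
    """Decode the two bits of interest into a readable string."""
    source = "external" if value & RTC_OSC_32KCLK_SEL else "internal"
    state = "enabled" if value & RTC_OSC_32KCLK_EN else "disabled"
    return f"0x{value:02x}: {source} 32K clock, {state}"


if __name__ == "__main__":
    # (1 << 3) | (1 << 6): external oscillator selected
    print(describe(rtc_osc_value(use_external=True)))
    # (0 << 3) | (1 << 6): bit 3 left clear, internal clock selected
    print(describe(rtc_osc_value(use_external=False)))
```

Since `(0 << 3)` is simply 0, the replacement leaves bit 3 clear and keeps the enable bit as it was.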

Here’s v2021.10-rc2, I spent too much time fighting v2021.01/v2021.07…

Regards,

I have just gone back through our version control and determined that on u-boot 2019.04 (which has the same patches and changes as the 2021.10-rc2 you just linked), our board would not have used the internal 32KCLK. This should match the behavior we see now, as the default is to use the external clock. Is there something Linux has to do when it’s shutting down that it might not be doing? Perhaps updating to 5.10.56-bone-rt-r48 from 5.10.35-bone-rt-r37 caused an issue? Is there something that must be configured in our Linux configuration?

I’ve always done it in u-boot… Looking at the rtc property, maybe we can do it in the kernel now:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/rtc/rtc-omap.txt?h=v5.14-rc7

- clocks: Any internal or external clocks feeding in to rtc
clocks = <&clk_32k_rtc>, <&clk_32768_ck>;

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/am33xx-l4.dtsi

			rtc: rtc@0 {
				compatible = "ti,am3352-rtc", "ti,da830-rtc";
				reg = <0x0 0x1000>;
				interrupts = <75
					      76>;
			};

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/am335x-bone-common.dtsi#n396

&rtc {
	clocks = <&clk_32768_ck>, <&clk_24mhz_clkctrl AM3_CLK_24MHZ_CLKDIV32K_CLKCTRL 0>;
	clock-names = "ext-clk", "int-clk";
};

Regards,

Switching to the internal clock did not make a difference. Here’s an example of what happens when the reboot fails. Notice that the UART output is corrupted (outlined in red); this happens on some attempts but not others.

It appears that updating to 2021.10-rc2 would be quite a bit more effort than I anticipated. We have been trying to move away from external dependencies (e.g. your u-boot fork) by writing our own patches. For 2021.07, I have had to do very minimal patching to get the board to boot and run properly, but reboot seems to be busted. We may have to switch to your fork after all, but I’d like to see if reboot can be fixed in 2021.07 relatively simply rather than going through all the effort to switch. Any idea what might be wrong with rebooting? I am relatively unfamiliar with the process of booting/rebooting with the AM335x, but to the extent of my knowledge and debugging, I can’t seem to find anything which should hinder warm resets.

I have made an odd discovery. Previously, we were using the am335x-evm device tree for u-boot, then switching to the (modified) am335x-osd3358-sm-red device tree for Linux. This has not caused us issues in the past, but I decided to try throwing the am335x-boneblack device tree into the u-boot config to see if it made a difference (our hardware should be broadly compatible with either device tree, as u-boot doesn’t do anything special with peripherals in our use-case). I also enabled a little more error output (I thought I had it enabled before, but I did not). With this setup, I was able to get an interesting error message when it tried to reboot:

[image: console output from the failed reboot]

This is interesting for a couple of reasons:

  1. The last line appears to be the output of the halt() function in u-boot, which we should have seen before, even without changing the output settings (I think).
  2. This implies to me that the SPL is doing something wrong while trying to initialize the MMC, likely because it is not operating under the assumption that the MMC is already initialized (which I think it should be on a warm reset).

Ideally we would like to boot from MMC0, as we’re trying to boot from the SD card and not the eMMC. However, the SPL appears to be trying MMC1, which is odd. At a normal cold boot, it also says this, but then successfully boots from MMC0. Very strange indeed. I will look for changes related to the MMC initialization between 2019.04 and 2021.07, as this seems to be the issue (or at least part of it).

Also, we have CONFIG_SPL_DM_RESET disabled, and I’m wondering if enabling reset drivers in the SPL would help here. I am not entirely sure what this entails nor what it brings to the table in our specific case, and I suspect it’s of very little consequence.

I’ve tried using the beaglebone u-boot 2021.10-rc2 and that outright won’t boot (it gets to “starting kernel” and never gets any further). I think I might just revert to 2019.04 and pretend this never happened.

Well, the issue turned out to be in the kernel. The bootloader is fine: with no bootloader changes at all, reverting to an older version of our kernel config fixed the issue. I just redid our config and it works fine now. Not sure what the relevant difference is (there’s a huge diff between the two configs, and it’s a lot to narrow down), but whatever it was, changing it fixed the problem.
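
In case it helps anyone narrowing down a similar config diff, here is a rough sketch of a helper that compares two kernel `.config` files option by option and can filter the result by keyword. It is only an illustration: the function names are made up, and it treats an option that is absent from a file the same as one marked "is not set", which is a simplification. The kernel tree also ships `scripts/diffconfig`, which does the same job.

```python
import re

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text: str) -> dict:
    """Map each CONFIG_ option in a .config file body to its value."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_configs(old: dict, new: dict) -> list:
    """Return sorted (name, old_value, new_value) tuples for changed options.

    Simplification: an option missing from one side is treated as "n".
    """
    changes = []
    for name in sorted(set(old) | set(new)):
        before = old.get(name, "n")
        after = new.get(name, "n")
        if before != after:
            changes.append((name, before, after))
    return changes


def filter_changes(changes: list, keywords: list) -> list:
    """Keep only changes whose option name contains one of the keywords."""
    return [c for c in changes if any(k in c[0] for k in keywords)]


def diff_config_files(old_path: str, new_path: str) -> list:
    """Convenience wrapper that reads two .config files from disk."""
    with open(old_path) as f:
        old = parse_config(f.read())
    with open(new_path) as f:
        new = parse_config(f.read())
    return diff_configs(old, new)
```

Calling `diff_config_files()` on the working and failing configs and then passing the result through `filter_changes()` with a few subsystem keywords of your choosing cuts the list down to something reviewable. Which keywords matter here is unknown, since the thread never identified the responsible option.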