gdb and kgdb work only intermittently on my Wandboard Dual Rev D1

Hi,
I am using a Wandboard Dual Rev D1 board. On the software side, I am using kernel version 4.14.232 and Ubuntu 20.04 from the Digi-Key source. I cross-compiled gdb using the toolchain provided by the same source.
I have enabled all the config options required for KGDB support in the kernel and flashed it to the board.
While using gdb and kgdb to debug the kernel, gdb on the host sometimes hangs at the target remote command, as follows:

$ sudo ./arm-linux-gnueabi-gdb /home/ayyappan/Wandboard/armv7-multiplatform/KERNEL/vmlinux
[sudo] password for ayyappan: 
GNU gdb (GDB) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-linux-gnu --target=arm-linux-gnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/ayyappan/Wandboard/armv7-multiplatform/KERNEL/vmlinux...
(gdb) set serial baud 115200
(gdb) set debug remote 1
(gdb) target remote /dev/ttyUSB0

But sometimes the connection goes through, as follows:

> $ sudo ./arm-linux-gnueabi-gdb /home/ayyappan/Wandboard/armv7-multiplatform/KERNEL/vmlinux
> GNU gdb (GDB) 9.1
> Copyright (C) 2020 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "--host=x86_64-linux-gnu --target=arm-linux-gnueabi".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
>     <http://www.gnu.org/software/gdb/documentation/>.
> 
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from /home/ayyappan/Wandboard/armv7-multiplatform/KERNEL/vmlinux...
> (gdb) set serial baud 115200
> (gdb) target remote /dev/ttyUSB0
> Remote debugging using /dev/ttyUSB0
> Ignoring packet error, continuing...
> warning: unrecognized item "timeout" in "qSupported" response
> Ignoring packet error, continuing...
> Remote replied unexpectedly to 'vMustReplyEmpty': timeout
> (gdb) b panic
> Breakpoint 1 at 0xc003c654: file kernel/panic.c, line 134.
> (gdb) b sys_sync
> Breakpoint 2 at 0xc0196818: file fs/sync.c, line 110.
> (gdb) i b
> Num     Type           Disp Enb Address    What
> 1       breakpoint     keep y   0xc003c654 in panic at kernel/panic.c:134
> 2       breakpoint     keep y   0xc0196818 in sys_sync at fs/sync.c:110
> (gdb) q
> ayyappan@ayyappan-Inspiron-5559:~/Wandboard/temp/trial/arm-gdb/bin$

In both cases, I first run these commands on the target through minicom, and the target responds as follows:

> Ubuntu 20.04.2 LTS arm ttymxc0
> 
> default username:password is [ubuntu:temppwd]
> 
> arm login: ubuntu
> Password: 
> 
> The programs included with the Ubuntu system are free software;
> the exact distribution terms for each program are described in the
> individual files in /usr/share/doc/*/copyright.
> 
> Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
> applicable law.
> 
> To run a command as administrator (user "root"), use "sudo <command>".
> See "man sudo_root" for details.
> 
> ubuntu@arm:~$ echo ttymxc0 > /sys/module/kgdboc/parameters/kgdboc
> ubuntu@arm:~$ sudo -i
> [sudo] password for ubuntu: 
> root@arm:~# echo "1" > /proc/sys/kernel/sysrq
> root@arm:~# echo g > /proc/sysrq-trigger
> [  125.338165] sysrq: DEBUG
> [  125.340723] KGDB: Entering KGDB
> [  125.343897] Internal error: Oops - undefined instruction: 0 [#1] PREEMPT SMP THUMB2

May I know the reason for this unstable behaviour of gdb on the host?
Note: I have not yet seen a breakpoint being hit, because I do not yet know how to trigger one.
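
One way to exercise the sys_sync breakpoint from the session above is to make the sync(2) system call from user space on the target. The sketch below assumes the gdb session has already resumed the kernel with continue and that the breakpoint at sys_sync is still set. The helper name trigger_sys_sync is made up for illustration.

```c
#include <unistd.h>

/*
 * Hypothetical helper: issue the sync(2) system call.
 * On this 4.14 kernel the call enters sys_sync (fs/sync.c), which is
 * where breakpoint 2 was placed in the gdb session above.
 * sync() returns void, so the helper simply reports 0 afterwards.
 */
static int trigger_sys_sync(void)
{
	sync();
	return 0;
}
```

Calling trigger_sys_sync() from a small program built for the target, or simply running the sync command in the target shell, should make the kernel stop and hand control back to gdb on the host, provided the serial link is working.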

Please help me to fix this unstable behaviour.

Thanks.

I’m actually surprised that even worked! I had disabled KGDB a while back in my builds…

[  125.343897] Internal error: Oops - undefined instruction: 0 [#1] PREEMPT SMP THUMB2

Do we know which instruction caused this?

Regards,

Hi,
While browsing through the source code, I noticed the following:

/*
 * GDB assumes that we're a user process being debugged, so
 * it will send us an SWI command to write into memory as the
 * debug trap. When an SWI occurs, the next instruction addr is
 * placed into R14_svc before jumping to the vector trap.
 * This doesn't work for kernel debugging as we are already in SVC
 * we would loose the kernel's LR, which is a bad thing. This
 * is  bad thing.
 *
 * **By doing this as an undefined instruction trap, we force a mode**
 * **switch from SVC to UND mode, allowing us to save full kernel state**.
 *
 * We also define a KGDB_COMPILED_BREAK which can be used to compile
 * in breakpoints. This is important for things like sysrq-G and for
 * the initial breakpoint from trap_init().
 *
 * Note to ARM HW designers: Add real trap support like SH && PPC to
 * make our lives much much simpler. :)
 */
#define BREAK_INSTR_SIZE	4
#define GDB_BREAKINST		0xef9f0001
#define KGDB_BREAKINST		0xe7ffdefe
#define KGDB_COMPILED_BREAK	0xe7ffdeff
#define CACHE_FLUSH_IS_SAFE	1

#ifndef	__ASSEMBLY__

**static inline void arch_kgdb_breakpoint(void)**
{
	asm(__inst_arm(0xe7ffdeff));
}

The above is from the kgdb.h file in the architecture-specific include folder.

**And in the debug_core.c file:**

#ifdef CONFIG_MAGIC_SYSRQ
static void sysrq_handle_dbg(int key)
{
	if (!dbg_io_ops) {
		pr_crit("ERROR: No KGDB I/O module available\n");
		return;
	}
	if (!kgdb_connected) {
#ifdef CONFIG_KGDB_KDB
		if (!dbg_kdb_mode)
			pr_crit("KGDB or $3#33 for KDB\n");
#else
		pr_crit("Entering KGDB\n");
#endif
	}

	kgdb_breakpoint();
}

noinline void kgdb_breakpoint(void)
{
	atomic_inc(&kgdb_setting_breakpoint);
	wmb(); /* Sync point before breakpoint */
	**arch_kgdb_breakpoint();**
	wmb(); /* Sync point after breakpoint */
	atomic_dec(&kgdb_setting_breakpoint);
}
EXPORT_SYMBOL_GPL(kgdb_breakpoint);

From the first bolded comment, I guess I can ignore the Oops, since it comes from the switch to UND mode; generating an undefined instruction appears to be the normal way to trigger the trap. I may be wrong, because I have not yet set up a scenario that hits the breakpoints I set.

Please correct me if I am missing anything. Is this Oops an error that we have to fix for kgdb to work normally?
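
As a side note on the $3#33 string printed by sysrq_handle_dbg: it has the shape of a GDB remote serial protocol packet, $<payload>#<checksum>, where the checksum is the sum of the payload bytes modulo 256, written as two hex digits. Both ends verify this checksum, so bytes lost or corrupted on the serial line make a packet invalid. The sketch below shows the framing. The function names rsp_checksum and rsp_frame are hypothetical, and the sketch ignores the protocol's escaping of special characters.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Sum of all payload bytes modulo 256, as used by the remote protocol. */
static unsigned char rsp_checksum(const char *payload)
{
	unsigned int sum = 0;

	for (size_t i = 0; payload[i] != '\0'; i++)
		sum += (unsigned char)payload[i];
	return (unsigned char)(sum & 0xff);
}

/*
 * Write "$<payload>#<two hex digits>" into out.
 * Returns 0 on success, -1 if out is too small.
 * Assumption: payload has no '$', '#' or '}' needing escaping.
 */
static int rsp_frame(const char *payload, char *out, size_t out_size)
{
	/* '$' + payload + '#' + two hex digits + terminating NUL */
	size_t need = strlen(payload) + 5;

	if (out_size < need)
		return -1;
	snprintf(out, out_size, "$%s#%02x", payload, rsp_checksum(payload));
	return 0;
}
```

For the payload "3" the only byte is the character '3' (0x33), so rsp_frame produces the same $3#33 string that the kernel message mentions.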

Hi,

I have disabled the CONFIG_THUMB2_KERNEL option in the kernel, and that eliminated the following error message:

[  125.343897] Internal error: Oops - undefined instruction: 0 [#1] PREEMPT SMP THUMB2

I still sometimes get the unstable behavior, but when the connection works I am able to set and trigger breakpoints.

When I applied the fix from the following link, I got a compilation error.

The error comes from this check:

int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
{
	int err;

	/* patch_text() only supports int-sized breakpoints */
	BUILD_BUG_ON(sizeof(int) != BREAK_INSTR_SIZE);

in arch/arm/kernel/kgdb.c

I did not dig into why BUILD_BUG_ON is used here, and to be safe I avoided changing those lines.
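
For context, BUILD_BUG_ON(cond) breaks the build whenever cond is true at compile time, so the check above fails as soon as BREAK_INSTR_SIZE is anything other than sizeof(int). The patch itself is not shown here, so it is only an assumption that it changed BREAK_INSTR_SIZE (for example to 2 for a 16-bit Thumb breakpoint). A minimal userspace sketch of the same mechanism, using C11 _Static_assert and the made-up names MY_BUILD_BUG_ON and break_instr_fits_int:

```c
/* Value taken from the kgdb.h excerpt quoted earlier in this thread. */
#define BREAK_INSTR_SIZE 4

/*
 * Userspace stand-in for the kernel's BUILD_BUG_ON():
 * compilation fails when cond evaluates to true.
 */
#define MY_BUILD_BUG_ON(cond) \
	_Static_assert(!(cond), "BUILD_BUG_ON failed: " #cond)

/*
 * Hypothetical function mirroring the check in kgdb_arch_set_breakpoint().
 * It compiles only while BREAK_INSTR_SIZE equals sizeof(int).
 */
static int break_instr_fits_int(void)
{
	MY_BUILD_BUG_ON(sizeof(int) != BREAK_INSTR_SIZE);
	return 1;
}
```

Changing BREAK_INSTR_SIZE to 2 in this sketch makes the compiler reject the file, which is the same class of error as the one described above. The comment in the kernel source gives the reason for the check: patch_text() only supports int-sized breakpoints.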

Please help me to avoid the unstable behavior.

Thanks.

Is there a specific reason you are using a v4.14.x based kernel? LTS v5.10.x is out, and v5.14.x is considered stable. There is a good chance your kgdb issues may have been fixed.

Regards,

Sure,
I will give it a try. There is no special reason for using 4.14.x; my thinking was that the latest versions have new features whose bugs might not have been exposed yet. If 5.14.x is considered stable, I can try that.

Thanks.