I have always wondered how the Linux kernel translates addresses to symbol names. This mechanism, called kallsyms, is used in multiple places in the kernel. For example, in the panic() call the kernel prints the call trace with function names and offsets without any trouble, and the ftrace system similarly relies on kallsyms to show the user the names of the functions the kernel has run.
This article (written in Chinese) explains the translation mechanism pretty well and is worth a read if you are interested in how it is done. However, when I discussed this topic with a professor, he asked me how I could find the raw bytes of the symbols used by kallsyms that the linked article describes, e.g. kallsyms_addresses, kallsyms_num_syms, etc. So I set out to find where they are.
You and I are unlikely to be using the exact same kernel image, so your offsets and symbol addresses will differ from mine. For reference, I’m using an ARMv8 image of Linux version 5.4.
From the article linked above we know that the kallsyms symbols are in the .rodata section. Let’s see where it is:
$ aarch64-linux-gnu-readelf -l vmlinux
Nice: .rodata is located at the very beginning of the second segment, which is very convenient. The distance between the first and second segments is the difference of their file offsets, 0x910000 - 0x10000 = 0x900000. The first segment is placed at the very beginning of the kernel image, so .rodata should start at file offset 0x900000 of the image. Now we just have to find where each symbol sits relative to the start of the .rodata section.
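The subtraction above can also be done programmatically. The following is a minimal sketch, assuming vmlinux is a little-endian ELF64 file and that the raw image begins with the first loadable segment, as described above. The function names `load_segments` and `image_offset` are made up for illustration.

```python
import struct

PT_LOAD = 1  # program header type for loadable segments


def load_segments(elf_bytes):
    """Return (file_offset, vaddr) for each PT_LOAD segment of a little-endian ELF64 file."""
    header = struct.unpack_from("<16sHHIQQQIHHHHHH", elf_bytes, 0)
    ident, phoff, phentsize, phnum = header[0], header[5], header[9], header[10]
    # ident[4] == 2 means ELFCLASS64, ident[5] == 1 means little-endian data
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    segments = []
    for i in range(phnum):
        # Each entry starts with p_type, p_flags, p_offset, p_vaddr
        p_type, _flags, p_offset, p_vaddr = struct.unpack_from(
            "<IIQQ", elf_bytes, phoff + i * phentsize
        )
        if p_type == PT_LOAD:
            segments.append((p_offset, p_vaddr))
    return segments


def image_offset(segments, index):
    """Offset of a segment in the raw image, assuming the image starts at the first segment."""
    return segments[index][0] - segments[0][0]
```

With the offsets from this article, `image_offset(segments, 1)` would be 0x910000 - 0x10000 = 0x900000.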
$ aarch64-linux-gnu-readelf -s vmlinux | grep kallsyms
Note that kallsyms_addresses, one of the most important symbols if not the most important, is not present. This threw me off for quite a while, but I eventually found the explanation in /scripts/kallsyms.c, the file responsible for generating the symbol information: there are two ways of storing the addresses of kernel symbols. The first simply stores every symbol address in the kallsyms_addresses “array”. The second stores offsets relative to a base address, which is the address of the symbol with the lowest address. In summary, the two ways are:
1. kallsyms_addresses stores all the addresses.
2. kallsyms_relative_base stores the base address and kallsyms_offsets stores all the offsets.
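To make the difference concrete, here is a small sketch of how a lookup might work under each scheme. It assumes every offset is a plain unsigned distance from the base, which is a simplification and may not cover every kernel configuration. The helper `symbol_address` is hypothetical and not a kernel function.

```python
def symbol_address(index, addresses=None, relative_base=None, offsets=None):
    """Return the address of symbol `index` under either storage scheme.

    Pass `addresses` for the absolute scheme, or `relative_base` plus
    `offsets` for the base-relative scheme (offsets assumed unsigned).
    """
    if addresses is not None:
        # Scheme 1: the table already holds full addresses
        return addresses[index]
    if relative_base is None or offsets is None:
        raise ValueError("need either addresses or relative_base and offsets")
    # Scheme 2: rebuild the address from the base and a 32-bit offset
    return relative_base + (offsets[index] & 0xFFFFFFFF)
```

The second scheme needs only four bytes per symbol instead of eight on a 64-bit kernel, at the cost of one extra addition per lookup.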
Let’s first look for kallsyms_relative_base. Its address is 0xffff800010a73318 and the address of .rodata is 0xffff800010980000, which gives an offset of 0xf3318 into the section. Adding back the offset of .rodata in the image (0x900000), we get the final file offset of the symbol: 0x9f3318.
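The same arithmetic as a short sketch, using the values from my image (yours will differ). The constant and function names are only for illustration.

```python
RODATA_VADDR = 0xffff800010980000  # virtual address of .rodata in my vmlinux
RODATA_IMAGE_OFFSET = 0x900000     # where .rodata starts in my raw image


def vaddr_to_image_offset(vaddr):
    """Translate the virtual address of a .rodata symbol into a file offset in the image."""
    section_offset = vaddr - RODATA_VADDR
    if section_offset < 0:
        raise ValueError("address lies before .rodata")
    return RODATA_IMAGE_OFFSET + section_offset


# kallsyms_relative_base
print(hex(vaddr_to_image_offset(0xffff800010a73318)))  # 0x9f3318
```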
Check this in hexdump:
$ hexdump Image
This confirms that the base is 0xffff800010080000, the address of the first symbol.
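Instead of scanning hexdump output by eye, the value can be read directly. This is a minimal sketch, assuming the image is little-endian and has been read into memory as bytes. The helper name `read_u64_le` and the file name in the usage comment are assumptions.

```python
import struct


def read_u64_le(image, offset):
    """Read an unsigned 64-bit little-endian integer from a bytes-like object."""
    if offset < 0 or offset + 8 > len(image):
        raise ValueError("offset is outside the image")
    return struct.unpack_from("<Q", image, offset)[0]


# Usage with the offset computed above (path is hypothetical):
# with open("Image", "rb") as f:
#     image = f.read()
# print(hex(read_u64_le(image, 0x9f3318)))
```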
We can calculate the file offset of the symbol offsets array, kallsyms_offsets, in the same way. The result is 0x9a9558:
09a9550 656c 0000 0000 0000 0000 0000 0000 0000
It is an array of four-byte elements in increasing order, starting from 0x00000000. My guess as to why two entries have offset 0x00000000 is that two symbols both refer to the very start of the kernel image.
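The default hexdump output groups bytes into 16-bit words, which makes four-byte entries awkward to read. The sketch below turns one such line back into raw bytes and decodes the first two offsets. It assumes hexdump ran on a little-endian host, and the function name is made up.

```python
import struct


def hexdump_line_to_bytes(line):
    """Convert one line of default `hexdump` output into (offset, raw bytes)."""
    fields = line.split()
    offset = int(fields[0], 16)
    # Each remaining field is a 16-bit word; on a little-endian host the
    # low byte comes first in the file.
    raw = b"".join(struct.pack("<H", int(word, 16)) for word in fields[1:])
    return offset, raw


offset, raw = hexdump_line_to_bytes("09a9550 656c 0000 0000 0000 0000 0000 0000 0000")
start = 0x9a9558 - offset  # kallsyms_offsets begins 8 bytes into this line
first_offsets = struct.unpack_from("<2I", raw, start)
print(first_offsets)  # (0, 0)
```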
Lastly, let’s see how many kernel symbols there are (kallsyms_num_syms). Its file offset is 0xffff800010a73320 - 0xffff800010980000 + 0x900000 = 0x9f3320, which is right after the 8 bytes of kallsyms_relative_base.
09f3320 276f 0001 0000 0000 fe04 6568 05be 5f54
So the value is 0x1276f, or 75631 in decimal.
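As a quick check of that reading, the 16-bit words printed by hexdump can be reassembled with the least significant word first. This is only a sketch of the arithmetic, and `words_to_int` is a made-up helper.

```python
def words_to_int(words):
    """Combine 16-bit words, least significant first, into one integer."""
    value = 0
    for position, word in enumerate(words):
        value |= word << (16 * position)
    return value


# The first four words of the hexdump line at 0x9f3320
num_syms = words_to_int([0x276f, 0x0001, 0x0000, 0x0000])
print(hex(num_syms), num_syms)  # 0x1276f 75631
```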