Linux version: v6.0
Architecture: ARMv8
Foreword
During the 5.10 release cycle, KVM ARM received many code improvements in preparation for the Google pKVM project. We are going to talk about one part of them here: the new page table walker.
Previously, the logic for walking page tables was implemented wherever it was needed, e.g. in the functions create_hyp_{p4d, pud, pmd, pte}_mappings. This approach caused the page table walking code to be duplicated. To improve this, a new modularized page table walker was introduced in 5.10, and other page table accesses now benefit from the same facility.
Important Structures
Some information needs to be provided when using the new page table walker:
- The page table that is going to be accessed (struct kvm_pgtable)
- Operations to be done to the page table, and when to do them (struct kvm_pgtable_walker)
- The virtual address range to be accessed (struct kvm_pgtable_walk_data)
kvm_pgtable
This struct stores the metadata of a page table. The comments are self-explanatory.
1 | /** |
kvm_pgtable_walker
The user sets up this struct with the callback to run during the walk and the flags that say when to invoke it.
1 | /** |
kvm_pgtable_walk_data
This struct records the address range to walk, and it also points to instances of the two previous structs.
1 | struct kvm_pgtable_walk_data { |
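To make the relationship between the three structs concrete, here is a minimal user-space sketch. The field names follow my reading of the v6.0 sources, but everything carrying the toy_ prefix is a simplified stand-in: several fields are left out and the types are assumptions, so check the kernel headers for the real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_pte_t; /* one page table entry (assumed 64-bit) */

/* Moments at which the callback may run (illustrative names). */
enum toy_walk_flags {
    TOY_WALK_LEAF       = 1 << 0, /* on leaf entries */
    TOY_WALK_TABLE_PRE  = 1 << 1, /* on table entries, before descending */
    TOY_WALK_TABLE_POST = 1 << 2, /* on table entries, after descending */
};

typedef int (*toy_visitor_fn_t)(uint64_t addr, uint64_t end, uint32_t level,
                                toy_pte_t *ptep, enum toy_walk_flags flag,
                                void *arg);

/* Metadata of one page table. */
struct toy_pgtable {
    uint32_t   ia_bits;     /* width of the input address, in bits */
    uint32_t   start_level; /* level of the root table */
    toy_pte_t *pgd;         /* pointer to the root table */
};

/* What to do during the walk, and when. */
struct toy_walker {
    toy_visitor_fn_t cb;    /* callback invoked on visited entries */
    void            *arg;   /* opaque argument handed to cb */
    unsigned int     flags; /* bitmask of enum toy_walk_flags */
};

/* State of one walk: ties the two structs above to an address range. */
struct toy_walk_data {
    struct toy_pgtable *pgt;
    struct toy_walker  *walker;
    uint64_t            addr; /* current position */
    uint64_t            end;  /* exclusive end of the range */
};

/* True when the walker asked to be called at the given moment. */
static int toy_walker_wants(const struct toy_walker *w, enum toy_walk_flags when)
{
    assert(w != NULL);
    return (w->flags & when) != 0;
}
```

The point of the split is that the page table and the walker are set up by the caller and can be reused, while the walk data only lives for the duration of one walk.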
Implementation of the New Walker
Call kvm_pgtable_walk to use the new page table walker:
kvm_pgtable_walk
1 | int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size, |
This function expects the caller to provide an initialized pgt and walker. It builds a kvm_pgtable_walk_data from them together with addr and size, then calls _kvm_pgtable_walk.
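Since the walker works on whole pages, the byte range given by addr and size has to become a page-aligned [addr, end) range at some point. The sketch below shows one way to do that rounding for a 4K granule. The names toy_walk_range and toy_walk_range_init are hypothetical, and the kernel's exact rounding may differ from this version, which simply covers every requested byte.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ULL /* assumes a 4K granule */

struct toy_walk_range {
    uint64_t addr; /* inclusive, page aligned */
    uint64_t end;  /* exclusive, page aligned */
};

/* Round the start down and the end up so the range covers whole pages. */
static struct toy_walk_range toy_walk_range_init(uint64_t addr, uint64_t size)
{
    struct toy_walk_range r;

    assert(addr + size >= addr); /* the range must not wrap around */

    r.addr = addr & ~(TOY_PAGE_SIZE - 1);
    r.end  = (addr + size + TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1);
    return r;
}
```

For example, a request for 0x1000 bytes starting at 0x1234 touches two pages, so the resulting range is 0x1000 to 0x3000.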
_kvm_pgtable_walk
This function handles the root pages of the page table; see the comments added to the listing for more detail.
1 | static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data) |
__kvm_pgtable_walk
This function iterates over the entries in a page.
1 | static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data, |
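Iterating over the entries of a page starts from the index that the current address selects at that level. As a rough illustration, for a 4K granule with four levels each level consumes 9 bits of the address above the 12-bit page offset. The helper names below are made up for this sketch and are not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT   12u  /* 4K granule */
#define TOY_MAX_LEVELS   4u   /* levels 0..3 */
#define TOY_PTRS_PER_TBL 512u /* 4096 bytes / 8-byte entries */

/* Number of address bits below the index field of the given level. */
static unsigned toy_level_shift(unsigned level)
{
    assert(level < TOY_MAX_LEVELS);
    return (TOY_PAGE_SHIFT - 3) * (TOY_MAX_LEVELS - level) + 3;
}

/* Index of the entry that addr selects in a table at the given level. */
static unsigned toy_pgtable_idx(uint64_t addr, unsigned level)
{
    return (unsigned)((addr >> toy_level_shift(level)) & (TOY_PTRS_PER_TBL - 1));
}

/* Amount of address space covered by one entry at the given level. */
static uint64_t toy_granule(unsigned level)
{
    return 1ULL << toy_level_shift(level);
}
```

With these numbers a level 3 entry covers 4 KiB, a level 2 entry 2 MiB, and a level 1 entry 1 GiB.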
__kvm_pgtable_visit
1 | static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data, |
Page tables are an n-ary tree data structure (512-ary for 4K pages). The page table walker uses recursion to traverse the tree, with options for pre-order and post-order traversal. level indicates the current level of the tree, and __kvm_pgtable_visit is responsible for:
- calling the callback (cb) at the specified moments
- stepping data->addr to indicate progress
- calling __kvm_pgtable_walk when the entry points to the next-level table
__kvm_pgtable_walk, in turn, iterates over the entries in a single page.
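The mutual recursion between the two functions is easier to see on a small model. The following sketch walks a toy tree with 4 entries per table and 3 levels, and invokes the callback before descending, on leaves, and after descending, depending on the flags. All names are hypothetical, address tracking is left out entirely, and the trace callback exists only to make the visiting order visible.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ENTRIES    4 /* 512 in a real 4K table */
#define TOY_LAST_LEVEL 2 /* this toy has levels 0..2 */

enum toy_flags { TOY_LEAF = 1, TOY_TABLE_PRE = 2, TOY_TABLE_POST = 4 };

/* An entry either points to a next-level table or is a leaf. */
struct toy_entry { struct toy_entry *table; };

typedef int (*toy_cb_t)(int level, struct toy_entry *e,
                        enum toy_flags when, void *arg);

struct toy_walker { toy_cb_t cb; void *arg; int flags; };

static int toy_walk(struct toy_walker *w, struct toy_entry *tbl, int level);

/* Visit one entry: run the callback as requested, recurse into tables. */
static int toy_visit(struct toy_walker *w, struct toy_entry *e, int level)
{
    int is_table = level < TOY_LAST_LEVEL && e->table != NULL;
    int ret = 0;

    if (is_table && (w->flags & TOY_TABLE_PRE))
        ret = w->cb(level, e, TOY_TABLE_PRE, w->arg);
    if (!is_table && (w->flags & TOY_LEAF))
        ret = w->cb(level, e, TOY_LEAF, w->arg);
    if (ret || !is_table)
        return ret;

    ret = toy_walk(w, e->table, level + 1);
    if (ret)
        return ret;

    if (w->flags & TOY_TABLE_POST)
        ret = w->cb(level, e, TOY_TABLE_POST, w->arg);
    return ret;
}

/* Iterate over every entry of a single table. */
static int toy_walk(struct toy_walker *w, struct toy_entry *tbl, int level)
{
    for (int i = 0; i < TOY_ENTRIES; i++) {
        int ret = toy_visit(w, &tbl[i], level);
        if (ret)
            return ret;
    }
    return 0;
}

/* Example callback: record '(' for pre, ')' for post, 'L' for a leaf. */
struct toy_trace { char buf[64]; size_t len; };

static int toy_trace_cb(int level, struct toy_entry *e,
                        enum toy_flags when, void *arg)
{
    struct toy_trace *t = arg;

    (void)level;
    (void)e;
    assert(t->len + 1 < sizeof(t->buf));
    t->buf[t->len++] = when == TOY_TABLE_PRE  ? '('
                     : when == TOY_TABLE_POST ? ')' : 'L';
    t->buf[t->len] = '\0';
    return 0;
}
```

Walking a tree with all three flags set yields a bracketed trace in which every table entry opens and closes around its children, while a walker that only sets TOY_LEAF sees nothing but the leaves. A non-zero return value from the callback aborts the walk at once.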
Usage: create_hyp_mappings
Linux runs in EL1 when initializing KVM, and it uses create_hyp_mappings to set up the EL2 page tables before entering EL2.
create_hyp_mappings is called in init_hyp_mode, which is a crucial function in KVM initialization. It creates EL2 mappings for the following areas:
- EL2 code (__hyp_text_start ~ __hyp_text_end)
- EL2 read only data (__hyp_rodata_start ~ __hyp_rodata_end)
- EL1 read only data (__start_rodata ~ __end_rodata)
- EL2 BSS (__hyp_bss_start ~ __hyp_bss_end)
- EL1 BSS (__hyp_bss_end ~ __bss_stop)
- EL2 stack
- EL2 percpu area
This function only prepares the page tables; it does not enter EL2 or turn on address translation.
1 | /** |
The comments above the function are very helpful. from and to delimit the EL1 virtual address region to map into EL2, and prot specifies the protection. The interesting part is that no EL2 virtual address range is needed: KVM has its own mechanism for translating EL1 VAs to EL2 VAs, namely kern_hyp_va.
__create_hyp_mappings takes the lock kvm_hyp_pgd_mutex, then calls kvm_pgtable_hyp_map. Note that hyp_pgtable is the kvm_pgtable needed for the walk.
1 | int __create_hyp_mappings(unsigned long start, unsigned long size, |
kvm_pgtable_hyp_map
This function is where the new page table walker, kvm_pgtable_walk, gets called. It creates a kvm_pgtable_walker (1) and passes it, alongside hyp_pgtable (named pgt here), to kvm_pgtable_walk (2).
1 | int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys, |
hyp_map_walker is the cb, and it is set up to run only when visiting leaf entries (flags: KVM_PGTABLE_WALK_LEAF). It allocates pages for each level of the page table and installs their addresses in the corresponding entries.
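How a leaf-only callback can build a whole tree may not be obvious: the walker has to look at the entry again after the callback returns, and descend if the callback has just installed a table there. Below is a minimal user-space sketch of that idea, under several assumptions: addresses are plain leaf indices, each table has 4 entries, mappings are only created at the last level (no block mappings), and toy_map_cb stands in for the real callback without copying it. The allocated tables are never freed in this toy.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_ENTRIES    4 /* entries per table in this toy */
#define TOY_LAST_LEVEL 2 /* levels 0..2 */

struct toy_entry {
    struct toy_entry *table; /* next-level table, if any */
    int mapped;              /* set on a last-level entry once mapped */
};

typedef int (*toy_leaf_cb_t)(int level, struct toy_entry *e, void *arg);

struct toy_walk_data {
    unsigned addr, end; /* leaf indices, end is exclusive */
    toy_leaf_cb_t cb;
    void *arg;
};

static unsigned toy_shift(int level) { return 2 * (TOY_LAST_LEVEL - level); }

static unsigned toy_idx(unsigned addr, int level)
{
    return (addr >> toy_shift(level)) & (TOY_ENTRIES - 1);
}

/* Leaf callback: map at the last level, otherwise install a new table. */
static int toy_map_cb(int level, struct toy_entry *e, void *arg)
{
    int *tables_allocated = arg;

    if (level == TOY_LAST_LEVEL) {
        e->mapped = 1;
        return 0;
    }
    e->table = calloc(TOY_ENTRIES, sizeof(*e->table));
    if (!e->table)
        return -1;
    (*tables_allocated)++;
    return 0;
}

static int toy_walk(struct toy_walk_data *d, struct toy_entry *tbl, int level);

static int toy_visit(struct toy_walk_data *d, struct toy_entry *e, int level)
{
    if (!e->table) { /* leaf: let the callback act on it */
        int ret = d->cb(level, e, d->arg);
        if (ret)
            return ret;
    }
    if (!e->table) { /* still a leaf after the callback: step past it */
        unsigned granule = 1u << toy_shift(level);
        d->addr = (d->addr & ~(granule - 1)) + granule;
        return 0;
    }
    /* The entry is a table, possibly just installed: descend. */
    return toy_walk(d, e->table, level + 1);
}

static int toy_walk(struct toy_walk_data *d, struct toy_entry *tbl, int level)
{
    for (unsigned i = toy_idx(d->addr, level); i < TOY_ENTRIES; i++) {
        if (d->addr >= d->end)
            break;
        int ret = toy_visit(d, &tbl[i], level);
        if (ret)
            return ret;
    }
    return 0;
}
```

Mapping leaves 5 and 6 on an empty root allocates two tables, one per intermediate level. A later walk over a neighbouring range reuses the tables that already exist and only allocates what is missing, because entries that already point to a table are descended into without invoking the leaf callback.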