Question
We know that the CPU executes instructions very quickly, but accessing storage is slow. Registers are the fastest storage the CPU can reach and disk is the slowest; registers are also the most expensive per byte and disk the cheapest. To balance speed against cost, computer architecture uses a multi-level storage hierarchy, ordered from fastest to slowest: registers - L1 cache - L2 cache - L3 cache - main memory (RAM) - disk. So when the CPU needs to read data as it works, how does it decide what to put in registers and what to put in the caches? Is there a fixed route, e.g. does data always travel from memory (or disk) up through each cache level and then into a register?
Answer
When the CPU needs to read data at some memory address, the MMU checks whether that data is already in the cache; if it is not, the required data is loaded into the caches level by level. (Aside: the MMU's job is to translate virtual addresses into physical addresses, and it relies on a cache of its own, the TLB, to do this quickly.)
If the value also needs to be processed by the arithmetic logic unit (ALU), it is loaded from the cache into a register; exactly which register is used is decided by the compiler.
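To make the cache part of this observable, here is a minimal C sketch (my own illustration, not part of the original answer): it walks one large buffer twice, once sequentially, where a single cache-line fill serves many subsequent reads, and once with a page-sized stride, where nearly every read misses and must be fetched from RAM. The buffer size and strides are arbitrary choices for demonstration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N ((size_t)64 * 1024 * 1024)   /* 64 MiB, larger than typical caches */

/* Walk the buffer with the given stride; return nanoseconds per access. */
static double walk_ns_per_access(const char *buf, size_t step) {
    struct timespec t0, t1;
    volatile char sink = 0;            /* stops the reads being optimized away */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < N; i += step)
        sink += buf[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs * 1e9 / (double)(N / step);
}

int main(void) {
    char *buf = malloc(N);
    if (!buf) return 1;
    for (size_t i = 0; i < N; i++)
        buf[i] = (char)i;              /* touch every byte so pages exist */

    /* Sequential: one 64-byte line fill serves ~64 consecutive reads. */
    printf("step 1    : %6.2f ns/access\n", walk_ns_per_access(buf, 1));
    /* Page-sized stride: line-based caching no longer helps, so almost
       every access pays a miss all the way out to RAM. */
    printf("step 4096 : %6.2f ns/access\n", walk_ns_per_access(buf, 4096));

    free(buf);
    return 0;
}
```

Compiled with optimizations (e.g. `gcc -O2`), the strided walk typically reports an order of magnitude more time per access than the sequential one, which is the cache hierarchy at work.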
Stack Overflow Reference
From Stack Overflow: how-does-the-cpu-decide-which-data-it-puts-in-what-memory-ram-cache-registers
Quoted below:
The answer to this question is an entire course in itself! A very brief summary of what (usually) happens is that:
- You, the programmer, specify what goes in RAM. Well, the compiler does it on your behalf, but you're in control of this by how you declare your variables. [A declaration sketch after this list illustrates this.]
- Whenever your code accesses a variable, the CPU's MMU will check if the value is in the cache and if it is not, then it will fetch the 'line' that contains the variable from RAM into the cache. Some CPU instruction sets may allow you to prevent it from doing so (causing a stall) for specific low-frequency operations, but it requires very low-level code to do so. When you update a value, the MMU will perform a 'cache flush' operation, committing the cached memory to RAM. Again, you can affect how and when this happens by low-level code. It will also depend on the MMU configuration such as whether the cache is write-through, etc.
- If you are going to do any kind of operation on the value that will require it being used by an ALU (Arithmetic Logic Unit) or similar, then it will be loaded into an appropriate register from the cache. Which register will depend on the instruction the compiler generated.
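As a rough illustration of the first and third bullets (my own sketch, assuming a typical optimizing C compiler): where a variable lives is determined by how it is declared, while hot scalars in a loop are usually kept in registers by the compiler's register allocator. The comments describe typical behavior, not language guarantees.

```c
#include <stdlib.h>

int global_counter = 42;        /* static storage: sits in RAM (.data) */
static int zeroed_table[1024];  /* static storage, zero-initialized (.bss) */

long sum_squares(int n) {
    long acc = 0;               /* hot scalars: an optimizer typically keeps
                                   'acc' and 'i' in registers for the loop */
    for (int i = 0; i < n; i++)
        acc += (long)i * i;     /* multiply and add run in the ALU on
                                   register operands */
    return acc;
}

int main(void) {
    int on_stack = 7;                       /* automatic storage: the stack */
    int *on_heap = malloc(sizeof *on_heap); /* dynamic storage: the heap */
    if (!on_heap) return 1;
    *on_heap = on_stack + global_counter + zeroed_table[0];
    long r = sum_squares(*on_heap);
    free(on_heap);
    return (int)(r & 0xff);     /* use the result so nothing is optimized out */
}
```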
Some CPUs support Direct Memory Access (DMA), which provides a shortcut for operations that do not really require the CPU to be involved. These include memory-to-memory copies and the transfer of data between memory and memory-mapped peripheral control blocks (such as UARTs and other I/O blocks). These will cause data to be moved, read or written in RAM without actually affecting the CPU core at all.
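To make the DMA pattern concrete, here is a hypothetical bare-metal C sketch: the register addresses, layout, and control bits are invented for illustration (every real controller has its own map in its datasheet), but programming source, destination, and length through volatile memory-mapped registers and then setting a start bit is the typical shape.

```c
#include <stdint.h>

/* Hypothetical memory-mapped DMA controller. These addresses and bit
   definitions are made up for illustration only. */
#define DMA_BASE        0x40020000u
#define DMA_SRC     (*(volatile uint32_t *)(DMA_BASE + 0x00))
#define DMA_DST     (*(volatile uint32_t *)(DMA_BASE + 0x04))
#define DMA_LEN     (*(volatile uint32_t *)(DMA_BASE + 0x08))
#define DMA_CTRL    (*(volatile uint32_t *)(DMA_BASE + 0x0C))
#define DMA_CTRL_START  (1u << 0)
#define DMA_CTRL_BUSY   (1u << 1)

/* Copy 'len' bytes without the CPU core touching the data itself: the
   controller moves it directly, e.g. RAM to RAM, or RAM to a UART FIFO. */
void dma_copy(const void *src, void *dst, uint32_t len) {
    DMA_SRC  = (uint32_t)(uintptr_t)src;
    DMA_DST  = (uint32_t)(uintptr_t)dst;
    DMA_LEN  = len;
    DMA_CTRL = DMA_CTRL_START;         /* kick off the transfer */
    while (DMA_CTRL & DMA_CTRL_BUSY)   /* the core is free to do other work;
                                          we poll here only for simplicity */
        ;
}
```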
At a higher level, some operating systems that support multiple processes will save the RAM allocated to the current process to the hard disk when the process is swapped out, and load it back in again from the disk when the process runs again. (This is why you may find 'Page Files' on your C: drive and the options to limit their size.) This allows all of the running processes to utilise most of the available RAM, even though they can't actually share it all simultaneously. Paging is yet another subject worthy of a course on its own. (Thanks to Leeor for mentioning this.)
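Related to the paging point, a small POSIX sketch (my own addition): it queries the page size and pins a buffer so the kernel will not swap those pages to disk. `sysconf`, `mlock`, and `munlock` are standard POSIX calls; whether `mlock` succeeds depends on OS limits such as `RLIMIT_MEMLOCK`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void) {
    long page = sysconf(_SC_PAGESIZE);  /* size of one page, commonly 4096 */
    printf("page size: %ld bytes\n", page);

    size_t len = 16 * (size_t)page;
    void *buf = malloc(len);
    if (!buf) return 1;

    /* Pin these pages in RAM: the kernel will not page them out to disk.
       This can fail with EPERM/ENOMEM if RLIMIT_MEMLOCK is too low. */
    if (mlock(buf, len) == 0) {
        printf("pinned %zu bytes; this region cannot be swapped out\n", len);
        munlock(buf, len);              /* make the pages swappable again */
    } else {
        perror("mlock");
    }

    free(buf);
    return 0;
}
```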
Comments