美文网首页
计算机组成与设计RISC-V版导读--Chapter2

计算机组成与设计RISC-V版导读--Chapter2

作者: sarai_c7eb | 来源:发表于2020-02-19 19:47 被阅读0次

    chapter2 Instructions: Language of the Computer

    秉承着计算机专家的一贯思路,首先介绍一下子指令;顺序一般是从简单到复杂,从普通到特殊;

    主要有如下章节:
    2.1 Introduction
    2.2 Operations of the Computer Hardware
    2.3 Operands of the Computer Hardware
    2.4 Signed ad Unsigned Numbers
    2.5 Representing Instructions in the Computer
    2.6 Logical Operations
    2.7 Instructions for Making Decisions
    2.8 Supporting Procedures in Computer Hardware
    2.9 Communicating with People
    2.10 RISC-V Addressing for Wide Immediates and Addresses
    2.11 Parallelism and Instructions: Synchronization
    2.12 Translating and Starting a Program
    2.13 A C Sort Example to Put it All Together
    2.14 Arrays versus Pointers
    2.15 Advanced Material: Compiling C and Interpreting Java
    2.16 Real Stuff: MIPS Instructions
    2.17 Real Stuff:x86 Instructions
    2.18 Real Stuff: The Rest of the RISC-V Instruction Set
    2.19 Fallacies and Pitfalls
    2.20 Concluding Remarks
    2.21 Historical Perspective and Further Reading

    下面分章节介绍如下:

    2.1 Introduction

    两个概念

    • instruction:计算机的词汇
    • instruction set:计算机词汇表

    本章以top-down方式讲解instructions,从一种解释语言到最终计算机能理解的语言;

    2.2 Operations of the Computer Hardware

    首先举一个RISC-V的汇编例子:

    add a,b,c
    

    此指令意为将变量b,c相加后存入变量a中;

    从而引入一个重要原则:每RISC-V指令只执行一个操作(加,减等),每个指令最多3个变量;
    This notation is rigid in that each RISC-V arithmetic instruction performs only one operation and must always have exactly three variables;

    e.g. a=b+c+d+e;

    add a,b,c // The sum of b and c is placed in a
    add a,a,d // The sum of b and c and d is now in a
    add a,a,e // The sum of b,c,d and e is now in a
    

    双斜杠之后为注释部分;

    本节列出两个表格,一个为operands,主要我寄存器和memory的表示;
    第二个表为汇编指令表;

    最后引入一个重要原则:
    Design Principle 1: Simplicity favors regularity

    2.3 Operands of the Computer Hardware

    the operands of arithmetic instructions are restricted: they must be from a limited number of special locations built directly in hardware called registers.
    算术运算指令的operands只能来自寄存器;
    RISC-V的寄存器大小为64bits,在本书中64bits称为双字,一个字为32bits;
    registers与普通编程语言的变量相比的一个重要区别就是:它只有有限个数,32个;

    从而引入第二个重要原则:
    Design Principle 2: Smaller is faster

    原因为太多引起硬件寻址等时间延长,从而降低性能;

    registers的表示方法为"X"+数字;还有其他一些特殊寄存器后面会接着介绍;
    下面举上节例子,将变量指定register,然后替换上节的变量表示;

    Memory Operands
    首先,register很有限,如果程序需要大的数组或structure,则需要存储到memory中:
    Memory的特点就是大,访问速度慢;
    而arithmetic操作只能操作register,如果对memory操作,而需要将memory数据先读到寄存器,对寄存器操作,然后将寄存器写入memory;
    从memory到register操作为load;从register到memory操作为store;

    如g=h+A[8]
    A为100个doublewords的array;编译器将g,h编译为x20和x21, A的base address存储到x20

    ld x9,8(x22)   //Temporary reg x9 gets A[8]
    add x20,x21,x9 //g=h+A[8]
    

    8(x22)为memory表示法,8为offset,x22存储base address;

    Since 8-bit bytes are useful in many programs, virtually all architctures today address individual bytes. 今天所有的architecture还是使用byte寻址,RISC-V也不例外;
    所以register和memory的地址都表示为0x8,0x10,0x18,0x20......

    每64bits中各个byte是如何排序的,一般有两种方式:

    • big-endian: 从右到左为0,1,2......
    • small-endian:从左到右为0,1,2......
      RISC-V采用small_endian格式;在访问byte或者word级别数据时候要特别注意大小端问题;

    如A[12]=h+A[8]

    ld x9,64(x22)   //Temporary reg x9 gets A[8]
    add x9,x21,x9   //Temporary reg xg gets h+A[8]
    sd x9,96(x22)   //Stores h+A[8] back into A[12]
    

    编译器会将频繁访问的变量存入register,把剩下的存入memory;
    the process of putting less frequently used variables into memory is called spilling registers

    Constant or Immediate Operands

    RISC-V将近一半的指令需要操作立即数或者说常量;
    如将常量4与x22寄存器相加:
    程序编译后将常量4写入memory。

    ld x9,AddConstant4(x3) //x9=constant 4
    add x22,x22,x9         //x22=x22+x9(where x9=4)
    

    另一种方法为立即数指令:

    addi x22,x22,4 //x22=x22+4
    

    注意常量/立即数为0也很常用,比如需要负数时候可以用0减去一个数值;而RISC-V中将x0寄存器固定接为0;

    此方法更为高效和常用

    2.4 Signed ad Unsigned Numbers

    首先介绍计算机的进制:

    • 二进制;
    • 二进制和十进制的转换;
    • 二进制溢出的概念;

    如何表示负数的一些思路,后来引入补码的概念;
    如何从补码计算出一个正、负数的值;
    如补码为1111的十进制计算
    1\times(-2)^3+1\times 2^2+1\times 2^1+1\times 2^0=-1
    补码运算;
    lbu: load byte unsigned
    lb: load byte

    补码的缺点

    1. 取补码相当于取反加1,比较复杂;
    2. bit扩展麻烦,如从16bits扩展到64bit,则需要将最高位复制48份,然后copy其他16bit;

    2.5 Representing Instructions in the Computer

    RISC-V指令Fields介绍
    将汇编语言一步步转为二进制指令

    2-1.PNG

    2.6 Logical Operations

    介绍左移,右移,AND,OR,XOR NOT逻辑操作;

    2.7 Instructions for Making Decisions

    主要讲解下面两个指令:

    beq rs1,rs2,L1
    
    • branch if equal
    bne rs1,rs2,L1
    
    • branch if not equal

    Compiling if-then-else into conditional Branches
    f,g,h,i,j对应x19-x23

    f g h i j
    x19 x20 x21 x22 x23

    如下C语言如何编译为RISC-V格式

    if(i==j) f=g+h; else f=g-h
    
    bne x22,x23,Else
    add x19,x20,x21
    ELSE:sub x19,x20,x21
    Exit:
    

    Loops

    C语言loops举例

    while(save[i]==k)
      i+=1;
    
    • i :x22
    • k:x24
    • base of array save: x25
    Loop: slli x10,x22,3 //Temp reg x10=i*8
    add x10,x10,x25     //x10=address of save[i]
    ld x9,0(x10)        //Temp reg x9=save[i]
    bne x9,x24,Exit     //go to Exit if save[i]!=k
    addi x22,x22,1      //i=i+1
    beq x0,x0,Loop      //to to Loop
    Exit:
    

    介绍一个流行词:basic block;
    a basic block is a sequence of instructions without branches, except possibly at the end, and without branch trgets or branch labels, eccept possibly at the beginning.
    one of the first early phases of compilation is breaking the program into basic blocks;

    blt和bge将寄存器的值当做补码形式比较;
    bltu和bgeu将寄存器的值当做unsigned比较;

    Bounds Check Shortcut
    如何用unsigned方法比较signed寄存器;

    branch to IndexOutOfBounds if x20>=x11 or if x20 if negative;
    bgeu x20,x11,IndexOutOfBounds
    

    Case/Switch Statement
    branch address table /branch table

    2.8 Supporting Procedures in Computer Hardware

    A procedure or function is one tool programmers use to structure programs, both to make them easier to understand and to allow code to be reused.

    procedure和其他program部分的接口是parameter
    In the execution of a procedure, the program must follow 6 steps:

    1. Put parameters in a place where the prcedure can access them.
    2. Transfer control to the procedure.
    3. Acquire the storage resource needed for the procedure.
    4. Perform the desired task.
    5. Put the result value in a place whre the calling program can access it.
    6. Return control to the point of origin, since a procedure can be called from several points in a program.

    由于寄存器访问快,程序需要尽可能利用寄存器;
    RISC-V software follows the following convention for procedure calling in allocating its 32 regs;

    • x10-x17: 8 parameter registers in which to pass parameters or return values.
    • x1: one return address register to return to the point of origin;

    为procedure而设计的一个指令:

    jal x1,ProcedureAddress //jump to ProcedureAddress and write return address to x1
    

    jal:jump and link instrunction.此指令完成2个动作:

    1. 跳到ProcedureAddress开始执行程序;
    2. 将ProcedureAddress保存到x1寄存器;

    被执行的procedure如何取到x1地址,并继续执行:

    jalr x0,0(x1)
    

    jal的另一种用法:

    jal x0,Label //unconditionally brach to Label.
    

    Using More Registers
    如果一个procedure需要用很多的registers,当使用完成之后,寄存器的值需要恢复,如何存储这些寄存器的值呢?
    从而引入stack的概念,中文翻译为堆栈,procedure操作完成后将寄存器值pop出来重新写入寄存器;
    把如下C程序翻译为assemble;

    long long int leaf_example(long long int g,long long int h,
                               long long int i,long long int j){
        long long int f;
        f=(g+h)-(i+j);
        return f;
    }
    

    g/h/i/j对应x10/x11/x12/x13,f对应x20

    leaf_example:
    addi sp,sp,-24
    sd x5,16(sp)
    sd x6,8(sp)
    sd x20,0(sp)
    
    add x5,x10,x11
    add x6,x12,x13
    sub x20,x5,x6
    addi x10,x20,0
    
    ld x20,0(sp)
    ld x6,8(sp)
    ld x5,16(sp)
    addi sp,sp,24
    jaalr x0,0(x1)
    

    RISC-V规定了19个temporary registers,并将其份为2类:

    • x5-x7 and x28-x31:not preserved by callee on a procedure call;
    • x8-x9 and x18-x27:saved registers that must be preserved on a procedure call
      所以上面汇编中可以省去line3/4/13/14 store和load操作;

    Nested Procedures
    procedure that do not call others are called leaf procedures.
    如果世界上只有leaf procedures,那么世界就变的非常简单;通常情况下一个procedure会调用其他的procedure,甚至会调用自己,我们称之为recursive procedure.
    那么register如何在no-leaf procedure中使用呢?

    举例说明:

    以下C回归调用例子为例

    long long int fact(long long int n){
      if(n<1) 
          return(1)
      else 
          return (n*fact(n-1))
    }
    

    parameter n存在x10中

    FACT:
      addi sp,sp,-16   //adjust stack for 2 items
          sd   x1,8(sp)    //save the return address
          sd   x10,0(sp)   //save the argument n
    
          addi x5,x10,-1   //x5=n-1
          bge  x5,x0,L1    //if(n-1)>=0,go to L1
    
          addi x10,x0,1    //return 1
          addi sp,sp,16    //pop 2 items off stack
          jalr x0,0(x1)    //return to caller
    
    L1: 
      addi x10,x10,-1  //n>=1: argument gets(n-1)
          jal  x1,FACT     //call fact with(n-1)
    
          addi x6,x10,0    //return from jal:move result of fact(n-1) to x6
          ld   x10,0(sp)   //restore argument n
          ld   x1,8(sp)    //restore the return address
          addi sp,sp,16    //adjust stack point to pop 2 items
    
          mul  x10,x10,x6  //return n*fact(n-1)
          jalr x0,0(x1)    //return to the caller 
    

    注意jal、jalr,ld 等指令指挥这指令跳来跳去,并且将stack的参数一步步的推出;

    C语言变量可以按照2类分:

    • type
      • integer
      • character
    • storage class
      • automatic
      • static
        automatic对于一个procedure而言是local的,即程序调用完成后自动消失;而static则贯穿始终;一般用static标识声明;

    总结一下那些需要preserved,而哪些又不需要:

    Preserved Not Preserved
    saved registers:x8-x9,x18-x19 Temprary registers:x5-x7,x28-x31
    Stack pointer register:x2(sp) Argument/result register:x10-x17
    Frame pointer:x8(fp)
    Return address:x1(ra)
    Stack above the stack pointer Stack below the stack pointer

    Allocating Space for New Data on the Stack
    Stack被称为栈,存储程序调用的automatic型的local变量;
    The segment of the stack containing a procdure's saved registers and local variables is called a procedure frame or activation record.

    一个procedure通常使用一个stack segment,而这个stack segment就成为procedure frame/activation record。

    一些RISC-V编译器使用FP(frame pointer)指向procedure frame的第一个doubleword。使用FP的好处是可以很方面的定位local parameter的位置,方便调试与定位;
    A frame pointer offers a stable base register within a procedure for local memory-references.

    Stack一般是从高地址往低地址递减的;

    Allocating Space for New Data on the Heap
    Heap被称为堆,存储程序调用的static型变量;
    Heap是从低地址向高地址增加的;堆的第一个地址是保留的reserved,接着是text segement(the home of the RISC-V machine code),再往上就是static data segment

    C语言中用malloc用free来分配和释放heap空间;
    C语言这种手动分配和释放空间的机制带来了很多bug,相比JAVA则不会;

    最后用一个表格来说明RISC-V寄存器的约定:

    Name Register number Usage Presrved on call
    x0 0 The constant value 0 n.a
    x1(ra) 1 Return address(link register) yes
    x2(sp) 2 Stack pointer yes
    x3(gp) 3 Global pointer yes
    x4(tp) 4 Thread pointer yes
    x5-x7 5-7 Temporaries no
    x8-x9 8-9 Saved yes
    x10-x17 10-17 Arguments/results no
    x18-x27 18-27 Saved yes
    x28-x31 28-31 Temporaries no

    2.9 Communicating with People

    计算机如何显示呢,通常使用8bit的ASCII码;

    为了方便charactor操作,RISC-V提供了2两个instructions

    lbu x12,0(x10) //Read byte from source
    sb x12,0(x11)  //Write byte to destination
    

    lbu:load byte unsigned,将byte load到目的寄存器的最右端;
    sb:store byte,将最右侧的byte存入memory;
    多个charactors会组成string,那么string如何表示,一般有3种方式:

    1. string第一个地址保留,给出string长度;
    2. 用一个伴随变量来表示string长度;
    3. 最后用一个特殊字符表示string结束;

    C语言用了3,而JAVA用了1.
    而JAVA使用Unicode characters,所以它使用16bit来表示一个charactor;
    于是RISC-V需要使用load/store halfword指令来load和store一个charactor;

    lhu x19,0(x10) //Read halfword from source
    sh  x10,0(x11)  //Write halfword to destination
    

    2.10 RISC-V Addressing for Wide Immediates and Addresses

    RISC-V的指令为32bits,这样可以简化硬件的设计,但是对长度过长常量则需要其他的手段解决;
    这节就重点讲解branch和jump中的手段;
    接着讲解RISC-V 指令编码表;

    2.11 Parallelism and Instructions: Synchronization

    data race:比如两个程序访问同一个数据,一个程序没有完成写,而另一个已经开始读,这样就叫data race。

    In computing, synchronization mechanisms are typically built with user-level software routines that rely on hardwre-supplied synchronization instructions.
    软件routines之间通过硬件提供的同步指令来构造同步机制;
    we focus on the implementation of lock and unlock synchronization.
    对于单核processor而言,unlock/lock很容易执行;

    The critical ability we require to implement synchronization in a multiprocessor is a set of hardware primitives with the ability to atomically read and modify a memory location;

    以atomic exchange/atomic swap为例,它主要完成两个存储之间的数值交换;
    Lock=0表示unlock,Lock=1则表示lock;
    从而引出lr.d(load reserved double word)和sc.d(store conditional doubleword)

            addi x12,x0,1       //copy locked value
    again:  lr.d x10,(x20)      //load-reserved to read lock
            bne  x10,x0,again   //check if it is 0 yet
            sc.d x11,x12,(x20)  //attempt to store new value
            bne  x11,x0,again   //branch if store fails
    
            sd x0,0(x20)        //free lock by writing 0
    

    2.12 Translating and Starting a Program

    本节主要讲从磁盘一段C语言文件到计算机可以执行的程序之间的4个步骤:


    2-2.png

    Compiler
    Compiler将高级语言编译为通用的汇编(通用的汇编是因为没有目的器件)
    Assembler
    Assembler就是器件对应的编译器,将通用汇编编译成对应器件支持的汇编;
    Linker

    如何做到修改了一行代码而不用重新开始编译;

    Dynamically Linked Libraries
    传统的link libraries的方法是static方法,有一些缺点:

    • The library routines become part of the executable code.木已成舟,再难修改;
    • 所有的routines都要被load,管你要不要;

    这些缺点导致了DLL(Dynamically linked libraries)的出现;
    DLL,where the library routines are not linked an loaded until the program is run.
    Starting a JAVA Program
    初步认识一些JVM和JIT吧

    2-3.png

    各个平台有不同的JVM,如windows JVM,UNIX JVM等,分别负责将JVM虚拟机最终解释为Windows和Unix instruction;

    2.13 A C Sort Example to Put it All Together

    举连个C语言程序的例子来解释当前章的内容

    1. swap 函数
    2. sort函数,调用swap

    2.14 Arrays versus Pointers

    this section shows C and RISC-V assembly versions of two procedures to clear a sequence of doublewords in memory.

    2.15 Advanced Material: Compiling C and Interpreting Java

    This section gives a bref overview of how the C compiler works and how Java is executed.

    在线阅读内容;

    2.16 Real Stuff: MIPS Instructions

    简单介绍MIPS指令,和RISC-V极其相似;

    2.17 Real Stuff:x86 Instructions

    将近40年的发展,每个milestone加入一系列的指令,大概有13个milestones,作者将其比喻为金色的手铐,其庞杂性限制了它的发展,目前每年有350M x86 chips,而却又14billion ARM chips。从此可以看到起差距之大;

    接着介绍一下x86的寄存器结构和Data Addressing Modes,接着介绍x86 integer operations和Instruction Encoding。

    总结一下就是x86现在不流行了;

    2.18 Real Stuff: The Rest of the RISC-V Instruction Set

    RISC-V指令集分为base和extension

    这个section简单介绍base之外的其他指令,以及介绍一下extension的分类;

    2.19 Fallacies and Pitfalls

    fallacy:

    1. More powerful instructions mean higher performance
    2. Write in assembly language to obtain the highest performance
    3. The importance of commercial binary compatibility means successful instruction sets don't change

    Pitfall:

    1. Forgetting that sequential word or double word addersses in machines with byte addrssing do not differ by one.
    2. Using a pointer to an automatic variable outside its defining procedure.

    2.20 Concluding Remarks

    2.21 Historical Perspective and Further Reading

    相关文章

      网友评论

          本文标题:计算机组成与设计RISC-V版导读--Chapter2

          本文链接:https://www.haomeiwen.com/subject/lwhzfhtx.html