C++ Function Semantics (virtual)

Author: fooboo | Published 2019-10-20 10:33

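    This post uses Chapter 4, "The Semantics of Function", of Inside the C++ Object Model to briefly describe how member function calls work, including under multiple inheritance. I have been rereading the book lately, so I am writing down a few examples along the way and disassembling them to see the concrete implementation.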

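    When I interviewed candidates in the past, their understanding of virtual functions often stopped at "there is a virtual table"; one slightly deeper question and they were lost. Memorizing answers may get you through an interview or a written test, but to go further you need to understand the mechanism, ideally by stepping through it in gdb yourself. Of course, that comes down to personal choice and interest.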

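    I did not read 《STL源码剖析》 very carefully back then and have forgotten most of the full interface names and their usage. Here are my earlier posts on related topics:
    Construction of class instances
    Why are the default arguments of a base class's virtual function inherited?
    Some issues with a base class pointer pointing to an array of derived objects
    Notes on the C++ object model and performance optimization, part 1
    Notes on the C++ object model and performance optimization, part 2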

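    This post does not cover inline or static functions. It only looks at how (non-)virtual member functions behave when called through an object versus through a reference or pointer, and at the multiple inheritance case.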

    Case 1:

      #include <iostream>
      using std::cout;
      using std::endl;

      class base {
      public:
          void show() { cout << "base" << endl;}
      private:
          int mvalue;
      };

      int main() {
          base obj;
          obj.show();
          base* pbase = &obj;
          pbase->show();
          pbase = nullptr;
          return 0;
      }
    

    Below is the disassembled result. The disassembler output on macOS looks a little different, so I stepped through the assembly directly in gdb:

       0x0000000100000e30 <+0>: push   %rbp
       0x0000000100000e31 <+1>: mov    %rsp,%rbp
       0x0000000100000e34 <+4>: sub    $0x10,%rsp //rsp=rsp-16
       0x0000000100000e38 <+8>: lea    -0xc(%rbp),%rax
    => 0x0000000100000e3c <+12>:    mov    %rax,%rdi // address of obj
       0x0000000100000e3f <+15>:    callq  0x100000ebe
       0x0000000100000e44 <+20>:    lea    -0xc(%rbp),%rax
       0x0000000100000e48 <+24>:    mov    %rax,-0x8(%rbp)
       0x0000000100000e4c <+28>:    mov    -0x8(%rbp),%rax
       0x0000000100000e50 <+32>:    mov    %rax,%rdi // address of obj
       0x0000000100000e53 <+35>:    callq  0x100000ebe
    
    (gdb) si
    base::show (this=0x0) at struct.cpp:7
    

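    The disassembly shows that no constructor is synthesized and none is called, since the compiler has no need for one here. gdb prints this=0x0 on entry to show, most likely because si stopped at the function's first instruction, before the prologue has stored the incoming this; the actual argument is passed in %rdi and is the address of obj. Both calls jump to the same target, callq 0x100000ebe. The member function show is name-mangled by the compiler into __ZN4base4showEv, so calling a non-static, non-virtual member function through an object or through a pointer is in both cases transformed into something like __ZN4base4showEv(&obj), with the object's address passed in. The cost of the two forms is therefore the same.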
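
    The transformation described above can be sketched in plain C++. The struct below is a simplified stand-in for the article's class (mvalue is public and show returns it so the results can be compared), and base_show_lowered is a hypothetical name for what the mangled function conceptually looks like:

```cpp
// Simplified stand-in for the article's base class (assumption: mvalue is
// public and show() returns it instead of printing).
struct base {
    int mvalue = 7;
    int show() { return mvalue; }          // implicitly: return this->mvalue;
};

// Hypothetical hand-written equivalent of the lowered member function:
// an ordinary function that receives the object address explicitly.
int base_show_lowered(base* self) { return self->mvalue; }

// Calls show() through an object, through a pointer, and in lowered form.
bool same_result_all_ways() {
    base obj;
    base* pbase = &obj;
    int via_object  = obj.show();
    int via_pointer = pbase->show();
    int via_lowered = base_show_lowered(&obj);
    return via_object == via_pointer && via_pointer == via_lowered;
}
```

    Calling same_result_all_ways() from main should return true: all three forms read the same member through the same address.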

    Single inheritance with a virtual function:

      #include <iostream>
      using std::cout;
      using std::endl;

      class base {
      public:
          virtual void show() { cout << "base" << endl;}
      private:
          int mvalue;
      };

      class derived : public base {
      public:
          void show() override {
              cout << "derived" << endl;
          }
      };

      int main() {
          base obj;
          obj.show();
          base* pbase = new derived;//new (std::nothrow) derived; check pbase
          pbase->show();
          delete pbase;
          pbase = nullptr;
          return 0;
      }
    

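    base serves as a base class here but does not declare virtual ~base(). It is omitted for this test, but real code should add it, because deleting a derived object through a base pointer without a virtual destructor is undefined behavior.
    At the start of main: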

       0x0000000100000cef <+0>: push   %rbp
       0x0000000100000cf0 <+1>: mov    %rsp,%rbp
       0x0000000100000cf3 <+4>: push   %rbx
       0x0000000100000cf4 <+5>: sub    $0x28,%rsp
    => 0x0000000100000cf8 <+9>: lea    -0x30(%rbp),%rax
       0x0000000100000cfc <+13>:    mov    %rax,%rdi
       0x0000000100000cff <+16>:    callq  0x100000dac
    

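    Because base has a virtual function, the compiler synthesizes a default constructor for it. That constructor does not initialize mvalue (an uninitialized built-in int has an indeterminate value; it merely happens to be 0 here). It only sets the vptr of the base instance. Before the constructor runs: $2 = {_vptr.base = 0x0, mvalue = 0}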

    Dump of assembler code for function base::base():
       0x0000000100000ca4 <+0>: push   %rbp
       0x0000000100000ca5 <+1>: mov    %rsp,%rbp
       0x0000000100000ca8 <+4>: mov    %rdi,-0x8(%rbp)
       0x0000000100000cac <+8>: mov    0x365(%rip),%rax        # 0x100001018
       0x0000000100000cb3 <+15>:    lea    0x10(%rax),%rax
       0x0000000100000cb7 <+19>:    mov    -0x8(%rbp),%rdx
    => 0x0000000100000cbb <+23>:    mov    %rax,(%rdx)
       0x0000000100000cbe <+26>:    nop
       0x0000000100000cbf <+27>:    pop    %rbp
       0x0000000100000cc0 <+28>:    retq   
    End of assembler dump.
    

    When the constructor is about to return:

    (gdb) p *this
    $21 = (base) {_vptr.base = 0x100001060 <vtable for base+16>, mvalue = 0}
    (gdb) p /a *(void**)0x100001060@1
    $22 = {0x100000c12 <base::show()>}
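
    The hidden vptr that the synthesized constructor writes can also be observed from C++ itself. The following is a minimal sketch; it assumes the vptr is stored in the first pointer-sized word of the object, which holds on the common x86-64 ABIs but is not guaranteed by the standard. The names plain and with_vfunc are made up for illustration:

```cpp
#include <cstddef>
#include <cstring>

struct plain      { int mvalue; };
struct with_vfunc { virtual void show() {} int mvalue; };

// The hidden vptr makes the polymorphic class larger than its data members.
// Exact sizes are implementation details (4 and 16 are typical on x86-64).
constexpr std::size_t plain_size      = sizeof(plain);
constexpr std::size_t with_vfunc_size = sizeof(with_vfunc);

// Assumption: the first pointer-sized word of the object is the vptr.
// Two objects of the same class then carry the same vptr value, because
// the constructor stores the address of the one shared virtual table.
bool same_vptr() {
    with_vfunc a, b;
    void* vptr_a = nullptr;
    void* vptr_b = nullptr;
    std::memcpy(&vptr_a, &a, sizeof vptr_a);   // read only the first word
    std::memcpy(&vptr_b, &b, sizeof vptr_b);
    return vptr_a != nullptr && vptr_a == vptr_b;
}
```

    This mirrors what gdb shows as _vptr.base: one table per class, one pointer to it per object.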
    

    Next, obj.show is called:

       0x0000000100000d04 <+21>:    lea    -0x30(%rbp),%rax
       0x0000000100000d08 <+25>:    mov    %rax,%rdi
       0x0000000100000d0b <+28>:    callq  0x100000da6
    

    Next, new derived allocates memory:

       0x0000000100000d10 <+33>:    mov    $0x10,%edi
       0x0000000100000d15 <+38>:    callq  0x100000dd0
       0x0000000100000d1a <+43>:    mov    %rax,%rbx
    => 0x0000000100000d1d <+46>:    mov    %rbx,%rdi
    

    At this point:

    0x0000000100000d1d  22      base* pbase = new derived;
    (gdb) p pbase
    $26 = (base *) 0x0
    

    new derived proceeds in three steps: allocate memory, construct the object, then assign pbase. Other platforms may swap the last two steps:

    Dump of assembler code for function derived::derived():
       0x0000000100000cc2 <+0>: push   %rbp
       0x0000000100000cc3 <+1>: mov    %rsp,%rbp
       0x0000000100000cc6 <+4>: sub    $0x10,%rsp
       0x0000000100000cca <+8>: mov    %rdi,-0x8(%rbp)
       0x0000000100000cce <+12>:    mov    -0x8(%rbp),%rax
       0x0000000100000cd2 <+16>:    mov    %rax,%rdi
    => 0x0000000100000cd5 <+19>:    callq  0x100000db2
       0x0000000100000cda <+24>:    mov    0x33f(%rip),%rax        # 0x100001020
       0x0000000100000ce1 <+31>:    lea    0x10(%rax),%rax
       0x0000000100000ce5 <+35>:    mov    -0x8(%rbp),%rdx
       0x0000000100000ce9 <+39>:    mov    %rax,(%rdx)
       0x0000000100000cec <+42>:    nop
       0x0000000100000ced <+43>:    leaveq 
       0x0000000100000cee <+44>:    retq   
    End of assembler dump.
    
    Dump of assembler code for function base::base():
    => 0x0000000100000c86 <+0>: push   %rbp
       0x0000000100000c87 <+1>: mov    %rsp,%rbp
       0x0000000100000c8a <+4>: mov    %rdi,-0x8(%rbp)
       0x0000000100000c8e <+8>: mov    0x383(%rip),%rax        # 0x100001018
       0x0000000100000c95 <+15>:    lea    0x10(%rax),%rax
       0x0000000100000c99 <+19>:    mov    -0x8(%rbp),%rdx
       0x0000000100000c9d <+23>:    mov    %rax,(%rdx)
       0x0000000100000ca0 <+26>:    nop
       0x0000000100000ca1 <+27>:    pop    %rbp
       0x0000000100000ca2 <+28>:    retq   
    End of assembler dump.
    
    (gdb) p *this
    $3 = {_vptr.base = 0x100001060 <vtable for base+16>, mvalue = 0}
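
    The three steps of new derived mentioned above can be written out by hand. This is only a sketch of the sequence, not what the compiler literally emits; the classes are reduced versions of the article's, with an id function and a virtual destructor added as assumptions so the result can be checked and cleaned up safely:

```cpp
#include <new>

struct base    { virtual int id() { return 1; } virtual ~base() = default; };
struct derived : base { int id() override { return 2; } };

int new_in_three_steps() {
    void* raw = ::operator new(sizeof(derived)); // 1. allocate raw memory
    derived* obj = new (raw) derived;            // 2. run the constructors in place
    base* pbase = obj;                           // 3. assign the pointer
    int result = pbase->id();                    // virtual call sees derived
    obj->~derived();                             // undo step 2
    ::operator delete(raw);                      // undo step 1
    return result;
}
```

    new_in_three_steps() should return 2, since by the time the pointer is assigned the vptr already refers to derived's table.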
    

    Once the base constructor finishes and control returns to the derived constructor, the vptr is overwritten to point at derived's own virtual table:

    $5 = {<base> = {_vptr.base = 0x100001048 <vtable for derived+16>, mvalue = 0}, <No data fields>}
    
    (gdb) p pbase // before calling the derived constructor
    $7 = (base *) 0x0
    (gdb) i r rbx
    rbx            0x100600440         4301259840
    
    (gdb) p pbase // after calling the derived constructor
    $8 = (base *) 0x100600440
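
    A visible consequence of the vptr being set first by base::base() and only later by derived::derived() is that a virtual call made inside the base constructor still reaches the base version. A small sketch, with a hypothetical name function and seen_in_ctor member added for the demonstration:

```cpp
#include <string>

struct base {
    std::string seen_in_ctor;
    // While this constructor runs, the vptr points at base's table,
    // so name() resolves to base::name even for a derived object.
    base() { seen_in_ctor = name(); }
    virtual std::string name() { return "base"; }
    virtual ~base() = default;
};

struct derived : base {
    std::string name() override { return "derived"; }
};

std::string name_during_base_ctor() {
    derived d;
    return d.seen_in_ctor;
}

std::string name_after_construction() {
    derived d;
    base* pbase = &d;
    return pbase->name();   // vptr now points at derived's table
}
```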
    

    In the last few lines:

       0x0000000100000d25 <+54>:    mov    %rbx,-0x18(%rbp)
       0x0000000100000d29 <+58>:    mov    -0x18(%rbp),%rax // value of pbase (object address)
       0x0000000100000d2d <+62>:    mov    (%rax),%rax // virtual table address (vptr)
       0x0000000100000d30 <+65>:    mov    (%rax),%rdx // address of the virtual function show
    => 0x0000000100000d33 <+68>:    mov    -0x18(%rbp),%rax
       0x0000000100000d37 <+72>:    mov    %rax,%rdi // set up the this argument
       0x0000000100000d3a <+75>:    callq  *%rdx // call the virtual function show
    
    (gdb) p /a *(void**)0x100001060@1
    $15 = {0x100000c12 <base::show()>}
    (gdb) p /a *(void**)0x100001048@1
    $16 = {0x100000c4c <derived::show()>}
    

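    The debugging session above shows that a virtual function called through an object does not go through the virtual mechanism; it is called like an ordinary function, __ZN4base4showEv(&obj). A call through a base pointer that points at a derived object, on the other hand, is transformed into something like (*pbase->vptr[0])(pbase), which costs a few extra instructions in the assembly. (The book writes this as vptr[1] because its model keeps type_info in slot 0; here the vptr already points past that header, so show sits in slot 0.)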
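
    To make the (*pbase->vptr[0])(pbase) form concrete, here is a hand-rolled imitation of the mechanism using an explicit table of function pointers. It is a sketch of the idea only; the names object, vfunc, call_show and the two tables are invented for illustration and do not correspond to anything the compiler exposes:

```cpp
#include <string>

struct object;                                  // forward declaration
using vfunc = std::string (*)(object* self);    // slot type: this is explicit

struct object {
    const vfunc* vptr;   // stands in for the hidden vptr
    int mvalue;
};

std::string base_show(object*)    { return "base"; }
std::string derived_show(object*) { return "derived"; }

// One table per "class", shared by all of its instances.
const vfunc base_vtable[]    = { base_show };
const vfunc derived_vtable[] = { derived_show };

// What pbase->show() lowers to: load vptr, load slot 0, call with this.
std::string call_show(object* pbase) { return (*pbase->vptr[0])(pbase); }

std::string show_on_base()    { object o{base_vtable, 0};    return call_show(&o); }
std::string show_on_derived() { object o{derived_vtable, 0}; return call_show(&o); }
```

    The two loads inside call_show correspond to the two mov (%rax) instructions in the listing, followed by the indirect call.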

    If a second virtual function hello is added below show in base, and derived overrides it, the tail of the disassembly becomes:

       0x0000000100000cb9 <+62>:    mov    (%rax),%rax
    => 0x0000000100000cbc <+65>:    add    $0x8,%rax // offset of hello's slot
       0x0000000100000cc0 <+69>:    mov    (%rax),%rdx // address of the hello function
       0x0000000100000cc3 <+72>:    mov    -0x18(%rbp),%rax
       0x0000000100000cc7 <+76>:    mov    %rax,%rdi // set up the this argument
       0x0000000100000cca <+79>:    callq  *%rdx
    
    $2 = {_vptr.base = 0x100001048 <vtable for derived+16>, mvalue = 0}
    (gdb) p /a *(void**)0x100001048@1
    $3 = {0x100000b9e <derived::show()>}
    (gdb) p /a *(void**)0x100001048@2
    $4 = {0x100000b9e <derived::show()>, 0x100000bd8 <derived::hello()>}
    

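    So to reach hello's slot in the virtual table, the address has to be adjusted with add $0x8,%rax, similar to (*pbase->vptr[1])(pbase). Because base does not declare and define virtual ~base, no destructor appears in the table. The virtual table also holds other information needed at run time, such as type_info for base, which sits just before the address the vptr points to.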
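
    The type_info entry mentioned above is what typeid consults for a polymorphic object. A minimal sketch, assuming RTTI is enabled (the default for common compilers); the virtual destructor is added here for safety and is not part of the article's example:

```cpp
#include <typeinfo>

struct base {
    virtual void show() {}
    virtual void hello() {}
    virtual ~base() = default;
};
struct derived : base {
    void show() override {}
    void hello() override {}
};

// typeid on a dereferenced pointer to a polymorphic type is evaluated at
// run time, using the type information reachable through the object's vptr.
bool dynamic_type_is_derived(base* pbase) {
    return typeid(*pbase) == typeid(derived);
}

bool check_rtti() {
    derived d;
    base b;
    return dynamic_type_is_derived(&d) && !dynamic_type_is_derived(&b);
}
```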

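    Virtual functions under multiple inheritance involve pointer adjustment. Using a pointer to the first base class differs from using a pointer to the second or a later base class: the latter requires offsetting the this pointer. The virtual destructors are written out here so that they show up in the virtual table: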

      class base1 {
      public:
          virtual void show() {}
          virtual ~base1() {}
      private:
          int mvalue;
      };

      class base2 {
      public:
          virtual void show() {}
          virtual ~base2() {}
      private:
          int mvalue;
      };

      class derived : public base1, public base2 {
      public:
          void show() override {}
          virtual ~derived() {}
      };

      int main() {
          base1* pbase1 = new derived;
          pbase1->show();
          delete pbase1;
          pbase1 = nullptr;

          base2* pbase2 = new derived;
          pbase2->show();
          delete pbase2;
          pbase2 = nullptr;
          return 0;
      }
    

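    I have analyzed multiple inheritance in an earlier post, so here the focus is only on the virtual table contents of class derived and on the pointer adjustments in main: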

    (gdb) p *(class derived*)pbase1
    $25 = {<base1> = {_vptr.base1 = 0x100001040 <vtable for derived+16>, mvalue = 0}, <base2> = {
        _vptr.base2 = 0x100001068 <vtable for derived+56>, mvalue = 0}, <No data fields>}
    (gdb) p /a *(void**)0x100001040@2
    $26 = {0x1000008ee <derived::show()>, 0x100000902 <derived::~derived()>}
    (gdb) p /a *(void**)0x100001068@2
    $27 = {0x1000008f9 <_ZThn16_N7derived4showEv>, 0x100000952 <_ZThn16_N7derivedD1Ev>}
    

    After base1* pbase1 = new derived executes, the derived instance looks like the above, but inspecting it through pbase1 shows only the class base1 part:

    (gdb) p pbase1
    $28 = (base1 *) 0x100700020
    (gdb) p *pbase1
    $29 = {_vptr.base1 = 0x100001040 <vtable for derived+16>, mvalue = 0}
    

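    This is because the static type of pbase1 is class base1, which tells the compiler that the region it addresses is sizeof(class base1) bytes. Next comes the call to derived::show, after which main executes: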

    => 0x0000000100000a55 <+57>:    test   %rax,%rax
       0x0000000100000a58 <+60>:    je     0x100000a69 <main()+77>
       0x0000000100000a5a <+62>:    mov    (%rax),%rdx
       0x0000000100000a5d <+65>:    add    $0x10,%rdx
       0x0000000100000a61 <+69>:    mov    (%rdx),%rdx
       0x0000000100000a64 <+72>:    mov    %rax,%rdi
       0x0000000100000a67 <+75>:    callq  *%rdx
       0x0000000100000a69 <+77>:    movq   $0x0,-0x18(%rbp)
    

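    The lines above correspond to delete pbase1; pbase1 = nullptr;. The code first checks whether pbase1 is null. If it is not, it loads a destructor entry from derived's virtual table (add $0x10 selects the third slot, the deleting variant that also frees the memory), sets up the this argument, and calls ~derived():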

    Dump of assembler code for function derived::~derived():
    => 0x000000010000095c <+0>: push   %rbp
       0x000000010000095d <+1>: mov    %rsp,%rbp
       0x0000000100000960 <+4>: sub    $0x10,%rsp
       0x0000000100000964 <+8>: mov    %rdi,-0x8(%rbp)
       0x0000000100000968 <+12>:    mov    -0x8(%rbp),%rax
       0x000000010000096c <+16>:    mov    %rax,%rdi
       0x000000010000096f <+19>:    callq  0x100000b6e
       0x0000000100000974 <+24>:    mov    -0x8(%rbp),%rax
       0x0000000100000978 <+28>:    mov    $0x20,%esi
       0x000000010000097d <+33>:    mov    %rax,%rdi
       0x0000000100000980 <+36>:    callq  0x100000b7a
       0x0000000100000985 <+41>:    leaveq 
       0x0000000100000986 <+42>:    retq   
    End of assembler dump.
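
    The order in which the destructors run when deleting through pbase1 can be recorded with a small trace. This is a sketch with the data members removed and a hypothetical global trace string added; it relies only on the standard rule that base class destructors run in reverse order of declaration after the derived destructor body:

```cpp
#include <string>

std::string trace;  // hypothetical log of destructor calls

struct base1 { virtual ~base1() { trace += "~base1 "; } };
struct base2 { virtual ~base2() { trace += "~base2 "; } };
struct derived : base1, base2 {
    ~derived() override { trace += "~derived "; }
};

std::string delete_through_base1() {
    trace.clear();
    base1* pbase1 = new derived;
    delete pbase1;      // virtual dispatch reaches derived's destructor
    pbase1 = nullptr;
    return trace;
}
```

    Without the virtual destructor in base1, the delete would have undefined behavior; with it, the trace shows the derived destructor first, then base2, then base1.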
    

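    One thing I noticed here: suppose pbase1 holds 0x100700020 and pbase1 = nullptr is not executed after delete pbase1. When base2* pbase2 = new derived runs next, new derived returns the same address, 0x100700020. If delete pbase1 is then called again by mistake, things go wrong, for example the memory behind pbase2 is corrupted or a double delete occurs.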

       0x0000000100000a81 <+101>:   callq  0x100000b62
       0x0000000100000a86 <+106>:   test   %rbx,%rbx
       0x0000000100000a89 <+109>:   je     0x100000a91 <main()+117>
       0x0000000100000a8b <+111>:   lea    0x10(%rbx),%rax
       0x0000000100000a8f <+115>:   jmp    0x100000a96 <main()+122>
       0x0000000100000a91 <+117>:   mov    $0x0,%eax
    => 0x0000000100000a96 <+122>:   mov    %rax,-0x20(%rbp)
       0x0000000100000a9a <+126>:   mov    -0x20(%rbp),%rax
       0x0000000100000a9e <+130>:   mov    (%rax),%rax
       0x0000000100000aa1 <+133>:   mov    (%rax),%rdx
       0x0000000100000aa4 <+136>:   mov    -0x20(%rbp),%rax
       0x0000000100000aa8 <+140>:   mov    %rax,%rdi
       0x0000000100000aab <+143>:   callq  *%rdx
    
    (gdb) p pbase2
    $35 = (base2 *) 0x100700030
    (gdb) p *pbase2
    $36 = {_vptr.base2 = 0x100001068 <vtable for derived+56>, mvalue = 0}
    

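    The listing above is the assembly for base2* pbase2 = new derived. After new returns, the address is tested; if it is not 0, an offset of sizeof(class base1) (0x10 here) is added and the result becomes the value of pbase2. The code then sets up the argument and calls the virtual function.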

    The statement base2* pbase2 = new derived is compiled into roughly the following:

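    derived* temp = new derived;
    // the offset is in bytes, so the arithmetic must be done on a char*
    base2* pbase2 = temp
        ? reinterpret_cast<base2*>(reinterpret_cast<char*>(temp) + sizeof(base1))
        : nullptr;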
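
    The same adjustment can be observed without writing the arithmetic by hand, because the implicit derived-to-base conversion performs it. The sketch below measures the byte offset and checks the null case; the exact offset (16 in the listing above) is an ABI detail, so the code only assumes it is nonzero for a second base:

```cpp
#include <cstddef>

struct base1 { virtual void show() {} virtual ~base1() = default; int mvalue = 0; };
struct base2 { virtual void show() {} virtual ~base2() = default; int mvalue = 0; };
struct derived : base1, base2 { void show() override {} };

// Byte distance between the derived object and its base2 subobject.
std::ptrdiff_t base2_offset() {
    derived d;
    base2* pbase2 = &d;   // implicit conversion adjusts the address
    return reinterpret_cast<char*>(pbase2) - reinterpret_cast<char*>(&d);
}

// Converting back undoes the adjustment.
bool round_trip_ok() {
    derived d;
    base2* pbase2 = &d;
    return static_cast<derived*>(pbase2) == &d;
}

// A null derived* must stay null, which is why the compiler emits the test.
bool null_stays_null() {
    derived* temp = nullptr;
    base2* pbase2 = temp;
    return pbase2 == nullptr;
}
```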
    

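    Finally, delete pbase2 also requires adjusting the pointer. The call goes through the base2 part of the virtual table, whose entries are thunks like the _ZThn16_ symbols shown earlier; they subtract 16 from this before running derived's code: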

       0x0000000100000aad <+145>:   mov    -0x20(%rbp),%rax
       0x0000000100000ab1 <+149>:   test   %rax,%rax
       0x0000000100000ab4 <+152>:   je     0x100000ac5 <main()+169>
       0x0000000100000ab6 <+154>:   mov    (%rax),%rdx
       0x0000000100000ab9 <+157>:   add    $0x10,%rdx
       0x0000000100000abd <+161>:   mov    (%rdx),%rdx
       0x0000000100000ac0 <+164>:   mov    %rax,%rdi
       0x0000000100000ac3 <+167>:   callq  *%rdx
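
    The effect of the this-adjusting thunk can be checked from C++: when show is called through a base2 pointer, the overriding function still sees the address of the whole derived object. A sketch, with a hypothetical seen_this member added to record the pointer:

```cpp
struct base1 { virtual void show() {} virtual ~base1() = default; int mvalue = 0; };
struct base2 { virtual void show() {} virtual ~base2() = default; int mvalue = 0; };

struct derived : base1, base2 {
    const void* seen_this = nullptr;
    void show() override { seen_this = this; }  // records the this it received
};

bool thunk_restores_this() {
    derived* d = new derived;
    base2* pbase2 = d;             // points at the base2 part of the object
    pbase2->show();                // dispatched via base2's table entry
    bool adjusted =
        d->seen_this == static_cast<const void*>(d) &&
        static_cast<const void*>(pbase2) != static_cast<const void*>(d);
    delete pbase2;                 // virtual destructor adjusts this the same way
    return adjusted;
}
```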
    

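    The programs above are for testing only. Code in real projects should be more rigorous and follow good habits: give every base class a virtual destructor, set pointers to null after delete, use std::nothrow with new and check the result for null, handle the failure case, and so on.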
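
    The habits listed above can be put together in a few lines. This is one possible shape, not the only one; the id function and the -1 error value are assumptions made for the example:

```cpp
#include <new>

struct base    { virtual int id() const { return 1; } virtual ~base() = default; };
struct derived : base { int id() const override { return 2; } };

int careful_usage() {
    base* pbase = new (std::nothrow) derived;  // yields nullptr instead of throwing
    if (pbase == nullptr) {
        return -1;                             // hypothetical failure handling
    }
    int result = pbase->id();
    delete pbase;                              // safe: base has a virtual destructor
    pbase = nullptr;                           // avoid a dangling pointer
    return result;
}
```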

    As for virtual inheritance, it is considerably more complex, and I have rarely seen it used in practice. If you are interested, see Inside the C++ Object Model.

          Article link: https://www.haomeiwen.com/subject/ysbwmctx.html