vfork is so lightweight, what harm could it possibly mean?

Author: 虾饺的开发手记 | Published: 2021-06-07 17:42


How it started

This article started with a very strange bug. Part of the logcat output for it looks like this:

2021-06-07 12:59:02.603 10399-10399/com.example.android ....
2021-06-07 12:59:02.604 10458-10399/? ....
2021-06-07 12:59:02.604 10458-10399/? ....
2021-06-07 12:59:02.604 10458-10399/? ....
2021-06-07 12:59:02.605 10399-10433/com.example.android ....
2021-06-07 12:59:02.605 10399-10433/com.example.android ....
2021-06-07 12:59:02.605 10399-10433/com.example.android ....
2021-06-07 12:59:02.605 10399-10433/com.example.android ....
2021-06-07 12:59:02.606 10399-10399/com.example.android ....

What we know:

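The first number after the timestamp is the process ID (pid).
The second number is the thread ID (tid) of the thread that printed the log line.
The main thread's tid is equal to the process's pid.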
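
To make the "pid-tid" reading concrete, here is a minimal C sketch, assuming Linux. The helper names is_main_thread_field and calling_thread_is_main are made up for illustration; they are not part of logcat or any library. The first one applies the "tid equals pid" rule to a field copied from a log line, the second applies it to the calling thread using raw system calls.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Parse a logcat "pid-tid" field such as "10399-10433/com.example.android".
 * Returns 1 if it names a main thread (tid == pid), 0 if it does not,
 * and -1 if the field cannot be parsed. Hypothetical helper. */
static int is_main_thread_field(const char *field) {
    int pid, tid;
    if (sscanf(field, "%d-%d", &pid, &tid) != 2)
        return -1;
    return pid == tid;
}

/* The same rule for the calling thread, asking the kernel directly
 * so that no libc-level cache is involved. */
static int calling_thread_is_main(void) {
    return syscall(SYS_getpid) == syscall(SYS_gettid);
}
```

Fed the fields from the log above, "10399-10399" is a main thread, "10399-10433" is a worker thread, and "10458-10399" is the odd one: a tid that is another process's pid.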

From this we can conclude:

Since the log contains "10399-10399", those lines were printed by the main thread of the process whose pid is 10399. That process is our application, com.example.android.
In log lines 2 to 4, thread 10399 shows up as a non-main thread of process 10458.

Wait. How did thread 10399 become a thread of another process? Something must be wrong somewhere.

After going through the code, we found that the problem showed up every time we called Runtime.getRuntime().exec(...). We had good reason to believe this call was the cause.

What does Runtime.getRuntime().exec(...) actually do?

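Judging from the interface, exec should fork a new process and then exec a new executable in it. Since forking a process is not supposed to affect the original process in any way, this bug should not be possible...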
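
For reference, the textbook fork-then-exec pattern we had in mind looks roughly like the sketch below. It is an illustration only, not the code behind Runtime.exec; fork_exec_wait is a hypothetical name, and it waits for the child, which Runtime.exec itself does not do.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with the given arguments in a child process.
 * Returns the child's exit status, or -1 if the child could not be
 * created or did not exit normally. Hypothetical helper. */
static int fork_exec_wait(char *const argv[]) {
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        /* Child: replace the process image with the new program. */
        execvp(argv[0], argv);
        _exit(127);  /* only reached if exec failed */
    }
    /* Parent: the child has its own copy of memory, so nothing it does
     * before exec is visible here. */
    int status;
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With plain fork the child works on its own copy of the address space, which is why we did not expect the parent's state to leak into the child's log lines.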

However alarming the bug looks, it did happen, so we have to dig into the source to find out why. On Android 11, Runtime.getRuntime().exec(...) ends up in the native method UNIXProcess::forkAndExec:

JNIEXPORT jint JNICALL
UNIXProcess_forkAndExec(...)
{
    // Code that handles input/output redirection is omitted here

    int resultPid = startChild(c);

    // Some resource cleanup code is omitted here

    return resultPid;
}

static pid_t
startChild(ChildStuff *c) {
#if START_CHILD_USE_CLONE
#define START_CHILD_CLONE_STACK_SIZE (64 * 1024)
    /*
     * See clone(2).
     * Instead of worrying about which direction the stack grows, just
     * allocate twice as much and start the stack in the middle.
     */
    if ((c->clone_stack = malloc(2 * START_CHILD_CLONE_STACK_SIZE)) == NULL)
        /* errno will be set to ENOMEM */
        return -1;
    return clone(childProcess,
                 c->clone_stack + START_CHILD_CLONE_STACK_SIZE,
                 CLONE_VFORK | CLONE_VM | SIGCHLD, c);
#else
  #if START_CHILD_USE_VFORK
    /*
     * We separate the call to vfork into a separate function to make
     * very sure to keep stack of child from corrupting stack of parent,
     * as suggested by the scary gcc warning:
     *  warning: variable 'foo' might be clobbered by 'longjmp' or 'vfork'
     */
    volatile pid_t resultPid = vfork();
  #else
    /*
     * From Solaris fork(2): In Solaris 10, a call to fork() is
     * identical to a call to fork1(); only the calling thread is
     * replicated in the child process. This is the POSIX-specified
     * behavior for fork().
     */
    pid_t resultPid = fork();
  #endif
    if (resultPid == 0)
        childProcess(c);
    assert(resultPid != 0);  /* childProcess never returns */
    return resultPid;
#endif /* ! START_CHILD_USE_CLONE */
}

Depending on the configuration, startChild behaves in one of three ways:

If START_CHILD_USE_CLONE is set, it creates the process with clone(2).
If START_CHILD_USE_VFORK is set, it uses vfork(2).
Otherwise, it uses fork(2).

The same source file (UNIXProcess_md.c) also contains this comment:

/*
 * There are 3 possible strategies we might use to "fork":
 *
 * - fork(2).  Very portable and reliable but subject to
 *   failure due to overcommit (see the documentation on
 *   /proc/sys/vm/overcommit_memory in Linux proc(5)).
 *   This is the ancient problem of spurious failure whenever a large
 *   process starts a small subprocess.
 *
 * - vfork().  Using this is scary because all relevant man pages
 *   contain dire warnings, e.g. Linux vfork(2).  But at least it's
 *   documented in the glibc docs and is standardized by XPG4.
 *   http://www.opengroup.org/onlinepubs/000095399/functions/vfork.html
 *   On Linux, one might think that vfork() would be implemented using
 *   the clone system call with flag CLONE_VFORK, but in fact vfork is
 *   a separate system call (which is a good sign, suggesting that
 *   vfork will continue to be supported at least on Linux).
 *   Another good sign is that glibc implements posix_spawn using
 *   vfork whenever possible.  Note that we cannot use posix_spawn
 *   ourselves because there's no reliable way to close all inherited
 *   file descriptors.
 *
 * - clone() with flags CLONE_VM but not CLONE_THREAD.  clone() is
 *   Linux-specific, but this ought to work - at least the glibc
 *   sources contain code to handle different combinations of CLONE_VM
 *   and CLONE_THREAD.  However, when this was implemented, it
 *   appeared to fail on 32-bit i386 (but not 64-bit x86_64) Linux with
 *   the simple program
 *     Runtime.getRuntime().exec("/bin/true").waitFor();
 *   with:
 *     #  Internal Error (os_linux_x86.cpp:683), pid=19940, tid=2934639536
 *     #  Error: pthread_getattr_np failed with errno = 3 (ESRCH)
 *   We believe this is a glibc bug, reported here:
 *     http://sources.redhat.com/bugzilla/show_bug.cgi?id=10311
 *   but the glibc maintainers closed it as WONTFIX.
 *
 * Based on the above analysis, we are currently using vfork() on
 * Linux and fork() on other Unix systems, but the code to use clone()
 * remains.
 */

#define START_CHILD_USE_CLONE 0  /* clone() currently disabled; see above. */

#ifndef START_CHILD_USE_CLONE
  #ifdef __linux__
    #define START_CHILD_USE_CLONE 1
  #else
    #define START_CHILD_USE_CLONE 0
  #endif
#endif

/* By default, use vfork() on Linux. */
#ifndef START_CHILD_USE_VFORK
// Android-changed: disable vfork under AddressSanitizer.
//  #ifdef __linux__
  #if defined(__linux__) && !__has_feature(address_sanitizer) && \
      !__has_feature(hwaddress_sanitizer)
    #define START_CHILD_USE_VFORK 1
  #else
    #define START_CHILD_USE_VFORK 0
  #endif
#endif

In short: because of the overcommit problem, Linux uses vfork by default, and other systems use fork.

We had assumed forkAndExec was implemented with fork, but it actually uses vfork. Could vfork be the culprit?

Is vfork really harmless?

vfork differs from fork in two ways:

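After vfork, the child shares the parent's memory space (the parent can see what the child writes to memory).
The parent's vfork call does not return until the child has exited or called exec.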
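
The first difference can be observed with a small experiment, sketched below under the assumption of a Linux system with a real vfork. Writing to a variable in a vforked child is undefined behavior according to POSIX, so this is strictly a demonstration and not something to rely on. The names shared_flag and flag_seen_by_parent are made up for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Written by the child, read back by the parent. */
static volatile int shared_flag;

/* Create a child with fork (use_vfork == 0) or vfork (use_vfork != 0),
 * let it set shared_flag to 1 and exit, then return the value the
 * parent sees afterwards. Returns -1 if the child could not be created. */
static int flag_seen_by_parent(int use_vfork) {
    pid_t child;
    shared_flag = 0;
    if (use_vfork)
        child = vfork();
    else
        child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        shared_flag = 1;  /* undefined behavior after vfork per POSIX */
        _exit(0);
    }
    waitpid(child, NULL, 0);
    return shared_flag;
}
```

With fork the parent still sees 0, because the child wrote to its own copy. With vfork on Linux the parent would normally see the child's write, though platforms that emulate vfork with fork behave like the first case.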

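vfork was designed for the case where fork is immediately followed by exec. Now that fork is implemented with copy-on-write, vfork is no longer generally recommended.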

Could it be that vfork reuses something else from the calling process, beyond what the documentation says, so that gettid in the child returns the wrong tid?

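After a tour through the kernel source, my conclusion is that vfork does nothing unusual. It dutifully creates a new process (represented in the Linux kernel by a task_struct), and the child's pid and tgid are both the newly allocated pid (note: getpid returns the tgid, and gettid returns the pid). Once the child has been created, the parent starts waiting for it.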
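
This is easy to check from user space. The sketch below, assuming Linux, vforks a child that asks the kernel for its ids with raw system calls (so no libc cache can interfere) and sends them back through a pipe. The struct ids type and the function name are hypothetical.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Kernel-level ids as seen from user space. */
struct ids {
    pid_t pid;  /* what getpid reports: the kernel's tgid */
    pid_t tid;  /* what gettid reports: the kernel's pid  */
};

/* vfork a child that reports its own ids through a pipe.
 * Returns 1 if both ids equal the value vfork returned to the parent
 * and differ from the parent's pid, 0 if not, -1 on error. */
static int vfork_child_has_fresh_ids(void) {
    int fds[2];
    struct ids seen = {0, 0};
    if (pipe(fds) != 0)
        return -1;

    pid_t child = vfork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        /* Child: raw syscalls only, then leave without returning. */
        struct ids me = {(pid_t)syscall(SYS_getpid), (pid_t)syscall(SYS_gettid)};
        ssize_t n = write(fds[1], &me, sizeof me);
        _exit(n == (ssize_t)sizeof me ? 0 : 1);
    }

    close(fds[1]);
    ssize_t n = read(fds[0], &seen, sizeof seen);
    close(fds[0]);
    waitpid(child, NULL, 0);
    if (n != (ssize_t)sizeof seen)
        return -1;
    return seen.pid == child && seen.tid == child && child != getpid();
}
```

If the kernel hands the child a fresh pid and tgid, as the source suggests, this function should return 1.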

The relevant code is at SYSCALL_DEFINE0(vfork) in fork.c; we won't go any deeper here.

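The kernel source therefore tells us that a vforked child shares only the memory space with its parent, and that there is no overlap at the thread level. So the thing that can go wrong is memory.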

Before continuing, a short digression. This bug was so strange that I began to doubt my own understanding of getpid, so when I ran into it I read the man page carefully. The getpid man page has this passage:

From glibc version 2.3.4 up to and including version 2.24, the glibc wrapper function for getpid() cached PIDs, with the goal of avoiding additional system calls when a process calls getpid() repeatedly. Normally this caching was invisible, but its correct operation relied on support in the wrapper functions for fork(2), vfork(2), and clone(2): if an application bypassed the glibc wrappers for these system calls by using syscall(2), then a call to getpid() in the child would return the wrong value (to be precise: it would return the PID of the parent process). In addition, there were cases where getpid() could return the wrong value even when invoking clone(2) via the glibc wrapper function. (For a discussion of one such case, see BUGS in clone(2).) Furthermore, the complexity of the caching code had been the source of a few bugs within glibc over the years.

Because of the aforementioned problems, since glibc version 2.25, the PID cache is removed: calls to getpid() always invoke the actual system call, rather than returning a cached value.

In the buggy code, gettid in the child returned the tid of the parent's main thread. Given this glibc getpid caching problem, we have reason to suspect that gettid has a similar issue.

Android's libc implementation is bionic. Here is bionic's gettid:

// platform/bionic/libc/bionic/gettid.cpp
pid_t gettid() {
  pthread_internal_t* self = __get_thread();
  if (__predict_true(self)) {
    pid_t tid = self->tid;
    if (__predict_true(tid != -1)) {
      return tid;
    }
    self->tid = syscall(__NR_gettid);
    return self->tid;
  }
  return syscall(__NR_gettid);
}

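Our code called Runtime.getRuntime().exec(...) on the main thread. The vforked child starts executing right after the vfork call, in the same address space, so its TLS is the very same TLS as the parent's main thread. That is why the child read the parent main thread's tid from the cache.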
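
The effect can be reproduced outside of bionic with a home-made cache. The sketch below assumes Linux and a compiler that supports __thread; cached_gettid is a hypothetical stand-in written in the spirit of bionic's gettid, not bionic's code. The child compares the cached answer with the kernel's answer and reports both through a pipe.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-thread tid cache; -1 means "not cached yet". */
static __thread pid_t cached_tid = -1;

static pid_t cached_gettid(void) {
    if (cached_tid == -1)
        cached_tid = (pid_t)syscall(SYS_gettid);
    return cached_tid;
}

struct tids {
    pid_t cached;  /* answer from the cache  */
    pid_t real;    /* answer from the kernel */
};

/* Returns 1 if a vforked child gets the parent's tid from the cache
 * while the kernel reports the child's own tid, 0 if not, -1 on error. */
static int child_sees_stale_tid(void) {
    int fds[2];
    struct tids seen = {0, 0};
    pid_t parent_tid = cached_gettid();  /* fill the cache in the parent */
    if (pipe(fds) != 0)
        return -1;

    pid_t child = vfork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        struct tids t = {cached_gettid(), (pid_t)syscall(SYS_gettid)};
        ssize_t n = write(fds[1], &t, sizeof t);
        _exit(n == (ssize_t)sizeof t ? 0 : 1);
    }

    close(fds[1]);
    ssize_t n = read(fds[0], &seen, sizeof seen);
    close(fds[0]);
    waitpid(child, NULL, 0);
    if (n != (ssize_t)sizeof seen)
        return -1;
    return seen.cached == parent_tid && seen.real == child
        && seen.real != parent_tid;
}
```

The cache knows nothing about vfork, so the child happily returns a tid that belongs to a thread in another process, which matches the "10458-10399" lines in the log.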

With the tid mystery solved, the next question is: why is the pid in the child's log lines correct? After all, bionic's getpid caches the pid as well:

// platform/bionic/libc/bionic/getpid.cpp
extern "C" pid_t __getpid();

pid_t __get_cached_pid() {
  pthread_internal_t* self = __get_thread();
  if (__predict_true(self)) {
    pid_t cached_pid;
    if (__predict_true(self->get_cached_pid(&cached_pid))) {
      return cached_pid;
    }
  }
  return 0;
}

pid_t getpid() {
  pid_t cached_pid = __get_cached_pid();
  if (__predict_true(cached_pid != 0)) {
    return cached_pid;
  }

  // We're still in the dynamic linker or we're in the middle of forking, so ask the kernel.
  // We don't know whether it's safe to update the cached value, so don't try.
  return __getpid();
}

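The answer is in vfork itself. Before making the vfork system call, bionic's vfork wrapper clears the cached pid (and restores it in the parent once the call returns):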

// platform/bionic/libc/arch-arm/bionic/vfork.S
ENTRY(vfork)
__BIONIC_WEAK_ASM_FOR_NATIVE_BRIDGE(vfork)
    // r3 = &__get_tls()[TLS_SLOT_THREAD_ID]
    mrc     p15, 0, r3, c13, c0, 3
    ldr     r3, [r3, #(TLS_SLOT_THREAD_ID * 4)]

    // Set cached_pid_ to 0, vforked_ to 1, and stash the previous value.
    mov     r0, #0x80000000
    ldr     r1, [r3, #12]
    str     r0, [r3, #12]

    mov     ip, r7
    ldr     r7, =__NR_vfork
    swi     #0
    mov     r7, ip

    teq     r0, #0
    bxeq    lr

    // rc != 0: reset cached_pid_ and vforked_.
    str     r1, [r3, #12]
    cmn     r0, #(MAX_ERRNO + 1)

    bxls    lr
    neg     r0, r0
    b       __set_errno_internal
END(vfork)
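
Read as C, the assembly above amounts to "stash the cached pid, clear it, vfork, and restore it on the parent's side". The sketch below models that sequence, assuming Linux. It simplifies in several ways that should be kept in mind: the cache is a plain global instead of a field in the thread structure, "cleared" is plain 0 instead of bionic's 0x80000000 marker, and the sequence is written inline because a C function that calls vfork and then returns in the child would corrupt the stack the parent resumes on. cached_pid, my_getpid and the test function are hypothetical names.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical pid cache; 0 means "nothing cached, ask the kernel". */
static pid_t cached_pid;

static pid_t my_getpid(void) {
    if (cached_pid != 0)
        return cached_pid;
    return (pid_t)syscall(SYS_getpid);  /* deliberately does not refill the cache */
}

/* Returns 1 if the child sees its own pid and the parent's cached pid
 * survives the vfork, 0 if not, -1 on error. */
static int child_pid_is_correct_after_clear(void) {
    int fds[2];
    pid_t seen = 0;
    cached_pid = (pid_t)syscall(SYS_getpid);  /* warm the cache in the parent */
    pid_t parent_pid = my_getpid();
    if (pipe(fds) != 0)
        return -1;

    pid_t saved = cached_pid;  /* stash the previous value ... */
    cached_pid = 0;            /* ... and clear it before vfork */
    pid_t child = vfork();
    if (child == 0) {
        pid_t me = my_getpid();  /* cache is empty, so this asks the kernel */
        ssize_t n = write(fds[1], &me, sizeof me);
        _exit(n == (ssize_t)sizeof me ? 0 : 1);
    }
    cached_pid = saved;  /* parent or error path: put the cached pid back */

    close(fds[1]);
    if (child < 0) {
        close(fds[0]);
        return -1;
    }
    ssize_t n = read(fds[0], &seen, sizeof seen);
    close(fds[0]);
    waitpid(child, NULL, 0);
    if (n != (ssize_t)sizeof seen)
        return -1;
    return seen == child && seen != parent_pid && my_getpid() == parent_pid;
}
```

Because the cache is empty while the child runs, getpid in the child falls through to the kernel and reports the right pid. Nothing equivalent is done for the cached tid, hence the mismatch in the log.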

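That leaves one last question: why were log lines that should have come from the parent printed in the child (lines 2 to 4 of the log at the top of the article)?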

Be careful when hooking close

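The log lines printed in the child came from a hook we had installed on the close function. In other words, when the child called close it ran our hook, and the cached tid (worse, our own code also cached the pid) led to this series of bizarre symptoms.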

The child calls close inside startChild, which we looked at earlier:

static pid_t startChild(ChildStuff *c) {
    volatile pid_t resultPid = vfork();
    if (resultPid == 0)
        childProcess(c);
    return resultPid;
}

static int childProcess(void *arg) {
    ...

    /* close everything */
    if (closeDescriptors() == 0) { /* failed,  close the old way */
        int max_fd = (int)sysconf(_SC_OPEN_MAX);
        int fd;
        for (fd = FAIL_FILENO + 1; fd < max_fd; fd++)
            if (restartableClose(fd) == -1 && errno != EBADF)
                goto WhyCantJohnnyExec;
    }

    ...

    JDK_execvpe(p->argv[0], p->argv, p->envv);
}

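Normally we don't bother to set O_CLOEXEC (close-on-exec) when opening a file, so those files are still open and readable in the child after fork. For a general-purpose forkAndExec implementation, the child clearly does not need them, so they are all closed here.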
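
For completeness, this is what opting in to close-on-exec looks like, and how to check whether a descriptor has the flag. A minimal sketch assuming a POSIX system; closes_on_exec and open_and_check are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if fd will be closed automatically by exec, 0 if it will
 * be inherited by the new program, -1 if fd is not a valid descriptor. */
static int closes_on_exec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) != 0;
}

/* Open path read-only with the given extra open(2) flags and report
 * whether the resulting descriptor is close-on-exec. */
static int open_and_check(const char *path, int extra_flags) {
    int fd = open(path, O_RDONLY | extra_flags);
    if (fd < 0)
        return -1;
    int result = closes_on_exec(fd);
    close(fd);
    return result;
}
```

A descriptor opened without O_CLOEXEC is inherited across exec, which is exactly the kind of descriptor the close loop in childProcess has to clean up.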

Since forkAndExec can be called from anywhere, if you hook close in a system .so you have to be prepared for the hook to run in a child process. Otherwise it will certainly surprise you.
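
One way to be prepared is sketched below, assuming Linux. This is not our actual hook: install_close_hook, hooked_close and the counter are hypothetical, and the counter merely stands in for whatever logging or tracking a real hook does. The idea is to record the owner's pid at install time and compare it against a raw getpid system call on every invocation, so that neither a libc cache nor a cache of our own can mislead the hook.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t hook_owner_pid;                 /* pid that installed the hook */
static volatile int hook_bookkeeping_calls;  /* stand-in for logging/tracking */

static void install_close_hook(void) {
    hook_owner_pid = (pid_t)syscall(SYS_getpid);
    hook_bookkeeping_calls = 0;
}

/* Replacement for close: do the bookkeeping only in the process that
 * installed the hook, and always forward to the real close. */
static int hooked_close(int fd) {
    if ((pid_t)syscall(SYS_getpid) == hook_owner_pid)
        hook_bookkeeping_calls++;
    return close(fd);
}

/* Calls hooked_close once in the parent and once in a vforked child,
 * then returns how many bookkeeping calls were recorded (-1 on error). */
static int bookkeeping_calls_after_vfork(void) {
    install_close_hook();
    int a = open("/dev/null", O_RDONLY);
    int b = open("/dev/null", O_RDONLY);
    if (a < 0 || b < 0)
        return -1;
    hooked_close(a);  /* parent: counted */

    pid_t child = vfork();
    if (child < 0)
        return -1;
    if (child == 0) {
        hooked_close(b);  /* child: forwarded to close, not counted */
        _exit(0);
    }
    waitpid(child, NULL, 0);
    close(b);  /* the parent's copy of b is still open */
    return hook_bookkeeping_calls;
}
```

The price is one extra system call per close. Whether that is acceptable depends on how hot the hooked path is, but it keeps the child from running the parent's bookkeeping on shared memory.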
