[Translation Notes] Linux Kernel Mailing List Archive: Re: /proc files

Author: DragonGlass | Source: posted 2022-07-28 18:07

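On Monday, 5 August 1996, Peter P. Eiserloh wrote:
"We need to get the concept of threads clear. Too many people confuse threads with processes. The following discussion is not about the current state of Linux; it is meant to stay at a higher level."

Below is Linus's reply the next day (6 August 1996):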

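NO!

There is no reason to treat threads and processes as separate entities. That is how it has traditionally been done, but I personally think it is a major mistake, and the only reason to think that way is historical baggage.

Threads and processes are really one and the same thing: a context of execution. Artificially distinguishing between them only limits you.

"Context of Execution", hereafter COE.

A COE is simply the collection of all of that COE's state. That state includes CPU state (registers, etc.), MMU state (page mappings), permission state (UID, GID) and various communication states (open files, signal handlers, etc.).

Traditionally, the main difference between a thread and a process is that a thread has CPU state (plus possibly some other minimal state), while all the other context comes from the process. But that is just one way of dividing up the total state of a COE, and nothing says it is the right one. Limiting yourself to that picture is just plain stupid.

The way Linux actually works (and the way I want it to work) is that there is no such thing as a process or a thread. There is only the COE as a whole (called a "task" in Linux). Different COEs can share parts of their context with each other, and one subset of that sharing is the traditional thread/process setup.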

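Note that this is ONLY a subset. It is an important subset, but its importance comes from standards rather than from design: we obviously want to run standards-conforming threaded programs on Linux too.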

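In short: do NOT design around the thread/process way of thinking. The kernel should be designed around COEs, and the pthreads library can then export the limited pthreads interface to users who want to look at COEs that way.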

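Here are some examples of what becomes possible when you think in terms of COEs rather than traditional threads and processes:

You can write an external "cd" program, something traditionally impossible in UNIX and/or with processes and threads (a silly example, but the point is that you can have such "modules" that are not limited to the traditional UNIX/threads setup):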

clone(CLONE_VM|CLONE_FS);

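/* Child task: execve("external-cd"); */
/* execve() disassociates the VM, so the only reason for using CLONE_VM is to make the clone() itself faster */
/* exec*: the usual pattern is to fork a child task from the parent and call an exec* function in the child to start the new program. */
/* The classic exec* family has six functions; execve is the actual kernel system call */
/* The others (execl, execle, execlp, execv, execvp) are library functions that end up calling execve. */
/* CLONE_VM: the child task runs in the same memory space as the parent */
/* CLONE_FS: the child shares filesystem information with the parent, including root, current directory and umask */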

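You can do vfork() naturally. It needs only minimal kernel support, and that support fits the "CUA" way of thinking perfectly. (The translator reads CUA as IBM's Common User Access, the idea of offering users one uniform interface even when the underlying implementations differ; in context it may also simply be a slip for COE.)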

clone(CLONE_VM);

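/* Child task: keeps running, eventually calls execve() */
/* Parent ("mother") task: waits for the child's execve */
/* CLONE_VM: the child task runs in the same memory space as the parent */
/* fork: the child is a full copy of the parent, duplicating its resources, including memory contents and task_struct contents (modern kernels copy the memory lazily, via copy-on-write) */
/* vfork: the child shares the parent's data segment, and the parent is suspended until the child calls execve or exits, so the child runs first */
/* The kernel does not even create a virtual address space structure for the child; it simply shares the parent's virtual address space, and with it the parent's physical memory */
/* clone: lets you choose which of the parent's resources to inherit. You can share one virtual address space with the parent, like vfork, and so create a thread; you can choose not to share it; you can even make the new task a sibling of the parent instead of its child. */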

You can write external "IO daemons":

clone(CLONE_FILES);
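/* Child task: opens file descriptors, etc. */
/* Parent task: uses the file descriptors the child opened, and vice versa */
/* CLONE_FILES: the child task shares the file descriptor table with the parent */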

All of this works because you are not tied to the (traditional) thread/process way of thinking.

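Think of a web server, for example, where CGI scripts run as "threads of execution". You cannot do that with traditional threads, because traditional threads always share the whole address space. You would have to link everything you ever wanted to run into the web server itself, since a "thread" cannot run another executable.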

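Thinking of it as a COE problem instead, your tasks can choose to execute external programs (that is, separate their address space from the parent's) if they want to. Or they can, for example, share everything with the parent except the file descriptors. Then the sub-"threads" can open lots of files without the parent having to worry about them. The files close automatically when the sub-"thread" exits, and they do not use up file descriptors in the parent.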

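Example: a threaded inetd
inetd is a "super-server". It listens for network requests and, depending on the request type, invokes the matching service. That service is either built into inetd, such as echo or timeofday, or an external program, such as rlogind (which handles remote logins). Linus imagines a multithreaded version of it.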

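Suppose you want low-overhead fork+exec. Instead of calling fork(), you can write a multithreaded inetd like this:

Each child task is created with just clone(CLONE_VM): it shares the address space but not the file descriptors, etc.
For an external service such as rlogind, the child simply calls execve.
For an internal inetd service such as echo or timeofday, the child does its job and exits.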

You cannot do that with (traditional) threads and processes.

Linus

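Notes:
Following Linus's reasoning, a clone() that shares nothing is equivalent to fork(), provided the child's termination signal is SIGCHLD, i.e. clone(SIGCHLD). With a termination signal of 0 the parent is not signaled when the child exits, and plain waitpid() does not see such a "clone" child unless __WCLONE is passed.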

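Appendix:
Meaning of the clone() flags
CLONE_PARENT The new child's parent is the caller's parent, so the new process is a "sibling" of its creator rather than its "child"
CLONE_FS The child shares filesystem information with the parent: root, current directory, umask
CLONE_FILES The child shares the file descriptor table with the parent
CLONE_NEWNS Start the child in a new mount namespace; the namespace describes the process's view of the file hierarchy
CLONE_SIGHAND The child shares the table of signal handlers with the parent
CLONE_PTRACE If the calling process is being traced, the child is traced too
CLONE_VFORK The parent is suspended until the child releases its virtual memory (by calling execve or exiting)
CLONE_VM The child runs in the same memory space as the parent
CLONE_PID The child is created with the same PID as the parent (obsolete; later removed from the kernel)
CLONE_THREAD Added in Linux 2.4 to support POSIX threads; the child is placed in the same thread group as the parent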

References:
https://blog.csdn.net/gatieme/article/details/51417488

Archived original:
https://web.archive.org/web/20050816220029/http://www.ussg.iu.edu/hypermail/linux/kernel/9608.0/0191.html

Original text:

Re: proc fs and shared pids

Linus Torvalds (torvalds@cs.helsinki.fi)
Tue, 6 Aug 1996 12:47:31 +0300 (EET DST)


On Mon, 5 Aug 1996, Peter P. Eiserloh wrote:
>
> We need to keep a clear the concept of threads. Too many people
> seem to confuse a thread with a process. The following discussion
> does not reflect the current state of linux, but rather is an
> attempt to stay at a high level discussion.

NO!

There is NO reason to think that "threads" and "processes" are separate
entities. That's how it's traditionally done, but I personally think it's a
major mistake to think that way. The only reason to think that way is
historical baggage.

Both threads and processes are really just one thing: a "context of
execution". Trying to artificially distinguish different cases is just
self-limiting.

A "context of execution", hereby called COE, is just the conglomerate of
all the state of that COE. That state includes things like CPU state
(registers etc), MMU state (page mappings), permission state (uid, gid)
and various "communication states" (open files, signal handlers etc).

Traditionally, the difference between a "thread" and a "process" has been
mainly that a threads has CPU state (+ possibly some other minimal state),
while all the other context comes from the process. However, that's just
one way of dividing up the total state of the COE, and there is nothing
that says that it's the right way to do it. Limiting yourself to that kind of
image is just plain stupid.

The way Linux thinks about this (and the way I want things to work) is that
there is no such thing as a "process" or a "thread". There is only the
totality of the COE (called "task" by Linux). Different COE's can share parts
of their context with each other, and one subset of that sharing is the
traditional "thread"/"process" setup, but that should really be seen as ONLY
a subset (it's an important subset, but that importance comes not from
design, but from standards: we obviusly want to run standards-conforming
threads programs on top of Linux too).

In short: do NOT design around the thread/process way of thinking. The
kernel should be designed around the COE way of thinking, and then the
pthreads library can export the limited pthreads interface to users who
want to use that way of looking at COE's.

Just as an example of what becomes possible when you think COE as opposed
to thread/process:

You can do a external "cd" program, something that is traditionally
impossible in UNIX and/or process/thread (silly example, but the idea
is that you can have these kinds of "modules" that aren't limited to
the traditional UNIX/threads setup). Do a:

clone(CLONE_VM|CLONE_FS);
child: execve("external-cd");
/* the "execve()" will disassociate the VM, so the only reason we
used CLONE_VM was to make the act of cloning faster */

You can do "vfork()" naturally (it meeds minimal kernel support, but
that support fits the CUA way of thinking perfectly):

clone(CLONE_VM);
child: continue to run, eventually execve()
mother: wait for execve

you can do external "IO deamons":

clone(CLONE_FILES);
child: open file descriptors etc
mother: use the fd's the child opened and vv.

All of the above work because you aren't tied to the thread/process way of
thinking. Think of a web server for example, where the CGI scripts are done
as "threads of execution". You can't do that with traditional threads,
because traditional threads always have to share the whole address space, so
you'd have to link in everything you ever wanted to do in the web server
itself (a "thread" can't run another executable).

Thinking of this as a "context of execution" problem instead, your tasks can
now chose to execute external programs (= separate the address space from the
parent) etc if they want to, or they can for example share everything with
the parent except for the file descriptors (so that the sub-"threads" can
open lots of files without the parent needing to worry about them: they close
automatically when the sub-"thread" exits, and it doesn't use up fd's in the
parent).

Think of a threaded "inetd", for example. You want low overhead fork+exec, so
with the Linux way you can instead of using a "fork()" you write a
multi-threaded inetd where each thread is created with just CLONE_VM (share
address space, but don't share file descriptors etc). Then the child can
execve if it was a external service (rlogind, for example), or maybe it was
one of the internal inetd services (echo, timeofday) in which case it just
does it's thing and exits.

You can't do that with "thread"/"process".

Linus
