袖珍分布式系统（三）

作者: 小聪明李良才 | 来源:发表于2017-01-23 22:46 被阅读656次

袖珍分布式系统（三）
袖珍分布式系统（一）
袖珍分布式系统（二）
袖珍分布式系统（四）
什么是分布式系统，如何学习分布式系统
FastDFS分布式文件系统(二)
分布式文件系统从设计到实现 - Replication（四）代
FastDFS分布式文件系统(一)
三分钟分布式CAP理论就这么复杂
Hadoop框架：Yarn基本结构和运行原理

本文是Distributed systems for fun and profit的第三部分，本文是阅读该文后的一些记录。

Time and order

先来看第一个问题：

What is order and why is it important?

为什么我们要关心是否 A happened before B?

回答这个问题之前，我们先来看下分布式编程的定义。

the art of solving the same problem that you can solve on a single computer using multiple computers.

在单机系统中，传统的模式是： a single program, one process, one memory space running on one CPU。我们做了很多努力来给编程者提供一种简单的编程模型，一种顺序执行的模型，让程序实际执行的顺序就是代码的顺序。

但是我们一旦来到分布式环境中，我们却发现再也没有这种简单的编程模型了，程序实际执行的顺序你忽然间就无法预测了，因为每个节点时钟不是严格同步的，当然你可以去用复杂的技术来实现所有节点的时钟同步，然后给予每个操作一个时间戳，从而得到一个全局的total order；另一个思路是通过一个communication system，给每个操作都编号，从而得到一个顺序，但是就像我们之前说的一样，在分布式系统中，通信是不可靠的，您不可能确定的知道另外一个节点的状态，我们可以看到以上两种方法都很难。

Total and partial order

在分布式环境中一种常见的状态是：partial order，即部分有，在集合中不是任意两个元素都是可比较的。

Total 和 partial order 都满足自反性和传递性。

If a ≤ b and b ≤ a then a = b

(antisymmetry);

If a ≤ b and b ≤ c then a ≤ c

(transitivity);

total order还满足：

a ≤ b or b ≤ a (totality) for all a, b in X

即所有集合中的元素都是有序的，而partial order则：

a ≤ a (reflexivity) for all a in X

即集合中的元素不是都有序，只满足自反性。

在单机系统中，所有的指令和消息的执行都是可预测的，是有一个total order的，因此程序的行为是可预测的，但是在分布式系统中想要实现total order，代价是巨大的，因为

communication is expensive, and time synchronization is diﬃcult and fragile

What is time?

Time is a source of order - it allows us to deﬁne the order of operations- which coincidentally also has an interpretation that people can understand (a second, a minute, a day and so on).

什么是时间？时间有时候就像一个计数器，只不过时间这个计数器比较重要，我们用这个计数器产生的数值来定义整个人类的最重要的概念：时间。

Timestamps really are a shorthand value for representing the state of the world from the start of the universe to the current moment - if something occurred at a particular timestamp, then it was potentially inﬂuenced by everything that happened before it.

什么是时间戳（Timestamps），Timestamps定义了世界从初始到现在的状态，如果某件事发生在一个特定的时间点上，是之前影响产生的结果。

此处有个大前提，所有的时间都以相同的速率前行着，time and timestamps在程序中应用的时候，有3个常用的解释：

Order
Duration
Interpretation

当我说：time is a source of order，我指的是：

we can attach timestamps to unordered events to order them【通过给事件安排一个时间戳，从而给事件排序】
we can use timestamps to enforce a speciﬁc ordering of operations or the delivery of messages (for example, by delaying an operation if it arrives out of order)【我们可以通过时间戳给操作重新排序】
we can use the value of a timestamp to determine whether something happened chronologically before something else【通过时间戳知道哪个事件发生在前】

Interpretation - time as a universally comparable value.

时间戳的绝对值解释为日期（date），对人们非常有用的概念

Duration - durations measured in time have some relation to the real world.

像算法一般只关心Duration，通过Duration来判断延迟，一个好的算法都希望能有低延迟。

Does time progress at the same rate everywhere?

对于问题：时间以同样的速率进行吗？有3个常见的回答：

"Global clock": yes
"Local clock": no, but
"No clock": no!

以上回答的的意思是：

the synchronous system model has a global clock,
the partially synchronous model has a local clock, and
in the asynchronous system model one cannot use clocks at all

下面分别来看下

Time with a "global-clock" assumption

全局时钟（global-clock）假设是我们有个非常精确的时钟，我们任何人在任何地方都能看到他。这也是平时生活中我们最习以为常的时钟，对于不同时钟间微小的差别我们并不关注。

在实际系统中，如果我们假设有个global-clock，即每个节点的时钟都是同步的，那么我们可以通过timestamps都就可以得到一个total order，但是在系统间维持时钟同步是非常难的，我们只能做到一定的范围内的同步。

目前忽略时钟之间的不同步问题做出来的系统有：

Facebook's Cassandra：通过时间戳来解决冲突
Google's Spanner：时间戳+偏差范围来定义顺序

Time with a "Local-clock" assumption

它假设了一个偏序关系

events on each system are ordered but events cannot be ordered across systems by only using a clock.

同一个机器上我们可以通过时间戳来排序，但是不同机器上的时间戳不能比较

Time with a "No-clock" assumption

不在使用时间戳，而是使用counter，通过传递消息来交换counter，从而定义不同机器之间的事件的前后顺序，比较有名的论文就是：time, clocks and the ordering of events。

How is time used in a distributed system?

时间的好处是：

Time can deﬁne order across a system (without communication)
Time can deﬁne boundary conditions for algorithms

在分布式系统中，定义事件的顺序非常重要，因为：

where correctness depends on (agreement on) correct event ordering, for example serializability in a distributed database【正确性依赖于事件的顺序】
order can be used as a tie breaker when resource contention occurs, for example if there are two orders for a widget, fulﬁll the ﬁrst and cancel the second one【当发生资源争用的时候可以用来做裁决】

当我们有全局时钟的时候，我们不需要通信就能够确定顺序，不幸的是，我们一般都没有全局时钟，因此只能通过通信来确定时序。

时间除了用来确定时序外，还能定义算法的边界，特别是可以区分 high latency 和 server or network link is down，而用来探测它俩区别的算法就是：failure detectors。

Failure detectors (time for cutoﬀ)

在分布式环境中，我们怎么知道一个节点是否宕机了呢？我们可以通过等待一定时间后认为节点失败了。

那问题是这个一定时间是多久呢？

这依赖于节点之间的延迟，算法一般不会直接设置一个精确的值，而是会对“一定时间”这个概念做个很好的抽象。

A failure detector is a way to abstract away the exact timing assumptions. Failure detectors are implemented using heartbeat messages and timers. Processes exchange heartbeat messages. If a message response is not received before the timeout occurs, then the process suspects the other process.

而failure detector是一个解决方案，通过心挑信息来探测存活性。

网友评论

本文标题：袖珍分布式系统（三）

本文链接：https://www.haomeiwen.com/subject/loxzbttx.html

延伸阅读

深度阅读

您也可以注册成为美文阅读网的作者，发表您的原创作品、分享您的心情！