END-TO-END ARGUMENTS IN SYSTEM DESIGN

By J.H. Saltzer, D.P. Reed, and D.D. Clark, M.I.T. Laboratory for Computer Science

    This paper presents a design principle that helps guide placement of functions among the modules of a distributed computer system. The principle, called the end-to-end argument, suggests that functions placed at low levels of a system may be redundant or of little value when compared with the cost of providing them at that low level. Examples discussed in the paper include bit error recovery, security using encryption, duplicate message suppression, recovery from system crashes, and delivery acknowledgement. Low level mechanisms to support these functions are justified only as performance enhancements.

    Introduction

    Choosing the proper boundaries between functions is perhaps the primary activity of the computer system designer. Design principles that provide guidance in this choice of function placement are among the most important tools of a system designer. This paper discusses one class of function placement argument that has been used for many years with neither explicit recognition nor much conviction. However, the emergence of the data communication network as a computer system component has sharpened this line of function placement argument by making more apparent the situations in which and reasons why it applies. This paper articulates the argument explicitly, so as to examine its nature and to see how general it really is. The argument appeals to application requirements, and provides a rationale for moving function upward in a layered system, closer to the application that uses the function. We begin by considering the communication network version of the argument.

    In a system that includes communications, one usually draws a modular boundary around the communication subsystem and defines a firm interface between it and the rest of the system. When doing so, it becomes apparent that there is a list of functions each of which might be implemented in any of several ways: by the communication subsystem, by its client, as a joint venture, or perhaps redundantly, each doing its own version. In reasoning about this choice, the requirements of the application provide the basis for a class of arguments, which go as follows:

    The function in question can completely and correctly be implemented only with the knowledge and help of the application standing at the end points of the communication system. Therefore, providing that questioned function as a feature of the communication system itself is not possible. (Sometimes an incomplete version of the function provided by the communication system may be useful as a performance enhancement.)

    We call this line of reasoning against low-level function implementation the "end-to-end argument." The following sections examine the end-to-end argument in detail, first with a case study of a typical example in which it is used – the function in question is reliable data transmission – and then by exhibiting the range of functions to which the same argument can be applied. For the case of the data communication system, this range includes encryption, duplicate message detection, message sequencing, guaranteed message delivery, detecting host crashes, and delivery receipts. In a broader context the argument seems to apply to many other functions of a computer operating system, including its file system. Examination of this broader context will be easier if we first consider the more specific data communication context, however.

    End-to-end caretaking

    Consider the problem of "careful file transfer." A file is stored by a file system, in the disk storage of computer A. Computer A is linked by a data communication network with computer B, which also has a file system and a disk store. The object is to move the file from computer A's storage to computer B's storage without damage, in the face of knowledge that failures can occur at various points along the way. The application program in this case is the file transfer program, part of which runs at host A and part at host B. In order to discuss the possible threats to the file's integrity in this transaction, let us assume that the following specific steps are involved:

    1. At host A the file transfer program calls upon the file system to read the file from the disk, where it resides on several tracks, and the file system passes it to the file transfer program in fixed-size blocks chosen to be disk-format independent.
    2. Also at host A the file transfer program asks the data communication system to transmit the file using some communication protocol that involves splitting the data into packets. The packet size is typically different from the file block size and the disk track size.
    3. The data communication network moves the packets from computer A to computer B.
    4. At host B a data communication program removes the packets from the data communication protocol and hands the contained data on to a second part of the file transfer application, the part that operates within host B.
    5. At host B, the file transfer program asks the file system to write the received data on the disk of host B.

    With this model of the steps involved, the following are some of the threats to the transaction that a careful designer might be concerned about:

    1. The file, though originally written correctly onto the disk at host A, if read now may contain incorrect data, perhaps because of hardware faults in the disk storage system.
    2. The software of the file system, the file transfer program, or the data communication system might make a mistake in buffering and copying the data of the file, either at host A or host B.
    3. The hardware processor or its local memory might have a transient error while doing the buffering and copying, either at host A or host B.
    4. The communication system might drop or change the bits in a packet, or lose a packet or deliver a packet more than once.
    5. Either of the hosts may crash part way through the transaction after performing an unknown amount (perhaps all) of the transaction.

    How would a careful file transfer application then cope with this list of threats? One approach might be to reinforce each of the steps along the way using duplicate copies, timeout and retry, carefully located redundancy for error detection, crash recovery, etc. The goal would be to reduce the probability of each of the individual threats to an acceptably small value. Unfortunately, systematic countering of threat two requires writing correct programs, which task is quite difficult, and not all the programs that must be correct are written by the file transfer application programmer. If we assume further that all these threats are relatively low in probability – low enough that the system allows useful work to be accomplished – brute force countermeasures such as doing everything three times appear uneconomical.

    The alternate approach might be called "end-to-end check and retry". Suppose that as an aid to coping with threat number one, stored with each file is a checksum that has sufficient redundancy to reduce the chance of an undetected error in the file to an acceptably negligible value. The application program follows the simple steps above in transferring the file from A to B. Then, as a final additional step, the part of the file transfer application residing in host B reads the transferred file copy back from its disk storage system into its own memory, recalculates the checksum, and sends this value back to host A, where it is compared with the checksum of the original. Only if the two checksums agree does the file transfer application declare the transaction committed. If the comparison fails, something went wrong, and a retry from the beginning might be attempted.
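
    As a concrete illustration, the check-and-retry loop might look like the following minimal Python sketch. The helpers read_source, send, and read_back are hypothetical stand-ins for steps 1 through 5 above; the paper does not prescribe a particular checksum, so SHA-256 serves here only as a modern example of one with sufficient redundancy.

        import hashlib

        def checksum(data: bytes) -> str:
            # Any checksum with enough redundancy will do; SHA-256 is merely a stand-in.
            return hashlib.sha256(data).hexdigest()

        def careful_transfer(read_source, send, read_back, max_attempts=3):
            # read_source() -> bytes : step 1, read the file at host A
            # send(data)             : steps 2-4, move the data to host B
            # read_back() -> bytes   : reread the stored copy from host B's disk
            original = read_source()
            expected = checksum(original)
            for attempt in range(max_attempts):
                send(original)
                if checksum(read_back()) == expected:
                    return  # checksums agree: declare the transaction committed
            # repeated failures suggest some part of the system is in need of repair
            raise RuntimeError("careful file transfer failed after retries")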

    If failures really are fairly rare, this technique will normally work on the first try; occasionally a second or even third try might be required; one would probably consider two or more failures on the same file transfer attempt as indicating that some part of the system is in need of repair.

    Now let us consider the usefulness of a common proposal, namely that the communication system provide, internally, a guarantee of reliable data transmission. It might accomplish this guarantee by providing selective redundancy in the form of packet checksums, sequence number checking, and internal retry mechanisms, for example. With sufficient care, the probability of undetected bit errors can be reduced to any desirable level. The question is whether or not this attempt to be helpful on the part of the communication system is useful to the careful file transfer application.

    The answer is that threat number four may have been eliminated, but the careful file transfer application must still counter the remaining threats, so it should still provide its own retries based on an end-to-end checksum of the file. And if it does so, the extra effort expended in the communication system to provide a guarantee of reliable data transmission is only reducing the frequency of retries by the file transfer application; it has no effect on inevitability or correctness of the outcome, since correct file transmission is assured by the end-to-end checksum and retry whether or not the data transmission system is especially reliable.

    Thus the argument: in order to achieve careful file transfer, the application program that performs the transfer must supply a file-transfer-specific, end-to-end reliability guarantee – in this case, a checksum to detect failures and a retry/commit plan. For the data communication system to go out of its way to be extraordinarily reliable does not reduce the burden on the application program to ensure reliability.

    A too-real example

    An interesting example of the pitfalls that one can encounter turned up recently at M.I.T.: One network system involving several local networks connected by gateways used a packet checksum on each hop from one gateway to the next, on the assumption that the primary threat to correct communication was corruption of bits during transmission. Application programmers, aware of this checksum, assumed that the network was providing reliable transmission, without realizing that the transmitted data was unprotected while stored in each gateway. One gateway computer developed a transient error in which while copying data from an input to an output buffer a byte pair was interchanged, with a frequency of about one such interchange in every million bytes passed. Over a period of time many of the source files of an operating system were repeatedly transferred through the defective gateway. Some of these source files were corrupted by byte exchanges, and their owners were forced to the ultimate end-to-end error check: manual comparison with and correction from old listings.

    Performance aspects

    It would be too simplistic to conclude that the lower levels should play no part in obtaining reliability, however. Consider a network that is somewhat unreliable, dropping one message of each hundred messages sent. The simple strategy outlined above, transmitting the file and then checking to see that the file arrived correctly, would perform more poorly as the length of the file increases. The probability that all packets of a file arrive correctly decreases exponentially with the file length, and thus the expected time to transmit the file grows exponentially with file length. Clearly, some effort at the lower levels to improve network reliability can have a significant effect on application performance. But the key idea here is that the lower levels need not provide "perfect" reliability.
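
    The arithmetic behind this claim can be made concrete (a sketch under the assumption of independent packet loss with probability p, for a file of n packets):

        \Pr[\text{file arrives intact}] = (1-p)^{n}, \qquad
        E[\text{attempts}] = (1-p)^{-n}, \qquad
        E[\text{time}] \approx n\,(1-p)^{-n}

    With the one-in-a-hundred loss rate above (p = 0.01), a 1000-packet file arrives intact on a given attempt with probability 0.99^{1000}, about 4 x 10^{-5}, so roughly 23,000 whole-file attempts would be expected; per-packet retry at a lower level removes this exponential blow-up.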

    Thus the amount of effort to put into reliability measures within the data communication system is seen to be an engineering tradeoff based on performance, rather than a requirement for correctness. Note that performance has several aspects here. If the communication system is too unreliable, the file transfer application performance will suffer because of frequent retries following failures of its end-to-end checksum. If the communication system is beefed up with internal reliability measures, those measures have a performance cost, too, in the form of bandwidth lost to redundant data and delay added by waiting for internal consistency checks to complete before delivering the data. There is little reason to push in this direction very far, when it is considered that the end-to-end check of the file transfer application must still be implemented no matter how reliable the communication system becomes. The "proper" tradeoff requires careful thought; for example one might start by designing the communication system to provide just the reliability that comes with little cost and engineering effort, and then evaluate the residual error level to insure that it is consistent with an acceptable retry frequency at the file transfer level. It is probably not important to strive for a negligible error rate at any point below the application level.

    Using performance to justify placing functions in a low-level subsystem must be done carefully. Sometimes, by examining the problem thoroughly, the same or better performance enhancement can be achieved at the high level. Performing a function at a low level may be more efficient, if the function can be performed with a minimum perturbation of the machinery already included in the low-level subsystem, but just the opposite situation can occur – that is, performing the function at the lower level may cost more – for two reasons. First, since the lower level subsystem is common to many applications, those applications that do not need the function will pay for it anyway. Second, the low-level subsystem may not have as much information as the higher levels, so it cannot do the job as efficiently.

    Frequently, the performance tradeoff is quite complex. Consider again the careful file transfer on an unreliable network. The usual technique for increasing packet reliability is some sort of per-packet error check with a retry protocol. This mechanism can be implemented either in the communication subsystem or in the careful file transfer application. For example, the receiver in the careful file transfer can periodically compute the checksum of the portion of the file thus far received and transmit this back to the sender. The sender can then restart by retransmitting any portion that arrived in error.
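
    A minimal sketch of this application-level early-retry idea, with hypothetical send_chunk, received_checksum, and rewind primitives standing in for the two ends of the transfer, and illustrative chunk and interval sizes:

        import hashlib

        CHUNK = 4096      # bytes per packet (illustrative)
        INTERVAL = 16     # chunks between periodic checksum exchanges (illustrative)

        def periodic_check_transfer(data: bytes, send_chunk, received_checksum, rewind):
            # send_chunk(offset, chunk)  : hypothetical transmit primitive
            # received_checksum() -> str : receiver's checksum of the file so far
            # rewind(offset)             : receiver discards everything past offset
            sent = 0
            verified = 0                     # offset up to which both ends agree
            while verified < len(data):
                chunk = data[sent:sent + CHUNK]
                send_chunk(sent, chunk)
                sent += len(chunk)
                if sent - verified >= INTERVAL * CHUNK or sent == len(data):
                    if received_checksum() == hashlib.sha256(data[:sent]).hexdigest():
                        verified = sent      # this portion is confirmed; never resend it
                    else:
                        rewind(verified)     # retransmit only the unverified portion
                        sent = verified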

    The end-to-end argument does not tell us where to put the early checks, since either layer can do this performance-enhancement job. Placing the early retry protocol in the file transfer application simplifies the communication system, but may increase overall cost, since the communication system is shared by other applications and each application must now provide its own reliability enhancement. Placing the early retry protocol in the communication system may be more efficient, since it may be performed inside the network on a hop-by-hop basis, reducing the delay involved in correcting a failure. At the same time, there may be some application that finds the cost of the enhancement is not worth the result but it now has no choice in the matter. A great deal of information about system implementation is needed to make this choice intelligently.

    Other examples of the end-to-end argument

    Delivery guarantees

    The basic argument that a lower-level subsystem that supports a distributed application may be wasting its effort providing a function that must by nature be implemented at the application level anyway can be applied to a variety of functions in addition to reliable data transmission. Perhaps the oldest and most widely known form of the argument concerns acknowledgement of delivery. A data communication network can easily return an acknowledgement to the sender for every message delivered to a recipient. The ARPANET, for example, returns a packet known as "Request For Next Message" (RFNM)[1] whenever it delivers a message. Although this acknowledgement may be useful within the network as a form of congestion control (originally the ARPANET refused to accept another message to the same target until the previous RFNM had returned) it was never found to be very helpful to applications using the ARPANET. The reason is that knowing for sure that the message was delivered to the target host is not very important. What the application wants to know is whether or not the target host acted on the message; all manner of disaster might have struck after message delivery but before completion of the action requested by the message. The acknowledgement that is really desired is an end-to-end one, which can be originated only by the target application – "I did it", or "I didn't."

    Another strategy for obtaining immediate acknowledgements is to make the target host sophisticated enough that when it accepts delivery of a message it also accepts responsibility for guaranteeing that the message is acted upon by the target application. This approach can eliminate the need for an end-to-end acknowledgement in some, but not all applications. An end-to-end acknowledgement is still required for applications in which the action requested of the target host should be done only if similar actions requested of other hosts are successful. This kind of application requires a two-phase commit protocol[5,10,15], which is a sophisticated end-to-end acknowledgement. Also, if the target application may either fail or refuse to do the requested action, and thus a negative acknowledgement is a possible outcome, an end-to-end acknowledgement may still be a requirement.

    Secure transmission of data

    Another area in which an end-to-end argument can be applied is that of data encryption. The argument here is threefold. First, if the data transmission system performs encryption and decryption, it must be trusted to manage securely the required encryption keys. Second, the data will be in the clear and thus vulnerable as it passes into the target node and is fanned out to the target application. Third, the authenticity of the message must still be checked by the application. If the application performs end-to-end encryption, it obtains its required authentication check, it can handle key management to its satisfaction, and the data is never exposed outside the application.

    Thus, to satisfy the requirements of the application, there is no need for the communication subsystem to provide for automatic encryption of all traffic. Automatic encryption of all traffic by the communication subsystem may be called for, however, to ensure something else – that a misbehaving user or application program does not deliberately transmit information that should not be exposed. The automatic encryption of all data as it is put into the network is one more firewall the system designer can use to ensure that information does not escape outside the system. Note however, that this is a different requirement from authenticating access rights of a system user to specific parts of the data. This network-level encryption can be quite unsophisticated – the same key can be used by all hosts, with frequent changes of the key. No per-user keys complicate the key management problem. The use of encryption for application-level authentication and protection is complementary. Neither mechanism can satisfy both requirements completely.
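
    In modern terms, application-level encryption with a built-in authenticity check might be sketched as follows; this is an illustration only, using the third-party Python cryptography package, which postdates the paper. A Fernet token combines encryption with an authentication tag, so a forged or corrupted message simply fails to decrypt, which covers the authentication point of the argument above.

        from cryptography.fernet import Fernet, InvalidToken

        # Key management stays inside the application: the two ends share this
        # key out of band, and the network never sees plaintext.
        key = Fernet.generate_key()

        def app_send(plaintext: bytes) -> bytes:
            return Fernet(key).encrypt(plaintext)   # encrypt at the sending application

        def app_receive(token: bytes) -> bytes:
            try:
                return Fernet(key).decrypt(token)   # decrypt and authenticate at the receiver
            except InvalidToken:
                raise ValueError("message failed its end-to-end authenticity check")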

    Duplicate message suppression

    A more sophisticated argument can be applied to duplicate message suppression. A property of some communication network designs is that a message or a part of a message may be delivered twice, typically as a result of time-out-triggered failure detection and retry mechanisms operating within the network. The network can provide the function of watching for and suppressing any such duplicate messages, or it can simply deliver them. One might expect that an application would find it very troublesome to cope with a network that may deliver the same message twice; indeed it is troublesome. Unfortunately, even if the network suppresses duplicates, the application itself may accidentally originate duplicate requests, in its own failure/retry procedures. These application level duplications look like different messages to the communication system, so it cannot suppress them; suppression must be accomplished by the application itself with knowledge of how to detect its own duplicates.

    A common example of duplicate suppression that must be handled at a high level is when a remote system user, puzzled by lack of response, initiates a new login to a time-sharing system. For another example, most communication applications involve a provision for coping with a system crash at one end of a multi-site transaction: reestablish the transaction when the crashed system comes up again. Unfortunately, reliable detection of a system crash is problematical: the problem may just be a lost or long-delayed acknowledgement. If so, the retried request is now a duplicate, which only the application can discover. Thus the end-to-end argument again: if the application level has to have a duplicate-suppressing mechanism anyway, that mechanism can also suppress any duplicates generated inside the communication network, so the function can be omitted from that lower level. The same basic reasoning applies to completely omitted messages as well as to duplicated ones.
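
    The usual application-level mechanism is to tag each request with an identifier that survives retries, so a duplicate looks like a duplicate no matter which layer produced it. A minimal sketch (the names are hypothetical, not from the paper):

        import uuid

        class RequestHandler:
            def __init__(self):
                self.results = {}             # request id -> result of the action

            def handle(self, request_id: str, action):
                # A retry reuses the original request_id, so the action runs once;
                # duplicates from the network or from the application's own
                # failure/retry logic are answered with the remembered result.
                if request_id not in self.results:
                    self.results[request_id] = action()
                return self.results[request_id]

        handler = RequestHandler()
        rid = str(uuid.uuid4())               # chosen by the client, kept across retries
        first = handler.handle(rid, lambda: "seat 14A reserved")
        retry = handler.handle(rid, lambda: "seat 14A reserved")
        assert first == retry                 # the duplicate was suppressed end to end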

    Guaranteeing FIFO message delivery

    Ensuring that messages arrive at the receiver in the same order they are sent is another function usually assigned to the communication subsystem. The mechanism usually used to achieve such first-in, first-out (FIFO) behavior guarantees FIFO ordering among messages sent on the same virtual circuit. Messages sent along independent virtual circuits, or through intermediate processes outside the communication subsystem may arrive in an order different from the order sent. A distributed application in which one node can originate requests that initiate actions at several sites cannot take advantage of the FIFO ordering property to guarantee that the actions requested occur in the correct order. Instead, an independent mechanism at a higher level than the communication subsystem must control the ordering of actions.
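
    One common higher-level mechanism is an application-assigned sequence number that travels with each request, independent of which circuit delivered it. A minimal sketch of such reordering at the receiving site (assumed for illustration, not prescribed by the paper):

        import heapq

        class OrderedActions:
            # Applies actions in the order the originator numbered them,
            # regardless of the order in which the network delivered them.
            def __init__(self):
                self.next_seq = 0
                self.pending = []                 # min-heap of (seq, action)

            def deliver(self, seq: int, action):
                heapq.heappush(self.pending, (seq, action))
                while self.pending and self.pending[0][0] == self.next_seq:
                    _, ready = heapq.heappop(self.pending)
                    ready()
                    self.next_seq += 1

        site = OrderedActions()
        site.deliver(1, lambda: print("debit"))   # arrived first, runs second
        site.deliver(0, lambda: print("credit"))  # arrived second, runs first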

    Transaction management

    We have now applied the end-to-end argument in the construction of the SWALLOW distributed data storage system[15], where it leads to significant reduction in overhead. SWALLOW provides data storage servers called repositories that can be used remotely to store and retrieve data. Accessing data at a repository is done by sending it a message specifying the object to be accessed, the version, and type of access (read/write), plus a value to be written if the access is a write. The underlying message communication system does not suppress duplicate messages, since a) the object identifier plus the version information suffices to detect duplicate writes, and b) the effect of a duplicate read request message is only to generate a duplicate response, which is easily discarded by the originator. Consequently, the low-level message communication protocol is significantly simplified.
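
    A loose illustration of why the (object, version) pair makes message-level duplicate suppression unnecessary; this sketches the idea only, not SWALLOW's actual interface:

        class Repository:
            def __init__(self):
                self.store = {}                  # (object_id, version) -> value

            def write(self, object_id, version, value):
                # A duplicate write names an (object, version) that already
                # exists, so it is detected and ignored with no help from the
                # network.
                if (object_id, version) not in self.store:
                    self.store[(object_id, version)] = value

            def read(self, object_id, version):
                # A duplicate read merely produces a duplicate response, which
                # the originator can discard.
                return self.store[(object_id, version)]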

    The underlying message communication system does not provide delivery acknowledgement either. The acknowledgement that the originator of a write request needs is that the data was stored safely. This acknowledgement can be provided only by high levels of the SWALLOW system. For read requests, a delivery acknowledgement is redundant, since the response containing the value read is sufficient acknowledgement. By eliminating delivery acknowledgements, the number of messages transmitted is halved. This message reduction can have a significant effect on both host load and network load, improving performance. This same line of reasoning has also been used in development of an experimental protocol for remote access to disk records[6]. The resulting reduction in path length in lower-level protocols was important in maintaining good performance on remote disk access.

    Identifying the ends

    Using the end-to-end argument sometimes requires subtlety of analysis of application requirements. For example, consider a computer communication network that carries some packet voice connections, conversations between digital telephone instruments. For those connections that carry voice packets, an unusually strong version of the end-to-end argument applies: if low levels of the communication system try to accomplish bit-perfect communication, they will probably introduce uncontrolled delays in packet delivery, for example, by requesting retransmission of damaged packets and holding up delivery of later packets until earlier ones have been correctly retransmitted. Such delays are disruptive to the voice application, which needs to feed data at a constant rate to the listener. It is better to accept slightly damaged packets as they are, or even to replace them with silence, a duplicate of the previous packet, or a noise burst. The natural redundancy of voice, together with the high-level error correction procedure in which one participant says "excuse me, someone dropped a glass. Would you please say that again?" will handle such dropouts, if they are relatively infrequent.

    However, this strong version of the end-to-end argument is a property of the specific application – two people in real-time conversation – rather than a property, say, of speech in general. If one considers instead a speech message system, in which the voice packets are stored in a file for later listening by the recipient, the arguments suddenly change their nature. Short delays in delivery of packets to the storage medium are not particularly disruptive so there is no longer any objection to low-level reliability measures that might introduce delay in order to achieve reliability. More important, it is actually helpful to this application to get as much accuracy as possible in the recorded message, since the recipient, at the time of listening to the recording, is not going to be able to ask the sender to repeat a sentence. On the other hand, with a storage system acting as the receiving end of the voice communication, an end-to-end argument does apply to packet ordering and duplicate suppression. Thus the end-to-end argument is not an absolute rule, but rather a guideline that helps in application and protocol design analysis; one must use some care to identify the end points to which the argument should be applied.

    History, and application to other system areas

    The individual examples of end-to-end arguments cited in this paper are not original; they have accumulated over the years. The first example of questionable intermediate delivery acknowledgements noticed by the authors was the "wait" message of the M.I.T. Compatible Time-Sharing System, which the system printed on the user's terminal whenever the user entered a command[3]. (The message had some value in the early days of the system, when crashes and communication failures were so frequent that intermediate acknowledgements provided some needed reassurance that all was well.)

    The end-to-end argument relating to encryption was first publicly discussed by Branstad in a 1973 paper[2]; presumably the military security community held classified discussions before that time. Diffie and Hellman[4] and Kent[8] develop the arguments in more depth, and Needham and Schroeder[11] devised improved protocols for the purpose.

    The two-phase-commit data update protocols of Gray[5], Lampson and Sturgis[10] and Reed[13] all use a form of end-to-end argument to justify their existence; they are end-to-end protocols that do not depend for correctness on reliability, FIFO sequencing, or duplicate suppression within the communication system, since all of these problems may also be introduced by other system component failures as well. Reed makes this argument explicitly in the second chapter of his Ph.D. thesis on decentralized atomic actions[14].

    End-to-end arguments are often applied to error control and correctness in application systems. For example, a banking system usually provides high-level auditing procedures as a matter of policy and legal requirement. Those high-level auditing procedures will uncover not only high-level mistakes such as performing a withdrawal against the wrong account, they will also detect low-level mistakes such as coordination errors in the underlying data management system. Therefore a costly algorithm that absolutely eliminates such coordination errors may be arguably less appropriate than a less costly algorithm that just makes such errors very rare. In airline reservation systems, an agent can be relied upon to keep trying, through system crashes and delays, until a reservation is either confirmed or refused. Lower level recovery procedures to guarantee that an unconfirmed request for a reservation will survive a system crash are thus not vital. In telephone exchanges, a failure that could cause a single call to be lost is considered not worth providing explicit recovery for, since the caller will probably replace the call if it matters[7]. All of these design approaches are examples of the end-to-end argument being applied to automatic recovery.

    Much of the debate in the network protocol community over datagrams, virtual circuits, and connectionless protocols is a debate about end-to-end arguments. A modularity argument prizes a reliable, FIFO sequenced, duplicate-suppressed stream of data as a system component that is easy to build on, and that argument favors virtual circuits. The end-to-end argument claims that centrally-provided versions of each of those functions will be incomplete for some applications, and those applications will find it easier to build their own version of the functions starting with datagrams.

    A version of the end-to-end argument in a non-communication application was developed in the 1950's by system analysts whose responsibility included reading and writing files on large numbers of magnetic tape reels. Repeated attempts to define and implement a "reliable tape subsystem" repeatedly foundered, as flaky tape drives, undependable system operators, and system crashes conspired against all narrowly focused reliability measures. Eventually, it became standard practice for every application to provide its own application-dependent checks and recovery strategy, and to assume that lower-level error detection mechanisms at best reduced the frequency with which the higher-level checks failed. As an example, the Multics file backup system[17], even though it is built on a foundation of a magnetic tape subsystem format that provides very powerful error detection and correction features, provides its own error control in the form of record labels and multiple copies of every file.

    The arguments that are used in support of reduced instruction set computer (RISC) architecture are similar to end-to-end arguments. The RISC argument is that the client of the architecture will get better performance by implementing exactly the instructions needed from primitive tools; any attempt by the computer designer to anticipate the client's requirements for an esoteric feature will probably miss the target slightly and the client will end up reimplementing that feature anyway. (We are indebted to M. Satyanarayanan for pointing out this example.)

    Lampson, in his arguments supporting the "open operating system,"[9] uses an argument similar to the end-to-end argument as a justification. Lampson argues against making any function a permanent fixture of lower-level modules; the function may be provided by a lower-level module but it should always be replaceable by an application's special version of the function. The reasoning is that for any function you can think of, at least some applications will find that by necessity they must implement the function themselves in order to meet correctly their own requirements. This line of reasoning leads Lampson to propose an "open" system in which the entire operating system consists of replaceable routines from a library. Such an approach has only recently become feasible in the context of computers dedicated to a single application. It may be the case that the large quantity of fixed supervisor function typical of large-scale operating systems is only an artifact of economic pressures that demanded multiplexing of expensive hardware and therefore a protected supervisor. Most recent system "kernelization" projects, in fact, have focused at least in part on getting function out of low system levels[16,12]. Though this function movement is inspired by a different kind of correctness argument, it has the side effect of producing an operating system that is more flexible for applications, which is exactly the main thrust of the end-to-end argument.

    Conclusions

    End-to-end arguments are a kind of "Occam's razor" when it comes to choosing the functions to be provided in a communication subsystem. Because the communication subsystem is frequently specified before applications that use the subsystem are known, the designer may be tempted to "help" the users by taking on more function than necessary. Awareness of end-to-end arguments can help to reduce such temptations.

    It is fashionable these days to talk about "layered" communication protocols, but without clearly defined criteria for assigning functions to layers. Such layerings are desirable to enhance modularity. End-to-end arguments may be viewed as part of a set of rational principles for organizing such layered systems. We hope that our discussion will help to add substance to arguments about the "proper" layering.

    Acknowledgements

    Many people have read and commented on an earlier draft of this paper, including David Cheriton, F.B. Schneider, and Liba Svobodova. The subject was also discussed at the ACM Workshop in Fundamentals of Distributed Computing, in Fallbrook, California during December 1980. Those comments and discussions were quite helpful in clarifying the arguments.
