[Rust] lifetime annotation

Author: 何幻 | Published 2020-07-31 18:59

    1. Background

    A few "official" resources on Rust lifetimes are the most helpful for understanding the topic.

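    2. Resources

    (1) Learning Rust: The Tough Part - Lifetimes

    Note: I do not recommend reading this one. Much of it leans heavily on metaphor and is easy to misread.

    (2) Rust By Example: Lifetimes

    Note: this resource introduces the concept of the borrow checker.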

    A lifetime is a construct the compiler (or more specifically, its borrow checker) uses to ensure all borrows are valid (note: lifetimes are the compiler's mechanism for verifying that borrows are valid). Specifically, a variable's lifetime begins when it is created and ends when it is destroyed. While lifetimes and scopes are often referred to together, they are not the same.

    Take, for example, the case where we borrow a variable via &. The borrow (note: a borrow is always a reference) has a lifetime that is determined by where it is declared. As a result, the borrow is valid as long as it ends before the lender is destroyed (note: the borrow ends as soon as the reference is no longer used, which need not be the end of its scope). However, the scope of the borrow is determined by where the reference is used.
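
The point that a borrow ends at its last use, not at the end of its lexical scope, can be seen in a minimal sketch. The function name `push_after_last_use` is a hypothetical helper invented for this illustration and is not part of the quoted material.

```rust
/// Pushes onto a vector after a shared borrow has seen its last use.
/// (Hypothetical helper, written only to illustrate the quoted paragraph.)
fn push_after_last_use() -> usize {
    let mut v = vec![1, 2, 3];
    let first = &v[0];   // shared borrow of `v` starts here
    let copied = *first; // last use of `first`: the borrow ends here
    v.push(copied + 3);  // accepted: `first` is still in lexical scope but no longer live
    // Using `first` after the push (e.g. printing it) would extend the borrow
    // across the mutation, and the compiler would reject the program.
    v.len()
}
```

Calling `push_after_last_use()` from `main` compiles on any edition that has non-lexical lifetimes enabled (Rust 2018 and later).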

    (3) rust-lang / rfcs

    Note: this resource distinguishes scope from lifetime and mentions the CFG.

    Extend Rust's borrow system to support non-lexical lifetimes -- these are lifetimes that are based on the control-flow graph (note: the range over which a reference is valid is analyzed from the control-flow graph), rather than lexical scopes.

    The basic idea of the borrow checker is that values may not be mutated or moved while they are borrowed, but how do we know whether a value is borrowed? The idea is quite simple: whenever you create a borrow, the compiler assigns the resulting reference a lifetime. This lifetime corresponds to the span of the code where the reference may be used (note: a lifetime is the range over which the reference is valid). The compiler will infer this lifetime to be the smallest lifetime (note: the lifetime ends as soon as the reference is no longer used) that it can have that still encompasses all the uses of the reference.
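
A lifetime based on the control-flow graph can cover one branch of a `match` and not another. The sketch below is modeled on the conditional control-flow case that motivates non-lexical lifetimes; the names `bump_or_insert` and `count_words` are hypothetical and chosen only for this example.

```rust
use std::collections::HashMap;

/// Increments the counter for `key`, or inserts it with a count of 1.
fn bump_or_insert(map: &mut HashMap<String, u32>, key: &str) {
    match map.get_mut(key) {
        // The mutable borrow produced by `get_mut` is live only in this arm.
        Some(count) => *count += 1,
        // No reference into the map is live here, so mutating the map is accepted.
        None => {
            map.insert(key.to_string(), 1);
        }
    }
}

/// Counts occurrences of each word using the helper above.
fn count_words(words: &[&str]) -> HashMap<String, u32> {
    let mut map = HashMap::new();
    for w in words {
        bump_or_insert(&mut map, w);
    }
    map
}
```

With purely lexical lifetimes the borrow from `get_mut` would extend over the whole `match`, including the `None` arm, and the `insert` call would conflict with it.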

    Note that Rust uses the term lifetime in a very particular way. In everyday speech, the word lifetime can be used in two distinct -- but similar -- ways:

    • The lifetime of a reference, corresponding to the span of time in which that reference is used.

    • The lifetime of a value, corresponding to the span of time before that value gets freed (or, put another way, before the destructor for the value runs).

    This second span of time, which describes how long a value is valid, is very important. To distinguish the two, we refer to that second span of time as the value's scope. Naturally, lifetimes and scopes are linked to one another. Specifically, if you make a reference to a value, the lifetime of that reference cannot outlive the scope of that value (note: the lifetime of a reference is never longer than the scope of the value it refers to). Otherwise, your reference would be pointing into freed memory.
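
The distinction between the scope of a value and the lifetime of a reference to it can be sketched as follows. `length_within_scope` is a hypothetical name used only here.

```rust
/// Reads through a reference while the referenced value is still in scope.
fn length_within_scope() -> usize {
    let len;
    {
        let s = String::from("hello"); // the scope of the value `s` is this block
        let r = &s;                    // the lifetime of `r` must fit inside that scope
        len = r.len();                 // copy a plain usize out; no reference escapes
    } // `s` is dropped here, so the lifetime of `r` has to end before this point
    // Storing `&s` in a variable declared outside the block and reading it here
    // would be rejected, because the reference would outlive the scope of `s`.
    len
}
```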

    (4) The Rust Programming Language: Validating References with Lifetimes

    Note: this resource clears up many concepts and is worth reading carefully.

    Every reference in Rust has a lifetime, which is the scope for which that reference is valid (note: the lifetime mechanism exists to check that references are valid). Most of the time, lifetimes are implicit and inferred, just like most of the time, types are inferred. We must annotate types when multiple types are possible. In a similar way, we must annotate lifetimes when the lifetimes of references could be related in a few different ways. Rust requires us to annotate the relationships using generic lifetime parameters to ensure the actual references used at runtime will definitely be valid (note: the purpose of lifetime annotations is to remove ambiguity).

    fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
        if x.len() > y.len() {
            x
        } else {
            y
        }
    }
    

    The function signature now tells Rust that for some lifetime 'a, the function takes two parameters, both of which are string slices that live at least as long as lifetime 'a. The function signature also tells Rust that the string slice returned from the function will live at least as long as lifetime 'a. In practice, it means that the lifetime of the reference returned by the longest function is the same as the smaller (note: the intersection, i.e. the shorter one) of the lifetimes of the references passed in. These constraints are what we want Rust to enforce. Remember, when we specify the lifetime parameters in this function signature, we’re not changing the lifetimes of any values passed in or returned. Rather, we’re specifying that the borrow checker should reject any values that don’t adhere to these constraints (note: adding lifetime annotations only adds constraints). Note that the longest function doesn’t need to know exactly how long x and y will live, only that some scope can be substituted for 'a that will satisfy this signature.

    When annotating lifetimes in functions, the annotations go in the function signature, not in the function body. Rust can analyze the code within the function without any help. However, when a function has references to or from code outside that function, it becomes almost impossible for Rust to figure out the lifetimes of the parameters or return values on its own. The lifetimes might be different each time the function is called. This is why we need to annotate the lifetimes manually (note: on each call of the function, the lifetime annotation is instantiated to a different value).

    Note: the two paragraphs above are very well written and get straight to the essence of lifetime annotations.
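
That annotations only describe relationships, and that references "could be related in a few different ways", can be illustrated by a signature that ties the return value to just one parameter. The function `pick_first` and its caller are hypothetical and appear in none of the cited resources.

```rust
/// The result is tied to `x` only; `_y` has an unrelated lifetime 'b.
fn pick_first<'a, 'b>(x: &'a str, _y: &'b str) -> &'a str {
    x
}

/// Uses the result after the second argument has already been dropped.
fn outlives_second_argument() -> String {
    let a = String::from("kept");
    let result;
    {
        let b = String::from("temporary");
        result = pick_first(a.as_str(), b.as_str());
    } // `b` is dropped here; `result` borrows only from `a`, so it stays valid
    result.to_string()
}
```

If `pick_first` were declared with a single lifetime on both parameters and the return value, as `longest` is, the same caller would be rejected, because the result would then also be constrained by the borrow of `b`.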

    When we pass concrete references to longest, the concrete lifetime that is substituted for 'a is the part of the scope of x that overlaps with the scope of y (note: when longest is called, 'a is instantiated as the intersection of the lifetimes of x and y). In other words, the generic lifetime 'a will get the concrete lifetime that is equal to the smaller (note: the intersection, i.e. the shorter one) of the lifetimes of x and y. Because we’ve annotated the returned reference with the same lifetime parameter 'a, the returned reference will also be valid for the length of the smaller of the lifetimes of x and y (note: as soon as either x or y becomes invalid, the return value becomes invalid too).

    Note: the above is the convention for how a lifetime parameter is instantiated.
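
The instantiation of 'a per call can be sketched with two callers of `longest`. The caller names below are hypothetical; `longest` itself is the function quoted above.

```rust
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

/// Here 'a is limited by `string2`, the shorter-lived argument.
fn longest_len_in_inner_scope() -> usize {
    let string1 = String::from("long string is long");
    let len;
    {
        let string2 = String::from("xyz");
        // `result` may only be used while both `string1` and `string2` are valid.
        let result = longest(string1.as_str(), string2.as_str());
        len = result.len();
    }
    len
}

/// Here both arguments are string literals, so 'a can be 'static for this call.
fn longest_of_literals() -> &'static str {
    longest("ab", "abc")
}
```

The same signature is satisfied by a different concrete lifetime at each call site, which is what the "instantiation" wording in the note refers to.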

    You’ve learned that every reference has a lifetime and that you need to specify lifetime parameters for functions or structs that use references. However, in Chapter 4 we had a function in Listing 4-9, which is shown again in Listing 10-26, that compiled without lifetime annotations.

    fn first_word(s: &str) -> &str {
        let bytes = s.as_bytes();
    
        for (i, &item) in bytes.iter().enumerate() {
            if item == b' ' {
                return &s[0..i];
            }
        }
    
        &s[..]
    }
    

    Note: some important history follows.

    The reason this function compiles without lifetime annotations is historical: in early versions (pre-1.0) of Rust, this code wouldn’t have compiled because every reference needed an explicit lifetime (note: in early versions of Rust, every reference needed a lifetime annotation). At that time, the function signature would have been written like this:

    fn first_word<'a>(s: &'a str) -> &'a str {
    

    After writing a lot of Rust code, the Rust team found that Rust programmers were entering the same lifetime annotations over and over in particular situations. These situations were predictable and followed a few deterministic patterns. The developers programmed these patterns into the compiler’s code so the borrow checker could infer the lifetimes in these situations and wouldn’t need explicit annotations (note: the compiler team built the common lifetime annotation patterns into the Rust language).

    This piece of Rust history is relevant because it’s possible that more deterministic patterns will emerge and be added to the compiler. In the future, even fewer lifetime annotations might be required (note: in the future, more situations may no longer need lifetime annotations).

    The patterns programmed into Rust’s analysis of references are called the lifetime elision rules. These aren’t rules for programmers to follow; they’re a set of particular cases that the compiler will consider, and if your code fits these cases, you don’t need to write the lifetimes explicitly.

    The elision rules don’t provide full inference. If Rust deterministically applies the rules but there is still ambiguity as to what lifetimes the references have, the compiler won’t guess what the lifetime of the remaining references should be. In this case, instead of guessing, the compiler will give you an error that you can resolve by adding the lifetime annotations that specify how the references relate to each other (note: when lifetimes are ambiguous, lifetime annotations still have to be written by hand).

    Note: the next few paragraphs are important. They describe how the compiler fills in lifetime annotations automatically.

    Lifetimes on function or method parameters are called input lifetimes, and lifetimes on return values are called output lifetimes.

    The compiler uses three rules to figure out what lifetimes references have when there aren’t explicit annotations (note: the compiler uses three rules to infer lifetime annotations). The first rule applies to input lifetimes, and the second and third rules apply to output lifetimes. If the compiler gets to the end of the three rules and there are still references for which it can’t figure out lifetimes, the compiler will stop with an error (note: if the lifetime annotations are still incomplete after all three rules have been applied, the compiler reports an error). These rules apply to fn definitions as well as impl blocks.

    Rule 1: give each reference parameter its own, distinct lifetime annotation.

    • The first rule is that each parameter that is a reference gets its own lifetime parameter. In other words, a function with one parameter gets one lifetime parameter: fn foo<'a>(x: &'a i32); a function with two parameters gets two separate lifetime parameters: fn foo<'a, 'b>(x: &'a i32, y: &'b i32); and so on.

    Rule 2: if there is exactly one input lifetime, assign it to every output lifetime.

    • The second rule is if there is exactly one input lifetime parameter, that lifetime is assigned to all output lifetime parameters: fn foo<'a>(x: &'a i32) -> &'a i32.

    Rule 3: if there are several input lifetimes and one of the parameters is &self or &mut self, assign the lifetime of self to every output lifetime.

    • The third rule is if there are multiple input lifetime parameters, but one of them is &self or &mut self because this is a method, the lifetime of self is assigned to all output lifetime parameters. This third rule makes methods much nicer to read and write because fewer symbols are necessary.

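    Note: if lifetimes cannot be assigned to every parameter and return value after all three rules have been applied, the compiler reports an error.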
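
The third rule is the only one without a signature example in the quoted text, so here is a minimal sketch of it. The struct `Excerpt`, its methods, and `excerpt_demo` are hypothetical names chosen for this illustration. The second method spells out what the first one is understood to mean after elision.

```rust
/// A struct holding a borrowed string slice (hypothetical, for illustration).
struct Excerpt<'a> {
    part: &'a str,
}

impl<'a> Excerpt<'a> {
    // Elided form: rule 1 gives `&self` and `announcement` separate lifetimes,
    // and rule 3 assigns the lifetime of `self` to the returned reference.
    fn part_after(&self, announcement: &str) -> &str {
        let _ = announcement; // the announcement is deliberately unused here
        self.part
    }

    // The same signature with the elided lifetimes written out by hand.
    fn part_after_explicit<'s, 'b>(&'s self, announcement: &'b str) -> &'s str {
        let _ = announcement;
        self.part
    }
}

/// Calls both methods and returns owned copies of their results.
fn excerpt_demo() -> (String, String) {
    let text = String::from("Call me Ishmael. Some years ago...");
    let excerpt = Excerpt {
        part: text.split('.').next().unwrap_or(""),
    };
    (
        excerpt.part_after("attention").to_string(),
        excerpt.part_after_explicit("attention").to_string(),
    )
}
```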

    Let’s pretend we’re the compiler. We’ll apply these rules to figure out what the lifetimes of the references in the signature of the first_word function in Listing 10-26 are. The signature starts without any lifetimes associated with the references:

    fn first_word(s: &str) -> &str {
    

    Then the compiler applies the first rule, which specifies that each parameter gets its own lifetime. We’ll call it 'a as usual, so now the signature is this:

    fn first_word<'a>(s: &'a str) -> &str {
    

    The second rule applies because there is exactly one input lifetime. The second rule specifies that the lifetime of the one input parameter gets assigned to the output lifetime, so the signature is now this:

    fn first_word<'a>(s: &'a str) -> &'a str {
    

    Now all the references in this function signature have lifetimes, and the compiler can continue its analysis without needing the programmer to annotate the lifetimes in this function signature.
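
The result of this walk-through can be checked by writing both forms side by side. In the sketch below, `first_word` is the function quoted earlier, and `first_word_explicit` is a hypothetical wrapper whose signature is the one the rules arrive at.

```rust
/// Elided signature, as in the quoted listing.
fn first_word(s: &str) -> &str {
    let bytes = s.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return &s[0..i];
        }
    }

    &s[..]
}

/// The signature after rules 1 and 2 have been applied by hand.
/// Forwarding to `first_word` type-checks because both signatures
/// describe the same relationship between input and output.
fn first_word_explicit<'a>(s: &'a str) -> &'a str {
    first_word(s)
}
```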

    Let’s look at another example, this time using the longest function that had no lifetime parameters when we started working with it in Listing 10-21:

    fn longest(x: &str, y: &str) -> &str {
    

    Let’s apply the first rule: each parameter gets its own lifetime. This time we have two parameters instead of one, so we have two lifetimes:

    fn longest<'a, 'b>(x: &'a str, y: &'b str) -> &str {
    

    You can see that the second rule doesn’t apply because there is more than one input lifetime. The third rule doesn’t apply either, because longest is a function rather than a method, so none of the parameters are self. After working through all three rules, we still haven’t figured out what the return type’s lifetime is (note: after all three rules have been applied, the return value still has no lifetime). This is why we got an error trying to compile the code in Listing 10-21: the compiler worked through the lifetime elision rules but still couldn’t figure out all the lifetimes of the references in the signature.

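    3. Summary

    (1) Validity of references

    • Lifetimes are the mechanism the compiler uses to check that references are valid; the check itself is called borrow checking.
    • The compiler uses the CFG (control-flow graph) to statically analyze the range over which a reference is valid.
    • A reference stops being live after its last use. At that point the owner of the referenced value may still be in scope, i.e. the value has not yet been freed.

    (2) Inferring lifetime annotations

    • The compiler applies three rules to add lifetime annotations to a function's parameters and return values automatically.
    • If the lifetime of the return value still cannot be determined after the three rules have been applied, compilation fails with an error.

    (3) How lifetime annotations are instantiated

    • An explicit lifetime annotation is only an extra marker that resolves ambiguity for the compiler.
    • On each call of the function, its lifetime annotation is "instantiated" to a different value.
    • A lifetime annotation is always "instantiated" to the shortest lifetime among the arguments of the current call that share that annotation.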

    4. Source code

    (1) borrow check

    github: rust v1.45.1 src/librustc_mir/borrow_check

    src/librustc_mir/borrow_check
    ├── borrow_set.rs
    ├── constraint_generation.rs
    ├── constraints
    │   ├── graph.rs
    │   └── mod.rs
    ├── def_use.rs
    ├── diagnostics
    │   ├── conflict_errors.rs
    │   ├── explain_borrow.rs
    │   ├── find_use.rs
    │   ├── mod.rs
    │   ├── move_errors.rs
    │   ├── mutability_errors.rs
    │   ├── outlives_suggestion.rs
    │   ├── region_errors.rs
    │   ├── region_name.rs
    │   └── var_name.rs
    ├── facts.rs
    ├── invalidation.rs
    ├── location.rs
    ├── member_constraints.rs
    ├── mod.rs
    ├── nll.rs
    ├── path_utils.rs
    ├── place_ext.rs
    ├── places_conflict.rs
    ├── prefixes.rs
    ├── region_infer
    │   ├── dump_mir.rs
    │   ├── graphviz.rs
    │   ├── mod.rs
    │   ├── opaque_types.rs
    │   ├── reverse_sccs.rs
    │   └── values.rs
    ├── renumber.rs
    ├── type_check
    │   ├── constraint_conversion.rs
    │   ├── free_region_relations.rs
    │   ├── input_output.rs
    │   ├── liveness
    │   │   ├── local_use_map.rs
    │   │   ├── mod.rs
    │   │   ├── polonius.rs
    │   │   └── trace.rs
    │   ├── mod.rs
    │   └── relate_tys.rs
    ├── universal_regions.rs
    └── used_muts.rs
    

    (2) Where it is invoked

    github: rust v1.45.1 src/librustc_interface/passes.rs

    sess.time("MIR_borrow_checking", || {
        tcx.par_body_owners(|def_id| tcx.ensure().mir_borrowck(def_id));
    });
    

    References

    Rust By Example: Lifetimes
    rust-lang / rfcs
    The Rust Programming Language: Validating References with Lifetimes
    github: rust v1.45.1 src/librustc_mir/borrow_check


        Article link: https://www.haomeiwen.com/subject/rawjrktx.html