美文网首页
记kube-apiserver访问etcd问题分析

记kube-apiserver访问etcd问题分析

作者: 无聊的上帝op | 来源:发表于2023-06-14 15:27 被阅读0次

问题背景

收到反馈kube-apiserver出现warn日志

日志内容

kube-apiserver[795001]: {"level":"warn","ts":"2023-06-13T15:26:52.464+0800","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000b4f6c0/#initially=[https://10.11.33.153:2379;https://10.11.33.154:2379;https://10.11.33.152:2379]","attempt":0,"error":"rpc error: code = OutOfRange desc = etcdserver: mvcc: required revision has been compacted"}

日志分析

这个错误信息来自于 etcd 客户端,具体错误是 "etcdserver: mvcc: required revision has been compacted"。这表示你试图访问的键值对的修订版本已经被压缩了。在 etcd 中,历史键值对的版本被保存一段时间,然后会被压缩以释放存储空间。当你试图访问一个已经被压缩的版本时,就会收到这个错误。

解决这个问题的方法取决于你的具体需求。如果你需要访问历史版本,你可能需要调整你的压缩策略,使得需要的版本不被压缩。如果你只是想访问最新版本,那么你可能需要更新你的请求,确保你不是在请求一个已经被压缩的版本。

当你尝试访问一个已经被压缩的修订版本的键值对时,你将不能访问到该数据。Etcd 的压缩操作会删除旧的数据版本,仅保留最新的数据版本。这意味着,如果你尝试访问一个已经被压缩(也就是删除)的数据版本,将会收到一个错误,正如你在日志中看到的那样。

在这种情况下,你需要重新获取该键的当前值,或者使用一个新的修订版本号来访问数据。如果你的应用依赖于旧的数据版本,你可能需要重新考虑你的数据策略,以便在数据被压缩后仍能正确运行。

问题分析

先说结论: k8s功能无影响,良性日志.

源码

https://github.com/etcd-io/etcd/blob/main/client/v3/retry_interceptor.go

// Copyright 2016 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// Based on github.com/grpc-ecosystem/go-grpc-middleware/retry, but modified to support the more
// fine grained error checking required by write-at-most-once retry semantics of etcd.

package clientv3

import (
    "context"
    "errors"
    "io"
    "sync"
    "time"

    "go.uber.org/zap"
    "google.golang.org/grpc"
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/metadata"
    "google.golang.org/grpc/status"

    "go.etcd.io/etcd/api/v3/v3rpc/rpctypes"
)

// unaryClientInterceptor returns a new retrying unary client interceptor.
//
// The default configuration of the interceptor is to not retry *at all*. This behaviour can be
// changed through options (e.g. WithMax) on creation of the interceptor or on call (through grpc.CallOptions).
func (c *Client) unaryClientInterceptor(optFuncs ...retryOption) grpc.UnaryClientInterceptor {
    intOpts := reuseOrNewWithCallOptions(defaultOptions, optFuncs)
    return func(ctx context.Context, method string, req, reply interface{}, cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
        ctx = withVersion(ctx)
        grpcOpts, retryOpts := filterCallOptions(opts)
        callOpts := reuseOrNewWithCallOptions(intOpts, retryOpts)
        // short circuit for simplicity, and avoiding allocations.
        if callOpts.max == 0 {
            return invoker(ctx, method, req, reply, cc, grpcOpts...)
        }
        var lastErr error
        for attempt := uint(0); attempt < callOpts.max; attempt++ {
            if err := waitRetryBackoff(ctx, attempt, callOpts); err != nil {
                return err
            }
            c.GetLogger().Debug(
                "retrying of unary invoker",
                zap.String("target", cc.Target()),
                zap.String("method", method),
                zap.Uint("attempt", attempt),
            )
            lastErr = invoker(ctx, method, req, reply, cc, grpcOpts...)
            if lastErr == nil {
                return nil
            }
            c.GetLogger().Warn(
                "retrying of unary invoker failed",
                zap.String("target", cc.Target()),
                zap.String("method", method),
                zap.Uint("attempt", attempt),
                zap.Error(lastErr),
            )
            if isContextError(lastErr) {
                if ctx.Err() != nil {
                    // its the context deadline or cancellation.
                    return lastErr
                }
                // its the callCtx deadline or cancellation, in which case try again.
                continue
            }
            if c.shouldRefreshToken(lastErr, callOpts) {
                gtErr := c.refreshToken(ctx)
                if gtErr != nil {
                    c.GetLogger().Warn(
                        "retrying of unary invoker failed to fetch new auth token",
                        zap.String("target", cc.Target()),
                        zap.Error(gtErr),
                    )
                    return gtErr // lastErr must be invalid auth token
                }
                continue
            }
            if !isSafeRetry(c, lastErr, callOpts) {
                return lastErr
            }
        }
        return lastErr
    }
}

// streamClientInterceptor returns a new retrying stream client interceptor for server side streaming calls.
//
// The default configuration of the interceptor is to not retry *at all*. This behaviour can be
// changed through options (e.g. WithMax) on creation of the interceptor or on call (through grpc.CallOptions).
//
// Retry logic is available *only for ServerStreams*, i.e. 1:n streams, as the internal logic needs
// to buffer the messages sent by the client. If retry is enabled on any other streams (ClientStreams,
// BidiStreams), the retry interceptor will fail the call.
func (c *Client) streamClientInterceptor(optFuncs ...retryOption) grpc.StreamClientInterceptor {
    intOpts := reuseOrNewWithCallOptions(defaultOptions, optFuncs)
    return func(ctx context.Context, desc *grpc.StreamDesc, cc *grpc.ClientConn, method string, streamer grpc.Streamer, opts ...grpc.CallOption) (grpc.ClientStream, error) {
        ctx = withVersion(ctx)
        // getToken automatically. Otherwise, auth token may be invalid after watch reconnection because the token has expired
        // (see https://github.com/etcd-io/etcd/issues/11954 for more).
        err := c.getToken(ctx)
        if err != nil {
            c.GetLogger().Error("clientv3/retry_interceptor: getToken failed", zap.Error(err))
            return nil, err
        }
        grpcOpts, retryOpts := filterCallOptions(opts)
        callOpts := reuseOrNewWithCallOptions(intOpts, retryOpts)
        // short circuit for simplicity, and avoiding allocations.
        if callOpts.max == 0 {
            return streamer(ctx, desc, cc, method, grpcOpts...)
        }
        if desc.ClientStreams {
            return nil, status.Errorf(codes.Unimplemented, "clientv3/retry_interceptor: cannot retry on ClientStreams, set Disable()")
        }
        newStreamer, err := streamer(ctx, desc, cc, method, grpcOpts...)
        if err != nil {
            c.GetLogger().Error("streamer failed to create ClientStream", zap.Error(err))
            return nil, err // TODO(mwitkow): Maybe dial and transport errors should be retriable?
        }
        retryingStreamer := &serverStreamingRetryingStream{
            client:       c,
            ClientStream: newStreamer,
            callOpts:     callOpts,
            ctx:          ctx,
            streamerCall: func(ctx context.Context) (grpc.ClientStream, error) {
                return streamer(ctx, desc, cc, method, grpcOpts...)
            },
        }
        return retryingStreamer, nil
    }
}

// shouldRefreshToken checks whether there's a need to refresh the token based on the error and callOptions,
// and returns a boolean value.
func (c *Client) shouldRefreshToken(err error, callOpts *options) bool {
    if rpctypes.Error(err) == rpctypes.ErrUserEmpty {
        // refresh the token when username, password is present but the server returns ErrUserEmpty
        // which is possible when the client token is cleared somehow
        return c.authTokenBundle != nil // equal to c.Username != "" && c.Password != ""
    }

    return callOpts.retryAuth &&
        (rpctypes.Error(err) == rpctypes.ErrInvalidAuthToken || rpctypes.Error(err) == rpctypes.ErrAuthOldRevision)
}

func (c *Client) refreshToken(ctx context.Context) error {
    if c.authTokenBundle == nil {
        // c.authTokenBundle will be initialized only when
        // c.Username != "" && c.Password != "".
        //
        // When users use the TLS CommonName based authentication, the
        // authTokenBundle is always nil. But it's possible for the clients
        // to get `rpctypes.ErrAuthOldRevision` response when the clients
        // concurrently modify auth data (e.g, addUser, deleteUser etc.).
        // In this case, there is no need to refresh the token; instead the
        // clients just need to retry the operations (e.g. Put, Delete etc).
        return nil
    }

    return c.getToken(ctx)
}

// type serverStreamingRetryingStream is the implementation of grpc.ClientStream that acts as a
// proxy to the underlying call. If any of the RecvMsg() calls fail, it will try to reestablish
// a new ClientStream according to the retry policy.
type serverStreamingRetryingStream struct {
    grpc.ClientStream
    client        *Client
    bufferedSends []interface{} // single message that the client can sen
    receivedGood  bool          // indicates whether any prior receives were successful
    wasClosedSend bool          // indicates that CloseSend was closed
    ctx           context.Context
    callOpts      *options
    streamerCall  func(ctx context.Context) (grpc.ClientStream, error)
    mu            sync.RWMutex
}

func (s *serverStreamingRetryingStream) setStream(clientStream grpc.ClientStream) {
    s.mu.Lock()
    s.ClientStream = clientStream
    s.mu.Unlock()
}

func (s *serverStreamingRetryingStream) getStream() grpc.ClientStream {
    s.mu.RLock()
    defer s.mu.RUnlock()
    return s.ClientStream
}

func (s *serverStreamingRetryingStream) SendMsg(m interface{}) error {
    s.mu.Lock()
    s.bufferedSends = append(s.bufferedSends, m)
    s.mu.Unlock()
    return s.getStream().SendMsg(m)
}

func (s *serverStreamingRetryingStream) CloseSend() error {
    s.mu.Lock()
    s.wasClosedSend = true
    s.mu.Unlock()
    return s.getStream().CloseSend()
}

func (s *serverStreamingRetryingStream) Header() (metadata.MD, error) {
    return s.getStream().Header()
}

func (s *serverStreamingRetryingStream) Trailer() metadata.MD {
    return s.getStream().Trailer()
}

func (s *serverStreamingRetryingStream) RecvMsg(m interface{}) error {
    attemptRetry, lastErr := s.receiveMsgAndIndicateRetry(m)
    if !attemptRetry {
        return lastErr // success or hard failure
    }

    // We start off from attempt 1, because zeroth was already made on normal SendMsg().
    for attempt := uint(1); attempt < s.callOpts.max; attempt++ {
        if err := waitRetryBackoff(s.ctx, attempt, s.callOpts); err != nil {
            return err
        }
        newStream, err := s.reestablishStreamAndResendBuffer(s.ctx)
        if err != nil {
            s.client.lg.Error("failed reestablishStreamAndResendBuffer", zap.Error(err))
            return err // TODO(mwitkow): Maybe dial and transport errors should be retriable?
        }
        s.setStream(newStream)

        s.client.lg.Warn("retrying RecvMsg", zap.Error(lastErr))
        attemptRetry, lastErr = s.receiveMsgAndIndicateRetry(m)
        if !attemptRetry {
            return lastErr
        }
    }
    return lastErr
}

func (s *serverStreamingRetryingStream) receiveMsgAndIndicateRetry(m interface{}) (bool, error) {
    s.mu.RLock()
    wasGood := s.receivedGood
    s.mu.RUnlock()
    err := s.getStream().RecvMsg(m)
    if err == nil || err == io.EOF {
        s.mu.Lock()
        s.receivedGood = true
        s.mu.Unlock()
        return false, err
    } else if wasGood {
        // previous RecvMsg in the stream succeeded, no retry logic should interfere
        return false, err
    }
    if isContextError(err) {
        if s.ctx.Err() != nil {
            return false, err
        }
        // its the callCtx deadline or cancellation, in which case try again.
        return true, err
    }
    if s.client.shouldRefreshToken(err, s.callOpts) {
        gtErr := s.client.refreshToken(s.ctx)
        if gtErr != nil {
            s.client.lg.Warn("retry failed to fetch new auth token", zap.Error(gtErr))
            return false, err // return the original error for simplicity
        }
        return true, err

    }
    return isSafeRetry(s.client, err, s.callOpts), err
}

func (s *serverStreamingRetryingStream) reestablishStreamAndResendBuffer(callCtx context.Context) (grpc.ClientStream, error) {
    s.mu.RLock()
    bufferedSends := s.bufferedSends
    s.mu.RUnlock()
    newStream, err := s.streamerCall(callCtx)
    if err != nil {
        return nil, err
    }
    for _, msg := range bufferedSends {
        if err := newStream.SendMsg(msg); err != nil {
            return nil, err
        }
    }
    if err := newStream.CloseSend(); err != nil {
        return nil, err
    }
    return newStream, nil
}

func waitRetryBackoff(ctx context.Context, attempt uint, callOpts *options) error {
    waitTime := time.Duration(0)
    if attempt > 0 {
        waitTime = callOpts.backoffFunc(attempt)
    }
    if waitTime > 0 {
        timer := time.NewTimer(waitTime)
        select {
        case <-ctx.Done():
            timer.Stop()
            return contextErrToGrpcErr(ctx.Err())
        case <-timer.C:
        }
    }
    return nil
}

// isSafeRetry returns "true", if request is safe for retry with the given error.
func isSafeRetry(c *Client, err error, callOpts *options) bool {
    if isContextError(err) {
        return false
    }

    // Situation when learner refuses RPC it is supposed to not serve is from the server
    // perspective not retryable.
    // But for backward-compatibility reasons we need  to support situation that
    // customer provides mix of learners (not yet voters) and voters with an
    // expectation to pick voter in the next attempt.
    // TODO: Ideally client should be 'aware' which endpoint represents: leader/voter/learner with high probability.
    if errors.Is(err, rpctypes.ErrGRPCNotSupportedForLearner) && len(c.Endpoints()) > 1 {
        return true
    }

    switch callOpts.retryPolicy {
    case repeatable:
        return isSafeRetryImmutableRPC(err)
    case nonRepeatable:
        return isSafeRetryMutableRPC(err)
    default:
        c.lg.Warn("unrecognized retry policy", zap.String("retryPolicy", callOpts.retryPolicy.String()))
        return false
    }
}

func isContextError(err error) bool {
    return status.Code(err) == codes.DeadlineExceeded || status.Code(err) == codes.Canceled
}

func contextErrToGrpcErr(err error) error {
    switch err {
    case context.DeadlineExceeded:
        return status.Errorf(codes.DeadlineExceeded, err.Error())
    case context.Canceled:
        return status.Errorf(codes.Canceled, err.Error())
    default:
        return status.Errorf(codes.Unknown, err.Error())
    }
}

var (
    defaultOptions = &options{
        retryPolicy: nonRepeatable,
        max:         0, // disable
        backoffFunc: backoffLinearWithJitter(50*time.Millisecond /*jitter*/, 0.10),
        retryAuth:   true,
    }
)

// backoffFunc denotes a family of functions that control the backoff duration between call retries.
//
// They are called with an identifier of the attempt, and should return a time the system client should
// hold off for. If the time returned is longer than the `context.Context.Deadline` of the request
// the deadline of the request takes precedence and the wait will be interrupted before proceeding
// with the next iteration.
type backoffFunc func(attempt uint) time.Duration

// withRetryPolicy sets the retry policy of this call.
func withRetryPolicy(rp retryPolicy) retryOption {
    return retryOption{applyFunc: func(o *options) {
        o.retryPolicy = rp
    }}
}

// withMax sets the maximum number of retries on this call, or this interceptor.
func withMax(maxRetries uint) retryOption {
    return retryOption{applyFunc: func(o *options) {
        o.max = maxRetries
    }}
}

// WithBackoff sets the `BackoffFunc` used to control time between retries.
func withBackoff(bf backoffFunc) retryOption {
    return retryOption{applyFunc: func(o *options) {
        o.backoffFunc = bf
    }}
}

type options struct {
    retryPolicy retryPolicy
    max         uint
    backoffFunc backoffFunc
    retryAuth   bool
}

// retryOption is a grpc.CallOption that is local to clientv3's retry interceptor.
type retryOption struct {
    grpc.EmptyCallOption // make sure we implement private after() and before() fields so we don't panic.
    applyFunc            func(opt *options)
}

func reuseOrNewWithCallOptions(opt *options, retryOptions []retryOption) *options {
    if len(retryOptions) == 0 {
        return opt
    }
    optCopy := &options{}
    *optCopy = *opt
    for _, f := range retryOptions {
        f.applyFunc(optCopy)
    }
    return optCopy
}

func filterCallOptions(callOptions []grpc.CallOption) (grpcOptions []grpc.CallOption, retryOptions []retryOption) {
    for _, opt := range callOptions {
        if co, ok := opt.(retryOption); ok {
            retryOptions = append(retryOptions, co)
        } else {
            grpcOptions = append(grpcOptions, opt)
        }
    }
    return grpcOptions, retryOptions
}

// BackoffLinearWithJitter waits a set period of time, allowing for jitter (fractional adjustment).
//
// For example waitBetween=1s and jitter=0.10 can generate waits between 900ms and 1100ms.
func backoffLinearWithJitter(waitBetween time.Duration, jitterFraction float64) backoffFunc {
    return func(attempt uint) time.Duration {
        return jitterUp(waitBetween, jitterFraction)
    }
}

源码解析

所提供的Go代码是etcd客户端库(clientv3包)的一部分。它定义了客户端如何使用gRPC与etcd服务器交互,特别关注unary和流客户端交互的重试逻辑。

unaryClientInterceptor:这个函数创建一个gRPC unary 拦截器。unary 交互包括单个请求和单个响应。拦截器拦截对 unary RPC方法的调用,并提供自定义行为。在这种情况下,行为是重试失败请求的最大尝试次数。该代码处理各种错误场景,例如上下文错误和令牌刷新。

streamClientInterceptor:这个函数为服务器端流调用创建一个gRPC流拦截器。服务器端流包括一个请求,然后是多个响应。拦截器拦截对流RPC方法的调用,并提供自定义行为。与unary拦截器一样,此拦截器提供重试逻辑,但它也处理客户端发送的缓冲消息,因为在重试事件中可能需要重新发送这些消息。

shouldRefreshToken:该函数根据返回的错误和调用选项检查是否需要令牌刷新。它检查指示需要令牌刷新的特定错误类型。

refreshToken:如果authTokenBundle不是nil,这个函数通过调用getToken函数来刷新令牌。

serverStreamingRetryingStream:这是一个结构体,包装gRPC客户端流以添加重试行为。它维护一个已发送消息的缓冲区和一个标志,以指示是否有任何接收成功。它还提供了模拟gRPC ClientStream接口的方法,在适当的地方添加了重试逻辑。

isSafeRetry:该函数根据返回的错误和调用选项确定失败的请求是否可以安全重试。它处理各种错误场景,包括上下文错误和指示重试不安全的特定错误类型。

代码还包括几个支持重试逻辑的实用程序函数和方法,如waitRetryBackoff、receiveMsgAndIndicateRetry和reestablishStreamAndResendBuffer。

查看数据版本号

在 etcd 中,每一个键值对都有一个与之关联的修订版本号,也叫做 revision。这个修订版本号是一个全局的、单调递增的整数,它反映了键值对的修改历史。每当对键值对进行修改操作(包括添加、更新和删除)时,etcd 都会为其分配一个新的修订版本号。这个版本号可以帮助你追踪键值对的变化,例如,你可以使用它来查看键值对的历史版本。

在 etcd 的 Go 客户端中,你可以通过以下方式获取一个键值对的修订版本号

cli, err := clientv3.New(clientv3.Config{
    Endpoints:   []string{"localhost:2379"},
    DialTimeout: 5 * time.Second,
})
if err != nil {
    log.Fatal(err)
}
defer cli.Close()

resp, err := cli.Get(context.Background(), "my-key")
if err != nil {
    log.Fatal(err)
}

for _, kv := range resp.Kvs {
    fmt.Printf("key:%s value:%s revision:%d\n", kv.Key, kv.Value, kv.ModRevision)
}

相关文章

网友评论

      本文标题:记kube-apiserver访问etcd问题分析

      本文链接:https://www.haomeiwen.com/subject/ztfkydtx.html